http://danblee.com/exchange-2013-mail-stuck-in-outbox-or-drafts-have-to-restart-transport-service-to-resume-mail-flow/
Oh man, this was a tough one. A real doozy. First, here are my environment details where this was happening.
- MS Exchange 2013 CU6 (Issue was happening as far back as CU4)
- One Exchange Server managing all Exchange services and responsibilities
- Less than 50 mailboxes
Overview
Basically, every few days we’d get a call in the middle of the day from customers telling us that mail was getting stuck in the Outbox in Outlook and was being put into the Drafts folder in OWA. At first, we found that rebooting the server fixed the issue. Later, we learned that restarting the Topology Service (the service that will restart all of your Exchange services) fixed the issue. Finally, we narrowed it down to restarting just the Transport Service. Once the Transport Service restarted, all mail would push from the Outboxes and mail would work again for a few days.
There were no logged errors on the Transport Service. The Transport Service was always On, never stopped.
Resolution
Adjust or disable Resource Pressure Monitor.
What is the Resource Pressure Monitor?
In perfect form, the resource pressure monitor collects information about your environment and will delay mail (called “tarpitting”) until things get back to normal. If the severity of resource usage gets too high, it will actually STOP mail flow of all kinds. This is what was happening to us. If your system is unhappy with the amount of resources available, you’ll have to restart the Transport Service to reduce the monitor’s severity level.
It wasn’t until we found the warning below that we noticed this was going on. In my Exchange environment, version buckets were climbing too high. Look at everything it stopped:
The resource pressure increased from Medium to High.
The following resources are under pressure:
Version buckets = 332 [High] [Normal=80 Medium=120 High=200]
The following components are disabled due to back pressure:
Inbound mail submission from Hub Transport servers
Inbound mail submission from the Internet
Mail submission from Pickup directory
Mail submission from Replay directory
Mail submission from Mailbox server
Mail delivery to remote domains
Content aggregation
Mail resubmission from the Message Resubmission component.
Mail resubmission from the Shadow Redundancy Component
The following resources are in normal state:
Queue database and disk space (“C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\data\Queue\mail.que”) = 71% [Normal] [Normal=95% Medium=97% High=99%]
Queue database logging disk space (“C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\data\Queue\”) = 74% [Normal] [Normal=95% Medium=97% High=99%]
Private bytes = 3% [Normal] [Normal=71% Medium=73% High=75%]
Physical memory load = 61% [limit is 94% to start dehydrating messages.]
Submission Queue = 0 [Normal] [Normal=2000 Medium=4000 High=10000]
Temporary Storage disk space (“C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\data\Temp”) = 74% [Normal] [Normal=95% Medium=97% High=99%]
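If you want to pull the numbers out of a warning like the one above without reading it line by line, a small parser can help. This is only a sketch: it assumes the message text looks exactly like the sample in this post (`name = value [Level] [Normal=.. Medium=.. High=..]`), and the function names `parse_resource_lines` and `resources_under_pressure` are made up for illustration. Lines in other formats, such as the physical memory line, are simply skipped.

```python
import re

# Matches lines such as:
#   Version buckets = 332 [High] [Normal=80 Medium=120 High=200]
#   Private bytes = 3% [Normal] [Normal=71% Medium=73% High=75%]
RESOURCE_LINE = re.compile(
    r"^(?P<name>.+?)\s*=\s*(?P<value>\d+)%?\s*"
    r"\[(?P<level>Normal|Medium|High)\]\s*"
    r"\[Normal=(?P<normal>\d+)%?\s+Medium=(?P<medium>\d+)%?\s+High=(?P<high>\d+)%?\]\s*$"
)


def parse_resource_lines(event_text):
    """Return one dict per resource line found in a back pressure message."""
    resources = []
    for line in event_text.splitlines():
        match = RESOURCE_LINE.match(line.strip())
        if match is None:
            continue  # headings, component names, and lines in other formats
        entry = match.groupdict()
        for key in ("value", "normal", "medium", "high"):
            entry[key] = int(entry[key])
        resources.append(entry)
    return resources


def resources_under_pressure(event_text):
    """Names of the resources whose reported level is not Normal."""
    return [r["name"] for r in parse_resource_lines(event_text)
            if r["level"] != "Normal"]


# A few lines copied from the warning in this post.
SAMPLE = """The following resources are under pressure:
Version buckets = 332 [High] [Normal=80 Medium=120 High=200]
The following resources are in normal state:
Private bytes = 3% [Normal] [Normal=71% Medium=73% High=75%]
Physical memory load = 61% [limit is 94% to start dehydrating messages.]
Submission Queue = 0 [Normal] [Normal=2000 Medium=4000 High=10000]"""

print(resources_under_pressure(SAMPLE))
```

On the sample, the only resource reported as under pressure is the version buckets entry, which matches what the warning says in plain text.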
How to Disable the Back Pressure Resource Monitor
Luckily, there’s a handy config file that can be adjusted:
- Open the config file for the executable here: %ExchangeInstallPath%Bin\EdgeTransport.exe.config
- Find the entry called “EnableResourceMonitoring” and change the value to “false”
- Restart the Transport Service during your normal maintenance window.
- Breathe
A small disclaimer here: You may want to actually look into why your Exchange environment thinks it’s having trouble with resources. An easier fix to all of this may be to give your machine more disk space or RAM, etc. Here’s a screenshot of the config file entry:
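For anyone who would rather script the change than edit by hand, here is a rough sketch of the same edit. It assumes the setting is stored as an `<add key="EnableResourceMonitoring" value="..."/>` element, which is how I remember the file being laid out; check your own copy first. The function name `set_resource_monitoring` and the sample XML are illustrative only and are not taken from a real config file.

```python
import xml.etree.ElementTree as ET


def set_resource_monitoring(config_xml, enabled):
    """Return config_xml with EnableResourceMonitoring set to 'true' or 'false'."""
    root = ET.fromstring(config_xml)
    for add in root.iter("add"):
        if add.get("key") == "EnableResourceMonitoring":
            add.set("value", "true" if enabled else "false")
            return ET.tostring(root, encoding="unicode")
    raise KeyError("EnableResourceMonitoring entry not found")


# Assumed layout, trimmed down to the one entry this post cares about.
SAMPLE_CONFIG = """<configuration>
  <appSettings>
    <add key="EnableResourceMonitoring" value="true" />
  </appSettings>
</configuration>"""

updated = set_resource_monitoring(SAMPLE_CONFIG, enabled=False)
print(updated)
```

One caveat: ElementTree drops XML comments and the XML declaration when it re-serializes, so for the real file it is safer to take a backup first, or to use this only to check the value and make the actual change by hand.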
Some things that did NOT work
Here are some of the things that I tried before finding this issue:
- Disabling Malware Detection on the connector
- Adding the servername and the domain controller name to the hosts file
- Updating to CU6 entirely
Please let me know if there are other things that you have tried that aren’t working. Also, please let me know if this fixes the issue for you. The error you get will tell you exactly what needs to be adjusted, so as I said earlier, you may want to fix the issue instead of just disabling this feature. My environment is small, so this service is not needed.
Microsoft’s Explanation:
Back pressure is a system resource monitoring feature of the Microsoft Exchange Transport service that exists on Microsoft Exchange 2013 Mailbox servers and Edge Transport servers.
Exchange can detect when vital resources, such as available hard drive space and memory, are under pressure, and take action in an attempt to prevent service unavailability. Back pressure prevents the system resources from being completely overwhelmed, and the Exchange server tries to process the existing messages before accepting any new messages. When utilization of the system resource returns to a normal level, the Exchange server gradually resumes normal operation and starts accepting new messages again.
In Exchange 2013, when the Transport service on a Mailbox server or an Edge Transport server is under resource pressure, incoming connections are accepted, but incoming messages over those connections are either accepted at a slower rate or are rejected. When an SMTP host attempts to connect to an Exchange server that’s under resource pressure, the connection will succeed. However, when the host issues the MAIL FROM command to submit a message, depending on the resource that’s under pressure, the Transport service either delays the acknowledgement of the MAIL FROM command or rejects the connection.
In my case, version buckets were building up:
A list of changes that are made to the message queue database is kept in memory until those changes can be committed to a transaction log. Then the list is committed to the message queue database itself. These outstanding message queue database transactions that are kept in memory are known as version buckets. The number of version buckets may increase to unacceptably high levels because of an unexpectedly high volume of incoming messages, spam attacks, problems with the message queue database integrity, or hard drive performance.
When Exchange starts receiving messages, these messages are grouped together in batches and then prepared as version buckets. If an incoming message has a large attachment, it can be separated into multiple batches. These batches that are being processed are known as batch points. The number of outstanding batch points can exceed the set thresholds, especially when there’s an unexpectedly high volume of incoming messages with large attachments.
When version buckets or batch points are under pressure, the Exchange server will start throttling incoming connections by delaying acknowledgement to incoming messages. Exchange will reduce the rate of inbound message flow by tarpitting, which introduces a delay to the MAIL FROM commands. If the resource pressure condition continues, Exchange will gradually increase the tarpitting delay. After the resource utilization returns to normal, Exchange will gradually start reducing the acknowledgement delay and ease into normal operation. By default, Exchange will start delaying message acknowledgements by 10 seconds when under resource pressure. If the resources continue to be under pressure, the delay is increased in 5-second increments up to 55 seconds.
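To make the default numbers concrete, here is a tiny sketch of the escalation described above: start at 10 seconds and add 5 seconds at a time up to 55. The function name `tarpit_delays` is hypothetical, and how long Exchange waits between steps is not stated in the explanation, so this only lists the delay values themselves.

```python
def tarpit_delays(start=10, step=5, maximum=55):
    """Yield the acknowledgement delay, in seconds, for each escalation step."""
    delay = start
    while delay <= maximum:
        yield delay
        delay += step


print(list(tarpit_delays()))
```

With the defaults that works out to ten steps, so a sending server could be waiting close to a minute per MAIL FROM before the delay stops growing.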
Exchange keeps a history of version bucket and batch point resource utilization. If the resource utilization doesn’t go down to normal level for a specific number of polling intervals, known as the history depth, Exchange will stop the tarpitting delay and start rejecting incoming messages until the resource utilization goes back to normal. By default, the history depths for version buckets and batch points are 10 and 300 polling intervals, respectively.
Mail will STOP flowing when the severity reaches High. No matter which resource is under pressure, mail flow seems to stop in the same way:
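The history depth idea is easier to follow as a small state tracker. What follows is my own reading of the paragraph above, not Exchange's actual logic: the class name `PressureHistory`, the three action labels, and the rule "above the Normal threshold for every interval in the history means reject" are all assumptions. It also goes straight back to accepting mail once a sample is normal, whereas the real service eases back gradually. The example uses the Normal=80 threshold from my warning and a depth of 3 instead of the default 10 to keep it short.

```python
from collections import deque


class PressureHistory:
    """Track recent polling intervals for one resource (a simplified model)."""

    def __init__(self, normal_threshold, history_depth):
        self.normal_threshold = normal_threshold
        # Only the most recent history_depth samples are kept.
        self.samples = deque(maxlen=history_depth)

    def record(self, value):
        """Record one polling interval and return 'accept', 'tarpit', or 'reject'."""
        self.samples.append(value)
        above = [v > self.normal_threshold for v in self.samples]
        if not above[-1]:
            return "accept"  # latest sample is back at a normal level
        if len(self.samples) == self.samples.maxlen and all(above):
            return "reject"  # never returned to normal within the history depth
        return "tarpit"      # under pressure, but not for long enough to reject


history = PressureHistory(normal_threshold=80, history_depth=3)
actions = [history.record(v) for v in (50, 100, 120, 332, 60)]
print(actions)
```

In this run the tracker tarpits for two intervals, rejects on the third interval above normal, and accepts again once the count drops to 60.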
- Reject incoming messages from other Exchange servers
- Reject message submissions from mailbox databases by the Mailbox Transport Submission service on Mailbox servers
- Reject incoming messages from non-Exchange servers
- Reject message submissions from Pickup and Replay directories
So please, please let me know if you have experienced the same trauma. Hopefully Exchange will address this or at least make it more apparent to end users in the future.
Cheers!