Running Exchange Server version 5.5 Service Pack 3 with NBU 3.4 patched to the 645 patch level.
Problem summary:
When performing individual mailbox backups on a large information store, the throughput decreases to the point that they run too slowly to be practical. On a smaller information store, the throughput is slow but remains high enough to be usable. Throughput on these same two clients is absolutely fine during normal backups. Following are some sample throughput averages:
Job Type                SmallExchangeServer   BigExchangeServer
IS/DS backup            3.6 MB/sec            3.1 MB/sec
NT file system backup   2.2 MB/sec            1.9 MB/sec
HKLM backup             1.3 MB/sec            1.3 MB/sec
Mailbox backup          700 KB/sec            350 KB/sec
It should be noted that all of the results above were obtained from backups during off-peak hours. Other backup jobs that use the same library and master server generally run alongside everything except the BigExchangeServer mailbox backup, so all of the above throughputs except that one would be higher if the jobs ran alone. The information store on SmallExchangeServer is 4 GB, and the information store on BigExchangeServer is just over 26 GB.
Both the backup server and BigExchangeServer have two dual-port NICs (forced to 100/full) with an EtherChanneled bandwidth of 400 Mb/sec. SmallExchangeServer has a single NIC set to 100/full.
General mailbox backup behavior:
Mailbox jobs on BigExchangeServer start out at roughly 700 KB/sec; however, they slow to 350 KB/sec (and at times less) as the backup progresses. I also have problems with timeouts. I've bumped both the client and server read timeouts up to 15 minutes but have continued to experience timeout problems, which seem to occur more frequently on larger mailboxes. I just bumped both timeouts up to 20 minutes; however, I will need to wait until next weekend to test this setting.
Mailbox jobs on SmallExchangeServer start out at over 1 MB/sec and eventually slow to 700 KB/sec. I have not experienced timeouts on this server and have kept it at the default timeout setting of 5 minutes.
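For reference, on a UNIX master/media server the read timeout being raised here is the CLIENT_READ_TIMEOUT entry in bp.conf (on an NT server it is set through the server properties instead); a sketch of the entry, with the value in seconds:

```
# /usr/openv/netbackup/bp.conf on the master/media server
# Default is 300 (5 minutes); currently trying 1200 (20 minutes).
CLIENT_READ_TIMEOUT = 1200
```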
Troubleshooting:
One suggestion I was given was to use multiple data streams. The problem is that multiple data streams will not work with the standard job directive of "Microsoft Exchange Mailboxes:\". I have not found a wildcard method (such as "back up all mailboxes beginning with 'a'") that would split the job up in a reasonable manner. Thus, the only way to get multiple data streams to work is to reference each mailbox individually as a separate line item in the directive list and intersperse those entries with NEW_STREAM directives. This isn't practical with over 400 mailboxes!
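To illustrate, the only working form looks like the following file list (the mailbox names are placeholders; each NEW_STREAM directive starts another concurrent stream):

```
NEW_STREAM
Microsoft Exchange Mailboxes:\ASmith
Microsoft Exchange Mailboxes:\BJones
NEW_STREAM
Microsoft Exchange Mailboxes:\CBrown
Microsoft Exchange Mailboxes:\DGreen
```

Multiplied out to more than 400 mailboxes, and maintained by hand as users are added and removed, a list like this is unmanageable.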
Unfortunately, due to the excessively slow throughput on the large information store's mailbox backups, I cannot perform a complete mailbox backup in a reasonable period of time. At the average rate of 350 KB/sec, this job would take over three days to complete, which means I can't even set it up for the weekend. Per the Admin Guide, I also cannot perform incrementals on mailboxes. Thus, I have no option for getting a complete mailbox backup on this server other than defining each mailbox individually.
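The impracticality is easy to check with a back-of-envelope calculation (a sketch; the function is mine, and it counts only the raw store size, whereas the actual job slows further as it progresses, which is consistent with the over-three-days figure):

```python
def backup_hours(size_gb, rate_kb_per_sec):
    """Estimate wall-clock hours to move size_gb at a constant rate_kb_per_sec.

    Assumes 1 GB = 1024 * 1024 KB and a steady transfer rate.
    """
    return size_gb * 1024 * 1024 / rate_kb_per_sec / 3600.0

# 26 GB store at the observed mailbox-backup rates:
print(round(backup_hours(26, 350), 1))  # worst observed average rate
print(round(backup_hours(26, 700), 1))  # initial rate before slowdown
```

Even the optimistic figure is close to half a day for the store alone, and the real job degrades well below 350 KB/sec at times.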
It should be noted that when multiple data streams are employed by providing specific directives, the overall throughput does increase somewhat. Two of the largest mailboxes on BigExchangeServer, each of which achieves roughly 700 KB/sec when backed up by itself, have a combined throughput of 950 KB/sec via concurrent data streams (not a huge improvement, but enough to show that the server itself is capable of handling more). I've also run Perfmon on that server during backups to ensure it isn't being pegged too hard, and everything is running at acceptable levels.
Backing up a handful of mailboxes at a time can result in much better throughput on both servers (for obvious reasons); however, any time the amount of data increases, the overall throughput decreases significantly. I understand that mailbox backups access the database in a way that makes them slower than IS/DS backups, and I also understand why the slowness worsens as the amount of data backed up grows. My questions are not about why this is happening, but rather why the slowness is so severe. I would also like to know what throughput Veritas says should be expected for a given information store size. Is what I'm seeing standard, or is it below normal? It seems as though I shouldn't be running into limitations with only a 26 GB information store, as this really isn't excessively large for an IT shop.
Thanks in advance for any help you can provide.
Fdenn AT cranel DOT com