Exchange 2010, full backup of 970 Gbytes takes ~60 hours

FloydATC (ADSM.ORG Member, Fet, Norway)

Exchange 2010 with 2 HUBCAS servers + 2 DB servers, each running Windows 2008 R2. The HUBCAS servers have 2 virtual CPUs and 12 Gbytes RAM, the DB servers have 8 virtual CPUs and 32 Gbytes RAM. VMware ESXi 5.1, with 2 out of 15 hosts practically dedicated to Exchange during weekends. (Each host has 12 cores and 256 Gbytes RAM.)

26 databases, 970 Gbytes in total, distributed 50/50 across the two DB servers but all backed up from one DB server*
No FCM or other fancy tricks.
Running TDP version 6.3.0.2 and BA version 6.3.0.14

Full backup starts on Fridays at 17:00 and doesn't finish until 03:00-06:00 on Monday morning. That's about 60 hours, or just over 16 Gbytes per hour (roughly 4.5 Mbytes/s, well under 5% of a gigabit link). Gigabit network with very little utilization; the TSM server is on the same IP subnet, so there's no L3 routing or firewalling involved.

The TSM server, running version 6.3.3.200, is specced with dual quad-core Xeons and 32 Gbytes RAM. The TSM database is just over 150 Gbytes and sits on SSD. STGpools are on disk, RAID6 with a total of 48 SATA drives minus spares, connected via FC. The TSM server also backs up just under 150 other clients, but those are incremental only and finish well within their 2-hour nightly window, to STGpools on another disk system.

I have a hard time seeing where the bottleneck is. All servers and network components are monitored and show no congestion or utilization spikes. I have tried tuning the TCP options in TDP without seeing any real improvement (the difference is within 2-3 hours).

The exact command used to run the full backup is
Code:
tdpexcc backup * full /skipintegritycheck /tsmoptfile=dsm.opt /logfile=excsch.log
Config files are shown below.


My question is this: would it be a good idea to try running the 26 backups in parallel rather than sequentially? I'm thinking something along the lines of 26 concurrent TDP sessions, each doing one database, or 13 sessions, each doing two databases. I imagine I would have to increase resourceutilization on the client and mount points on the server, but are there any obvious problems with such an approach?
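For concreteness, a rough and untested sketch of what I mean, with placeholder database names and the same options as the sequential command above:

Code:
rem Untested sketch: launch several independent TDP full backup sessions,
rem one database (or a small group) each. DB01-DB04 are placeholder names.
start "DB01" cmd /c tdpexcc backup DB01 full /skipintegritycheck /tsmoptfile=dsm.opt /logfile=excsch_db01.log
start "DB02" cmd /c tdpexcc backup DB02 full /skipintegritycheck /tsmoptfile=dsm.opt /logfile=excsch_db02.log
start "DB03" cmd /c tdpexcc backup DB03 full /skipintegritycheck /tsmoptfile=dsm.opt /logfile=excsch_db03.log
start "DB04" cmd /c tdpexcc backup DB04 full /skipintegritycheck /tsmoptfile=dsm.opt /logfile=excsch_db04.log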

*) Yes, I have considered running the backup on both DB servers. This might cut the backup time in half, but the current setup means that if one DB server fails, backup on the other DB server can be activated simply by changing a Windows service from "disabled" to "automatic". If we had to rely on both servers to run the backup, we would no longer have a redundant setup. Sort of. It's not that we absolutely can't, but I would prefer not to.


tdpexc.cfg:

Code:
LASTPRUNEDate     10/14/2013 09:48:33
MOUNTWait     Yes
BACKUPMETHod     VSS
TEMPLOGRestorepath     D:\Restore_temp
TEMPDBRestorepath     D:\Restore_temp
LOGFile     tdpexc.log
LOGPrune     60
DATEformat     1
TIMEformat     1
LANGuage     ENU
VSSPOLICY * * * TSM EXCHANGE 
BACKUPDESTination     TSM
LOCALDSMAgentnode     INT-EXCHDB-01
REMOTEDSMAgentnode

dsm.opt:

Code:
NODename          INT-EXCHDB-01_EXCH
CLUSTERnode       no
COMPRESSIon       Off
PASSWORDAccess    Generate
DATEformat 1
COMMMethod        TCPip
TCPPort           1500
TCPServeraddress  10.80.2.30

*TCPWindowsize     63
*TCPBuffSize       32
*TCPWindowsize     255
*TCPBuffSize       127
TCPWindowsize      2048 
TCPBuffSize        512
TCPNoDelay         yes

SCHEDMODE             Polling
SCHEDLOGRetention     14
ERRORLOGRetention    14
MANAGEDSERVICES WEBCLIENT SCHEDULE
 

I have run into the same issue trying to back up about 6 TB of databases. Microsoft recommends using SATA with Exchange 2010, but they forget to mention that the new database model greatly slows down backup performance. The issue is explained well here - http://www.virtualtothecore.com/en/...-requirements-and-vsphere-vadp-based-backups/

To speed things up I ended up creating 4 parallel jobs. It still takes almost 2 days for a full backup to complete, but it's better than 4 days. My plan for the future is to move to SAS. SATA works great for smaller databases.
 

Thanks for sharing :)

After the OP, I have experimented with parallel backups and found that the sweet spot was indeed at 4 jobs in parallel, yielding a combined throughput just over 4x that of a single stream. At that point I hit a known bottleneck: aggregated 1-gigabit Ethernet links. Streams between two specific IP addresses will never use more than one physical link.
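For completeness, the knobs I was referring to earlier: RESOURCEUTILIZATION in dsm.opt on the client side (example value only, not a recommendation)

Code:
RESOURCEUTILIZATION  10

and MAXNUMMP for the node on the TSM server, which mainly matters if the backup data goes to tape or other sequential pools (example value only):

Code:
update node INT-EXCHDB-01_EXCH maxnummp=4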
 

I had the same problem with a big Exchange server. We ended up splitting the Exchange databases into smaller chunks of around 250 GB and created a 7-day rotation schedule like this (a rough command sketch follows the list):

Monday: Full for DB1, DB2, DB3, DB4, incr for the rest
Tuesday: Full for DB5, DB6, DB7, DB8, incr for the rest
etc.
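Roughly like this, just as a sketch; the database names are placeholders and the schedules simply rotated which group got the full:

Code:
rem Monday's scheduled commands (placeholder database names)
tdpexcc backup DB1,DB2,DB3,DB4 full /tsmoptfile=dsm.opt /logfile=excfull_mon.log
tdpexcc backup DB5,DB6,DB7,DB8 incr /tsmoptfile=dsm.opt /logfile=excincr_mon.log
rem ...and so on for the remaining databases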

Still wasn't fast enough. Like Emerman, after running instrumentation traces I found that most of the time the Exchange server disks, which were SATA, were the bottleneck.
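For what it's worth, on the 6.x clients I believe instrumentation is enabled with a testflag in the option file (check the documentation for your exact level; the report ends up in dsminstr.report.txt):

Code:
* dsm.opt - enable client instrumentation (verify syntax for your client level)
TESTFLAG INSTRUMENT:DETAIL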
 