Large filesystem backups - over 7 million files

jethro66

We are having trouble backing up a large NTFS SAN LUN of about 1.5 TB, but it's the very large number of objects that is killing backups, not the size per se. To date, the only way I can get an incremental job to complete is running with MEMORYEFFICIENTBACKUP set to DISKCACHEMETHOD, which takes over 12 hours and pegs the CPU. With MEMORYEFFICIENTBACKUP set to no or yes, dsmc always crashes with an "operating system refused to allocate more memory" message after about 30 minutes of grinding on the filesystem. This is a clustered (x2) setup, so I'm not really wanting to do a lot of rework here, but any help is appreciated. The obvious solution would be to set up multiple nodes and multiple schedules (divide and conquer), but I was hoping for a lazy way out :).
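For reference, the relevant dsm.opt lines look roughly like this (the cache location is just a placeholder; point it at a local disk with plenty of free space):

* dsm.opt - push the incremental inventory out to a disk cache instead of RAM
MEMORYEFFICIENTBACKUP DISKCACHEMETHOD
DISKCACHELOCATION d:\tsmdiskcache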
 
I would probably try journal-based backup: the journal engine watches the files all day and works out what has changed, so only those files get backed up during the backup window, instead of scanning the whole lot during the window, which is too much work.
 
I have a filesystem of 30 million files, about 1.2 TB. You're going to want to set up journaling. Without it, my backup would take 15-18 hours. With journaling, it takes 20-30 minutes.
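Roughly, the setup is installing the TSM journal service and pointing tsmjbbd.ini at the filesystem; a minimal sketch (the drive letter and journal directory are just examples, check the client manual for your level):

[JournalSettings]
Errorlog=c:\tsmjournal\jbberror.log

[JournaledFileSystemSettings]
JournaledFileSystems=e:
JournalDir=c:\tsmjournal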
 
Thanks for the replies. We are using journaling, but the problem is that the machine runs out of RAM unless we force disk caching for the pre-backup processing, which is obviously much slower. It could be worse; it takes around six hours here. I was just hoping that recently doubling the physical RAM to 4 GB would have gotten me past this issue. The next step is obviously moving this filesystem to a 64-bit Windows server or splitting it up into different LUNs to accommodate the TSM/Windows memory limits.

Here is the output from the sched log:
05/27/2008 01:15:45 --- SCHEDULEREC STATUS BEGIN
05/27/2008 01:15:45 Total number of objects inspected: 9,758,936
05/27/2008 01:15:45 Total number of objects backed up: 4,092
05/27/2008 01:15:45 Total number of objects updated: 1
05/27/2008 01:15:45 Total number of objects rebound: 0
05/27/2008 01:15:45 Total number of objects deleted: 0
05/27/2008 01:15:45 Total number of objects expired: 66
05/27/2008 01:15:45 Total number of objects failed: 0
05/27/2008 01:15:45 Total number of subfile objects: 0
05/27/2008 01:15:45 Total number of bytes transferred: 17.86 GB
05/27/2008 01:15:45 Data transfer time: 593.18 sec
05/27/2008 01:15:45 Network data transfer rate: 31,586.56 KB/sec
05/27/2008 01:15:45 Aggregate data transfer rate: 1,011.09 KB/sec
05/27/2008 01:15:45 Objects compressed by: 0%
05/27/2008 01:15:45 Subfile objects reduced by: 0%
05/27/2008 01:15:45 Elapsed processing time: 05:08:51
05/27/2008 01:15:45 --- SCHEDULEREC STATUS END
05/27/2008 01:15:45 --- SCHEDULEREC OBJECT END DAILY2000 05/26/2008 20:00:00
05/27/2008 01:15:45 Scheduled event 'DAILY2000' completed successfully.
05/27/2008 01:15:45 Sending results for scheduled event 'DAILY2000'.
05/27/2008 01:15:45 Results sent to server for scheduled event 'DAILY2000'.
 
You can also split the backup into several parts.
If there are several directories at the root level, create a schedule per directory; this might get around the memory limitations.
The cost of scanning the files themselves is something that can only be overcome by implementing journaling.
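A rough sketch of what that could look like on the server side; the domain, schedule, directory, and node names here are only placeholders:

define schedule standard e_dir1_2100 action=incremental objects="e:\dir1\*" starttime=21:00
define schedule standard e_dir2_2230 action=incremental objects="e:\dir2\*" starttime=22:30
define association standard e_dir1_2100 mynode
define association standard e_dir2_2230 mynode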
 
Here is the output from the sched log:
05/27/2008 01:15:45 --- SCHEDULEREC STATUS BEGIN
05/27/2008 01:15:45 Total number of objects inspected: 9,758,936
05/27/2008 01:15:45 Total number of objects backed up: 4,092


Do you have any entries in your jbberror.log? It acts like it's not using journaling, even though you have it turned on. Your number of objects inspected appears to be all objects, which would be odd unless all of them are changing every day, and that's unlikely.

In my 30 million file drive, I get this in dsmsched.log.

05/27/2008 23:53:41 Total number of objects inspected: 199,981
05/27/2008 23:53:41 Total number of objects backed up: 7,643

I am guessing that you have something in your jbberror.log that's showing a journaling issue. Post some of that log and also some of the dsmerror.log. Before you take drastic measures and change your configuration, let's look at some of this logging info.
 
Hi,

As said before, you need to use journaling.

Sounds like me a few weeks ago: you're limited by a 32-bit OS.
Before switching to Windows 2003 64-bit, I tried the /3GB switch, which lets a process use 3 GB of RAM (instead of 2 by default) but leaves only 1 GB for the kernel (instead of 2 by default).
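If you want to try it, the switch just gets added to the OS line in boot.ini on 32-bit Windows 2003; your own boot entry will look different, but something like:

[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect /3GB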

You can try to back up in multiple parts (as suggested above) or, faster, run multiple dsmc processes (if you have multiple filespaces, one process per filespace works well), but then you risk hitting the kernel paged and/or non-paged pool memory limits.

So good luck, 32-bit OS memory limitations suck a lot :/

PS: in a few years journaling should be less useful with SSDs ;)
 
Mate, are these files changing, or is it an image system with lots of files that never ever change? We had a system with multiple filesystems, each with millions and millions of files. We limited the size of each filesystem and, after backing up, generated a backup set and excluded those files from future backups. If the data is changing each day, then the issues change and you can look at using journaling or even tarring up the filesystem.
 
I'm pretty sure journaling either is not working at all or doesn't work with disk caching enabled for the pre-processing enumeration. I did a little reading and found I need to add
JournalPipe=\\.\pipe\jnlService2
to run two journal services on a cluster... will try now and post my results.
 
I figured out two problems with journaling: one was adding the second pipe statement in tsmjbbd.ini, and the second, more elusive one was adding a journalpipe entry pointing to the name of that pipe in the dsm.opt for the clustered scheduler. Will let it build tonight and post results.
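For anyone else with a clustered scheduler, roughly what the two changes look like (pipe name as in the snippet I posted above; adjust to your own naming):

In the second journal instance's tsmjbbd.ini:
[JournalSettings]
JournalPipe=\\.\pipe\jnlService2

And in the dsm.opt used by the clustered scheduler:
JOURNALPIPE \\.\pipe\jnlService2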
 
Got the journal built, and what a huge difference. The backup scan is no longer an issue, and there are no more out-of-memory errors. That whole filesystem now backs up in just a few seconds! :)

Edit: would like to point out GregE was correct, the "objects inspected" count was the giveaway that journaling wasn't working. It's a very small number now. Thanks Greg.
 
Cool deal. I know how much of a pain it is to have it run through the entire filesystem on backup. I've had a situation on my 30-million-file drive where the image backup failed and corrupted the journal, so the next daily file backup had to rebuild the journal from scratch. What a pain. Glad you got it working.
 