ADSM-L

Re: [ADSM-L] 2 Windows 2003 clients with huge # of files consistently failing

2009-01-16 09:07:08
Subject: Re: [ADSM-L] 2 Windows 2003 clients with huge # of files consistently failing
From: "Staubach, Justin (OFT)" <Justin.Staubach AT OFT.STATE.NY DOT US>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 16 Jan 2009 09:04:09 -0500
John,

Perhaps you could get creative here...

Assuming there is a way of grouping files by date...  Let's say all files are 
saved to:
D:\20090114\...
D:\20090115\...
D:\20090116\...
Etc..

You could, at the cost of some disk space, do the following:
1. Run a batch/script before the backup window which tars/zips that day's files 
and saves them to some backup directory, and also cleans up old tars/zips in 
the backup directory
2. Exclude the directories where your files reside
3. Let the backup run and back up the backup directory
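
A rough sketch of step 1 in Python, in case it helps. It assumes the D:\YYYYMMDD layout described above; the function name archive_day, the keep_days retention value, and the backup directory location are placeholders I made up, not anything from your environment. Excluding the source directories (step 2) would still be done in the TSM client options and is not shown here.

```python
import datetime
import pathlib
import zipfile


def archive_day(source_root, backup_dir, day, keep_days=7):
    """Zip source_root/YYYYMMDD into backup_dir, then prune old archives.

    All names and the retention policy here are illustrative assumptions.
    """
    source_root = pathlib.Path(source_root)
    backup_dir = pathlib.Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    day_dir = source_root / day.strftime("%Y%m%d")
    if not day_dir.is_dir():
        raise FileNotFoundError(f"no directory for that day: {day_dir}")

    archive = backup_dir / (day_dir.name + ".zip")
    # One archive per day, so the backup sees a single large file
    # instead of thousands of small ones.
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(day_dir.rglob("*")):
            if path.is_file():
                zf.write(path, str(path.relative_to(source_root)))

    # Remove archives whose YYYYMMDD name is older than the cutoff.
    cutoff = (day - datetime.timedelta(days=keep_days)).strftime("%Y%m%d")
    for old in backup_dir.glob("*.zip"):
        if len(old.stem) == 8 and old.stem.isdigit() and old.stem < cutoff:
            old.unlink()

    return archive
```

Scheduled ahead of the backup window, this could be called as archive_day("D:/", "E:/tsm_staging", datetime.date.today()), where E:/tsm_staging is again just an example path.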

Kind of treat it like you would dumping a db to a flat file so it can be backed 
up.

Then the restore will be such that you restore the backup directory and then 
have an additional step to unzip/untar.
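
The extra restore step could look something like the sketch below. It assumes the backup directory holds one .zip per day with paths stored relative to the original drive root; unpack_archives and both directory arguments are hypothetical names used only for illustration.

```python
import pathlib
import zipfile


def unpack_archives(backup_dir, restore_root):
    """Extract every .zip in backup_dir under restore_root.

    Returns the archive names in the order they were unpacked.
    """
    restore_root = pathlib.Path(restore_root)
    restore_root.mkdir(parents=True, exist_ok=True)

    unpacked = []
    for archive in sorted(pathlib.Path(backup_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # Recreates the per-day directory tree stored in the archive.
            zf.extractall(restore_root)
        unpacked.append(archive.name)
    return unpacked
```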

Hope this helps to give you another way of looking at the problem.

Justin


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
John C Dury
Sent: Friday, January 16, 2009 8:45 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] 2 Windows 2003 clients with huge # of files consistently 
failing

Basically the app is recording all calls into our customer service reps
(CSR) via WAV and AVI (for screen activity). The files are all very small
but there are a lot of them, and because there are about 100 CSRs, they
create a new directory for each call. As an example, on Jan 14 2009, there
were 592 AVI and 592 WAV files but they live in 4043 directories.
Ridiculous! Unfortunately as per regulations, we need to keep the call
data. I've contacted Verint (maker of Ultra, the app) to see if they have
alternatives to how their data is stored. I can't imagine ever having to
restore this server, ever. I'll go back and start researching image backups
again, although I couldn't get them to work the first time. It backed up
about 12G and then just hung and never progressed any further. No errors
anywhere I looked (actlog, event viewer, tsmerror.log, etc.).
I also thought about possibly using Tivoli Continuous Data Protection (CDP).
Think that is an option?
Thanks for all your help and ideas,
John


ANR0481W Session 16603 for node <SERVERNAME> (WinNT) terminated
- client did not respond within 9000 seconds. (SESSION: 16603)


If TSM is struggling to get through the directories, then applications
associated with the data may be suffering the same problem.  This may
be the result of indifferent directory layout (far too many files in
directories) or disk hardware issue or contention or file system issue
(where chkdsk or equivalent might be run).  The hardware may simply be
underpowered for the amount of data involved (e.g., 5400 rpm disks or
perhaps older ATA pathing).  Or the file system type may be an
inefficient choice.  Large-scale data deployments cry out for a
knowledgeable data architect in order to be successful and to scale -
and that skill is often absent.

The owners of the data should be strongly advised to regard the backup
problem as a proportional indication of how very painful a file-
oriented restoral would be, where reconstructing Windows directory
entries is notoriously time-consuming.

   Richard Sims


I have two separate Windows 2003 boxes, both running the v5.5.1.10
client, that are both failing their incrementals every night. Both of
these boxes have hundreds of thousands of files spread across multiple
directories. In fact, each day a new directory is created, then
multiple subdirectories are created under it, with thousands of files
in each of those subdirectories. I mention this because I don't think
these boxes are candidates for multiple virtual nodes, given the new
directories that are created every day.



I do have journaling turned on, although it doesn't seem to help with
the large number of files either: when I run an incremental manually,
it takes forever and never seems to finish.



Are you sure that the journal is running and has enough space? In
these cases, having the journals on a separate filesystem might be a
very good idea. I have the feeling that there is not enough space for
the TSM journal database...


I thought about doing image backups of the drive where the thousands
of files live, but when I tried it, it backed up about 14g and then
just hung and never continued. I had to cancel it after waiting for an
hour or so.



And to what type of storage do these images go? I'd think that in case
of an image backup you'd want a management class that makes them go
directly to tape. My guess is that these were going to disk volumes?



What is my best strategy for dealing with these two boxes that are
generating thousands of new files in new directories every day? The
huge number of objects in the TSM DB is starting to cause quite a few
problems with daily processing as well: expiration is running longer
and longer, since I think it is choking on the number of objects.



I'd say that image backups are a good idea in cases of very active
filesystems. Filesystems on windows with huge numbers of files are
always a cause of problems, not only with TSM.



And to make it even weirder, they both fail incrementals at night and
the only error I can find is:

ANR0481W Session 16603 for node <SERVERNAME> (WinNT) terminated
- client did not respond within 9000 seconds. (SESSION: 16603)



Meaning that the client is indeed choking on the size of the
directories.


I'm starting to think that TSM is just not the backup solution for
either of these boxes.



I'm also thinking that if you have a piece of software creating 1000's
of files per day in a filesystem, that this is a very big workload.
I'm very sure that with VSS snapshots and image backups, you are on
the right track and no other product could do a better job of backing
up these filesystems.

