Subject: Re: [ADSM-L] Slow backup
From: "Allen S. Rout" <asr AT UFL DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 7 Aug 2012 10:16:46 -0400
On 08/06/2012 04:12 PM, Arbogast, Warren K wrote:
> There is a Linux fileserver here that serves web content. It has 21 million
> files in one filesystem named /ip. There are over 4,500 directories at the
> second level of the filesystem. The server is running the 6.3.0.0 client, and
> has 2 virtual cpus and 16 GB of RAM.  Resourceutilization is set to 10, and
> currently there are six client sessions running.  I am looking for ways to
> accelerate the backup of this server since currently it never ends.


> The filesystem is NFS mounted so a journal based backup won't
> work. Recently, we added four proxy agents, and are splitting up the
> one big filesystem among them using include/exclude statements. Here
> is one of the agent's include/exclude files.
>
> exclude /ip/[g-z]*/.../*
> include /ip/[a-f]*/.../*

... You say "proxy agents", but it's not clear what you mean by this.
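
If you mean TSM's proxy-node feature, the usual shape is something like
this (node and agent names here are invented for illustration):

   On the server:
      GRANT PROXYNODE TARGET=BIGFS AGENT=BIGFS-AGENT1
      GRANT PROXYNODE TARGET=BIGFS AGENT=BIGFS-AGENT2

   In each agent's dsm.sys stanza:
      NODENAME     BIGFS-AGENT1
      ASNODENAME   BIGFS

In that arrangement every agent stores its data under the single target
node BIGFS, which matters a great deal for the two possibilities below.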

> Since we added the proxies the proxy backups are copying many
> thousands of files, as if this were the first backup of the server
> as a whole. Is that expected behavior?

I see two possible arrangements you might have implemented.

Possibility one: where you had BIGFS-NODE which was taking a long
time, you now have BIGFS-NODEAF with the config above, and
BIGFS-NODEGL which is defined to include G through L, etc.
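
If that's what you've done, each agent owns its own node name and its
own slice of the include/exclude, roughly along these lines (server
name, file paths, and the exact stanza contents are placeholders, not a
complete option file):

   dsm.sys stanza on the A-F agent:
      SERVERNAME           TSMSERVER1
      NODENAME             BIGFS-NODEAF
      RESOURCEUTILIZATION  10
      INCLEXCL             /opt/tivoli/tsm/client/ba/bin/inclexcl.af

   inclexcl.af:
      exclude /ip/[g-z]*/.../*
      include /ip/[a-f]*/.../*

Each subnode then gets its own filespace on the server, independent of
the original node's.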

In this case, each BIGFS subnode has to do a first full incremental of
its slice (re-backing-up data the original node already holds), and
thereafter you should see normal change rates.

I will note that this is unlikely to improve your wall-clock time: with
resourceutilization 10 you've probably already got 5+ threads walking
the FS, which means your bottleneck has likely moved to IOPS on your
NAS as it pulls the metadata to satisfy the FS walk.  20 threads won't
do that any faster.

Possibility two: you have four agents with different include/exclude
statements but using the same BIGFS node.

In this case you are running a charlie-foxtrot formation.  One process
is backing stuff up while another is expiring the very same stuff.  If
you're doing this, you should stop it immediately, go back to one
agent, and complete an incremental, because you have your backups in
an indeterminate state.
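
Concretely, that means picking one agent, dropping the partial
include/exclude, and letting a single invocation walk the whole
filesystem against the one node, e.g. something like (node name is a
placeholder for whatever your shared node is called):

   dsmc incremental /ip -asnodename=BIGFS

and only splitting things up again once that pass has completed cleanly.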


> Recently, the TSM server database is growing faster than it
> usually does, and I'm wondering whether there could be any
> correlation between the ultra long running backup, many thousands of
> files copied, and the faster pace of the database growth.

This symptom is what makes me think you're doing the latter.  If one
process is adding stuff while another throws it away, the result is a
rapidly growing tail of inactive versions, capped by (I think)
VERDELETED.  I don't know off the top of my head if an excluded file
is retained at VEREXISTS or VERDELETED.  Interesting.
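
If you want to see which cap is actually in play, the backup copy group
for that node's management class shows the VER*/RET* values; from an
administrative command line, something like (wildcards shown, narrow
them to your domain and class):

   query copygroup * active * type=backup format=detailed

and look at "Versions Data Exists", "Versions Data Deleted", "Retain
Extra Versions", and "Retain Only Version".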

- Allen S. Rout
