Subject: Re: Speed up 400GB backup?
From: Kris Vassallo <kris AT linuxcertified DOT com>
To: Frank Smith <fsmith AT hoovers DOT com>
Date: Tue, 20 Jul 2004 14:35:53 -0700
On Mon, 2004-07-19 at 22:41, Frank Smith wrote:
> 420GB is not the total amount per night. Something is bogging this down
> though and I don't know what. I am not using holding disks because the
> majority of data is being backed up from one set of disks to another on
> the same machine. This one machine has a set of RAID 10 disks. These
> disks are backed up by amanda and put onto a set of RAID 5 disks. 

OK, I was assuming a different setup.  Having a holding disk would let
you run multiple dumps in parallel.  Wouldn't help much (if any) when
it's all on one machine, but can really speed up your overall time if
you have multiple clients.
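
Something like this in amanda.conf would do it (the directory and sizes
here are made up; size them for your site):

  holdingdisk hd1 {
      directory "/dumps/amanda"   # ideally a disk that isn't being backed up
      use 40 Gb                   # space Amanda may fill with parallel dumps
      chunksize 1 Gb              # break large dumps into chunks on the way
  }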

> As far
> as assigning spindle #s goes I don't quite understand why I would set
> that. I have inparallel set to 4  and then didn't define maxdumps, so I
> would assume that not more than 1 dumper would get started on a machine
> at once. Am I getting this right? 

I think maxdumps defaults to 2 but I may be wrong (someone else should
jump in here).  I usually define everything so I know for sure how it's
defined without digging into the source.
 You're right, spindle numbers are only really useful with maxdumps > 1.
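
For example (the hostnames and dumptype name are just placeholders):

  # amanda.conf -- spell these out so there's no guessing at defaults
  inparallel 4        # up to 4 dumpers running in total
  maxdumps 1          # at most 1 dumper per client at a time

  # disklist -- DLEs sharing a spindle number are never dumped concurrently
  venus.xxxx   /home      comp-user-tar  1
  venus.xxxx   /var/www   comp-user-tar  1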

> Here is my email log from the backup
> this morning. 
> 
> STATISTICS:
>                           Total       Full      Daily
>                         --------   --------   --------
> Estimate Time (hrs:min)    7:30

Here's your runtime problem: 7.5 hours just for the estimates.

> Run Time (hrs:min)        10:35
> Dump Time (hrs:min)        2:52       0:29       2:23

Three hours for dumps doesn't seem too bad.  It could probably
be improved some, but the estimates are what's killing you.

> Output Size (meg)       12163.2     9094.3     3068.9
> Original Size (meg)     29068.4    19177.4     9891.0
> Avg Compressed Size (%)    41.8       47.4       31.0   (level:#disks ...)
> Filesystems Dumped            3          1          2   (1:1 5:1)
> Avg Dump Rate (k/s)      1207.5     5366.4      366.3
> 
> Tape Time (hrs:min)        0:17       0:13       0:05
> Tape Size (meg)         12163.3     9094.3     3069.0
> Tape Used (%)               1.8        1.3        0.4   (level:#disks ...)
> Filesystems Taped             3          1          2   (1:1 5:1)
> Avg Tp Write Rate (k/s) 11980.6    12287.9    11153.9
> --------
> 
> 
> NOTES:
>   driver: WARNING: /tmp: not 102400 KB free.
>   planner: Incremental of venus.xxxx:/home bumped to level 5.
>   planner: Full dump of bda1.xxxx:/home specially promoted from 13 days
> ahead.
>   taper: tape DailySet111 kb 12455232 fm 3 [OK]
> 
> 
> DUMP SUMMARY:
>                                        DUMPER STATS              TAPER STATS
> HOSTNAME     DISK     L  ORIG-KB  OUT-KB COMP%  MMM:SS    KB/s  MMM:SS    KB/s
> ------------------------------------------------------------------------------
> bda1.xxxx    /home    0 19637690 9312576  47.4   28:55  5366.4   12:38 12287.9
> bda2.xxxx    /var/www 1     3210     480  15.0    0:01   364.4    0:00 28399.0
> venus.xxxx   /home    5 10125160 3142176  31.0  142:59   366.3    4:42 11152.8

I'd suggest adding columnspec to your config and adjusting it so that
all the columns don't run together. It makes it much easier to read.
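Something like this (the column names are the ones Amanda uses in the
report; the widths are just a first guess to tune from):

  columnspec "HostName=0:12,Disk=1:11,OrigKB=1:9,OutKB=1:8,DumpRate=1:7,TapeRate=1:8"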
Good idea, done!
I'm guessing that bda1:/home wrote 9.3GB to 'tape', taking about 29 min.
to dump and almost 13 min. to tape.
venus:/home wrote 3GB, taking over 2 hours to dump and under 5 min. to tape.
Which (if any) of these is the backup server itself?
The backup server and the fileserver the data is coming from are one and the same machine, venus.
The taper rates (about 12MB/sec if I'm parsing it right) seem ok, but
the 142 min dump time seems somewhat high for only 3GB of data.
Is that the 400GB filesystem you were talking about, and is it local
or remote?  
Those disks are local to the backup server.
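And the rates do check out against the summary above, for what it's worth:

  taper (bda1):   9312576 KB / (12*60+38) s  = ~12286 KB/s  (report: 12287.9)
  dumper (venus): 3142176 KB / (142*60+59) s =  ~366 KB/s   (report: 366.3)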
  As for the estimates, are you using dump or tar?  Look in the 
*debug files on the clients and see which one was taking all the time
(I'm guessing venus since it looks like you did a force on bda1).
Does that filesystem have millions of small files?
I am using tar to do this. The bda1 system is a CVS server which gets hammered on all day long; it has tons of smaller files as well as a decent number of larger ones.
  I'm not sure of the best way to speed up estimates, other than a
faster disk system. 
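The only knob I can think of is the dumptype 'estimate' parameter, and
only if your Amanda is new enough to have it (it appeared somewhere in
the 2.4.x series, if I remember right).  A sketch, with a made-up name:

  define dumptype fast-est-tar {
      comp-user-tar        # whatever base dumptype you use today
      estimate calcsize    # cruder but much faster than a full gnutar run
  }

'estimate server' is faster still, but it only extrapolates from history.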
The disks in the venus box are all SATA 150 drives; SCSI is way out of the price range for this amount of space. If venus is the machine that is taking forever to do the estimates, is it possible that:
1. estimates start on all machines;
2. the estimates on the smaller remote filesystems finish first, and those systems begin to dump;
3. now, on top of running an estimate on its own disks, the backup server is also handling the dumps coming in from the remote systems, and all of it together is slowing things down?
Do I have any valid ideas here?
-Kris

 Perhaps someone else on the list has some ideas.

Frank

> 
> On Mon, 2004-07-19 at 15:20, Frank Smith wrote:
> 
> --On Monday, July 19, 2004 14:07:40 -0700 Kris Vassallo
> <kris AT linuxcertified DOT com> wrote:
> 
>>   I am looking for some assistance in tweaking the bumpsize, bumpdays,
>> and bumpmult items in amanda.conf. I am backing up 420GB+ worth of home
>> directories to hard disks every night and the backup is taking about 11
>> hours. I just changed the backup of one 400GB home drive from client
>> compress best to client compress fast, which did seem to shave a bit of
>> time off the backup. The disks that are being backed up are on the same
>> RAID controller as the backup disks.
>> I really need to make the backup take a lot less time because the
>> network crawls when the developers come in to work in the morning
>> because the home directory server is blasting away with the backup. So,
>> with a filesystem this large, what would be some good settings for the
>> bump options? Also, are there any other things I can do to get this
>> backup done any faster without turning off disk compression altogether?
> 
> Are you actually writing 420GB per night, or is that just the total
> amount to be backed up?  If most of your data isn't changing daily
> then breaking up your DLEs to not have a 400GB chunk could spread
> the level 0s across more nights and shorten your nightly backup time.
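
(For anyone following along: I read that as a disklist more like this,
instead of one giant /home entry -- the subdirectory names are made up:

  venus.xxxx   /home/developers  comp-user-tar  1
  venus.xxxx   /home/projects    comp-user-tar  1
  venus.xxxx   /home/archive     comp-user-tar  1

so the level 0 of each piece can land on a different night.)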
> 
>    Are you sure it's the compression using up most of the time?  You
> probably need to add spindle numbers to your disklist to serialize
> the accesses to the DLEs that share common disks.  Using a holding
> disk not on the same controller would speed things up also.
>    If your DLEs and file backups share the same disks and not just
> the same controller then the disks will waste quite a bit of time
> seeking back and forth.  You might also want to do some performance
> testing on your RAID controller, perhaps it is the bottleneck as
> the model of controller (and the RAID level) can have a big impact
> on throughput.
>    Perhaps posting your daily report and more details of the physical
> layout would give us a better idea of where to start on suggestions
> for improving your backup times.
> 
> Frank

