Amanda-Users

Troubleshooting a slowdown problem?

2004-06-17 12:05:51
Subject: Troubleshooting a slowdown problem?
From: KEVIN ZEMBOWER <KZEMBOWE AT jhuccp DOT org>
To: amanda-users AT amanda DOT org
Date: Thu, 17 Jun 2004 12:02:29 -0400
A couple of months ago, I added a server, centernet, to my Amanda backups. 
Since then, about once every week or two (but not every day), the backup runs 
for 16-20 hours instead of the usual 8 hours or less. One is running right now; 
it has been going since 8:00pm last night:
amanda@admin:~ > amstatus DailySet1
Using /var/log/amanda/DailySet1/amdump from Wed Jun 16 20:00:00 EDT 2004

admin://db/c$                       0   462720k finished (20:56:41)
admin://db/e$                       1       10k finished (20:16:16)
admin://db/f$                       1  2559660k finished (20:51:25)
admin://db/f$/inetsrv/webpub/images 1       30k finished (20:16:07)
admin:sda1                          0     3410k finished (20:17:09)
admin:sda3                          0  3683924k wait for dumping 
admin:sdb1                          0    24270k finished (20:17:05)
centernet:sda1                      0     4846k finished (20:06:03)
centernet:sda2                      0   715715k finished (2:15:47)
centernet:sda3                      0   110883k finished (20:58:42)
centernet:sda5                      0  1812031k dumping  1332992k ( 73.56%) (2:02:34)
centernet:sda6                      1       73k finished (20:03:37)
centernet:sda7                      0      564k finished (20:04:03)
centernet:sda9                      0    30391k finished (20:18:59)
mailinglists:hda1                   0     2198k finished (20:04:30)
mailinglists:hda2                   0   399339k finished (23:03:41)
mailinglists:hda7                   0   743827k finished (4:41:56)

SUMMARY          part      real  estimated
                           size       size
partition       :  17
estimated       :  17             11566387k
flush           :   0         0k
failed          :   0                    0k           (  0.00%)
wait for dumping:   1              3683924k           ( 31.85%)
dumping to tape :   0                    0k           (  0.00%)
dumping         :   1   1332992k   1812031k ( 73.56%) ( 11.52%)
dumped          :  15   5057936k   6070432k ( 83.32%) ( 43.73%)
wait for writing:   0         0k         0k (  0.00%) (  0.00%)
wait to flush   :   0         0k         0k (100.00%) (  0.00%)
writing to tape :   0         0k         0k (  0.00%) (  0.00%)
failed to tape  :   0         0k         0k (  0.00%) (  0.00%)
taped           :  15   5057936k   6070432k ( 83.32%) ( 43.73%)
7 dumpers idle  : no-hold
taper idle
network free kps:     26362
holding space   :  34661948k ( 95.03%)
 dumper0 busy   :  8:13:03  ( 95.12%)
 dumper1 busy   :  6:24:10  ( 74.11%)
 dumper2 busy   :  2:52:40  ( 33.31%)
   taper busy   :  1:07:38  ( 13.05%)
 0 dumpers busy :  0:00:00  (  0.00%)
 1 dumper busy  :  0:13:42  (  2.64%)             no-hold:  0:13:42  (100.00%)
 2 dumpers busy :  7:57:50  ( 92.18%)  client-constrained:  5:31:58  ( 69.47%)
                                                  no-hold:  2:25:40  ( 30.49%)
                                               start-wait:  0:00:11  (  0.04%)
 3 dumpers busy :  0:26:50  (  5.18%)  client-constrained:  0:26:47  ( 99.77%)
                                               start-wait:  0:00:03  (  0.23%)
amanda@admin:~ > 
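A quick consistency check on the SUMMARY section, written as a Python sketch with the durations copied by hand (nothing here talks to Amanda): the "N dumpers busy" rows add up to 31,102 seconds (about 8 h 38 min), and each dumper's and the taper's busy percentage implies roughly that same total. So the H:MM:SS values in this part of the report appear to be durations accumulated over the run so far, not clock times.

```python
# Busy times (H:MM:SS) and percentages copied from the amstatus output above.
BUSY = {
    "dumper0": ("8:13:03", 95.12),
    "dumper1": ("6:24:10", 74.11),
    "dumper2": ("2:52:40", 33.31),
    "taper": ("1:07:38", 13.05),
}

# The "N dumpers busy" rows, which should together cover the whole run.
CONCURRENCY = {
    0: "0:00:00",
    1: "0:13:42",
    2: "7:57:50",
    3: "0:26:50",
}


def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS duration string to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


def implied_total(hms: str, percent: float) -> float:
    """Total run time implied by a busy time and its reported percentage."""
    return to_seconds(hms) / (percent / 100.0)


if __name__ == "__main__":
    span = sum(to_seconds(t) for t in CONCURRENCY.values())
    print(f"sum of 'N dumpers busy' rows: {span} s")
    for name, (hms, pct) in BUSY.items():
        print(f"{name:8s} implies a total of {implied_total(hms, pct):.0f} s")
```

The small differences between the implied totals come only from the percentages being rounded to two decimal places.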

What can be determined from this status about why the backup of 
centernet:sda5 is so slow? Actually, I guess it was either centernet:sda1 or 
centernet:sda3 that took over 20 hours. Am I reading this correctly, or does it 
mean 20 minutes, or is it a time of day?
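amstatus doesn't label the time in parentheses after each finished disk, but the earliest ones (20:03:37, 20:04:03) fall just after the 20:00:00 start shown in the "Using ... amdump from" line, which suggests they are the clock times at which each dump finished rather than durations. Under that assumption, this Python sketch (the helper name and the sample lines are illustrative) converts a few of them into time elapsed since the run began, rolling times earlier than 20:00 over to the next day. The result is only an upper bound on how long each dump itself took, since the individual dumps did not all start at 20:00.

```python
import re
from datetime import datetime, timedelta

# amdump start time, taken from the "Using .../amdump from ..." header line.
RUN_START = datetime(2004, 6, 16, 20, 0, 0)

# A few per-disk lines copied from the amstatus output above.
STATUS = """\
centernet:sda1                      0     4846k finished (20:06:03)
centernet:sda2                      0   715715k finished (2:15:47)
centernet:sda3                      0   110883k finished (20:58:42)
mailinglists:hda7                   0   743827k finished (4:41:56)
"""

# host:disk, level, size in KB, then the parenthesized H:MM:SS value.
LINE = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)k\s+finished\s+\((\d+):(\d\d):(\d\d)\)")


def finish_offsets(text: str, start: datetime) -> dict:
    """Map each finished entry to (finish clock time - run start).

    Assumes the parenthesized value is a time of day and that any time
    earlier than the start belongs to the following day.
    """
    out = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        h, mi, s = (int(g) for g in m.group(4, 5, 6))
        finished = start.replace(hour=h, minute=mi, second=s)
        if finished < start:
            finished += timedelta(days=1)
        out[m.group(1)] = finished - start
    return out


if __name__ == "__main__":
    for dle, delta in finish_offsets(STATUS, RUN_START).items():
        print(f"{dle:20s} finished {delta} after the run started")
```

Read this way, centernet:sda1 and centernet:sda3 finished about 6 and 59 minutes into the run, while centernet:sda2 finished a little over 6 hours in.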

ps on centernet doesn't show anything abnormal:
cn2:~# ps aux |grep amanda
amanda   27333  0.0  0.2  1680  636 ?        S    02:01   0:00 /usr/local/libexec/sendbackup
amanda   27335  1.9  0.2  1596  600 ?        S    02:01  11:08 /bin/gzip --fast
amanda   27336  0.0  0.1  1904  364 ?        S    02:01   0:00 dump 0usf 1048576 - /dev/sda5
amanda   27337  0.0  0.2  1956  664 ?        S    02:01   0:07 dump 0usf 1048576 - /dev/sda5
amanda   27338  0.0  0.1  1904  488 ?        S    02:01   0:12 dump 0usf 1048576 - /dev/sda5
amanda   27339  0.0  0.1  1904  504 ?        S    02:01   0:11 dump 0usf 1048576 - /dev/sda5
amanda   27340  0.0  0.1  1904  480 ?        S    02:01   0:12 dump 0usf 1048576 - /dev/sda5
root     29630  0.0  0.1  1336  436 pts/1    S    11:43   0:00 grep amanda
cn2:~# 
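One way to read these numbers: on Linux, procps ps reports %CPU as the CPU time a process has used divided by the time it has been running. The short Python sketch below (values copied by hand; treating the grep's 11:43 START as the current time is an assumption) reproduces the 1.9% shown for gzip, which has used about 11 minutes of CPU over roughly 9 h 42 min since 02:01. If that reading is right, neither gzip nor the dump processes are CPU-bound, and the dump pipeline seems to be spending most of its time waiting on something else.

```python
# Figures copied by hand from the ps output above. In BSD-style "ps aux"
# output, START is the clock time the process started and TIME is its
# cumulative CPU time, shown as minutes:seconds.
NOW = "11:43"  # assumption: START of the grep process, used as "now"

PROCS = {
    "gzip --fast (27335)": ("02:01", "11:08"),
    "dump (27338)": ("02:01", "0:12"),
}


def clock_minutes(hhmm: str) -> int:
    """Minutes since midnight for an HH:MM clock time."""
    h, m = (int(x) for x in hhmm.split(":"))
    return h * 60 + m


def cpu_seconds(mss: str) -> int:
    """Seconds of CPU time for an M:SS value from the TIME column."""
    m, s = (int(x) for x in mss.split(":"))
    return m * 60 + s


def cpu_percent(start: str, cputime: str, now: str = NOW) -> float:
    """CPU time as a percentage of wall-clock time since the process started.

    START only has minute resolution, so the result is approximate.
    The modulo handles a process that started before midnight.
    """
    wall_s = ((clock_minutes(now) - clock_minutes(start)) % (24 * 60)) * 60
    return 100.0 * cpu_seconds(cputime) / wall_s


if __name__ == "__main__":
    for name, (start, cputime) in PROCS.items():
        print(f"{name:20s} ~{cpu_percent(start, cputime):.1f}% CPU")
```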

And the partitions on this host aren't outrageous:
cn2:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             7.3G  1.5G  5.5G  21% /
/dev/sda1             7.6M  5.6M  1.6M  78% /boot
/dev/sda3             4.6G  291M  4.0G   7% /usr
/dev/sda5             4.6G  2.0G  2.3G  46% /opt/analog/logdata
/dev/sda6             3.7G  794M  2.7G  23% /var/www/centernet/htdocs
/dev/sda7             3.7G  5.3M  3.4G   1% /var/lib/mysql
/dev/sda9             2.8G  572M  2.0G  22% /var/www/centernet/logs
cn2:~# 

Here are the relevant parts of disklist and amanda.conf:
amanda@admin:/etc/amanda/DailySet1 > grep centernet disklist
centernet sda1 comp-user                # /boot
centernet sda2 comp-user                # /
centernet sda3 comp-user                # /usr
centernet sda5 comp-user                # /opt/analog/logdata
centernet sda6 comp-user                # /var/www/centernet/htdocs
centernet sda7 comp-user                # /var/lib/mysql
centernet sda9 comp-user                # /var/www/centernet/logs
amanda@admin:/etc/amanda/DailySet1 > 
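Each disklist line above has the form host, disk, dumptype, optionally followed by a # comment, and amstatus refers to the same entry as host:disk. As a sanity check, a small Python sketch (the helper names are hypothetical; the data is copied from the disklist and df output above) parses those lines and confirms that each comment matches the mount point df reports for that device, so centernet:sda5 is /opt/analog/logdata. It only handles the three-field form used here, not the other optional fields Amanda's disklist format allows.

```python
from typing import List, NamedTuple

DISKLIST = """\
centernet sda1 comp-user                # /boot
centernet sda2 comp-user                # /
centernet sda3 comp-user                # /usr
centernet sda5 comp-user                # /opt/analog/logdata
centernet sda6 comp-user                # /var/www/centernet/htdocs
centernet sda7 comp-user                # /var/lib/mysql
centernet sda9 comp-user                # /var/www/centernet/logs
"""

# Device -> mount point, from the df -h output on centernet.
MOUNTS = {
    "/dev/sda1": "/boot",
    "/dev/sda2": "/",
    "/dev/sda3": "/usr",
    "/dev/sda5": "/opt/analog/logdata",
    "/dev/sda6": "/var/www/centernet/htdocs",
    "/dev/sda7": "/var/lib/mysql",
    "/dev/sda9": "/var/www/centernet/logs",
}


class DiskEntry(NamedTuple):
    host: str
    disk: str
    dumptype: str
    comment: str

    @property
    def dle(self) -> str:
        # amstatus names each entry "host:disk"
        return f"{self.host}:{self.disk}"


def parse_disklist(text: str) -> List[DiskEntry]:
    """Parse simple 'host disk dumptype  # comment' lines (hypothetical helper)."""
    entries = []
    for raw in text.splitlines():
        body, _, comment = raw.partition("#")
        fields = body.split()
        if len(fields) != 3:
            continue  # blank line, comment-only line, or a form not handled here
        entries.append(DiskEntry(*fields, comment.strip()))
    return entries


if __name__ == "__main__":
    for e in parse_disklist(DISKLIST):
        ok = MOUNTS.get("/dev/" + e.disk) == e.comment
        print(f"{e.dle:16s} {e.comment:28s} {'matches df' if ok else 'MISMATCH'}")
```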

amanda@admin:/etc/amanda/DailySet1 > egrep -v "(^( |\t)*#|^$)" amanda.conf 
org "JHU/CCP"           # your organization name for reports
mailto "isgalert AT jhuccp DOT org"            # space separated list of operators at your site
dumpuser "amanda"       # the user to run dumps under
inparallel 8            # maximum dumpers that will run in parallel (max 63)
dumporder "tttttttt"    # specify the priority order of each dumper
netusage  25000 Kbps    # maximum net bandwidth for Amanda, in KB per sec
dumpcycle 3             # the number of days in the normal dump cycle
runspercycle 3          # the number of amdump runs in dumpcycle days
tapecycle 25 tapes      # the number of tapes in rotation
bumpsize 20 Mb          # minimum savings (threshold) to bump level 1 -> 2
bumpdays 1              # minimum days at each level
bumpmult 4              # threshold = bumpsize * bumpmult^(level-1)
etimeout 300            # number of seconds per filesystem for estimates.
dtimeout 1800           # number of idle seconds before a dump is aborted.
ctimeout 30             # maximum number of seconds that amcheck waits
tapebufs 20
tapedev "/dev/nst0"     # the no-rewind tape device to be used
rawtapedev "/dev/null"  # the raw device to be used (ftape only)
tapetype Python-DDS3            # what kind of tape it is (see tapetypes below)
labelstr "^DailySet1[0-9][0-9]*$"       # label constraint regex: all tapes must match
holdingdisk hd1 {
    comment "main holding disk"
    directory "/var/amanda"     # where the holding disk is
    use -0Mb            # how much space can we use on it. Use everything.
    chunksize 1Gb       # size of chunk if you want big dump to be
    }
holdingdisk hd2 {
    directory "/dumps2/amanda"
    use -0 Mb
    }
reserve 50 # percent
autoflush yes #
infofile "/var/log/amanda/DailySet1/curinfo"    # database DIRECTORY
logdir   "/var/log/amanda/DailySet1"            # log directory
indexdir "/var/log/amanda/DailySet1/index"      # index directory
define tapetype Python-DDS3 {
    comment "Dell Python with DDS-3 tapes"
    length 11570 mbytes
    filemark 0 kbytes
    speed 1078 kps 
    lbl-templ "/usr/local/etc/amanda/DailySet1/3holeJHUCCP.ps"
}
define dumptype global {
    comment "Global definitions"
}
define dumptype comp-user {
    global
    comment "Non-root partitions on reasonably fast machines"
    compress client fast
    priority medium
}
define interface local {
    comment "a local disk"
    use 1000 kbps
}
define interface le0 {
    comment "10 Mbps ethernet"
    use 400 kbps
}
amanda@admin:/etc/amanda/DailySet1 > 
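The bump settings are easier to read with the formula from their own comment worked out. A minimal Python sketch, assuming the comment's formula threshold = bumpsize * bumpmult^(level-1) is what Amanda applies (the function name is illustrative):

```python
BUMPSIZE_MB = 20  # bumpsize 20 Mb
BUMPMULT = 4      # bumpmult 4


def bump_threshold_mb(level: int) -> int:
    """Minimum savings (Mb) needed to bump from `level` to `level + 1`,
    per the amanda.conf comment: threshold = bumpsize * bumpmult^(level-1)."""
    if level < 1:
        raise ValueError("bumping applies to incremental levels >= 1")
    return BUMPSIZE_MB * BUMPMULT ** (level - 1)


if __name__ == "__main__":
    for lvl in range(1, 5):
        print(f"level {lvl} -> {lvl + 1}: save at least {bump_threshold_mb(lvl)} Mb")
```

With these values, going from level 1 to 2 needs 20 Mb of savings, 2 to 3 needs 80 Mb, and 3 to 4 needs 320 Mb, with bumpdays 1 requiring at least one day at each level.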

Thanks for any suggestions on what's happening, and how to fix it. Please let 
me know if there's some other diagnostic I should run to further define this 
problem.

-Kevin Zembower


-----
E. Kevin Zembower
Unix Administrator
Johns Hopkins University/Center for Communications Programs
111 Market Place, Suite 310
Baltimore, MD  21202
410-659-6139