Amanda-Users

Re: amdump performance problem

2007-03-07 14:11:34
Subject: Re: amdump performance problem
From: Gene Heskett <gene.heskett AT verizon DOT net>
To: amanda-users AT amanda DOT org, kberry AT microbeef DOT com
Date: Wed, 07 Mar 2007 14:02:57 -0500
On Wednesday 07 March 2007, Kenneth Berry wrote:
>Hello list,
>
>I have been working on a performance problem for about a month and I am
>out of ideas.  I hope someone can help me.  What is happening is that
>backups of large DLE that start very early seem to dump at a very slow
>rate.  Where as dumps of DLE on the same host, physical drive, etc. that
>start hours latter dump at an expected rate.  The amstatus snapshot
>below was taken at 9:30:00.  The backups start at midnight and notice
>that the dump rate for mbthome:/home/public is about 830K/sec.  But DLE
>mbthome:/home_a_i which has been running for 21 mins. is at a dump rate
>of about 9500K/sec.
>
>-------------------------------------------------
>mbthome:/                          0    51860k finished (0:32:54)
>mbthome:/boot                      0     8431k finished (0:29:27)
>mbthome:/home/public               0 46747710k dumping 26966780k
>( 57.69%) (0:28:31)
>mbthome:/home_a_i                  0 20529050k dumping 11973280k
>( 58.32%) (9:09:34)
>mbthome:/home_j_k                  0 39011140k writing to tape (9:09:34)
>mbthome:/home_l_o                  0 13542210k wait for dumping
>mbthome:/home_p                    0   781310k finished (1:01:03)
>mbthome:/home_q_z                  0 10639260k wait for dumping
>mbthome:/home_rest                 0       10k finished (0:28:59)
>mbthome:/usr                       0   606581k finished (0:58:04)
>mbthome:/var                       0   330491k finished (0:38:43)
>-------------------------------------------------
>
>I have changed many things and not found the solution yet.  I have moved
>all the data from an hardware RAID5 to a single large drive on a
>different SCSI controller.  Played with maxdumps parameter it is set at
>two for the above snap shot, for the data below the maxdump was set 1
>(default).  Top utility shows very high CPU sys levels which settles
>when dumps are finished, are mine higher than expected when amdump is
>running?
>
>-------------------------------------------------
>top - 09:48:04 up 2 days, 18:43,  2 users,  load average: 5.89, 5.49,
>5.37
>Tasks: 152 total,   1 running, 151 sleeping,   0 stopped,   0 zombie
> Cpu0 :  9.3% us, 61.3% sy,  2.3% ni,  0.0% id, 20.9% wa,  0.0% hi,
>6.3% si
> Cpu1 :  1.0% us, 91.1% sy,  0.7% ni,  0.0% id,  6.0% wa,  0.0% hi,
>1.3% si
>Mem:   2075988k total,  2069940k used,     6048k free,    14444k buffers
>Swap:  1020116k total,        0k used,  1020116k free,  1934516k cached
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>21343 amanda    16   0  2716  948 1780 S 89.8  0.0 197:24.39 dumper
>21344 amanda    16   0  2976 1096 1868 S 18.5  0.1  76:27.43 dumper
>28982 amanda    16   0  3028  804 1664 S 13.7  0.0   0:18.33 sendbackup
>21341 amanda    15   0  2728 1536 2432 D 12.0  0.1  20:17.89 taper
>28984 root      16   0  6660 1900 4304 D 10.1  0.1   0:14.84 tar
>21342 amanda    15   0  2720 1532 2432 S  7.5  0.1  10:57.67 taper
>28985 amanda    16   0  2532  420 1372 S  4.6  0.0   0:06.96 tar
>-------------------------------------------------
>
>Finally, so you can see the data for a full backup run I have included
>below the report from last night's backup run.  Notice that most DLE
>that have long dump times are dumped at a very slow rate.  I am sure I
>didn't include something that will be need to solve this problem, please
>email me back and I will provide the requested info.
>
>
>
>-------------------------------------------------
>DUMP SUMMARY:
>                                     DUMPER STATS            TAPER
>STATS
>HOSTNAME     DISK        L ORIG-KB OUT-KB COMP% MMM:SS  KB/s MMM:SS
>KB/s
>-------------------------- ---------------------------------
>------------
>mbtas00      /           0 2177990 816539  37.5   8:121659.0
>1:0313057.1
>mbtas00      /back01     0 138367104785625  34.6  28:372786.4
>6:2612392.8
>mbtas00      /boot       0   10880   6346  58.3   0:023898.6
>0:022919.1
>mbtas00      /opt        0 182746208971682  49.1  50:242966.9
>6:1423968.4
>mbtas00      /var        0  165760  22406  13.5   0:35 637.0
>0:0210429.8
>mbtas01      /           0 34362501438229  41.9   7:043392.3
>1:5912131.7
>mbtas01      /boot       0   10880   6346  58.3   0:024023.7
>0:041659.7
>mbtas01      /opt        0 3140260015084957  48.0  64:583869.4
>15:1716454.1
>mbtas01      /u          0      10      1  10.0   0:00   0.0   0:01
>0.9
>mbtas01      /var        0  221740  29565  13.3   0:201488.1
>0:038748.2
>mbtdb01      /           0 2090070 806698  38.6   4:432848.7
>0:1748362.3
>mbtdb01      /boot       0   10880   6344  58.3   0:015897.3
>0:023488.9
>mbtdb01      /opt        0 235088507524669  32.0  29:324246.4
>2:5343470.9
>mbtdb01      -ata/arch01 0 268880407923108  29.5  30:494285.1
>3:0243647.0
>mbtdb01      -ata/arch02 0 268880407923108  29.5  30:534275.3
>3:0343404.1
>mbtdb01      -ata/back01 0 7335320819585192  26.7 394:16 827.9
>24:0613543.6
>mbtdb01      -ata/back02 0 7054363217685384  25.1  70:474163.9
>5:5549858.2
>mbtdb01      -ata/back03 0 7045140018625938  26.4 190:151631.7
>12:5923901.9
>mbtdb01      -ata/back04 0 7047288018959600  26.9 164:061925.6
>8:1838060.7
>mbtdb01      -ata/data24 0  225760  79360  35.2   0:164897.4
>0:0157126.9
>mbtdb01      -ata/data25 0 2627960 801797  30.5   2:574527.8
>0:2040786.2
>mbtdb01      /var        0  178450  22837  12.8   0:141618.9
>0:0115461.3
>mbtdbmon     /           0 2933423014565741  49.7 329:34 736.6
>14:3916575.8
>mbtdbmon     /boot       0   26240   8218  31.3   0:024734.5
>0:023558.7
>mbtddb01     /           0 124253807549686  60.8 207:27 606.6
>9:4412921.1
>mbtddb01     /boot       0   32340  18843  58.3   0:082280.6
>0:028575.3
>mbtdev       /           0  200110  96922  48.4   0:273527.7
>0:0426065.5
>mbtdev       /boot       0    6100   3814  62.5   0:016704.0
>0:012595.2
>mbtdev       /opt        0 91523407164046  78.3  19:386080.9
>9:0513156.2
>mbtdev       /usr        0 1861820 695789  37.4   3:383187.3
>0:5512582.7
>mbtdev       /var        0  286040  66504  23.2   0:213224.4
>0:0232729.4
>mbthome      /           0  162160  51489  31.8   0:381345.5
>0:0228477.5
>mbthome      /boot       0    9020   8432  93.5   0:023672.4
>0:023553.0
>mbthome      -ome/public 0 4674511237080424  79.3 437:571411.1
>25:5423858.1
>mbthome      /home_a_i   0 2096223015266312  72.8  63:244013.7
>10:1824709.6
>mbthome      /home_j_k   0 3901114021733976  55.7 106:023416.2
>14:4224635.9
>mbthome      /home_l_o   0 1354221010397536  76.8  41:004226.0
>6:5325175.6
>mbthome      /home_p     0  781120 358604  45.9   1:553108.2
>0:1133516.1
>mbthome      /home_q_z   0 106384606287909  59.1  29:233565.9
>3:0534043.1
>mbthome      /home_rest  0      10      1  10.0   0:00   0.0   0:01
>0.8
>mbthome      /usr        0 1628430 606585  37.2   5:371800.7
>0:1832936.3
>mbthome      /var        0  589220 312805  53.1   1:462951.8
>0:1128095.3
>mbtns1       /           0  109530  41446  37.8   0:331261.4
>0:0220104.2
>mbtns1       /boot       0    5640   3368  59.7   0:021926.8
>0:021594.6
>mbtns1       /usr        0 1377340 520768  37.8  17:16 502.5
>0:4212254.3
>mbtns1       /var        0  432520 284235  65.7   5:54 802.2
>0:1716970.1
>mbtns2       /           0  244140 132484  54.3   0:552427.6
>0:1111976.6
>mbtns2       /boot       0    5050   4060  80.4   0:015025.0
>0:021927.8
>mbtns2       /opt        0  103040  34858  33.8   0:122896.9
>0:0214753.5
>mbtns2       /usr        0 1462440 595361  40.7  20:42 479.4
>0:4712767.9
>mbtns2       /var        0 1250850 603571  48.3  28:12 356.7
>0:4114751.6
>mbtpas02     /           0 40305402190711  54.4  79:56 456.8
>2:2714904.4
>mbtpas02     /boot       0   10260   8524  83.1   0:032582.8
>0:023824.2
>mbttdb02     /           0 30578601398209  45.7  11:342014.6
>1:0023217.1
>mbttdb02     /boot       0   26950  22452  83.3   0:073279.7
>0:038598.5
>mbttdb03     /           0 39120702074878  53.0  16:512053.0
>2:3013841.5
>mbttdb03     /boot       0   20200  16808  83.2   0:053122.1
>0:036549.9
>mbtts00      /           0 1962845010877320  55.4  96:321878.0
>15:1111940.7
>mbtts00      /boot       0   10780   6243  57.9   0:022998.0
>0:022886.2
>mbtts01      /           0 2344288011531924  49.2 113:191696.1
>7:4025089.6
>mbtts01      /boot       0   10780   6243  57.9   0:07 881.1
>0:032456.7
>mbtvm01      /           0 1268570 495797  39.1   2:263406.7
>1:107090.0
>mbtvm01      /boot       0   25150  22218  88.3   0:046238.0
>0:029008.5
>
>(brought to you by Amanda version 2.4.4p2)
>-------------------------------------------------

Warning, word-wrap turned off to preserve formatting below.

This somewhat waddles and quacks like a duck, so I expect it is, Kenneth.
The 'duck' in this case is not enough holding disk, so the bigger 
dle's are going direct to tape, one at a time only, and if they are also
being compressed, the gzip being done will eat a ton of cpu.  I see
several entries above with indicated compressions of over 80%.  Those
should be switched to a dumptype that doesn't compress as the gain from
all that gzip wheel spinning just isn't worth it.  This change alone will 
cut the runtime by possibly as much as 75%.

You could also adjust some of the parameters in the format string in 
your amanda.conf which controls the spacing and such of the above report 
so as to better separate the data & make it easier to read.  I have 
used this string for several years fairly successfully:
====from my amanda.conf======
columnspec 
"HostName=0:7,Disk=1:29,Level=1:2,OrigKB=1:6,OutKB=1:6,Compress=1:5,DumpTime=1:5,DumpRate=1:7,TapeTime=1:5,TapeRate=1:7"

# This is a recently added item, "displayunit".  The default has always been K 
till now
# One should set
displayunit "m"
# to make it use megabytes instead
===========
So I get emailed reports formatted like this:
===========
DUMP SUMMARY:
                                                   DUMPER STATS             
TAPER STATS 
HOSTNAME DISK                           L ORIG-MB OUT-MB COMP% MMM:SS    KB/s 
MMM:SS    KB/s
---------------------------------------- --------------------------------- 
-------------
coyote  /GenesAmandaHelper-0.5         0   1611    709  44.0   5:1  2319.5   
0:4 17141.6
coyote  /bin                           0      7      7   --    0:0  1149.8   
0:0 17806.5
coyote  /boot                          0     84     84   --    0:0 11677.5   
0:0 76451.7
coyote  /dev                           0      0      0   --    0:0   717.5   
0:0 13236.3
coyote  /etc                           0    146     17  11.8   0:3   499.1   
0:0 96074.5

Those entries with -- in the compression column are not being compressed.
But I haven't figured out howto adjust the headers formatting in reporter.c
to match the data and make it stick over version changes since I play the 
canary in the coal mine act in this play by running the latest snapshots of
the stable tree.  Currently the 20070222 snapshot of 2.5.1p3.

I hope this helps.

Cheers, Gene
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
We are Microsoft.  Unix is irrelevant.  Openness is futile.  Prepare
to be assimilated.

<Prev in Thread] Current Thread [Next in Thread>