Subject: Re: amanda dumps on Sun E250
From: Brian Cuttler <brian AT wadsworth DOT org>
To: Chris Hoogendyk <hoogendyk AT bio.umass DOT edu>
Date: Tue, 18 Mar 2008 17:29:40 -0400
Chris,

On Tue, Mar 18, 2008 at 04:32:53PM -0400, Chris Hoogendyk wrote:
> hmm. a bit complicated.
> 
> I'd certainly like to have a couple of T2000's. ;-)

Sorry, mine are in use, but I understand they are still available :-)

> Anyway, I have an E250 running Amanda 2.5.1p3 with Solaris 9. I have an 
> AIT5 tape library. AIT5 is rated at 24MB/s, which is slower than your 
> faster tape drives. However, I'm able to drive it at nearly full speed 
> (when I am driving it). My bottlenecks are the activity on the other 
> servers that I'm pulling from and the network. Once I get things on the 
> holding disk, they zing out to tape. But, the tape experiences a lot of 
> idle time while data is being assembled for it.

If only... my holding drives are full and the tape time is the
slow point for me.

> I believe your speed to tape is less than mine. It looks like you are up 
> over 500G being backed up. If I plugged that into my tape speed, I would 
> be doing it in about 10 hours. Of course, that assumes other factors 
> aren't bottlenecking, which you say they aren't. And your tapes should 
> be faster.

Well, I'm getting data to the holding areas faster than I'm clearing
them, so I think I can eliminate the other issues. It looked like
simple logic to me, which usually means I've overlooked something.

> So, where to from there? I have two 300G ultraSCSI/320 10k-rpm Seagate 
> Cheetah holding drives. They are mounted internally. I have a PCI 
> Ultra320 dual SCSI expansion card that I added to the E250. The tape 
> library is connected through that. I bought it through our authorized 
> Sun reseller, and it's a Sun branded card.

My amanda work areas are all in a multipack connected to a Sun-branded
HBA. The tape connections are similar: either the original external
SCSI bus or an approved HBA (I'd have to look to be sure).
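
When I do look, it will be something like this (a rough sketch; the
/dev/rmt/0cbn node is an assumption for the tape device, and on the
E250 prtdiag lives under the sun4u platform directory):

    # the physical path behind the tape node shows which HBA it is on
    ls -l /dev/rmt/0cbn

    # list the PCI cards and slot clock speeds
    /usr/platform/sun4u/sbin/prtdiag -v | grep -i pci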

These are the drives, extracted from the output of # format:

AVAILABLE DISK SELECTIONS:
       2. c2t1d0 <SEAGATE-ST373405LC-0002 cyl 29523 alt 2 hd 8 sec 607>
          /pci@1f,4000/scsi@4,1/sd@1,0
       3. c2t2d0 <SEAGATE-ST373307LC-0004 cyl 49780 alt 2 hd 4 sec 720>
          /pci@1f,4000/scsi@4,1/sd@2,0
       4. c2t3d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
          /pci@1f,4000/scsi@4,1/sd@3,0
       5. c2t4d0 <SEAGATE-ST373207LC-0003 cyl 44304 alt 2 hd 4 sec 809>
          /pci@1f,4000/scsi@4,1/sd@4,0

> At the moment, we are running 100Mb/s ethernet using the onboard 
> connector (hme0). We are about to switch to GigE using PCI cards that we 
> bought from the same reseller. They are also Sun branded cards. In 
> discussing with their engineer how to configure this, we decided that we 
> would keep the Ultra320 SCSI card in PCI slot 3, which is 66MHz, and put 
> the GigE card (single) into one of the other slots, which are 33MHz.
> 
> I should note that I'm not backing up the volume that you are, and I'm 
> doing server side compression. Also, my backup server is just a backup 
> server, built from scratch for that purpose. I don't have any leftover 
> software/hardware/configuration stuff. Fresh install of Solaris 9. I'm 
> running mon (from kernel.org), but that is really minimalist, and you 
> can hardly tell it is running.

I disabled compression back when each of the two Lotus notes servers
was an E250 and each ran its own amanda server with only itself as a
client. We moved to HW compression at that time and are still running
HW compression now.
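
For anyone checking the same thing: on Solaris the compression choice
rides on the tape device name, so it is just a matter of which node
amanda's tapedev points at (sketch below; /dev/rmt/0 is an assumption
for the drive's instance number):

    # /dev/rmt/0cbn = compression on, BSD semantics, no-rewind
    # /dev/rmt/0bn  = the same drive with compression off
    mt -f /dev/rmt/0cbn status   # confirm the drive answers there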

> I'm inclined to think you should upgrade Amanda. While it might not 
> affect this particular issue, 2.4.4 is a bit old. Also, the email report 
> with taper stats by DLE and overall tape performance would be easier to 
> grab stats from than the amstatus report, but maybe that's just me.

The amstatus report was quick to grab; I will paste in the amdump
reports for both amanda runs, the daily and the weekly.

Beyond possible tuning issues, I believe amanda actually uses dd to
move the data to tape. While I do want to upgrade amanda, and would
do so tomorrow if someone said it was causing any sort of performance
issue, I don't know that the version is the place to look.
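
If the dd theory is right, a raw dd from the work area straight to
the drive should show the best the hardware can do with amanda out
of the picture. Roughly (somedump.tmp is a stand-in for any finished
image on the holding disk; 32k matches amanda's default blocksize):

    mt -f /dev/rmt/0cbn rewind
    time dd if=/amanda/work/somedump.tmp of=/dev/rmt/0cbn bs=32k
    # bytes moved / elapsed seconds = the ceiling; if that also comes
    # out near 4 MB/s, the problem is below amanda, not in it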

I will be updating regardless; we have an upcoming application for
client-side encryption...

> I'm stuck with hand-me-down E250's, because I can't get either 
> department to squeeze any money out of their budget for upgrades. While 
> I think a newer server would handle some things faster, I also think the 
> E250 ought to be able to drive the tape faster than you are experiencing.

I thought it would too; I just need to find out what to push on to
improve the situation.
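
The first thing I plan to push on is watching the holding disks and
the tape during a flush, along the lines of (the interesting device
names will be whatever iostat reports for the multipack disks and
the st instance):

    # 5-second samples; a device pinned near 100% in the %b column
    # is the bottleneck
    iostat -xn 5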

Here are the amdump results.

This is from the once-a-week run, where we expect level 0 dumps for
all partitions. I would expect at least 10x this I/O rate to the
LTO drive.
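
As a sanity check (assuming the L9's LTO drive really does the
first-generation rated ~15 MB/s native), the whole run ought to fit
on tape in a fraction of the time shown below:

    # 556922 MB at 15 MB/s, expressed in hours
    echo "scale=1; 556922 / 15 / 3600" | bc    # => 10.3

versus the nearly 40 hours of tape time we actually got.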


                          Total       Full      Daily                      
                        --------   --------   --------                     
Estimate Time (hrs:min)    0:04                                             
Run Time (hrs:min)        41:37                                             
Dump Time (hrs:min)       61:29      61:29       0:00                       
Output Size (meg)      556921.8   556921.8        0.0                       
Original Size (meg)    556921.8   556921.8        0.0                       
Avg Compressed Size (%)     --         --         --                        
Filesystems Dumped           18         18          0                       
Avg Dump Rate (k/s)      2576.8     2576.8        --                        
                                                                            
Tape Time (hrs:min)       39:44      39:44       0:00                       
Tape Size (meg)        556921.9   556921.9        0.0                       
Tape Used (%)             278.5      278.5        0.0  
Filesystems Taped            18         18          0                     
Avg Tp Write Rate (k/s)  3986.3     3986.3        --   


USAGE BY TAPE:                                                        
  Label          Time      Size      %    Nb                                
  NOTESX29      15:51   87296.7   43.7    10                                
  NOTESX30       3:40  130923.7   65.5     2                                
  NOTESX31       8:56  127639.2   63.8     3                                
  NOTESX32       7:26  103730.4   51.9     2                                
  NOTESX33       3:52  107332.0   53.7     1   



                                                  DUMPER STATS   TAPER STATS
HOSTNAME DISK           L   ORIG-KB  OUT-KB COMP% MMM:SS   KB/s MMM:SS   KB/s
------------------------ ------------------------------------- -------------
nwcapp  /              0   2104192   2104192  -   42:11  831.4 137:32  255.0  
nwcapp  /nexport       0     84096     84096  -   16:20   85.8   0:19 4379.6  
wcapp   /              0  11179424  11179424  -  304:30  611.9 178:52 1041.7  
wcapp   /db            0  71450624  71450624  -  322:24 3693.7  61:11 19461.3 
wcapp   /db2           0   9282400   9282400  -  309:19  500.2  26:14 5897.9  
wcapp   /export        0   5404608   5404608  -  106:06  849.0  33:50 2662.3  
wcnotes /              0  18889184  18889184  -  221:23 1422.0 103:17 3048.1  
wcnotes /export        0  16474880  16474880  -  102:10 2687.5 104:02 2639.4  
wcnotes /maildb2/five  0 109907968 109908000  -  428:56 4270.6 232:03 7894.1  
wcnotes /maildb2/four  0  15337540  15337568  -   89:45 2848.1 137:08 1864.1  
wcnotes /maildb2/one   0  62615192  62615200  -  464:50 2245.1 158:23 6589.1  
wcnotes /maildb2/three 0  43153420  43153440  -  185:45 3872.2 368:08 1953.7  
wcnotes /maildb2/two   0  56090928  56090944  -  390:29 2394.1 397:45 2350.4  
wcnotes /space         0    895424    895424  -   16:52  884.4  68:15  218.7  
wcnotes maildbAD       0  55292720  55292736  -  139:41 6597.3  49:01 18797.5 
wcnotes maildbEK       0  50128980  50128992  -  160:16 5213.0  48:34 17203.8 
wcnotes maildbLQ       0  32256300  32256320  -  224:49 2391.3 118:31 4536.1  
wcnotes maildbRZ       0   9740020   9740032  -  162:47  997.2 161:17 1006.5 

These are the results from the LTO3 drive, which takes the dumps
performed 5x/week. I'd hope for far more performance from this drive.
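
Same back-of-the-envelope as before (80 MB/s is LTO3's rated native
speed, so this is the optimistic ceiling):

    # 514465 MB at 80 MB/s, expressed in hours
    echo "scale=1; 514465 / 80 / 3600" | bc    # => 1.7

against the roughly 40 hours of tape time below.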

STATISTICS:                                                              
                          Total       Full      Daily                   
                        --------   --------   --------                      
Estimate Time (hrs:min)    0:03                                           
Run Time (hrs:min)        44:11                                             
Dump Time (hrs:min)       68:38      13:21      55:17                      
Output Size (meg)      514464.8   118889.8   395575.0                      
Original Size (meg)    514464.8   118889.8   395575.0                      
Avg Compressed Size (%)     --         --         --    (level:#disks ...)
Filesystems Dumped           18          3         15   (1:15)           
Avg Dump Rate (k/s)      2132.1     2532.5     2035.4                     
                                                                          
Tape Time (hrs:min)       39:37       4:09      35:28                     
Tape Size (meg)        514464.9   118889.8   395575.1                      
Tape Used (%)             133.3       30.8      102.5   (level:#disks ...)
Filesystems Taped            18          3         15   (1:15)    
Avg Tp Write Rate (k/s)  3693.4     8140.4     3172.5  

USAGE BY TAPE:  
  Label         Time      Size      %    Nb
  NOTES11      36:08  459823.1  119.1    17
  NOTES12       3:29   54641.9   14.2     1

                                                  DUMPER STATS   TAPER STATS
HOSTNAME DISK           L   ORIG-KB    OUT-KB COMP% MMM:SS KB/s MMM:SS   KB/s
------------------------ ------------------------------------- -------------
nwcapp  /              1      2048      2048  -    0:44   46.4   0:02 1174.1
nwcapp  /nexport       1      2016      2016  -    0:06  355.6   0:02 1213.0
wcapp   /              1     32160     32160  -    2:43  197.1   0:03 10557.9
wcapp   /db            1  71322144  71322144  -  395:57 3002.2 370:01 3212.6
wcapp   /db2           1   8707744   8707744  -   58:00 2502.0  54:09 2680.2
wcapp   /export        1   1178752   1178752  -   10:35 1856.4 157:33  124.7
wcnotes /              1     11200     11200  -    3:06   60.3   0:02 4488.9
wcnotes /export        0  16467872  16467872  -  333:12  823.7  18:14 15057.0
wcnotes /maildb2/five  1 108743208 108743232  -  924:03 1961.4 287:49 6296.9
wcnotes /maildb2/four  1  15079680  15079680  -  185:53 1352.0 147:05 1708.8
wcnotes /maildb2/one   1  62524980  62524992  -  423:14 2462.2 232:13 4487.6
wcnotes /maildb2/three 1  39562648  39562656  -  385:03 1712.5 367:06 1796.2
wcnotes /maildb2/two   1  55953280  55953280  -  285:41 3264.2 209:05 4460.1
wcnotes /space         1       288       288  -    0:01  292.9   0:02  170.1
wcnotes maildbAD       0  55244340  55244352  -  244:24 3767.5 185:55 4952.5
wcnotes maildbEK       0  50030920  50030944  -  223:37 3728.9  45:07 18481.9
wcnotes maildbLQ       1  32239140  32239168  -  370:19 1450.9 186:36 2879.4
wcnotes maildbRZ       1   9709550   9709568  -  271:23  596.3 116:12 1392.6

> ---------------
> 
> Chris Hoogendyk
> 
> -
>   O__  ---- Systems Administrator
>  c/ /'_ --- Biology & Geology Departments
> (*) \(*) -- 140 Morrill Science Center
> ~~~~~~~~~~ - University of Massachusetts, Amherst 
> 
> <hoogendyk AT bio.umass DOT edu>
> 
> --------------- 
> 
> Erdős 4
> 
> 
> 
> Brian Cuttler wrote:
> >Hi amanda users,
> >
> >I'm running dumps on a Sun E250 server. This system has been
> >demoted from Lotus notes plus local amanda server to just an
> >amanda server, but has picked up additional clients.
> >
> >Rather than being client/server for itself alone, it is now the
> >server for itself as well as two Lotus notes systems, both Sun
> >T2000 servers.
> >
> >The bottleneck in performance is apparently amanda work area to
> >tape. It takes forever to dump the data to tape once it's on the
> >work area. I don't think this is an amanda issue; I think it's a
> >system bus issue, as the backplane seems to run at only 100MHz.
> >
> >I am running both LTO (embedded in a StorEdge L9 library) and LTO3
> >(embedded in a C2 library). I have had these drives, or similar
> >drives, on other systems with much better performance.
> >
> >Does anyone know where to look for the proof/smoking-gun that
> >says "this is the wrong platform", or for any tuning I can
> >perform, either system-wise or as an amanda feature, that might
> >improve the throughput to tape?
> >
> >We seem to produce completed DLEs on the work area more quickly
> >than we can put them to tape; the tape is busy constantly once the
> >first DLE starts to flush. I could add more work area, which might
> >reduce I/O and CPU load on the clients sooner, but it will not
> >complete the amanda run any sooner since the bottleneck is the
> >tape drives.
> >
> >For reference, the E250 is running Solaris 9, the T2000 systems
> >run Solaris 10, and the Amanda server and clients are 2.4.4. The
> >C2/LTO3 runs amanda 5x/week and the L9/LTO runs once on the
> >weekend. Well, that is what we wanted; in practice the amanda
> >jobs are exceeding 24 hours.
> >
> >  
> >>amstatus notes
> >>    
> >Using /usr/local/etc/amanda/notes/log/amdump from Mon Mar 17 19:30:00 EST 2008
> >
> >nwcapp:/               0  2104256k finished (22:21:20)
> >nwcapp:/nexport        0    84192k finished (19:35:22)
> >wcapp:/                0 11179776k finished (12:54:54)
> >wcapp:/db              0 71562976k finished (8:18:18)
> >wcapp:/db2             0  9246208k finished (12:33:30)
> >wcapp:/export          0  5415904k finished (2:13:04)
> >wcnotes:/              0 18889568k writing to tape (12:54:55)
> >wcnotes:/export        1  7453632k finished (23:23:30)
> >wcnotes:/maildb2/five  0 110095430k dumping 103184224k ( 93.72%) (19:35:22)
> >wcnotes:/maildb2/four  0 15375740k dump done (11:27:26), wait for writing to tape
> >wcnotes:/maildb2/one   0 62708540k wait for dumping 
> >wcnotes:/maildb2/three 0 43173850k dumping  2487392k (  5.76%) (12:54:55)
> >wcnotes:/maildb2/two   0 56212730k wait for dumping 
> >wcnotes:/space         0   895424k finished (21:52:57)
> >wcnotes:maildbAD       1 55338550k wait for dumping 
> >wcnotes:maildbEK       1 48909540k wait for dumping 
> >wcnotes:maildbLQ       0 32725550k dump done (12:33:07), wait for writing to tape
> >wcnotes:maildbRZ       0  9739760k finished (2:03:09)
> >
> >SUMMARY          part      real  estimated
> >                           size       size
> >partition       :  18
> >estimated       :  18            560839789k
> >flush           :   0         0k
> >failed          :   0                    0k           (  0.00%)
> >wait for dumping:   4            223169360k           ( 39.79%)
> >dumping to tape :   0                    0k           (  0.00%)
> >dumping         :   2 105671616k 153269280k ( 68.95%) ( 18.84%)
> >dumped          :  12 184672986k 184401149k (100.15%) ( 32.93%)
> >wait for writing:   2  48101290k  47701260k (100.84%) (  8.58%)
> >wait to flush   :   0         0k         0k (100.00%) (  0.00%)
> >writing to tape :   1  18889568k  18889517k (100.00%) (  3.37%)
> >failed to tape  :   0         0k         0k (  0.00%) (  0.00%)
> >taped           :   9 117682128k 117810372k ( 99.89%) ( 20.98%)
> >6 dumpers idle  : no-diskspace
> >taper writing, tapeq: 2
> >network free kps:    114152
> >holding space   :   2102533k (  0.95%)
> > dumper0 busy   :  8:56:50  ( 51.61%)
> > dumper1 busy   : 10:41:22  ( 61.65%)
> > dumper2 busy   :  9:47:27  ( 56.47%)
> > dumper3 busy   :  0:33:58  (  3.27%)
> > dumper4 busy   :  7:47:41  ( 44.96%)
> > dumper5 busy   :  0:39:35  (  3.81%)
> > dumper6 busy   : 17:19:47  ( 99.95%)
> > dumper7 busy   :  8:00:03  ( 46.15%)
> >   taper busy   : 15:22:25  ( 88.67%)
> > 0 dumpers busy :  0:00:00  (  0.00%)
> > 1 dumper busy  :  0:21:30  (  2.07%)        no-diskspace:  0:21:30  (100.00%)
> > 2 dumpers busy :  5:47:38  ( 33.42%)        no-diskspace:  5:47:23  ( 99.93%)
> >                                               start-wait:  0:00:15  (  0.07%)
> > 3 dumpers busy :  2:58:30  ( 17.16%)        no-diskspace:  2:57:59  ( 99.72%)
> >                                               start-wait:  0:00:30  (  0.28%)
> > 4 dumpers busy :  1:43:47  (  9.98%)        no-diskspace:  1:43:32  ( 99.76%)
> >                                               start-wait:  0:00:15  (  0.24%)
> > 5 dumpers busy :  3:57:23  ( 22.82%)        no-diskspace:  3:57:03  ( 99.86%)
> >                                               start-wait:  0:00:20  (  0.14%)
> > 6 dumpers busy :  1:51:45  ( 10.74%)        no-diskspace:  1:51:20  ( 99.63%)
> >                                               start-wait:  0:00:24  (  0.37%)
> > 7 dumpers busy :  0:00:38  (  0.06%)        no-diskspace:  0:00:22  ( 59.68%)
> >                                               start-wait:  0:00:15  ( 40.32%)
> > 8 dumpers busy :  0:39:04  (  3.76%)            not-idle:  0:39:04  (100.00%)
> >
> >
> >---
> >   Brian R Cuttler                 brian.cuttler AT wadsworth DOT org
> >   Computer Systems Support        (v) 518 486-1697
> >   Wadsworth Center                (f) 518 473-6384
> >   NYS Department of Health        Help Desk 518 473-0773
---
   Brian R Cuttler                 brian.cuttler AT wadsworth DOT org
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773


