Amanda-Users

Re: tape throughput - lto1

2006-07-06 12:01:23
Subject: Re: tape throughput - lto1
From: Brian Cuttler <brian AT wadsworth DOT org>
To: Joshua Baker-LePain <jlb17 AT duke DOT edu>
Date: Thu, 6 Jul 2006 11:53:42 -0400

Joshua,

On Thu, Jul 06, 2006 at 11:12:39AM -0400, Joshua Baker-LePain wrote:
> On Thu, 6 Jul 2006 at 10:24am, Brian Cuttler wrote
> 
> >I've added more work area to amanda, have been trying to find
> >what other problems we may be seeing with the job, since it
> >still seems to take longer than it should.
> >
> >Upon looking more closely at the amanda report from amdump I see
> >that the tape I/O rate is around 1800 KB/s whereas 2 months ago
> >15,000 was not unusual.
> >
> >The reduction does not seem to be tied to a system reboot (patches,
> >installation of HBA [host bus adapter] for the LTO3) nor any other
> >event that I can identify, and in fact I notice that we seem to have
> >two step downs in the I/O rate, separated by approx. one tape cycle.
> 
> I'm going to assume that the drive is LTO1, since you say that twice (in 
> the subject, and in the part I snipped below) and LTO3 only once.  :) 
> Based on some quick specs I found 16MB/s is native rate for LTO1, so your 
> 15K above was normal.

Yes, the drive is actually an LTO1. We've run the L9/LTO1 jukebox
in excess of 3 years and do not have a service contract for it. The
system library died a couple of months back, circuit failure. The
LTO3 is part of the StorEdge C2 Library that we have not yet put into
production... maybe tonight is the night.

> My first suspicion would be that your DLE(s) outgrew your holding space, 
> so now they're dumping straight to tape over the network.  But the amflush 
> you mention below would appear to speak against that.

This was last week's problem: too many DLEs, not enough holding disk,
though I now know that this was due in large part to the fact that I
have not been able to clean off the work area by putting the DLEs to
tape in a timely fashion.

> >I have tried to clean the tape drive, have tried to relabel the tape
> >(amflush running as I write) and will next try a brand new tape with
> >the assumption that the max number of tape cycles has been reached
> >on all volumes at the same time. While that would mean remarkable
> >quality control in manufacture, the tapes were all purchased at the
> >same time and have been used an almost identical number of times.
> >
> >If the new tape doesn't help (I expect it will but who knows) I don't
> >know what else it might be, wear of the tape heads ?
> 
> Have you tested the tape performance outside of amanda?  amflush *should* 
> go as fast as the tape and disk drives will let it, but it never hurts to 
> take as many things out of the equation as you can.  Try 'dd'ing from 
> /dev/zero (or the Solaris equivalent) to the tape drive and see how fast 
> that goes.  Ditto for tar with various block sizes.

I haven't (yet) tested the tape performance outside of amanda. If/when
I get access to the drive (once amflush completes) I need to archive my
FW and proxy logs. That is a fairly substantial quantity of data and
will run outside of amanda.
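
For reference, the kind of raw write test suggested above (zeros
straight to the drive at several block sizes) could be sketched roughly
as below. This is only an illustration, not something I have run here:
the function name and the list of block sizes are made up for the
example, the target path is whatever no-rewind tape device your system
uses, and the test overwrites whatever is on the tape, so use a scratch
volume.

```python
import os
import sys
import time


def write_throughput_kb_s(path, block_size, total_bytes):
    """Write total_bytes of zeros to path in block_size chunks; return KB/s."""
    block = bytes(block_size)            # zero-filled buffer, like /dev/zero
    blocks = total_bytes // block_size
    start = time.monotonic()
    # buffering=0: each write() call is handed to the OS as one block
    with open(path, "wb", buffering=0) as out:
        for _ in range(blocks):
            out.write(block)
        try:
            os.fsync(out.fileno())       # flush caches where supported
        except OSError:
            pass                         # some character devices reject fsync
    elapsed = time.monotonic() - start   # includes the close
    return (blocks * block_size / 1024) / max(elapsed, 1e-9)


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]                 # assumed: path to a scratch tape device
    for kb in (32, 64, 128, 256):        # hypothetical block sizes to compare
        rate = write_throughput_kb_s(target, kb * 1024, 256 * 1024 * 1024)
        print(f"{kb:4d} KB blocks: {rate:10.0f} KB/s")
```

If every block size comes in near 1800 KB/s, amanda is out of the
picture; if the rate changes sharply with block size, the blocksize
question below becomes more interesting.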

> The blocksize settings didn't get mucked up, did they?

Great question, but I don't see how they could have been stepped down
twice, several weeks apart, with the first occurrence approx. 2 weeks
after the most recent reboot. I will check that further if the current
set of write tests doesn't show any improvement; while not my first
guess, it is one of the more readily fixable problems.

> If anything you do to the drive only goes at 1800KB/s, I'd say it's time 
> to call support.  Did I hear the word Dell?  *shudder*  Good luck with 
> that (from a fellow Dell "user").  ;)

A flush of a DLE to the new LTO3 showed the expected write rate, so I
don't see a bus or disk problem on that side of the CPU. We still have
the bus/tape on this side to worry about. Yes, the LTO1 and LTO3 are
actually on separate buses; for that matter, the work areas don't share
either of the buses used by the tape libraries.

A relabel of the tape did nothing for performance. I am running an
amflush with a brand new tape now, though some sort of error on the
library console (an LCD window) or in the messages file would have been
nice to find if it were the heads or the media.

Will let you know but if my estimates of flush time are any good I'm
not getting the results I'd hoped for.
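
To put the flush-time estimate in numbers, here is a back-of-envelope
sketch. The two rates are the ones from the amanda reports quoted
above; the 100 GB figure is purely a hypothetical amount of pending
data (chosen because it matches LTO1 native capacity), not a
measurement from this system, and the function name is made up for the
example.

```python
def flush_hours(data_gb, rate_kb_s):
    """Hours needed to write data_gb gigabytes at rate_kb_s kilobytes/second."""
    data_kb = data_gb * 1024 * 1024
    return data_kb / rate_kb_s / 3600


PENDING_GB = 100  # hypothetical amount waiting on the holding disk

for label, rate in (("current", 1800), ("two months ago", 15000)):
    hours = flush_hours(PENDING_GB, rate)
    print(f"{label:>15}: {hours:5.1f} hours at {rate} KB/s")
```

Under that assumption the same flush takes roughly 16 hours at the
current rate versus about 2 hours at the old one.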

> -- 
> Joshua Baker-LePain
> Department of Biomedical Engineering
> Duke University
---
   Brian R Cuttler                 brian.cuttler AT wadsworth DOT org
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773

