Amanda-Users

Re: streaming

2004-06-02 21:59:35
Subject: Re: streaming
From: Gene Heskett <gene.heskett AT verizon DOT net>
To: amanda-users AT amanda DOT org
Date: Wed, 2 Jun 2004 21:52:55 -0400
On Wednesday 02 June 2004 18:21, Glenn English wrote:
>On Wed, 2004-06-02 at 14:12, Gene Heskett wrote:
>> Any current ide drive can do 30+ MB/sec if left
>> alone by other tasks, often quite a ways on the + side.
>
>Is that just a burst out of the cache, or can they read
> discontiguous files, seek around to other files, wait for latency,
> and write all at the same time that fast? Or even half that fast?
> If so, and if Linux and Intel's IDE controllers lose another 25%
> moving bits around, it'd still be comfortably faster than the tape
> drive. I think I may have something horribly misconfigured.
>
Well, in fairness, those are the hdparm -tT ratings I'm quoting, which are 
generally a 1 or 2 second burst, either from the cache or from the 
surface itself.  This does NOT take into consideration seek times and 
rotational latency, which probably shouldn't actually be a concern 
within a single file transfer from disk to tape.  And by 'file' I 
mean that whole, completed backup of the individual disklist entry, 
or as we call them, DLEs.

I'm inclined to ramble a bit, so bear with me folks.

I think the point here is that a pure read, with no writes 
interleaved, from an individual disk (and controller too) to 
an individual tape drive on its own, probably scsi, controller should 
be fast enough to stream even the most current tape drive on the 
market.  None of these to my knowledge contain any black magic such 
as is used in modern digital video recorders.

The really fast data rates common in video formats such as the 
Panasonic dvc-pro, originally a 25Mb/sec format, and then 50Mb/sec, 
and for hdtv now at 100Mb/sec, have not made it into the data 
storage business, and probably never will.  This is primarily because 
none of these formats are "verbatim" formats; they are formats that do 
error correction based on hiding the error from the human eye, and 
they are doing it to an already mpeg2'd (or better) video stream.  
And much of that is based on data shuffling and hashing wherein the 
burst of bad data that would cause you to ditch a data tape goes 
right on by, because that one, single, maybe 20 byte wide dropout on 
the tape is shuffled around until it's a one bit error in many pixels 
worth of data scattered out over the whole frame of video.  With data 
replacement techniques based on what the adjacent data is, you never 
see it until the error rate is more than 50 bytes per kilobyte.

Back to here, and now I'm trying to sound like an expert, but I'm 
neither carrying a briefcase, nor am I more than 50 miles from home, 
one wag's definition of an expert. :-)

The ideal situation would be for the backup that's being optionally 
gzipped (bring cpu horsepower, all you can get) and stored on holding 
disk two not to be on the same disk, controller and cable as 
holding disk one, so that one could be doing a read and transfer to 
the tape while two is receiving the backup from tar|gzip or whatever.

One of the tools amanda uses to prevent disk access contention is the 
spindle number given optionally in the DLE.  Each physical disk 
should have its own, unique spindle number.  This same number is used 
for all the DLEs that are on that disk.  The next disk gets a 
different number, etc etc.
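
To illustrate the idea only, here is a small Python sketch of picking 
DLEs to dump at the same time so that no two share a spindle on the same 
host.  This is not amanda's actual scheduler; the DLE tuple, the 
pick_concurrent helper and the host and disk names are all made up for 
the example.

```python
from collections import namedtuple

# Hypothetical, stripped-down view of a disklist entry.
DLE = namedtuple("DLE", "host disk spindle")


def pick_concurrent(dles, max_dumpers):
    """Choose up to max_dumpers DLEs, never two on the same (host, spindle)."""
    busy = set()
    chosen = []
    for dle in dles:
        key = (dle.host, dle.spindle)
        if key in busy:
            continue  # that physical disk is already being dumped
        busy.add(key)
        chosen.append(dle)
        if len(chosen) == max_dumpers:
            break
    return chosen


dles = [
    DLE("alpha", "/home", 1),
    DLE("alpha", "/var", 1),   # same physical disk as /home
    DLE("alpha", "/data", 2),  # second physical disk
    DLE("beta", "/", 1),       # different host, so spindle 1 is free there
]
batch = pick_concurrent(dles, max_dumpers=3)
print([d.disk for d in batch])
```

Here /var has to wait for a later pass because it shares spindle 1 with 
/home on the same host, which is exactly the seek thrashing the spindle 
number is meant to prevent.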

Now, I know that you can give amanda more than one holding disk 
specification, but what I don't know is how amanda determines which 
holding disk to use for each DLE.

If someone more familiar with the code than I could bail me out here, 
it might become more obvious to this user what he must do to best 
alleviate his problem.

Currently I see it as needing a pair of individual disks on their own 
controller for use as holding disks, but I cannot advise how to make 
amanda do the correct ping-ponging to help end the shoeshining of his 
tape drive.  Of course such a scheme will probably be a bad puppy and 
make a mess on the rug when the DLEs are widely different in size 
(and compression usage).

One thing that hasn't been mentioned because it's overshadowed by the 
larger picture is that if the drive is using its internal 
compressor, then amanda has only a SWAG (maybe + - 30% or more) 
idea of the tape's true capacity.  Amanda counts bytes fed down the 
cable to the drive, after any gzipping has been done if it's used.  
With the drive's compressor off, amanda can then know to well within a 
percent or so how much data she can stuff onto that tape, making 
maximum use of the available resources.  This also explains why we 
generally recommend that the drive's compressor be turned off forever.  
The nice thing about the way amanda does its compression is that each 
client can be told to do its own compression, thereby offloading that 
time consuming chore from the server.  Since each client can do its own 
compression, adding clients doesn't slow you down since they can all 
run in parallel with minimal or no interaction other than maybe cat5 
collisions.  But those are recovered so quickly in most cases that 
with 100baseT circuits and normal drives, it's no big deal.  Just bear 
in mind that data fed straight to that drive off the network because 
of something fubar in the holding disk setup will really make the 
drive shuffle tape.
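
To put numbers on the capacity guesswork, here is a minimal Python 
sketch.  The 30% spread is the SWAG figure from above; the tape and dump 
sizes are hypothetical, and capacity_range and fits are helper names 
invented for this example, not anything in amanda.

```python
def capacity_range(nominal_gb, uncertainty=0.30):
    """Low and high capacity guesses when the drive's own compressor is on."""
    return nominal_gb * (1 - uncertainty), nominal_gb * (1 + uncertainty)


def fits(dump_sizes_gb, capacity_gb):
    """True if the bytes sent down the cable fit within the given capacity."""
    return sum(dump_sizes_gb) <= capacity_gb


# Hardware compression on: a tape sold as "40 GB compressed" (hypothetical).
low, high = capacity_range(40)
raw_dumps = [15, 12, 8]           # 35 GB of uncompressed dumps
print(fits(raw_dumps, low))       # pessimistic guess
print(fits(raw_dumps, high))      # optimistic guess

# Hardware compression off: 20 GB native, dumps already gzipped.
gzipped_dumps = [8, 7, 4]         # 19 GB actually sent to the drive
print(fits(gzipped_dumps, 20))
```

With the drive compressing, the same 35 GB run may or may not fit 
depending on how well the data squeezes.  With gzip doing the work, the 
byte count going to the drive is the byte count landing on the tape, so 
the answer is a plain yes or no.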

I think I finally ran down...  Maybe someplace a light came on?

Funny, I can remember when we had exactly this same shoeshining 
problem with 120 meg QIC drives running on 25 MHz 386sx boxes with 
7MB/sec ISA busses.  Then the only cure really was a faster box.

Please don't call me a dinosaur though, even if my temper resembles a 
T-Rex's occasionally. :)

>> If you are not using spindle numbers in your disklist, maybe it
>> would help to prevent thrashing of seeks all over the place
>> because more than one dumper is attacking the drive
>> simultaneously.
>
>I am. It helped a lot.
>
>> This might mean that the tape would stop and do a bit of
>> shoeshining in between files, but a given file should be able to
>> be 'poured down the pipe' non-stop.
>
>That'd be one 'buzz-squinch-buzz' per dump file. That's a
> possibility. I'll look into it. Also an argument against thousands
> of partitions.
>
>> There is also an algorithm string in amanda.conf that adjusts the
>> dump order a bit. I have mine set to the largest dump first, so
>> that once it's done, there is a good chance the rest of the thing
>> is already in the holding disk and I get the drive's maximum speed
>> once it actually starts.
>
>That I didn't know about at all. I'll go find it.
>
>> In this case, it seems he needs two disks assigned as holding
>> disks, with the hope that amanda would write to one, then the
>> other, alternating such that the one being written was not being
>> read by a taper at the same time.
>
>Now that's silly :-) Amanda's creating big, contiguous files
> designed to stream a tape drive. Disk drives are supposed to be
> vastly faster than the tape. From what you said earlier, that's
> where I think I need to focus attention.
>
>There and maybe just a little on reducing SCSI snobbery :-) Very
>informative. Thanks.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.23% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
