On Wednesday 02 June 2004 18:21, Glenn English wrote:
>On Wed, 2004-06-02 at 14:12, Gene Heskett wrote:
>> Any current ide drive can do 30+ Mb/sec if left
>> alone by other tasks, often quite a ways on the + side.
>
>Is that just a burst out of the cache, or can they read
> dis-contiguous files, seek around to other files, wait for latency,
> and write all at the same time that fast? Or even half that fast?
> If so, and if Linux and Intel's IDE controllers lose another 25%
> moving bits around, it'd still be comfortably faster than the tape
> drive. I think I may have something horribly misconfigured.
>
Well, in fairness, those are the hdparm -tT ratings I'm quoting, which
are generally a 1 or 2 second burst, either from the cache or from the
surface itself. This does NOT take into consideration seek times and
rotational latency, which probably shouldn't be a concern within a
single file transfer from disk to tape anyway. And by 'file' I mean
the whole, completed backup of an individual disklist entry, or as we
call them, DLEs.
I'm inclined to ramble a bit, so bear with me folks.
I think the point here is that a pure read, with no writes
interleaved, from an individual disk (and controller too) to an
individual tape drive on its own, probably SCSI, controller should
be fast enough to stream even the most current tape drive on the
market. None of these, to my knowledge, contain any black magic such
as is used in modern digital video recorders.
The really fast data rates common in video formats such as the
Panasonic DVCPRO, originally a 25Mb/sec format, then 50Mb/sec, and
now 100Mb/sec for HDTV, have not made it into the data storage
business, and probably never will. This is primarily because none of
these formats are "verbatim" formats; they are formats that do error
correction based on hiding the error from the human eye, and they
are doing it to an already mpeg2'd (or better) video stream.
Much of that is based on data shuffling and hashing, wherein the
burst of bad data that would cause you to ditch a data tape goes
right on by, because that one, single, maybe 20 byte wide dropout on
the tape is shuffled around until it's a one bit error in many pixels'
worth of data scattered out over the whole frame of video. With data
replacement techniques based on what the adjacent data is, you never
see it until the error rate is more than 50 bytes per kilobyte.
Back to here, and now I'm trying to sound like an expert, but I'm
neither carrying a briefcase, nor am I more than 50 miles from home,
one wag's definition of an expert. :-)
The ideal situation would be for the backup that's being optionally
gzipped (bring cpu horsepower, all you can get) and stored on holding
disk two not to be on the same disk, controller and cable as holding
disk one, so that one could be doing a read and transfer to the tape
while two is receiving the backup from tar|gzip or whatever.
One of the tools amanda uses to prevent disk access contention is the
spindle number given optionally in the DLE. Each physical disk
should have its own, unique spindle number. This same number is used
for all the DLEs that are on that disk. The next disk gets a
different number, etc etc.
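
To make the spindle idea concrete, here is a toy sketch in Python. It
is NOT amanda's own code. It just reads a few disklist-style lines
(host, disk, dumptype, optional spindle) and groups the DLEs that
share a physical disk, which are the ones that should not be dumped
at the same time. The host name "coyote", the dumptype name and the
disk paths are made up for illustration, and a missing spindle is
assumed here to mean -1 (no restriction).

```python
from collections import defaultdict

# Hypothetical disklist entries: host, disk, dumptype, spindle.
# /home and /var are assumed to live on the same physical disk.
DISKLIST = """
coyote  /home  comp-user-tar  1
coyote  /var   comp-user-tar  1
coyote  /data  comp-user-tar  2
"""

def group_by_spindle(text):
    """Map (host, spindle) -> list of disks sharing that spindle."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        host, disk = fields[0], fields[1]
        # Assumption: no spindle field means -1, i.e. no restriction.
        spindle = int(fields[3]) if len(fields) > 3 else -1
        groups[(host, spindle)].append(disk)
    return dict(groups)

groups = group_by_spindle(DISKLIST)
for (host, spindle), disks in sorted(groups.items()):
    print(host, "spindle", spindle, "->", disks)
```

Any group with more than one disk in it is a set of DLEs that would
fight over the same heads if dumped in parallel.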
Now, I know that you can give amanda more than one holding disk
specification, but what I don't know is how amanda determines which
holding disk to use for each DLE.
If someone more familiar with the code than I could bail me out here,
it might become more obvious to this user what he must do to best
alleviate his problem.
Currently I see it as needing a pair of individual disks on their own
controller for use as holding disks, but I cannot advise how to make
amanda do the correct ping-ponging to help end the shoeshining of his
tape drive. Of course such a scheme will probably be a bad puppy and
make a mess on the rug when the DLEs differ widely in size (and in
compression usage).
One thing that hasn't been mentioned, because it's overshadowed by the
larger picture, is that if the drive is using its internal
compressor, then amanda has only a SWAG (maybe + - 30% or more)
idea of the tape's true capacity. Amanda counts bytes fed down the
cable to the drive, after any gzipping has been done if it's used.
With the drive's compressor off, amanda can then know to well within
a percent or so how much data she can stuff onto that tape, making
maximum use of the available resources. This also explains why we
generally recommend that the drive's compressor be turned off
forever. The nice thing about the way amanda does its compression is
that each client can be told to do its own compression, thereby
offloading that time consuming chore from the server. Since each
client can do its own compression, adding clients doesn't slow you
down, since they can all run in parallel with minimal or no
interaction other than maybe cat5 collisions. But those are recovered
so quickly in most cases that with 100baseT circuits and normal
drives, it's no big deal. Just bear in mind that data fed straight to
that drive off the network, because of something fubar in the holding
disk setup, will really make the drive shuffle tape.
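
The capacity point above can be put into rough numbers. This is only
back-of-the-envelope arithmetic in Python, not anything amanda
computes this way. The 20 GB native capacity and the 1.5:1 average
hardware compression ratio are invented for the example; the 30% and
1% figures are the rough uncertainties mentioned above.

```python
def capacity_window(nominal_gb, uncertainty):
    """Return (low, high) bounds, in GB, on what may actually fit."""
    low = nominal_gb * (1.0 - uncertainty)
    high = nominal_gb * (1.0 + uncertainty)
    return low, high

NATIVE_GB = 20.0  # hypothetical native tape capacity

# Drive compressor ON: assume (hypothetically) 1.5:1 on average,
# but only known to within about +-30%.
hw_low, hw_high = capacity_window(NATIVE_GB * 1.5, 0.30)

# Drive compressor OFF: bytes are counted after gzip, so the
# capacity in bytes-down-the-cable is known to about +-1%.
sw_low, sw_high = capacity_window(NATIVE_GB, 0.01)

print("hardware compression: %.1f to %.1f GB" % (hw_low, hw_high))
print("compressor off:       %.1f to %.1f GB" % (sw_low, sw_high))
```

The width of the window is what matters: with the drive's compressor
on, the planner either has to assume the low end and waste tape, or
assume more and risk hitting end of tape in mid-dump.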
I think I finally ran down... Maybe someplace a light came on?
Funny, I can remember when we had exactly this same shoeshining
problem with 120 meg QIC drives running on 25 MHz 386sx boxes with
7Mb/sec ISA busses. Then the only cure really was a faster box.
Please don't call me a dinosaur though, even if my temper resembles a
T-Rex's occasionally. :)
>> If you are not using spindle numbers in your disklist, maybe it
>> would help to prevent thrashing of seeks all over the place
>> because more than one dumper is attacking the drive
>> simultaneously.
>
>I am. It helped a lot.
>
>> This might mean that the tape would stop and do a bit of
>> shoeshining in between files, but a given file should be able to
>> be 'poured down the pipe' non-stop.
>
>That'd be one 'buzz-squinch-buzz' per dump file. That's a
> possibility. I'll look into it. Also an argument against thousands
> of partitions.
>
>> There is also an algorithm string in amanda.conf that adjusts the
>> dump order a bit; I have mine set to do the largest dump first, so
>> that once it's done, there is a good chance the rest of the thing
>> is already in the holding disk and I get the drive's maximum speed
>> once it actually starts.
>
>That I didn't know about at all. I'll go find it.
>
>> In this case, it seems he needs two disks assigned as holding
>> disks, with the hope that amanda would write to one, then the
>> other, alternating such that the one being written was not being
>> read by a taper at the same time.
>
>Now that's silly :-) Amanda's creating big, contiguous files
> designed to stream a tape drive. Disk drives are supposed to be
> vastly faster than the tape. From what you said earlier, that's
> where I think I need to focus attention.
>
>There and maybe just a little on reducing SCSI snobbery :-) Very
>informative. Thanks.
--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.23% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.