Re: [Bacula-users] Quantum Scalar i500 slow write speed

From: Christian Gaul <christian.gaul AT otop DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 09 Aug 2010 10:53:31 +0200
On 09.08.2010 08:55, Henry Yen wrote:
> On Fri, Aug 06, 2010 at 10:48:10AM +0200, Christian Gaul wrote:
>   
>> Even when catting to /dev/dsp I use /dev/urandom. Blocking on
>> /dev/random happens much too quickly, and when do you really need that
>> much randomness?
>>     
> I get about 40 bytes on a small server before blocking.
>
>   
>>> Reason 1: the example I gave yields a file size for "tempchunk" of 512MB,
>>> not 1MB, as given in your counter-example.  I agree that (at least
>>> nowadays) catting 1MB chunks into a 6MB chunk is likely (although not
>>> assured) to lead to greatly reduced size during later compression, but I
>>> disagree that catting 512MB chunks into a 3GB chunk is likely to be
>>> compressible by any general-purpose compressor.
>>>       
>> Which is what I meant with "way bigger than the library size of the
>> algorithm".  Mostly my "information" was about pitfalls to look out for
>> when testing the speed of your equipment; if you went ahead and catted
>> 3000 x 1MB, I believe the hardware compression would make something
>> highly compressed out of it.
>> My guess is it would work for most chunks up to around half as large as
>> the buffer size of the drive (totally guessing).
>>     
> I think that the tape drive manufacturers don't build large buffer/CPU
> capacity into their drives yet.  I finally did a test on an SDLT2 (160GB)
> drive; admittedly, it's fairly old as tape drives go, but tape technology
> appears to lag disk technology by a fair bit, at least as far
> as raw capacity is concerned.  I created two files from /dev/urandom;
> one was 1GB, the other a mere 10K.  I then created two identically-sized
> files by concatenating copies of each of these two chunks (4 copies of
> the first and approx. 400k copies of the second).  Writing them to the
> SDLT2 drive using 60k blocksize, with compression on, yielded uncanny
> results: the writable capacity before hitting EOT was within 0.01%, and
> the elapsed time was within 0.02%.  I see there's a reason to almost
> completely ignore the so-called "compressed capacity" claims by tape
> drive manufacturers...
>   

Wow, I didn't expect that. For fun I tried LZMA with the library size
(dictionary size, to give it its proper name) increased to about a GB; as
expected, it gave a nice 5:1 compression ratio on your original "dd"
example. Of course no drive hardware ships with enough buffer, much less
spends the time to buffer a GB just to check whether two repeating 500MB
blocks are in the buffer, but 10KB is still surprisingly small.
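
Roughly what I tried, as a sketch (flag syntax per xz(1); assuming an xz
new enough to accept --lzma2 options, and note the 1GiB dictionary makes
xz want several GB of RAM):

$ dd if=/dev/urandom of=tempchunk bs=1M count=512
$ cat tempchunk tempchunk tempchunk tempchunk tempchunk tempchunk > bigfile
$ xz -k --lzma2=preset=9,dict=1024MiB bigfile
$ ls -l bigfile bigfile.xz

Once the dictionary spans a whole repeated chunk, the second and later
copies compress to almost nothing, which is where the roughly 5:1 comes
from.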

>   
>>>> Also, afaik tar
>>>> has an "optimization" when outputting to /dev/null, better output to
>>>> /dev/zero instead if using tar to check possible speeds.
>>>>     
>>>>         
>>> (Yes, although there is considerable disagreement over this (mis)feature;
>>>  my take is that the consensus is "yes, probably bad, definitely
>>>  under-documented (the behavior does match the "info" document), but
>>>  too late to change now".)
>>>   
>>>       
>> Pretty much. But new users starting with Bacula might not know this. If
>> the user followed the advice in a previous post to run "tar -cf /dev/null
>> <path/to/directory>", he would most likely be surprised.
>>     
> You know what's interesting about that "optimization"?  The arguments
> for not "fixing" it revolve around another backup system -- amanda --
> depending on the behavior of this "feature"...
>   

Backwards compatibility hell.

>   
>> Your analysis of current hardware compression (and, for large enough
>> chunk sizes, software compression too) is most likely correct; I was
>> merely pointing out the "obvious" problems that can lead a new user to
>> misinterpret his testing.
>>
>> If a novice user tried creating a "random" file from /dev/random, he
>> would most likely not wait multiple days to create 500MB chunks, and
>>     
> Multiple days?  Surely not!  I suspect it would take multiple years!
>
>   
>> creating a 1MB random chunk from /dev/random would lead to a drastically
>> wrong speed estimate for the drive.
>>     
> As the tests on my older tape drive show, even 10K blocks aren't
> compressible.  I am confident that even the very latest technology
> has not come even vaguely close to an improvement of 2 orders of magnitude.
>
>   
>> Getting back on track.
>>
>> The actual problem of the original post was the user expecting around
>> 50-100MB/s throughput from his drives.
>>
>> I believe the statistics of the job show the start and end times and the
>> total size written; the "Rate" is calculated as "SD Bytes Written / (end
>> time - start time)".
>>
>> There are possibly a lot of things going on in that time frame which do
>> not utilize the drive at all. Spooling the data at the same speed the
>> tape drive is capable of would, for example, pretty much double the
>> running time of the job and would then show only half the speed in the
>> statistics. Also, inserting the attributes into the catalog can take
>> quite a while, which also adds to the run time of the job, thereby
>> "decreasing" the speed of the drive.
>>     
> These observations are very illuminating.  I think network/NFS was
> also mentioned as yet another possible slowdown.
>   

Yes, but I presume tar-ing from an NFS mount and spooling to Bacula from
an NFS mount should be about the same speed, unless compression and
checksumming each do a separate read of the data, making the FS fetch the
data multiple times over the network. But I hope nobody does that (I
didn't check, though).
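
To put rough numbers on the spooling point above, with LTO3 writing at
about 80MB/s natively (figures illustrative, not from his job):

    720000 MB job, spooled at 80 MB/s:    720000 / 80  =  9000 s
    then despooled to tape at 80 MB/s:    720000 / 80  =  9000 s
    Rate in the job report:          720000 MB / 18000 s = 40 MB/s

Two sequential 80MB/s phases show up as half that in the statistics,
which is where the 35-40MB/s estimate further down comes from.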

>   
>> If the user wants to test the maximum speed he can expect, a "tar -cf
>> /dev/zero /his/netapp/mountpoint" would allow him to guesstimate. I
>>     
> I think /dev/zero in this context is confusing, although obviously
> much less than the "cf /dev/null" "feature".  I wonder if the possible
> behavior of "seek" on "/dev/zero" might confuse tar.  On the other
> hand, "tar -cf - /mnt/pnt > /dev/null", although both obvious and functional,
> would rob tar of the ability to seek on the output file completely.
> I suppose one would have to look at the source for whatever version of "tar"
> was actually in use, and see what, if any, seek operations are even done
> on the output.
>
>   


$ du -hs exclude/
153G    exclude/

$ time tar -cf /dev/null exclude/
real    0m5.040s
user    0m2.250s
sys     0m2.790s

$ time tar -cf - exclude/ > /dev/null
real    0m5.082s
user    0m2.150s
sys     0m2.930s

but "tar -cf - exclude/ | cat > /dev/null" does work, tar "optimizes" on
STDOUT i believe. Using "-cf -" uses STDOUT for output, > redirects that
to /dev/null.. so you are back where you started.
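
For comparison, the variant with a pipe in between; I have no timing to
paste here, but since tar then really has to read all 153G, expect it to
run for about as long as a full read of the directory from disk:

$ time tar -cf - exclude/ | cat > /dev/null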

Using /dev/zero, or actually any device which ignores input, is what I
"learned to do" some time ago; it's also easy to remember :-)

Not sure if /dev/zero has any disadvantages that I haven't noticed yet.
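
So for the original poster, the read-side ceiling test would be (the
mount point is his, from the quoted suggestion above):

$ time tar -cf /dev/zero /his/netapp/mountpoint

Dividing the total data size by the real time then gives the best Rate
he could ever hope to see from that source.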

>> personally use spooling as much as I can. If the user did likewise, the
>> spooling speed from the netapp were ~80MB/s, and he was using LTO3
>> drives, he could expect maximum speeds in the job statistics of around
>> 35-40MB/s if all other components can keep up.
>>
>> Since he reports 22mb (I am hoping he means 22MB/s), and since he thinks
>> it is Bacula's fault, I guess he means the Rate in the job statistics.
>> With spooling, and maybe software compression and MD5/SHA checksums, I
>> believe that speed is a little low, but not terrible. Depending on the
>> FD doing the checksums and compression, he might simply be CPU bound.
>>     
> Agreed.
>
>   
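
A quick way to check for that would be to rerun the job with the
expensive options stripped from the FileSet and compare the Rate (just a
sketch: the directive names are as in the Bacula manual, the mount point
is his, the rest is illustrative):

FileSet {
  Name = "SpeedTest"
  Include {
    Options {
      signature = MD5      # drop this line (and any compression = GZIP)
                           # for the comparison run
    }
    File = /his/netapp/mountpoint
  }
}

If the Rate jumps with signature and compression off, the FD's CPU is
the bottleneck rather than the drive or Bacula itself.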


-- 
Christian Gaul
otop AG
D-55116 Mainz
Rheinstraße 105-107
Fon: 06131.5763.330
Fax: 06131.5763.500
E-Mail: christian.gaul AT otop DOT de
Internet: www.otop.de

Chairman of the Supervisory Board: Christof Glasmacher
Executive Board: Dirk Flug
Court of registration: Amtsgericht Mainz
Commercial register: HRB 7647
