Amanda-Users

Subject: Re: Tape Spanning - 105 hour backup of 1.3 Tb
From: Frank Smith <fsmith AT hoovers DOT com>
To: Sean Walmsley <sean AT fpp.nuclearsafetysolutions DOT com>
Date: Fri, 21 Jul 2006 01:06:04 -0500
Sean Walmsley wrote:
> This post documents our experience with the version 2.5.0p2 tape spanning
> option in the hopes that someone has suggestions for getting it working
> acceptably. Even a "me too" would be useful information.
I'm not familiar with the tape spanning code in Amanda, so all I can
suggest is a performance tweak below, based on your observations.
> 
> System info:
> ============
> Solaris 8 (sparc) 64-bit
> SunFire V880 (4-way) with 8Gb of memory
> Amanda 2.5.0p2 compiled as a 32-bit executable using gcc 3.4.5
> SDLT 600 tape robot (~400Gb per tape x 13 tapes)
> 4 individual 36Gb holding disks
> most DLE's ~ 36Gb to ~72Gb, total ~300Gb
> 2 large DLE's of ~500Gb each
> 
>   NOTE: the 500Gb volumes are larger than one tape and also larger
>   than the total size of our holding disks
> 
> Goal:
> =====
> Our normal daily backup works fine with the above configuration, but excludes
> the 500Gb volumes, which we back up via other means.  With the advent of the
> 2.5.0 tape spanning feature, we've been attempting to implement a second
> weekly Amanda configuration to back up *EVERYTHING* onto one set of tapes
> for off-site disaster recovery purposes.
> 
> Conclusions:
> ============
> 1) Tape splitting does seem to (sort of) work with the default tiny
>    memory buffer size, but it doesn't seem possible to speed it up via
>    the use of a larger, more efficient buffer (disk or memory).
> 2) The lack of a decent buffer makes backups as much as 10 times 
>    slower than a "normal" amanda backup. Our experience was 
>    1.3 Tb backed up in 105 hours (4.4 days!). 
> 3) Using the default buffer size results in many thousands of split files
>    on each tape rather than the preferred ~10 files.
> 4) Using a disk buffer of > 2Gb does not work, as this seems to cause
>    mmap problems (I assume this is a 32-bit limitation). This means
>    that using a splitsize of ~10% of tape size, as recommended, is
>    impossible.
> 5) Using a disk buffer of < 2Gb works for the first DLE, but after 
>    that amanda always falls back to using a memory buffer of size 
>    fallback_splitsize. 
> 6) Using a reasonable fallback_splitsize (e.g. 1Gb) causes amanda to
>    run out of memory after a few DLEs. 
> 7) Using the default fallback_splitsize of 10Mb does seem to work,
>    but it is very slow.
> 8) It's possible to compile amanda as a 64-bit executable, but this
>    caused other problems.
> 9) The diskbuffer code seems to waste a lot of time recreating and
>    zeroing out the buffer for each DLE.

Have you tried using tmpfs for your diskbuffer?  If you are mounting
/tmp as tmpfs, you can create your buffers there, or make a new mount
of type tmpfs and use that.  Since tmpfs uses the VM subsystem, it
should be much faster than physical disk (unless you are short on
memory and it actually needs to always use swap).
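
On Solaris that would look something like this (the mount point and size
are just placeholders; size the filesystem to at least your splitsize):

```
# create a dedicated tmpfs mount (Solaris syntax; needs root)
mkdir /amanda-splitbuf
mount -F tmpfs -o size=4g swap /amanda-splitbuf

# then point the dumptype at it in amanda.conf:
#   split_diskbuffer "/amanda-splitbuf"
```

Note that since tmpfs pages come out of VM, this only helps if you have
the memory (or swap) to spare.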

Frank

> 
> In other words, our experience is that tape spanning in 2.5.0p2
> isn't currently workable for our 1.3 Tb system.
> 
> If anyone has successfully tape spanned a 1+ Tb backup at an acceptable
> speed with a reasonable tape split size, please let me know.
> 
> 
> Blow-by-blow details:
> =====================
> Since our largest volumes far exceed our holding disk space (and always will),
> we configured the new run with:
> 
> dumpcycle 0           i.e. level 0 every time
> runspercycle 0
> runtapes 13           i.e. use all tapes if needed
> 
> and used a dumptype containing:
> 
> estimate server       i.e. we're dumping everything so minimize plan time
> record no             i.e. don't mess up our daily set
> strategy noinc                i.e. don't attempt incrementals
> holdingdisk no                i.e. don't try to use a holding disk
> tape_splitsize 30 Gb  i.e. write tape in 30Gb splits
> split_diskbuffer "/a/volume/with/enough/space"
> fallback_splitsize 2Gb        i.e. don't fall back to the tiny 10Mb default
> 
> TRY #1:
> ------
> With these settings, the run core dumped within a few minutes. Looking at the
> code, we noted that:
> 
>   - the tape split code uses mmap to map the diskbuffer
>   - our amanda is compiled 32-bit which implies a maximum address space
>     of ~4Gb
> 
> TRY #2:
> ------
> Based on this, we changed the following options to fit within the 32-bit
> limit:
> 
> tape_splitsize 2 Gb
> fallback_splitsize 1Gb
> 
> This dumped the first DLE in 2Gb chunks (hooray!) as the following
> amdump records indicate:
> 
>   driver: dumping megawatt:/vol02 directly to tape
>   driver: send-cmd time 2.348 to taper: PORT-WRITE 00-00001 [cont'd]
>     megawatt fffffeff9ffeffff07 /vol02 0 20060703 2097152 /amdump4 1048576
>   taper: r: buffering 2097152kb split chunks in mmapped file [cont'd]
>     /amdump4/splitdump_buffer_CQaaeV
> 
> Note that the diskbuffer size (2097152 kb) and fallback size (1048576 kb) are
> listed in the PORT-WRITE record (this is relevant to the 64-bit case below).
> 
> After the tape indicated that it successfully completed taping the first
> DLE (/vol02), the next DLE produced the following records in amdump:
> 
>   driver: error time 1600.885 serial gen mismatch
>   driver: dumping megawatt:/vol06 directly to tape
>   driver: send-cmd time 1601.080 to taper: PORT-WRITE 00-00002 [cont'd]
>      megawatt fffffeff9ffeffff07 /vol06 0 20060703 2097152 /amdump4 1048576
> 
> And in the log file:
> 
>   INFO taper mmap failed (Not enough space): using fallback split [cont'd]
>     size of 1048576kb to buffer megawatt:/vol06.0 in-memory
> 
> The second DLE (/vol06) *WAS* dumped to tape in 1Gb chunks as per the
> log message. Similarly, DLE's 3-5 were backed up in 1Gb chunks.
> 
> DLE 6 also fell back using a 1Gb memory buffer, but then failed with the
> following message in the log file:
> 
>   FATAL taper taper.c@511: memory allocation failed [cont'd]
>     (1073741824 bytes requested)
> 
> TRY #3:
> -------
> We were a bit puzzled by the memory allocation failure above since the
> machine has lots of unused memory, but decided to go ahead with another
> test using:
> 
> tape_splitsize 2 Gb
> fallback_splitsize 10Mb (i.e. the default value)
> 
> Again, the first volume dumped using the disk buffer of 2Gb, but subsequent
> DLE's were split up into 10Mb chunks (about 150 thousand in all) over 4
> tapes. The entire backup took 105 hours or about 4.4 days to complete, which
> is about 10 times slower (in terms of Gb/hr not total time) than our normal
> backups (we normally get ~100Gb/hr from our SDLT 600, which is fairly close
> to the manufacturer's advertised transfer rate).
> 
> Note that this (poor) performance is similar to that described by
> Paul Graf in his post of 29Jun2006 titled "77 hour backup of 850Gb".
> 
> TRY #4:
> -------
> Given the completely unacceptable performance using a 10Mb splitsize, we
> decided to try compiling amanda 64-bit based on the assumptions that:
> 
>   - 32-bit limitations might have been the cause of both large
>     disk and memory buffers failing
>   - lack of a proper buffer was the root cause of the performance
>     issue
> 
> To compile 64-bit, we used the following gcc flags:
> 
> setenv CFLAGS "-m64 -mcpu=ultrasparc3"
> 
> along with buffer sizes as follows:
> 
> tape_splitsize 2 Gb
> fallback_splitsize 1Gb
> 
> Although the 64-bit code compiled and ran, it seemed to get confused over the
> buffer sizes. For example, the amdump PORT-WRITE record was:
> 
>   driver: send-cmd time 3.512 to taper: PORT-WRITE 00-00001 [cont'd]
>     megawatt fffffeff9ffeffff07 /vol07 0 20060713 9007199254740992 [cont'd]
>     /holdvol01/MEGABAK2_DISKBUFFER 4503599627380736
> 
> This *seems* to indicate that the requested diskbuffer size is 9 exabytes
> (9 * 10^18 bytes) and that the requested fallback size is 4 exabytes despite
> the fact that we specified 2G/1G. I'm guessing that these numbers are because
> some portions of the code aren't 64-bit clean.
> 
> Since these buffer sizes were not available, the backup didn't seem to be
> splitting any of the DLE's so we terminated it.
> 
> MISC:
> -----
> Following these attempts, we had a look at the disk buffering code
> and noted that:
> 
>   - the disk buffer seems to be re-created for every DLE
>   - after creation, the entire buffer is zeroed out by writing 1024 byte
>     blocks of zeroes to it
> 
> Assuming that:
> 
>   - a single SCSI disk can maintain a throughput of ~50Mbyte/s
>   - a reasonable split buffer size would be ~50Gb
> 
> then just zeroing the diskbuffer would take about half an hour per DLE.
> In our case (~40 DLEs), this means that over 20 hours would be spent
> zeroing the buffer file before even doing any real work!
> 
> 
> =================================================================
> Sean Walmsley                 sean AT fpp.nuclearsafetysolutions DOT com
> Nuclear Safety Solutions Ltd.  416-592-4608 (V)  416-592-5528 (F)
> 700 University Ave M/S H04 J19, Toronto, Ontario, M5G 1X6, CANADA
> 


-- 
Frank Smith                                      fsmith AT hoovers DOT com
Sr. Systems Administrator                       Voice: 512-374-4673
Hoover's Online                                   Fax: 512-374-4501