Bacula-users

Re: [Bacula-users] slow performance copy/migrate disk to tape

2008-11-06 11:41:37
Subject: Re: [Bacula-users] slow performance copy/migrate disk to tape
From: Ulrich Leodolter <ulrich.leodolter AT obvsg DOT at>
To: Sebastian Lehmann <slehmann AT proficom-ag DOT de>
Date: Thu, 06 Nov 2008 17:37:14 +0100
On Thu, 2008-11-06 at 16:18 +0100, Sebastian Lehmann wrote:
> Hi,
> 
> Am Do 06.11.2008 15:54 schrieb Ulrich Leodolter
> <ulrich.leodolter AT obvsg DOT at>:
> 
> > Hi,
> > 
> > On Thu, 2008-11-06 at 14:57 +0100, Sebastian Lehmann wrote:
> > > Hi,
> > > 
> > > Am Di 04.11.2008 21:02 schrieb Ulrich Leodolter
> > > <ulrich.leodolter AT obvsg DOT at>:
> > > 
> > > > Hi,
> > > > 
> > > > Problem:  Migrate/Copy jobs from disk pool (DiskBackup)
> > > > to tape (DiskCopy) only get overall speed of 10-20MB/s.
> > > > 
> > > > full backup job size varies from 10-50GB.
> > > > 
> > > > Pool setup is simple, just one pool for full an incremental
> > > > backups
> > > > to disk (automatic recycle works good)
> > > > 
> > > > Pool {
> > > >   Name = DiskBackup
> > > >   Pool Type = Backup
> > > >   Recycle = yes
> > > >   RecyclePool = DiskBackup
> > > >   AutoPrune = yes
> > > >   Volume Retention = 15 days
> > > >   Volume Use Duration = 6 days
> > > >   Maximum Volume Bytes = 4G
> > > >   Label Format = Backup-
> > > >   Next Pool = DiskCopy
> > > > }
> > > > 
> > > > DiskCopy pool goes to LTO4 tape device.
> > > > DiskBackup goes to SATA external raid (6T ext3).
> > > > 
> > > > concurrency for DiskBackup jobs is 15,  jobs are spread
> > > > over DiskBackup Valumes (maybe thats the main problem)
> > > > 
> > > > i can read write/continuous on both devices at about 70MB/s
> > > > (SATA is not fast)
> > > > 
> > > > i tried spooling to local SAS raid, but overall speed is lower
> > > > than direct writing to tape.
> > > > despooling from SAS raid to Tape runs at Tape maximum spped.
> > > > 
> > > > I need some Performance tuning tips, maybe:
> > > > 
> > > 
> > > we use bacula in the same way, but with version 2.4.3, so we only
> > > use
> > > migration instead of copy job.
> > > 
> > > Maybe you have the same problem as we have. Performance looks ok, if
> > > i
> > > use tools other then bacula to test it.
> > > 
> > > We assume, that the seek process on disk based volumes not really
> > > works.
> > > 
> > > 2008-11-06 06:00:03 dss-bacula-dir JobId 39252: The following 1
> > > JobIds
> > > were chosen to be migrated: 38638
> > > 2008-11-06 06:00:03 dss-bacula-dir JobId 39252: Migration using
> > > JobId=38638 Job=TCB-PROJECT_DB-WEB.2008-11-03_22.00.14
> > > 2008-11-06 06:00:03 dss-bacula-dir JobId 39252: Bootstrap records
> > > written to /var/lib/bacula/dss-bacula-dir.restore.28.bsr
> > > 2008-11-06 06:00:03 dss-bacula-dir JobId 39252: 
> > > 2008-11-06 06:00:03 dss-bacula-dir JobId 39252: The job will require
> > > the
> > > following
> > >    Volume(s)                 Storage(s)                SD Device(s)
> > > 
> > >
> > > ===========================================================================
> > > 2008-11-06 06:00:03 dss-bacula-dir JobId 39252:    
> > > 2008-11-06 06:00:03 dss-bacula-dir JobId 39252: 00132 File
> > > FileAutoChanger
> > > 2008-11-06 06:00:03 dss-bacula-dir JobId 39252: 
> > > 2008-11-06 13:25:54 dss-bacula-dir JobId 39252: Start Migration
> > > JobId
> > > 39252, Job=MigrateToTape.2008-11-06_06.00.48
> > > 2008-11-06 13:25:54 dss-bacula-dir JobId 39252: Using Device
> > > "SL500-1-Drive-1"
> > > 2008-11-06 13:25:54 dss-bacula-sd JobId 39252: Ready to read from
> > > volume
> > > "00132" on device "FileStorage" (/stage0/drive0).
> > > 2008-11-06 13:25:54 dss-bacula-sd JobId 39252: Forward spacing
> > > Volume
> > > "00132" to file:block 21:974164913.
> > > 2008-11-06 13:37:49 dss-bacula-sd JobId 39252: End of Volume at file
> > > 45
> > > on device "FileStorage" (/stage0/drive0), Volume "00132"
> > > 2008-11-06 13:37:49 dss-bacula-sd JobId 39252: End of all volumes.
> > > 2008-11-06 13:37:52 dss-bacula-dir JobId 39252: Bacula
> > > dss-bacula-dir
> > > 2.4.2 (26Jul08): 06-Nov-2008 13:37:52
> > >   Build OS:               x86_64-pc-linux-gnu debian 4.0
> > >   Prev Backup JobId:      38638
> > >   New Backup JobId:       39253
> > >   Migration JobId:        39252
> > >   Migration Job:          MigrateToTape.2008-11-06_06.00.48
> > >   Backup Level:           Full
> > >   Client:                 BACULA-DIR
> > >   FileSet:                "LinuxFullSet" 2008-04-23 22:00:00
> > >   Read Pool:              "Disk" (From Job resource)
> > >   Read Storage:           "File" (From Pool resource)
> > > Write Pool: "MONTH-TAPE" (From Job Pool's NextPool resource)
> > > Write Storage: "SL500-1" (From Storage from Pool's NextPool
> > > resource)
> > >   Start time:             06-Nov-2008 13:25:54
> > >   End time:               06-Nov-2008 13:37:52
> > >   Elapsed time:           11 mins 58 secs
> > >   Priority:               10
> > >   SD Files Written:       51
> > >   SD Bytes Written:       37,420,967 (37.42 MB)
> > >   Rate:                   52.1 KB/s
> > >   Volume name(s):         000211
> > >   Volume Session Id:      3
> > >   Volume Session Time:    1225972440
> > >   Last Volume Bytes:      31,316,447,232 (31.31 GB)
> > >   SD Errors:              0
> > >   SD termination status:  OK
> > >   Termination:            Migration OK
> > > 
> > > 2008-11-06 13:37:52 dss-bacula-dir JobId 39252: Begin pruning Jobs.
> > > 2008-11-06 13:37:52 dss-bacula-dir JobId 39252: Pruned 2 Jobs for
> > > client
> > > BACULA-DIR from catalog.
> > > 2008-11-06 13:37:52 dss-bacula-dir JobId 39252: Begin pruning Files.
> > > 2008-11-06 13:37:52 dss-bacula-dir JobId 39252: No Files found to
> > > prune.
> > > 2008-11-06 13:37:52 dss-bacula-dir JobId 39252: End auto prune.
> > > 
> > > As you can see, bacula needs 12 minutes to migrate 37MB. Thats
> > > becuase
> > > is does not "jump" to the correct file position in the disk volume.
> > > It
> > > reads the entire volume from the begining to the end if the job was
> > > the
> > > last on it. You can see this with stat sd. Bacula reads the volume
> > > but
> > > does not write to the tape the most time.
> > 
> > stat sd ???
> > 
> At the bconsole "stat sd" then select your storage and device.
> ->
> Connecting to Director 10.92.79.168:9101
> 1000 OK: dss-bacula-dir Version: 2.4.2 (26 July 2008)
> Enter a period to cancel a command.
> *stat sd
> The defined Storage resources are:
>      1: File
>      2: SL500-1
> Select Storage resource (1-2): 1
> Connecting to Storage daemon File at dss-bacula:9103
> 
> dss-bacula-sd Version: 2.4.2 (26 July 2008) x86_64-pc-linux-gnu debian
> 4.0
> Daemon started 06-Nov-08 14:28, 10 Jobs run since started.
> Heap: heap=7,319,552 smbytes=6,878,613 max_bytes=6,938,811 bufs=366
>  max_bufs=545
> Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8
> <snip>
> Device parameters:
> Archive name: /dev/nst0 Device name: SL500-1-Drive-1
> File=35 block=4725
> Min block=0 Max block=0
>     Total Bytes=28,337,799,168 Blocks=439,263 Bytes/block=64,512
>     Positioned at File=35 Block=4,725
> 
> Her you can see the total bytes bacula has read from the current volume.
> 
> > "Forward spacing Volume" is done by lseek() on File volumes,
> > dont think bacula does positioning by reading the disk volume.
> > You can time my little lseek program (see attachment)
> > on large disk volumes using different offset values.
> > For me large offsets (>1GB) make no difference in timing.
> > 
> 
> Ok, forgot to tell that we use a virtual changer with disk based
> volumes, so Media type is "File".
> 
> > > 
> > > We assume its a configuration problem but do not find any.
> > 
> > Maybe also check your tape Device in bacula-sd.conf
> > 
> > Device {
> >  ...
> >  Always Open = yes; 
> >  ,,,
> > }
> > 
> > If you don't have "Always Open = yes", tape devices are spooled back
> > and
> > forth, because bacula does open/close at least for each migration job.
> >
> As the documentation said this option will be ignored for media type
> file.
> 
> Ok, what i see is, that lseek at the begining of the job seems to work:
> 
> > dss-bacula-sd: askdir.c:368-0 >dird CatReq
> > Job=MigrateToTape.2008-11-06_06.00.27 UpdateMedia VolName=000212
> > VolJobs=13 VolFiles=38 VolBlocks=459721 VolBytes=29657585664
> > VolMounts=3 VolErrors=0 VolWrites=6764239 MaxVolBytes=0
> > EndTime=1225984266 VolStatus=Append Slot=13 relabel=0 InChanger=1
> > VolReadTime=0 VolWriteTime=2478623570 VolFirstWritten=0 VolParts=0
> dss-bacula-sd: askdir.c:182-0 <dird 1000 OK VolName=000212 VolJobs=13
> VolFiles=38 VolBlocks=459721 VolBytes=29657585664 VolMounts=3
> VolErrors=0 VolWrites=6764239 MaxVolBytes=0 VolCapacityBytes=0
> VolStatus=Append Slot=13 MaxVolJobs=0 MaxVolFiles=0 InChanger=1
> VolReadTime=0 VolWriteTime=2478623570 EndFile=37 EndBlock=4956
> VolParts=0 LabelType=0 MediaId=35
> dss-bacula-sd: askdir.c:205-0 do_reqest_vol_info return true slot=13
> Volume=000212
> dss-bacula-sd: reserve.c:490-0 Dec reserve=0 dev="SL500-1-Drive-1"
> (/dev/nst0)
> dss-bacula-sd: dev.c:1580-0 ===== lseek to -1686243990
> 
> Bacula starts with high transfer rates, then only reads the source
> volume up to his end.

but where is the unnecessary read in source code???
> 
> courious, i don't understand this.

me too :-)


Hi

i also noticed that bacula reads much more as needed.
sar output below is recorded during Copy disk to tape
while spooling to disk is enabled.

i would expect    bread/s ~ 2 * bwrtn/s 
but as u see its up to 10 times


12:00:01 AM       tps      rtps      wtps   bread/s   bwrtn/s
....
09:30:01 AM   2078.93    224.37   1854.56  40730.89  29267.22
09:40:02 AM   4043.36   1540.13   2503.23 275342.38  38271.24
09:50:01 AM   3595.00   1603.98   1991.02 306152.36  30414.74
10:00:01 AM   2383.88   1508.10    875.78 282534.59  13318.83
10:10:01 AM   5514.69   1502.04   4012.65 247868.19  61412.10
10:20:02 AM   6106.74   1577.26   4529.48 263784.18  69347.99
10:30:02 AM   4080.24   1637.17   2443.06 301743.85  37310.45
10:40:01 AM   1547.23    363.86   1183.37  57264.02  18028.25


Ulrich

> 
> Sebastian
> 

-- 
Ulrich Leodolter <ulrich.leodolter AT obvsg DOT at>
OBVSG


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users