Bacula-users

Re: [Bacula-users] slow copy job from disk to LTO-6 (speed issue)

2017-03-26 04:44:17
Subject: Re: [Bacula-users] slow copy job from disk to LTO-6 (speed issue)
From: Laurent ALONSO <laualo AT gmail DOT com>
To: Caribe Schreiber <caribe AT auctionharmony DOT com>
Date: Sun, 26 Mar 2017 10:43:03 +0200
Two 400GB ssd drive in raid0 are in my storage server mounted in /data_spooling
My librairy is a SCALAR i500 with 2 LTO7- Fiber channel drives.

speed lan on servers : 10Gb/s
Fiber channel : 8Gb/s
speed read/write on raid 0 ssd : about 8 Gb/s

the average bandwith only depend on speed to read data on the client and on the time spent to despool data on tape.
In my case : about 20 min / 344GB to despool.
Your LTO drive should receive data for at least ten minutes.

Best regards

------------------------------------------------------------------
 

My bacula-sd .conf

Storage {                             # definition of myself
Name = pjg095srsav002-sd
SDPort = 9103 # Director's port
WorkingDirectory = "/data_spooling"
Pid Directory = "/var/run"
Maximum Concurrent Jobs = 2
HeartBeat interval = 1 minute

}

#
# List Directors who are permitted to contact Storage daemon
#
Director {
Name = PJG095SRBacula-dir
Password = ****************
}

Autochanger {
Name = scalar-i500
Device = Drive-1
Device = Drive-2
Changer Command = "/etc/bacula/mtx-changer %c %o %S %a %d"
Changer Device = /dev/tape/by-id/scsi-1ADIC_A0D0116611_LLA
}

Device {
Name = Drive-1
Drive Index = 0
Media Type = LTO-7
Archive Device = /dev/tape/by-id/scsi-3500308c3a347d000-nst
AutomaticMount = yes; # when device opened, read it
AlwaysOpen = yes;
RemovableMedia = yes;
RandomAccess = no;
Maximum Concurrent Jobs = 1
Maximum File Size = 32GB
Maximum block size = 393216
spool directory = /data_spooling
maximum spool size = 688G
maximum job spool size = 344G
Changer Command = "/etc/bacula/mtx-changer %c %o %S %a %d"
Changer Device = /dev/tape/by-id/scsi-1ADIC_A0D0116611_LLA
AutoChanger = yes
# Enable the Alert command only if you have the mtx package loaded
Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"
# If you have smartctl, enable this, it has more info than tapeinfo
# Alert Command = "sh -c 'smartctl -H -l error %c'"
}

Device {
Name = Drive-2
Drive Index = 1
Media Type = LTO-7
Archive Device = /dev/tape/by-id/scsi-3500308c3a347d004-nst
AutomaticMount = yes; # when device opened, read it
AlwaysOpen = yes;
RemovableMedia = yes;
RandomAccess = no;
Maximum Concurrent Jobs = 1
Maximum File Size = 32GB
Maximum block size = 393216
spool directory = /data_spooling
maximum spool size = 688G
maximum job spool size = 344G
Changer Command = "/etc/bacula/mtx-changer %c %o %S %a %d"
Changer Device = /dev/tape/by-id/scsi-1ADIC_A0D0116611_LLA
AutoChanger = yes
# Enable the Alert command only if you have the mtx package loaded
Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"
# If you have smartctl, enable this, it has more info than tapeinfo
}
Messages {
Name = Standard
director = PJG095SRBacula-dir = all
}

-------------------------------------------A job log -----------------------------------------

PJG095SRBacula-dir No prior Full backup Job record found. No prior or suitable Full backup found in catalog. Doing FULL backup. PJG095SRBacula-dir Start Backup JobId 5417, Job=B-ALL_partage-EXCEPT_PJGN.2017-03-20_17.41.40_31 Using Volume "000001" from 'Scratch' pool. Using Device "Drive-1" to write. pjg095srsav002-sd 3307 Issuing autochanger "unload slot 34, drive 0" command for vol 000001. pjg095srsav002-sd 3304 Issuing autochanger "load slot 1, drive 0" command for vol 000001. pjg095srsav002-sd 3305 Autochanger "load slot 1, drive 0", status is OK for vol 000001. pjg095srsav002-sd Wrote label to prelabeled Volume "000001" on tape device "Drive-1" (/dev/tape/by-id/scsi-3500308c3a347d000-nst) Spooling data ... pjg095srsav002-sd User specified Device spool size reached: DevSpoolSize=375,809,964,486 MaxDevSpoolSize=375,809,638,400 Writing spooled data to Volume. Despooling 375,809,964,486 bytes ... pjg095srsav002-sd Despooling elapsed time = 00:20:44, Transfer rate = 302.0 M Bytes/second pjg095srsav002-sd Spooling data again ... pjg095srsav002-sd User specified Device spool size reached: DevSpoolSize=375,809,964,644 MaxDevSpoolSize=375,809,638,400 Writing spooled data to Volume. Despooling 375,809,964,644 bytes ... pjg095srsav002-sd Despooling elapsed time = 00:21:52, Transfer rate = 286.4 M Bytes/second pjg095srsav002-sd Spooling data again ... pjg095srsav002-sd User specified Device spool size reached: DevSpoolSize=375,809,964,295 MaxDevSpoolSize=375,809,638,400 Writing spooled data to Volume. Despooling 375,809,964,295 bytes ... pjg095srsav002-sd Despooling elapsed time = 00:21:11, Transfer rate = 295.6 M Bytes/second pjg095srsav002-sd Spooling data again ... pjg095srsav002-sd User specified Device spool size reached: DevSpoolSize=375,809,964,285 MaxDevSpoolSize=375,809,638,400 Writing spooled data to Volume. Despooling 375,809,964,285 bytes ... pjg095srsav002-sd Despooling elapsed time = 00:18:00, Transfer rate = 347.9 M Bytes/second pjg095srsav002-sd Spooling data again ... pjg095srsav002-sd User specified Device spool size reached: DevSpoolSize=375,809,964,450 MaxDevSpoolSize=375,809,638,400 Writing spooled data to Volume. Despooling 375,809,964,450 bytes ... pjg095srsav002-sd Despooling elapsed time = 00:14:59, Transfer rate = 418.0 M Bytes/second pjg095srsav002-sd Spooling data again ... pjg095srsav002-sd Writing spooled data to Volume. Despooling 375,809,964,442 bytes ... User specified Device spool size reached: DevSpoolSize=375,809,964,442 MaxDevSpoolSize=375,809,638,400 pjg095srsav002-sd Despooling elapsed time = 00:17:54, Transfer rate = 349.9 M Bytes/second pjg095srsav002-sd Spooling data again ... pjg095srsav002-sd User specified Device spool size reached: DevSpoolSize=375,809,964,110 MaxDevSpoolSize=375,809,638,400 Writing spooled data to Volume. Despooling 375,809,964,110 bytes ... pjg095srsav002-sd Despooling elapsed time = 00:18:43, Transfer rate = 334.6 M Bytes/second pjg095srsav002-sd Spooling data again ... pjg095srsav002-sd User specified Device spool size reached: DevSpoolSize=375,809,964,665 MaxDevSpoolSize=375,809,638,400 Writing spooled data to Volume. Despooling 375,809,964,665 bytes ... pjg095srsav002-sd Despooling elapsed time = 00:16:50, Transfer rate = 372.0 M Bytes/second pjg095srsav002-sd Spooling data again ... pjg095srsav002-sd User specified Device spool size reached: DevSpoolSize=375,809,964,730 MaxDevSpoolSize=375,809,638,400 Writing spooled data to Volume. Despooling 375,809,964,730 bytes ... pjg095srsav002-sd Despooling elapsed time = 00:14:27, Transfer rate = 433.4 M Bytes/second pjg095srsav002-sd Spooling data again ... pjg095srsav002-sd Writing spooled data to Volume. Despooling 375,809,964,457 bytes ... User specified Device spool size reached: DevSpoolSize=375,809,964,457 MaxDevSpoolSize=375,809,638,400 pjg095srsav002-sd Despooling elapsed time = 00:17:35, Transfer rate = 356.2 M Bytes/second pjg095srsav002-sd Spooling data again ... pjg095srsav002-sd User specified Device spool size reached: DevSpoolSize=375,809,963,641 MaxDevSpoolSize=375,809,638,400 Writing spooled data to Volume. Despooling 375,809,963,641 bytes ... pjg095srsav002-sd Despooling elapsed time = 00:21:09, Transfer rate = 296.1 M Bytes/second pjg095srsav002-sd Spooling data again ... pjg095srsav002-sd Committing spooled data to Volume "000001". Despooling 288,229,988,549 bytes ... pjg095srsav002-sd Despooling elapsed time = 00:14:50, Transfer rate = 323.8 M Bytes/second pjg095srsav002-sd Elapsed time=19:13:54, Transfer rate=63.85 M Bytes/second pjg095srsav002-sd Sending spooled attrs to the Director. Despooling 2,025,520,288 bytes ... PJG095SRBacula-dir No Files found to prune. Bacula PJG095SRBacula-dir 7.4.2 (06Jun16): Build OS: x86_64-unknown-linux-gnu debian 7.8 JobId: 5417 Job: B-ALL_partage-EXCEPT_PJGN.2017-03-20_17.41.40_31 Backup Level: Full (upgraded from Incremental) Client: "PJG095SRFIC001-FD" 7.4.2 (06Jun16) amd64-portbld-freebsd10.1,freebsd,10.1-RELEASE-p37 FileSet: "FIC001-ALL_partage-EXCEPT_PJGN-FS" 2017-03-12 19:00:00 Pool: "FULL-POOL-SCALAR" (From Job FullPool override) Catalog: "MyCatalog" (From Client resource) Storage: "PJG095SRSAV002-STORAGE" (From Pool resource) Scheduled time: 20-mars-2017 17:41:20 Start time: 20-mars-2017 17:41:43 End time: 21-mars-2017 13:05:31 Elapsed time: 19 hours 23 mins 48 secs Priority: 8 FD Files Written: 4,738,350 SD Files Written: 4,738,350 FD Bytes Written: 4,419,508,257,191 (4.419 TB) SD Bytes Written: 4,420,646,851,428 (4.420 TB) Rate: 63291.3 KB/s Software Compression: None Snapshot/VSS: no Encryption: no Accurate: no Volume name(s): 000001 Volume Session Id: 1 Volume Session Time: 1490014938 Last Volume Bytes: 4,422,000,339,968 (4.422 TB) Non-fatal FD errors: 0 SD Errors: 0 FD termination status: OK SD termination status: OK Termination: Backup OK Begin pruning Jobs older than 4 months 14 days . No Jobs found to prune. Begin pruning Files. End auto prune.

2017-03-26 6:52 GMT+02:00 Laurent ALONSO <laualo AT gmail DOT com>:
Hi

With a sufficient spool directory size to despool for at least 10 minutes on the tape.

Best regards ! (Be indulgent for my english I am french ;-))

----------------------------------------------------------------------------------------------------

Le 26 mars 2017 05:29, "Caribe Schreiber" <caribe AT auctionharmony DOT com> a écrit :
*sigh*

Do you folks ever have one of those days where it seems like you just can't win?  I'm apparently having one of those days.

Hopefully the mail client doesn't flub the line endings this time.

Slow copy job output:

backup-sd JobId 64 Sending spooled attrs to the Director. Despooling 2,615 bytes ...

 Elapsed time=02:03:24, Transfer rate=7.346 M Bytes/second
 backup-dir JobId 64 No Files found to prune.
 End auto prune.
 Bacula backup-dir 7.4.4 (202Sep16):
  Build OS:               x86_64-pc-linux-gnu gentoo 
  JobId:                  64
  Job:                    BackupImage.2017-03-24_21.45.00_45
  Backup Level:           Full
  Client:                 "primary-fd" 7.4.4 (202Sep16) x86_64-pc-linux-gnu,gentoo,
  FileSet:                "BackupFileSet" 2017-03-24 21:45:00
  Pool:                   "file.full" (From Job resource)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "file" (From Pool resource)
  Scheduled time:         24-Mar-2017 21:45:00
  Start time:             25-Mar-2017 00:12:38
  End time:               25-Mar-2017 02:16:03
  Elapsed time:           2 hours 3 mins 25 secs
  Priority:               13
  FD Files Written:       7
  SD Files Written:       7
  FD Bytes Written:       54,391,511,056 (54.39 GB)
  SD Bytes Written:       54,391,515,157 (54.39 GB)
  Rate:                   7345.2 KB/s
  Software Compression:   57.8% 2.4:1
  Snapshot/VSS:           no
  Encryption:             yes
  Accurate:               no
  Volume name(s):         full-Vol-0024|full-Vol-0029
  Volume Session Id:      49
  Volume Session Time:    1490104159
  Last Volume Bytes:      22,911,377,270 (22.91 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Backup OK
 Begin pruning Jobs older than 6 months .
 No Jobs found to prune.
 Begin pruning Files.
backup-dir Using Device "Drive-1" to write.
backup-sd Elapsed time=00:39:02, Transfer rate=23.22 M Bytes/second
backup-sd Alert: smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.39-gentoo] (local build)
 Alert: Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
 Alert: TapeAlert: OK
 Alert: === START OF READ SMART DATA SECTION ===
 Alert:
 Alert: Error Counter logging not supported
 backup-sd Sending spooled attrs to the Director. Despooling 2,615 bytes ...
 Alert: Last n error events log page
 Alert:
 Alert:


dd read speed of one of the disk backup files in the job:

backup bacula # dd if=full-Vol-0029 of=/dev/null bs=4M
12799+1 records in
12799+1 records out
53687083817 bytes (54 GB, 50 GiB) copied, 111.637 s, 481 MB/s


Bonnie output:

backup ~ # bonnie++ -u root
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
   backup      126G  1211  97 601130  47 240240  17  5506  94 759982  22 698.1  10
Latency              9305us     710ms     646ms   15409us     276ms     519ms
Version  1.97       ------Sequential Create------ --------Random Create--------
   backup           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 16535  15 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               385us     360us     714us     383us     370us     345us
1.97,1.97,backup,1,1490480328,126G,,1211,97,601130,47,240240,17,5506,94,759982,22,698.1,10,16,,,,,16535,15,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,9305us,710ms,646ms,15409us,276ms,519ms,385us,360us,714us,383us,370us,345us



SD config:

#
# A file changer for disk based backups
#
Autochanger {
  Name = FileChgr1
  Device = FileChgr1-Dev1
  Changer Command = ""
  Changer Device = /dev/null
}

Device {
  Name = FileChgr1-Dev1
  Media Type = file
  Archive Device = /home/backups/bacula
  LabelMedia = yes;                   # lets Bacula label unlabeled media
  Random Access = Yes;
  AutomaticMount = yes;               # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
  Maximum Concurrent Jobs = 5
}



#
# An HP 1/8 autochanger device with one drive
#
Autochanger {
  Name = AutochangerHP
  Device = Drive-1
  Changer Command = "/usr/libexec/bacula/mtx-changer %c %o %S %a %d"
  Changer Device = /dev/sg3
}

Device {
  Name = Drive-1                      #
  Media Type = LTO-6
  Archive Device = /dev/nst0
  AutomaticMount = yes;               # when device opened, read it
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  AutoChanger = yes
  Maximum File Size = 10GB
  Alert Command = "sh -c 'smartctl -H -l error %c'"
}

I'm a little stumped about why the raw hardware read/write speeds are so different from the bacula copy job speeds.

I'm hoping the collective wisdom of the list can give me some ideas on other places to look for speed issues or config changes so that I don't prematurely wear out this drive head with shoe-shining (of course...faster backups are always welcome too!).

Thanks in advance!

~Caribe

--
Caribe Schreiber

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Bacula-users mailing list
[email protected].net
https://lists.sourceforge.net/lists/listinfo/bacula-users


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users