Bacula-users

Re: [Bacula-users] bacula-5.2.6 slow full backup of localhost

2014-04-09 15:01:35
Subject: Re: [Bacula-users] bacula-5.2.6 slow full backup of localhost
From: Anthony Jenkins <ajenkins AT vtilt DOT com>
To: Martin Simmons <martin AT lispworks DOT com>, bacula-users AT lists.sourceforge DOT net
Date: Wed, 09 Apr 2014 14:56:12 -0400
Thanks for the great suggestions.  I /think/ I have a better handle on what's going on after running the 3 Bacula processes in the foreground with a high enough debug level and piping each's stdout to a logfile.  I noticed when the bacula-dir logfile showed it was backing up /var/log/bacula/bacula-sd.log (I might have the exact name wrong), the backup (from the Director's perspective) never went to a new file, but the fd component was busily writing ~64k chunks of data according to its logfile.  I've since added /var/log (and some other /var subdirectories) to the Exclude list, and with the help of the bacula-dir.dir log I should be able to figure out what file it's hung on.

This sounds to me like the 'Accurate = yes' job option in the Director config file...would that cause the hanging behavior I described above?  I'll try the strace -f option when it happens again.

This doesn't really address what I perceive to be a "slow" backup (rate of bytes backed up much less than max rate of data transfer for the underlying storage subsystem), but if I can get a full backup of the local system in a "reasonable" time, I can probably finally bless Bacula as our new backup system and work on performance later.

Thanks,
Anthony


On 04/08/2014 12:58 PM, Martin Simmons wrote:
Try using strace -f to examine all of the threads.

__Martin


On Tue, 08 Apr 2014 10:41:19 -0400, Anthony Jenkins said:
Well using dd(1) I get 103MB/s - this is while the Bacula backup of the
machine is ongoing:

root@venus:/home/ajenkins# dd if=/dev/zero of=/media/usb2/1GB_ZEROS
bs="$((1024*1024*1024))" count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 10.4721 s, 103 MB/s

I don't think the bottleneck is the drive...

Anthony


On 04/08/2014 10:35 AM, Hans van Leeuwen wrote:
Hi Anthony,

What is the speed if you copy a big file with rsync --progress to the
same external USB drive?

Hans.

-----Original Message-----
*From*: Anthony Jenkins <ajenkins AT vtilt DOT com
<mailto:Anthony%20Jenkins%20%3cajenkins AT vtilt DOT com%3e>>
*To*: bacula-users AT lists.sourceforge DOT net
<mailto:bacula-users AT lists.sourceforge DOT net>
*Subject*: [Bacula-users] bacula-5.2.6 slow full backup of localhost
*Date*: Tue, 08 Apr 2014 09:33:38 -0400

I've skimmed the archive for hints, but most complain about remote
backups being slow...hopefully I just did something stupid in the config.

I'm getting 1,149,012 Bytes/sec on a full backup of several () GB of
data on the local Linux machine (dir, sd, fd on one host).  This first
attempt at a backup has been running for about 30 hours and has written
~132 GB to a USB 3.0 external drive formatted with EXT4.

*status storage
Automatically selected Storage: File
Connecting to Storage daemon File at 127.0.0.1:9103

venus.vtilt.com-sd Version: 5.2.6 (21 February 2012)
x86_64-pc-linux-gnu ubuntu 13.10
Daemon started 07-Apr-14 00:44. Jobs: run=0, running=0.
 Heap: heap=270,336 smbytes=172,237 max_bytes=237,811 bufs=117
max_bufs=123
 Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8 mode=0,0

Running Jobs:
Writing: Full Backup job BackupVenus-Full JobId=13 Volume="20140407"
    pool="FullPool" device="FileStorage2" (/media/usb2)
    spooling=0 despooling=0 despool_wait=0
    Files=2,930,749 Bytes=132,395,015,432 Bytes/sec=1,149,012
    FDReadSeqNo=25,659,941 in_msg=17673253 out_msg=5 fd=5
====
CPU load on the box is negligible (8-core Intel Core i7-4770K @ 350GHz,
16GB RAM).

root@venus:/home/ajenkins# cat /proc/meminfo
MemTotal:       16309728 kB
MemFree:         1115508 kB
Buffers:         2702536 kB
Cached:          4711008 kB
SwapCached:         9768 kB
Active:          3552560 kB
Inactive:        4157596 kB
...
root@venus:/home/ajenkins# vmstat
procs -----------memory---------- ---swap-- -----io---- -system--
----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us
sy id wa
 1  2 202440 1137260 2704460 4704228    0    0    49    88    0    1 
1  1 97  1
root@venus:/home/ajenkins# mpstat
Linux 3.11.0-18-generic (venus.vtilt.com)     04/08/2014    
_x86_64_    (8 CPU)

08:56:22 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft 
%steal  %guest  %gnice   %idle
08:56:22 AM  all    1.25    0.00    0.82    1.21    0.00    0.00   
0.00    0.00    0.00   96.71
root@venus:/home/ajenkins# ps auxww | grep '\(bacula\|mysql\|PID\)' |
grep -v grep
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
mysql      346  0.1  0.9 681340 150556 ?       Ssl  Mar31  12:42
/usr/sbin/mysqld
bacula   20022  0.3  0.0 211132  1636 ?        Ssl  Apr07   6:48
/usr/sbin/bacula-sd -c /etc/bacula/bacula-sd.conf -u bacula -g disk
root     20045  0.3  0.0 287516 15340 ?        Ssl  Apr07   6:31
/usr/sbin/bacula-fd -c /etc/bacula/bacula-fd.conf
bacula   20066  0.1  0.0 378976  2520 ?        Ssl  Apr07   2:42
/usr/sbin/bacula-dir -c /etc/bacula/bacula-dir.conf -u bacula -g bacula
The 'Ssl' says the bacula and mysql processes are asleep (at the time of
the ps(1) snapshot).  I did an strace(1) of the 3 bacula processes; the
sd and fd processes were blocked in a select(2) call for a good while
waiting on descriptor 4, which are sockets connected to other bacula
processes it seems:

root@venus:/home/ajenkins# strace -p20022 -p20045 -p20066 -t 2>&1 |
tee bacula.strace
Process 20022 attached
Process 20045 attached
Process 20066 attached
[pid 20022] 09:15:06 select(4, [3], NULL, NULL, NULL <unfinished ...>
[pid 20066] 09:15:06 restart_syscall(<... resuming interrupted call
...> <unfinished ...>
[pid 20045] 09:15:06 select(4, [3], NULL, NULL, NULL <unfinished ...>
[pid 20066] 09:16:00 <... restart_syscall resumed> ) = 0
[pid 20066] 09:16:00 stat("/etc/localtime", {st_mode=S_IFREG|0644,
st_size=3519, ...}) = 0
[pid 20066] 09:16:00 stat("/etc/localtime", {st_mode=S_IFREG|0644,
st_size=3519, ...}) = 0
...
ajenkins@venus:~$ sudo ls -l /proc/20022/fd | grep '\<4\>'
lrwx------ 1 root root 64 Apr  8 07:39 4 -> socket:[9896449]
ajenkins@venus:~$ sudo ls -l /proc/20045/fd | grep '\<4\>'
lrwx------ 1 root root 64 Apr  8 07:39 4 -> socket:[9896450]
ajenkins@venus:~$ sudo lsof -p 20022 | grep 98964
bacula-sd 20022 bacula    3u  IPv4 9896420          0t0     TCP
localhost:bacula-sd (LISTEN)
bacula-sd 20022 bacula    4u  IPv4 9896449          0t0     TCP
localhost:bacula-sd->localhost:45075 (ESTABLISHED)
bacula-sd 20022 bacula    5u  IPv4 9896451          0t0     TCP
localhost:bacula-sd->localhost:45077 (ESTABLISHED)
ajenkins@venus:~$ sudo lsof -p 20045 | grep 98964
bacula-fd 20045 root    3u  IPv4 9896430      0t0      TCP
localhost:bacula-fd (LISTEN)
bacula-fd 20045 root    4u  IPv4 9896450      0t0      TCP
localhost:bacula-fd->localhost:33953 (ESTABLISHED)
ajenkins@venus:~$ sudo lsof | grep '4507[57]'
bacula-sd 20022           bacula    4u     IPv4           
9896449          0t0        TCP localhost:bacula-sd->localhost:45075
(ESTABLISHED)
bacula-sd 20022           bacula    5u     IPv4           
9896451          0t0        TCP localhost:bacula-sd->localhost:45077
(ESTABLISHED)
bacula-sd 20022 20027     bacula    4u     IPv4           
9896449          0t0        TCP localhost:bacula-sd->localhost:45075
(ESTABLISHED)
bacula-sd 20022 20027     bacula    5u     IPv4           
9896451          0t0        TCP localhost:bacula-sd->localhost:45077
(ESTABLISHED)
bacula-sd 20022 20078     bacula    4u     IPv4           
9896449          0t0        TCP localhost:bacula-sd->localhost:45075
(ESTABLISHED)
bacula-sd 20022 20078     bacula    5u     IPv4           
9896451          0t0        TCP localhost:bacula-sd->localhost:45077
(ESTABLISHED)
bacula-fd 20045             root    5u     IPv4           
9899062          0t0        TCP localhost:45077->localhost:bacula-sd
(ESTABLISHED)
bacula-fd 20045 20047       root    5u     IPv4           
9899062          0t0        TCP localhost:45077->localhost:bacula-sd
(ESTABLISHED)
bacula-fd 20045 20080       root    5u     IPv4           
9899062          0t0        TCP localhost:45077->localhost:bacula-sd
(ESTABLISHED)
bacula-fd 20045 20082       root    5u     IPv4           
9899062          0t0        TCP localhost:45077->localhost:bacula-sd
(ESTABLISHED)
bacula-di 20066           bacula    9u     IPv4           
9895396          0t0        TCP localhost:45075->localhost:bacula-sd
(ESTABLISHED)
bacula-di 20066 20072     bacula    9u     IPv4           
9895396          0t0        TCP localhost:45075->localhost:bacula-sd
(ESTABLISHED)
bacula-di 20066 20073     bacula    9u     IPv4           
9895396          0t0        TCP localhost:45075->localhost:bacula-sd
(ESTABLISHED)
bacula-di 20066 20077     bacula    9u     IPv4           
9895396          0t0        TCP localhost:45075->localhost:bacula-sd
(ESTABLISHED)
bacula-di 20066 20079     bacula    9u     IPv4           
9895396          0t0        TCP localhost:45075->localhost:bacula-sd
(ESTABLISHED)
How do I increase the logging while the bacula daemons are running? 
bconsole 'setdebug' command?  Any advice on next debugging steps?

Thanks in advance,


-- 
Anthony Jenkins, Software Engineer
VTilt Digital, LLC.

------------------------------------------------------------------------------
Put Bad Developers to Shame
Dominate Development with Jenkins Continuous Integration
Continuously Automate Build, Test & Deployment 
Start a new project now. Try Jenkins in the cloud.
http://p.sf.net/sfu/13600_Cloudbees
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

------------------------------------------------------------------------------
Put Bad Developers to Shame
Dominate Development with Jenkins Continuous Integration
Continuously Automate Build, Test & Deployment 
Start a new project now. Try Jenkins in the cloud.
http://p.sf.net/sfu/13600_Cloudbees
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users


--
Anthony Jenkins, Software Engineer
VTilt Digital, LLC.
------------------------------------------------------------------------------
Put Bad Developers to Shame
Dominate Development with Jenkins Continuous Integration
Continuously Automate Build, Test & Deployment 
Start a new project now. Try Jenkins in the cloud.
http://p.sf.net/sfu/13600_Cloudbees
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
<Prev in Thread] Current Thread [Next in Thread>