Subject: Re: [Bacula-users] how to debug a job
From: Luc Van der Veken <lucvdv AT wimionline DOT com>
To: "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Thu, 22 Jan 2015 09:05:12 +0000
Are you sure Bacula is at fault?
I can think of circumstances where the way the source data are organized is to 
blame.

1) Average file size: 1 GB stored as a million 1 KB files will be much slower
to read than a single 1 GB file.
2) Too many files in one directory can make access very slow.

The effect is multiplied if these two go hand in hand... (see the sketch below).
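To see why the first point hurts: each small file costs a fixed amount of
per-file work (open, a metadata lookup, close) on top of the actual read, and
a million files pay that cost a million times, while one big file pays it
once. A minimal sketch of the comparison (Python; the two paths are made up):

    #!/usr/bin/env python
    # Rough timing sketch: per-file overhead vs. one large streamed file.
    # Both paths below are hypothetical; point them at real data to test.
    import os
    import time

    def read_many(root):
        """Read every file under root one at a time (e.g. a million 1 KB files)."""
        total = 0
        start = time.time()
        for dirpath, _, names in os.walk(root):
            for name in names:
                with open(os.path.join(dirpath, name), 'rb') as f:
                    total += len(f.read())   # one open/read/close per file
        return total, time.time() - start

    def read_one(path):
        """Stream a single large file in 1 MB chunks."""
        total = 0
        start = time.time()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b''):
                total += len(chunk)
        return total, time.time() - start

    if __name__ == '__main__':
        print(read_many('/data/million-small-files'))   # hypothetical tree
        print(read_one('/data/one-big-file.bin'))       # hypothetical file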


Look at the way some applications spread their data over many subdirectories to 
get around the second problem (Squid proxy comes to mind, with its cached data 
distributed over 4096 directories organized in 2 levels: 16 at the first level, 
each containing 256 subdirs at the second level, each of those containing up to 
256 files).
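
A minimal sketch of that kind of spreading (Python; MD5 and the exact counts
are my choice for illustration, the point is just deriving two directory
levels from a hash of the object's key):

    import hashlib
    import os

    L1_DIRS = 16     # first level:  00 .. 0F
    L2_DIRS = 256    # second level: 00 .. FF

    def spread_path(root, key):
        """Map a key to one of 16 * 256 = 4096 subdirectories."""
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        lvl1 = h % L1_DIRS
        lvl2 = (h // L1_DIRS) % L2_DIRS
        return os.path.join(root, '%02X' % lvl1, '%02X' % lvl2, key)

    # spread_path('/var/cache/app', 'object-12345')
    #   -> something like /var/cache/app/07/C3/object-12345

No single directory ever holds more than a few hundred entries, so lookups
stay fast no matter how many objects there are in total.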

Another example: some time ago, about 200 other people and I each had to
upload half a dozen files a day to a government FTP server.
After several months during which it was never cleaned up, just getting a 
directory listing of the 'incoming' directory took more than 15 minutes. A 
*large* part of those 15 minutes was not transmission time, but a delay before 
anything started coming in.
Problem: all the usual FTP clients for Windows automatically do an 'ls' after
every 'cd'. More and more often, the server would time out the control channel
while the client was still waiting for a response on the data channel...


-----Original Message-----
From: Dimitri Maziuk [mailto:dmaziuk AT bmrb.wisc DOT edu] 
Sent: 21 January 2015 23:13
To: bacula-users AT lists.sourceforge DOT net
Subject: [Bacula-users] how to debug a job

(Take 2)

I've a client with ~316GB to back up. The current backup has been
running for 5 days and has written only 33GB to the spool file. Previous runs
failed with

> User specified Job spool size reached: JobSpoolSize=49,807,365,050 
> MaxJobSpoolSize=49,807,360,000
> Writing spooled data to Volume. Despooling 49,807,365,050 bytes ...
> Error: Watchdog sending kill after 518401 secs to thread stalled reading File 
> daemon.

Why is it taking 5 days to write 33GB?

Load avg on the client is 0.9%. Iperf clocks the connection at 110 MB/s.
Iostat shows zero wait and 0.25 MB/s read on the client's disk. Every few
seconds bacula-fd shows up in iotop with a read speed around 200-300 KB/s.
This is a healthy standard SATA drive capable of 100 MB/s, with an ext4
filesystem.

It's a Linux (CentOS 6) x64 client running Bacula 5.0, with a 5.2.13 server
from the Slaanesh repo.

How do I find out what's taking so long? What debug level should I give to
bacula-fd? Where do the debug messages go? Does anyone know?
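
For what it's worth, a minimal sketch of the usual approach, assuming a stock
Bacula 5.x install ('myclient-fd' and level 100 are placeholders):

    # From bconsole: raise the file daemon's debug level on the fly.
    # With trace=1, output goes to a .trace file in the daemon's working
    # directory instead of the console.
    * setdebug level=100 client=myclient-fd trace=1

    # Or stop the daemon and run it in the foreground; debug messages
    # then go to stdout.
    bacula-fd -f -d 100 -c /etc/bacula/bacula-fd.conf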

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

