Bacula-users

[Bacula-users] [semi-solved] LTO-4 tape: only 20mb/sec when used with bacula

2010-06-21 05:59:47
From: Lukas Kolbe <l-lists AT einfachkaffee DOT de>
To: richard <richard AT sauce.co DOT nz>
Date: Mon, 21 Jun 2010 11:56:31 +0200
On Monday, 21.06.2010 at 01:20 +0200, Lukas Kolbe wrote:

> Thanks for the hints. Changing the block sizes to the ones you mentioned
> doesn't help, unfortunately.
> 
> Maybe it is a completely different problem: The backups are made to the
> diskpool in parallel, from a few different clients. The SD writes them
> to disk at about 200MiB/second in aggregate and at 5 to 50 MiB/second
> per job. That means that a single job is fragmented across many volumes.
> 
> I've just activated data spooling for the copy job, and even the
> spooling process doesn't get faster than 20MiB/second while still
> consuming one full cpu-core. 
> 
> The volume size is 32GiB if that matters (I don't want to have too many
> files in the diskpool).
> 
> I have no idea why bacula-sd consumes so much cpu-time and obviously
> limits the throughput here.

Okay, I have to correct the 20MiB/sec figure for the tape drive. After I
activated Data Spooling for the copy job, I installed sysstat and saw
this:

01:10:01 AM       tps      rtps      wtps   bread/s   bwrtn/s
01:20:01 AM    716.75    253.16    463.59  64086.25   7226.24
01:30:02 AM    494.51    166.13    328.38  42533.10   5090.18
01:40:01 AM    308.48    142.18    166.30  36369.89   2565.75
01:50:01 AM    301.74    127.34    174.39  32536.13   2694.53
02:00:01 AM    306.21    125.27    180.94  31923.61   2791.23
02:10:01 AM    403.41    147.37    256.04  37593.61   3979.05

The SD reads the volumes from disk at 20-30MiB/sec yet writes to the spool
at only 1.5 to 4MiB/sec. Overnight it got a whopping 33GiB into the
spool.
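
For reference, sar -b reports bread/s and bwrtn/s in blocks per second,
and sysstat counts one block as a 512-byte sector. A minimal sketch of the
conversion to MiB/sec, with the rows copied from the table above (the
helper name blocks_to_mib is made up for illustration):

```python
# sar -b reports bread/s and bwrtn/s in 512-byte blocks per second.
BLOCK_BYTES = 512
MIB = 1024 * 1024

# (timestamp, bread/s, bwrtn/s) copied from the sar output above
SAMPLES = [
    ("01:20:01", 64086.25, 7226.24),
    ("01:30:02", 42533.10, 5090.18),
    ("01:40:01", 36369.89, 2565.75),
    ("01:50:01", 32536.13, 2694.53),
    ("02:00:01", 31923.61, 2791.23),
    ("02:10:01", 37593.61, 3979.05),
]


def blocks_to_mib(blocks_per_sec):
    """Convert 512-byte blocks per second to MiB per second."""
    return blocks_per_sec * BLOCK_BYTES / MIB


for stamp, bread, bwrtn in SAMPLES:
    print(f"{stamp}  read {blocks_to_mib(bread):6.2f} MiB/s"
          f"  write {blocks_to_mib(bwrtn):5.2f} MiB/s")
```

With these rows the conversion gives roughly 16 to 31 MiB/sec read and
1.3 to 3.5 MiB/sec written. Note that sar -b counts I/O for the whole
system, not only for bacula-sd.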

For comparison, I dd'ed a volume to /dev/null while the copy job was
running:
[root@shepherd ~]# dd if=/var/bacula/dp/fs1/Vol0070 of=/dev/null bs=1M
9175040000 bytes (9.2 GB) copied, 12.0225 seconds, 763 MB/s

But dd'ing it to another file reveals what I believe is a problem with
the storage subsystem:

[root@shepherd ~]# dd if=/var/bacula/dp/fs1/Vol0070 of=/var/bacula/dp/fs2/xxx bs=1M
849346560 bytes (849 MB) copied, 32.665 seconds, 26.0 MB/s

- and dd is also consuming one full core while doing this.
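
The same read-only versus read-and-write comparison can be scripted so
that wall-clock time and CPU time are captured together. This is only a
rough sketch, not a replacement for dd: the function names and the small
temporary test file are made up for illustration, and unlike plain dd it
calls fsync before stopping the clock so that the write figure includes
flushing to disk.

```python
import os
import tempfile
import time


def measure(src, dst=None, block_size=1024 * 1024):
    """Read src in block_size chunks, optionally writing them to dst.

    Returns (bytes_moved, wall_seconds, cpu_seconds).
    """
    total = 0
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU of this process
    with open(src, "rb") as fin:
        fout = open(dst, "wb") if dst is not None else None
        try:
            while True:
                chunk = fin.read(block_size)
                if not chunk:
                    break
                if fout is not None:
                    fout.write(chunk)
                total += len(chunk)
            if fout is not None:
                # Assumption: include the flush to disk in the timing.
                fout.flush()
                os.fsync(fout.fileno())
        finally:
            if fout is not None:
                fout.close()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return total, wall, cpu


def report(label, total, wall, cpu):
    """Print throughput in MB/s (10**6 bytes, as dd does) and CPU share."""
    rate = total / 1e6 / wall if wall > 0 else float("inf")
    share = 100.0 * cpu / wall if wall > 0 else 0.0
    print(f"{label}: {total} bytes, {wall:.3f} s, "
          f"{rate:.1f} MB/s, CPU {share:.0f}% of one core")


if __name__ == "__main__":
    # Small self-contained demo on a temporary file; point src and dst at
    # real volumes on the two filesystems to repeat the test above.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(8 * 1024 * 1024))
        report("read only", *measure(src))
        report("read+write", *measure(src, dst))
```

A CPU share near 100% together with a low MB/s figure would match what
dd shows here. Keep in mind that a file that was read recently may be
served from the page cache, which inflates the read-only number.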

Both filesystems are LVM volumes with ext4 on the same RAID array on an
Adaptec 52445. This just shouldn't happen. I'll report back when I find
a solution to this problem. Thanks for your patience, and sorry for
accusing bacula-sd of being at fault here!

-- 
Lukas


