Bacula-users

Re: [Bacula-users] Cryptic SD status: zero despooling speed

2014-09-14 03:27:35
Subject: Re: [Bacula-users] Cryptic SD status: zero despooling speed
From: Ivan Adzhubey <iadzhubey AT rics.bwh.harvard DOT edu>
To: bacula-users AT lists.sourceforge DOT net
Date: Sun, 14 Sep 2014 03:25:17 -0400
Hi Kern,

Thank you for taking your time and trying to help me. Please scroll down for 
my comments.

On Sunday, September 14, 2014 08:26:23 AM Kern Sibbald wrote:
> Hello,
> 
> The rates that are shown in the status report are meant to be the
> incoming data rates, and as you say do not take into account
> spooling/despooling.

Hmm, I never said they do not take into account despooling.

> I will be very surprised if there is any decrease
> in throughput from version 2.4.x to 7.0.x.

Me too, but at this point all I was trying to do is to figure out how to track 
the transfer rates with 7.0.5 (which was easier with 2.4.4) since that's what 
I need to be able to do to debug the problem.

Since I am not sure I interpreted version 7.0.5 status reports correctly I 
can't tell what's going on at this point. One thing I am sure is that the 
backup jobs which used to take about 8 hours total with 2.4.4 took more than 
48 hours after the upgrade. But I am still not sure why this happened.

> Concerning reporting
> despooling rates: as far as I can see from your output, version 7.0.x is
> indeed reporting the despooling rate, which is larger than for 2.4.x.

That's correct for this particular job. I have specifically chosen the most 
simple single job for comparisons. This was done only to compare the format 
and contents of the status reports between 2.4.4 and 7.0.5.

> The despooling rate is only reported in the job log when the spool is
> committed typically at the end of the job, but if it despools multiple
> times, it can be reported multiple times.

Yep, I am aware of this fact. Here's another example where a large full backup 
job exhausted disk spooling limit and required two iterations of 
spooling/despooling (and a tape change). The despooling rates were reported 
twice in the logs, for each despooling operation separately. No spooling rates 
were ever reported for the spooling parts of the process:

12-Sep 01:24 rosalind-dir JobId 52615: Using Device "Drive-0" to write.
12-Sep 01:24 rosalind-sd JobId 52615: Volume "LI0004" previously written, 
moving to end of data.
12-Sep 01:25 rosalind-sd JobId 52615: Ready to append to end of Volume 
"LI0004" at file=106.
12-Sep 01:25 rosalind-sd JobId 52615: Spooling data ...
13-Sep 03:27 rosalind-sd JobId 52615: User specified Device spool size reached: 
DevSpoolSize=884,763,273,281 MaxDevSpoolSize=884,763,262,976

## Note: Spooling hits the configured limit and job switches to despooling but 
no spooling rates are reported

13-Sep 03:27 rosalind-sd JobId 52615: Writing spooled data to Volume. 
Despooling 884,763,273,281 bytes ...
13-Sep 09:17 rosalind-sd JobId 52615: End of Volume "LI0004" at 827:8233 on 
device "Drive-0" (/dev/nst0). Write of 64512 bytes got -1.
13-Sep 09:18 rosalind-sd JobId 52615: Re-read of last block succeeded.
13-Sep 09:18 rosalind-sd JobId 52615: End of medium on Volume "LI0004" 
Bytes=781,013,366,784 Blocks=12,106,481 at 13-Sep-2014 09:18.
13-Sep 09:18 rosalind-sd JobId 52615: 3307 Issuing autochanger "unload slot 4, 
drive 0" command.
13-Sep 09:20 rosalind-dir JobId 52615: There are no more Jobs associated with 
Volume "LI0007". Marking it purged.
13-Sep 09:20 rosalind-dir JobId 52615: All records pruned from Volume 
"LI0007"; marking it "Purged"
13-Sep 09:20 rosalind-dir JobId 52615: Recycled volume "LI0007"
13-Sep 09:20 rosalind-sd JobId 52615: 3304 Issuing autochanger "load slot 7, 
drive 0" command.
13-Sep 09:20 rosalind-sd JobId 52615: 3305 Autochanger "load slot 7, drive 0", 
status is OK.
13-Sep 09:21 rosalind-sd JobId 52615: Recycled volume "LI0007" on tape device 
"Drive-0" (/dev/nst0), all previous data lost.
13-Sep 09:21 rosalind-sd JobId 52615: New volume "LI0007" mounted on device 
"Drive-0" (/dev/nst0) at 13-Sep-2014 09:21.
13-Sep 11:04 rosalind-sd JobId 52615: Despooling elapsed time = 07:32:43, 
Transfer rate = 32.57 M Bytes/second

## First despooling iteration completed in 07:32 hours, at 32.57 M 
Bytes/second (despooling rate reported)

13-Sep 11:27 rosalind-sd JobId 52615: Spooling data again ...

## Note: No spooling rate reported
## Compare this to the following excerpt from v2.4.4 logs for
## the same stage of the same job:
##
## 01-Jan 01:05 rosalind-dir JobId 50189: Using Device "Drive-0"
## 01-Jan 01:05 rosalind-sd JobId 50189: Spooling data ...
## 01-Jan 01:17 rosalind-sd JobId 50189: Job write elapsed time = 00:12:02,
## Transfer rate = 1.869 K bytes/second
##
## As you can clearly see, previously, the spooling rate was reported

13-Sep 18:28 rosalind-sd JobId 52615: Committing spooled data to Volume 
"LI0007". Despooling 267,583,703,639 bytes ...
13-Sep 20:34 rosalind-sd JobId 52615: Despooling elapsed time = 02:05:41, 
Transfer rate = 35.48 M Bytes/second

## Second despooling iteration completed in 02:05 hours, at 35.48 M 
Bytes/second (despooling rate reported)

13-Sep 20:41 rosalind-sd JobId 52615: Elapsed time=43:12:48, Transfer 
rate=7.400 M Bytes/second

## Note: I assume this is the overall transfer rate calculated for the whole 
job duration

13-Sep 20:41 rosalind-sd JobId 52615: Sending spooled attrs to the Director. 
Despooling 723,209,221 bytes ...

However, the final job report includes yet another and different rate figure 
for 
this job:

Scheduled time:         12-Sep-2014 01:23:54
  Start time:             12-Sep-2014 01:24:18
  End time:               13-Sep-2014 21:03:41
  Elapsed time:           1 day 19 hours 39 mins 23 secs
  Priority:               9
  FD Files Written:       2,015,504
  SD Files Written:       2,015,504
  FD Bytes Written:       1,150,834,873,985 (1.150 TB)
  SD Bytes Written:       1,151,216,864,476 (1.151 TB)
  Rate:                   7322.6 KB/s

Well, the two figures: 7.400 M Bytes/second and 7322.6 KB/s (which equals 7.151 
M Bytes/sec) are close enough so this difference can be ignored.

So it looks like in my case the culprit is not the tape writes (despooling) 
but the disk writes (spooling). That's why i was trying to find the way to 
actually track disk write speed dynamically. It was possible with 2.4.4 but 
looks like no longer available in 7.0.5.

> If I remember right the status command will give you despooling rates if
> the command is done during despooling, but that rate is shown with the
> drive information not the job information.

Drive information does not provide rates, only bytes/files/blocks counts. So I 
would need to use a stopwatch and do the calculations myself to rate the 
transfers this way.

> Also it appears that the precise wording of the output has changed to
> make it more specific (accurate).  If you were previously running 5.2.x

No, I jumped directly from 2.4.4 to 7.0.5.

> I would check to see if there is any real change in the code, which I
> doubt because that part of the SD hasn't changed at least from 5.2.x --
> however, 2.4.x is too old any comparison of the code would probably not
> be useful.
> 
> Bottom line: I don't see a problem other than the labels for the
> statistics are slightly different.

The spooling to disk rate was definitely omitted from the 7.0.5 status output 
as well as from the logs.

--Ivan




------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users