Subject: Re: [Bacula-users] Cryptic SD status: zero despooling speed
From: Kern Sibbald <kern AT sibbald DOT com>
To: Ivan Adzhubey <iadzhubey AT rics.bwh.harvard DOT edu>, bacula-users AT lists.sourceforge DOT net
Date: Sun, 14 Sep 2014 10:33:49 +0200
Hello,

My comments are below ...

On 09/14/2014 09:25 AM, Ivan Adzhubey wrote:
> Hi Kern,
>
> Thank you for taking the time to try and help me. Please scroll down for
> my comments.
>
> On Sunday, September 14, 2014 08:26:23 AM Kern Sibbald wrote:
>> Hello,
>>
>> The rates that are shown in the status report are meant to be the
>> incoming data rates, and as you say do not take into account
>> spooling/despooling.
> Hmm, I never said they do not take into account despooling.
Agreed; I wasn't contesting anything you said, only trying to clarify.
>
>> I will be very surprised if there is any decrease
>> in throughput from version 2.4.x to 7.0.x.
> Me too, but at this point all I was trying to do is figure out how to track
> the transfer rates with 7.0.5 (which was easier with 2.4.4), since that's
> what I need to be able to do to debug the problem.
One of the problems I have is that you have clipped the output just
before the place where I would expect the information to be printed. 
With numerous clipped outputs, I cannot really follow.  If you want a
better opinion, I suggest that you send me the *full* job output report
from 2.4.4 and the equivalent from 7.0.5, then perhaps I can see what is
going on.  In a number of your listings, I have problems knowing whether
it is 2.4.4 output or 7.0.5 output, so if you include the *full* output,
it should be clearer to me.
>
> Since I am not sure I interpreted version 7.0.5 status reports correctly, I
> can't tell what's going on at this point. One thing I am sure of is that the
> backup jobs which used to take about 8 hours total with 2.4.4 took more than
> 48 hours after the upgrade. But I am still not sure why this happened.
The most likely reason is some bottleneck in the catalog, especially if
you are using MySQL or have upgraded your MySQL/Postgres version.  The
default configuration files are horribly tuned for Bacula.  You can
verify this by looking at the amount of time it takes to despool.
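To illustrate (a sketch only -- the parameter names are standard
MySQL/InnoDB settings, but the values are placeholders that must be sized
to your server's RAM and workload), these are the settings most often
worth checking for a Bacula catalog on MySQL:

```
# my.cnf sketch -- hypothetical values, tune for your own hardware
[mysqld]
innodb_buffer_pool_size        = 2G    # the default is far too small for a large catalog
innodb_log_file_size           = 256M  # a larger redo log smooths attribute inserts
innodb_flush_log_at_trx_commit = 2     # trades some durability for insert speed
max_allowed_packet             = 64M   # batched attribute inserts can be large
```

A long "Sending spooled attrs to the Director. Despooling ... bytes" phase
at the end of the job log is usually the symptom of exactly this kind of
catalog bottleneck.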
>
>> Concerning reporting
>> despooling rates: as far as I can see from your output, version 7.0.x is
>> indeed reporting the despooling rate, which is larger than for 2.4.x.
> That's correct for this particular job. I specifically chose the simplest
> single job for comparisons. This was done only to compare the format and
> contents of the status reports between 2.4.4 and 7.0.5.
>
>> The despooling rate is only reported in the job log when the spool is
>> committed typically at the end of the job, but if it despools multiple
>> times, it can be reported multiple times.
> Yep, I am aware of this fact. Here's another example where a large full
> backup job exhausted the disk spooling limit and required two iterations of
> spooling/despooling (and a tape change). The despooling rates were reported
> twice in the logs, once for each despooling operation. No spooling rates
> were ever reported for the spooling parts of the process:
>
> 12-Sep 01:24 rosalind-dir JobId 52615: Using Device "Drive-0" to write.
> 12-Sep 01:24 rosalind-sd JobId 52615: Volume "LI0004" previously written, 
> moving to end of data.
> 12-Sep 01:25 rosalind-sd JobId 52615: Ready to append to end of Volume 
> "LI0004" at file=106.
> 12-Sep 01:25 rosalind-sd JobId 52615: Spooling data ...
> 13-Sep 03:27 rosalind-sd JobId 52615: User specified Device spool size
> reached: DevSpoolSize=884,763,273,281 MaxDevSpoolSize=884,763,262,976
>
> ## Note: Spooling hits the configured limit and the job switches to
> ## despooling, but no spooling rates are reported
>
> 13-Sep 03:27 rosalind-sd JobId 52615: Writing spooled data to Volume. 
> Despooling 884,763,273,281 bytes ...
> 13-Sep 09:17 rosalind-sd JobId 52615: End of Volume "LI0004" at 827:8233 on 
> device "Drive-0" (/dev/nst0). Write of 64512 bytes got -1.
> 13-Sep 09:18 rosalind-sd JobId 52615: Re-read of last block succeeded.
> 13-Sep 09:18 rosalind-sd JobId 52615: End of medium on Volume "LI0004" 
> Bytes=781,013,366,784 Blocks=12,106,481 at 13-Sep-2014 09:18.
> 13-Sep 09:18 rosalind-sd JobId 52615: 3307 Issuing autochanger "unload slot
> 4, drive 0" command.
> 13-Sep 09:20 rosalind-dir JobId 52615: There are no more Jobs associated with 
> Volume "LI0007". Marking it purged.
> 13-Sep 09:20 rosalind-dir JobId 52615: All records pruned from Volume 
> "LI0007"; marking it "Purged"
> 13-Sep 09:20 rosalind-dir JobId 52615: Recycled volume "LI0007"
> 13-Sep 09:20 rosalind-sd JobId 52615: 3304 Issuing autochanger "load slot 7, 
> drive 0" command.
> 13-Sep 09:20 rosalind-sd JobId 52615: 3305 Autochanger "load slot 7,
> drive 0", status is OK.
> 13-Sep 09:21 rosalind-sd JobId 52615: Recycled volume "LI0007" on tape device 
> "Drive-0" (/dev/nst0), all previous data lost.
> 13-Sep 09:21 rosalind-sd JobId 52615: New volume "LI0007" mounted on device 
> "Drive-0" (/dev/nst0) at 13-Sep-2014 09:21.
> 13-Sep 11:04 rosalind-sd JobId 52615: Despooling elapsed time = 07:32:43, 
> Transfer rate = 32.57 M Bytes/second
>
> ## First despooling iteration completed in 07:32 hours, at 32.57 M 
> Bytes/second (despooling rate reported)
>
> 13-Sep 11:27 rosalind-sd JobId 52615: Spooling data again ...
>
> ## Note: No spooling rate reported
> ## Compare this to the following excerpt from v2.4.4 logs for
> ## the same stage of the same job:
> ##
> ## 01-Jan 01:05 rosalind-dir JobId 50189: Using Device "Drive-0"
> ## 01-Jan 01:05 rosalind-sd JobId 50189: Spooling data ...
> ## 01-Jan 01:17 rosalind-sd JobId 50189: Job write elapsed time = 00:12:02,
> ## Transfer rate = 1.869 K bytes/second
> ##
> ## As you can clearly see, previously, the spooling rate was reported
>
> 13-Sep 18:28 rosalind-sd JobId 52615: Committing spooled data to Volume 
> "LI0007". Despooling 267,583,703,639 bytes ...
> 13-Sep 20:34 rosalind-sd JobId 52615: Despooling elapsed time = 02:05:41, 
> Transfer rate = 35.48 M Bytes/second
>
> ## Second despooling iteration completed in 02:05 hours, at 35.48 M 
> Bytes/second (despooling rate reported)
>
> 13-Sep 20:41 rosalind-sd JobId 52615: Elapsed time=43:12:48,
> Transfer rate=7.400 M Bytes/second
>
> ## Note: I assume this is the overall transfer rate calculated for the whole 
> job duration
>
> 13-Sep 20:41 rosalind-sd JobId 52615: Sending spooled attrs to the Director. 
> Despooling 723,209,221 bytes ...
>
> However, the final job report includes yet another and different rate
> figure for this job:
>
>   Scheduled time:         12-Sep-2014 01:23:54
>   Start time:             12-Sep-2014 01:24:18
>   End time:               13-Sep-2014 21:03:41
>   Elapsed time:           1 day 19 hours 39 mins 23 secs
>   Priority:               9
>   FD Files Written:       2,015,504
>   SD Files Written:       2,015,504
>   FD Bytes Written:       1,150,834,873,985 (1.150 TB)
>   SD Bytes Written:       1,151,216,864,476 (1.151 TB)
>   Rate:                   7322.6 KB/s
>
> Well, the two figures, 7.400 M Bytes/second and 7322.6 KB/s (which equals
> 7.151 M Bytes/sec), are close enough that the difference can be ignored.
>
> So it looks like in my case the culprit is not the tape writes (despooling)
> but the disk writes (spooling). That's why I was trying to find a way to
> actually track the disk write speed dynamically. It was possible with 2.4.4
> but appears to be no longer available in 7.0.5.
OK, if you cannot track something that you could before, the full
listings should help me see what changed.
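Incidentally, the two overall rates in the report you quoted do reconcile
once you notice they appear to use different byte counts and different
elapsed times: the SD line divides SD bytes by the SD's elapsed time,
while the job report's "Rate" line is consistent with FD bytes divided by
the whole job's elapsed time, in units of 1000 bytes.  A quick check with
the numbers from your own report (plain arithmetic, nothing
Bacula-specific):

```python
# Figures taken from the job report quoted above.
sd_bytes = 1_151_216_864_476   # SD Bytes Written
fd_bytes = 1_150_834_873_985   # FD Bytes Written

sd_elapsed = 43 * 3600 + 12 * 60 + 48                # SD "Elapsed time=43:12:48"
job_elapsed = 1 * 86400 + 19 * 3600 + 39 * 60 + 23   # "1 day 19 hours 39 mins 23 secs"

sd_rate_mb = sd_bytes / sd_elapsed / 1_000_000   # M Bytes/second, as the SD prints it
job_rate_kb = fd_bytes / job_elapsed / 1_000     # KB/s, as the job report prints it

print(f"SD rate:  {sd_rate_mb:.3f} M Bytes/second")  # ~7.400
print(f"Job rate: {job_rate_kb:.1f} KB/s")           # ~7322.6
```

So neither figure is wrong; they simply measure slightly different things
over slightly different intervals.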
>
>> If I remember right the status command will give you despooling rates if
>> the command is done during despooling, but that rate is shown with the
>> drive information not the job information.
> Drive information does not provide rates, only bytes/files/blocks counts.
> So I would need to use a stopwatch and do the calculations myself to
> measure the transfer rates this way.
>
>> Also it appears that the precise wording of the output has changed to
>> make it more specific (accurate).  If you were previously running 5.2.x
> No, I jumped directly from 2.4.4 to 7.0.5.
>
>> I would check to see if there is any real change in the code, which I
>> doubt because that part of the SD hasn't changed at least since 5.2.x --
>> however, 2.4.x is too old; any comparison of the code would probably not
>> be useful.
>>
>> Bottom line: I don't see a problem other than that the labels for the
>> statistics are slightly different.
> The spooling to disk rate was definitely omitted from the 7.0.5 status output 
> as well as from the logs.
OK, that was not something that was intentionally changed, to the best of
my recollection.  If I can "see" it in the full job output, I can figure
out what changed, and quite likely fix it.

By the way, the normal output of the status command shows the spooling
status for each job, and you can also request a status of only the
spooling information.
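Until the spooling rate is back in the job log, the stopwatch approach can
at least be automated: sample a byte counter from two consecutive status
outputs and divide the difference by the interval.  A rough sketch
(hypothetical -- it assumes bconsole is on the PATH, that "status storage"
selects the right SD, and that the output contains a 'Bytes=12,345,678'
style counter; adjust the command and regex to your actual output):

```python
import re
import subprocess
import time

def read_bytes(status_text):
    """Extract the first comma-grouped byte counter, e.g. 'Bytes=781,013,366,784'."""
    m = re.search(r"Bytes=([\d,]+)", status_text)
    return int(m.group(1).replace(",", "")) if m else None

def sample_rate(interval=30):
    """Poll the SD status twice via bconsole and return M Bytes/second (sketch)."""
    def current_bytes():
        out = subprocess.run(["bconsole"], input="status storage\nquit\n",
                             capture_output=True, text=True).stdout
        return read_bytes(out)
    first = current_bytes()
    time.sleep(interval)
    second = current_bytes()
    if first is None or second is None:
        return None
    return (second - first) / interval / 1_000_000
```

This is only a crude spot check, but it removes the manual stopwatch and
arithmetic from the comparison between versions.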

Finally, I looked at the despooling code in 7.0.5, and there is a job
report that specifically prints the despooling information.  The message
is sent with the INFO message type, so perhaps you have not included that
type in your Messages resource, and thus you are not getting all of the
information.  This seems to me the most likely reason why you are not
seeing the stats.  The INFO message class could be something that was
added after 2.4.4 ...  I recommend that you check the Messages resources
in both your SD and your DIR conf files.
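For reference, a Messages resource in bacula-sd.conf that keeps
info-class messages flowing to the Director might look like the following
(the names are placeholders taken from your listings; verify against your
actual conf files, and check the Director's Messages resource the same
way):

```
# bacula-sd.conf sketch -- "all" includes the info class; the spooling/
# despooling rate lines would be lost if it were excluded with "!info"
Messages {
  Name = Standard
  director = rosalind-dir = all
}
```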

Best regards,
Kern
>
> --Ivan


_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users