Re: [Bacula-users] Issues With Quantum 160/320 SDLT - HP/Compaq Proliant G3

2008-06-24 17:03:41
Subject: Re: [Bacula-users] Issues With Quantum 160/320 SDLT - HP/Compaq Proliant G3
From: Arno Lehmann <al AT its-lehmann DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Tue, 24 Jun 2008 22:55:43 +0200
Hi,

24.06.2008 18:37, RA Cohen wrote:
> 
> Arno Lehmann wrote:
>> Hi,
>>
>> 22.06.2008 21:15, RA Cohen wrote:
>>   
>>> Here is a current status snapshot of my problem installation: [It is now 
>>> Sunday afternoon June 22, a full backup was scheduled Friday, with 
>>> differential for Saturday and Sunday]
>>>
>>> Bacula seems to be waiting for a signal of some kind from the drive. 
>>> Does the output of sta sd look normal? I assume the maximum jobs 
>>> exceeded message is there because the tape did not recycle. It is in 
>>> fact defined for a max of 9 jobs, 3 jobs for each day.
>>>
>>>     
>> ...
>>   
>>> *sta sd=InternalQuantum
>>> Connecting to Storage daemon InternalQuantum at 10.150.8.240:9103
>>>
>>> fserv-sd Version: 2.2.8 (26 January 2008) i386-portbld-freebsd6.2 freebsd 6.2-RELEASE-p8
>>> Daemon started 20-Jun-08 16:04, 0 Jobs run since started.
>>>  Heap: heap=212,992 smbytes=87,479 max_bytes=87,991 bufs=106 max_bufs=108
>>> Sizes: boffset_t=8 size_t=4 int32_t=4 int64_t=8
>>>
>>> Running Jobs:
>>> Writing: Full Backup job fservNightlySave JobId=1899 Volume="Friday-0002"
>>>     pool="FridayPool" device="InternalQuantum" (/dev/sa0)
>>>     spooling=0 despooling=0 despool_wait=0
>>>     Files=0 Bytes=0 Bytes/sec=0
>>>     FDSocket closed
>>> ====
>>>
>>> Jobs waiting to reserve a drive:
>>>    3610 JobId=1899 Volume max jobs exceeded on drive "InternalQuantum" (/dev/sa0).
>>> ====
>>>
>>> Terminated Jobs:
>>>  JobId  Level    Files      Bytes   Status   Finished        Name
>>> ===================================================================
>>>   1886  Full          1    171.7 M  OK       18-Jun-08 07:04 BackupCatalog
>>>   1887  Full          0         0   Cancel   19-Jun-08 08:47 fservNightlySave
>>>   1890  Full          0         0   Error    19-Jun-08 08:55 fservNightlySave
>>>   1891  Full        493    350.5 M  Error    19-Jun-08 08:56 mservNightlySave
>>>   1893  Full    184,332    76.88 G  OK       19-Jun-08 10:23 fservNightlySave
>>>   1894  Full     10,069    3.326 G  OK       19-Jun-08 10:30 mservNightlySave
>>>   1895  Full          1    173.3 M  OK       19-Jun-08 10:30 BackupCatalog
>>>   1896  Full    184,399    76.92 G  OK       20-Jun-08 09:50 fservNightlySave
>>>   1897  Full     10,061    3.333 G  OK       20-Jun-08 09:57 mservNightlySave
>>>   1898  Full          1    174.5 M  OK       20-Jun-08 09:58 BackupCatalog
>>> ====
>>>
>>> Device status:
>>> Device "FileStorage" (/tmp) is not open.
>>> Device "InternalQuantum" (/dev/sa0) is mounted with:
>>>     Volume:      Friday-0002
>>>     Pool:        *unknown*
>>>     Media type:  SDLT
>>>     Total Bytes Read=129,024 Blocks Read=2 Bytes/block=64,512
>>>     Positioned at File=0 Block=0
>>> ====
>>>
>>> In Use Volume status:
>>> Friday-0002 on device "InternalQuantum" (/dev/sa0)
>>>     Reader=0 writers=0 reserved=0 released=1
>>> ====
>>>
>>>     
>> The above all looks normal (as if the SD is working correctly), though I 
>> would have expected Bacula to ask for a usable tape in this situation.
>>
>> I don't know if you expect the loaded tape to get recycled; if you do 
>> and it doesn't recycle, check with 'llist volume=Friday-0002' whether 
>> it should be recyclable.
>>
>> If that is not the problem, you have to find out why Bacula doesn't use 
>> the tape drive; I find the combination of the drive being mounted with 
>> a volume and, at the same time, being released a bit astonishing.
>>
>> What happens if you issue a mount command?
>>
>> Arno
>>
>>   
> Arno,
> 
> What happens when mount is issued: it came back with the "waiting on 
> Storage" error.

Ok. Admittedly, not really ok, but at least that seems to confirm that 
what you see is really what Bacula thinks the state of the volumes 
should be.

> I cannot show this to you now as I have had to try other 
> things. The drive itself has been replaced with one that has Quantum's 
> latest and greatest firmware - guess what - same result.

Well, I'm not sure this has to do with the hardware. Rather it looks 
like Bacula itself deadlocks with the maximum volume jobs setting.

> So I am looking 
> at the Volume max jobs because of the message 'Volume max jobs 
> exceeded.' It seems bacula is looking first at that Max Jobs value 
> before proceeding to recycle the tape (even though the catalog was 
> marked for the tape/volume to be recycled). The example I presented in 
> this thread included three full backup jobs on Friday, three 
> differential on Saturday, and three on Sunday, a total of 9 jobs. Two of 
> the jobs are typical, the third is the catalog backup. It doesn't matter 
> whether it's the Friday tape, or the Monday or Tuesday etc. I always ran 
> this situation [with my Quantum 40/80 and no problems] with Max Vol Jobs 
> = 9 on Friday (really Friday thru Sunday 1 tape), and Max Vol Jobs = 3 
> Monday thru Thursday (4 tapes). When I set it to 2, the tape recycles 
> properly, but then I'm left waiting to mount a new tape for the third 
> job. The only reason for the Max Vol Jobs was to get the tape to become 
> Used. So now I am trying no value for Max Vol Jobs, and setting Volume 
> Use Duration for 23h to see if this solves what I believe is still 
> basically a hardware <-> bacula issue.

I'd like to see more details about the SD-tape drive interaction... 
from what I recall of your mails, this is not related to drive 
problems, I believe. Still, I understand that you don't observe this 
behaviour with your DLT drive.
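
For reference, the two pool setups you describe would look roughly like 
the sketch below. The directive names are standard Pool directives and 
the pool name is taken from your status output; Recycle, AutoPrune and 
the comments are my assumptions, not a copy of your actual 
bacula-dir.conf.

```conf
# bacula-dir.conf (sketch only; adapt to your real Pool resource)
Pool {
  Name = FridayPool
  Pool Type = Backup
  Recycle = yes                  # assumed
  AutoPrune = yes                # assumed
  # Variant A (what you ran so far): mark the tape Used after 9 jobs
  # Maximum Volume Jobs = 9
  # Variant B (what you are trying now): mark it Used 23h after the
  # first write to the volume
  Volume Use Duration = 23h
}
```

Keep in mind that changing the Pool resource only affects newly created 
volumes. Volumes already in the catalog keep their old limits until you 
apply the change with the 'update' command in bconsole (as far as I 
remember: pool from resource first, then the volumes in that pool).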

> 
> These Quantum 160/320 SDLT are known to be tricky and finicky as 
> compared to the 40/80 DLT, unfortunately, but I have to find a way to 
> live with them as I have 4 of them!

Send them here ;-)

> I'll report on my success or failure - I appreciate any and all support. 
> It is sometimes hard to describe concisely just what is going on...

Yup.

Although I'm a bit short on time, others might have some useful 
ideas, too, so I'd recommend capturing a debug trace of the SD when 
it gets into this state. Perhaps something shows up in the debug output...
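
One way to get such a trace, sketched as bconsole commands. The storage 
name is the one from your 'sta sd' command; the level of 200 is just a 
value I picked, and I'm assuming the trace option behaves on 2.2.8 the 
way it does on the versions I have used.

```
*setdebug level=200 trace=1 storage=InternalQuantum
*sta sd=InternalQuantum
```

Issue the first command before the job starts, wait until it hangs, then 
capture the status output with the second. With trace=1 the SD should 
write its debug output to a trace file in its working directory. 
Afterwards, the same setdebug command with level=0 trace=0 switches 
debugging off again.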

Arno

-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de
