Bacula-users

Re: [Bacula-users] Issues With Quantum 160/320 SDLT - HP/Compaq Proliant G3

2008-06-20 04:00:37
From: Arno Lehmann <al AT its-lehmann DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Fri, 20 Jun 2008 09:59:55 +0200
Hi,

19.06.2008 15:17, RA Cohen wrote:
> Here is an actual [edited] transcript of the problem:
> 
> Checking on things yesterday afternoon before the jobs run:
> 
> Scheduled Jobs:
> Level          Type     Pri  Scheduled          Name               Volume
> ===================================================================================
> Full           Backup    10  19-Jun-08 02:00    fservNightlySave   Wednesday-0229
> Full           Backup    10  19-Jun-08 02:00    mservNightlySave   Wednesday-0229
> Full           Backup    10  19-Jun-08 02:15    BackupCatalog      Wednesday-0229
> ====
> 
> +---------+----------------+-----------+---------+----------------+----------+--------------+---------+------+-----------+-----------+---------------------+
> | MediaId | VolumeName     | VolStatus | Enabled | VolBytes       | VolFiles | VolRetention | Recycle | Slot | InChanger | MediaType | LastWritten         |
> +---------+----------------+-----------+---------+----------------+----------+--------------+---------+------+-----------+-----------+---------------------+
> |     229 | Wednesday-0229 | Recycle   |       1 | 46,618,951,680 |       49 |  157,680,000 |       1 |    0 |         0 | SDLT      | 2008-05-22 02:58:30 |
> |     261 | Wednesday-0010 | Used      |       1 | 80,370,662,400 |       82 |  157,680,000 |       1 |    0 |         0 | SDLT      | 2008-06-12 03:29:58 |
> +---------+----------------+-----------+---------+----------------+----------+--------------+---------+------+-----------+-----------+---------------------+
> 
> This morning:
> 
> stat dir
> 
> Running Jobs:
>  JobId Level   Name                       Status
> ======================================================================
>   1887 Full    fservNightlySave.2008-06-19_02.00.12 is waiting on Storage InternalQuantum
>   1888 Full    mservNightlySave.2008-06-19_02.00.13 is waiting execution
>   1889 Full    BackupCatalog.2008-06-19_02.15.15 is waiting execution
> ====
> 
> After cancelling the jobs:
> 
> *mess
> 19-Jun 08:47 fserv-sd JobId 1887: Fatal error:
>      Device "InternalQuantum" with MediaType "SDLT" requested by DIR not found in SD Device resources.
> 19-Jun 08:47 fserv-dir JobId 1887: Fatal error:
>      Storage daemon didn't accept Device "InternalQuantum" because:
>      3924 Device "InternalQuantum" not in SD Device resources.
> 
> BUT IT IS IN SD Device resources - from this particular bacula-sd.conf 
> (which is an alternate config when compared to another system detailed 
> in my first posting):

Well, that looks interesting.

First, check the internal state of the SD:

'sta sd=InternalQuantum' (or any other device configured on that SD).

If that looks normal, i.e. the InternalQuantum device is listed as 
available, I would next check how the tape drive presents itself when 
no tape is loaded. Under FreeBSD, I believe the device usually is not 
usable in that state. After inserting a tape, it might be necessary to 
wait a while until the drive has completely registered it.
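
The waiting step could be scripted roughly like this. It is only a 
sketch: it assumes that mt exits non-zero while the drive is not ready, 
and the device path, the MT_CMD override, and the retry counts are 
illustrative choices rather than anything taken from the setup 
discussed here.

```shell
#!/bin/sh
# Poll the tape drive until "mt status" succeeds or the attempts run out.
# TAPE_DEV and MT_CMD are assumptions for illustration; adjust as needed.
TAPE_DEV="${TAPE_DEV:-/dev/nsa0}"
MT_CMD="${MT_CMD:-mt}"

wait_for_tape() {
    tries="${1:-12}"   # number of attempts before giving up
    delay="${2:-5}"    # seconds to sleep between attempts
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Assumed behaviour: mt returns 0 only once the drive is usable.
        if "$MT_CMD" -f "$TAPE_DEV" status >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

After sourcing this, "wait_for_tape 12 5 && echo ready" would try for 
about a minute before giving up. The SD status can be queried 
afterwards from bconsole (the long form of the command is "status 
storage=InternalQuantum").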

Arno

> # A FreeBSD tape drive
> #
> Device {
>   Name = InternalQuantum
>   Description = "DDS-4 for FreeBSD"
>   Media Type = SDLT

I believe your description and media type are a bit misleading...

>   Archive Device = /dev/sa0

I would change that to /dev/nsa0 (the non-rewinding device), by the way.

>   AutomaticMount = yes
>   AlwaysOpen = yes
>   LabelMedia = yes
>   Offline On Unmount = no
>   Hardware End of Medium = no
>   BSF at EOM = yes
>   Backward Space Record = no
>   Backward Space File = no
>   Fast Forward Space File = no
>   TWO EOF = yes
> }
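
As a sketch only, this is how the Device resource above might look with 
the two changes suggested inline: a description that matches the actual 
drive, and the non-rewinding device node. The Description wording is an 
assumption for illustration; every other directive is copied unchanged 
from the quoted configuration, and nothing here is confirmed to fix the 
problem.

```
Device {
  Name = InternalQuantum
  # Assumed wording; the original said "DDS-4 for FreeBSD".
  Description = "Quantum SDLT 160/320 on FreeBSD"
  Media Type = SDLT
  # Non-rewinding node instead of /dev/sa0.
  Archive Device = /dev/nsa0
  AutomaticMount = yes
  AlwaysOpen = yes
  LabelMedia = yes
  Offline On Unmount = no
  Hardware End of Medium = no
  BSF at EOM = yes
  Backward Space Record = no
  Backward Space File = no
  Fast Forward Space File = no
  TWO EOF = yes
}
```

The SD has to be restarted to pick up a change like this.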
> 
> And yes I have tried the other configuration for FreeBSD tape drives - 
> same result.
> 
> There is something going on that is preventing bacula from properly 
> dealing with a tape catalog entry marked Recycle when the correct tape 
> is in the drive. Again, when I erase the tape (mt rewind, mt weof), I can 
> run these three jobs in the same order as above without problem. I have 
> not yet updated the firmware but I have my doubts since the firmware is 
> fairly up-to-date. The peculiar error is the "waiting on Storage" -  I 
> have not seen this before and when I Google it there is almost nothing 
> found.
> 
> Kern, Dan, Bacula gurus, please have a look at this ... the problem is 
> wearing thin for both the users and myself ... and thank you for an 
> otherwise fantastic product that I have used for years.
> 
> **********************************************
> Bruno Friedmann wrote:
>> All I can tell you is that last week I had a customer having lots of 
>> trouble with their Quantum SDLT (IBM release).
>>
>> We got a new 10-pack of tapes, and some of the tapes in it were bad. If you 
>> tried a simple operation like loading a tape and issuing mt -f /dev/st0 
>> rewind, status, or eject, you ended up with an I/O error: Cartridge fault.
>> The bad part was that we had to stop the server, remove the electrical 
>> power, and then restart.
>>
>> We updated the server with the latest firmware. (There's one for the LSI 
>> internal SCSI card.)
>>
>> Now the tape loads and indicates an I/O error cartridge fault, but the 
>> drive keeps working.
>> An eject command works (without a shutdown).
>>
>> So my advice would be: try to find the latest HP firmware update CD (which 
>> is 8.0),
>> and boot with it to double-check whether there's any update.
>> It really helps. (I saw this at another customer with an ML570 Proliant 
>> that had disk trouble last Saturday night.)
>>
>> Hope this helps you a bit.
>>
>> RA Cohen wrote:
>>   
>>> I have used Bacula with great success with Compaq Proliant DL380 G3s and 
>>> the Compaq (Quantum) 20/40 and 40/80 DLT drives. These were mostly 
>>> external drives plugged into the same SCSI channel as the drives on 
>>> these machines. I never had problems that left me scratching my head 
>>> and, incidentally, never ran these drives with SCSI terminators.
>>>
>>> Now I have had to upgrade several of my customers to the Quantum 160/320 
>>> SDLT and there have been nothing but problems! OK, I understand it is 
>>> best practice to add a separate SCSI controller for tape drives in 
>>> general, and I have done so. I also believe these drives like to be 
>>> terminated, and I have done that. But what in the world is going on when:
>>>
>>> 1. Correct tape is in drive, tape marked Recycle in catalog.
>>> 2. Stat Dir reports the jobs scheduled to run that night correctly, and 
>>> directed to the tape in the drive.
>>> 3. Night comes - unfortunately I do not have the exact message here - 
>>> but the key is that the first job is "waiting on device ExternalQuantum" 
>>> (ExternalQuantum is obviously the drive). The other jobs are also just 
>>> piled up behind it, as expected. If I issue "mount" I eventually get the 
>>> same "waiting on" message.
>>> 4. If I cancel everything, umount the drive, quit bacula, and run:
>>>
>>> 5. mt -f /dev/sa0 rewind
>>> 6. mt -f  /dev/sa0 weof
>>>
>>> Re-enter bconsole and:
>>>
>>> 7. If I run each job manually (in the identical and correct order 
>>> because the first job loads and last job unloads) without purging and 
>>> deleting the volume I am in the same place exactly.
>>> 8. If I purge and delete the volume and then manually run each job they 
>>> all run perfectly.
>>>
>>> And I also should mention that this seems to be somewhat tape-dependent 
>>> - sometimes they will recycle and accept the job correctly. But, the 
>>> tapes that produce the error mentioned on step #3 seem also to be able 
>>> to accept the jobs when done in the manner of step #8, above, so I 
>>> cannot conclude these tapes are bad. And I also should note I am having 
>>> this problem on two somewhat different systems:
>>>
>>> -Proliant G3 FreeBSD 6.2 server Bacula 2.2.8_2 with drive on separate 
>>> SCSI controller and terminated. External drive array shelf attached to 
>>> on-board controller as are the internal drives.
>>> -Proliant G3 FreeBSD 6.2 server Bacula 2.2.8_2 drive on same on-board 
>>> controller as internal drives and not terminated.
>>>
>>> Here is the drive stuff from bacula-sd.conf:
>>>
>>> # A FreeBSD tape drive
>>> Device {
>>>  Name = Quantum160
>>>  Media Type = SDLT
>>>  Archive Device = /dev/sa0
>>>  AutomaticMount = yes
>>>  AlwaysOpen = yes
>>>  LabelMedia = yes
>>>  Offline On Unmount = no
>>>  Hardware End of Medium = no
>>>  BSF at EOM = no
>>>  Backward Space Record = no
>>>  Backward Space File = no
>>>  Fast Forward Space File = yes
>>>  TWO EOF = no
>>> }
>>>
>>> I have tried the alternate configuration in the Bacula docs with same 
>>> result.
>>>
>>> I'm stumped - any and all help welcomed and appreciated.
>>>
>>>     
>>
>>   
> 

-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de
