Bacula-users

[Bacula-users] Problem on Solaris platform

2009-04-28 16:32:27
Subject: [Bacula-users] Problem on Solaris platform
From: Arno Lehmann <al AT its-lehmann DOT de>
To: bacula-users <bacula-users AT lists.sourceforge DOT net>
Date: Tue, 28 Apr 2009 22:25:48 +0200
Hello,

I'm experiencing a problem on an x86 machine running Sun's
Solaris 10:

# uname -a
SunOS blah.domain.tld 5.10 Generic_137138-09 i86pc i386 i86pc

The machine is running Bacula version 2.2.8 (yes, I know it's 
outdated, but that's what blastwave offers, and compiling is not an 
option currently...). The tape library is a two-drive Qualstar unit 
with AIT-5 drives, connected to a parallel-SCSI HBA.

Sometimes - and not easily reproducibly - the tape drive or 
autochanger seems to have problems like these:

16-Apr 02:49 blah-sd JobId 2097: Error: block.c:569 Write error at 
472:3118 on device "AIT5-drv1" (/dev/rmt/1cbn). ERR=I/O error.
16-Apr 02:49 blah-sd JobId 2097: Error: Error writing final EOF to 
tape. This Volume may not be readable.
dev.c:1669 ioctl MTWEOF error on "AIT5-drv1" (/dev/rmt/1cbn). ERR=I/O 
error.
16-Apr 02:49 blah-sd JobId 2097: End of medium on Volume "A00131" 
Bytes=470,705,485,824 Blocks=7,296,401 at 16-Apr-2009 02:49.
16-Apr 02:49 blah-sd JobId 2097: 3307 Issuing autochanger "unload slot 
19, drive 1" command.
16-Apr 02:50 blah-sd JobId 2097: 3995 Bad autochanger "unload slot 19, 
drive 1": ERR=Child exited with code 1
Results=/dev/rmt/1cbn: no tape loaded or drive offline
Unloading drive 1 into Storage Element 19...mtx: Request Sense: Long 
Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=70 (Current)
mtx: Request Sense: Sense Key=Illegal Request
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 3B
mtx: Request Sense: Additional Sense Qualifier = 90
mtx: Request Sense: Field in Error = 00
mtx: Request Sense: BPV=no
mtx

At first this looks like a "regular" tape or drive error. 
Unfortunately, as the mtx output shows, it also affects the 
autoloader, which then can't unload the tape.
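
For readers who want to compare such mtx reports across incidents, here is a minimal sketch that pulls the "Request Sense" fields out of the text and flags whether the qualifier is in the vendor-specific range. The function names and the SAMPLE string are illustrative assumptions, not part of mtx or Bacula. The only standard fact relied on is that ASCQ values 0x80 to 0xFF are vendor specific, so what 3B/90 means would have to come from the library vendor's documentation. Fields that the mail client wrapped across two lines (such as "Long Report") are simply skipped.

```python
import re

# Matches lines such as "mtx: Request Sense: Sense Key=Illegal Request"
# and "mtx: Request Sense: Additional Sense Code = 3B".
SENSE_LINE = re.compile(
    r"Request Sense:\s*(?P<key>[^=]+?)\s*=\s*(?P<value>.*\S)"
)

# Excerpt of the output quoted above (hypothetical variable name).
SAMPLE = """\
Unloading drive 1 into Storage Element 19...mtx: Request Sense: Long
Report=yes
mtx: Request Sense: Error Code=70 (Current)
mtx: Request Sense: Sense Key=Illegal Request
mtx: Request Sense: Additional Sense Code = 3B
mtx: Request Sense: Additional Sense Qualifier = 90
"""


def parse_request_sense(text):
    """Collect the key/value pairs that follow 'Request Sense:'."""
    fields = {}
    for line in text.splitlines():
        match = SENSE_LINE.search(line)
        if match:
            fields[match.group("key")] = match.group("value")
    return fields


def summarize(fields):
    """Return (sense_key, asc, ascq, ascq_is_vendor_specific)."""
    asc = int(fields["Additional Sense Code"], 16)
    ascq = int(fields["Additional Sense Qualifier"], 16)
    # ASCQ values 0x80-0xFF are reserved for vendor-specific use.
    return fields.get("Sense Key"), asc, ascq, ascq >= 0x80


if __name__ == "__main__":
    print(summarize(parse_request_sense(SAMPLE)))
```

Run against the excerpt, this reports an Illegal Request with a vendor-specific qualifier, which suggests the changer itself rejected the move rather than the request never reaching it.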

In the system log, I find errors like these at those times:

Apr 27 20:40:04 blah.domain.tld itmpt: [ID 556182 kern.info] itmpt0: 
target 2 fallback from Ultra Wide to Ultra Narrow

*** this seems to indicate a serious SCSI problem, probably 
hardware-related.

Apr 27 20:40:04 blah.domain.tld scsi: [ID 107833 kern.warning] 
WARNING: 
/pci@0,0/pci8086,25e2@2/pci8086,3500@0/pci8086,3510@0/pci10b5,8114@0/pci103c,322a@8/st@2,0 (st7):
Apr 27 20:40:04 blah.domain.tld        SCSI transport failed: reason 
'reset': giving up

*** then the bus is reset

Apr 27 20:40:07 blah.domain.tld scsi: [ID 107833 kern.warning] 
WARNING: 
/pci@0,0/pci8086,25e2@2/pci8086,3500@0/pci8086,3510@0/pci10b5,8114@0/pci103c,322a@8/st@2,0 (st7):
Apr 27 20:40:07 blah.domain.tld        Error for Command: 
rezero/rewind           Error Level: Fatal
Apr 27 20:40:07 blah.domain.tld scsi: [ID 107833 kern.notice] 
Requested Block: 3460                      Error Block: 3460
Apr 27 20:40:07 blah.domain.tld scsi: [ID 107833 kern.notice]  Vendor: 
SONY                               Serial Number:
Apr 27 20:40:07 blah.domain.tld scsi: [ID 107833 kern.notice]  Sense 
Key: Not Ready
Apr 27 20:40:07 blah.domain.tld scsi: [ID 107833 kern.notice]  ASC: 
0x4 (LUN not ready), ASCQ: 0x0, FRU: 0x0

The cabling has been checked, double- and triple-checked by now. The 
HBA has been replaced. Mtx typically works correctly; only after 
things have broken is manual intervention required, i.e. the library 
needs a power cycle or the affected cartridge must be unloaded manually.

There is no pattern I can see regarding the tape cartridges, but I'm 
quite sure the problem always affects the same drive (though those two 
drives are not necessarily evenly loaded).
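
One way to back up the "same drive" impression with numbers is to tally the kernel messages per SCSI target. The following is only a sketch under stated assumptions: it expects unwrapped log lines in the form shown above (as they would appear in the log file itself, typically /var/adm/messages on Solaris), and the function and pattern names are hypothetical. It counts both the st driver warnings, which carry the target in the "st@<target>,<lun>" part of the device path, and the itmpt "fallback" messages.

```python
import re
from collections import Counter

# "st@2,0" in the device path is the target and LUN of a tape drive.
DEVICE_PATH = re.compile(r"/st@(?P<target>[0-9a-f]+),(?P<lun>[0-9a-f]+)")
# HBA driver message, e.g. "itmpt0: target 2 fallback from ...".
FALLBACK = re.compile(r"itmpt\d+:\s+target (?P<target>\d+) fallback")


def count_events_per_target(lines):
    """Count st warnings and itmpt fallback messages per SCSI target."""
    counts = Counter()
    for line in lines:
        match = DEVICE_PATH.search(line) or FALLBACK.search(line)
        if match:
            counts[match.group("target")] += 1
    return counts


if __name__ == "__main__":
    # Assumed log location; adjust to wherever syslog writes kernel messages.
    with open("/var/adm/messages", errors="replace") as log:
        for target, count in count_events_per_target(log).most_common():
            print(f"target {target}: {count} message(s)")
```

If nearly all messages fall on one target while both drives see comparable use, that points at the drive (or its position on the bus) rather than at the HBA or the driver.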

I think it's either a hardware problem with the tape library or a 
SCSI-related driver issue.

As I have neither another Qualstar or AIT device here nor another 
Sun machine for comparison, I'd like to know if any of you have seen 
similar problems, or know better than I do what Solaris is actually 
complaining about :-)

Advice on which manufacturer - Sun or Qualstar - to contact first 
would also be appreciated!

Thanks,

Arno


-- 
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück
www.its-lehmann.de
