Bacula-users

Re: [Bacula-users] Is there any scsi list ? (MSL6000 problem)

2009-11-03 11:44:17
Subject: Re: [Bacula-users] Is there any scsi list ? (MSL6000 problem)
From: Javier Barroso <javibarroso AT gmail DOT com>
To: javibarroso AT gmail DOT com
Date: Tue, 3 Nov 2009 17:37:18 +0100
Hi again,

On Tue, Nov 3, 2009 at 11:54 AM, Javi Barroso <javibarroso AT gmail DOT com> wrote:
> Hi,
>
> I'm having trouble with an MSL6000 tape cabinet.
>
> I would like to ask for help, but I don't know which list is the
> appropriate one.
>
> So I'm asking here about the problem, and I would appreciate any
> pointers.
>
> My problem:
>
> I have an old (Debian sarge) Bacula system with an MSL6000 attached
> via Fibre Channel. This system was working until 2 weeks ago.
>
> The logs show the following (many times):
> Bacula:
> 28-Oct 01:47 backup-sd: BackupCatalog.2009-10-27_20.00.22 Fatal error:
> 3992 Bad autochanger "load slot 5, drive 0": ERR=Child died from
> signal 15: Termination.
> 28-Oct 01:47 backup-fd: BackupCatalog.2009-10-27_20.00.22 Fatal error:
> job.c:1617 Bad response to Append Data command. Wanted 3000 OK data
> , got 3903 Error append data
>
> /var/log/messages:
> Oct 23 15:31:04 backup kernel: st0: Error 70000 (sugg. bt 0x0, driver
> bt 0x0, host bt 0x7).
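
A side note on the st0 line above: the number after "Error" is the SCSI result word in hex, and the log already breaks out "host bt 0x7". Below is a minimal sketch of that decoding, assuming the classic Linux layout of the result word (status in bits 0-7, message in 8-15, host in 16-23, driver in 24-31) and the DID_* values as I remember them from the kernel's scsi.h; please verify against your own kernel headers. The function names are made up for illustration.

```shell
#!/bin/sh
# Extract the host byte (bits 16-23) from a SCSI result word.
# $1 is the hex value as printed by st, without a 0x prefix, e.g. 70000.
scsi_host_byte() {
    echo $(( (0x$1 >> 16) & 0xff ))
}

# Map a host byte to its symbolic name (assumed values, see lead-in).
scsi_host_name() {
    case "$1" in
        0) echo DID_OK ;;
        1) echo DID_NO_CONNECT ;;
        2) echo DID_BUS_BUSY ;;
        3) echo DID_TIME_OUT ;;
        4) echo DID_BAD_TARGET ;;
        5) echo DID_ABORT ;;
        6) echo DID_PARITY ;;
        7) echo DID_ERROR ;;
        8) echo DID_RESET ;;
        *) echo UNKNOWN ;;
    esac
}

# The value from the log line above; prints DID_ERROR
scsi_host_name "$(scsi_host_byte 70000)"
```

If that mapping holds, host byte 0x7 is a generic error reported by the host adapter driver rather than by the tape drive itself, but the name alone does not say what caused it.
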
>
> Command line:
> # mtx -f /dev/autochanger1 status
> cannot open SCSI device '/dev/autochanger1' - No such file or directory
> # ls -l /dev/autochanger1
> lrwxrwxrwx 1 root root 3 2009-10-30 09:47 /dev/autochanger1 -> sg4
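
The `ls -l` above only shows that the link itself exists, not whether its target does. One possible reading of the mtx error is that the target node was missing at that moment, though that is an assumption. A small sketch for checking it follows; `check_dev_link` is a hypothetical helper name, and it relies on `readlink` being installed.

```shell
#!/bin/sh
# Report whether a device symlink resolves to an existing node.
# Returns 0 if the target exists, 1 if the link dangles, 2 if $1 is no symlink.
check_dev_link() {
    link=$1
    if [ ! -L "$link" ]; then
        echo "$link: not a symlink"
        return 2
    fi
    target=$(readlink "$link")
    # [ -e ] follows the link, so it is false for a dangling symlink
    if [ -e "$link" ]; then
        echo "$link -> $target: target exists"
    else
        echo "$link -> $target: target missing (dangling link)"
        return 1
    fi
}
```

It would be called as `check_dev_link /dev/autochanger1` right after a failing mtx run.
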
>
> The device MSL6000 seems to be present in the system:
> # lsscsi -g
> [0:0:0:0]    storage HP       HSV200           5110  -         /dev/sg0
> [0:0:1:0]    storage HP       HSV200           5110  -         /dev/sg1
> [0:0:2:0]    mediumx HP       MSL6000 Series   0520  /dev/sch0  /dev/sg4
> [0:0:2:1]    tape    HP       Ultrium 3-SCSI   G63W  /dev/st2  /dev/sg5
> [0:0:2:2]    tape    HP       Ultrium 3-SCSI   G63W  /dev/st3  /dev/sg6
> [0:0:2:3]    storage HP       NS E1200-320     571f  -         /dev/sg7
> [1:0:3:0]    tape    HP       Ultrium 3-SCSI   G63W  /dev/st0  /dev/sg2
> [1:0:4:0]    tape    HP       Ultrium 3-SCSI   G54W  /dev/st1  /dev/sg3
>
> When I try to remove and re-add it, it disappears:
> # echo "scsi remove-single-device 0 0 2 0" > /proc/scsi/scsi
> # echo "scsi add-single-device 0 0 2 0" > /proc/scsi/scsi
> # dmesg
> lpfc 0000:03:01.0: 0:0713 SCSI layer issued LUN reset (2, 0) Data: x2002 x3 x2
> lpfc 0000:03:01.0: 0:0714 SCSI layer issued Bus Reset Data: x2002
> scsi 0:0:2:0: scsi: Device offlined - not ready after error recovery
>
> # lsscsi -g
> [0:0:0:0]    storage HP       HSV200           5110  -         /dev/sg0
> [0:0:1:0]    storage HP       HSV200           5110  -         /dev/sg1
> [0:0:2:1]    tape    HP       Ultrium 3-SCSI   G63W  /dev/st2  /dev/sg5
> [0:0:2:2]    tape    HP       Ultrium 3-SCSI   G63W  /dev/st3  /dev/sg6
> [0:0:2:3]    storage HP       NS E1200-320     571f  -         /dev/sg7
> [1:0:3:0]    tape    HP       Ultrium 3-SCSI   G63W  /dev/st0  /dev/sg2
> [1:0:4:0]    tape    HP       Ultrium 3-SCSI   G54W  /dev/st1  /dev/sg3
>
> If I then reboot the system, two tape drives, the NS E1200-320 and the
> MSL6000 disappear (all connected via Fibre Channel):
> # lsscsi
> [0:0:0:0]    storage HP       HSV200           5110  -
> [0:0:1:0]    storage HP       HSV200           5110  -
> [1:0:3:0]    tape    HP       Ultrium 3-SCSI   G63W  /dev/st0
> [1:0:4:0]    tape    HP       Ultrium 3-SCSI   G54W  /dev/st1
>
> The two tape drives that remain are connected by SCSI cable.
>
> If I power-cycle the MSL6000, Linux detects the devices again:
>
> # dmesg
>  Vendor: HP        Model: MSL6000 Series    Rev: 0520
>  Type:   Medium Changer                     ANSI SCSI revision: 02
>  Vendor: HP        Model: Ultrium 3-SCSI    Rev: G63W
>  Type:   Sequential-Access                  ANSI SCSI revision: 03
> st 0:0:2:1: Attached scsi tape st2
> st2: try direct i/o: yes (alignment 512 B)
>  Vendor: HP        Model: Ultrium 3-SCSI    Rev: G63W
>  Type:   Sequential-Access                  ANSI SCSI revision: 03
> st 0:0:2:2: Attached scsi tape st3
> st3: try direct i/o: yes (alignment 512 B)
>  Vendor: HP        Model: NS E1200-320      Rev: 571f
>  Type:   RAID                               ANSI SCSI revision: 04
> SCSI Media Changer driver v0.25
> ch0: type #1 (mt): 0x0+1 [medium transport]
> ch0: type #2 (st): 0x20+58 [storage]
> ch0: type #3 (ie): 0x1c0+2 [import/export]
> ch0: type #4 (dt): 0x1e0+4 [data transfer]
> ch0: dt 0x1e0: READ ELEMENT STATUS failed
> ch0: dt 0x1e1: READ ELEMENT STATUS failed
> ch0: dt 0x1e2: READ ELEMENT STATUS failed
> ch0: dt 0x1e3: READ ELEMENT STATUS failed
> ch0: INITIALIZE ELEMENT STATUS, may take some time ...
> ch0: ... finished
> ch 0:0:2:0: Attached scsi changer ch0
> st 1:0:3:0: Attached scsi generic sg0 type 1
> st 1:0:4:0: Attached scsi generic sg1 type 1
> scsi 0:0:0:0: Attached scsi generic sg2 type 12
> scsi 0:0:1:0: Attached scsi generic sg3 type 12
> ch 0:0:2:0: Attached scsi generic sg4 type 8
> st 0:0:2:1: Attached scsi generic sg5 type 1
> st 0:0:2:2: Attached scsi generic sg6 type 1
>
> Then I can remove and re-add the device, and the MSL6000 rescans the labels:
>
> # echo "scsi remove-single-device 0 0 2 0" > /proc/scsi/scsi
> # echo "scsi add-single-device 0 0 2 0" > /proc/scsi/scsi
> scsi 0:0:2:3: Attached scsi generic sg7 type 12
>  Vendor: HP        Model: MSL6000 Series    Rev: 0520
>  Type:   Medium Changer                     ANSI SCSI revision: 02
> ch0: type #1 (mt): 0x0+1 [medium transport]
> ch0: type #2 (st): 0x20+58 [storage]
> ch0: type #3 (ie): 0x1c0+2 [import/export]
> ch0: type #4 (dt): 0x1e0+4 [data transfer]
> ch0: dt 0x1e0: ID 1, LUN 0, name: HP       HSV200           5110
> ch0: dt 0x1e1: ID 2, LUN 0, name: HP       MSL6000 Series   0520
> ch0: dt 0x1e2: ID 3, LUN 0, Huh? device not found!
> ch0: dt 0x1e3: ID 4, LUN 0, Huh? device not found!
> ch0: INITIALIZE ELEMENT STATUS, may take some time ...
> ch0: ... finished
> ch 0:0:2:0: Attached scsi changer ch0
> ch 0:0:2:0: Attached scsi generic sg4 type 8
>
> Now I'm going to set up a cron job that runs
> mtx -f /dev/autochanger1 status
>
> every 5 minutes, so I know when it fails again :(

It failed again, one hour later.
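
For anyone who wants to reproduce the periodic check, here is a minimal sketch of such a cron-driven script. It is not the exact script in use here: the log path, the script location and the `--run` switch are assumptions chosen for illustration, and only the `mtx -f /dev/autochanger1 status` command comes from this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: run a status command and log a timestamped line
# when it fails. CHANGER and LOGFILE defaults are assumptions.
CHANGER=${CHANGER:-/dev/autochanger1}
LOGFILE=${LOGFILE:-/var/log/changer-check.log}

check_changer() {
    # "$@" is the command to run, e.g. mtx -f "$CHANGER" status
    rc=0
    out=$("$@" 2>&1) || rc=$?
    if [ "$rc" -ne 0 ]; then
        printf '%s FAILED rc=%s: %s\n' \
            "$(date '+%Y-%m-%d %H:%M:%S')" "$rc" "$out" >> "$LOGFILE"
    fi
    return "$rc"
}

# Only run the real check when asked to, so the function can be reused.
if [ "${1:-}" = "--run" ]; then
    check_changer mtx -f "$CHANGER" status
fi

# Example crontab entry (script path is hypothetical):
# */5 * * * * /usr/local/sbin/check-changer.sh --run
```

The first FAILED line in the log then gives the time of the failure to within five minutes, which can be matched against the kernel log and the switch's port log.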

Looking at the Fibre Channel switch, the port where the cabinet is
attached stays in the "In_Sync" state all the time. Now I'm trying to
find out what that means ...

Thanks

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users