Veritas-bu

Re: [Veritas-bu] make_scsi_dev woes under Linux

2008-04-09 18:36:48
Subject: Re: [Veritas-bu] make_scsi_dev woes under Linux
From: Kathryn Hemness <kfhemness AT ucdavis DOT edu>
To: veritas-bu AT mailman.eng.auburn DOT edu
Date: Wed, 9 Apr 2008 15:25:28 -0700 (PDT)
Hi Dan and NetBackup advisors,

I realize this subject is from a very old thread (Tue, 15 Aug 2006),
but the situation described there most closely resembles my current problem.

I've been upgrading my NetBackup 5.1 MP6 servers from RHEL3 (32-bit) to
RHEL4 (64-bit). All of the servers are Sun X4200s.

All of the upgrades on my media servers went very well. My colleague and I
devised a disk-cloning mechanism: we build a set of RHEL4 disks in spare
drive bays and customize them for the server we wish to upgrade. We then
put the pre-built disks in that server and boot it from the new disks. The
last media server we upgraded with this process needed only about 2 hours
of downtime.

Now that all of my media servers have been upgraded, we are attempting to
use the same process to upgrade my master server. The problem with this
master upgrade is that NetBackup cannot control the tape library robotics.

The tape drives and the changer are all visible. We have managed to set up
our udev/rules.d/20-local.rules file so that it detects the tape drives and
changer even when the drives have tapes mounted. The drives and changer are
configured via tpconfig, the global database synchronizes without error, and
the robtest utility can move tapes and load and unload them.

But when I ran a test restore to check NetBackup's ability to load and
unload tapes, I got the following segfault:

Apr  9 11:57:11 errol tldd[1432]: TLD(0) MountTape 040943 on drive 5, from slot 182
Apr  9 11:57:11 errol kernel: tldd[1869]: segfault at 0000000000000000 rip 0000000000807f4d rsp 00000000ffffca20 error 6
Apr  9 11:57:11 errol tldd[1432]: DecodeMount(): TLD(0) drive 5, Actual status: Process killed by signal
Apr  9 11:57:11 errol tldd[1432]: Unexpected response status (11) in DecodeMount


I'd appreciate any advice at this point. Could this segfault be caused by
the upgrade from a 32-bit to a 64-bit OS? Device handling through
udev/haldaemon also changed a great deal between RHEL3 and RHEL4.



> Today's Topics:
>
>    6. make_scsi_dev woes under Linux (Daniel Cox)
>
>
>
> ------------------------------
>
> Message: 6
> Date: Tue, 15 Aug 2006 11:31:38 -0500
> From: "Daniel Cox" <DCox AT nyse DOT com>
> Subject: [Veritas-bu] make_scsi_dev woes under Linux
> To: <veritas-bu AT mailman.eng.auburn DOT edu>
> Message-ID:
>       <BCBCA4E637F53049A896D8350ADF78960423C256 AT arcachmail05.tradearca DOT com>
> Content-Type: text/plain; charset="us-ascii"
>
>
> We've got a few media servers running NetBackup 5.1 MP5 under Linux
> (Red Hat AS4) and we're having no end of problems with FC-attached tape
> drive device mappings. I see that when NB starts, it runs make_scsi_dev,
> which creates the following devices:
>
>  [ROOT@arcachnbmm03] ~ # ls -l /dev/st
> total 0
> lrwxrwxrwx  1 root root 8 2006-08-15 12:28 h0c0t0l0 -> /dev/st5
> lrwxrwxrwx  1 root root 8 2006-08-15 12:28 h0c0t1l0 -> /dev/st4
> lrwxrwxrwx  1 root root 8 2006-08-15 12:28 h0c0t2l0 -> /dev/st3
> lrwxrwxrwx  1 root root 8 2006-08-15 12:28 h1c0t0l0 -> /dev/st1
> lrwxrwxrwx  1 root root 8 2006-08-15 12:28 h1c0t1l0 -> /dev/st0
> lrwxrwxrwx  1 root root 8 2006-08-15 12:28 h1c0t2l0 -> /dev/st2
> lrwxrwxrwx  1 root root 9 2006-08-15 12:28 nh0c0t0l0 -> /dev/nst5
> lrwxrwxrwx  1 root root 9 2006-08-15 12:28 nh0c0t1l0 -> /dev/nst4
> lrwxrwxrwx  1 root root 9 2006-08-15 12:28 nh0c0t2l0 -> /dev/nst3
> lrwxrwxrwx  1 root root 9 2006-08-15 12:28 nh1c0t0l0 -> /dev/nst1
> lrwxrwxrwx  1 root root 9 2006-08-15 12:28 nh1c0t1l0 -> /dev/nst0
> lrwxrwxrwx  1 root root 9 2006-08-15 12:28 nh1c0t2l0 -> /dev/nst2
>
> There seem to be two big problems with this. First, the devices created by
> the OS (st*, nst*) can change because of HBA driver upgrades, changes in
> PCI bus detection order, somebody moving an HBA around in the system, or
> somebody moving a drive around in the SAN for various reasons (port-based
> zoning). Second, if any of those scenarios occurs, NB creates entirely
> different /dev/st/*, /dev/sg/* entries to represent the new
> host/controller/target/LUN detection order. Naturally, either problem
> results in drive and robotic library ID mismatches, with NetBackup either
> refusing to start or drives going into a permanent DOWN state.
>
> We can use 2.6 kernel udev rules to map WWNs to OS devices and always
> have consistent /dev/st*, /dev/sg* device names, which gets around the
> first problem; however, the NB auto-created devices can still change, so
> we are stuck with things occasionally breaking and then wasting a fair
> amount of time putting it all back together again.
>
> Is there some better way of handling this?
>
> Dan-
>
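Dan's listing implies the naming scheme make_scsi_dev used on his system: an optional "n" for the no-rewind (nst) device, followed by host, channel, target and LUN. The Python sketch that follows is a hypothetical helper, not a NetBackup tool. It decodes those names from saved "ls -l /dev/st" output and compares two snapshots, which catches both failure modes he describes: a name that now points at a different st/nst device, and names that appear or disappear after re-detection. The name pattern is inferred from the listing alone.

```python
import re
from typing import Dict, List, NamedTuple

# Pattern inferred from the listing only (an assumption, not documented
# NetBackup behavior): optional "n" for the no-rewind nst device, then
# h<host>c<channel>t<target>l<lun>.
NAME_RE = re.compile(r"^(n?)h(\d+)c(\d+)t(\d+)l(\d+)$")


class Address(NamedTuple):
    host: int
    channel: int
    target: int
    lun: int
    no_rewind: bool


def parse_listing(text: str) -> Dict[str, str]:
    """Map link name -> target device from saved `ls -l /dev/st` output."""
    links: Dict[str, str] = {}
    for line in text.splitlines():
        if " -> " not in line:
            continue  # skips "total 0" and blank lines
        left, target = line.rsplit(" -> ", 1)
        links[left.split()[-1]] = target.strip()
    return links


def decode(name: str) -> Address:
    """Split a make_scsi_dev-style name into its SCSI address fields."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not an h/c/t/l name: {name!r}")
    n, host, channel, target, lun = m.groups()
    return Address(int(host), int(channel), int(target), int(lun), n == "n")


def drift(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """List names whose target changed, or that appeared or vanished."""
    changes = []
    for name in sorted(before.keys() | after.keys()):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes.append(f"{name}: {old} -> {new}")
    return changes


if __name__ == "__main__":
    before = parse_listing(
        "lrwxrwxrwx  1 root root 8 2006-08-15 12:28 h0c0t0l0 -> /dev/st5\n"
        "lrwxrwxrwx  1 root root 9 2006-08-15 12:28 nh1c0t1l0 -> /dev/nst0\n"
    )
    after = dict(before, h0c0t0l0="/dev/st2")  # simulated re-detection
    print(decode("nh1c0t1l0"))
    # Address(host=1, channel=0, target=1, lun=0, no_rewind=True)
    print(drift(before, after))
    # ['h0c0t0l0: /dev/st5 -> /dev/st2']
```

Saving the listing after every known-good configuration and diffing it at startup would flag a remapping before it shows up as a DOWN drive.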

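Dan's workaround for the first problem is to key device names on each drive's WWN instead of on detection order. The Python sketch that follows models only that mapping. It is not udev rule syntax, and the WWNs and alias names in it are hypothetical. Real udev rules match on attributes reported by the kernel or by a helper such as scsi_id, and the exact rule keys differ between udev versions, so check udev(7) on the target release.

```python
from typing import Dict

# Hypothetical administrator-chosen aliases keyed on drive WWN. The WWNs
# and alias names are made up for illustration.
ALIASES: Dict[str, str] = {
    "500104f000aa0001": "drive1",
    "500104f000aa0002": "drive2",
}


def stable_links(detected: Dict[str, str],
                 aliases: Dict[str, str]) -> Dict[str, str]:
    """Return alias -> current kernel device, keyed on WWN.

    detected maps each drive's WWN to the kernel name it received at this
    boot (for example "nst3"); that name depends on detection order.
    """
    links: Dict[str, str] = {}
    for wwn, alias in aliases.items():
        dev = detected.get(wwn)
        if dev is None:
            raise KeyError(f"{alias} (WWN {wwn}) was not detected")
        links[alias] = f"/dev/{dev}"
    return links


if __name__ == "__main__":
    boot1 = {"500104f000aa0001": "nst0", "500104f000aa0002": "nst1"}
    # Same drives after an HBA swap: the kernel names are reversed.
    boot2 = {"500104f000aa0001": "nst1", "500104f000aa0002": "nst0"}
    print(stable_links(boot1, ALIASES))
    # {'drive1': '/dev/nst0', 'drive2': '/dev/nst1'}
    print(stable_links(boot2, ALIASES))
    # {'drive1': '/dev/nst1', 'drive2': '/dev/nst0'}
```

Because each alias follows its WWN, "drive1" always names the same physical drive even when the kernel renumbers the nst devices. As Dan notes, this does not stop make_scsi_dev from regenerating its own h/c/t/l names.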

--Kathy

============================================================
Kathryn Hemness                        kfhemness AT ucdavis DOT edu
Infrastructure Services                phone: 530.752.6547
Campus Data Center & Client Services   fax:   530.752.9154
_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
