[Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)

2005-01-12 10:04:23
Subject: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
From: kfhemness AT ucdavis DOT edu (Kathryn Hemness)
Date: Wed, 12 Jan 2005 07:04:23 -0800 (PST)
I'm current up to 113277-26. I'll be looking at the HBA drivers next.
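
For anyone who wants to script that patch check, here is a minimal Python sketch. It assumes the "Patch: 113277-NN ..." line layout that `showrev -p` prints on Solaris; the function name and the sample text are made up for illustration, and the -10 threshold is simply the one Tim quotes below, which should still be verified with Sun.

```python
import re

ST_PATCH_ID = "113277"      # st driver patch named in this thread
MIN_NATIVE_LTO2_REV = 10    # revision Tim cites for native Ultrium-TD2 support

def highest_patch_rev(showrev_text, patch_id=ST_PATCH_ID):
    """Return the highest installed revision of patch_id, or None if absent."""
    pattern = re.compile(
        r"^Patch:\s+" + re.escape(patch_id) + r"-(\d+)\b", re.MULTILINE
    )
    revs = [int(m.group(1)) for m in pattern.finditer(showrev_text)]
    return max(revs) if revs else None

# Illustrative input only, not captured from a real system.
sample = (
    "Patch: 113277-10 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu\n"
    "Patch: 113277-26 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu\n"
)

rev = highest_patch_rev(sample)
print(rev, rev is not None and rev >= MIN_NATIVE_LTO2_REV)
```

On a live system the text would come from something like `showrev -p` piped into the script rather than from the hard-coded sample.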

Here's what lsiutil reports for my HBAs:

      Port Name         Chip Vendor/Type      MPT Rev  Firmware Rev
  1.  itmpt0            LSI Logic FC919X      103      01020000
  2.  itmpt1            LSI Logic FC919X      103      01020000

These HBAs are also only 2 months old, so I'd expect the FW to
be current.
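
The position mismatch itself is easy to pull out of a bptm log. Below is a rough Python sketch, assuming the "block position check: actual N, expected M" wording seen in the bptm excerpt quoted further down in this thread. The function name is hypothetical, other NetBackup releases may word the line differently, and the input must be the unwrapped log lines rather than the mail-wrapped copy.

```python
import re

# Matches the io_terminate_tape position-check line from the bptm excerpt
# quoted in this thread.
POSITION_CHECK = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<pid>\d+)\] <\d+> "
    r"(?P<func>\w+): block position check: "
    r"actual (?P<actual>\d+), expected (?P<expected>\d+)"
)

def position_mismatches(lines):
    """Yield (time, pid, actual, expected) for each failed position check."""
    for line in lines:
        m = POSITION_CHECK.match(line.strip())
        if m and m.group("actual") != m.group("expected"):
            yield (m.group("time"), int(m.group("pid")),
                   int(m.group("actual")), int(m.group("expected")))

# Two lines taken from the quoted bptm log, joined back into single lines.
log = [
    "13:48:50.848 [1297] <2> io_terminate_tape: absolute block position "
    "prior to writing empty header is 332201, copy 1",
    "13:48:50.848 [1297] <2> io_terminate_tape: block position check: "
    "actual 332201, expected 332213",
]

for time, pid, actual, expected in position_mismatches(log):
    print(f"{time} pid {pid}: drive reports block {actual}, "
          f"bptm expected {expected} ({expected - actual} short)")
```

For the failure in this thread the drive position is 12 blocks behind what bptm expected.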



On Wed, 12 Jan 2005, Tim Hoke wrote:

> Date: Wed, 12 Jan 2005 08:43:36 -0600
> From: Tim Hoke <thoke AT northpeak DOT org>
> To: Kathryn Hemness <kfhemness AT ucdavis DOT edu>
> Cc: "Chapman, Scott" <Scott.Chapman AT icbc DOT com>
> Subject: Re: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
>
> When you say VERITAS Drivers, that would typically mean Windows
> platforms.  The only "VERITAS Driver" that is provided for Solaris is
> the sg driver (scsi passthru).
>
> So, you should be using the SUN st driver (SCSI Tape).
>
> According to my records (Sunsolve), the st driver is in SUN patch
> 113277 and the current revision is -26.  Native support for your IBM
> Ultrium-TD2 drives was introduced in the -10 release.  So, as long as
> you are at -10 or above, you shouldn't be using any st.conf entries.
> However, I don't work for Sun support, so you really should verify it
> with them.
>
> I'd also suspect any HBA drivers/firmware or any other fiber devices
> too.
>
> -Tim
>
> On Jan 12, 2005, at 8:12 AM, Kathryn Hemness wrote:
>
> > I've also googled and found that article.  I'm definitely using the
> > veritas drivers.
> >
> >
> > On Tue, 11 Jan 2005, Chapman, Scott wrote:
> >
> >> Date: Tue, 11 Jan 2005 20:07:10 -0800
> >> From: "Chapman, Scott" <Scott.Chapman AT icbc DOT com>
> >> To: 'Kathryn Hemness' <kfhemness AT ucdavis DOT edu>
> >> Subject: RE: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
> >>
> >> Kathryn, I did a google search with "External event caused rewind during"
> >> and the first hit mentions something from your logs:
> >> From the 5.0mp2 patch release . . .
> >>    "For a checkpoint restart backup, in the rare case where bptm detects
> >>    that a position check problem occurred following a checkpoint because
> >>    of a misconfigured drive or a rewind from an external source, and the
> >>    backup is later resumed, the information on the tape prior to the
> >>    checkpoint may be invalid.
> >>
> >>    The bptm log would indicate the position check problem with one of the
> >>    following logs after a checkpoint:
> >>
> >>    08:39:57.969 [4393] <16> write_data: FREEZING media id 00011, too many
> >>    data blocks written, check tape/driver block size configuration
> >>
> >>    OR
> >>
> >>    log.041204:14:39:12.373 [6416] <16> write_data: FREEZING media id 00005,
> >>    External event caused rewind during write, all data on media is lost
> >>    <<<< here is what your logs reflect also
> >>
> >>    The problem would occur if the same backup were resumed and completed
> >>    with a successful status."
> >>
> >> The one thing in this piece of information is that they mention a
> >> misconfigured drive.
> >>
> >> Question 1) Do you have the latest st driver patch installed on the
> >> backup server?
> >> Question 2) You are using the Veritas tape drivers, right?  This is very
> >> important, as there don't seem to be many people having luck with
> >> non-Veritas drivers.
> >>
> >> Here is the google search
> >> http://www.google.ca/search?hl=en&q=%22External+event+caused+rewind+during%22&meta=
> >>
> >>
> >> -----Original Message-----
> >> From: Kathryn Hemness [mailto:kfhemness AT ucdavis DOT edu]
> >> Sent: Tuesday, January 11, 2005 4:36 PM
> >> To: veritas-bu AT mailman.eng.auburn DOT edu
> >> Cc: scott.chapman AT icbc DOT com; song_1977 AT yahoo DOT com
> >> Subject: RE: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
> >>
> >>
> >> Greetings --
> >>
> >> I ran a successful backup using the
> >> /opt/openv/netbackup/db/config/NO_POSITION_CHECK
> >> setting suggested by Scott Chapman.
> >>
> >> Then I google'd for NO_POSITION_CHECK and found the following Veritas
> >> Support patch readme which had a good explanation for the behavior I'm
> >> seeing:
> >>
> >> http://seer.support.veritas.com/docs/246368.htm
> >>
> >> What's really funny is that this readme is for NB3.4 in 2002.
> >>
> >> Now that I know the cause of the problem, I need to determine a solution
> >> which will enable me to use the checkpoint restart feature of
> >> NetBackup 5.1.
> >>
> >> I welcome any suggestions.  I'm hoping there are easy Solaris or LSI Logic
> >> HBA commands for the final solution.
> >>
> >> On Tue, 11 Jan 2005, Chapman, Scott wrote:
> >>
> >>> Date: Tue, 11 Jan 2005 15:37:29 -0800
> >>> From: "Chapman, Scott" <Scott.Chapman AT icbc DOT com>
> >>> To: 'Kathryn Hemness' <kfhemness AT ucdavis DOT edu>
> >>> Subject: RE: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
> >>>
> >>>
> >>> I am running 4.5fp5 and 5.0 at a different site.  You aren't running
> >>> the IBM driver for the tape drives are you?  I know that has caused
> >>> some problems for people.
> >>>
> >>> What does "sgscan -v conf" show?  When I run that it confirms that the
> >>> drive config does not come from the st.conf by putting
> >>> "NOT-IN-ST-CONFIG-FILE" at the end of each tape drive line . . .
> >>>
> >>> Scott Chapman
> >>> ICBC - Victoria, Government St.
> >>> Phone: 250.414.7650  Cell: 250.213.9295
> >>>
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Kathryn Hemness [mailto:kfhemness AT ucdavis DOT edu]
> >>> Sent: Tuesday, January 11, 2005 3:13 PM
> >>> To: K Chapman
> >>> Cc: Chapman, Scott; veritas-bu AT mailman.eng.auburn DOT edu;
> >>> song_1977 AT yahoo DOT com
> >>> Subject: RE: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
> >>>
> >>>
> >>>
> >>>
> >>> Turning off checkpoints was something I did early in my
> >>> troubleshooting attempts.
> >>>
> >>> I've just turned off a couple of Solaris storage management daemons
> >>> (ssdgrptd and ssagent) on my server and am running another test backup
> >>> now.  It should finish in about 15 more minutes.
> >>>
> >>> I'll try the NO_POSITION_CHECK after this test finishes and let you
> >>> know what happens.
> >>>
> >>>
> >>> On Tue, 11 Jan 2005, K Chapman wrote:
> >>>
> >>>> Date: Tue, 11 Jan 2005 14:54:31 -0800 (PST)
> >>>> From: K Chapman <tech2187 AT yahoo DOT com>
> >>>> To: Kathryn Hemness <kfhemness AT ucdavis DOT edu>,
> >>>>      "Chapman, Scott" <Scott.Chapman AT icbc DOT com>
> >>>> Cc: veritas-bu AT mailman.eng.auburn DOT edu, song_1977 AT yahoo DOT com
> >>>> Subject: RE: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
> >>>>
> >>>> as a test, can you try with the position check turned
> >>>> off?
> >>>>
> >>>> touch /opt/openv/netbackup/db/config/NO_POSITION_CHECK
> >>>>
> >>>> --- Kathryn Hemness <kfhemness AT ucdavis DOT edu> wrote:
> >>>>
> >>>>> Hi, Scott -
> >>>>>
> >>>>> Here's the output of my sgscan -v:
> >>>>>
> >>>>> /dev/sg/c0t3l2: Tape (/dev/rmt/0): "IBM ULTRIUM-TD2     4770"
> >>>>> /dev/sg/c0t3l3: Tape (/dev/rmt/1): "IBM ULTRIUM-TD2     4770"
> >>>>> /dev/sg/c0t3l4: Tape (/dev/rmt/2): "IBM ULTRIUM-TD2     4770"
> >>>>>
> >>>>> We got the library in October.  The drives should be
> >>>>> at the current FW level.
> >>>>>
> >>>>> Are you using NB51?
> >>>>>
> >>>>> On Tue, 11 Jan 2005, Chapman, Scott wrote:
> >>>>>
> >>>>>> Date: Tue, 11 Jan 2005 10:42:36 -0800
> >>>>>> From: "Chapman, Scott" <Scott.Chapman AT icbc DOT com>
> >>>>>> To: 'Kathryn Hemness' <kfhemness AT ucdavis DOT edu>,
> >>>>>>      veritas-bu AT mailman.eng.auburn DOT edu
> >>>>>> Cc: song_1977 AT yahoo DOT com
> >>>>>> Subject: RE: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
> >>>>>>
> >>>>>> Kathryn, are you running current firmware on the LTO2 drives?  I seem to
> >>>>>> remember something about old firmware doing rewinds before netbackup was
> >>>>>> done with the drive . . .
> >>>>>> From your logs:
> >>>>>> 01/10/2005 13:48:50 albus.ucdavis.edu albus.ucdavis.edu  FREEZING media id 040004, External event caused rewind during write, all data on media is lost
> >>>>>>
> >>>>>> I am running IBM drives (we don't use the LSI logic HBA's) and here is some
> >>>>>> output from sgscan -v conf:
> >>>>>> /dev/sg/c2t0l0: Tape (/dev/rmt/0): "IBM ULTRIUM-TD2     38D0" : NOT-IN-ST-CONFIG-FILE
> >>>>>> /dev/sg/c2t1l0: Tape (/dev/rmt/1): "IBM ULTRIUM-TD2     38D0" : NOT-IN-ST-CONFIG-FILE
> >>>>>> /dev/sg/c2t2l0: Tape (/dev/rmt/2): "IBM ULTRIUM-TD2     38D0" : NOT-IN-ST-CONFIG-FILE
> >>>>>> ...
> >>>>>>
> >>>>>> I don't have anything in the st.conf for the drives as they have been added
> >>>>>> to the st driver several patches ago.  You might check your st patch level
> >>>>>> as well . . .
> >>>>>>
> >>>>>> Hope this helps.
> >>>>>>
> >>>>>> Scott Chapman
> >>>>>> ICBC - Victoria, Government St.
> >>>>>> Phone: 250.414.7650  Cell: 250.213.9295
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Kathryn Hemness [mailto:kfhemness AT ucdavis DOT edu]
> >>>>>> Sent: Tuesday, January 11, 2005 10:02 AM
> >>>>>> To: veritas-bu AT mailman.eng.auburn DOT edu
> >>>>>> Cc: song_1977 AT yahoo DOT com
> >>>>>> Subject: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
> >>>>>>
> >>>>>>
> >>>>>> Good Morning --
> >>>>>>
> >>>>>> Was there ever a resolution to your NB5.0MP2/LTO end of tape problem?
> >>>>>>
> >>>>>> I'm currently fighting with a new installation of NB5.1 on a Solaris 9
> >>>>>> system using LTO2 tape drives.  My backups ALWAYS fail either at a
> >>>>>> checkpoint-restart WRITE or at the very last WRITE of the backup,
> >>>>>> regardless of how big the backup is.
> >>>>>>
> >>>>>> I've been told by my NetBackup tech support (via Sun) that it was a
> >>>>>> hardware configuration problem.
> >>>>>>
> >>>>>> The backups always fail, regardless of any st.conf modifications, and I've
> >>>>>> even taken the fiber switch out of the mix.  Here's a summary of my
> >>>>>> hardware and the types of errors I'm seeing (by the way, ufsdump works
> >>>>>> just fine....).
> >>>>>>
> >>>>>> Master: Solaris 9 version 4/04 on a Sun V240 with 2 LSI Logic FC919X HBAs
> >>>>>> running NB5.1 Enterprise Server.  One LSI Logic HBA is connected directly
> >>>>>> to the fiber/scsi bridge of a Qualstar 88264 LTO2 library, the other to a
> >>>>>> Brocade 32-port fiber switch attached to a Sun 3511 storage array.
> >>>>>>
> >>>>>> I have tried at least 4 different st.conf LTO2 configurations with the
> >>>>>> same failing results and am now not using any special LTO2 definitions.
> >>>>>>
> >>>>>> Here are the failure errors from both the NetBackup reports and from the
> >>>>>> bptm logs:
> >>>>>>
> >>>>>> 01/10/2005 13:48:50 albus.ucdavis.edu albus.ucdavis.edu  FREEZING media id 040004, External event caused rewind during write, all data on media is lost
> >>>>>> 01/10/2005 13:48:54 albus.ucdavis.edu albus.ucdavis.edu  CLIENT albus.ucdavis.edu  POLICY IR-ISM_02  SCHED WeeklyFull  EXIT STATUS 84 (media write error)
> >>>>>> 01/10/2005 13:48:54 albus.ucdavis.edu albus.ucdavis.edu  backup of client albus.ucdavis.edu exited with status 84 (media write error)
> >>>>>>
> >>>>>> Here's the bptm log entry for the above error:
> >>>>>>
> >>>>>> 13:48:48.032 [1297] <2> write_backup: tp.tv_sec = 1105393728, stp.tv_sec = 1105391634, tp.tv_usec = 27455, stp.tv_usec = 544901, et = 2093483, mpx_total_kbytes[TWIN_INDEX = 0] = 21261376
> >>>>>> 13:48:48.075 [1297] <2> io_terminate_tape: writing empty backup header, drive index 0, copy 1
> >>>>>> 13:48:48.091 [1297] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.7919) on drive index 0
> >>>>>> 13:48:48.645 [1297] <2> io_write_back_header: drive index 0, empty_file, file num = 2, mpx_headers = 0, copy 1
> >>>>>> 13:48:48.650 [1297] <2> io_close: closing /usr/openv/netbackup/db/media/tpreq/040004, from bptm.c.8046
> >>>>>> 13:48:50.848 [1297] <2> io_terminate_tape: absolute block position prior to writing empty header is 332201, copy 1
> >>>>>> 13:48:50.848 [1297] <2> io_terminate_tape: block position check: actual 332201, expected 332213
> >>>>>> 13:48:50.848 [1297] <2> set_job_details: Sending Tfile jobid (907)
> >>>>>> 13:48:50.848 [1297] <2> set_job_details: LOG 1105393730 16 bptm 1297 FREEZING media id 040004, External event caused rewind during write, all data on media is lost
> >>>>>>
> >>>>>> 13:48:50.848 [1297] <2> set_job_details: Done
> >>>>>> 13:48:50.880 [1297] <16> io_terminate_tape: FREEZING media id 040004, External event caused rewind during write, all data on media is lost
> >>>>>> 13:48:50.898 [1297] <2> log_media_error: successfully wrote to error file - 01/10/05 13:48:50 040004 0 WRITE_ERROR
> >>>>>> 13:48:50.910 [1297] <2> check_error_history: called from bptm line 17870, EXIT_Status = 84
> >>>>>> 13:48:50.911 [1297] <2> check_error_history: drive index = 0, media id = 040004, time = 01/10/05 13:48:50, both_match = 0, media_match = 0, drive_match = 0
> >>>>>> 13:48:50.911 [1297] <2> tpunmount: Check_for_waiting = 0, No_tpunmount_after_restore = 0, Media_Unmount_Delay = 0, MediaOffset = 4
> >>>>>> 13:48:50.911 [1297] <2> tpunmount: tpunmount'ing /usr/openv/netbackup/db/media/tpreq/040004
> >>>>>>
> >>>>>>
> >>>>>> Since ufsdump works, this is indicating a NetBackup 5.1 problem.  Anyway, I
> >>>>>> notice in your post-November posts, you referred to NB4.5 servers.  Did you
> >>>>>> have to downgrade NetBackup in order to get your LTO drives to work properly?
> >>>>>
> >>>> === message truncated ===
> >>>>
> >>>>
> >>>> =====
> >>>> aaarrrggghhh!!!!
> >>>> FreeBSD rocks
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>> --kathy
> >>>
> >>> ===============================================================================
> >>> Kathryn Hemness                        kfhemness AT ucdavis DOT edu
> >>> System Administrator                   phone: 530.752.6547
> >>> Campus Data Center & Client Services   fax:   530.752.9154
> >>>
> >>
> >> --kathy
> >>
> >> ===============================================================================
> >> Kathryn Hemness                        kfhemness AT ucdavis DOT edu
> >> System Administrator                   phone: 530.752.6547
> >> Campus Data Center & Client Services   fax:   530.752.9154
> >>
> >
> > --kathy
> >
> > ===============================================================================
> > Kathryn Hemness                        kfhemness AT ucdavis DOT edu
> > System Administrator                   phone: 530.752.6547
> > Campus Data Center & Client Services   fax:   530.752.9154
> >
>

--kathy

===============================================================================
Kathryn Hemness                        kfhemness AT ucdavis DOT edu
System Administrator                   phone: 530.752.6547
Campus Data Center & Client Services   fax:   530.752.9154