[Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)

2005-01-11 19:36:27
Subject: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
From: kfhemness AT ucdavis DOT edu (Kathryn Hemness)
Date: Tue, 11 Jan 2005 16:36:27 -0800 (PST)
Greetings --

I ran a successful backup using the
/opt/openv/netbackup/db/config/NO_POSITION_CHECK
setting suggested by K Chapman.
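For anyone scripting this across several servers, here is a minimal Python sketch of the same touch-file step, assuming (as this thread suggests) that only the file's presence matters. The helper names are hypothetical, and the idea that deleting the file restores the default position check is an assumption, not something confirmed here. The bptm log lines quoted further down use /usr/openv rather than /opt/openv, so check which prefix your install actually uses.

```python
from pathlib import Path

# Prefix as posted in this thread; the bptm log paths show /usr/openv instead,
# so adjust this to match your own NetBackup install (assumption).
CONFIG_DIR = Path("/opt/openv/netbackup/db/config")
FLAG_NAME = "NO_POSITION_CHECK"


def set_position_check_disabled(disabled: bool, config_dir: Path = CONFIG_DIR) -> Path:
    """Hypothetical helper: create or remove the empty NO_POSITION_CHECK file."""
    flag = config_dir / FLAG_NAME
    if disabled:
        # Equivalent to: touch <config_dir>/NO_POSITION_CHECK
        flag.touch(exist_ok=True)
    else:
        # Assumption: removing the file re-enables the default position check.
        flag.unlink(missing_ok=True)
    return flag


def position_check_disabled(config_dir: Path = CONFIG_DIR) -> bool:
    """True if the touch file is present in config_dir."""
    return (config_dir / FLAG_NAME).exists()
```

Calling set_position_check_disabled(True) as root has the same effect as the touch command K Chapman posted; the config_dir parameter exists mainly so the logic can be tried against a scratch directory first.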

Then I googled NO_POSITION_CHECK and found the following Veritas Support
patch readme, which has a good explanation of the behavior I'm seeing:

http://seer.support.veritas.com/docs/246368.htm

What's really funny is that this readme is for NB3.4, from 2002.

Now that I know the cause of the problem, I need to determine a
solution which will enable me to use the checkpoint restart feature
of NetBackup 5.1.

I welcome any suggestions.  I'm hoping there are easy Solaris or LSI Logic HBA
commands for the final solution.

On Tue, 11 Jan 2005, Chapman, Scott wrote:

> Date: Tue, 11 Jan 2005 15:37:29 -0800
> From: "Chapman, Scott" <Scott.Chapman AT icbc DOT com>
> To: 'Kathryn Hemness' <kfhemness AT ucdavis DOT edu>
> Subject: RE: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
>
>
> I am running 4.5 FP5, and 5.0 at a different site.  You aren't running the
> IBM driver for the tape drives, are you?  I know that has caused some
> problems for people.
>
> What does "sgscan -v conf" show?  When I run it, it confirms that the drive
> configuration does not come from st.conf by appending
> "NOT-IN-ST-CONFIG-FILE" to each tape drive line.
>
> Scott Chapman
> ICBC - Victoria, Government St.
> Phone: 250.414.7650  Cell: 250.213.9295
>
>
>
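Comparing sgscan output by eye gets tedious with more drives. The following Python sketch is a hypothetical helper, not a NetBackup tool: it parses lines in the layout quoted in this thread, reports whether each drive carries the NOT-IN-ST-CONFIG-FILE marker, and pulls out the last field of the quoted inquiry string, which is the drive's product revision (firmware) level, e.g. 38D0 versus 4770 in this thread. The regular expression assumes exactly this layout; other NetBackup versions may format the line differently.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Layout assumed from the sgscan lines quoted in this thread, e.g.
# /dev/sg/c2t0l0: Tape (/dev/rmt/0): "IBM ULTRIUM-TD2     38D0" : NOT-IN-ST-CONFIG-FILE
LINE_RE = re.compile(
    r'^(?P<sg>\S+): Tape \((?P<rmt>[^)]+)\): "(?P<inquiry>[^"]*)"'
    r'(?:\s*:\s*(?P<flag>\S+))?\s*$'
)


@dataclass
class TapeDrive:
    sg_path: str
    rmt_path: str
    vendor: str
    product: str
    revision: str         # last inquiry field: product revision / firmware level
    not_in_st_conf: bool  # True if the NOT-IN-ST-CONFIG-FILE marker is present


def parse_sgscan_line(line: str) -> Optional[TapeDrive]:
    """Parse one sgscan tape line; return None for anything else."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # not a tape line, or an unexpected format
    parts = m.group("inquiry").split()
    if len(parts) < 3:
        return None  # expect at least vendor, product and revision
    return TapeDrive(
        sg_path=m.group("sg"),
        rmt_path=m.group("rmt"),
        vendor=parts[0],
        product=" ".join(parts[1:-1]),
        revision=parts[-1],
        not_in_st_conf=m.group("flag") == "NOT-IN-ST-CONFIG-FILE",
    )
```

Feeding each line of the sgscan output through parse_sgscan_line makes it easy to spot a drive with a different revision or one that is (or is not) being configured from st.conf.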
> -----Original Message-----
> From: Kathryn Hemness [mailto:kfhemness AT ucdavis DOT edu]
> Sent: Tuesday, January 11, 2005 3:13 PM
> To: K Chapman
> Cc: Chapman, Scott; veritas-bu AT mailman.eng.auburn DOT edu; song_1977 AT yahoo DOT com
> Subject: RE: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
>
>
>
>
> Turning off checkpoints was something I did early in my troubleshooting
> attempts.
>
> I've just turned off a couple of Solaris storage management daemons (ssdgrptd
> and ssagent) on my server and am running another test backup now.  It should
> finish in about 15 more minutes.
>
> I'll try the NO_POSITION_CHECK after this test finishes and let you know
> what happens.
>
>
> On Tue, 11 Jan 2005, K Chapman wrote:
>
> > Date: Tue, 11 Jan 2005 14:54:31 -0800 (PST)
> > From: K Chapman <tech2187 AT yahoo DOT com>
> > To: Kathryn Hemness <kfhemness AT ucdavis DOT edu>,
> >      "Chapman, Scott" <Scott.Chapman AT icbc DOT com>
> > Cc: veritas-bu AT mailman.eng.auburn DOT edu, song_1977 AT yahoo DOT com
> > Subject: RE: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
> >
> > As a test, can you try with the position check turned off?
> >
> > touch /opt/openv/netbackup/db/config/NO_POSITION_CHECK
> >
> > --- Kathryn Hemness <kfhemness AT ucdavis DOT edu> wrote:
> >
> > > Hi, Scott -
> > >
> > > Here's the output of my sgscan -v:
> > >
> > > /dev/sg/c0t3l2: Tape (/dev/rmt/0): "IBM ULTRIUM-TD2     4770"
> > > /dev/sg/c0t3l3: Tape (/dev/rmt/1): "IBM ULTRIUM-TD2     4770"
> > > /dev/sg/c0t3l4: Tape (/dev/rmt/2): "IBM ULTRIUM-TD2     4770"
> > >
> > > We got the library in October.  The drives should be
> > > at the current FW level.
> > >
> > > Are you using NB51?
> > >
> > > On Tue, 11 Jan 2005, Chapman, Scott wrote:
> > >
> > > > Date: Tue, 11 Jan 2005 10:42:36 -0800
> > > > From: "Chapman, Scott" <Scott.Chapman AT icbc DOT com>
> > > > To: 'Kathryn Hemness' <kfhemness AT ucdavis DOT edu>,
> > > >      veritas-bu AT mailman.eng.auburn DOT edu
> > > > Cc: song_1977 AT yahoo DOT com
> > > > Subject: RE: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
> > > >
> > > > Kathryn, are you running current firmware on the LTO2 drives?  I seem to
> > > > remember something about old firmware doing rewinds before NetBackup was
> > > > done with the drive...
> > > > From your logs:
> > > > 01/10/2005 13:48:50 albus.ucdavis.edu albus.ucdavis.edu  FREEZING media id 040004, External event caused rewind during write, all data on media is lost
> > > >
> > > > I am running IBM drives (we don't use the LSI Logic HBAs), and here is
> > > > some output from sgscan -v conf:
> > > > /dev/sg/c2t0l0: Tape (/dev/rmt/0): "IBM ULTRIUM-TD2     38D0" : NOT-IN-ST-CONFIG-FILE
> > > > /dev/sg/c2t1l0: Tape (/dev/rmt/1): "IBM ULTRIUM-TD2     38D0" : NOT-IN-ST-CONFIG-FILE
> > > > /dev/sg/c2t2l0: Tape (/dev/rmt/2): "IBM ULTRIUM-TD2     38D0" : NOT-IN-ST-CONFIG-FILE
> > > > ...
> > > >
> > > > I don't have anything in st.conf for the drives, as they were added to
> > > > the st driver several patches ago.  You might check your st patch level
> > > > as well.
> > > >
> > > > Hope this helps.
> > > >
> > > > Scott Chapman
> > > > ICBC - Victoria, Government St.
> > > > Phone: 250.414.7650  Cell: 250.213.9295
> > > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Kathryn Hemness [mailto:kfhemness AT ucdavis DOT edu]
> > > > Sent: Tuesday, January 11, 2005 10:02 AM
> > > > To: veritas-bu AT mailman.eng.auburn DOT edu
> > > > Cc: song_1977 AT yahoo DOT com
> > > > Subject: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
> > > >
> > > >
> > > > Good Morning --
> > > >
> > > > Was there ever a resolution to your NB5.0MP2/LTO end of tape problem?
> > > >
> > > > I'm currently fighting with a new NB5.1 installation on a Solaris 9 system
> > > > using LTO2 tape drives.  My backups ALWAYS fail either at a
> > > > checkpoint-restart WRITE or at the very last WRITE of the backup,
> > > > regardless of how big the backup is.
> > > >
> > > > I've been told by my NetBackup tech support (via Sun) that it was a
> > > > hardware configuration problem.
> > > >
> > > > The backups always fail, regardless of any st.conf modifications, and I've
> > > > even taken the fiber switch out of the mix.  Here's a summary of my
> > > > hardware and the types of errors I'm seeing (by the way, ufsdump works
> > > > just fine...).
> > > >
> > > > Master: Solaris 9 version 4/04 on a Sun V240 with 2 LSI Logic FC919X HBAs
> > > > running NB5.1 Enterprise Server.  One LSI Logic HBA is connected directly
> > > > to the fiber/SCSI bridge of a Qualstar 88264 LTO2 library, the other to a
> > > > Brocade 32-port fiber switch attached to a Sun 3511 storage array.
> > > >
> > > > I have tried at least 4 different st.conf LTO2 configurations with the
> > > > same failing results and am now not using any special LTO2 definitions.
> > > >
> > > > Here are the failure errors from both the NetBackup reports and the bptm
> > > > logs:
> > > >
> > > > 01/10/2005 13:48:50 albus.ucdavis.edu albus.ucdavis.edu  FREEZING media id 040004, External event caused rewind during write, all data on media is lost
> > > > 01/10/2005 13:48:54 albus.ucdavis.edu albus.ucdavis.edu  CLIENT albus.ucdavis.edu  POLICY IR-ISM_02  SCHED WeeklyFull  EXIT STATUS 84 (media write error)
> > > > 01/10/2005 13:48:54 albus.ucdavis.edu albus.ucdavis.edu  backup of client albus.ucdavis.edu exited with status 84 (media write error)
> > > >
> > > > Here's the bptm log entry for the above error:
> > > >
> > > > 13:48:48.032 [1297] <2> write_backup: tp.tv_sec = 1105393728, stp.tv_sec = 1105391634, tp.tv_usec = 27455, stp.tv_usec = 544901, et = 2093483, mpx_total_kbytes[TWIN_INDEX = 0] = 21261376
> > > > 13:48:48.075 [1297] <2> io_terminate_tape: writing empty backup header, drive index 0, copy 1
> > > > 13:48:48.091 [1297] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.7919) on drive index 0
> > > > 13:48:48.645 [1297] <2> io_write_back_header: drive index 0, empty_file, file num = 2, mpx_headers = 0, copy 1
> > > > 13:48:48.650 [1297] <2> io_close: closing /usr/openv/netbackup/db/media/tpreq/040004, from bptm.c.8046
> > > > 13:48:50.848 [1297] <2> io_terminate_tape: absolute block position prior to writing empty header is 332201, copy 1
> > > > 13:48:50.848 [1297] <2> io_terminate_tape: block position check: actual 332201, expected 332213
> > > > 13:48:50.848 [1297] <2> set_job_details: Sending Tfile jobid (907)
> > > > 13:48:50.848 [1297] <2> set_job_details: LOG 1105393730 16 bptm 1297 FREEZING media id 040004, External event caused rewind during write, all data on media is lost
> > > >
> > > > 13:48:50.848 [1297] <2> set_job_details: Done
> > > > 13:48:50.880 [1297] <16> io_terminate_tape: FREEZING media id 040004, External event caused rewind during write, all data on media is lost
> > > > 13:48:50.898 [1297] <2> log_media_error: successfully wrote to error file - 01/10/05 13:48:50 040004 0 WRITE_ERROR
> > > > 13:48:50.910 [1297] <2> check_error_history: called from bptm line 17870, EXIT_Status = 84
> > > > 13:48:50.911 [1297] <2> check_error_history: drive index = 0, media id = 040004, time = 01/10/05 13:48:50, both_match = 0, media_match = 0, drive_match = 0
> > > > 13:48:50.911 [1297] <2> tpunmount: Check_for_waiting = 0, No_tpunmount_after_restore = 0, Media_Unmount_Delay = 0, MediaOffset = 4
> > > > 13:48:50.911 [1297] <2> tpunmount: tpunmount'ing /usr/openv/netbackup/db/media/tpreq/040004
> > > >
> > > >
> > > > Since ufsdump works, this points to a NetBackup 5.1 problem.  Anyway, I
> > > > notice that in your post-November posts you referred to NB4.5 servers.
> > > > Did you have to downgrade NetBackup in order to get your LTO drives to
> > > > work properly?
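The key lines in the bptm log above are the block position check (actual 332201, expected 332213), followed a moment later by the FREEZING message. The Python sketch below pulls those two numbers out of a bptm log and reports the gap, 12 blocks in this case. The function names are hypothetical, and treating any nonzero gap as the trigger for "External event caused rewind during write" is an inference from this one log, not documented Veritas behavior.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches the bptm line seen in this thread, e.g.
# "... io_terminate_tape: block position check: actual 332201, expected 332213"
CHECK_RE = re.compile(r"block position check: actual (\d+), expected (\d+)")


def find_position_check(lines: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Return (actual, expected) from the first position-check line, if any."""
    for line in lines:
        m = CHECK_RE.search(line)
        if m:
            return int(m.group(1)), int(m.group(2))
    return None


def position_gap(actual: int, expected: int) -> int:
    """Blocks the drive is short of (positive) or past (negative) the expected position."""
    return expected - actual
```

Any iterable of log lines works, including an open file object, so the same two calls can be run over each day's bptm log to see how often, and by how much, the reported position disagrees with what bptm expected.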
> > >
> > === message truncated ===
> >
> >
> > =====
> > aaarrrggghhh!!!!
> > FreeBSD rocks
> >
> >
> >
> >
>
> --kathy
>
> ===============================================================================
> Kathryn Hemness                        kfhemness AT ucdavis DOT edu
> System Administrator                   phone: 530.752.6547
> Campus Data Center & Client Services   fax:   530.752.9154
>

--kathy

===============================================================================
Kathryn Hemness                        kfhemness AT ucdavis DOT edu
System Administrator                   phone: 530.752.6547
Campus Data Center & Client Services   fax:   530.752.9154