[Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)

2005-01-12 09:12:43
Subject: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
From: kfhemness AT ucdavis DOT edu (Kathryn Hemness)
Date: Wed, 12 Jan 2005 06:12:43 -0800 (PST)
I've also googled and found that article.  I'm definitely using the
Veritas drivers.
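
For anyone hitting the same error: the bptm entries quoted further down in this thread show the failed position check directly ("actual 332201, expected 332213"). Below is a minimal sketch, not a NetBackup tool, for pulling those mismatches out of a bptm log. The line layout it expects (time, [pid], <severity>, function, message) is an assumption based only on the entries quoted in this thread, and the function name find_position_mismatches is made up for illustration.

```python
import re

# Assumed bptm line layout, inferred from the entries quoted in this thread:
#   HH:MM:SS.mmm [pid] <severity> function: message
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <(\d+)> (\w+): (.*)$")
CHECK_RE = re.compile(r"block position check: actual (\d+), expected (\d+)")


def find_position_mismatches(log_text):
    """Return (time, pid, actual, expected) for each failed position check."""
    mismatches = []
    for line in log_text.splitlines():
        entry = LINE_RE.match(line.strip())
        if not entry:
            continue  # not a log entry in the assumed layout
        check = CHECK_RE.search(entry.group(5))
        if not check:
            continue
        actual, expected = int(check.group(1)), int(check.group(2))
        if actual != expected:
            mismatches.append((entry.group(1), int(entry.group(2)), actual, expected))
    return mismatches


# Entries copied from the bptm log quoted in this thread (wrapped lines rejoined).
sample = """\
13:48:50.848 [1297] <2> io_terminate_tape: absolute block position prior to writing empty header is 332201, copy 1
13:48:50.848 [1297] <2> io_terminate_tape: block position check: actual 332201, expected 332213
13:48:50.880 [1297] <16> io_terminate_tape: FREEZING media id 040004, External event caused rewind during write, all data on media is lost
"""

for when, pid, actual, expected in find_position_mismatches(sample):
    print(f"{when} pid {pid}: drive reports {expected - actual} fewer blocks than bptm expected")
```

On the quoted log this reports a gap of 12 blocks. The script only surfaces the mismatch; it says nothing about what caused the drive position to differ.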


On Tue, 11 Jan 2005, Chapman, Scott wrote:

> Date: Tue, 11 Jan 2005 20:07:10 -0800
> From: "Chapman, Scott" <Scott.Chapman AT icbc DOT com>
> To: 'Kathryn Hemness' <kfhemness AT ucdavis DOT edu>
> Subject: RE: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
>
> Kathryn, I did a google search with "External event caused rewind during"
> and the first hit mentions something from your logs:
> From the 5.0mp2 patch release . . .
>    "For a checkpoint restart backup, in the rare case where bptm detects
>    that a position check problem occurred following a checkpoint because
>    of a misconfigured drive or a rewind from an external source, and the
>    backup is later resumed, the information on the tape prior to the
>    checkpoint may be invalid.
>
>    The bptm log would indicate the position check problem with one of the
>    following logs after a checkpoint:
>
>    08:39:57.969 [4393] <16> write_data: FREEZING media id 00011, too many
>    data blocks written, check tape/driver block size configuration
>
>    OR
>
>    log.041204:14:39:12.373 [6416] <16> write_data: FREEZING media id 00005,
>    External event caused rewind during write, all data on media is lost
>    <<<< here is what your logs reflect also
>
>    The problem would occur if the same backup were resumed and completed
>    with a successful status."
>
> The one thing to note in this piece of information is that they mention a
> misconfigured drive.
>
> Question 1) Do you have the latest st driver patch installed on the backup
> server?
> Question 2) You are using the Veritas tape drivers, right?  This is very
> important, as there don't seem to be many people having luck with
> non-Veritas drivers.
>
> Here is the google search
> http://www.google.ca/search?hl=en&q=%22External+event+caused+rewind+during%22&meta=
>
>
> -----Original Message-----
> From: Kathryn Hemness [mailto:kfhemness AT ucdavis DOT edu]
> Sent: Tuesday, January 11, 2005 4:36 PM
> To: veritas-bu AT mailman.eng.auburn DOT edu
> Cc: scott.chapman AT icbc DOT com; song_1977 AT yahoo DOT com
> Subject: RE: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
>
>
> Greetings --
>
> I ran a successful backup using the
> /opt/openv/netbackup/db/config/NO_POSITION_CHECK
> setting suggested by Scott Chapman.
>
> Then I google'd for NO_POSITION_CHECK and found the following Veritas
> Support patch readme which had a good explanation for the behavior I'm
> seeing:
>
> http://seer.support.veritas.com/docs/246368.htm
>
> What's really funny is that this readme is for NB3.4 in 2002.
>
> Now that I know the cause of the problem, I need to determine a solution
> which will enable me to use the checkpoint restart feature of NetBackup 5.1.
>
> I welcome any suggestions.  I'm hoping there are easy Solaris or LSI Logic
> HBA commands for the final solution.
>
> On Tue, 11 Jan 2005, Chapman, Scott wrote:
>
> > Date: Tue, 11 Jan 2005 15:37:29 -0800
> > From: "Chapman, Scott" <Scott.Chapman AT icbc DOT com>
> > To: 'Kathryn Hemness' <kfhemness AT ucdavis DOT edu>
> > Subject: RE: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
> >
> >
> > I am running 4.5fp5 and 5.0 at a different site.  You aren't running
> > the IBM driver for the tape drives are you?  I know that has caused
> > some problems for people.
> >
> > What does "sgscan -v conf" show?  When I run that it confirms that the
> > drive config does not come from the st.conf by putting
> > "NOT-IN-ST-CONFIG-FILE" at the end of each tape drive line . . .
> >
> > Scott Chapman
> > ICBC - Victoria, Government St.
> > Phone: 250.414.7650  Cell: 250.213.9295
> >
> >
> >
> > -----Original Message-----
> > From: Kathryn Hemness [mailto:kfhemness AT ucdavis DOT edu]
> > Sent: Tuesday, January 11, 2005 3:13 PM
> > To: K Chapman
> > Cc: Chapman, Scott; veritas-bu AT mailman.eng.auburn DOT edu;
> > song_1977 AT yahoo DOT com
> > Subject: RE: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
> >
> >
> >
> >
> > Turning off checkpoints was something I did early in my
> > troubleshooting attempts.
> >
> > I've just turned off a couple of Solaris storage management daemons
> > (ssdgrptd and ssagent) on my server and am running another test backup
> > now.  It should finish in about 15 more minutes.
> >
> > I'll try the NO_POSITION_CHECK after this test finishes and let you
> > know what happens.
> >
> >
> > On Tue, 11 Jan 2005, K Chapman wrote:
> >
> > > Date: Tue, 11 Jan 2005 14:54:31 -0800 (PST)
> > > From: K Chapman <tech2187 AT yahoo DOT com>
> > > To: Kathryn Hemness <kfhemness AT ucdavis DOT edu>,
> > >      "Chapman, Scott" <Scott.Chapman AT icbc DOT com>
> > > Cc: veritas-bu AT mailman.eng.auburn DOT edu, song_1977 AT yahoo DOT com
> > > Subject: RE: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
> > >
> > > as a test, can you try with the position check turned
> > > off?
> > >
> > > touch /opt/openv/netbackup/db/config/NO_POSITION_CHECK
> > >
> > > --- Kathryn Hemness <kfhemness AT ucdavis DOT edu> wrote:
> > >
> > > > Hi, Scott -
> > > >
> > > > Here's the output of my sgscan -v:
> > > >
> > > > /dev/sg/c0t3l2: Tape (/dev/rmt/0): "IBM ULTRIUM-TD2     4770"
> > > > /dev/sg/c0t3l3: Tape (/dev/rmt/1): "IBM ULTRIUM-TD2     4770"
> > > > /dev/sg/c0t3l4: Tape (/dev/rmt/2): "IBM ULTRIUM-TD2     4770"
> > > >
> > > > We got the library in October.  The drives should be
> > > > at the current FW level.
> > > >
> > > > Are you using NB51?
> > > >
> > > > On Tue, 11 Jan 2005, Chapman, Scott wrote:
> > > >
> > > > > Date: Tue, 11 Jan 2005 10:42:36 -0800
> > > > > From: "Chapman, Scott" <Scott.Chapman AT icbc DOT com>
> > > > > To: 'Kathryn Hemness' <kfhemness AT ucdavis DOT edu>,
> > > > >      veritas-bu AT mailman.eng.auburn DOT edu
> > > > > Cc: song_1977 AT yahoo DOT com
> > > > > Subject: RE: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
> > > > >
> > > > > Kathryn are you running current firmware on the LTO2 drives?  I seem to
> > > > > remember something about old firmware doing rewinds before netbackup was
> > > > > done with the drive . . .
> > > > > From your logs:
> > > > > 01/10/2005 13:48:50 albus.ucdavis.edu albus.ucdavis.edu  FREEZING media id 040004, External event caused rewind during write, all data on media is lost
> > > > >
> > > > > I am running IBM drives (we don't use the LSI logic HBA's) and here is some
> > > > > output from sgscan -v conf:
> > > > > /dev/sg/c2t0l0: Tape (/dev/rmt/0): "IBM ULTRIUM-TD2     38D0" : NOT-IN-ST-CONFIG-FILE
> > > > > /dev/sg/c2t1l0: Tape (/dev/rmt/1): "IBM ULTRIUM-TD2     38D0" : NOT-IN-ST-CONFIG-FILE
> > > > > /dev/sg/c2t2l0: Tape (/dev/rmt/2): "IBM ULTRIUM-TD2     38D0" : NOT-IN-ST-CONFIG-FILE
> > > > > ...
> > > > >
> > > > > I don't have anything in the st.conf for the drives as they were added
> > > > > to the st driver several patches ago.  You might check your st patch
> > > > > level as well . . .
> > > > >
> > > > > Hope this helps.
> > > > >
> > > > > Scott Chapman
> > > > > ICBC - Victoria, Government St.
> > > > > Phone: 250.414.7650  Cell: 250.213.9295
> > > > >
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: Kathryn Hemness [mailto:kfhemness AT ucdavis DOT edu]
> > > > > Sent: Tuesday, January 11, 2005 10:02 AM
> > > > > To: veritas-bu AT mailman.eng.auburn DOT edu
> > > > > Cc: song_1977 AT yahoo DOT com
> > > > > Subject: [Veritas-bu] RE:Veritas-bu] End of Tape (from Nov. 2004)
> > > > >
> > > > >
> > > > > Good Morning --
> > > > >
> > > > > Was there ever a resolution to your NB5.0MP2/LTO end of tape problem?
> > > > >
> > > > > I'm currently fighting with a new installation of NB5.1 on a Solaris 9
> > > > > system using LTO2 tape drives.  My backups ALWAYS fail either at a
> > > > > checkpoint-restart WRITE or at the very last WRITE of the backup,
> > > > > regardless of how big the backup is.
> > > > >
> > > > > I've been told by my NetBackup tech support (via Sun) that it was a
> > > > > hardware configuration problem.
> > > > >
> > > > > The backups always fail, regardless of any st.conf modifications, and
> > > > > I've even taken the fiber switch out of the mix.  Here's a summary of my
> > > > > hardware and the types of errors I'm seeing (by the way, ufsdump works
> > > > > just fine....).
> > > > >
> > > > > Master: Solaris 9 version 4/04 on a Sun V240 with 2 LSI Logic FC919X
> > > > > HBAs running NB5.1 Enterprise Server.  One LSI Logic HBA is connected
> > > > > directly to the fiber/scsi bridge of a Qualstar 88264 LTO2 library, the
> > > > > other to a Brocade 32-port fiber switch attached to a Sun 3511 storage
> > > > > array.
> > > > >
> > > > > I have tried at least 4 different st.conf LTO2 configurations with the
> > > > > same failing results and am now not using any special LTO2 definitions.
> > > > >
> > > > > Here are the failure errors from both the NetBackup reports and from the
> > > > > bptm logs:
> > > > >
> > > > > 01/10/2005 13:48:50 albus.ucdavis.edu albus.ucdavis.edu  FREEZING media id 040004, External event caused rewind during write, all data on media is lost
> > > > > 01/10/2005 13:48:54 albus.ucdavis.edu albus.ucdavis.edu  CLIENT albus.ucdavis.edu  POLICY IR-ISM_02  SCHED WeeklyFull  EXIT STATUS 84 (media write error)
> > > > > 01/10/2005 13:48:54 albus.ucdavis.edu albus.ucdavis.edu  backup of client albus.ucdavis.edu exited with status 84 (media write error)
> > > > >
> > > > > Here's the bptm log entry for the above error:
> > > > >
> > > > > 13:48:48.032 [1297] <2> write_backup: tp.tv_sec = 1105393728, stp.tv_sec = 1105391634, tp.tv_usec = 27455, stp.tv_usec = 544901, et = 2093483, mpx_total_kbytes[TWIN_INDEX = 0] = 21261376
> > > > > 13:48:48.075 [1297] <2> io_terminate_tape: writing empty backup header, drive index 0, copy 1
> > > > > 13:48:48.091 [1297] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.7919) on drive index 0
> > > > > 13:48:48.645 [1297] <2> io_write_back_header: drive index 0, empty_file, file num = 2, mpx_headers = 0, copy 1
> > > > > 13:48:48.650 [1297] <2> io_close: closing /usr/openv/netbackup/db/media/tpreq/040004, from bptm.c.8046
> > > > > 13:48:50.848 [1297] <2> io_terminate_tape: absolute block position prior to writing empty header is 332201, copy 1
> > > > > 13:48:50.848 [1297] <2> io_terminate_tape: block position check: actual 332201, expected 332213
> > > > > 13:48:50.848 [1297] <2> set_job_details: Sending Tfile jobid (907)
> > > > > 13:48:50.848 [1297] <2> set_job_details: LOG 1105393730 16 bptm 1297 FREEZING media id 040004, External event caused rewind during write, all data on media is lost
> > > > >
> > > > > 13:48:50.848 [1297] <2> set_job_details: Done
> > > > > 13:48:50.880 [1297] <16> io_terminate_tape: FREEZING media id 040004, External event caused rewind during write, all data on media is lost
> > > > > 13:48:50.898 [1297] <2> log_media_error: successfully wrote to error file - 01/10/05 13:48:50 040004 0 WRITE_ERROR
> > > > > 13:48:50.910 [1297] <2> check_error_history: called from bptm line 17870, EXIT_Status = 84
> > > > > 13:48:50.911 [1297] <2> check_error_history: drive index = 0, media id = 040004, time = 01/10/05 13:48:50, both_match = 0, media_match = 0, drive_match = 0
> > > > > 13:48:50.911 [1297] <2> tpunmount: Check_for_waiting = 0, No_tpunmount_after_restore = 0, Media_Unmount_Delay = 0, MediaOffset = 4
> > > > > 13:48:50.911 [1297] <2> tpunmount: tpunmount'ing /usr/openv/netbackup/db/media/tpreq/040004
> > > > >
> > > > >
> > > > > Since ufsdump works, this points to a NetBackup 5.1 problem.  Anyway, I
> > > > > notice that in your post-November posts you referred to NB4.5 servers.
> > > > > Did you have to downgrade NetBackup in order to get your LTO drives to
> > > > > work properly?
> > > >
> > > === message truncated ===
> > >
> > >
> > > =====
> > > aaarrrggghhh!!!!
> > > FreeBSD rocks
> > >
> > >
> > >
> > >
> >
> > --kathy
> >
> > ===============================================================================
> > Kathryn Hemness                        kfhemness AT ucdavis DOT edu
> > System Administrator                   phone: 530.752.6547
> > Campus Data Center & Client Services   fax:   530.752.9154
> >
>
> --kathy
>
> ===============================================================================
> Kathryn Hemness                        kfhemness AT ucdavis DOT edu
> System Administrator                   phone: 530.752.6547
> Campus Data Center & Client Services   fax:   530.752.9154
>

--kathy

===============================================================================
Kathryn Hemness                        kfhemness AT ucdavis DOT edu
System Administrator                   phone: 530.752.6547
Campus Data Center & Client Services   fax:   530.752.9154
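
A note on the sgscan output quoted earlier in this thread: Scott's "sgscan -v conf" lines end in NOT-IN-ST-CONFIG-FILE, while the "sgscan -v" lines carry no such suffix. The sketch below is one hedged way to compare the two. It assumes the line layout seen in those quoted outputs, with wrapped lines rejoined using a single space (the original padding inside the inquiry string is not recoverable). parse_sgscan and the dictionary keys are names made up for this example, not part of NetBackup.

```python
import re

# Assumed layout of a tape line, taken from the sgscan output quoted in this thread:
#   /dev/sg/c2t0l0: Tape (/dev/rmt/0): "IBM ULTRIUM-TD2     38D0" : NOT-IN-ST-CONFIG-FILE
# The trailing " : FLAG" part is optional; it is absent in the "sgscan -v" output.
TAPE_RE = re.compile(
    r'^(/dev/sg/\S+): Tape \((/dev/rmt/\d+)\): "([^"]*)"(?:\s*:\s*(\S+))?\s*$'
)


def parse_sgscan(output):
    """Return one dict per tape line; st_conf_flag is None when no suffix is present."""
    drives = []
    for line in output.splitlines():
        m = TAPE_RE.match(line.strip())
        if not m:
            continue  # skip non-tape lines and anything in an unexpected layout
        sg_path, rmt_path, inquiry, flag = m.groups()
        drives.append({
            "sg": sg_path,
            "rmt": rmt_path,
            "inquiry": " ".join(inquiry.split()),  # collapse padding in the inquiry string
            "st_conf_flag": flag,
        })
    return drives


# One line from each output quoted in the thread (wrapped lines rejoined).
sample = """\
/dev/sg/c2t0l0: Tape (/dev/rmt/0): "IBM ULTRIUM-TD2     38D0" : NOT-IN-ST-CONFIG-FILE
/dev/sg/c0t3l2: Tape (/dev/rmt/0): "IBM ULTRIUM-TD2     4770"
"""

for drive in parse_sgscan(sample):
    print(drive["sg"], drive["inquiry"], drive["st_conf_flag"])
```

A missing flag here only means the line had no suffix. It does not by itself show that the drive is defined in st.conf, since the suffix only appeared in the output of the "conf" form of the command.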