Veritas-bu

Re: [Veritas-bu] Strange behaviour on 6.5.6

2011-06-23 10:50:54
Subject: Re: [Veritas-bu] Strange behaviour on 6.5.6
From: Justin Piszcz <jpiszcz AT lucidpixels DOT com>
To: "Lightner, Jeff" <JLightner AT water DOT com>
Date: Thu, 23 Jun 2011 10:50:49 -0400 (EDT)

Jeff,

These are the most comprehensive notes I have seen on this subject, thank 
you very much for sharing!

Justin.

On Thu, 23 Jun 2011, Lightner, Jeff wrote:

> Here's some notes I used in NBU 6.5.x for clearing reserved items - the
> first is about clearing a single tape and the rest goes into more detail
> on releasing tapes and drives (so far I haven't had to do any of this in
> NBU 7.x):
>
> Issue with tape SU2230
>
> nbrbutil -dump |grep SU2230
>         MdsAllocation allocationKey=430288 jobType=2 mediaKey=4004367
>         mediaId=SU2230 driveKey=0 driveName= drivePath= stuName=
>         masterServerName=atubks01 mediaServerName=atudva01
>         ndmpTapeServerName= diskVolumeKey=0 mountKey=0 linkKey=0
>         fatPipeKey=0 scsiResType=0 serverStateFlags=0
>
> Ran: nbrbutil -releaseMDS 430288
> Next "nbrbutil -dump |grep SU2230" displayed nothing.
>
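The lookup above (dump, grep for the media ID, read off allocationKey) can be
sketched in Python. This is a minimal sketch and not part of the original
notes: the function name mds_allocation_keys is hypothetical, and it assumes
the MdsAllocation record layout shown in the dump above. It only reads saved
output of "nbrbutil -dump"; it does not run nbrbutil or release anything.

```python
import re

def mds_allocation_keys(dump_text, media_id):
    """Return the allocationKey of every MdsAllocation record for media_id."""
    keys = []
    # A record may be wrapped over several lines, so split on the record
    # keyword rather than on newlines.
    for record in dump_text.split("MdsAllocation")[1:]:
        fields = {}
        for name, value in re.findall(r"(\w+)=(\S*)", record):
            fields.setdefault(name, value)  # keep the first value seen
        key = fields.get("allocationKey", "")
        if fields.get("mediaId") == media_id and key.isdigit():
            keys.append(int(key))
    return keys

# Sample record copied from the dump in the notes above.
SAMPLE = """\
        MdsAllocation allocationKey=430288 jobType=2 mediaKey=4004367
        mediaId=SU2230 driveKey=0 driveName= drivePath= stuName=
        masterServerName=atubks01 mediaServerName=atudva01
        ndmpTapeServerName= diskVolumeKey=0 mountKey=0 linkKey=0
        fatPipeKey=0 scsiResType=0 serverStateFlags=0
"""

print(mds_allocation_keys(SAMPLE, "SU2230"))  # prints [430288]
```

The key it returns is the value that the notes pass to "nbrbutil -releaseMDS".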
> ================================================================================
> Older notes:
> NBRBUTIL Notes.
>
> From time to time a tape is assigned to a drive and for whatever reason it
> gets hung up. Thus the drive and tape are both in use as far as the shared
> storage subsystem is concerned.  This is observed in two different ways.
>
> The first I have seen is that there are queued vault jobs to duplicate, but
> they display the status messages:
>
> 08/22/2007 07:44:12 - requesting resource SU0822
> 08/22/2007 07:44:12 - reserving resource SU0822
>
> But never show the line
> 08/22/2007 11:29:23 - resource SU0822 reserved
>
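The symptom above (a "reserving resource" message with no matching "resource
... reserved" message) can be checked mechanically. This is a minimal sketch,
assuming the job status lines have been saved as text in the format shown
above; the function name unconfirmed_reservations is hypothetical.

```python
import re

def unconfirmed_reservations(status_lines):
    """Return resources with a 'reserving' message but no 'reserved' message."""
    pending = {}
    for line in status_lines:
        started = re.search(r"- reserving resource (\S+)", line)
        granted = re.search(r"- resource (\S+) reserved", line)
        if started:
            pending[started.group(1)] = line
        elif granted:
            pending.pop(granted.group(1), None)
    return list(pending)

# Status lines copied from the notes above.
hung = [
    "08/22/2007 07:44:12 - requesting resource SU0822",
    "08/22/2007 07:44:12 - reserving resource SU0822",
]
print(unconfirmed_reservations(hung))  # prints ['SU0822']

healthy = hung + ["08/22/2007 11:29:23 - resource SU0822 reserved"]
print(unconfirmed_reservations(healthy))  # prints []
```

A resource in the result is only a candidate: the tape may simply be busy with
another backup, which is what the vmoprcmd check is for.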
> Sometimes the tape is in use by another backup so running
> "vmoprcmd | grep hcart" will show if the tape is in a drive. If not it is
> most likely hung in the database and will need to be resolved.
>
> The second indication of problems was found by Rex using the command
> vmdareq | egrep -i "ult|scan".  This was showing lines like:
>
> Ultrium12 - RESERVED on Wed Aug 22 10:00:23 2007
> Ultrium13 - RESERVED on Tue Aug 21 20:17:01 2007
> Ultrium14 - RESERVED on Wed Aug 22 06:15:59 2007
>
> But there were no backups running from this time.  So this shows that the
> drive is locked in the DB and will need to be cleared as well.  Drives
> require that the tape ID be located first and that this allocation ID be
> used to release the lock.
>
>
> Clearing Drives
> Run both vmoprcmd | grep -i hcart and vmdareq | egrep -i "ult|reserved".
> Look for drives that are RESERVED but do not have tapes in the drive from a
> vmoprcmd standpoint.  Look for allocations that are out of place with the
> rest of the list.
>
> Ultrium1 - RESERVED on Wed Aug 22 05:15:23 2007
>     atubks01 RESERVED SCAN_HOST UP
> Ultrium10 - AVAILABLE
> Ultrium11 - RESERVED on Wed Aug 22 11:08:13 2007
>     atubks01 RESERVED SCAN_HOST UP
> Ultrium12 - RESERVED on Wed Aug 22 10:00:23 2007
>     atubks01 RESERVED UP
> Ultrium13 - RESERVED on Tue Aug 21 20:17:01 2007
>     atubks01 RESERVED UP
> Ultrium14 - RESERVED on Wed Aug 22 06:15:59 2007
>     atubks01 RESERVED SCAN_HOST UP
> Ultrium15 - RESERVED on Wed Aug 22 11:25:30 2007
>     atubks01 RESERVED SCAN_HOST UP
> Ultrium16 - RESERVED on Tue Aug 21 20:13:26 2007
>     atubks01 RESERVED SCAN_HOST UP
> Ultrium17 - RESERVED on Wed Aug 22 04:59:29 2007
>     atubks01 RESERVED SCAN_HOST UP
> Ultrium18 - RESERVED on Wed Aug 22 11:29:23 2007
>     atubks01 RESERVED SCAN_HOST UP
> Ultrium19 - RESERVED on Wed Aug 22 11:39:36 2007
>     atubks01 RESERVED SCAN_HOST UP
> Ultrium2 - RESERVED on Wed Aug 22 09:59:57 2007
>     atubks01 RESERVED UP
> Ultrium20 - RESERVED on Wed Aug 22 07:31:56 2007
>     atubks01 RESERVED UP
> Ultrium3 - RESERVED on Tue Aug 21 20:13:05 2007
>     atubks01 RESERVED UP
> Ultrium4 - RESERVED on Wed Aug 22 11:29:22 2007
>     atubks01 RESERVED SCAN_HOST UP
> Ultrium5 - RESERVED on Wed Aug 22 02:00:00 2007
>     atubks01 RESERVED UP
> Ultrium6 - RESERVED on Wed Aug 22 10:38:42 2007
>     atubks01 RESERVED SCAN_HOST UP
> Ultrium7 - RESERVED on Wed Aug 22 07:31:56 2007
>     atubks01 RESERVED UP
> Ultrium8 - RESERVED on Tue Aug 21 20:16:44 2007
>     atubks01 RESERVED SCAN_HOST UP
> Ultrium9 - RESERVED on Wed Aug 22 11:39:40 2007
>     atubks01 RESERVED SCAN_HOST UP
>
> Ultrium1                 Yes     Yes    SU1142  SU1142  Yes       hcart
> Ultrium10                No      No                     No        hcart
> Ultrium11                Yes     Yes    SU1092  SU1092  Yes       hcart
> Ultrium12                Yes     Yes    SU2619  SU2619  Yes       hcart
> Ultrium13                No      No                     No        hcart
> Ultrium14                Yes     Yes    SU1150  SU1150  Yes       hcart
> Ultrium15                Yes     Yes    SU2427  SU2427  Yes       hcart
> Ultrium16                No      No                     No        hcart
> Ultrium17                Yes     Yes    SU2783  SU2783  Yes       hcart
> Ultrium18                Yes     Yes    SU0822  SU0822  Yes       hcart
> Ultrium19                Yes     Yes    SU1283  SU1283  Yes       hcart
> Ultrium2                 Yes     Yes    SU2246  SU2246  Yes       hcart
> Ultrium20                Yes     Yes    SU1731  SU1731  Yes       hcart
> Ultrium3                 No      No                     No        hcart
> Ultrium4                 Yes     Yes    SU2182  SU2182  Yes       hcart
> Ultrium5                 Yes     Yes    SU1006  SU1006  Yes       hcart
> Ultrium6                 Yes     Yes    SU1414  SU1414  Yes       hcart
> Ultrium7                 Yes     Yes    SU2276  SU2276  Yes       hcart
> Ultrium8                 No      No                     No        hcart
> Ultrium9                 Yes     Yes    SU0468  SU0468  Yes       hcart
>
> In the example output above drives Ultrium3, Ultrium8, Ultrium13 and
> Ultrium16 have reservation times on them from around 20:00 the night before.
> Vmoprcmd shows that the drives do not have tapes in them either but the
> drives are reserved for use.  To count as hung, the reserved time must be at
> least 16 hours earlier than the current time AND no backups that started at
> that time may still be running.  Running nbrbutil and looking for one of the
> drives shows the following:
>
> # nbrbutil -dump | grep Ultrium3
>         index=21 (Allocation
> allocation={74CDE574-1DD2-11B2-B328-000F20687028}
> provider=ReservationGroupProvider resourcename=SU0637
> masterserver=atubks01
> groupid={74CDE1DC-1DD2-11B2-8032-000F20687028} userSequence=1
> userid="jobid=208398" (Media_Drive_Allocation_Record:
> AllocationKey=28844
> (Media_Drive_Record:  MediaKey=4002684 MediaId=SU0637
> MediaServer=atubks01
> DriveKey=2000046 DriveName=Ultrium3 PrimaryPath=/dev/rmt/22mnb
> PoolName=Full
> RobotNum=2 RobotType=8 MediaTypeName=NetBackup HCART1
> DriveTypeName=NetBackup
> HCART1 NdmpControlHost= RetentionLevel=0 PolicyType=2 JobType=10
> MasterServer=atubks01) (Storage_Unit_Record:  name= MasterServer=
> MediaServer= STUType=0 RobotType=0 RobotNumber=0 Density=0
> OnDemandOnly=0
> ConcurrentJobs=0 ActiveJobs=0 MaxMultiplexing=0 NdmpAttachHost=
> AbsolutePath=) (Bptm_Strings_Record: 0="MEDIADB 1 28844 SU0637 4002684
> ------
> 6 1186882203 1187578805 1190257205 1187618390 50030944 5 5 3 10 0 0 1024
> 0
> 195447 0" 1="VOLUME 1 SU0637 4002684 SU0637 Full FUJIFILM 02DO114064 6 8
> 2
> 442 0 {00000000-0000-0000-0000-000000000000} 0" 2="DRIVE 2 Ultrium3
> 2000046
> IE71K03187 /dev/rmt/22mnb -1 -1 -1 -1 0 0 0 0 *NULL* *NULL* *NULL*
> *NULL* 1
> 16" 3="STORAGE 0 *NULL* 0 0" ) TpReqFileName=))
>        MdsAllocation allocationKey=28844 jobType=10 mediaKey=4002684
> mediaId=SU0637 driveKey=2000046 driveName=Ultrium3
> drivePath=/dev/rmt/22mnb
> stuName= masterServerName=atubks01 mediaServerName=atubks0
>
> I see that the media ID is SU0637 in the above example, so I use the process
> below for releasing media.  In the case where there is no media ID listed
> (mediaId= driveKey=.) for all dumped entries, I use the allocation ID
> listed to release the drive using the release procedures below.
>
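The drive check described above (RESERVED per vmdareq, no tape per vmoprcmd,
reservation old enough) can be sketched as a cross-reference of the two
outputs. This is a minimal sketch with several assumptions: all function
names are hypothetical; a loaded drive row in the vmoprcmd listing is assumed
to have seven columns and an empty one fewer, as in the sample above; and the
16-hour rule is read as "reserved at least 16 hours ago". It only flags
candidates and does not replace the manual checks in these notes.

```python
import re
from datetime import datetime, timedelta

RESERVED = re.compile(r"(\S+) - RESERVED on (.+)")

def reserved_drives(vmdareq_text):
    """Map drive name -> reservation time for every RESERVED drive."""
    drives = {}
    for line in vmdareq_text.splitlines():
        m = RESERVED.match(line.strip())
        if m:
            stamp = m.group(2).strip()
            drives[m.group(1)] = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    return drives

def empty_drives(vmoprcmd_text):
    """Return names of hcart drives whose row carries no media IDs."""
    names = set()
    for line in vmoprcmd_text.splitlines():
        fields = line.split()
        # Assumption: loaded rows have 7 columns, empty rows have fewer.
        if fields and fields[-1] == "hcart" and len(fields) < 7:
            names.add(fields[0])
    return names

def stale_candidates(vmdareq_text, vmoprcmd_text, now, min_age_hours=16):
    """Drives that are reserved, empty, and reserved for min_age_hours or more."""
    cutoff = now - timedelta(hours=min_age_hours)
    empty = empty_drives(vmoprcmd_text)
    reserved = reserved_drives(vmdareq_text)
    return sorted(d for d, since in reserved.items() if d in empty and since <= cutoff)

# Abridged samples copied from the outputs above.
VMDAREQ = """\
Ultrium1 - RESERVED on Wed Aug 22 05:15:23 2007
    atubks01 RESERVED SCAN_HOST UP
Ultrium10 - AVAILABLE
Ultrium13 - RESERVED on Tue Aug 21 20:17:01 2007
    atubks01 RESERVED UP
Ultrium3 - RESERVED on Tue Aug 21 20:13:05 2007
    atubks01 RESERVED UP
"""
VMOPRCMD = """\
Ultrium1                 Yes     Yes    SU1142  SU1142  Yes       hcart
Ultrium10                No      No                     No        hcart
Ultrium13                No      No                     No        hcart
Ultrium3                 No      No                     No        hcart
"""

# The "now" value is hypothetical; in practice pass datetime.now().
print(stale_candidates(VMDAREQ, VMOPRCMD, datetime(2007, 8, 22, 13, 0, 0)))
# prints ['Ultrium13', 'Ultrium3']
```

Ultrium10 is empty but AVAILABLE, and Ultrium1 is reserved but loaded, so
neither is reported. Whether a backup that started at the reservation time is
still running has to be checked separately.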
> Clearing Tapes
> To clear allocation holds for a tape do the following:
>
> # nbrbutil -dump | grep <media id> or <drive name>
>
> # nbrbutil -dump |grep SU0822
>         index=1 (Request provider=DriveOperationProvider
> resourcename=MEDIA
> RESOURCE  userSequence=1 (MediaRequest: mediaId=SU0822
> mediaServer=atubks01
> mediaKey=0 userReservationId= assignedTime=0 client= usageType=10
> mustBeNdmp=no driveName= drivePath= mediaPool= robotNumber=-1
> slotNumber=-1
> density=-1 ndmpControlHost= failIfNoMedia=yes externalFile=))
>         index=2 (Request provider=ReservationGroupProvider
> resourcename=SU0822  userSequence=2 (MediaReservationRequest:
> mountCount=1
> request=(MediaRequest: mediaId=SU0822 mediaServer=atubks01 mediaKey=0
> userReservationId= assignedTime=0 client= usageType=10 mustBeNdmp=no
> driveName= drivePath= mediaPool= robotNumber=-1 slotNumber=-1 density=-1
> ndmpControlHost= failIfNoMedia=yes externalFile=))))
>         index=5 (Allocation
> allocation={11B5E9AA-1DD2-11B2-B9FA-000F20687028}
> provider=ReservationGroupProvider resourcename= masterserver=atubks01
> groupid={11B5E734-1DD2-11B2-82D0-000F20687028} userSequence=2
> userid="jobid=208396" (MediaReservation: mountCount=1
> reservationKey=28789
> request=(MediaRequest: mediaId=SU0822 mediaServer=atubks01
> mediaKey=4003252
> userReservationId= assignedTime=0 client= usageType=0 mustBeNdmp=no
> driveName= drivePath= mediaPool= robotNumber=-1 slotNumber=-1 density=-1
> ndmpControlHost= failIfNoMedia=no externalFile=)))
>         index=48 (Allocation
> allocation={11B5EAD6-1DD2-11B2-85BC-000F20687028}
> provider=ReservationGroupProvider resourcename=SU0822
> masterserver=atubks01
> groupid={11B5E734-1DD2-11B2-82D0-000F20687028} userSequence=1
> userid="jobid=208396" (Media_Drive_Allocation_Record:
> AllocationKey=28791
> (Media_Drive_Record:  MediaKey=4003252 MediaId=SU0822
> MediaServer=atubks01
> DriveKey=2000053 DriveName=Ultrium10 PrimaryPath=/dev/rmt/29mnb
> PoolName=Full
> RobotNum=2 RobotType=8 MediaTypeName=NetBackup HCART1
> DriveTypeName=NetBackup
> HCART1 NdmpControlHost= RetentionLevel=0 PolicyType=2 JobType=10
> MasterServer=atubks01) (Storage_Unit_Record:  name= MasterServer=
> MediaServer= STUType=0 RobotType=0 RobotNumber=0 Density=0
> OnDemandOnly=0
> ConcurrentJobs=0 ActiveJobs=0 MaxMultiplexing=0 NdmpAttachHost=
> AbsolutePath=) (Bptm_Strings_Record: 0="MEDIADB 1 28791 SU0822 4003252
> ------
> 6 1185750016 1187564415 1190242815 1187106913 157741280 4 3 3 10 0 0
> 1024 0
> 2464718 0" 1="VOLUME 1 SU0822 4003252 SU0822 Full FUJIFILM 02DO112184 6
> 8 2
> 126 0 {00000000-0000-0000-0000-000000000000} 0" 2="DRIVE 2 Ultrium10
> 2000053
> IE72K00060 /dev/rmt/29mnb -1 -1 -1 -1 0 0 0 0 *NULL* *NULL* *NULL*
> *NULL* 1
> 16" 3="STORAGE 0 *NULL* 0 0" ) TpReqFileName=))
>        MdsAllocation allocationKey=28789 jobType=14 mediaKey=4003252
> mediaId=SU0822 driveKey=0 driveName= drivePath= stuName=
> masterServerName=atubks01 mediaServerName= ndmpTapeServerName=
>        MdsAllocation allocationKey=28791 jobType=10 mediaKey=4003252
> mediaId=SU0822 driveKey=2000053 driveName=Ultrium10
> drivePath=/dev/rmt/29mnb
> stuName= masterServerName=atubks01 mediaServerName=atubks01
> ndmpTapeServerName=
>
> Look at the output above and the bold highlights.  This shows that the
> entries are for this tape.  Now look and find the userid="jobid=208396"
> field.  This is showing that this jobid has an exclusive lock on this tape
> ID.  It shows the drive as well.  A quick check of the active jobs in the
> GUI or with the "bpdbjobs | grep Active" command will show that this is not
> an active job but a hung allocation.  So this needs to be released.
>
> Always make sure to verify the following BEFORE attempting to release a lock!
> 1.      The listed jobid is not an active job!
> 2.      The associated tape is not in a tape drive.
> 3.      There are no tapes in the listed drive.
> 4.      The tape drive is UP on all servers.
>
> Failure to make sure the above criteria are met can result in failed
> backups and possibly media corruption.  If there is ANY doubt, do not do the
> operation and get with one of the Unix Admins.
>
> Releasing the lock.
> Find the allocation= field and copy its value for a later paste operation.
> You want the allocation ID for the entry that has MediaReservation: inside
> the second set of parentheses.  The tape drive allocation will have
> Media_Drive_Allocation_Record.
>
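Picking the allocation ID out of a wrapped dump by eye is error-prone, so the
step above can be sketched as a small parser. This is a minimal sketch,
assuming the "(Allocation allocation={...} ... userid="jobid=N" (RecordType:"
layout seen in the dump above; the name list_allocations is hypothetical. It
lists each allocation ID with its job ID and record type, and leaves the
choice of which one to release, and the checks before doing so, to the
operator.

```python
import re

ALLOCATION = re.compile(
    r'\(Allocation\s+allocation=\{([0-9A-Fa-f-]+)\}'  # allocation ID
    r'.*?userid="jobid=(\d+)"'                         # owning job ID
    r'\s+\((\w+):',                                    # record type
    re.S,  # entries are wrapped, so let "." cross line breaks
)

def list_allocations(dump_text):
    """Return (allocation_id, job_id, record_type) for each Allocation entry."""
    return [(guid, int(job), kind) for guid, job, kind in ALLOCATION.findall(dump_text)]

# Abridged entries copied from the SU0822 dump above.
SAMPLE = """\
        index=5 (Allocation
allocation={11B5E9AA-1DD2-11B2-B9FA-000F20687028}
provider=ReservationGroupProvider resourcename= masterserver=atubks01
groupid={11B5E734-1DD2-11B2-82D0-000F20687028} userSequence=2
userid="jobid=208396" (MediaReservation: mountCount=1
reservationKey=28789
        index=48 (Allocation
allocation={11B5EAD6-1DD2-11B2-85BC-000F20687028}
provider=ReservationGroupProvider resourcename=SU0822
masterserver=atubks01
groupid={11B5E734-1DD2-11B2-82D0-000F20687028} userSequence=1
userid="jobid=208396" (Media_Drive_Allocation_Record:
AllocationKey=28791
"""

for allocation_id, job_id, record_type in list_allocations(SAMPLE):
    print(allocation_id, job_id, record_type)
```

Both entries in the sample belong to job 208396, which is the job ID to check
against the list of active jobs before releasing anything.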
> The following command does the release of the media and the drive.
>
> # nbrbutil -cancel <allocation=>
>
>
> # nbrbutil -cancel 11B5EAD6-1DD2-11B2-85BC-000F20687028
> No request with ID {11B5EAD6-1DD2-11B2-85BC-000F20687028} found
> # nbrbutil -release 11B5EAD6-1DD2-11B2-85BC-000F20687028
> (releasing (Allocation allocation={11B5EAD6-1DD2-11B2-85BC-000F20687028}
> provider=ReservationGroupProvider resourcename=SU0822
> masterserver=atubks01
> groupid={11B5E734-1DD2-11B2-82D0-000F20687028} userSequence=1
> userid="jobid=208396" (Media_Drive_Allocation_Record:
> AllocationKey=28791
> (Media_Drive_Record:  MediaKey=4003252 MediaId=SU0822
> MediaServer=atubks01
> DriveKey=2000053 DriveName=Ultrium10 PrimaryPath=/dev/rmt/29mnb
> PoolName=Full
> RobotNum=2 RobotType=8 MediaTypeName=NetBackup HCART1
> DriveTypeName=NetBackup
> HCART1 NdmpControlHost= RetentionLevel=0 PolicyType=2 JobType=10
> MasterServer=atubks01) (Storage_Unit_Record:  name= MasterServer=
> MediaServer= STUType=0 RobotType=0 RobotNumber=0 Density=0
> OnDemandOnly=0
> ConcurrentJobs=0 ActiveJobs=0 MaxMultiplexing=0 NdmpAttachHost=
> AbsolutePath=) (Bptm_Strings_Record: 0="MEDIADB 1 28791 SU0822 4003252
> ------
> 6 1185750016 1187564415 1190242815 1187106913 157741280 4 3 3 10 0 0
> 1024 0
> 2464718 0" 1="VOLUME 1 SU0822 4003252 SU0822 Full FUJIFILM 02DO112184 6
> 8 2
> 126 0 {00000000-0000-0000-0000-000000000000} 0" 2="DRIVE 2 Ultrium10
> 2000053
> IE72K00060 /dev/rmt/29mnb -1 -1 -1 -1 0 0 0 0 *NULL* *NULL* *NULL*
> *NULL* 1
> 16" 3="STORAGE 0 *NULL* 0 0" ) TpReqFileName=)))
>
> Now run the dump process again.  If a job was waiting on the media, the
> information will update accordingly.
>
> # nbrbutil -dump | grep SU0822
>         index=8 (Allocation
> allocation={75BE0D58-1DD2-11B2-B723-000F20687028}
> provider=ReservationGroupProvider resourcename=SU0822
> masterserver=atubks01
> groupid={75BE0966-1DD2-11B2-8B2C-000F20687028} userSequence=1
> userid="jobid=208516" (Media_Drive_Allocation_Record:
> AllocationKey=29065
> (Media_Drive_Record:  MediaKey=4003252 MediaId=SU0822
> MediaServer=atubks01
> DriveKey=2000061 DriveName=Ultrium18 PrimaryPath=/dev/rmt/15mnb
> PoolName=Full
> RobotNum=2 RobotType=8 MediaTypeName=NetBackup HCART1
> DriveTypeName=NetBackup
> HCART1 NdmpControlHost= RetentionLevel=0 PolicyType=2 JobType=10
> MasterServer=atubks01) (Storage_Unit_Record:  name= MasterServer=
> MediaServer= STUType=0 RobotType=0 RobotNumber=0 Density=0
> OnDemandOnly=0
> ConcurrentJobs=0 ActiveJobs=0 MaxMultiplexing=0 NdmpAttachHost=
> AbsolutePath=) (Bptm_Strings_Record: 0="MEDIADB 1 29065 SU0822 4003252
> ------
> 6 1185750016 1187564415 1190242815 1187106913 157741280 4 3 3 10 0 0
> 1024 0
> 2464718 0" 1="VOLUME 1 SU0822 4003252 SU0822 Full FUJIFILM 02DO112184 6
> 8 2
> 126 0 {00000000-0000-0000-0000-000000000000} 0" 2="DRIVE 2 Ultrium18
> 2000061
> HUB4C01W8H /dev/rmt/15mnb -1 -1 -1 -1 0 0 0 0 *NULL* *NULL* *NULL*
> *NULL* 1
> 16" 3="STORAGE 0 *NULL* 0 0" ) TpReqFileName=))
>         index=34 (Allocation
> allocation={75BE0BDC-1DD2-11B2-83E1-000F20687028}
> provider=ReservationGroupProvider resourcename= masterserver=atubks01
> groupid={75BE0966-1DD2-11B2-8B2C-000F20687028} userSequence=2
> userid="jobid=208516" (MediaReservation: mountCount=1
> reservationKey=29063
> request=(MediaRequest: mediaId=SU0822 mediaServer=atubks01
> mediaKey=4003252
> userReservationId= assignedTime=0 client= usageType=0 mustBeNdmp=no
> driveName= drivePath= mediaPool= robotNumber=-1 slotNumber=-1 density=-1
> ndmpControlHost= failIfNoMedia=no externalFile=)))
>        MdsAllocation allocationKey=29063 jobType=14 mediaKey=4003252
> mediaId=SU0822 driveKey=0 driveName= drivePath= stuName=
> masterServerName=atubks01 mediaServerName= ndmpTapeServerName=
>        MdsAllocation allocationKey=29065 jobType=10 mediaKey=4003252
> mediaId=SU0822 driveKey=2000061 driveName=Ultrium18
> drivePath=/dev/rmt/15mnb
> stuName= masterServerName=atubks01 mediaServerName=atubks01
> ndmpTapeServerName=
>
> -----Original Message-----
> From: veritas-bu-bounces AT mailman.eng.auburn DOT edu
> [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Justin
> Piszcz
> Sent: Thursday, June 23, 2011 8:04 AM
> To: Patrick
> Cc: veritas-bu AT mailman.eng.auburn DOT edu
> Subject: Re: [Veritas-bu] Strange behaviour on 6.5.6
>
> Hi,
>
> Also:
>
>> While this is not the worst of our problems it may be symptomatic of the
>> overall problem. We get these errors on a daily basis:
>> 2, 5, 6, 10, 12, 13, 23, 24, 25, 40, 42, 47, 50, 52, 54, 58, 59, 63, 71,
>> 156, 160, 191, 196, 219, 232, 239, 249, 2001, 2009
> Something sounds REALLY broken with this environment. Is DNS working
> properly? Are catalog backups running successfully?
>
> Also your vmquery shows:
> media ID:              ABS287
> barcode:               VTABS287
> media description:     Virtual tapes from VTL
> robot type:            TLD - Tape Library DLT (8)
>
> If it's a physical tape why does it have a virtual tape label for the
> barcode?
>
> Are you just stepping into this environment?
> How many policies/etc are there?
> I wonder if it would be worth starting over if there are that many issues
> and if you don't have any infinite retention data/etc.
>
> Justin.
>
>
>
>
> On Thu, 23 Jun 2011, Justin Piszcz wrote:
>
>>
>> On Thu, 23 Jun 2011, Patrick wrote:
>>
>>> Hi All,
>>>
>>>
>>>
>>> I am at a new site and it is in very bad shape. One thing I would like to
>>> ask:
>>>
>>
>> Hi,
>>
>> What does nbrbutil -dump | grep ABS287
>> say?
>>
>> Have you tried cycling/bouncing the environment, e.g. if there is a request
>> associated with that resource (ABS287)?
>>
>> Justin.
>>
>>
> _______________________________________________
> Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
>
