Veritas-bu

Re: [Veritas-bu] Strange behaviour on 6.5.6

2011-06-23 08:47:42
Subject: Re: [Veritas-bu] Strange behaviour on 6.5.6
From: "Lightner, Jeff" <JLightner AT water DOT com>
To: "Justin Piszcz" <jpiszcz AT lucidpixels DOT com>, "Patrick" <netbackup AT whelan-consulting.co DOT uk>
Date: Thu, 23 Jun 2011 08:47:35 -0400
Here are some notes I used in NBU 6.5.x for clearing reserved items. The
first is about clearing a single tape; the rest goes into more detail on
releasing tapes and drives (so far I haven't had to do any of this in
NBU 7.x):

Issue with tape SU2230 

 nbrbutil -dump |grep SU2230
         MdsAllocation allocationKey=430288 jobType=2 mediaKey=4004367
         mediaId=SU2230 driveKey=0 driveName= drivePath= stuName=
         masterServerName=atubks01 mediaServerName=atudva01
         ndmpTapeServerName= diskVolumeKey=0 mountKey=0 linkKey=0
         fatPipeKey=0 scsiResType=0 serverStateFlags=0

Ran: nbrbutil -releaseMDS 430288
Next, "nbrbutil -dump |grep SU2230" displayed nothing.
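The single-tape fix above has three steps. Find the MdsAllocation record for the media ID, take its allocationKey, and pass that key to nbrbutil -releaseMDS. Below is a minimal Python sketch of the lookup step. It assumes the dump prints each MdsAllocation record on one line as whitespace-separated key=value pairs (the list archive wraps them). The function names are hypothetical, and the sketch only builds the command text; it does not run it.

```python
import re

# key=value tokens; an empty value such as "driveName=" yields "".
PAIR = re.compile(r"(\w+)=(\S*)")


def mds_keys_for_media(dump_text: str, media_id: str) -> list[str]:
    """allocationKey of every MdsAllocation line whose mediaId matches."""
    keys = []
    for line in dump_text.splitlines():
        if not line.lstrip().startswith("MdsAllocation"):
            continue
        fields = dict(PAIR.findall(line))
        if fields.get("mediaId") == media_id:
            keys.append(fields["allocationKey"])
    return keys


def release_mds_commands(dump_text: str, media_id: str) -> list[str]:
    """Command lines to review (and run by hand) for each matching key."""
    return [f"nbrbutil -releaseMDS {key}"
            for key in mds_keys_for_media(dump_text, media_id)]


# One-line version of the SU2230 record shown above.
dump = ("MdsAllocation allocationKey=430288 jobType=2 mediaKey=4004367 "
        "mediaId=SU2230 driveKey=0 driveName= drivePath= stuName= "
        "masterServerName=atubks01 mediaServerName=atudva01")
print(release_mds_commands(dump, "SU2230"))  # ['nbrbutil -releaseMDS 430288']
```

Re-running the dump afterwards is still the check that the hold is gone.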

================================================================================
Older notes:
NBRBUTIL Notes.

From time to time a tape is assigned to a drive and, for whatever reason,
gets hung up. The drive and tape are then both in use as far as the shared
storage subsystem is concerned. This shows up in two different ways.

The first sign I have seen is queued vault duplication jobs that display
these status messages:

08/22/2007 07:44:12 - requesting resource SU0822
08/22/2007 07:44:12 - reserving resource SU0822

They never show the line:
08/22/2007 11:29:23 - resource SU0822 reserved

Sometimes the tape is in use by another backup, so run "vmoprcmd | grep hcart"
to see whether the tape is in a drive. If it is not, the reservation is most
likely hung in the database and will need to be resolved.
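The first symptom can be spotted mechanically: a "reserving resource" line with no later "resource ... reserved" line. Below is a minimal sketch. It assumes the job's detailed status lines have been saved as text in the "date time - message" form shown above. The function name is hypothetical. A hit only means the tape should then be checked with vmoprcmd as described.

```python
import re

RESERVING = re.compile(r" - reserving resource (\S+)")
RESERVED = re.compile(r" - resource (\S+) reserved")


def unreserved_resources(job_log: str) -> list[str]:
    """Resources the job began reserving but never reported as reserved."""
    pending: list[str] = []
    for line in job_log.splitlines():
        m = RESERVING.search(line)
        if m:
            if m.group(1) not in pending:
                pending.append(m.group(1))
            continue
        m = RESERVED.search(line)
        if m and m.group(1) in pending:
            pending.remove(m.group(1))
    return pending


stuck = ("08/22/2007 07:44:12 - requesting resource SU0822\n"
         "08/22/2007 07:44:12 - reserving resource SU0822\n")
done = stuck + "08/22/2007 11:29:23 - resource SU0822 reserved\n"
print(unreserved_resources(stuck))  # ['SU0822']
print(unreserved_resources(done))   # []
```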

The second indication of problems was found by Rex with the command
vmdareq | egrep -i "ult|scan", which showed lines like:

Ultrium12 - RESERVED on Wed Aug 22 10:00:23 2007
Ultrium13 - RESERVED on Tue Aug 21 20:17:01 2007
Ultrium14 - RESERVED on Wed Aug 22 06:15:59 2007

But no backups that started at those times were still running, which shows
that the drives are locked in the database and will need to be cleared as
well. To release a drive, first locate the tape (media) ID associated with
it, then use that allocation's ID to release the lock.


Clearing Drives
Run both vmoprcmd | grep -i hcart and vmdareq | egrep -i "ult|reserved".
Look for drives that vmdareq shows as RESERVED but that have no tape in them
according to vmoprcmd. Also look for reservation times that are out of line
with the rest of the list.

Ultrium1 - RESERVED on Wed Aug 22 05:15:23 2007
     atubks01 RESERVED SCAN_HOST UP
Ultrium10 - AVAILABLE
Ultrium11 - RESERVED on Wed Aug 22 11:08:13 2007
     atubks01 RESERVED SCAN_HOST UP
Ultrium12 - RESERVED on Wed Aug 22 10:00:23 2007
     atubks01 RESERVED UP
Ultrium13 - RESERVED on Tue Aug 21 20:17:01 2007
     atubks01 RESERVED UP
Ultrium14 - RESERVED on Wed Aug 22 06:15:59 2007
     atubks01 RESERVED SCAN_HOST UP
Ultrium15 - RESERVED on Wed Aug 22 11:25:30 2007
     atubks01 RESERVED SCAN_HOST UP
Ultrium16 - RESERVED on Tue Aug 21 20:13:26 2007
     atubks01 RESERVED SCAN_HOST UP
Ultrium17 - RESERVED on Wed Aug 22 04:59:29 2007
     atubks01 RESERVED SCAN_HOST UP
Ultrium18 - RESERVED on Wed Aug 22 11:29:23 2007
     atubks01 RESERVED SCAN_HOST UP
Ultrium19 - RESERVED on Wed Aug 22 11:39:36 2007
     atubks01 RESERVED SCAN_HOST UP
Ultrium2 - RESERVED on Wed Aug 22 09:59:57 2007
     atubks01 RESERVED UP
Ultrium20 - RESERVED on Wed Aug 22 07:31:56 2007
     atubks01 RESERVED UP
Ultrium3 - RESERVED on Tue Aug 21 20:13:05 2007
     atubks01 RESERVED UP
Ultrium4 - RESERVED on Wed Aug 22 11:29:22 2007
     atubks01 RESERVED SCAN_HOST UP
Ultrium5 - RESERVED on Wed Aug 22 02:00:00 2007
     atubks01 RESERVED UP
Ultrium6 - RESERVED on Wed Aug 22 10:38:42 2007
     atubks01 RESERVED SCAN_HOST UP
Ultrium7 - RESERVED on Wed Aug 22 07:31:56 2007
     atubks01 RESERVED UP
Ultrium8 - RESERVED on Tue Aug 21 20:16:44 2007
     atubks01 RESERVED SCAN_HOST UP
Ultrium9 - RESERVED on Wed Aug 22 11:39:40 2007
     atubks01 RESERVED SCAN_HOST UP

Ultrium1                 Yes     Yes    SU1142  SU1142  Yes       hcart
Ultrium10                No      No                     No        hcart
Ultrium11                Yes     Yes    SU1092  SU1092  Yes       hcart
Ultrium12                Yes     Yes    SU2619  SU2619  Yes       hcart
Ultrium13                No      No                     No        hcart
Ultrium14                Yes     Yes    SU1150  SU1150  Yes       hcart
Ultrium15                Yes     Yes    SU2427  SU2427  Yes       hcart
Ultrium16                No      No                     No        hcart
Ultrium17                Yes     Yes    SU2783  SU2783  Yes       hcart
Ultrium18                Yes     Yes    SU0822  SU0822  Yes       hcart
Ultrium19                Yes     Yes    SU1283  SU1283  Yes       hcart
Ultrium2                 Yes     Yes    SU2246  SU2246  Yes       hcart
Ultrium20                Yes     Yes    SU1731  SU1731  Yes       hcart
Ultrium3                 No      No                     No        hcart
Ultrium4                 Yes     Yes    SU2182  SU2182  Yes       hcart
Ultrium5                 Yes     Yes    SU1006  SU1006  Yes       hcart
Ultrium6                 Yes     Yes    SU1414  SU1414  Yes       hcart
Ultrium7                 Yes     Yes    SU2276  SU2276  Yes       hcart
Ultrium8                 No      No                     No        hcart
Ultrium9                 Yes     Yes    SU0468  SU0468  Yes       hcart
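Given listings like the two above (vmdareq first, then vmoprcmd), the drive cross-check can be sketched in Python. The sketch makes these assumptions:

- vmdareq reservation lines look like "Ultrium12 - RESERVED on Wed Aug 22 10:00:23 2007".
- vmoprcmd drive lines follow the column layout shown: drive name, two Yes/No columns, two media-ID columns (blank for an empty drive), a Yes/No column, then the media type.
- The 16-hour cutoff is the rule these notes apply below.

The function names are hypothetical. The sketch does not check the other condition, that no backup started at that time is still running. That still has to be confirmed separately.

```python
import re
from datetime import datetime, timedelta

RESERVED = re.compile(
    r"^(\S+) - RESERVED on (\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4})")


def reserved_since(vmdareq_text: str) -> dict[str, datetime]:
    """Drive name -> reservation time, from vmdareq RESERVED lines."""
    since = {}
    for line in vmdareq_text.splitlines():
        m = RESERVED.match(line.strip())
        if m:
            since[m.group(1)] = datetime.strptime(m.group(2),
                                                  "%a %b %d %H:%M:%S %Y")
    return since


def empty_drives(vmoprcmd_text: str, media_type: str = "hcart") -> set[str]:
    """Drives of media_type whose vmoprcmd line shows no media ID."""
    empty = set()
    for line in vmoprcmd_text.splitlines():
        tokens = line.split()
        # In the sample layout an empty drive line has 5 tokens;
        # a loaded drive adds two media-ID columns.
        if tokens and tokens[-1] == media_type and len(tokens) < 7:
            empty.add(tokens[0])
    return empty


def suspect_drives(vmdareq_text: str, vmoprcmd_text: str, now: datetime,
                   min_age: timedelta = timedelta(hours=16)) -> list[str]:
    """Drives that are reserved, empty, and reserved for at least min_age."""
    since = reserved_since(vmdareq_text)
    return sorted(d for d in empty_drives(vmoprcmd_text)
                  if d in since and now - since[d] >= min_age)


vmdareq = """Ultrium10 - AVAILABLE
Ultrium12 - RESERVED on Wed Aug 22 10:00:23 2007
Ultrium3 - RESERVED on Tue Aug 21 20:13:05 2007"""
vmoprcmd = """Ultrium10                No      No                     No        hcart
Ultrium12                Yes     Yes    SU2619  SU2619  Yes       hcart
Ultrium3                 No      No                     No        hcart"""
print(suspect_drives(vmdareq, vmoprcmd, now=datetime(2007, 8, 22, 12, 30)))
# ['Ultrium3']
```

Ultrium10 is empty but AVAILABLE, and Ultrium12 is reserved but has a tape in it, so only Ultrium3 is flagged.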

In the example output above, drives Ultrium3, Ultrium8, Ultrium13 and
Ultrium16 have reservation times from around 20:00 the night before.
vmoprcmd shows that these drives have no tapes in them, yet they are
reserved for use. Treat a drive as hung only if its reservation time is at
least 16 hours earlier than the current time AND no backups that started at
that time are still running. Running nbrbutil and looking for one of the
drives shows the following:

# nbrbutil -dump | grep Ultrium3
         index=21 (Allocation
allocation={74CDE574-1DD2-11B2-B328-000F20687028}
provider=ReservationGroupProvider resourcename=SU0637
masterserver=atubks01
groupid={74CDE1DC-1DD2-11B2-8032-000F20687028} userSequence=1
userid="jobid=208398" (Media_Drive_Allocation_Record:
AllocationKey=28844
(Media_Drive_Record:  MediaKey=4002684 MediaId=SU0637
MediaServer=atubks01
DriveKey=2000046 DriveName=Ultrium3 PrimaryPath=/dev/rmt/22mnb
PoolName=Full
RobotNum=2 RobotType=8 MediaTypeName=NetBackup HCART1
DriveTypeName=NetBackup
HCART1 NdmpControlHost= RetentionLevel=0 PolicyType=2 JobType=10
MasterServer=atubks01) (Storage_Unit_Record:  name= MasterServer=
MediaServer= STUType=0 RobotType=0 RobotNumber=0 Density=0
OnDemandOnly=0
ConcurrentJobs=0 ActiveJobs=0 MaxMultiplexing=0 NdmpAttachHost=
AbsolutePath=) (Bptm_Strings_Record: 0="MEDIADB 1 28844 SU0637 4002684
------
6 1186882203 1187578805 1190257205 1187618390 50030944 5 5 3 10 0 0 1024
0
195447 0" 1="VOLUME 1 SU0637 4002684 SU0637 Full FUJIFILM 02DO114064 6 8
2
442 0 {00000000-0000-0000-0000-000000000000} 0" 2="DRIVE 2 Ultrium3
2000046
IE71K03187 /dev/rmt/22mnb -1 -1 -1 -1 0 0 0 0 *NULL* *NULL* *NULL*
*NULL* 1
16" 3="STORAGE 0 *NULL* 0 0" ) TpReqFileName=))
        MdsAllocation allocationKey=28844 jobType=10 mediaKey=4002684
mediaId=SU0637 driveKey=2000046 driveName=Ultrium3
drivePath=/dev/rmt/22mnb
stuName= masterServerName=atubks01 mediaServerName=atubks0

The media ID in the example above is SU0637, so I use the process below for
releasing media. Where none of the dumped entries lists a media ID (the
mediaId= field is empty), I use the allocation ID that is listed to release
the drive, following the same release procedure below.
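The drive-to-media lookup above can be sketched the same way, assuming one MdsAllocation record per line in the unwrapped dump. The sketch returns the mediaId and allocationKey of each MdsAllocation record on a given drive. An empty mediaId corresponds to the no-media case just described. Note that allocationKey is the MdsAllocation key (the one nbrbutil -releaseMDS took in the first example), not the allocation={GUID} value used with nbrbutil -release further on. The function name is hypothetical.

```python
import re

PAIR = re.compile(r"(\w+)=(\S*)")


def holds_on_drive(dump_text: str, drive_name: str) -> list[tuple[str, str]]:
    """(mediaId, allocationKey) for each MdsAllocation line on drive_name.

    mediaId is "" when the record lists no media for the drive.
    """
    holds = []
    for line in dump_text.splitlines():
        if not line.lstrip().startswith("MdsAllocation"):
            continue
        fields = dict(PAIR.findall(line))
        if fields.get("driveName") == drive_name:
            holds.append((fields.get("mediaId", ""), fields["allocationKey"]))
    return holds


# One-line version of the Ultrium3 MdsAllocation record above (abridged).
dump = ("MdsAllocation allocationKey=28844 jobType=10 mediaKey=4002684 "
        "mediaId=SU0637 driveKey=2000046 driveName=Ultrium3 "
        "drivePath=/dev/rmt/22mnb stuName= masterServerName=atubks01")
print(holds_on_drive(dump, "Ultrium3"))  # [('SU0637', '28844')]
```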

Clearing Tapes
To clear allocation holds for a tape, do the following (search by media ID
or by drive name):

# nbrbutil -dump | grep <media id>
# nbrbutil -dump | grep <drive name>

# nbrbutil -dump |grep SU0822
         index=1 (Request provider=DriveOperationProvider
resourcename=MEDIA
RESOURCE  userSequence=1 (MediaRequest: mediaId=SU0822
mediaServer=atubks01
mediaKey=0 userReservationId= assignedTime=0 client= usageType=10
mustBeNdmp=no driveName= drivePath= mediaPool= robotNumber=-1
slotNumber=-1
density=-1 ndmpControlHost= failIfNoMedia=yes externalFile=))
         index=2 (Request provider=ReservationGroupProvider
resourcename=SU0822  userSequence=2 (MediaReservationRequest:
mountCount=1
request=(MediaRequest: mediaId=SU0822 mediaServer=atubks01 mediaKey=0
userReservationId= assignedTime=0 client= usageType=10 mustBeNdmp=no
driveName= drivePath= mediaPool= robotNumber=-1 slotNumber=-1 density=-1
ndmpControlHost= failIfNoMedia=yes externalFile=))))
         index=5 (Allocation
allocation={11B5E9AA-1DD2-11B2-B9FA-000F20687028}
provider=ReservationGroupProvider resourcename= masterserver=atubks01
groupid={11B5E734-1DD2-11B2-82D0-000F20687028} userSequence=2
userid="jobid=208396" (MediaReservation: mountCount=1
reservationKey=28789
request=(MediaRequest: mediaId=SU0822 mediaServer=atubks01
mediaKey=4003252
userReservationId= assignedTime=0 client= usageType=0 mustBeNdmp=no
driveName= drivePath= mediaPool= robotNumber=-1 slotNumber=-1 density=-1
ndmpControlHost= failIfNoMedia=no externalFile=)))
         index=48 (Allocation
allocation={11B5EAD6-1DD2-11B2-85BC-000F20687028}
provider=ReservationGroupProvider resourcename=SU0822
masterserver=atubks01
groupid={11B5E734-1DD2-11B2-82D0-000F20687028} userSequence=1
userid="jobid=208396" (Media_Drive_Allocation_Record:
AllocationKey=28791
(Media_Drive_Record:  MediaKey=4003252 MediaId=SU0822
MediaServer=atubks01
DriveKey=2000053 DriveName=Ultrium10 PrimaryPath=/dev/rmt/29mnb
PoolName=Full
RobotNum=2 RobotType=8 MediaTypeName=NetBackup HCART1
DriveTypeName=NetBackup
HCART1 NdmpControlHost= RetentionLevel=0 PolicyType=2 JobType=10
MasterServer=atubks01) (Storage_Unit_Record:  name= MasterServer=
MediaServer= STUType=0 RobotType=0 RobotNumber=0 Density=0
OnDemandOnly=0
ConcurrentJobs=0 ActiveJobs=0 MaxMultiplexing=0 NdmpAttachHost=
AbsolutePath=) (Bptm_Strings_Record: 0="MEDIADB 1 28791 SU0822 4003252
------
6 1185750016 1187564415 1190242815 1187106913 157741280 4 3 3 10 0 0
1024 0
2464718 0" 1="VOLUME 1 SU0822 4003252 SU0822 Full FUJIFILM 02DO112184 6
8 2
126 0 {00000000-0000-0000-0000-000000000000} 0" 2="DRIVE 2 Ultrium10
2000053
IE72K00060 /dev/rmt/29mnb -1 -1 -1 -1 0 0 0 0 *NULL* *NULL* *NULL*
*NULL* 1
16" 3="STORAGE 0 *NULL* 0 0" ) TpReqFileName=))
        MdsAllocation allocationKey=28789 jobType=14 mediaKey=4003252
mediaId=SU0822 driveKey=0 driveName= drivePath= stuName=
masterServerName=atubks01 mediaServerName= ndmpTapeServerName=
        MdsAllocation allocationKey=28791 jobType=10 mediaKey=4003252
mediaId=SU0822 driveKey=2000053 driveName=Ultrium10
drivePath=/dev/rmt/29mnb
stuName= masterServerName=atubks01 mediaServerName=atubks01
ndmpTapeServerName=

Look at the output above: resourcename=SU0822 and mediaId=SU0822 show that
these entries are for this tape. Now find the userid="jobid=208396" field.
It shows that this job ID holds an exclusive lock on the tape; the
Media_Drive_Record shows the drive (Ultrium10) as well. A quick check of the
active jobs in the GUI, or with the "bpdbjobs | grep Active" command, will
show that this is not an active job but a hung allocation, so it needs to
be released.
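Finding a tape's allocation entries and the job that holds them can also be scripted. Below is a minimal sketch with these assumptions:

- Each index= entry of the dump is on a single line (the archive wraps them).
- Active job IDs are gathered separately, for example from the GUI or bpdbjobs as above.
- The sample lines are abridged from the SU0822 output.

The names are hypothetical. An allocation is reported as stale only if its job ID is not in the active set. The checks listed next still apply before anything is released.

```python
import re
from typing import NamedTuple


class Allocation(NamedTuple):
    guid: str      # value inside allocation={...}
    jobid: int     # from userid="jobid=..."
    kind: str      # MediaReservation or Media_Drive_Allocation_Record
    media_id: str


ALLOC = re.compile(
    r'\(Allocation allocation=\{(?P<guid>[0-9A-Fa-f-]+)\}.*?'
    r'userid="jobid=(?P<jobid>\d+)" '
    r'\((?P<kind>MediaReservation|Media_Drive_Allocation_Record):')
MEDIA = re.compile(r"\b[Mm]ediaId=(\S+)")


def allocations_for_media(dump_text: str, media_id: str) -> list[Allocation]:
    """Allocation entries (not Request entries) that name media_id."""
    found = []
    for line in dump_text.splitlines():
        alloc, media = ALLOC.search(line), MEDIA.search(line)
        if alloc and media and media.group(1) == media_id:
            found.append(Allocation(alloc["guid"], int(alloc["jobid"]),
                                    alloc["kind"], media_id))
    return found


def stale_allocations(dump_text: str, media_id: str,
                      active_jobids: set[int]) -> list[Allocation]:
    """Allocations on media_id whose job is not in active_jobids."""
    return [a for a in allocations_for_media(dump_text, media_id)
            if a.jobid not in active_jobids]


dump = (
    'index=5 (Allocation allocation={11B5E9AA-1DD2-11B2-B9FA-000F20687028} '
    'provider=ReservationGroupProvider resourcename= userid="jobid=208396" '
    '(MediaReservation: mountCount=1 reservationKey=28789 '
    'request=(MediaRequest: mediaId=SU0822 mediaServer=atubks01\n'
    'index=48 (Allocation allocation={11B5EAD6-1DD2-11B2-85BC-000F20687028} '
    'provider=ReservationGroupProvider resourcename=SU0822 '
    'userid="jobid=208396" (Media_Drive_Allocation_Record: AllocationKey=28791 '
    '(Media_Drive_Record:  MediaKey=4003252 MediaId=SU0822 DriveName=Ultrium10\n')
for a in stale_allocations(dump, "SU0822", active_jobids=set()):
    print(a.kind, a.guid, a.jobid)
```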

Always make sure to verify the following BEFORE attempting to release a lock!
1.      The listed jobid is not an active job!
2.      The associated tape is not in a tape drive.
3.      There are no tapes in the listed drive.
4.      The tape drive is UP on all servers.

Failing to confirm the criteria above can result in failed backups and
possibly media corruption. If there is ANY doubt, do not perform the
operation; check with one of the Unix Admins first.
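The four checks can be written as a guard that lists reasons not to proceed, which makes it harder to skip one. This is a minimal sketch. The inputs are assumed to have been gathered by hand: active jobs from the GUI or bpdbjobs, loaded tapes from vmoprcmd, and drive state per server. The class and field names are hypothetical. In keeping with "if there is ANY doubt", missing drive-state information counts as a blocker.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseCheck:
    jobid: int                      # job ID from userid="jobid=..."
    media_id: str                   # tape the allocation is for
    drive: str                      # drive named in the allocation
    active_jobids: set[int] = field(default_factory=set)
    loaded: dict[str, str] = field(default_factory=dict)     # drive -> tape in it
    drive_up: dict[str, bool] = field(default_factory=dict)  # server -> drive UP?


def release_blockers(c: ReleaseCheck) -> list[str]:
    """Reasons not to release; an empty list means all four checks passed."""
    blockers = []
    if c.jobid in c.active_jobids:                     # check 1
        blockers.append(f"job {c.jobid} is still active")
    if c.media_id in c.loaded.values():                # check 2
        blockers.append(f"tape {c.media_id} is in a drive")
    if c.loaded.get(c.drive):                          # check 3
        blockers.append(f"drive {c.drive} has a tape in it")
    if not c.drive_up:                                 # check 4: no data = doubt
        blockers.append(f"UP state of {c.drive} not confirmed on any server")
    for server, up in sorted(c.drive_up.items()):
        if not up:
            blockers.append(f"drive {c.drive} is not UP on {server}")
    return blockers


ok = ReleaseCheck(208396, "SU0822", "Ultrium10",
                  active_jobids={208516}, loaded={"Ultrium9": "SU0468"},
                  drive_up={"atubks01": True})
print(release_blockers(ok))  # []
```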

Releasing the lock.
Find the allocation= field and copy its value for a later paste. You want
the allocation ID of the entry that has MediaReservation: inside the second
set of parentheses; the tape drive allocation is the one with
Media_Drive_Allocation_Record.

The following command releases the media and the drive:

# nbrbutil -release <allocation ID>


# nbrbutil -cancel 11B5EAD6-1DD2-11B2-85BC-000F20687028
No request with ID {11B5EAD6-1DD2-11B2-85BC-000F20687028} found
# nbrbutil -release 11B5EAD6-1DD2-11B2-85BC-000F20687028
(releasing (Allocation allocation={11B5EAD6-1DD2-11B2-85BC-000F20687028}
provider=ReservationGroupProvider resourcename=SU0822
masterserver=atubks01
groupid={11B5E734-1DD2-11B2-82D0-000F20687028} userSequence=1
userid="jobid=208396" (Media_Drive_Allocation_Record:
AllocationKey=28791
(Media_Drive_Record:  MediaKey=4003252 MediaId=SU0822
MediaServer=atubks01
DriveKey=2000053 DriveName=Ultrium10 PrimaryPath=/dev/rmt/29mnb
PoolName=Full
RobotNum=2 RobotType=8 MediaTypeName=NetBackup HCART1
DriveTypeName=NetBackup
HCART1 NdmpControlHost= RetentionLevel=0 PolicyType=2 JobType=10
MasterServer=atubks01) (Storage_Unit_Record:  name= MasterServer=
MediaServer= STUType=0 RobotType=0 RobotNumber=0 Density=0
OnDemandOnly=0
ConcurrentJobs=0 ActiveJobs=0 MaxMultiplexing=0 NdmpAttachHost=
AbsolutePath=) (Bptm_Strings_Record: 0="MEDIADB 1 28791 SU0822 4003252
------
6 1185750016 1187564415 1190242815 1187106913 157741280 4 3 3 10 0 0
1024 0
2464718 0" 1="VOLUME 1 SU0822 4003252 SU0822 Full FUJIFILM 02DO112184 6
8 2
126 0 {00000000-0000-0000-0000-000000000000} 0" 2="DRIVE 2 Ultrium10
2000053
IE72K00060 /dev/rmt/29mnb -1 -1 -1 -1 0 0 0 0 *NULL* *NULL* *NULL*
*NULL* 1
16" 3="STORAGE 0 *NULL* 0 0" ) TpReqFileName=)))
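In the example above, -cancel reported "No request with ID ... found" while -release worked. That is consistent with -cancel acting on pending requests (the "Request provider=..." entries in the dump) and -release acting on allocations. The small sketch below pulls the GUID out of a copied allocation={...} field and builds the same -release command. By default it only prints the command. Running it for real (dry_run=False, via subprocess) assumes nbrbutil is on the PATH of the machine running the script. The function names are hypothetical.

```python
import re
import subprocess

GUID = re.compile(r"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}")


def release_argv(allocation_field: str) -> list[str]:
    """argv for 'nbrbutil -release <GUID>' from e.g. 'allocation={...}'."""
    m = GUID.search(allocation_field)
    if m is None:
        raise ValueError(f"no allocation GUID in {allocation_field!r}")
    # The example above passes the GUID without its braces.
    return ["nbrbutil", "-release", m.group(0)]


def release(allocation_field: str, dry_run: bool = True):
    argv = release_argv(allocation_field)
    if dry_run:
        print(" ".join(argv))
        return None
    # Only after all the pre-release checks have passed.
    return subprocess.run(argv, capture_output=True, text=True, check=False)


release("allocation={11B5EAD6-1DD2-11B2-85BC-000F20687028}")
# prints: nbrbutil -release 11B5EAD6-1DD2-11B2-85BC-000F20687028
```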

Now run the dump process again. If a job was waiting on the media, the
information will update accordingly; here the tape is now allocated to
jobid=208516 on drive Ultrium18.

# nbrbutil -dump | grep SU0822
         index=8 (Allocation
allocation={75BE0D58-1DD2-11B2-B723-000F20687028}
provider=ReservationGroupProvider resourcename=SU0822
masterserver=atubks01
groupid={75BE0966-1DD2-11B2-8B2C-000F20687028} userSequence=1
userid="jobid=208516" (Media_Drive_Allocation_Record:
AllocationKey=29065
(Media_Drive_Record:  MediaKey=4003252 MediaId=SU0822
MediaServer=atubks01
DriveKey=2000061 DriveName=Ultrium18 PrimaryPath=/dev/rmt/15mnb
PoolName=Full
RobotNum=2 RobotType=8 MediaTypeName=NetBackup HCART1
DriveTypeName=NetBackup
HCART1 NdmpControlHost= RetentionLevel=0 PolicyType=2 JobType=10
MasterServer=atubks01) (Storage_Unit_Record:  name= MasterServer=
MediaServer= STUType=0 RobotType=0 RobotNumber=0 Density=0
OnDemandOnly=0
ConcurrentJobs=0 ActiveJobs=0 MaxMultiplexing=0 NdmpAttachHost=
AbsolutePath=) (Bptm_Strings_Record: 0="MEDIADB 1 29065 SU0822 4003252
------
6 1185750016 1187564415 1190242815 1187106913 157741280 4 3 3 10 0 0
1024 0
2464718 0" 1="VOLUME 1 SU0822 4003252 SU0822 Full FUJIFILM 02DO112184 6
8 2
126 0 {00000000-0000-0000-0000-000000000000} 0" 2="DRIVE 2 Ultrium18
2000061
HUB4C01W8H /dev/rmt/15mnb -1 -1 -1 -1 0 0 0 0 *NULL* *NULL* *NULL*
*NULL* 1
16" 3="STORAGE 0 *NULL* 0 0" ) TpReqFileName=))
         index=34 (Allocation
allocation={75BE0BDC-1DD2-11B2-83E1-000F20687028}
provider=ReservationGroupProvider resourcename= masterserver=atubks01
groupid={75BE0966-1DD2-11B2-8B2C-000F20687028} userSequence=2
userid="jobid=208516" (MediaReservation: mountCount=1
reservationKey=29063
request=(MediaRequest: mediaId=SU0822 mediaServer=atubks01
mediaKey=4003252
userReservationId= assignedTime=0 client= usageType=0 mustBeNdmp=no
driveName= drivePath= mediaPool= robotNumber=-1 slotNumber=-1 density=-1
ndmpControlHost= failIfNoMedia=no externalFile=)))
        MdsAllocation allocationKey=29063 jobType=14 mediaKey=4003252
mediaId=SU0822 driveKey=0 driveName= drivePath= stuName=
masterServerName=atubks01 mediaServerName= ndmpTapeServerName=
        MdsAllocation allocationKey=29065 jobType=10 mediaKey=4003252
mediaId=SU0822 driveKey=2000061 driveName=Ultrium18
drivePath=/dev/rmt/15mnb
stuName= masterServerName=atubks01 mediaServerName=atubks01
ndmpTapeServerName=
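That before/after comparison can be checked programmatically. The stale job ID (208396) should no longer appear on the tape's entries, while a waiting job (here 208516) may now hold the tape. This is a minimal sketch, again assuming one dump entry per line. The sample lines are abridged from the two dumps above, and the function names are hypothetical.

```python
import re

HOLDER = re.compile(r'userid="jobid=(\d+)"')
MEDIA = re.compile(r"\b[Mm]ediaId=(\S+)")


def jobs_holding(dump_text: str, media_id: str) -> set[int]:
    """Job IDs of dump entries that carry both a userid jobid and media_id."""
    jobs = set()
    for line in dump_text.splitlines():
        job, media = HOLDER.search(line), MEDIA.search(line)
        if job and media and media.group(1) == media_id:
            jobs.add(int(job.group(1)))
    return jobs


def release_took_effect(before: str, after: str, media_id: str,
                        stale_jobid: int) -> bool:
    """True if stale_jobid held media_id before the release and not after."""
    return (stale_jobid in jobs_holding(before, media_id)
            and stale_jobid not in jobs_holding(after, media_id))


before = ('index=48 (Allocation allocation={11B5EAD6-1DD2-11B2-85BC-000F20687028} '
          'resourcename=SU0822 userid="jobid=208396" (Media_Drive_Allocation_Record: '
          'AllocationKey=28791 (Media_Drive_Record:  MediaKey=4003252 MediaId=SU0822')
after = ('index=8 (Allocation allocation={75BE0D58-1DD2-11B2-B723-000F20687028} '
         'resourcename=SU0822 userid="jobid=208516" (Media_Drive_Allocation_Record: '
         'AllocationKey=29065 (Media_Drive_Record:  MediaKey=4003252 MediaId=SU0822')
print(jobs_holding(after, "SU0822"))                         # {208516}
print(release_took_effect(before, after, "SU0822", 208396))  # True
```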

-----Original Message-----
From: veritas-bu-bounces AT mailman.eng.auburn DOT edu
[mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Justin
Piszcz
Sent: Thursday, June 23, 2011 8:04 AM
To: Patrick
Cc: veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Strange behaviour on 6.5.6

Hi,

Also:

> While this is not the worst of our problems, it may be symptomatic of the
> overall problem. We get these errors on a daily basis:
> 2, 5, 6, 10, 12, 13, 23, 24, 25, 40, 42, 47, 50, 52, 54, 58, 59, 63, 71,
> 156, 160, 191, 196, 219, 232, 239, 249, 2001, 2009
Something sounds REALLY broken with this environment. Is DNS working
properly? Are catalog backups running successfully?

Also, your vmquery output shows:
media ID:              ABS287
barcode:               VTABS287
media description:     Virtual tapes from VTL
robot type:            TLD - Tape Library DLT (8)

If it's a physical tape, why does it have a virtual tape label for the
barcode?

Are you just stepping into this environment?
How many policies etc. are there?
I wonder whether it would be worth starting over, given that many issues,
if you don't have any infinite-retention data etc.

Justin.




On Thu, 23 Jun 2011, Justin Piszcz wrote:

>
> On Thu, 23 Jun 2011, Patrick wrote:
>
>> Hi All,
>> 
>> 
>> 
>> I am at a new site and it is in very bad shape. One thing I would like to
>> ask:
>> 
>
> Hi,
>
> What does "nbrbutil -dump | grep ABS287" say?
>
> Have you tried cycling/bouncing the environment, e.g. if there is a request
> associated with that resource (ABS287)?
>
> Justin.
>
>
_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
 