Veritas-bu

[Veritas-bu] [Spam Detected] drive is going down

2006-11-08 10:19:23
Subject: [Veritas-bu] [Spam Detected] drive is going down
From: krzys at perfekt.net (Krzys)
Date: Wed, 8 Nov 2006 10:19:23 -0500 (EST)
Thank you for the reply. Here is what I have; maybe this will help in figuring 
out what is going on with my system. I have an HP StorageWorks MSL 6000 series 
library with two LTO3 tape drives.
Firmware rev :  5.16
Hardware Rev :  3
Boot Version :  4

LTO3 drives info:
Prod Revision Level :  G24W
ACI Rev Level       :  4.2
Firmware Rev Level  :  008.462

They seem to be SCSI drives; one has SCSI ID 1 and the other has SCSI ID 2.

They are connected to my SAN switch via fiber optic cable.

I am running Solaris 9 on a SPARC 280R; the software is NetBackup 6.0 MP3.

Everything was working fine until, all of a sudden, one of the drives started to 
go down. On occasion the second drive is downed too, and then my backups do not 
take place. Yesterday I started Ed Gurski's script, which monitors for a downed 
drive and brings it back up. From 6pm last night until around 10am this morning 
I got 207 emails in my mailbox saying that the drive went down and was brought 
up. This is crazy and I have no idea what is causing it. As I said, it's only 
one of the drives that is acting up this way; the other one works fine most of 
the time, except for an occasional failure. So I would think the connection to 
the SAN switch is not the cause of this problem, but I do not want to rule it 
out. Here is what I see in the NetBackup logs:

0,51216,143,111,421170,1162997654867,27118,7,0:,11:downed path,20:sql_update_down_path,2
0,51216,143,111,421171,1162997654867,27118,7,0:,31:all_recs = 0, alloc_key = 13667,28:sql_update_alloc_status_recs,2
0,51216,143,111,421172,1162997654869,27118,7,0:,33:updated allocation status records,28:sql_update_alloc_status_recs,2
0,51216,143,111,421173,1162997654869,27118,7,0:,31:all_recs = 0, alloc_key = 13667,27:sql_update_delete_alloc_rec,2
0,51216,111,111,1444192,1162997654870,27118,7,0:,92:SQL - retval=EMM_ERROR_SQLNoDataFound(2007031) retdal=100 native=<0> sqlerror=<> sqlstate=<>,21:DbConnection::Execute,1
0,51216,111,111,1444193,1162997654870,27118,7,0:,121:stmt=<DELETE FROM EMM_Allocations WHERE AllocationKey = 13667 AND DriveKey = 0 AND MediaKey = 0 AND StorageUnitName = ''>,21:DbConnection::Execute,1
0,51216,143,111,421174,1162997654870,27118,7,0:,69:cur_err = 0, m_dbconn_stat = 0, m_dberr_stat = 0, m_closed_db_trx = 0,22:END_MDS_DB_TRANSACTION,2
0,51216,143,111,421175,1162997654886,27118,7,0:,30:committed database transaction,22:END_MDS_DB_TRANSACTION,2
0,51216,143,111,421176,1162997654886,27118,7,0:,19:drive_key = 2000018,20:notify_dealloc_drive,2
0,51216,144,111,98005,1162997654886,27118,7,0:,20:DriveKey < 2000018 >,45:DeviceAllocatorImpl::helperDriveDeallocated(),1
0,51216,144,111,98006,1162997654900,27118,7,0:,38:Drive < LTO3-2 >, has been deallocated,45:DeviceAllocatorImpl::helperDriveDeallocated(),1
0,51216,111,111,1444194,1162997654900,27118,7,0:,9: Exiting ,41:DeviceAllocatorImpl::~DeviceAllocatorImpl,1
0,51216,143,111,421177,1162997654901,27118,7,0:,56:drive_key = 2000018, ndmp_host_key = 0, path = {2,0,0,2},16:notify_down_path,2
0,51216,143,111,421178,1162997654901,27118,7,0:,242:media_serv - host_info_t: key = 1000003, parent_key = 1000002, fqname = testefm1, state = 14, nbversion = 600000, nbtype = 1, cluster_key = 0, cluster_fqname = , active_node_key = 0, flags = 4, raw_host_key = 1000003, raw_host_name = testefm1,16:notify_down_path,2
0,51216,144,111,98007,1162997654901,27118,7,0:,43:DriveKey < 2000018 > MachineKey < 1000003 >,38:DeviceAllocatorImpl::helperDownDrive(),1
0,51216,144,111,98008,1162997654919,27118,7,0:,119:DOWN'ing Drive < LTO3-2 > on host < testefm1 >, using path < {2,0,0,2} > on ndmp host < none >, Sending DOWN_DRIVE_PATH,38:DeviceAllocatorImpl::helperDownDrive(),1
0,51216,144,111,98009,1162997654920,27118,7,0:,97:machineName = < testefm1 >, driveName = < LTO3-2 > drivePath = < {2,0,0,2} >, ndmpHostName = <  >,26:DA_Thread_Pool::QDownDrive,1
0,51216,144,111,98010,1162997654926,27118,7,0:,17: - retval = < 0 >,38:DeviceAllocatorImpl::helperDownDrive(),1
0,51216,144,111,98011,1162997654926,27118,15,0:,171:Command = < 2 >, MachineName = < testefm1 >, DriveName = < LTO3-2 >, Primary Path = < {2,0,0,2} >, UserName = <  >, Password = <  >, PasswordKey = <  >, AllocationId < 0 >,19:DA_Thread_Pool::svc,1
0,51216,111,111,1444195,1162997654926,27118,7,0:,9: Exiting ,41:DeviceAllocatorImpl::~DeviceAllocatorImpl,1
0,51216,143,111,421179,1162997654926,27118,7,0:,62:attempting to remove, media_key = 4000201, drive_key = 2000018,23:remove_from_alloc_lists,2
0,51216,143,111,421180,1162997654926,27118,7,0:,23:removing, key = 4000201,19:remove_from_key_seq,2
0,51216,143,111,421181,1162997654926,27118,7,0:,45:could not find key in sequence, key = 4000201,19:remove_from_key_seq,2
0,51216,143,111,421182,1162997654926,27118,7,0:,23:removing, key = 2000018,19:remove_from_key_seq,2
0,51216,143,111,421183,1162997654927,27118,7,0:,45:could not find key in sequence, key = 2000018,19:remove_from_key_seq,2
2,51216,143,111,421184,1162997654927,27118,7,0:,0:,0:,2,(1029|A9:{2,0,0,2}|A5:-----|A6:LTO3-2|A8:testefm1|A24:Robotic dismount failure|)
0,51216,143,111,421185,1162997654927,27118,7,0:,69:cur_err = 0, m_dbconn_stat = 0, m_dberr_stat = 0, m_closed_db_trx = 1,22:END_MDS_DB_TRANSACTION,2
0,51216,143,111,421186,1162997654927,27118,7,0:,44:database transaction has already been closed,22:END_MDS_DB_TRANSACTION,2
0,51216,143,111,421187,1162997654927,27118,7,0:,10:EXIT INFO:,10:deallocate,1
0,51216,143,111,421188,1162997654927,27118,7,0:,198:MdsCommonEnv_Record: masterServerName = tony2d, masterServerKey = 1000002, jobType = 1, allocationKey = 13667, reallocFlags = 1, spanning = 0, statusOnly = 0, tryMediaKey = 4000201, tryMediaLsm = -1,10:deallocate,2
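
A note on reading these records: each one appears to be comma-separated, with
the free-text fields prefixed by their length (for example "11:downed path").
The Python sketch below splits one record on that assumption, after any
mail-wrapped lines have been rejoined into a single line. The function name
split_record is made up for illustration, and the field positions (5 = a
timestamp in epoch milliseconds, 9 = message text, 10 = originating function)
are inferred from the records above, not taken from NetBackup documentation.
It handles the records that start with "0," only; the "2," record that carries
"Robotic dismount failure" uses a different layout.

```python
import re
from datetime import datetime, timezone

# A length-prefixed field looks like "11:downed path" (assumption from the log above).
_LEN_PREFIX = re.compile(r"(\d+):")

def split_record(line):
    """Split one rejoined log record into fields, honouring N:text prefixes."""
    fields, i = [], 0
    while i < len(line):
        m = _LEN_PREFIX.match(line, i)
        if m:
            # Read exactly N characters, so commas inside the text are kept.
            n = int(m.group(1))
            start = m.end()
            fields.append(line[start:start + n])
            i = start + n
        else:
            # Plain field: read up to the next comma (or the end of the line).
            j = line.find(",", i)
            if j == -1:
                j = len(line)
            fields.append(line[i:j])
            i = j
        i += 1  # skip the comma separator
    return fields

record = ("0,51216,143,111,421177,1162997654901,27118,7,0:,"
          "56:drive_key = 2000018, ndmp_host_key = 0, path = {2,0,0,2},"
          "16:notify_down_path,2")
fields = split_record(record)

# Assumed positions: 5 = epoch milliseconds, 9 = message, 10 = function name.
when = datetime.fromtimestamp(int(fields[5]) / 1000, tz=timezone.utc)
print(when.strftime("%Y-%m-%d %H:%M:%S"), "UTC")
print(fields[10], "->", fields[9])
```

Read this way, the timestamps fall shortly before this message was sent, which
supports the epoch-milliseconds guess.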

Thanks for any suggestions or help.

Chris


On Wed, 8 Nov 2006, Wilkinson, Tim wrote:

> Chris,
>
> How is the drive connected? Is it a SAN or connected via cables, etc.?
> Are you talking about the drives going down on the Master server or a
> Media server? Is it Windows or Unix?
>
>
> Cheers,
>
> Tim
>
> -----Original Message-----
> From: veritas-bu-bounces at mailman.eng.auburn.edu
> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Krzys
> Sent: Wednesday, 8 November 2006 1:33 PM
> To: Veritas-bu at mailman.eng.auburn.edu
> Subject: [Spam Detected] [Veritas-bu] drive is going down
>
> I have Veritas NetBackup 6.0 MP3 installed, and I have two LTO-3 tape
> drives. One of them is going down very often; the other one does go down
> every now and then, but not that often. I downloaded a script that
> someone wrote (I forgot his name) that detects when a drive is down and
> attempts to bring it up. So far, after I started it at around 5pm, I got
> about 20 messages that the drive was down.
> What would cause a drive to be brought down? Are there any settings in
> NetBackup to keep it from bringing tape drives down? I mean, if I can
> bring it up, there is no physical problem. How can I find out what
> brings those drives down?
>
> Thanks for any suggestions.
>
> Chris
>
> _______________________________________________
> Veritas-bu maillist  -  Veritas-bu at mailman.eng.auburn.edu
> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu