Thank you for the reply. Here is what I have; maybe it will help in figuring
out what is going on with my system. I have an HP StorageWorks MSL 6000 series
library with two LTO3 tape drives.
Firmware rev : 5.16
Hardware Rev : 3
Boot Version : 4
LTO3 drives info:
Prod Revision Level : G24W
ACI Rev Level : 4.2
Firmware Rev Level : 008.462
They appear to be SCSI drives: one has SCSI ID 1 and the other has SCSI ID 2.
They are connected to my SAN switch via fiber optic cable.
I am running Solaris 9 on a SPARC 280R; the software is NetBackup 6.0 MP3.
Everything was working fine until, all of a sudden, one of the drives started
going down. On occasion the second drive goes down too, and then my backups do
not run. Yesterday I started running Ed Gurski's script, which monitors for a
downed drive and brings it back up. Between 6pm last night and around 10am this
morning I got 207 emails saying that the drive went down and was brought back
up. This is crazy, and I have no idea what is causing it. As I said, it is only
one of the drives that acts up this way; the other one works fine most of the
time, apart from the occasional failure. So I would guess that the connection
to the SAN switch is not the cause of this problem, but I do not want to rule
it out. Here is what I see in the NetBackup logs:
0,51216,143,111,421170,1162997654867,27118,7,0:,11:downed path,20:sql_update_down_path,2
0,51216,143,111,421171,1162997654867,27118,7,0:,31:all_recs = 0, alloc_key = 13667,28:sql_update_alloc_status_recs,2
0,51216,143,111,421172,1162997654869,27118,7,0:,33:updated allocation status records,28:sql_update_alloc_status_recs,2
0,51216,143,111,421173,1162997654869,27118,7,0:,31:all_recs = 0, alloc_key = 13667,27:sql_update_delete_alloc_rec,2
0,51216,111,111,1444192,1162997654870,27118,7,0:,92:SQL - retval=EMM_ERROR_SQLNoDataFound(2007031) retdal=100 native=<0> sqlerror=<> sqlstate=<>,21:DbConnection::Execute,1
0,51216,111,111,1444193,1162997654870,27118,7,0:,121:stmt=<DELETE FROM EMM_Allocations WHERE AllocationKey = 13667 AND DriveKey = 0 AND MediaKey = 0 AND StorageUnitName = ''>,21:DbConnection::Execute,1
0,51216,143,111,421174,1162997654870,27118,7,0:,69:cur_err = 0, m_dbconn_stat = 0, m_dberr_stat = 0, m_closed_db_trx = 0,22:END_MDS_DB_TRANSACTION,2
0,51216,143,111,421175,1162997654886,27118,7,0:,30:committed database transaction,22:END_MDS_DB_TRANSACTION,2
0,51216,143,111,421176,1162997654886,27118,7,0:,19:drive_key = 2000018,20:notify_dealloc_drive,2
0,51216,144,111,98005,1162997654886,27118,7,0:,20:DriveKey < 2000018 >,45:DeviceAllocatorImpl::helperDriveDeallocated(),1
0,51216,144,111,98006,1162997654900,27118,7,0:,38:Drive < LTO3-2 >, has been deallocated,45:DeviceAllocatorImpl::helperDriveDeallocated(),1
0,51216,111,111,1444194,1162997654900,27118,7,0:,9: Exiting ,41:DeviceAllocatorImpl::~DeviceAllocatorImpl,1
0,51216,143,111,421177,1162997654901,27118,7,0:,56:drive_key = 2000018, ndmp_host_key = 0, path = {2,0,0,2},16:notify_down_path,2
0,51216,143,111,421178,1162997654901,27118,7,0:,242:media_serv - host_info_t: key = 1000003, parent_key = 1000002, fqname = testefm1, state = 14, nbversion = 600000, nbtype = 1, cluster_key = 0, cluster_fqname = , active_node_key = 0, flags = 4, raw_host_key = 1000003, raw_host_name = testefm1,16:notify_down_path,2
0,51216,144,111,98007,1162997654901,27118,7,0:,43:DriveKey < 2000018 > MachineKey < 1000003 >,38:DeviceAllocatorImpl::helperDownDrive(),1
0,51216,144,111,98008,1162997654919,27118,7,0:,119:DOWN'ing Drive < LTO3-2 > on host < testefm1 >, using path < {2,0,0,2} > on ndmp host < none >, Sending DOWN_DRIVE_PATH,38:DeviceAllocatorImpl::helperDownDrive(),1
0,51216,144,111,98009,1162997654920,27118,7,0:,97:machineName = < testefm1 >, driveName = < LTO3-2 > drivePath = < {2,0,0,2} >, ndmpHostName = < >,26:DA_Thread_Pool::QDownDrive,1
0,51216,144,111,98010,1162997654926,27118,7,0:,17: - retval = < 0 >,38:DeviceAllocatorImpl::helperDownDrive(),1
0,51216,144,111,98011,1162997654926,27118,15,0:,171:Command = < 2 >, MachineName = < testefm1 >, DriveName = < LTO3-2 >, Primary Path = < {2,0,0,2} >, UserName = < >, Password = < >, PasswordKey = < >, AllocationId < 0 >,19:DA_Thread_Pool::svc,1
0,51216,111,111,1444195,1162997654926,27118,7,0:,9: Exiting ,41:DeviceAllocatorImpl::~DeviceAllocatorImpl,1
0,51216,143,111,421179,1162997654926,27118,7,0:,62:attempting to remove, media_key = 4000201, drive_key = 2000018,23:remove_from_alloc_lists,2
0,51216,143,111,421180,1162997654926,27118,7,0:,23:removing, key = 4000201,19:remove_from_key_seq,2
0,51216,143,111,421181,1162997654926,27118,7,0:,45:could not find key in sequence, key = 4000201,19:remove_from_key_seq,2
0,51216,143,111,421182,1162997654926,27118,7,0:,23:removing, key = 2000018,19:remove_from_key_seq,2
0,51216,143,111,421183,1162997654927,27118,7,0:,45:could not find key in sequence, key = 2000018,19:remove_from_key_seq,2
2,51216,143,111,421184,1162997654927,27118,7,0:,0:,0:,2,(1029|A9:{2,0,0,2}|A5:-----|A6:LTO3-2|A8:testefm1|A24:Robotic dismount failure|)
0,51216,143,111,421185,1162997654927,27118,7,0:,69:cur_err = 0, m_dbconn_stat = 0, m_dberr_stat = 0, m_closed_db_trx = 1,22:END_MDS_DB_TRANSACTION,2
0,51216,143,111,421186,1162997654927,27118,7,0:,44:database transaction has already been closed,22:END_MDS_DB_TRANSACTION,2
0,51216,143,111,421187,1162997654927,27118,7,0:,10:EXIT INFO:,10:deallocate,1
0,51216,143,111,421188,1162997654927,27118,7,0:,198:MdsCommonEnv_Record: masterServerName = tony2d, masterServerKey = 1000002, jobType = 1, allocationKey = 13667, reallocFlags = 1, spanning = 0, statusOnly = 0, tryMediaKey = 4000201, tryMediaLsm = -1,10:deallocate,2
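
In case it helps anyone reading the excerpt above, here is a rough Python sketch for picking the records apart. It is only my reading of the layout, not anything taken from the NetBackup documentation: I am assuming one complete record per line, nine comma-separated header fields, the sixth of which is milliseconds since the Unix epoch, and for records starting with "0" a tail of the form <len>:<message>,<len>:<function>,<level>. The names parse_record, LogRecord and downed_drives are made up for this example. The length prefixes are stripped rather than trusted, since whitespace may not have survived the mailer.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable, NamedTuple, Optional


class LogRecord(NamedTuple):
    when: datetime            # header field 6, read as epoch milliseconds (assumption)
    pid: str                  # header field 7, looks like a process ID (assumption)
    message: str
    function: Optional[str]   # None for records whose tail is not parsed


def parse_record(line: str) -> LogRecord:
    """Parse one log record that sits on a single line."""
    head = line.rstrip("\n").split(",", 9)   # nine header fields plus the rest
    if len(head) != 10:
        raise ValueError("not a complete record: %r" % line)
    millis = int(head[5])
    # Integer arithmetic keeps the millisecond part exact.
    when = datetime.fromtimestamp(millis // 1000, tz=timezone.utc) + timedelta(
        milliseconds=millis % 1000
    )
    rest = head[9]
    if head[0] != "0":
        # Records such as the "2,..." status line use a different tail,
        # so the remainder is kept as-is.
        return LogRecord(when, head[6], rest, None)
    # Split from the right: messages may contain commas, while the
    # function names in the excerpt do not.
    msg_part, func_part, _level = rest.rsplit(",", 2)
    message = msg_part.partition(":")[2]     # drop the "<len>:" prefix
    function = func_part.partition(":")[2]
    return LogRecord(when, head[6], message, function)


def downed_drives(lines: Iterable[str]) -> Counter:
    """Count "DOWN'ing Drive < NAME >" messages per drive name."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        match = re.search(r"DOWN'ing Drive < (\S+) >", parse_record(line).message)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Read this way, the timestamp on the first record (1162997654867) comes out as 2006-11-08 14:54:14 UTC, which fits the time this was happening. Running downed_drives over a full log file should show whether it is always LTO3-2 that gets downed and how often.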
Thanks for any suggestions or help.
Chris
On Wed, 8 Nov 2006, Wilkinson, Tim wrote:
> Chris,
>
> How is the drive connected? Is it a SAN or connected via cables, etc.?
> Are you talking about the drives going down on the Master server or a
> Media server? Is it Windows or Unix?
>
>
> Cheers,
>
> Tim
>
> -----Original Message-----
> From: veritas-bu-bounces at mailman.eng.auburn.edu
> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Krzys
> Sent: Wednesday, 8 November 2006 1:33 PM
> To: Veritas-bu at mailman.eng.auburn.edu
> Subject: [Spam Detected] [Veritas-bu] drive is going down
>
> I have Veritas NetBackup 6.0 MP3 installed, and I have two LTO-3 tape
> drives. One of them is going down very often; the other one does go down
> every now and then, but not that often. I downloaded a script that
> someone wrote (I forgot his name) that detects when a drive is down and
> attempts to bring it up. So far, since I started it at around 5pm, I have
> gotten about 20 messages that the drive was down.
> What would cause a drive to be brought down? Are there any settings in
> NetBackup to keep it from bringing tape drives down? I mean, if I can
> bring it up, there is no physical problem... How can I find out what
> brings those drives down?
>
> Thanks for any suggestions.
>
> Chris
>
> _______________________________________________
> Veritas-bu maillist - Veritas-bu at mailman.eng.auburn.edu
> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu