
Re: [ADSM-L] TSM 8.1.1 on Linux crash

2018-01-09 18:06:09
Subject: Re: [ADSM-L] TSM 8.1.1 on Linux crash
From: Remco Post <r.post AT PLCS DOT NL>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 10 Jan 2018 00:02:58 +0100
> On 9 Jan 2018, at 18:14, Martin Janosik <martin.janosik AT CZ.IBM DOT 
> COM> wrote:
> 
> Hello there,
> 
> errno 16 usually corresponds to 'device busy'. I have observed this error
> mostly on systems with a shared library (shared=yes) and with storage
> agents that had problems communicating with the TSM server.
> Are you facing the issue on only a single tape drive, or on all of them?
> Are they by any chance also zoned to other systems, e.g. a 2nd cluster node?
> Can you send a few lines from the actlog related to the drive with the
> error, from before the failure occurred?
> 

Hi Martin,

you are absolutely right, the drives are shared with multiple systems. The issue 
arises when the TSM instance using the drive crashes. To free up the drive 
afterwards, the drive has to be rebooted. But apparently, rebooting a tape drive 
is not a safe operation, given the fall-out it sometimes has.
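
For reference, the mapping from error 16 to 'device busy' mentioned above can be checked on the host itself. This is only a minimal sketch using the Python standard library; the helper name describe_errno is made up for illustration and has nothing to do with TSM or the tape driver.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and the OS message for an errno value.

    Hypothetical helper for illustration only; not part of TSM.
    """
    # errno.errorcode maps numeric values to names such as 'EBUSY'.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's message text for that value.
    return f"{name}: {os.strerror(code)}"


# 16 is the value of EBUSY on Linux, matching the error reported by TSM.
print(describe_errno(16))
```

On a Linux host this should name EBUSY, which is consistent with the drive still being held open or reserved by the crashed instance.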

> M. Janosik
> 
> "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 2018-01-09
> 17:58:09:
> 
>> From: Remco Post <r.post AT PLCS DOT NL>
>> To: ADSM-L AT VM.MARIST DOT EDU
>> Date: 2018-01-09 17:59
>> Subject: [ADSM-L] TSM 8.1.1 on Linux crash
>> Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
>> 
>> Hi All,
>> 
>> over here we have a few TSM servers on Linux (RHEL 7.4) running TSM
>> 8.1.1.021 (to be upgraded to .100 soon) and we see something new that
>> we never saw on AIX. If for some reason a tape gets left in a drive
>> (IBM 3592) the only way to get the drive working again is to reboot
>> the drive. Until then TSM is unable to open the drive (error 16).
>> Unfortunately we seem to be able to reliably cause severe issues in
>> TSM by rebooting a tape drive:
>> 
>> kernel: NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [dsmserv:6021]
>> 
>> The only way out is to reboot the entire server.
>> 
>> We never had such issues with TSM on AIX… is this something
>> Linux-specific, that we can hang TSM in non-interruptible routines
>> (kernel space) by simply rebooting a tape drive?
>> 
>> --
>> 
>> Met vriendelijke groeten/Kind Regards,
>> 
>> Remco Post
>> r.post AT plcs DOT nl
>> +31 6 248 21 622
>> 

-- 

 Met vriendelijke groeten/Kind Regards,

Remco Post
r.post AT plcs DOT nl
+31 6 248 21 622

