ADSM-L

Re: [ADSM-L] segfault: dsmserv 6.1.3.4 RHEL 5.5 x86_64

2010-06-14 13:50:10
Subject: Re: [ADSM-L] segfault: dsmserv 6.1.3.4 RHEL 5.5 x86_64
From: Zoltan Forray/AC/VCU <zforray AT VCU DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 14 Jun 2010 13:48:31 -0400
Out of curiosity, what level/version of lin_tape are you running?
Zoltan Forray
TSM Software & Hardware Administrator
Virginia Commonwealth University
UCC/Office of Technology Services
zforray AT vcu DOT edu - 804-828-4807



From: Josh Davis <xaminmo AT OMNITECH DOT NET>
To: ADSM-L AT VM.MARIST DOT EDU
Date: 06/14/2010 12:47 PM
Subject: [ADSM-L] segfault: dsmserv 6.1.3.4 RHEL 5.5 x86_64
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>



I'm 1h 40m into waiting on a callback, but I thought I'd post this for people
searching.

I found the issue because DBB to tape would crash, but BA STG to tape did
not.  Neither did BA DB T=F DEVC=FILECLASS.

I found out later that the customer loaded some tapes and checked them in,
but it didn't click that they weren't labelled.

The gdb backtrace shows that it's crashing in ScsiAutoLabelVolume, and
just after that, I hit a crash during BA STG writing to a new tape.

This is easy enough to work around (don't let autolabel run), but autolabel is
an offered feature, and it shouldn't crash.
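
As a rough sketch of that workaround, the script below writes a small
administrative macro that turns autolabel off for the library and then labels
tapes explicitly with LABEL LIBVOLUME. The library name LTOLIB and the macro
file name are placeholders I made up, and I'm assuming a SCSI library with
barcoded cartridges. Tapes that are already checked in may need to be checked
out first before LABEL LIBVOLUME will process them, so treat this as a
starting point rather than a tested procedure.

```shell
#!/bin/sh
# Hypothetical names: substitute the library name shown by QUERY LIBRARY.
LIBNAME="${LIBNAME:-LTOLIB}"
MACRO="${MACRO:-autolabel_workaround.mac}"

# Write a TSM administrative macro; nothing is sent to the server here.
cat > "$MACRO" <<EOF
/* Stop the server from labelling tapes at mount time. */
UPDATE LIBRARY $LIBNAME AUTOLABEL=NO
/* Label unlabelled cartridges from their barcodes, check in as scratch. */
LABEL LIBVOLUME $LIBNAME SEARCH=YES LABELSOURCE=BARCODE CHECKIN=SCRATCH
EOF

# Show the macro so it can be reviewed before anyone runs it.
cat "$MACRO"

# To apply it, run the macro from an administrative client session, e.g.:
#   dsmadmc -id=<admin> -password=<password> macro autolabel_workaround.mac
```

Generating a macro and reviewing it first keeps the script harmless on its
own; the server is only touched when the macro is run through dsmadmc.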

Here's my writeup:

ENV: dsmserv 6.1.3.4, RHEL 5.5 x86_64
PROBLEM: dsmserv crashes when autolabelling a tape
* Tapes processed with LABEL LIBVOL are fine.
* dsmserv drops core in the instance directory.
* No actlog entry and no stderr/stdout output at the time of the crash
* Only a segfault indication in /var/log/messages but no details
* db2diag shows rc -50 from the dsm library and has a minicore
* DB2 stays running.
* Before and after, no more than 450 MB of swap used.
* System has 16 GB of RAM & two 4-core Intel Xeon E5530 2.4GHz procs
* gdb backtrace on the core file shows:
#0  ScsiAutoLabelVolume (driveP=0x2aaab0108908, newLabelP=0x445b4eb0 "000042L3",
    readLabelP=0x445b3e80 "", isScratch=True, isBlank=True, createWorm=False)
    at mmsscsi.c:19973
#1  0x00000000009aa531 in ScsiMountVolume (volNameP=0x21640c50 "SCRTCH",
    poolNameP=<value optimized out>, mntDescP=<value optimized out>,
    callbackArgP=0x1f9d4ae8) at mmsscsi.c:19464
#2  0x000000000095580e in MmsMountVolume (volNameP=0x21640c50 "SCRTCH",
    poolNameP=0x21641d10 "TAPEPOOL", mntDescP=0x445b5fa0,
    callbackArgP=0x1f9d4ae8) at mms.c:1392
#3  0x0000000000ce33aa in LtoOpen (args=0x21640c48) at pvrlto.c:288
#4  0x000000000091d675 in AgentThread (argP=0x21640bf8) at pvr.c:12986
#5  0x0000000000c807f4 in StartThread (startInfoP=0x1e5c7a08) at pkthread.c:3325
#6  0x00000032bcc0673d in start_thread () from /lib64/libpthread.so.0
#7  0x00000032bc0d3d1d in clone () from /lib64/libc.so.6

ACTION TAKEN: isolated issue as above, created PMR with IBM RC.

ACTION PLAN: Waiting on callback.
* autolabel should not drop core.

TESTCASE: DB2 cores, dsmserv cores, db2diag, actlog



 With friendly regards,
Josh-Daniel S. Davis