ADSM-L

[ADSM-L] segfault: dsmserv 6.1.3.4 RHEL 5.5 x86_64

2010-06-14 12:47:54
From: Josh Davis <xaminmo AT OMNITECH DOT NET>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 14 Jun 2010 09:46:36 -0700
I'm 1h 40m into waiting on a callback, but I thought I'd post this for people 
searching.

I found the issue because DBB to tape would crash, but BA STG to tape would not.  
Neither would BA DB T=F DEVC=FILECLASS.

I found out later that the customer loaded some tapes and checked them in, but 
it didn't click that they weren't labelled.

The gdb backtrace shows that it's crashing in ScsiAutoLabelVolume, and just 
after that, I hit a crash during BA STG writing to a new tape.

This is easy enough to work around (don't let autolabel run), but it's an 
offered feature, and it shouldn't crash.

Here's my writeup:

ENV: dsmserv 6.1.3.4, RHEL 5.5 x86_64
PROBLEM: dsmserv crashes when autolabelling a tape
* Tapes processed with LABEL LIBVOL are fine.
* dsmserv drops core in the instance directory.
* No actlog and no stderr/stdout at crash
* Only a segfault indication in /var/log/messages but no details
* db2diag shows rc -50 from the dsm library and has a minicore
* DB2 stays running.
* Before and after, no more than 450MB of swap used.
* System has 16G of RAM & two 4-core Intel Xeon E5530 2.4GHz procs
* gdb backtrace on the core file shows:
#0  ScsiAutoLabelVolume (driveP=0x2aaab0108908, newLabelP=0x445b4eb0 "000042L3",
    readLabelP=0x445b3e80 "", isScratch=True, isBlank=True, createWorm=False)
    at mmsscsi.c:19973
#1  0x00000000009aa531 in ScsiMountVolume (volNameP=0x21640c50 "SCRTCH",
    poolNameP=<value optimized out>, mntDescP=<value optimized out>,
    callbackArgP=0x1f9d4ae8) at mmsscsi.c:19464
#2  0x000000000095580e in MmsMountVolume (volNameP=0x21640c50 "SCRTCH",
    poolNameP=0x21641d10 "TAPEPOOL", mntDescP=0x445b5fa0,
    callbackArgP=0x1f9d4ae8) at mms.c:1392
#3  0x0000000000ce33aa in LtoOpen (args=0x21640c48) at pvrlto.c:288
#4  0x000000000091d675 in AgentThread (argP=0x21640bf8) at pvr.c:12986
#5  0x0000000000c807f4 in StartThread (startInfoP=0x1e5c7a08) at pkthread.c:3325
#6  0x00000032bcc0673d in start_thread () from /lib64/libpthread.so.0
#7  0x00000032bc0d3d1d in clone () from /lib64/libc.so.6
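
For anyone who wants to pull the same kind of trace out of their own core file, here is a minimal Python sketch. It is not part of my original analysis and makes some assumptions: gdb is on the PATH, a recent Python 3 is available (probably on a separate analysis box rather than the RHEL 5.5 server), and the helper names gdb_backtrace_command, run_backtrace and parse_frames are made up for illustration. It runs gdb in batch mode and reduces a backtrace like the one above to frame number, function and source location.

```python
import re
import subprocess

# Matches gdb frame lines such as:
#   #0  ScsiAutoLabelVolume (driveP=...
#   #1  0x00000000009aa531 in ScsiMountVolume (volNameP=...
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*)\s*\(")
# Matches a trailing "at file.c:123" source location, which gdb may
# wrap onto a continuation line.
LOCATION_RE = re.compile(r"\bat\s+(\S+):(\d+)\s*$")


def gdb_backtrace_command(binary, core):
    """Build a gdb batch-mode command line that prints a backtrace."""
    return ["gdb", "-batch", "-ex", "bt", binary, core]


def run_backtrace(binary, core):
    """Run gdb on a core file and return whatever it printed to stdout."""
    result = subprocess.run(
        gdb_backtrace_command(binary, core),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def parse_frames(backtrace_text):
    """Return a list of (frame_number, function, location_or_None)."""
    frames = []
    for line in backtrace_text.splitlines():
        frame = FRAME_RE.match(line)
        if frame:
            frames.append([int(frame.group(1)), frame.group(2), None])
        location = LOCATION_RE.search(line)
        # Attach the location to the most recent frame that lacks one.
        if location and frames and frames[-1][2] is None:
            frames[-1][2] = "%s:%s" % (location.group(1), location.group(2))
    return [tuple(f) for f in frames]
```

Usage would be along the lines of parse_frames(run_backtrace(path_to_dsmserv, path_to_core)), where both paths are placeholders for wherever your dsmserv binary and core file actually live. Frames from shared libraries without debug info (the libpthread and libc frames above) come back with no source location.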

ACTION TAKEN: isolated issue as above, created PMR with IBM RC.

ACTION PLAN: Waiting on callback.
* autolabel should not drop core.

TESTCASE: DB2 cores, dsmserv cores, db2diag, actlog



With friendly regards,
Josh-Daniel S. Davis