ADSM-L

Re: [ADSM-L] segfault: dsmserv 6.1.3.4 RHEL 5.5 x86_64

2010-06-14 12:56:03
Subject: Re: [ADSM-L] segfault: dsmserv 6.1.3.4 RHEL 5.5 x86_64
From: "Cowen, Richard" <rcowen AT SBSPLANET DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 14 Jun 2010 12:54:58 -0400
Check APAR IC65602: SERVER CRASHES DURING AUTOLABEL OF LIBRARY VOLUME.


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Josh Davis
Sent: Monday, June 14, 2010 12:47 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] segfault: dsmserv 6.1.3.4 RHEL 5.5 x86_64

I've been waiting 1h 40m on a callback, but I thought I'd post this for
people searching.

I found the issue because DBB to tape would crash, but BA STG to tape
did not.  Neither did BA DB T=F DEVC=FILECLASS.

I found out later that the customer loaded some tapes and checked them
in, but it didn't click that they weren't labelled.

The gdb backtrace shows that it's crashing in ScsiAutoLabelVolume, and
just after that, I hit a crash during BA STG writing to a new tape.

This is easy enough to work around: don't let autolabel run. But it's an
offered feature, and it shouldn't crash.

Here's my writeup:

ENV: dsmserv 6.1.3.4, RHEL 5.5 x86_64
PROBLEM: dsmserv crashes when autolabelling a tape
* Tapes processed with LABEL LIBVOL are fine.
* dsmserv drops core in the instance directory.
* No actlog and no stderr/stdout at crash
* Only a segfault indication in /var/log/messages but no details
* db2diag shows rc -50 from the dsm library and has a minicore
* DB2 stays running.
* Before and after, no more than 450 MB of swap used.
* System has 16 GB of RAM and two 4-core Intel Xeon E5530 2.4 GHz procs
* gdb backtrace on the core file shows:
#0  ScsiAutoLabelVolume (driveP=0x2aaab0108908, newLabelP=0x445b4eb0 "000042L3",
    readLabelP=0x445b3e80 "", isScratch=True, isBlank=True, createWorm=False)
    at mmsscsi.c:19973
#1  0x00000000009aa531 in ScsiMountVolume (volNameP=0x21640c50 "SCRTCH",
    poolNameP=<value optimized out>, mntDescP=<value optimized out>,
    callbackArgP=0x1f9d4ae8) at mmsscsi.c:19464
#2  0x000000000095580e in MmsMountVolume (volNameP=0x21640c50 "SCRTCH",
    poolNameP=0x21641d10 "TAPEPOOL", mntDescP=0x445b5fa0,
    callbackArgP=0x1f9d4ae8) at mms.c:1392
#3  0x0000000000ce33aa in LtoOpen (args=0x21640c48) at pvrlto.c:288
#4  0x000000000091d675 in AgentThread (argP=0x21640bf8) at pvr.c:12986
#5  0x0000000000c807f4 in StartThread (startInfoP=0x1e5c7a08) at pkthread.c:3325
#6  0x00000032bcc0673d in start_thread () from /lib64/libpthread.so.0
#7  0x00000032bc0d3d1d in clone () from /lib64/libc.so.6
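
For anyone comparing their own core against this one, here is a minimal
Python sketch (not part of the original writeup) that reduces a gdb
backtrace like the one above to frame number, function and source location.
The names parse_frames, FRAME_START and FRAME_BODY are made up for
illustration, and the regex only assumes the frame layout visible in this
post; other gdb output may need adjustments.

```python
import re

# Backtrace text as posted, including the line wraps added by the mailer.
BACKTRACE = '''\
#0  ScsiAutoLabelVolume (driveP=0x2aaab0108908, newLabelP=0x445b4eb0
"000042L3",
    readLabelP=0x445b3e80 "", isScratch=True, isBlank=True,
createWorm=False)
    at mmsscsi.c:19973
#1  0x00000000009aa531 in ScsiMountVolume (volNameP=0x21640c50 "SCRTCH",
    poolNameP=<value optimized out>, mntDescP=<value optimized out>,
    callbackArgP=0x1f9d4ae8) at mmsscsi.c:19464
#2  0x000000000095580e in MmsMountVolume (volNameP=0x21640c50 "SCRTCH",
    poolNameP=0x21641d10 "TAPEPOOL", mntDescP=0x445b5fa0,
    callbackArgP=0x1f9d4ae8) at mms.c:1392
#3  0x0000000000ce33aa in LtoOpen (args=0x21640c48) at pvrlto.c:288
#4  0x000000000091d675 in AgentThread (argP=0x21640bf8) at pvr.c:12986
#5  0x0000000000c807f4 in StartThread (startInfoP=0x1e5c7a08) at
pkthread.c:3325
#6  0x00000032bcc0673d in start_thread () from /lib64/libpthread.so.0
#7  0x00000032bc0d3d1d in clone () from /lib64/libc.so.6
'''

# A frame starts with "#<n>" at the beginning of a line.
FRAME_START = re.compile(r"^#(\d+)\s+", re.MULTILINE)

# Optional return address, function name, argument list, then either
# "at file:line" (frames with debug info) or "from library".
FRAME_BODY = re.compile(
    r"(?:0x[0-9a-f]+ in )?(?P<func>\w+) \(.*\)\s+"
    r"(?:at (?P<src>\S+:\d+)|from (?P<lib>\S+))",
    re.DOTALL,
)


def parse_frames(text):
    """Return a list of (frame_number, function, location) tuples."""
    frames = []
    parts = FRAME_START.split(text)
    # split() with one capture group yields [prefix, n0, body0, n1, body1, ...]
    for num, body in zip(parts[1::2], parts[2::2]):
        flat = " ".join(body.split())  # undo mail line wrapping
        match = FRAME_BODY.match(flat)
        if match:
            location = match.group("src") or match.group("lib")
            frames.append((int(num), match.group("func"), location))
    return frames


frames = parse_frames(BACKTRACE)
for num, func, location in frames:
    print(f"#{num} {func} ({location})")
```

Read from the bottom up, the frames give the call chain that ends in
ScsiAutoLabelVolume, which is the part worth comparing against your own core.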

ACTION TAKEN: isolated issue as above, created PMR with IBM RC.

ACTION PLAN: Waiting on callback.
* autolabel should not drop core.

TESTCASE: DB2 cores, dsmserv cores, db2diag, actlog



With friendly regards,
Josh-Daniel S. Davis


