ADSM-L

Urgent: Errors restoring NDMP Celerra files

2006-10-13 13:29:37
Subject: Urgent: Errors restoring NDMP Celerra files
From: "Schneider, John" <schnjd AT STLO.MERCY DOT NET>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 13 Oct 2006 12:25:56 -0500
Greetings,
        I am trying to restore an individual file from a TSM NDMP backup.
This is a 3.7TB filesystem, so my predecessor didn't think it was worth the
disk space to throw at a TOC backup, so I have to restore using 'restore
node'.

TSM version 5.3.4, running under AIX 5.3ML1.  Here is the console from my
first restore attempt:


tsm: MDCTSM00>restore node cx02_dm2 /fs_dst
filelist=/dst/csdforms/CSD/UpdateCSD
s.vbs pitd=10/10/2006 pittime=23:00:00
ANR1059I Selective restore of NAS node CX02_DM2, file system /fs_dst,
started
as process 933 by administrator SCHNJD.  Specified files and/or directory
trees
will be restored to destination /fs_dst.
ANS8003I Process number 933 started.

tsm: MDCTSM00>q proc

 Process Process Description  Status
  Number
-------- --------------------
-------------------------------------------------
     933 Restore NAS          NAS Node CX02_DM2, Source /fs_dst, Destination
          (selective)          /fs_dst, Elapsed Time 22 seconds. Waiting for
                               mount of input volume 010259 (21 seconds).

tsm: MDCTSM00>q req
ANR8352I Requests outstanding:
ANR8308I 003: NAS volume 010259 is required for use in library IBM3584;
CHECKIN
LIBVOLUME required within 179 minutes.

tsm: MDCTSM00>q nasbackup cx02_dm2 *

Node Name       Filespace   Object Type Object Size  Creation  Has Table of
                Name                           (MB)    Date    Contents
                                                               (TOC)?
--------------- ----------- ----------- ----------- ---------- -------------
CX02_DM2        /fs_dst     Differenti-    61,393.7 10/05/2006 No
                             al Image                23:00:01
CX02_DM2        /fs_dst     Differenti-    67,050.3 10/06/2006 No
                             al Image                23:00:00
CX02_DM2        /fs_dst     Full Image  3,711,751.2 10/07/2006 No
                                                     18:00:00
CX02_DM2        /fs_dst     Differenti-     3,736.2 10/08/2006 No
                             al Image                23:00:01
CX02_DM2        /fs_dst     Differenti-     8,003.7 10/09/2006 No
                             al Image                23:00:01
CX02_DM2        /fs_dst     Differenti-    15,614.2 10/10/2006 No
                             al Image                23:00:00
CX02_DM2        /fs_dst     Differenti-    21,198.5 10/11/2006 No
                             al Image                23:00:00

I mounted 010259 as it asked for it, and after it mounted the tape it ran
for over an hour and then produced:

10/12/06 14:28:03     ANR9999D ndlog.c(105): ThreadId<40> NDMPLOG: E Read
failed
                       on archive volume 1 (SESSION: 128391, PROCESS: 933)

10/12/06 14:28:03     ANR9999D ThreadId<40> issued message 9999 from:

                       <-0x000000010001c168 outDiagf <-0x00000001002e6c3c

                       spiOutMsg <-0x09000000005896f0 ndmpLogMessage
<-0x090000-
                       0000580608 ndmp_process_messages <-0x09000000005810e4

                       ndmpProcessRequests <-0x09000000005886a4 ndmpPoll

                       <-0x09000000005771d4 spiRtrvSeg <-0x000000010040e728

                       AsReadSegRemote <-0x00000001004ae1e0 ssRtrvRemote

                       <-0x0000000100497b10 AfRtrvRemoteThread
<-0x000000010000-
                       e9dc StartThread <-0x09000000003402dc _pthread_body

                       (SESSION: 128391, PROCESS: 933)

10/12/06 14:28:08     ANR1104E NAS Restore process 933 terminated - NDMP
session
                       errors encountered. (SESSION: 128391, PROCESS: 933)

10/12/06 14:28:08     ANR0985I Process 933 for RESTORE NAS (SELECTIVE)
running
                       in the BACKGROUND completed with completion state
FAILURE
                       at 14:28:08. (SESSION: 128391, PROCESS: 933)

10/12/06 14:28:08     ANR0514I Session 128391 closed volume 010259.
(SESSION:
                       128391)
10/12/06 14:28:37     ANR8468I NAS volume 010259 dismounted from drive
DRIVE262
                       (c112t0l0) in library IBM3584. (SESSION: 128391,
PROCESS:
                       933)

I looked back through the activity log, and it looks like 010259 is the
first tape in the previous full backup for this filesystem.
On the Celerra, we get the following:


2006-10-12 14:17:47: DRIVERS: 4:  FCDMTL 2 [1.4.1] Initiator check
condition: loopID = 3., SK = 06, ASC/Q = 2800
2006-10-12 14:17:47: DRIVERS: 4:  FCDMTL 2 [1.4.1] Initiator check
condition: loopID = 3., SK = 02, ASC/Q = 3a00
2006-10-12 14:17:51: DRIVERS: 4:  last message repeated 1 times
2006-10-12 14:17:51: DRIVERS: 4:  FCDMTL 2 [1.4.1] Suppressed all but 2 out
of 4 identical log entries
2006-10-12 14:17:51: DRIVERS: 4:  FCDMTL 2 [1.4.1] Initiator check
condition: loopID = 3., SK = 02, ASC/Q = 0401
2006-10-12 14:18:03: DRIVERS: 4:  last message repeated 1 times
2006-10-12 14:18:03: CAM: 3:  check_timeout (112) : c112t0l0 req bfba5090
timeout (10 sec 39236029860 cyc)
func 01 flags 41 stat 00 cdb 380a4c1a000000000000


2006-10-12 14:18:03: FCP: 3:  c112t0l0: abort called
2006-10-12 14:18:03: DRIVERS: 4:  FCDMTL 2 [1.4.1] Suppressed all but 2 out
of 4 identical log entries
2006-10-12 14:18:03: DRIVERS: 4:  FCDMTL 2 [1.4.1] Initiator command error:
loop_id = 3., SCSI status = 0, EXM status = 12
2006-10-12 14:18:03: CAM: 3: 6:  Attention: Scsi-112 hba is AGAIN NORMAL.

   |_SCSI ERR ON #11200    camstat 0x04  bstat DeviceError
2006-10-12 14:18:07: DRIVERS: 4:  FCDMTL 2 [1.4.1] Initiator check
condition: loopID = 3., SK = 06, ASC/Q = 2800
2006-10-12 14:18:15: DRIVERS: 4:  FCDMTL 2 [1.4.1] Initiator check
condition: loopID = 3., SK = 00, ASC/Q = 0001
2006-10-12 14:18:41: NDMP: 4:  Name: FILESYSTEM Value: /fs_dst
2006-10-12 14:18:41: NDMP: 4:  Name: LEVEL Value: 0
2006-10-12 14:18:41: NDMP: 4:  Name: UPDATE Value: Y
2006-10-12 14:18:41: NDMP: 4:  Name: NON_QUOTA_TREE Value: N
2006-10-12 14:18:41: NDMP: 4:  Name: NO_ACLS Value: N
2006-10-12 14:18:41: NDMP: 4:  Name: HIST Value: N
2006-10-12 14:18:41: NDMP: 4:  Name: DIRECT Value: N
2006-10-12 14:18:41: NDMP: 4:  Name: EXTRACT Value: Y
2006-10-12 14:18:41: NDMP: 4:  Name: EXTRACT_ACL Value: Y
2006-10-12 14:18:41: NDMP: 4:  Name: DEBUG Value: N
2006-10-12 14:18:41: NDMP: 4:  Name: VERBOSE Value: N
2006-10-12 14:18:41: NDMP: 4:  Name: REPLICATE Value: N
2006-10-12 14:18:41: NDMP: 4:  Name: FORCE Value: N
2006-10-12 14:18:41: NDMP: 4:  Name: NOWRITE Value: N
2006-10-12 14:18:41: NDMP: 4:  Name: LIST Value: N
2006-10-12 14:18:41: NDMP: 4:  Name: LIST_TREES Value: N
2006-10-12 14:18:41: NDMP: 4:  Name: DMP_NAME Value: TSM /fs_dst
2006-10-12 14:18:41: NDMP: 4:  Name: DATA_BLOCK_SIZE Value: 0
2006-10-12 14:18:41: NDMP: 4:  Name: TYPE Value: DUMP
2006-10-12 14:18:41: NDMP: 4:  < Backup type: dump >

Note that the tape drive in this case is c112t0l0, known to TSM as DRIVE262.


My second attempt was to go back and specify the latest full as the
point-in-time:

tsm: MDCTSM00>q nasbackup cx02_dm2 *

Node Name       Filespace   Object Type Object Size  Creation  Has Table of
                Name                           (MB)    Date    Contents
                                                               (TOC)?
--------------- ----------- ----------- ----------- ---------- -------------
CX02_DM2        /fs_dst     Differenti-    67,050.3 10/06/2006 No
                             al Image                23:00:00
CX02_DM2        /fs_dst     Full Image  3,711,751.2 10/07/2006 No
                                                     18:00:00
CX02_DM2        /fs_dst     Differenti-     3,736.2 10/08/2006 No
                             al Image                23:00:01
CX02_DM2        /fs_dst     Differenti-     8,003.7 10/09/2006 No
                             al Image                23:00:01
CX02_DM2        /fs_dst     Differenti-    15,614.2 10/10/2006 No
                             al Image                23:00:00
CX02_DM2        /fs_dst     Differenti-    21,198.5 10/11/2006 No
                             al Image                23:00:00

tsm: MDCTSM00>restore node cx02_dm2 /fs_dst
filelist=/dst/csdforms/CSD/UpdateCSD
s.vbs pitd=10/07/2006 pittime=18:00:00
ANR1059I Selective restore of NAS node CX02_DM2, file system /fs_dst,
started
as process 971 by administrator SCHNJD.  Specified files and/or directory
trees
will be restored to destination /fs_dst.
ANS8003I Process number 971 started.

tsm: MDCTSM00>

I am trying to restore a file which is on the full backup for 10/07/2006
18:00:00.  Am I doing the right thing, then, when I specify that
point-in-time for the restore?  Or do I have to specify a point-in-time in
the future?  The reason I ask is because I it asked to mount tape 010404,
but that tape was not used in the full backup on that date.  Here are the
tapes that it used for that full:

010259
010261
030260
030774
030901
030925
030961
031034
031054
031056
031088
031142

Either way, we got the same error as before:

10/13/06 10:49:13     ANR9999D ndlog.c(105): ThreadId<46> NDMPLOG: E Read
failed
                       on archive volume 1 (SESSION: 132921, PROCESS: 971)

10/13/06 10:49:13     ANR9999D ThreadId<46> issued message 9999 from:

                       <-0x000000010001c168 outDiagf <-0x00000001002e6c3c

                       spiOutMsg <-0x09000000005896f0 ndmpLogMessage
<-0x090000-
                       0000580608 ndmp_process_messages <-0x09000000005810e4

                       ndmpProcessRequests <-0x09000000005886a4 ndmpPoll

                       <-0x09000000005771d4 spiRtrvSeg <-0x000000010040e728

                       AsReadSegRemote <-0x00000001004ae1e0 ssRtrvRemote

                       <-0x0000000100497b10 AfRtrvRemoteThread
<-0x000000010000-
                       e9dc StartThread <-0x09000000003402dc _pthread_body

                       (SESSION: 132921, PROCESS: 971)

10/13/06 10:49:18     ANR1104E NAS Restore process 971 terminated - NDMP
session
                       errors encountered. (SESSION: 132921, PROCESS: 971)

10/13/06 10:49:18     ANR0985I Process 971 for RESTORE NAS (SELECTIVE)
running
                       in the BACKGROUND completed with completion state
FAILURE
                       at 10:49:18. (SESSION: 132921, PROCESS: 971)

10/13/06 10:49:18     ANR0514I Session 132921 closed volume 010404.
(SESSION:
                       132921)

10/13/06 10:49:46     ANR8468I NAS volume 010404 dismounted from drive
DRIVE263
                       (c160t0l0) in library IBM3584. (SESSION: 132921,
PROCESS:
                       971)

Note that the tape drive in this case is c160t0l0, known to TSM as DRIVE263.
So I am getting the same problem on two different tape backups on two
different days, and two different tapes, restoring from two different tape
drives.

I would be eternally grateful for any suggestions.  I have opened up a PMR
for this, but they have not yet responded.

Best Regards,

John D. Schneider
Senior Systems Administrator - Storage
Sisters of Mercy Health System
3637 S. Geyer Rd.
St. Louis, MO 63127
Email: schnjd AT stlo.mercy DOT net
Phone: 314-364-3150  Cell: 314-486-2359

<Prev in Thread] Current Thread [Next in Thread>
  • Urgent: Errors restoring NDMP Celerra files, Schneider, John <=