ADSM-L

Re: [ADSM-L] Is there any way within TSM to terminate a process on excessive read errors?

2009-01-01 10:40:39
From: Don France <DFrance-TSM AT ATT DOT NET>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 1 Jan 2009 15:39:37 +0000
Yep, you will probably want to do just that. You would normally hope the 
process would end after the first error, but (alas) it's an imperfect world. 
I'd advise you to open a PMR; this smells like a problem that could/should be 
fixed by having the process end on its own.  

The caveat is that some processes will restart (repeatedly) due to your other 
settings (like reclamation thresholds, etc.), which could cause this to recur, 
though the drive cleaning should have resolved it.
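
A watchdog of the kind Tom asks about could start from something like the
sketch below. It is only an illustration under stated assumptions: the
activity log lines have already been collected as text with one message per
line (for example from "query actlog" output), the function names and the
threshold are hypothetical, and mapping a failing volume back to a process
number and issuing "cancel process" is left to the administrator.

```python
import re
from collections import Counter

# Pulls the drive and volume names out of an ANR8944E message line.
# Assumes each message sits on a single, unwrapped line.
ANR8944E = re.compile(r"ANR8944E\b.*?\bon drive (\S+).*?\bwith volume (\w+)")


def count_media_errors(actlog_lines):
    """Count ANR8944E messages per (drive, volume) pair."""
    counts = Counter()
    for line in actlog_lines:
        match = ANR8944E.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def over_threshold(counts, max_errors=100):
    """Return the (drive, volume) pairs with more than max_errors errors.

    max_errors is an arbitrary illustrative default, not a TSM setting.
    """
    return sorted(pair for pair, n in counts.items() if n > max_errors)
```

Because reclamation may start again on its own, a check like this would need
to run repeatedly (from cron, say) rather than once.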

-Don

---

Don France
Technical Architect - Tivoli Certified Consultant
Tivoli Storage Manager - Win2K/2003, AIX/Unix, OS/390

Professional Association of Contract Employees (P.A.C.E.) - www.pacepros.com
San Jose, CA
Phone - Voice/Mobile: (408) 348-8926
email: don_france AT att DOT net 

-------------- Original message from "Kauffman, Tom" <KauffmanT AT NIBCO DOT COM>: -------------- 


> I get frustrated when I see something like this: 
> 
> ANR8944E Hardware or media error on drive DRIVE_02 (/dev/rmt0) with volume 
> 444035L4 (OP=LOCATE, Error Number= 110, CC=0, KEY=03, ASC=09, ASCQ=00, 
> SENSE=70.00.03.00.00.00.00.58.00.00.00.00.09.00.36.00.78.B5.78.B5.00.01.34.34.34.30.33.35.4C.19.00.00.15.04.CB.00.00.00.00.00.80.2B.60.00.00.00.20.DD.20.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.37.41.33.31.00.00.00.00.00.00, 
> Description=An undetermined error has occurred). Refer to Appendix C in the 
> 'Messages' manual for recommended action. 
> ANR8359E Media fault detected on LTO volume 444035L4 in drive DRIVE_02 
> (/dev/rmt0) of library GOBI. 
> ANR1080W Space reclamation is ended for volume 444035L4. The process is 
> canceled. 
> ANR1163W Offsite volume 333114L2 still contains files which could not be 
> moved. 
> ANR0986I Process 1714 for SPACE RECLAMATION running in the BACKGROUND 
> processed 80934 items for a total of 13,531,260,308 bytes with a completion 
> state of FAILURE at 10:54:00. 
> 
> At the time I cancelled this, a query process showed something in excess of 
> 26,000 files unreadable and I had several hundred entries in the AIX error 
> log. 
> (I have 28,896 occurrences of the ANR8944E error message today, so I presume 
> that's the accurate count.) 
> 
> I cancelled the process, the input tape dismounted, the library cleaned the 
> drive - and I processed 717,095 files with no errors. 
> 
> Do I have to come up with a script of my own to catch and kill processes like 
> this? 
> 
> TIA - 
> 
> Tom Kauffman 
> NIBCO, Inc 
> 
> ________________________________ 
> 
