ADSM-L

Re: [ADSM-L] HSM Node Failing

2009-09-02 11:46:09
Subject: Re: [ADSM-L] HSM Node Failing
From: David McClelland <tsm AT NETWORKC.CO DOT UK>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 2 Sep 2009 16:45:03 +0100
Okay - I think this error gets us more towards the business end of your
problem.

Take a look at:

http://support.microsoft.com/kb/909424/en-us

Guessing at the history of this, if it pre-dates your involvement, an
explanation could be that TSM Journaling was enabled on this backup client
because it had millions of files that the BA client was trying to back up on
a daily basis - this can take a long time and can sometimes fall over
altogether, for example if the dsmc.exe thread reaches its upper memory
limit, or another problem crops up (as per the MSKB article above). Look up
in the IBM TSM client docs why Journaling is useful in these instances
(http://publib.boulder.ibm.com/infocenter/tivihelp/v1r1/topic/com.ibm.itsmfdt.doc/ans6000099.htm#journalb).

If the journaling engine keeps on falling over (and there are some
troubleshooting tools), the BA client falls back to a 'normal' incremental
backup - which looks now as though it's failing. If the KB article above
doesn't help, there may be workarounds, including splitting the backup into
smaller chunks, and a client option 'MEMORYEFFICIENTBACKUP'
(http://publib.boulder.ibm.com/infocenter/tivihelp/v1r1/topic/com.ibm.itsmfdt.doc/ans60000322.htm#opt6080).
Have a look here too:
http://www.adsm.org/forum/showthread.php?t=15867
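
As a rough illustration of the 'smaller chunks' workaround, the sketch below
builds one 'dsmc incremental' command per top-level directory instead of a
single pass over the whole drive. The helper name build_chunk_commands and the
example directory names are hypothetical, and the script only prints the
commands. Treat the file-spec form ("path\*" with -subdir=yes) as an
assumption to check against the client docs for your version.

```python
from pathlib import PureWindowsPath


def build_chunk_commands(drive_root, top_level_dirs, dsmc="dsmc"):
    """Return one 'dsmc incremental' command line per top-level directory.

    Hypothetical helper: each command covers a single subtree, so no one
    dsmc run has to process the file list for the whole drive.
    """
    commands = []
    for name in top_level_dirs:
        # "<dir>\*" together with -subdir=yes is assumed to make the client
        # descend into the directory; verify this for your client level.
        target = str(PureWindowsPath(drive_root) / name / "*")
        commands.append(f'{dsmc} incremental "{target}" -subdir=yes')
    return commands


if __name__ == "__main__":
    # Placeholder directory names, not taken from this thread.
    for cmd in build_chunk_commands("H:\\", ["dir_a", "dir_b"]):
        print(cmd)
```

Each printed line could then be scheduled separately, which keeps any single
failure confined to one subtree.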

Hope that helps,

/David Mc
London, UK

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Danny Blair
Sent: 02 September 2009 15:49
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] HSM Node Failing

I do see an error in the dsmerror.log:


09/02/2009 02:37:48 ANS9999E ntrc.cpp(939): Received Win32 RC 1450
(0x000005aa) from FileRead(): ReadFile
'\\hi-winhsm\g$\EEGRAW\NKT\EEG2100\CA6742Y0.VOR\6742Y004.M4A'. Error
desription: Insufficient system resources exist to complete the requested
service.
09/02/2009 02:37:51 ANS1028S An internal program error occurred.
09/02/2009 02:37:52 ANS1512E Scheduled event 'SM-0200' failed.  Return code
= 12.

I thought I included that in my original email, my apologies. This error is
what led me to believe there is an issue on the node itself.
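
For anyone wanting to pull the same details out of a larger dsmerror.log, here
is a minimal sketch that extracts the Win32 return codes and ANS message IDs
from log text. The function name summarize_dsmerror and the regular
expressions are my own assumptions, based only on the line format shown above;
other client versions may word these lines differently.

```python
import re

# Win32 return code as reported in the ANS9999E line above.
WIN32_RC = re.compile(r"Received Win32 RC (\d+) \((0x[0-9a-fA-F]+)\) from (\w+)\(\)")
# TSM client message IDs such as ANS1028S or ANS1512E.
ANS_ID = re.compile(r"\bANS\d{4}[A-Z]\b")


def summarize_dsmerror(text):
    """Return (win32_codes, ans_ids) found in dsmerror.log text."""
    # Entries can wrap across lines, so collapse whitespace before matching.
    flat = " ".join(text.split())
    codes = [(int(rc), hexval, func) for rc, hexval, func in WIN32_RC.findall(flat)]
    ids = ANS_ID.findall(flat)
    return codes, ids


if __name__ == "__main__":
    # Shortened sample based on the excerpt in this message.
    sample = (
        "09/02/2009 02:37:48 ANS9999E ntrc.cpp(939): Received Win32 RC 1450\n"
        "(0x000005aa) from FileRead(): ReadFile\n"
        "09/02/2009 02:37:51 ANS1028S An internal program error occurred.\n"
    )
    print(summarize_dsmerror(sample))
```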

On Wed, Sep 2, 2009 at 10:43 AM, David McClelland <tsm AT networkc.co DOT uk> wrote:

> At first glance, the errors below look to me like more of an issue regarding
> the TSM Journaling Engine on this client, rather than any HSM component.
> When you say 'client is failing', can you give a little more description of
> the symptoms which suggest this failure, perhaps some output from your
> dsmerror.log file with any relevant errors?
>
> /David Mc
> London, UK
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
> Danny Blair
> Sent: 02 September 2009 15:31
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: [ADSM-L] HSM Node Failing
>
> While I am new to managing TSM, I have tried to research this and have not
> found much other than "upgrade to 5.5.2" (we are on 5.5.1). Has anyone else
> seen this on your HSM clients?
>
> Client is failing. Event viewer on the client machine is full of
>
> "Journal for fs 'H:' reset:"
>
> and
>
> "Notification buffer overrun monitoring fs 'H:\', journal will be reset."
>
> 3.37 GB RAM, 60 GB free on C: and at least 200 GB free on each drive.
>
> It was a problem on just one drive (others were apparently working fine),
> but just last night it started on another.
>
> It sounds to me as though the server itself is having a problem completing
> the client's request.
>
> Any ideas?
>
> Thanks in advance.
