ADSM-L

Hanging TSM backup invalidates journal?

2003-11-07 07:08:14
Subject: Hanging TSM backup invalidates journal?
From: David McClelland <David.McClelland AT REUTERS DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 7 Nov 2003 12:05:00 +0000
Dear TSMville,

Hanging Backup Invalidates TSM Journal

o - TSM Client 5.1.5.0, journaling engine backing up c. 12,000,000 files
with a daily addition/update of around 80,000. When it works = 40
minutes. When it doesn't = 13 hours...
o - TSM Server 5.1.6.2 WinNT

A couple of nights ago, a journal backup hung and just kinda stayed
around on the TSM server in IdleW without anyone noticing. The next
day's backup began and, I'm guessing from here on, it couldn't get access
to the TSM journal, so it reverted to a looooong normal incremental
backup. I subsequently spotted this, killed off the two IdleW sessions
and kicked off a new backup on the journal client. However, it failed to
do a journal backup and started a normal incremental again... 

Looking in the dsmerror.log, I spy a 'NpOpen: Named pipe error
connecting to server WaitOnPipe failed. > NpOpen: call failed with
return code:121 pipe name //./pipe.jnl'.

I understand that this named pipe is opened up at the initiation of a
journal backup as the b/a client attempts to connect to the journal
daemon - the return code 121 suggests that the connect failed, and
possibly the tsmjbbd.exe process wasn't up and running. I look at Task
Manager, and it is running, but consuming a 'healthy' 263,632K of memory.
Observing its behaviour, I see it is still doing some work under 'I/O
Other' in Task Manager's useful extra columns, but nothing under 'I/O
Writes' or 'I/O Reads'. Is this suspect?
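
For anyone wanting to pick this error out of dsmerror.log automatically,
here is a minimal Python sketch. The pattern is based only on the single
message quoted above, so other client levels may word it differently, and
the function name npopen_return_codes is made up for illustration.

```python
import re

# Pattern derived from the one dsmerror.log message quoted in this post.
# Assumption: the wording is the same wherever the error appears.
NPOPEN_RC = re.compile(r"NpOpen: call failed with\s+return code:\s*(\d+)")


def npopen_return_codes(log_text):
    """Return the return code of every NpOpen failure found in log_text."""
    return [int(rc) for rc in NPOPEN_RC.findall(log_text)]


sample = (
    "NpOpen: Named pipe error connecting to server WaitOnPipe failed. "
    "NpOpen: call failed with return code:121 pipe name //./pipe.jnl"
)
print(npopen_return_codes(sample))
```

On Windows, system error 121 is ERROR_SEM_TIMEOUT, which would fit a wait
on the pipe timing out rather than the pipe being absent. Whether the TSM
client reports the Win32 code unchanged is an assumption here.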

I'm guessing that the journal became invalidated somewhere down the line
during the hung backup, or that the subsequent attempt at a backup
failed as maybe the old TSM backup still has a lock on it? The
tsmjbbd.exe is still present, and there is nothing from these dates in
the jbberror.log.

Any ideas what may be going on here? I seem to be able to get around 6
or 7 days of JBB backups before it starts to break and I have to
hand-hold it to get it up again... In terms of automatically monitoring
this, sticking a Tivoli process monitor to make sure the tsmjbbd.exe
process is running is only useful to a point (i.e. it wouldn't have
spotted the above), so it looks as though I'm going to have to trawl the
stdout of our backup logs to make sure that 'using journal for x$' is
present. Any ideas where else I should be looking - perhaps in the (what
we've called) jbberror.log for 'Journal will be restarted for FS x'?
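
The log trawl described above could be sketched roughly as follows in
Python. The two message strings are the ones quoted in this post; their
exact format in a real log, the function names, and the idea of passing in
a list of expected filesystems are all assumptions for illustration.

```python
import re

# Message fragments as quoted in this post; real log wording may differ.
USING_JOURNAL = re.compile(r"using journal for\s+(\S+)", re.IGNORECASE)
JOURNAL_RESTART = re.compile(
    r"Journal will be restarted for FS\s+(\S+)", re.IGNORECASE
)


def _clean(name):
    # Drop surrounding quotes or trailing punctuation around the name.
    return name.strip("'\".,").lower()


def missing_journal_backups(backup_log_text, expected_filesystems):
    """Filesystems that never logged 'using journal for <fs>'."""
    seen = {_clean(m) for m in USING_JOURNAL.findall(backup_log_text)}
    return sorted(fs for fs in expected_filesystems if fs.lower() not in seen)


def restarted_journals(jbb_log_text):
    """Filesystems whose journal the daemon reported restarting."""
    return sorted({_clean(m) for m in JOURNAL_RESTART.findall(jbb_log_text)})


backup_log = "Incremental backup started\nUsing journal for x$\n"
print(missing_journal_backups(backup_log, ["x$", "y$"]))
print(restarted_journals("Journal will be restarted for FS x$"))
```

A non-empty result from either function would be the cue to raise an
alert, since a running tsmjbbd.exe process alone did not catch the
failure described here.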

So, questions are: 

o - any ideas what might be behind the above? A tsmjbbd.exe that is
running but not responding, and if so, how did it get that way?
o - tsmjbbd.exe - how big should it be in 'healthy' usage? Is 263MB a
bit excessive?
o - any ideas about the best way to monitor (preferably using Tivoli
e.g. ITM, logfile adapters etc) jbb backups?

Quite a lot there - sorry!

Rgds,

David McClelland        
Global Management Systems       
Reuters 
85 Fleet Street 
London EC4P 4AJ 
E-mail  david.mcclelland AT reuters DOT com     
Reuters Messaging       david.mcclelland.reuters.com AT reuters DOT net 

