ADSM-L

Re: Long pause in the middle of an NT backup.

2004-06-10 11:12:26
Subject: Re: Long pause in the middle of an NT backup.
From: Pete Tanenhaus <tanenhau AT US.IBM DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 10 Jun 2004 11:11:50 -0400
 > dsmerror.log:
>
> 06/08/2004 01:30:48 win32NpWrite(): Error 233 writing to named pipe

This indicates that the backup client has encountered an error sending a
journal notification to the journal
service.

Most likely what happened is that the error occurred while the client was
performing the initial
non-journal based backup of a drive with an active but not yet valid
journal.

The journal service probably encountered an error reading a notification
from the backup client
and severed the connection with the client (check jbberror.log for a
return code of 998), and any
subsequent notifications sent to the journal service by the client would
result in broken connection
errors (this is what rc 233 is).

Since the client can't send any more notifications the final "Incremental
Complete" notification is
never sent to the journal service and the journal for the drive  never set
to the valid state, which means
that the next backup of the drive won't be journal based.

This problem is addressed by apar IC40627 and will be fixed in the next
5.23 fixtest (check with IBM
service for availability).

I'm not sure if this answers all your questions as to why the backup is so
slow but it should at least
answer why journaling isn't working.



Pete Tanenhaus
Tivoli Storage Solutions Software Development
email: tanenhau AT us.ibm DOT com
tieline: 320.8778, external: 607.754.4213

"Those who refuse to challenge authority are condemned to conform to it"

---------------------- Forwarded by Pete Tanenhaus/San Jose/IBM on 06/10/2004 
10:58 AM ---------------------------
Please respond to "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
Sent by:        "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
To:     ADSM-L AT VM.MARIST DOT EDU
cc:
Subject:        Re: Long pause in the middle of an NT backup.



I don't remember at what client level the syntax changed, but at client
level 5.2 you can add a "testflag instrument:detail" to your dsm.opt and
generate a report to dsminstr.report file that details how the session
is spending it's time.

David

>>> Matt.Anderson AT PEARSON DOT COM 6/9/2004 3:55:32 PM >>>
Thanks for the responses. I'm going to implement a cron job tonight to
query
the sessions every few minutes to see exactly what the session in
question
is doing during the "inactive" time.

Journaling (as suggested in another message) is currently being tested
for
some on one of our servers. We had problems with it in the past but
are
hoping it's improved to the point where it's usable to us.

Regarding fragmentation, I hadn't realized that could be related to
this
behavior, but it does make sense. In fact, we know we're horribly
fragmented
but are unable to take the necessary downtime to unload/reload the
database
in the foreseeable future.

Thanks!

--Matt



-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf
Of
Andy Carlson
Sent: Wednesday, June 09, 2004 2:47 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: Long pause in the middle of an NT backup.


One thing that appens during that time is that the TSM server is
downloading
directory entries to the client, to sort through and decide what to
back up.
With millions of entries (which it appears you have), it can take
awhile.
We had a situation with a terribly fragmented DB, where this pause was
lasting 10-15 hours on some boxes.  We did an unload/load, and they
have all
but one gone back to "normal" (the one that didn't has 5.5Million
entries,
and does not have enough memory on the client box).

On Tue, 8 Jun 2004, Anderson, Matt wrote:

> I'm experiencing behavior on some NT nodes where, in the middle of a
> scheduled backup, there will be a long pause where no activity seems
> to occur. It would appear to occur after the backup of one drive
> finishes, but before the next one starts. For example, see this log
> excerpt from this
> morning:
>
> 06/08/2004 01:30:46 ANS1898I ***** Processed 1,911,000 files *****
> 06/08/2004 01:30:47 ANS1898I ***** Processed 1,916,000 files *****
> 06/08/2004 01:30:49 ANS1898I ***** Processed 1,916,500 files *****
> 06/08/2004 01:31:50 Normal File-->       141,971,832
> \\my-server\f$\PerforceETC\Logs\Log.txt  Changed
> 06/08/2004 01:32:34 Retry # 1  Normal File-->       141,971,832
> \\my-server\f$\PerforceETC\Logs\Log.txt [Sent]
> 06/08/2004 01:32:34 Successful incremental backup of
'\\my-server\f$'
>
> 06/08/2004 07:56:10 ANS1898I ***** Processed 1,917,000 files *****
> 06/08/2004 07:56:13 ANS1898I ***** Processed 1,917,500 files *****
> 06/08/2004 07:56:15 ANS1898I ***** Processed 1,918,000 files *****
> 06/08/2004 07:56:17 ANS1898I ***** Processed 1,918,500 files *****
>
> There's almost 6.5 hours where no activity was logged in the
> dsmsched.log. There is a (possibly) corresponding message in the
> dsmerror.log:
>
> 06/08/2004 01:30:48 win32NpWrite(): Error 233 writing to named pipe
>
> The client is Windows 2000 running TSM client version 5.2.2.5 The
TSM
> server is AIX 5.2 running TSM server version 5.2.2.2
>
> Any insight that can be provided would be appreciated!
>
> Thanks,
>
> Matt Anderson
> Backup & Recovery Engineer
>
>
**********************************************************************
> ******
> This email may contain confidential material.
> If you were not an intended recipient,
> Please notify the sender and delete all copies.
> We may monitor email to and from our network.
>
****************************************************************************
>
>

****************************************************************************
This email may contain confidential material.
If you were not an intended recipient,
Please notify the sender and delete all copies.
We may monitor email to and from our network.
****************************************************************************