ADSM-L

Re: More TSM journaling stuff

2003-09-24 11:14:34
Subject: Re: More TSM journaling stuff
From: David McClelland <David.McClelland AT REUTERS DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 24 Sep 2003 16:13:23 +0100
Pete,

Thanks for your responses...

>> The error you are seeing in the journal daemon is probably caused
>> because the journal db has exceeded the supported maximum of 2 gig.

I was watching the journal files, and the big one never went above
1.6GB... This was during the initial full backup. The jbberror.log
entries accompanying the termination of the service go something like
these:

09/23/2003 20:06:37 jnlDbCntrl(): Error updating the journal for fs 'G:', dbUpdEntry() rc = -1, last error = 27
09/23/2003 20:06:38 JbbMonitorThread(): DB Access thread, tid 3100 ended with return code 215.
09/23/2003 20:07:39 NpOpen: Named pipe error connecting to server
WaitOnPipe failed.
NpOpen: call failed with return code:121 pipe name \\.\pipe\jnl
09/23/2003 20:07:39 NpListeningThreadCleanUp():  NpOpen(): Error -190 
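
For anyone wanting to pull these entries out of a long jbberror.log, a minimal Python sketch follows. It assumes each log entry sits on one physical line (the wrapping above comes from the mail client) and that the message wording matches the two examples quoted in this thread; other client levels may word it differently. The function name is illustrative only.

```python
import re

# Pattern modelled on the dbUpdEntry() messages quoted in this thread.
# Assumption: one log entry per line, wording as shown in the thread.
DB_UPD_ERROR = re.compile(
    r"Error updating the journal\s+for fs '?(?P<fs>[^',]+)'?,\s*"
    r"dbUpdEntry\(\) rc = (?P<rc>-?\d+)"
    r"(?:, last error = (?P<last_error>-?\d+))?"
)


def find_db_update_errors(lines):
    """Return (fs, rc, last_error) tuples; last_error is None when absent."""
    hits = []
    for line in lines:
        match = DB_UPD_ERROR.search(line)
        if match:
            last = match.group("last_error")
            hits.append((
                match.group("fs"),
                int(match.group("rc")),
                int(last) if last is not None else None,
            ))
    return hits
```

Typical use would be `find_db_update_errors(open("jbberror.log"))`, with the path adjusted to wherever the journal service writes its error log.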

Still looks as though I'm seeing your error as described below though...

>> That having been said, the real problem to look at is why the journal
>> grew so large.

Agreed! Although as I say, it didn't seem to go above the 2GB limit.
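
Rather than watching the files by hand, something like the following Python sketch could flag journal files approaching the limit. It assumes the "2 gig" limit means 2 GiB and that the journal files live in a single directory; the function name, the directory argument and the 70% warning threshold are all my own choices, not anything TSM provides.

```python
import os

# Assumption: the "2 gig" limit from this thread is taken as 2 GiB.
TWO_GIB = 2 * 1024**3


def journal_files_near_limit(journal_dir, warn_fraction=0.7, limit=TWO_GIB):
    """Return [(name, size_bytes, fraction_of_limit)] for files whose size
    is at least warn_fraction of limit, largest first."""
    flagged = []
    with os.scandir(journal_dir) as entries:
        for entry in entries:
            if entry.is_file():
                size = entry.stat().st_size
                if size >= warn_fraction * limit:
                    flagged.append((entry.name, size, size / limit))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Run on a schedule during the initial backup, this would show how close the largest journal file gets to the limit before the service stops.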

>> Keep in mind that each journal entry represents the most recent
>> change for a file/directory, and that journal entries are unique,
>> meaning that there can only be one entry for each object on the file
>> system.

Okay, well, this was the first full backup of a 9 million file
filesystem, so would this cause a big journal file?

If so, does it follow that in practice, we're best to do a normal
'unjournalled' initial backup of a filesystem so that we get all of the
initial hit out of the way (and don't come a cropper (is that only an
English term?) with a large journal file), and *then* do another
incremental with journaling enabled so we get the journaling engine
initialised?

>> Are you running virus scan software and if so what type and version ?
>> (example: Norton Anti-Virus Corporate Edition Version 8.00)

>> Some virus protection software touches every file processed during
>> virus scan processing, and this in turn floods the journal with
>> change notifications and grows the journal.

Okay, I'm running Sophos Antivirus 3.69. My include/exclude list means
I'm only backing up one filepath on a drive (e.g. g:\file_data\...\*),
but I guess that the journal engine records all changes, regardless of
include/exclude list specification.

I'd be very interested in having a look at the journal proofing utility
- please feel free to point me at it/mail it off-list if necessary.

Pete - thanks for all your help so far...

Rgds,

David McClelland
Management Systems Integrator
Global Management Systems
Reuters
85 Fleet Street
London EC4P 4AJ

E-mail - david.mcclelland AT reuters DOT com
Reuters Messaging - david.mcclelland.reuters.com AT reuters DOT net

-----Original Message-----
From: Pete Tanenhaus [mailto:tanenhau AT US.IBM DOT COM] 
Sent: 24 September 2003 14:26
To: ADSM-L AT VM.MARIST DOT EDU
Subject: 


I'll try to answer/address your questions as best I can.

>>> My TSM client is a file server, on its first full incremental backup
>>> (with journaling turned on) stowed away nearly 9 million files on
>>> the TSM server - a perfect candidate for the TSM journaling engine I
>>> thought. However, the tsmjbbd.exe process bombed just before the end
>>> with a 'DB Access Critical Thread Return code 215' type error,
>>> although the backup continued.


The error you are seeing in the journal daemon is probably caused
because the journal db has exceeded the supported maximum of 2 gig.

If you look in your journal errorlog (jbberror.log) you'll probably see
the following message:

 Error updating the journal for fs 'C:', dbUpdEntry() rc = 27

There is a bug in the journal service which causes the process to shut
down when this error occurs. APAR IC37040 has been opened, and the fix
will be included in an upcoming fixtest.

That having been said, the real problem to look at is why the journal
grew so large.

Keep in mind that each journal entry represents the most recent change
for a file/directory, and that journal entries are unique, meaning that
there can only be one entry for each object on the file system.
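
The one-entry-per-object behaviour can be pictured with a small Python sketch. This is only a toy model keyed on object path, not the journal's actual on-disk format; the class name and the example paths are hypothetical.

```python
class JournalSketch:
    """Toy model: at most one entry per file system object."""

    def __init__(self):
        self._entries = {}  # object path -> most recent change seen

    def record(self, path, change):
        # A later change to the same object replaces the earlier entry.
        self._entries[path] = change

    def latest(self, path):
        return self._entries.get(path)

    def __len__(self):
        return len(self._entries)


journal = JournalSketch()

# Repeated changes to one object still occupy a single entry.
for change in ("create", "modify", "modify"):
    journal.record(r"g:\file_data\report.doc", change)

# Touching 1000 distinct objects adds 1000 entries.
for n in range(1000):
    journal.record(rf"g:\file_data\file{n}.dat", "modify")
```

In this model the entry count is bounded by the number of distinct objects changed, so repeated updates to a few files cost little, while anything that touches every file produces one entry per file.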

Are you running virus scan software and if so what type and version ?
(example: Norton Anti-Virus Corporate Edition Version 8.00)

Some virus protection software touches every file processed during virus
scan processing, and this in turn floods the journal with change
notifications and grows the journal.

There are circumventions from at least one of the virus protection
vendors (Symantec) for this problem.


>>> Now, 9 million files, at an average of maybe 500K per TSM database
>>> entry equals roughly 4.5GB. Was TSM trying to send the *whole* 4.5GB
>>> inventory for this node to the dsmc.exe process on the client?
>>> Needless to say, at 2GB (I believe the limit that Win2K places on a
>>> single process) the TSM client had had enough and ended with an
>>> 'ANS1030E System ran out of memory. Process ended'.

>>> So, what shall I do - is MEMORYEFFICIENTBACKUP YES my only get out
>>> of jail card here, and exactly what does this do differently? Is my
>>> understanding above what is actually happening?
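
As a quick check on the arithmetic in the quoted estimate: the 4.5GB figure only follows if each entry is about 500 bytes; at 500K per entry the total would run to terabytes. The per-entry size is the poster's rough estimate, not a documented value, and the function name below is illustrative.

```python
def inventory_bytes(file_count, bytes_per_entry):
    """Total size of the inventory under a flat per-entry size estimate."""
    return file_count * bytes_per_entry


FILES = 9_000_000

at_500_bytes = inventory_bytes(FILES, 500)          # 4,500,000,000 bytes, about 4.5 GB
at_500_kbytes = inventory_bytes(FILES, 500 * 1000)  # 4,500,000,000,000 bytes, about 4.5 TB
```

Either way, the estimate is well beyond the 2GB figure quoted above for a single process.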

Keep in mind that a full progressive incremental backup must be done
(one that results in the Last Backup Complete Date being updated on the
server) before backups will be journal based.

Once the initial backup has been completed and the journal is validated
the next backup should be journal based.

So you may want to use MEMORYEFFICIENTBACKUP for the initial backup at
least.


Journal Based Backup should use much less memory since the only objects
inspected are those obtained from the journal.



>>> I'd be most grateful to hear of anyone else's positive or negative
>>> experiences of using the Journaling Engine, as it seems just so
>>> *ideal* for some of our file servers, yet my experiences so far
>>> suggest it might not be as easy and robust as I would ideally like
>>> it to be (i.e. cancelled backups forcing restart of journal, process
>>> bombing out midway through backup etc.), especially as a full or
>>> normal incremental backup can run into days to complete.

Aborting a backup doesn't cause the journal process to be restarted or
the journal to be invalidated, but certain other circumstances can.

The most likely cause of this is when the file system is flooded with a
large amount of change activity which either fills up the journal or
can't be processed fast enough by the journal file system monitor.

The process should never shut down when these problems occur (again,
there is an APAR opened against it shutting down when the journal grows
larger than 2 gig), but the journal has to be invalidated, which means
that backups can't be journal based until another full incremental is
performed.

Another thing to keep in mind is that journals are always invalidated
when the journal daemon process is recycled unless the PreserveDbOnExit
flag is specified.

All this having been said, Journal Based Backup is only a viable
solution for environments in which the amount of file system activity is
light to moderate, and in which the activity is reasonably well
distributed.

Running applications which touch every file (or a very large percentage
of files) on the file system, or which flood the file system with
changes in a very short period of time (such as copying a very large
directory tree) will make journaling unusable.

I have developed a file system monitoring/profiling tool which can be
useful in determining if journaling is viable for a particular file
system, and I am more than willing to provide it to anyone who is
interested.
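
To give a feel for the kind of measurement involved, here is a rough Python sketch. It is not the utility mentioned above; it simply takes a list of observed change events and reports the two things this message says matter: what fraction of the file system's objects were touched, and how bursty the changes were. The event format, function name and 60-second window are assumptions, and no viability threshold is implied.

```python
from collections import Counter


def change_profile(events, total_objects, window_seconds=60):
    """events: iterable of (timestamp_seconds, path) change notifications.
    total_objects: number of objects on the file system (must be > 0).

    Returns (fraction_of_objects_touched, peak_events_in_any_window),
    using fixed, non-overlapping windows of window_seconds.
    """
    touched = set()
    per_window = Counter()
    for timestamp, path in events:
        touched.add(path)
        per_window[int(timestamp // window_seconds)] += 1
    peak = max(per_window.values(), default=0)
    return len(touched) / total_objects, peak
```

A high fraction touched points to something like a virus scan walking every file; a high peak with a low fraction points to a burst such as a large directory copy.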

Hope this helps ...


Pete Tanenhaus
Tivoli Storage Solutions Software Development
email: tanenhau AT us.ibm DOT com
tieline: 320.8778, external: 607.754.4213

"Those who refuse to challenge authority are condemned to conform to it"


