ADSM-L

[no subject]

2003-09-24 09:26:53
From: Pete Tanenhaus <tanenhau AT US.IBM DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 24 Sep 2003 09:26:26 -0400
I'll try to answer/address your questions as best I can.

>>> My TSM client is a file server, on its first full incremental backup
>>> (with journaling turned on) stowed away nearly 9 million files on the
>>> TSM server - a perfect candidate for the TSM journaling engine I
>>> thought. However, the tsmjbbd.exe process bombed just before the end
>>> with a 'DB Access Critical Thread Return code 215' type error,
>>> although the backup continued.


The error you are seeing in the journal daemon is most likely occurring
because the journal database has exceeded the supported maximum size of
2 GB.

If you look in your journal error log (jbberror.log) you'll probably see
the following message:

 Error updating the journal for fs 'C:', dbUpdEntry() rc = 27

There is a bug in the journal service which causes the process to shut
down when this error occurs; APAR IC37040 has been opened against it,
and the fix will be included in an upcoming fixtest.

That having been said, the real problem to look at is why the journal grew
so large.

Keep in mind that each journal entry represents the most recent change
for a file/directory, and that journal entries are unique, meaning that
there can only be one entry for each object on the file system.
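
To illustrate the point (a hypothetical sequence of change
notifications; the path is made up):

   change 1:  C:\data\report.doc modified -> journal entry added
   change 2:  C:\data\report.doc modified -> same entry updated in place
   change 3:  C:\data\report.doc deleted  -> same entry updated in place

After all three notifications the journal still holds exactly one entry
for C:\data\report.doc, recording only the most recent change.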

Are you running virus scanning software, and if so, what type and
version? (example: Norton Anti-Virus Corporate Edition Version 8.00)

Some virus protection software touches every file it processes during a
scan, and this in turn floods the journal with change notifications and
grows the journal.

There are circumventions from at least one of the virus protection vendors
(Symantec) for this problem.


>>> Now, 9 million files, at an average of maybe 500 bytes per TSM
>>> database entry, equals roughly 4.5GB. Was TSM trying to send the
>>> *whole* 4.5GB inventory for this node to the dsmc.exe process on the
>>> client? Needless to say, at 2GB (I believe the limit that Win2K
>>> places on a single process) the TSM client had had enough and ended
>>> with an 'ANS1030E System ran out of memory. Process ended'.

>>> So, what shall I do - is MEMORYEFFICIENTBACKUP YES my only
>>> get-out-of-jail card here, and exactly what does this do differently?
>>> Is my understanding above what is actually happening?

Keep in mind that a full progressive incremental backup must be done
(one that results in the Last Backup Completion Date being updated on
the server) before backups will be journal based.

Once the initial backup has been completed and the journal is validated,
the next backup should be journal based.

So you may want to use MEMORYEFFICIENTBACKUP at least for the initial
backup.
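
As a sketch, the option can be set in the client options file (dsm.opt)
for the initial run, or passed on the command line for a single
invocation (the option name is as documented for the Windows client;
the surrounding file contents shown here are only an example):

   * dsm.opt - reduce client memory use during the initial full
   * incremental of the large file system
   MEMORYEFFICIENTBACKUP YES

or, if the command-line form is preferred for one invocation only:

   dsmc incremental -memoryefficientbackup=yes

Once backups are journal based you can set the option back to NO, since
only the objects obtained from the journal are inspected.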


Journal Based Backup should use much less memory since the only objects
inspected
are those obtained from the journal.



>>> I'd be most grateful to hear of anyone else's positive or negative
>>> experiences of using the Journaling Engine, as it seems just so
>>> *ideal* for some of our file servers, yet my experiences so far
>>> suggest it might not be as easy and robust as I would ideally like it
>>> to be (i.e. cancelled backups forcing restart of journal, process
>>> bombing out midway through backup etc.), especially as a full or
>>> normal incremental backup can run into days to complete.

Aborting a backup doesn't cause the journal process to be restarted or
the journal to be invalidated, but certain other circumstances can.

The most likely cause of invalidation is when the file system is flooded
with a large amount of change activity which either fills up the journal
or can't be processed fast enough by the journal file system monitor.

The process should never shut down when these problems occur (again, an
APAR has been opened against it shutting down when the journal grows
larger than 2 GB), but the journal has to be invalidated, which means
that backups can't be journal based until another full incremental is
performed.

Another thing to keep in mind is that journals are always invalidated when
the journal daemon process is recycled unless the PreserveDbOnExit flag is
specified.
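
As an example, PreserveDbOnExit is set in the journal service
configuration file (tsmjbbd.ini); the stanza layout and the other
settings shown here are from memory and should be verified against the
sample file shipped with your client level:

   [JournalSettings]
   ; where journal errors are logged
   Errorlog=jbberror.log

   [JournaledFileSystemSettings]
   JournaledFileSystems=C:
   ; keep the journal database valid across a recycle of the
   ; journal daemon, so the next backup can still be journal based
   PreserveDbOnExit=1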

All this having been said, Journal Based Backup is only a viable
solution for environments in which the amount of file system activity
is light to moderate, and in which the activity is somewhat evenly
distributed.

Running applications which touch every file (or a very large percentage
of files) on the file system, or which flood the file system with changes
in a very short period of time (such as copying a very large directory
tree) will make journaling unusable.

I have developed a file system monitoring/profiling tool which can be
useful in determining whether journaling is viable for a particular
file system, and I am more than willing to provide it to anyone who is
interested.

Hope this helps ...


Pete Tanenhaus
Tivoli Storage Solutions Software Development
email: tanenhau AT us.ibm DOT com
tieline: 320.8778, external: 607.754.4213

"Those who refuse to challenge authority are condemned to conform to it"
