Subject: Re: Does the Windows journal engine affect (help) ARCHIVE performance ?
From: Andrew Raibeck <storman AT US.IBM DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sun, 8 Jun 2003 07:34:48 -0700
In my prior post, I forgot to mention another performance advantage of
journal-based backup: the file systems are not actually scanned for
changed files. For very large file systems (hundreds of thousands of
files or more), this is a significant benefit in its own right.
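
To make the difference concrete, here is a rough sketch of the two
approaches (Python pseudocode, not actual client code; the server
inventory and journal objects are hypothetical stand-ins):

    # Illustrative sketch only; not how the TSM client is implemented.
    import os

    def scan_based_incremental(fs_root, server_inventory):
        """Walk the whole file system and compare each file's attributes
        against the active versions queried from the TSM server."""
        changed = []
        for dirpath, _, filenames in os.walk(fs_root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                prev = server_inventory.get(path)  # (mtime, size) from server
                if prev is None or (st.st_mtime, st.st_size) != prev:
                    changed.append(path)
        return changed

    def journal_based_incremental(journal):
        """No file system walk and no server inventory query: the journal
        service already recorded which files changed."""
        return list(journal)  # e.g. a set of paths flagged as changed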

Regards,

Andy

Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Development
Internal Notes e-mail: Andrew Raibeck/Tucson/IBM@IBMUS
Internet e-mail: storman AT us.eyebm DOT com (change eye to i to reply)

The only dumb question is the one that goes unasked.
The command line is your friend.
"Good enough" is the enemy of excellence.




Andrew Raibeck/Tucson/IBM@IBMUS
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
06/07/2003 16:43
Please respond to "ADSM: Dist Stor Manager"


        To:     ADSM-L AT VM.MARIST DOT EDU
        cc:
        Subject:        Re: Does the Windows journal engine affect (help) ARCHIVE performance ?



The journal engine is for incremental backup only.

For normal (non-journaled) incremental backup, the first thing the client
does is query the server for information about the existing active backup
versions for the node. This information is then used to determine whether
the files have changed on the client machine (i.e. it compares the current
file attributes with those from the TSM server). If the node's backup
inventory is very large, it can take a substantial amount of time for the
client to retrieve that information from the server.

The journal engine keeps track of changed files as they are changed. When
the incremental backup starts, it just backs up the files that the journal
has flagged as changed. The server inventory does not need to be queried,
and therein lies the performance advantage.

Because archive and selective backup are not based on whether a file has
changed, there is no server inventory query to begin with, and therefore
the journal engine offers no advantage. The journal engine is not used for
these operations.
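
For reference, the journal service on Windows is configured through the
tsmjbbd.ini file in the client install directory. A minimal sketch, from
memory (verify the section and parameter names against the client manual
for your level):

    [JournalSettings]
    ; location of the journal database and error log (example paths)
    JournalDir=c:\tsmjournal
    Errorlog=jbberror.log

    [JournalExcludeList]
    ; changes to matching files are not recorded in the journal
    *.tmp

    [JournaledFileSystemSettings]
    ; local file systems the journal service monitors
    JournaledFileSystems=c: d: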

Regards,

Andy

Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Development
Internal Notes e-mail: Andrew Raibeck/Tucson/IBM@IBMUS
Internet e-mail: storman AT us.eyebm DOT com (change eye to i to reply)

The only dumb question is the one that goes unasked.
The command line is your friend.
"Good enough" is the enemy of excellence.




Rick Stratton <rstratton AT INFLOW DOT COM>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
06/07/2003 09:57
Please respond to "ADSM: Dist Stor Manager"


        To:     ADSM-L AT VM.MARIST DOT EDU
        cc:
        Subject:        Does the Windows journal engine affect (help) ARCHIVE performance ?



I have a Windows 2000 client that has well over 5 million files that we
ARCHIVE every week for offsite-vaulting purposes. Why we don't use the TSM
primary storage pool --> copy storage pool methodology for this client
would take a while to explain. At any rate, due to the large number of
files (currently approx 5 million) and the large size of the backup
(approx 175GB), the ARCHIVE job takes a couple of days to complete.



I thought about moving the client weekly full to an online image backup,
but with image backup, you do not get file-level granularity during a
restore. I was thinking about using the journaling engine, but my thought
is that it would probably only help the performance of the daily
incrementals, not an ARCHIVE job. Is this correct, or would enabling the
journaling engine help my ARCHIVE performance? If so, any estimates on how
much improvement, and are there any pitfalls with this setup? I have just
started playing with the journal engine in the lab (unfortunately, I do
not have a machine with 5 million files totaling 175GB of data in my lab
to test with).



I would like to find a fix that allows offsite vaulting of data with a
different retention policy than the onsite data, as well as allowing
file-level granularity. I don't want to use backupsets for a couple of
reasons: basically, they take too long to generate, thrash the database,
tie up tape resources, and you cannot do per-file restores from a
backupset from the GUI (or at least the last time I checked you couldn't).
Plus, if the GENERATE BACKUPSET fails due to any DB problems/conflicts,
you must start the GENERATE BACKUPSET job all over again.



I guess one option would be to have one instance (dsm.opt) on the client
with a scheduler service running whose data is assigned to an onsite mgmt
class, and then have another instance (dsm1.opt) with a separate scheduler
service running whose data is assigned to an offsite mgmt class, but this
seems a little 'clunky'.
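
Something like the following is what I have in mind (all names and paths
are hypothetical, and I have not verified the dsmcutil syntax, so check it
with 'dsmcutil help' first):

    * dsm1.opt: second instance used only for the offsite copy
    NODENAME   MYNODE_OFFSITE
    * bind everything this instance sends to the offsite mgmt class
    INCLUDE    d:\data\...\* OFFSITE_MC

    rem install a second scheduler service that reads dsm1.opt
    dsmcutil install scheduler /name:"TSM Scheduler (offsite)" ^
      /node:MYNODE_OFFSITE /password:xxxxx ^
      /optfile:"c:\program files\tivoli\tsm\baclient\dsm1.opt" ^
      /autostart:yes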