ADSM-L

Re: Does the Windows journal engine affect (help) ARCHIVE performance?

From: Rick Stratton <rstratton AT INFLOW DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sun, 8 Jun 2003 10:33:00 -0600
All,
        Thank you for your postings on whether the journal engine helps with
ARCHIVE performance. It basically validated two of my thoughts:

1: The journal engine does not help ARCHIVE performance
2: The way we are doing backups and offsite vaulting for this customer is
not the best approach



Just to clarify a couple of points that some people had questions about, and
to explain why we do some of the things we are currently doing:

**We are a service provider, not an internal IT shop, so we actually bill
our customers for the backup service we provide to them.

**Before we started using TSM, the service was provided using Veritas and
the billing method was called 'largest-full'. The way that worked was that
every week, customers would get a weekly full and at the end of the month,
the customer was billed for their largest full backup @ $x/GB. This was all
before I was involved, so I'm not really interested in hearing the
good/bad/ugly of this.

**When our company chose to go to TSM, there was a great misunderstanding of
how TSM worked (Always incremental vs. weekly full) so the legal contract
text between our company and our customers was never changed. Because of
this we were forced to do weekly fulls, as ridiculous as that sounds.

**After almost a full year of me preaching the gospel of incrementals vs.
periodic fulls, we have now changed the legal text of our backup offering to
only do incrementals (except for hot DB agents - we still do periodic fulls).
Unfortunately, this only applies to new customers or existing customers
whose current contract expires and gets renewed using the new contract.

**Another major flaw of our service offering is that the offsite vaulting
contract specifies that each customer's data will be on its own tape when it
goes offsite and that the customer is able to specify a longer retention
policy for the offsite data. In order for me to make this work, I saw that I
only had two choices: BACKUPSETS or ARCHIVE. I chose to go with ARCHIVE due
to the inability (at that time, and probably still now) of the client to do
file-level restores from a BACKUPSET via the GUI. While
this may not sound important to some people, it should be remembered that as
a service provider, we have to make the product as easy to use as possible.
We are not in control of the client machine nor of the level of expertise of
the administrator of that client machine. I also chose not to use the
BACKUPSET option due to limited tape drive availability and the fact that if
there is a problem with a GENERATE BACKUPSET job that requires restarting,
the job starts all over from the beginning.



        Now, more specific to the problem that I posted with having to
ARCHIVE approx. 5 million files every week. The only data on the machine is
data that has been real-time replicated from a couple of physically-remote
machines using replication software (package name escapes me). This keeps
the replication target machine quite busy itself, let alone backing up the
files that get replicated over to it.
        Once replicated, the customer then wanted the data backed up via a
backup system and then sent offsite periodically (all of which seems
overkill to me). The onsite data has a 14 day retention requirement while
the offsite data has an 8 week retention requirement. The way I implemented
it is 1 daily incremental using the TSM BACKUP function and 1 weekly 'FULL'
using the ARCHIVE function. The Backup copy group has a setting of
14,14,14,14 (versions data exists, versions data deleted, retain extra
versions, retain only version) while the Archive copy group has a setting of
56 (retain version). The Backup copy group points to a shared storage pool
that always remains onsite while the Archive copy group points to a
dedicated storage pool that has tapes sent offsite every week (remember, our
offsite vaulting service currently guarantees a customer's offsite data on
their own tape).
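For reference, the copy group settings above can be expressed as TSM
administrative commands along these lines. This is only a sketch - the
domain, policy set, management class, and storage pool names here are
hypothetical placeholders, not our actual configuration:

```
define copygroup CUSTDOM STANDARD CUSTMC STANDARD type=backup destination=SHARED_ONSITE_POOL verexists=14 verdeleted=14 retextra=14 retonly=14
define copygroup CUSTDOM STANDARD CUSTMC STANDARD type=archive destination=CUST_OFFSITE_POOL retver=56
```

The archive copy group's RETVER=56 is what gives the 8-week offsite
retention, independently of the backup copy group's 14-day settings.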
        Before we migrated this customer off of the 'largest full' billing
method, I was forced to implement weekly fulls due to the legal reasons
outlined above. We now have the customer paying for the space that their
data uses on the TSM server/library (we call it 'occupancy-based' billing),
and I would like to move the customer to a more standard primary pool/copy
pool setup, perform only daily incrementals, and send the copy pool tapes
offsite, but I have the following concerns/issues with that:
        *** With a primary pool/copy pool setup, the data in the copy pool
is maintained/expired using the same policy settings as the primary. This
means that if the customer wants their offsite data kept for 8 weeks, the
onsite data would have to be kept for 8 weeks. Changing their retention
policy is only going to increase the amount of space the customer 'occupies'
on our TSM server/library, which would mean a price increase to them, as
they are only currently paying for occupancy based on a 2 week retention
policy. I think I have a couple of ways of dealing with that, but unsure
right now.
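To put rough numbers on that occupancy concern, here is a minimal sketch of
a steady-churn model (all figures - base size, daily change rate, $/GB rate -
are made up for illustration, not this customer's actuals):

```python
# Rough occupancy model for incremental-forever retention.
# Occupancy ~ base data + one day's worth of changed files per day of retention.
# All numbers below are hypothetical.

def occupancy_gb(base_gb, daily_change_gb, retention_days):
    """Approximate storage pool occupancy for one node."""
    return base_gb + daily_change_gb * retention_days

def monthly_bill(occupied_gb, rate_per_gb):
    """Occupancy-based billing: the customer pays for what they occupy."""
    return occupied_gb * rate_per_gb

base, churn, rate = 500.0, 10.0, 2.0   # GB, GB/day, $/GB-month (made up)

two_week = occupancy_gb(base, churn, 14)    # current 14-day retention
eight_week = occupancy_gb(base, churn, 56)  # proposed 56-day retention

print(monthly_bill(two_week, rate))    # -> 1280.0
print(monthly_bill(eight_week, rate))  # -> 2120.0
```

Under this toy model, moving the onsite retention from 2 weeks to 8 weeks
raises the bill by the churn carried over the extra 42 days - which is
exactly the price-increase conversation I would have to have with the
customer.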
        *** In order to implement a primary/copy storage pool setup, I will
need to get their data out of the shared onsite storage pool (not all of our
customers buy offsite vaulting, so I can't just copy that storage pool).
Since my TSM server is on 4.2.x, I do not have access to the MOVE NODEDATA
command - I will be upgrading the server to 5.x soon, so that issue will be
fixed.
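Once the server is on 5.x, pulling a single node's data out of the shared
pool should be a one-liner along these lines (node and storage pool names
are hypothetical placeholders):

```
move nodedata CUSTNODE fromstgpool=SHARED_ONSITE_POOL tostgpool=CUST_PRIMARY_POOL
```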

        I agree with several of the comments that have been made such as:

*Why does all of the data (all 5 million files) have to be backed up and/or
sent offsite?
*The method implemented (weekly full) is using a 1970s mentality
*Sounds like a non-technical person defined the requirements
*...start from scratch and fully define the requirements, and only then
consider solutions.  Don't build a house on sand

        These are all valid points, and having feedback like this helps me
further my case for re-engaging with the customer and redesigning a backup
solution around their true requirements - the backup team was not brought
into the picture until well after a 'solution' had been 'designed' and sold
to the customer, so it is now my uphill battle to go back and try to force
the redesign.

        Again, thanks for the feedback.
