ADSM-L

Re: ADSM versus Arcserve and Backup Exec

1998-09-15 11:56:16
Subject: Re: ADSM versus Arcserve and Backup Exec
From: David Hendrix <dmhendri AT FEDEX DOT COM>
Date: Tue, 15 Sep 1998 09:56:16 -0600
We had a situation with one puny C20 with 256MB of memory where they had
architected the application to write image files (50K to 200K each)
into one single directory.  Unfortunately, they tracked around 1.5
million files in a 90 day period.

The V2 client choked and so did the V3 client (heck, they couldn't even ls
the directory!).  They have rearchitected their system, and with the
installation of V3 on a Sun 5K, we allow them to do incrementals on
directories (note that V2 only allowed incr on filesystems).  Since each
directory holds 50K to 300K worth of files, they now can finish their
backups in a very acceptable time frame (it takes about 3-9 minutes to
just scan the files).  We just did a restore of the whole system to a
test box and it took a while (approx 10GB per hour), with most of the
overhead being in assembling the restore list for each directory.  We
restored BY DIRECTORY to limit the scope of each invocation.  Total size
of all directories was 65GB.
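
As a rough sanity check on those numbers, here is a minimal Python sketch
of the arithmetic.  The function name restore_hours is only for
illustration, and it assumes the approx 10GB per hour rate holds across
all directories, which a real restore may not.

```python
def restore_hours(total_gb, gb_per_hour):
    """Estimate wall-clock hours to restore total_gb at a sustained rate."""
    if gb_per_hour <= 0:
        raise ValueError("gb_per_hour must be positive")
    return total_gb / gb_per_hour


# Figures quoted above: 65GB in total at approx 10GB per hour.
# Assumption: the rate stays constant for the whole restore.
print(f"{restore_hours(65, 10):.1f} hours")  # prints "6.5 hours"
```

Comparing an estimate like this against the restore window the business
requires is one quick way to see whether a restore plan is in the right
range before testing it for real.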

The client is talking to a V2 server over a dedicated FDDI backbone, and
once we upgrade the server to V3, I expect the overhead to be reduced
thanks to no-query restore.

V2 could not even begin to handle this application in any form.  V3 made
it possible, especially with the incremental on directory enhancement.
We discovered that large systems like these need to be broken down as
much as possible.

On the other hand, we also have other Sun clients with approx 800K files
spread through multiple dirs.  We simply perform incrementals on the
whole system.  Again, prior to the V3 client, it was not possible to
even finish these kinds of backups, let alone attempt a restore.

So, our experience has been: move to V3 as fast as possible (at least
the client), break down the problem into manageable chunks, and then test
before you need to actually restore to make sure you can meet your
business requirement (our service level and resource requirements for a
project are based on the business requirement for restoral of the
system, not the backup).
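
To illustrate the "manageable chunks" idea, here is a minimal Python
sketch that turns a list of directories into one command per directory,
so that each invocation has a limited scope.  The directory names and
the helper name are hypothetical, and the base command shown is only a
placeholder assumption: check the exact client command and options for
your ADSM level before using anything like this.

```python
def per_directory_commands(directories, base_command):
    """Build one command (an argv list) per directory.

    Each directory gets a trailing slash so the argument reads as a
    directory rather than a file name.
    """
    commands = []
    for directory in directories:
        target = directory.rstrip("/") + "/"
        commands.append([*base_command, target])
    return commands


# Hypothetical directory names; the base command is a placeholder assumption.
dirs = ["/images/batch01", "/images/batch02/"]
for argv in per_directory_commands(dirs, ["dsmc", "incremental"]):
    print(" ".join(argv))
```

The sketch only prints the commands.  Whether to run them one at a time
or a few in parallel depends on what the client and server can handle.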

Hope this was informative...

David Hendrix
dmhendri AT fedex DOT com


Christo Heuer wrote:
>
> Hi,
>
> I can agree with Dan regarding poor performance when it gets to
> restoring a great number of small files. It does not matter what platform
> you are on, the performance IS an issue!
> To give you an idea - we had a standalone Unix server that we backed
> up (4Mbit Token Ring card, not very powerful as far as processing etc. goes).
> We backed up this box - 15Gig of user data consisting of about 300K files,
> between 1Kb and 1.7Meg each.
> The backup took about 16 hours - this is WITH other backups etc. running to
> the ADSM server. In other words the ADSM server was not dedicated to this
> task.
> This is acceptable performance taking the above into consideration, BUT,
> (Why is there always a BUT?), the restore was a different story.
>
> This unix box was moving to a node on our SP/2 - the data was already
> on this node but a bit outdated, so to speak.
> So the only option we had was to do one of two things:
> 1) Wipe all the older data on the SP/2 node and restore the standalone
> unix box onto the SP/2 node
> or
> 2) Use ADSM's intelligence and restore the box using the -ifn (If newer then
> replace).
> BIG mistake!
> The restore ran for two days non-stop, at which point we cancelled it and
> restored via the GUI, breaking the directory structure down a bit more.
> This helped in the sense that we had fewer files in the restore stream and
> we could also make use of parallel sessions.
> In the end it took us about the whole weekend, sitting monitoring the
> restore process and starting new processes. (Not a pretty solution.)
> Now - Where was the bottleneck?
> Who knows - Bad design I think (Sorry IBM)
> The ADSM server: in this case a powerful MVS mainframe, doing few I/Os
> and using very few CPU cycles during the whole restore process.
>
> The network: ESCON attached to SP/2 node
>
> The ADSM client: (Powerful node - the reason we had to move this
> box to the node was because of performance on the old box)
>
> We monitored the network/CPU usage on the SP/2 side etc. but
> nowhere could we find any pointers as to what the problem
> with the bad performance was.
>
> During the whole restore process the first time - after running for
> about 24 hours, only 15Meg of user data had physically been transferred.
> The rest of the time seems to have been spent comparing files between
> MVS and the SP/2 - surely this should not take so long?
> Has anyone been able to get better performance from a restore
> when there is such a number of files in the picture?
> Does not matter what platform!
>
> Cheers
> Christo Heuer
>