ADSM-L

Re: Linux client pathology

2003-12-10 10:58:55
Subject: Re: Linux client pathology
From: Lloyd Dieter <dieter AT SNRGY DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 10 Dec 2003 10:58:41 -0500
5 Million files on ext2?  Hope you never have to fsck that monster.  You
may want to talk to the admin about using reiserfs....

-Lloyd


On Tue, 9 Dec 2003 13:49:39 -0500
Thomas Denier <Thomas.Denier AT MAIL.TJU DOT EDU> wrote thusly:

> > On Tue, 9 Dec 2003 11:29:59 -0500
> > Thomas Denier <Thomas.Denier AT MAIL.TJU DOT EDU> wrote:
> >
> > > We have been backing up a Linux client with about five million files
> > > on it. The system administrator has told me that he recently updated
> > > the include/exclude files to exclude a portion of /var that accounts for
> >
> > 5 million files in /var??? amazing :-)
>
> The messages for an electronic mail system with several thousand users
> reside in a subdirectory of /var. The messages undoubtedly should have
> gone into a separate file system, even if the file system's mount point
> were in /var.
>
> > > most of the files. I have not so far been able to verify this by
> > > inspecting the configuration files myself. Ever since the change, we
> > > have been seeing strange and very troublesome behavior. The client
> >
> > Did the admin use exclude or exclude.dir? The latter may save you a
> > lot of directory traversal.
>
> I have restored dsm.opt, dsm.sys, and the include/exclude file to a
> system I have access to. The change date on the include/exclude file
> lines up with the Linux administrator's statements. He used an
> exclude.dir. The other two files had not changed in almost six
> months. All three files look normal for our site, with the exception
> of two anomalies that have no obvious connection to the current
> problem. One anomaly is the presence of a nodename statement in dsm.sys.
> The node name specified is the same as the host name. The second
> anomaly is a schedmode of 'polling' rather than 'prompted'. This is
> needed because there is a firewall between the client and the TSM
> server.
>
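
The exclude versus exclude.dir question above comes down to traversal
cost: a file-level exclude still has the client walk every directory under
the excluded path, while a directory-level exclude lets it skip the subtree
entirely. Below is a minimal Python sketch of that difference, not the TSM
client's actual implementation. The function names are hypothetical and the
excluded path is whatever you pass in.

```python
import os


def _is_under(path, ancestor):
    """True if path is ancestor itself or lies somewhere below it."""
    return os.path.commonpath([path, ancestor]) == ancestor


def walk_with_file_exclude(root, excluded_dir):
    """Visit every directory; only skip the files found under excluded_dir.

    Returns (directories_visited, files_kept).
    """
    root = os.path.abspath(root)
    excluded_dir = os.path.abspath(excluded_dir)
    visited = kept = 0
    for dirpath, dirnames, filenames in os.walk(root):
        visited += 1  # the directory is still read, even if excluded
        if _is_under(dirpath, excluded_dir):
            continue
        kept += len(filenames)
    return visited, kept


def walk_with_dir_exclude(root, excluded_dir):
    """Prune excluded_dir from the walk so its subtree is never read.

    Returns (directories_visited, files_kept).
    """
    root = os.path.abspath(root)
    excluded_dir = os.path.abspath(excluded_dir)
    visited = kept = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to descend further.
        dirnames[:] = [
            d for d in dirnames
            if os.path.join(dirpath, d) != excluded_dir
        ]
        visited += 1
        kept += len(filenames)
    return visited, kept
```

Both functions keep the same files; only the number of directories read
differs, which is the saving the directory-level form is meant to provide.
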
> > > will send about eight gigabytes of data to the server over the
> > > course of about twelve hours. At that point data transfer will slow
> > > to a crawl and all other sessions and processes will perform poorly
> > > until the session is cancelled. We have been through three
> > > iterations so far, transferring about 23 gigabytes from client to
> > > server (I stopped the last one at about 7 gigabytes). About one
> > > gigabyte out of the 23 can be accounted for by data written to
> > > storage pools. I usually don't get the various session statistics
> > > messages. In the one case where I did, they reported 0 objects
> > > deleted and only 55 objects expired.
> >
> > Depending on the filesystem used, Linux can be very slow in listing
> > large dirs. The default listing requires in-memory sorting of the dir
> > (TSM does this as well), and with very large dirs, with lots of files,
> > you may suffer from horrific performance on the client just from
> > listing. Most likely, an ls in that one dir will either crash or hang
> > forever....
>
> The file system is EXT2. As far as I know, no one directory contains an
> outrageous number of files. There is one directory near the top of the
> hierarchy with a subdirectory for each of the several thousand users.
> Past experience helping the administrator set up restores shows tens
> to hundreds of files in individual directories further down the
> hierarchy.
>
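
Whether any single directory holds an outrageous number of entries can be
checked directly rather than inferred from past restores. A rough Python
sketch follows. The function name and the default of ten results are
illustrative choices, and /var is used only because it is the tree
discussed above.

```python
import os


def largest_directories(root, top=10):
    """Return the `top` directories under root with the most entries.

    Each result is (entry_count, directory_path), largest first.
    os.walk reads directories without sorting them, so this avoids the
    sorting overhead of a plain ls on a very large directory.
    """
    counts = []
    for dirpath, dirnames, filenames in os.walk(root):
        counts.append((len(dirnames) + len(filenames), dirpath))
    counts.sort(reverse=True)
    return counts[:top]


if __name__ == "__main__":
    for count, path in largest_directories("/var"):
        print(count, path)
```

On a tree with millions of files the walk itself will take a while, so it
is best run outside the backup window.
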
> > > I would expect a substantial amount of data movement from client to
> > > server after excluding large numbers of files as the client told the
> > > server to expire the files. However, this would presumably be
> > > comparable to the amount of file status information sent from server
> > > to client each night before the change. This was about a gigabyte.
> >
> > I would expect that your server has never had a decent back-up of that
> > one dir, that TSM is still listing that dir since the admin didn't use
> > exclude.dir, and that you have very few files to expire.
>
> As far as we can tell, backups of /var had been finishing successfully
> on most days (albeit later in the morning than we would have liked).
>
> > > Our entire TSM database is only about 20 gigabytes. The client code
> > > is at the 5.1.6.0 level. The Linux kernel is at the 2.4.20-24 level.
> > > The TSM server is at the 5.1.7.2 level and runs under OS/390.
> >
> > 5 million objects would eat about 1.5 GB (give or take half a GB);
> > expiration of these files must be noticeable on your server.
>
> It accounts for about half of the time we spend on expiration. Most
> of the other half is due to another, slightly smaller mail server
> run by a different part of our organization.
>
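
The estimate quoted above (five million objects, about 1.5 GB, give or take
half a GB) implies a per-object database cost that can be back-calculated.
The Python sketch below does only that arithmetic. It assumes decimal
gigabytes (10^9 bytes), which the thread does not specify, and the function
names are hypothetical. The 20 GB figure is the database size given earlier
in the thread.

```python
GB = 10**9  # assumption: decimal gigabytes


def bytes_per_object(db_bytes, objects):
    """Average database bytes consumed per backed-up object."""
    return db_bytes / objects


def share_of_database(db_bytes, total_db_bytes):
    """Fraction of the whole database taken up by db_bytes."""
    return db_bytes / total_db_bytes


if __name__ == "__main__":
    objects = 5_000_000
    for estimate_gb in (1.0, 1.5, 2.0):  # 1.5 GB give or take half a GB
        per_object = bytes_per_object(estimate_gb * GB, objects)
        share = share_of_database(estimate_gb * GB, 20 * GB)
        print(f"{estimate_gb} GB: {per_object:.0f} bytes/object, "
              f"{share:.1%} of a 20 GB database")
```

Under those assumptions the range works out to roughly 200 to 400 bytes per
object, or about 5 to 10 percent of the database for this one client.
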


--
-----------------------------------------------------------------
    Lloyd Dieter        -       Senior Technology Consultant
                     Registered Linux User 285528
   Synergy, Inc.   http://www.synergyinc.cc   ldieter AT snrgy DOT com
             Main:585-389-1260    fax:585-389-1267
-----------------------------------------------------------------
