Re: strange dump message

On Wed, Feb 05, 2003 at 09:01:10AM -0500, Eric Sproul wrote:
> All,
> On the last few runs, I've seen some strangeness with one DLE that I
> can't figure out.  I'm using the canned 'comp-root' dumptype:
> 
> define dumptype comp-root {
>     comment "Root partitions with compression"
>     options compress-fast
>     priority low
> }
> 
> The DLE is the root filesystem (/dev/sda1) on a Debian 3.0 box.  AMANDA
> has been trying to do both level-0 and level-1 dumps of this filesystem
> since 1/25 when the last full backup was successful.  Both client and
> server are running 2.4.3.  Below is the failure message from this
> morning's run.
> 
> 
> /-- dhcp01     sda1 lev 1 STRANGE
> sendbackup: start [dhcp01:sda1 level 1]
> sendbackup: info BACKUP=/sbin/dump
> sendbackup: info RECOVER_CMD=/bin/gzip -dc |/sbin/restore -f... -
> sendbackup: info COMPRESS_SUFFIX=.gz
> sendbackup: info end
> |   DUMP: Date of this level 1 dump: Wed Feb  5 04:06:47 2003
> |   DUMP: Date of last level 0 dump: Sat Jan 25 04:00:08 2003
> |   DUMP: Dumping /dev/sda1 (/) to standard output
> |   DUMP: Added inode 7 to exclude list (resize inode)
> |   DUMP: Label: none
> |   DUMP: mapping (Pass I) [regular files]
> |   DUMP: mapping (Pass II) [directories]
> |   DUMP: estimated 131 tape blocks.
> |   DUMP: Volume 1 started with block 1 at: Wed Feb  5 04:06:47 2003
> |   DUMP: dumping (Pass III) [directories]
> ? /dev/sda1: EXT2 directory corrupted while converting directory #41021
> ? 
> ?   DUMP: error reading command pipe: Connection reset by peer
> ?   DUMP: error reading command pipe: Connection reset by peer
> ??error [/sbin/dump returned 3]? dumper: strange [missing size line from
> sendbackup]
> ? dumper: strange [missing end line from sendbackup]
> \--------
> 
> I tried looking for an explanation for the "EXT2 directory corrupted"
> dump message, but all I could find was basically "we don't know what it
> is, just fsck it and move on".  That isn't good enough for me.  ;) 
> Whatever it is, it doesn't seem to be causing grief to the server's
> normal operation, but it is preventing me from getting a backup.

Normal operation is not surprising; that area of the fs may just not
be getting used by anything.  I recall an ancient system where I could
not add things to the drive's "bad block table", not something fsck deals
with.  I knew several specific blocks were bad and caused problems each
time the system tried to allocate them to a file.  I created a dummy file
"DoNotRemove" in a dummy directory "DoNotTouch".  With a disk editor, fsdb,
I "allocated" the specific block to the dummy file.  An fsck cleaned up
the mess I made with the free block list.  The fs ran fine for years then.

On your specific problem, I'm guessing the "#41021" represents an inode
number (41021).  It is also the format of the names fsck uses when it
recovers an unknown named file into the "lost+found" directory on my
Solaris systems.  Does Linux/ext2 use the "lost+found" system?  There
would be a directory by that name in the root of the fs.  It would
contain recovered files and dirs, all named with an octothorpe (a #)
followed by the inode number (the name was unknown).

What I'm getting at is that, if my guess is correct and 41021 is an inode
number, maybe you can find the offending file/directory and investigate.
Two commands to assist: the ls command has a "-i" option that prints the
inode number of each entry.  Alternatively, if you don't have an idea
where the offender is, find can locate by inode number:

   find <path_to_root_of_fs> -xdev -inum 41021

The "-xdev" keeps the search to the one fs since each fs will have its
own inode 41021.
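
As a rough sketch of the two commands working together, here is a small
demo on a scratch directory.  The directory and file names are made up
for illustration, and it assumes mktemp and awk are available; it is not
run against your real fs.

```shell
#!/bin/sh
# Sketch: look up a file's inode number with "ls -i", then find the
# file again by that number.  All names below are hypothetical.
scratch=$(mktemp -d)
mkdir "$scratch/sub"
touch "$scratch/sub/target"

# "ls -i" prints the inode number in front of the name; keep field 1.
inum=$(ls -i "$scratch/sub/target" | awk '{print $1}')

# "-xdev" stops find from descending into other mounted filesystems;
# "-inum" matches entries whose inode number equals the one given.
found=$(find "$scratch" -xdev -inum "$inum")
echo "inode $inum is $found"

# Remove the scratch directory when done.
rm -rf "$scratch"
```

For the real case you would skip the setup and run only the find line,
with the mount point of the fs and 41021 in place of the variables.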

The above syntax is for System V type commands.  I presume Linux with
its GNU counterparts has the same/similar syntax.

-- 
Jon H. LaBadie                  jon AT jgcomp DOT com
 JG Computing
 4455 Province Line Road        (609) 252-0159
 Princeton, NJ  08540-4322      (609) 683-7220 (fax)