Subject: Re: [BackupPC-users] Incremental dumps hanging with 'Can't get rsync digests' & 'Can't call method "isCached"'
From: Holger Parplies <wbppc AT parplies DOT de>
To: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
Date: Mon, 27 Oct 2008 13:57:35 +0100
Hi,

Jeffrey J. Kosowsky wrote on 2008-10-27 01:18:24 -0400 [Re: [BackupPC-users] 
Incremental dumps hanging with 'Can't get rsync digests' & 'Can't call method 
"isCached"']:
> I have been playing with your script and it seems to spit out about
> 80% of the files/directories (24908/28834) that are backed up in this 
> incremental backup.
> 
> Most of the 24000 files seem to have 5 or 6 digit perms like 40755 or
> 100660. Not sure if this is a problem and if so what to do about it.

Sorry about that. I experimented on tar backups, which apparently don't
(redundantly) store the file type bits in the "mode" entry; rsync obviously
does. That's OK (i.e. expected, once you read the code) - it's due to the way
file type information is transferred in the respective protocols/data streams.
I should check the validity of the file type bits in the mode entry separately.
I'll add that soon. For the moment, I suggest you change line 134 to

        or not exists $permmap {$list [$i + 2] & 07777}

(just adding the " & 07777").
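
Just to illustrate what the masking does, here is a rough standalone sketch
(untested; the %permmap values are made up, and the S_IF* constants come from
the standard Fcntl module, not from my script):

        use strict;
        use warnings;
        use Fcntl ':mode';    # S_IFMT(), S_IFREG, S_IFDIR, ...

        # stand-in for the script's permission whitelist
        my %permmap = map { $_ => 1 } (0600, 0644, 0660, 0700, 0755);

        my $mode  = 0100660;           # rsync-style mode: type bits + permission bits
        my $perms = $mode & 07777;     # permission bits only (here 0660)
        my $type  = S_IFMT($mode);     # file type bits only (here S_IFREG)

        printf "unexpected perms %o\n", $perms unless exists $permmap{$perms};
        printf "unexpected type %o\n", $type unless
            grep { $type == $_ } (S_IFREG, S_IFDIR, S_IFLNK, S_IFCHR,
                                  S_IFBLK, S_IFIFO, S_IFSOCK);

That separate check on the type bits is what I mean to add to the script.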

> I seem to get this on the good backups too. In fact, at first glance
> it's hard to tell what if anything is the difference between a 'good'
> and 'bad' backup. For most directories, every file and subdirectory are listed
> even though it all seems fine. 

Yes, with that much bogus output it's pretty much useless. Sorry again.

> I tried doing -X "40755,100644" etc. to exclude these perms but it
> didn't seem to have any effect

You probably mean "040755,0100644", but I'll change my eval() to oct(),
because I doubt anyone uses non-octal numeric modes anyway. eval() was not
a good idea to begin with. The patch should fix these two values, but the
same remark applies to any other values you might want to exclude, at least
until I change it.
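
For what it's worth, oct() also accepts the form without the leading zero,
so either spelling will work once I've made that change. A quick illustration
(plain Perl, nothing from my script):

        use strict;
        use warnings;

        # oct() treats a plain digit string as octal (0x/0b prefixes as
        # hex/binary), so "40755" and "040755" give the same mode value
        printf "%o\n", oct($_) for qw(40755 040755 0100644);   # 40755 40755 100644

        # eval(), on the other hand, turns "40755" into the decimal number
        # 40755, which is a completely different bit pattern
        printf "%o\n", eval "40755";                            # 117463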

> Also, the debug output for the Users list looks good except for the
> last 2 elements: 
>        0755 (which looks like a perm)

Yes, that is strange, especially the leading zero. You didn't specify a
"-u 0755" option, did you? :)

>        65534 (which seems like the max number for a uid)
> Similarly the last element of the group list is: 65534.

That's nobody and nogroup - don't they appear in your /etc/passwd and
/etc/group?
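
A quick way to check from Perl (getpwuid/getgrgid are core builtins; 65534 is
the conventional uid/gid for nobody/nogroup on most Linux systems):

        use strict;
        use warnings;

        # look up the names behind uid/gid 65534 in the local passwd/group databases
        my $user  = getpwuid(65534);
        my $group = getgrgid(65534);
        print "uid 65534 => ", (defined $user  ? $user  : "(no passwd entry)"), "\n";
        print "gid 65534 => ", (defined $group ? $group : "(no group entry)"), "\n";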

> Finally, you may want to add the following debug line to your script for 
> completeness:
> 
> print "Perms: ", (join ',', sort {$a <=> $b} keys %permmap), "\n"
>   if $opts {D};

Right. Noted. Though I'll sort $b <=> $a ;-).

Jeffrey J. Kosowsky wrote on 2008-10-27 04:00:27 -0400 [Re: [BackupPC-users] 
Incremental dumps hanging with 'Can't get rsync digests' & 'Can't call method 
"isCached"']:
> Interesting -- I ended up having to reboot (which of course required a
> restart of the backuppc service) and the problem went away.
> 
> This is the second time this has happened to me.
> I suspect (in a fuzzy type of way) that somehow this may have been
> caused by my rebooting the nfs server (which is mounted on
> /var/lib/BackupPC) without doing something like restarting the
> backuppc service - the result was that for some time there may have
> been a stale nfs link hanging around and it is possible that this
> occurred during the middle of a backup.

Normally, rebooting the NFS server should *not* lead to stale NFS mounts. In
my experience that happens when device numbers (on the NFS server) change
(though I vaguely remember seeing an unexpected instance of that myself
lately). Try to fix that, and you will save yourself a lot of headaches -
like needing a hook to remount the FS, but that's another thread.

This probably means you shouldn't back up to an NFS-mounted pool (which you
probably shouldn't do for performance reasons anyway).

What mount options are you using (esp. hard/soft, intr/nointr, tcp/udp)?
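
For reference, something like the following fstab entry is what I'd start
from (hard,intr over TCP; the server name, export path and rsize/wsize values
are placeholders, of course):

        server:/export/backuppc  /var/lib/BackupPC  nfs  hard,intr,tcp,rsize=32768,wsize=32768  0  0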

> I also may have killed the BackupPC_dump process using 'kill -9' when I was
> unable to kill it from the web interface.

SIGKILL is a bad habit to get into. You should try SIGINT first (though it
probably won't work in the "stale NFS file handle" case). If you can't access
the pool, killing BackupPC_dump is unlikely to do any additional harm :).

> Still... it would be nice to get some type of email or other warning
> when a backup freezes up because conceivably one could be unaware of
> this issue for days...

The BackupPC daemon could report backups running for an "unusually long" time
(for a configurable value of "unusually long") by e-mail. I would strongly
argue against aborting them (like $Conf{ClientTimeout} does), because the
daemon has even less control over what is actually happening on the network
level than BackupPC_dump, but optionally informing the admin seems reasonable.
It should be possible to turn these warnings off, though.
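
Until something like that exists, a trivial external watchdog can do the
nagging. This is purely a sketch of the idea - run it from cron (any output
gets mailed to the cron owner), with a made-up limit and Linux ps output
assumed:

        use strict;
        use warnings;

        my $limit_hours = 24;    # a configurable notion of "unusually long"

        # elapsed time of every running BackupPC_dump, as [[dd-]hh:]mm:ss
        for (`ps -eo etime=,args=`) {
            next unless /BackupPC_dump/;
            my ($etime, $cmd) = /^\s*(\S+)\s+(.+)$/ or next;
            my ($d, $h, $m) = $etime =~ /^(?:(\d+)-)?(?:(\d+):)?(\d+):\d+$/;
            my $hours = ($d || 0) * 24 + ($h || 0) + ($m || 0) / 60;
            print "backup running for $etime: $cmd\n" if $hours >= $limit_hours;
        }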

> I will keep your troubleshooting patch in mind and will use it next
> time I see this problem.

You will need to apply it before the fact ... but if it's just "stale NFS file
handle" [how about including $! in the log message - " (err=$err [$!], ..."?],
it's a local configuration problem rather than something BackupPC could
sensibly handle. Aborting on (detectable!) fatal errors is one thing, but
providing logic to call a hook and retry the failing operation on every disk
access is clearly not a good idea. Neither is aborting an otherwise good
backup because one attrib file happens to have gone missing.


To sum it up, your problem appears to be NFS server related ("stale NFS file
handle"), not due to corrupted attrib files (though a crashing NFS server
could lead to corruption of an attrib file, I guess). Thank you for the
feedback on my script anyway.

Regards,
Holger
