Re: [Networker] Problem with ending a savegroup
2005-08-11 11:11:12
I have seen this problem occur on Linux systems, too. In my experience,
there are two reasons: A. the affected client has a zillion files,
causing the backup software to crawl through a horrendous number of
inodes to determine what to back up (e.g. on an incremental), or maybe
there's just a ton of stuff to back up anyway, protracting the whole
process; or B. some kind of DNS problem. We had this happen once: all
the other clients in the group (30+ machines) would finish their
incrementals in about 30 minutes, but one client would just hang for an
hour or two before finally doing anything. It would complete its
backups, but it would take forever before it started. We finally
tracked it down to a bogus entry in the client's /etc/hosts file. After
that fix, problem solved.
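For anyone chasing the same kind of name-resolution problem, here is a
minimal Python sketch of the check, not anything NetWorker ships. It
assumes a standard hosts-file layout (address followed by one or more
names), and the function names parse_hosts, conflicting_names and
time_lookup are made up for illustration. It flags names that map to
more than one address and times a forward lookup so a slow resolver
stands out.

```python
import socket
import time


def parse_hosts(text):
    """Return {name: [addresses]} from hosts-file text, skipping comments."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            entries.setdefault(name.lower(), []).append(address)
    return entries


def conflicting_names(entries):
    """Return only the names that map to more than one distinct address."""
    return {name: addrs for name, addrs in entries.items()
            if len(set(addrs)) > 1}


def time_lookup(name):
    """Time a forward lookup; returns (seconds, address or None on failure)."""
    start = time.monotonic()
    try:
        address = socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        address = None
    return time.monotonic() - start, address
```

On the client you would feed it the contents of /etc/hosts and then time
lookups of the backup server's name. Note that localhost often appears
legitimately for both 127.0.0.1 and ::1, so a hit there is not
necessarily a bogus entry.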
I think what we need is a feature in the product that would somehow
allow the remaining running or pending savesets to continue but also
allow the group to restart. Maybe those savesets in limbo could somehow
be transferred to a temporary group so they could continue to run and
then run again later, but the main group would not be affected?
The thing about killing off a running group so it can restart is that
you might not want to do that if there's a full still running and it's
near completion. I think I'd prefer to do it manually so I can make that
determination on a case-by-case basis.
However, that provides little succor when you're away on vacation.
George
John Stoffel wrote:
Conrad> All that would do is prevent Windows boxes from hanging Unix
Conrad> systems. The Windows boxes would still hang each other, and
Conrad> the occasional Unix failure would still hang the group.
Sure, but at least it wouldn't hang all the systems. Some improvement
is better than none.
Conrad> The problem is very inconsistent. A client will cause a
Conrad> savegroup to hang one day that hadn't done that the previous
Conrad> day and won't do it the next. When a client hangs a savegroup
Conrad> consistently, we can track it down. And when we can't we do
Conrad> exactly what you suggest, with a special savegroup.
Conrad> Yes, patching and client reboots do sometimes help the
Conrad> situation. There doesn't appear to be any correlation with
Conrad> filesystem size or number of files. In most cases there is no
Conrad> data passing across the link.
Wish I had better help for you in this situation, sorry I can't do
more than the obvious.
John
--
Note: To sign off this list, send a "signoff networker" command via email
to listserv AT listserv.temple DOT edu or visit the list's Web site at
http://listserv.temple.edu/archives/networker.html where you can
also view and post messages to the list. Questions regarding this list
should be sent to stan AT temple DOT edu
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=