[Networker] Clones hang up on remote storage nodes

2002-08-27 15:19:22
Subject: [Networker] Clones hang up on remote storage nodes
From: "Martin J. Dellwo" <dellwo AT 3DP DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Tue, 27 Aug 2002 15:22:19 -0400
Product: Legato Networker
Server: SGI running IRIX 6.5.14f
        Networker 6.1.2.Build.340
Storage nodes: Compaq servers, Windows Server 2000 SP2
               Networker 6.2


I recently established two remote storage nodes which handle the backups
at two separate buildings.  Both buildings are connected to the main
facility, which houses the main Legato Networker server, via private T1s.  Clients
at the respective sites are set to use the local storage node for
backups, and clones are automatic and also set to use the local storage
node.  The exception is a group set up to do three Exchange Servers, one
at each location.  Since the client is set up with two instances, one
for regular file backups and one for Exchange database backups, I have
to have two groups, one per instance.

On most days, the backups seem to be fine and proceed without a hitch;
during the week, these jobs are always incremental in level.  The
problem I describe below occasionally, but rarely, occurs during the
week.  However, typically on the weekend when I do full (Exchange
database) or Level 1 (regular file) backups, the backups complete but
the clones do not start.  I see the 'tape' icon on the group in the GUI,
but there are no messages in the 'Pending' or 'Clone Status' tabs in
the GUI, and no indication whatsoever that Legato is waiting for tapes.
Tapes are available for cloning, indeed usually they are already in the
drives--it sometimes appears that cloning may have started, but not
finished (I have not verified this).  There are no unusual log messages
on either the remote storage nodes or the main server.  Most often one
of the stopped groups involves the Exchange database backups (the only
group involving clients at disparate locations), but this is not always
true.  The jobs that are local to the main server have never hung up in
this way.  I used to do backups only through the main server across the
T1 WAN links, and never saw this behavior either.

Usually I have to terminate the group, and then either clone by hand or
re-run a job by hand to try again (the re-run usually completes without
a hitch).  I changed the parallelism setting on all the autochangers
thinking this might help.  The remote autochangers are both Overland
PowerLoaders with 2 SDLT drives.  The Autochanger 'max parallelism'
parameter was initially set to '1' and is now set to '2', but the problem
still occurs.  The main server has a Quantum ATL M1500 autochanger,
also with 2 SDLT drives.
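
The clone-by-hand step could be scripted roughly as follows.  This is
only a sketch under stated assumptions, not a tested procedure: it
assumes the mminfo and nsrclone commands are on the PATH, that the
mminfo query attributes 'group', 'copies' and 'savetime' are available
in the installed release (check the man pages), and the server, group
and pool names in the usage note are made up for illustration.

```python
import subprocess


def build_mminfo_args(server, group, since="yesterday"):
    """Build an mminfo command that lists save set IDs with a single copy.

    Assumption: 'group', 'copies' and 'savetime' are valid query
    attributes in the installed NetWorker release.
    """
    query = f"group={group},copies=1,savetime>={since}"
    return ["mminfo", "-s", server, "-q", query, "-r", "ssid"]


def parse_ssids(mminfo_output):
    """Extract unique numeric save set IDs, skipping the header line."""
    ssids = []
    for line in mminfo_output.splitlines():
        token = line.strip()
        if token.isdigit() and token not in ssids:
            ssids.append(token)
    return ssids


def build_nsrclone_args(server, pool, ssids):
    """Build an nsrclone command that clones the given save sets to a pool."""
    if not ssids:
        raise ValueError("no save set IDs to clone")
    return ["nsrclone", "-s", server, "-b", pool, "-S", *ssids]


def clone_uncloned(server, group, pool, dry_run=True):
    """Find save sets without a clone and (optionally) clone them.

    Returns the nsrclone command line, or an empty list if nothing
    needs cloning.  With dry_run=True nothing is cloned.
    """
    # mminfo may exit non-zero when the query matches nothing, so the
    # exit status is not treated as an error here.
    result = subprocess.run(
        build_mminfo_args(server, group),
        capture_output=True,
        text=True,
        check=False,
    )
    ssids = parse_ssids(result.stdout)
    if not ssids:
        return []
    cmd = build_nsrclone_args(server, pool, ssids)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

For example, clone_uncloned("backupserver", "Exchange", "Default Clone")
would return the nsrclone command line without running it; passing
dry_run=False would actually start the clone.  All three names here are
placeholders, not the real resource names at this site.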

The first time this happened, I noticed that when I unloaded the drives
on the server's autochanger, went away and came back later, the drives
had reloaded.  Something about doing this woke up a stalled job for the
Exchange group backup and it finished cloning, which I noticed because
the tape drives reloaded; two other groups were also stopped, and
nothing could make them wake back up.  I have not seen this 'wakeup'
occur again even though I have tried it.  I have tried resetting the
jukeboxes, to no avail, and have tried 'Restarting' groups (this doesn't
seem to restart anything for me...).

As a side note, when I reset the autochangers, they unload their drives;
the GUI for autochanger operations reflects this, but the main GUI
Monitor tab does not.  Generally I've ended up stopping the entire
server process and restarting it.  It occurs to me now, I've never
looked to see if there is a remote storage node process I could stop and
restart--I suspect that that would just kill the group.
--
Martin J. Dellwo   (610) 458-5264 x6512   dellwo AT 3dp DOT com
Systems Administrator, 3-Dimensional Pharmaceuticals, Inc.
http://www.3dp.com/

