[Networker] Lost connection problem

2004-08-03 04:50:18
Subject: [Networker] Lost connection problem
From: Tim Verbois <Tim.Verbois AT ET.VLAANDEREN DOT BE>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Tue, 3 Aug 2004 10:41:04 +0200
Hello,

We have a serious problem with our Legato NetWorker 5.5 installation
(yes, it is old, but management has ruled out an upgrade).  We run a
full backup of 6 terabytes once every 14 days, and that is what goes
wrong.  Some data about the environment:

Our robot is a Scalar 1000 with 6 devices: 3590 tape drives (10 GB
native, but much more compressed).  We use about 200 tapes for every
full backup.
We have about 160 clients (Windows and Unix) and licenses for
approximately 180.
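
As a quick sanity check on the tape figures above, the numbers can be put side by side. This is only a sketch: it assumes decimal units (1 TB = 1000 GB) and takes "6 terabytes", "about 200 tapes" and "10 GB native" at face value; the variable names are made up for illustration.

```python
# Figures quoted in the message. Decimal units (1 TB = 1000 GB) are an assumption.
FULL_BACKUP_GB = 6 * 1000    # "6 terabytes" per full backup
TAPES_PER_FULL = 200         # "about 200 tapes"
NATIVE_GB_PER_TAPE = 10      # native capacity per 3590 tape, as stated

# Total native capacity of the tapes used by one full backup.
native_total_gb = TAPES_PER_FULL * NATIVE_GB_PER_TAPE

# Average amount of client data that must land on each tape.
avg_gb_per_tape = FULL_BACKUP_GB / TAPES_PER_FULL

# Compression ratio the drives would need to reach for the numbers to add up.
implied_compression = FULL_BACKUP_GB / native_total_gb

print(f"native capacity of {TAPES_PER_FULL} tapes: {native_total_gb} GB")
print(f"average data per tape: {avg_gb_per_tape:.0f} GB")
print(f"implied compression ratio: {implied_compression:.1f}:1")
```

If the real figures are close to these, the drives are averaging roughly three times their native capacity per tape, which fits the "much more compressed" remark.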

Every day we make an incremental backup and that goes fine, no problem
with that, but when we make a full backup it goes badly wrong.  Last
weekend 70% of all servers failed with a "lost connection" error.  This
has been the case for the last 2 months.  The software is no longer
supported (too old).

What we have done so far:

- We had ierrors on the fiber channel that was connected to our backup
LAN switch.  The card has been replaced and then moved to another slot
(and another bus), but the problem remains.

- We played with the settings in Legato, but nothing works.  I changed
the max sessions on the drives to 30 (instead of 8), but it made no
difference.


This is an example of a failed backup:

NetWorker Savegroup: (alert) Ferraris.WE completed, 1 client(s) (li01741-bu.lin.vlaanderen.be Failed)
Start time:   Mon 02 Aug 2004 08:09:59 AM MET DST
End time:     Tue 03 Aug 2004 09:19:45 AM MET DST

--- Unsuccessful Save Sets ---

* li01741-bu.lin.vlaanderen.be:/export reading log file failed
* li01741-bu.lin.vlaanderen.be:/export2 1 retry attempted
* li01741-bu.lin.vlaanderen.be:/export2 save: lost connection to server, exiting
* li01741-bu.lin.vlaanderen.be:/export3 1 retry attempted
* li01741-bu.lin.vlaanderen.be:/export3 save: lost connection to server, exiting

--- Successful Save Sets ---

  li01741-bu.lin.vlaanderen.be: /   level=full,    308 MB 00:06:20  25499 files
  li01741-bu.lin.vlaanderen.be: /var level=full,  1265 MB 00:07:23   4722 files
  li01741-bu.lin.vlaanderen.be: /opt level=full,   108 MB 00:00:45   3671 files
  li01741-bu.lin.vlaanderen.be: /export/infx_backup level=full, 1159 MB 00:11:05     13 files
  buux1: /nsr/index/li01741-bu.lin.vlaanderen.be level=full, 1015 MB 00:08:39      6 files
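
With around 160 clients, reading these reports by eye gets tedious. A minimal sketch of pulling the "lost connection" save sets out of one report could look like the following. The function names are hypothetical, the layout is inferred only from the report above, and it assumes save set paths contain no spaces. Continuation lines are joined in case the mail client wrapped them.

```python
from collections import defaultdict

# Excerpt of the report above, including mail-style wrapped lines.
SAMPLE = """\
--- Unsuccessful Save Sets ---

* li01741-bu.lin.vlaanderen.be:/export reading log file failed
* li01741-bu.lin.vlaanderen.be:/export2 1 retry attempted
* li01741-bu.lin.vlaanderen.be:/export2 save: lost connection to server,
exiting
* li01741-bu.lin.vlaanderen.be:/export3 1 retry attempted
* li01741-bu.lin.vlaanderen.be:/export3 save: lost connection to server,
exiting

--- Successful Save Sets ---
"""


def unsuccessful_messages(report):
    """Map 'client:saveset' to the messages listed in the unsuccessful section."""
    entries = []
    in_section = False
    for line in report.splitlines():
        if line.startswith("---"):
            # Section headers look like '--- Unsuccessful Save Sets ---'.
            in_section = "Unsuccessful" in line
            continue
        if not in_section or not line.strip():
            continue
        if line.startswith("* "):
            entries.append(line[2:].strip())
        elif entries:
            # A line without the '* ' prefix continues the previous entry.
            entries[-1] += " " + line.strip()

    result = defaultdict(list)
    for entry in entries:
        # Assumption: the target has no spaces, so the first space ends it.
        target, _, message = entry.partition(" ")
        result[target].append(message)
    return dict(result)


def lost_connection_sets(report):
    """Return the save sets that have at least one 'lost connection' message."""
    return [
        target
        for target, messages in unsuccessful_messages(report).items()
        if any("lost connection" in m for m in messages)
    ]


for target in lost_connection_sets(SAMPLE):
    print(target)
```

Run over every group's report, this would show whether the failures cluster on particular clients or file systems, or are spread evenly.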

and another one:

NetWorker Savegroup: (alert) Mailcluster.WE completed, 1 client(s) (s100012-bu.vlaanderen.be Failed)
Start time:   Mon 02 Aug 2004 08:12:31 AM MET DST
End time:     Mon 02 Aug 2004 03:50:04 PM MET DST

--- Unsuccessful Save Sets ---

* s100012-bu.vlaanderen.be:/global/apps 1 retry attempted
* s100012-bu.vlaanderen.be:/global/apps save: Warning - `/global/apps/dir52/slapd-mvgstore/db/__db.004' changed during save
* s100012-bu.vlaanderen.be:/global/apps lost connection to server, exiting
* s100012-bu.vlaanderen.be:/global/apps/part1 1 retry attempted
* s100012-bu.vlaanderen.be:/global/apps/part1 lost connection to server, exiting
* s100012-bu.vlaanderen.be:/global/apps/part2 1 retry attempted
* s100012-bu.vlaanderen.be:/global/apps/part2 lost connection to server, exiting
* s100012-bu.vlaanderen.be:/global/apps/part3 1 retry attempted
* s100012-bu.vlaanderen.be:/global/apps/part3 lost connection to server, exiting
* s100012-bu.vlaanderen.be:/global/apps/part4 1 retry attempted
* s100012-bu.vlaanderen.be:/global/apps/part4 lost connection to server, exiting

--- Successful Save Sets ---

  s100012-bu.vlaanderen.be: /       level=full,   2301 MB 00:13:43  87313 files
  s100012-bu.vlaanderen.be: /var    level=full,   3574 MB 00:17:45  25115 files
* s100012-bu.vlaanderen.be:/em Warning: unsynchronized client clock detected
  s100012-bu.vlaanderen.be: /em     level=full,     62 MB 00:00:48   1160 files
  s100012-bu.vlaanderen.be: /global/.devices/node@1 level=full, 3741 KB 00:01:11  13417 files
  s100012-bu.vlaanderen.be: /global/apps/part5 level=full, 3 KB 00:02:16      5 files
  s100012-bu.vlaanderen.be: /global/apps/part6 level=full, 3 KB 00:01:45      5 files
  buux1: /nsr/index/s100012-bu.vlaanderen.be level=full, 4389 MB 01:01:52      6 files
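
The successful save sets carry a size and a duration, so an average transfer rate per save set can be derived from them. The sketch below is an illustration only: the regular expression is inferred from the lines above rather than from any documented report format, and the KB/GB conversion factors (1 MB = 1024 KB) are an assumption. It tolerates mail-wrapped lines because whitespace in the pattern also matches line breaks.

```python
import re

# A few successful save sets from the report above, wrapped as in the mail.
SAMPLE = """\
  s100012-bu.vlaanderen.be: /       level=full,   2301 MB 00:13:43
87313 files
  s100012-bu.vlaanderen.be: /var    level=full,   3574 MB 00:17:45
25115 files
  buux1: /nsr/index/s100012-bu.vlaanderen.be level=full, 4389 MB
01:01:52      6 files
"""

# Pattern inferred from the sample lines; \s+ also spans wrapped line breaks.
LINE = re.compile(
    r"(?P<client>\S+): (?P<saveset>\S+)\s+level=(?P<level>\w+),\s+"
    r"(?P<size>\d+) (?P<unit>KB|MB|GB)\s+"
    r"(?P<h>\d+):(?P<m>\d\d):(?P<s>\d\d)\s+(?P<files>\d+) files"
)

# Assumed binary conversion factors to megabytes.
TO_MB = {"KB": 1 / 1024, "MB": 1.0, "GB": 1024.0}


def throughput(report):
    """Return (client, saveset, size_mb, seconds, mb_per_second) per save set."""
    rows = []
    for m in LINE.finditer(report):
        seconds = int(m["h"]) * 3600 + int(m["m"]) * 60 + int(m["s"])
        size_mb = int(m["size"]) * TO_MB[m["unit"]]
        rate = size_mb / seconds if seconds else float("nan")
        rows.append((m["client"], m["saveset"], size_mb, seconds, rate))
    return rows


for client, saveset, size_mb, seconds, rate in throughput(SAMPLE):
    print(f"{client}:{saveset}  {size_mb:.0f} MB in {seconds} s  ({rate:.2f} MB/s)")
```

Comparing these rates between the incremental and the full runs might show whether the full backups are simply much slower per session before the connections drop.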

Buux1 is our backup server.

The backup of li01741 (a file server) was started manually yesterday
morning after its backup failed over the weekend.  It backed up 91
chunks of 2000 MB, and then it stopped...
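
For a rough idea of what that means in throughput terms, the 91 chunks can be set against the elapsed time of the Ferraris.WE report. This is a sketch under two loud assumptions: that the Ferraris.WE report is the manually started run, and that data was moving for the whole interval between its start and end times. If the save hung before the group ended, the real rate was higher than this average.

```python
from datetime import datetime

# Figures from the message.
CHUNKS = 91
CHUNK_MB = 2000

# Start and end times of the Ferraris.WE report (assumed to be the manual run).
start = datetime(2004, 8, 2, 8, 9, 59)
end = datetime(2004, 8, 3, 9, 19, 45)

total_mb = CHUNKS * CHUNK_MB
elapsed_s = (end - start).total_seconds()

# Average rate over the whole group run; a lower bound if the save stalled early.
avg_mb_per_s = total_mb / elapsed_s

print(f"data written before the stop: {total_mb} MB")
print(f"elapsed time of the group: {elapsed_s:.0f} s")
print(f"average rate over that time: {avg_mb_per_s:.2f} MB/s")
```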

Can anyone give me a hint?

Thanks,

Tim
