Subject: [Networker] Networker 7 - Adventures with AFD and staging
From: Bokkelkamp Ernst <ernst.bokkelkamp AT SIEMENS DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Mon, 2 Jun 2003 10:46:30 +0200

Over the last few days I have been stressing Networker 7 staging in
combination with AFD and tape extensively under Windows 2000, and I
ran into some nasty problems that need attention.

The first, minor bug I noticed is that after staging, the space released
for the individual savesets is formatted inconsistently. Any saveset of
KB or MB size is reported in KB or MB in the daemon.log, but savesets of GB
size are reported in bytes. It looks as if the developers either never
tested with GB-size savesets or never looked at the messages reported.
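
For comparison, here is a minimal sketch in Python of what consistent
reporting could look like, where every size goes through the same unit
scaling. This is not Networker's code; the function name, the unit labels
and the one-decimal rounding are my own assumptions for illustration.

```python
def format_size(num_bytes: int) -> str:
    """Format a byte count with one scaling rule for all sizes.

    Hypothetical helper: unit labels and rounding are assumptions,
    not the format Networker writes to daemon.log.
    """
    units = ["bytes", "KB", "MB", "GB", "TB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value drops below 1024,
        # or at the largest unit we know about.
        if value < 1024 or unit == units[-1]:
            if unit == "bytes":
                return f"{int(value)} bytes"
            return f"{value:.1f} {unit}"
        value /= 1024


if __name__ == "__main__":
    for size in (512, 2048, 5 * 1024**2, 3 * 1024**3):
        print(format_size(size))
```

The point is only that GB-size savesets take the same code path as KB and
MB ones, so they cannot fall through to a raw byte count.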

The next problem is a bit nastier. I ran into trouble with the volume,
made up of 3 spanned disks, that I was using for the AFD. The last disk
started giving I/O errors, fortunately not in the space used by the AFD, and
I had to recreate the volume. I decided to replace the disk and use staging
to migrate the savesets first to tape and then from tape back to the AFD.

The first action, staging from the AFD to tape (GUI), completed without
problems. During the staging, savegroups caused new savesets to be written
to the AFD while the other savesets were being staged from the AFD_RO to
tape.

The problems started after recreating the volume, while staging back from
tape to the AFD (command line). The first thing is that a device cannot be
used for backup while savesets are being staged to it, and it looks as if an
AFD behaves differently from tape devices in this respect. All savegroups
targeting the AFD hung during staging. For this reason I killed nsrstage,
which caused some more problems (described later). What I noticed is that
ANY savegroup targeting the AFD being staged to will hang and has to be
cancelled and restarted, and it does not matter whether the staging process
abended or completed normally. It looks as if Networker does not notice when
the staging activity to the AFD has completed. (Note: I have not checked,
but I seem to remember that this problem does not occur while staging to
tape.)

The killed nsrstage, and something I did myself, caused some additional
problems. I used mminfo to produce a list of SSIDs to be migrated, based on
the tape volumes. The first time (the run I killed) this worked without
additional problems. The next time, when I started staging with a newly
created list of SSIDs, nsrstage reported that some SSIDs would be ignored
because of an invalid volume entry (= savesets already staged back); then a
mount message was produced requesting a new volume to be labelled and
mounted on the AFD. The reason seems to be that nsrstage does not exclude
the savesets with the invalid volume entries and concludes that an SSID
already on the AFD should be staged to the same AFD.

Then I made another mistake: I decided to do a partial stage by restricting
mminfo to one volume, so that only the savesets on that volume would be
staged. (The reason behind this decision was to try to complete staging
before the scheduled backups.) This worked quite well, except that the
staging process used not only the selected volume but also the volume before
it. It needed the earlier volume to stage a saveset that started there and
continued on the selected volume. Once the earlier volume had been read,
staging continued on the selected volume and completed.
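
The extra volume could have been predicted from the media database before
starting. A rough Python sketch of the idea, assuming you have already
exported (SSID, volume) pairs with one pair per volume a saveset touches;
the export format, the function name and the volume labels below are
hypothetical, not mminfo's actual output.

```python
from collections import defaultdict


def volumes_needed(entries, selected_volume):
    """Return every volume that staging `selected_volume` will have to read.

    entries: iterable of (ssid, volume) pairs, one pair per volume that
    holds part of the saveset (assumed export format).
    """
    volumes_by_ssid = defaultdict(set)
    for ssid, volume in entries:
        volumes_by_ssid[ssid].add(volume)

    needed = set()
    for vols in volumes_by_ssid.values():
        # A saveset with any part on the selected volume pulls in all
        # other volumes it spans.
        if selected_volume in vols:
            needed |= vols
    return needed


if __name__ == "__main__":
    # Hypothetical data: saveset 1002 starts on TAPE.001, ends on TAPE.002.
    sample = [
        ("1001", "TAPE.001"),
        ("1002", "TAPE.001"),
        ("1002", "TAPE.002"),
        ("1003", "TAPE.002"),
        ("1004", "TAPE.003"),
    ]
    print(sorted(volumes_needed(sample, "TAPE.002")))
```

With the sample data, selecting TAPE.002 also pulls in TAPE.001 because of
the spanning saveset, which matches what I saw.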

The next time I generated a list of all SSIDs again to stage the remaining
savesets. This time the same message occurred, and I found that several
SSIDs had entries listing both the tape volume AND the AFD. These savesets
had been recovered to the AFD already, but the media index was somehow
corrupted (the killed nsrstage?). Nsrck and nsrim did not help; in the end I
manually selected only those SSIDs that were on tape alone to complete the
staging. After everything completed I relabelled the tape volumes to remove
the volumes from the mediadb.
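
The manual selection can be scripted. A minimal Python sketch, again
assuming an export of (SSID, volume) pairs from the media database; the
function name and the volume labels are made up for the example, and the
AFD is passed as a set of names so that a read-only twin volume can be
included if your setup has one.

```python
def split_by_location(entries, afd_volumes):
    """Split SSIDs into those still only on tape and those already on the AFD.

    entries: iterable of (ssid, volume) pairs (assumed export format).
    afd_volumes: set of volume names that belong to the AFD (assumption).
    Returns (to_stage, already_on_afd) as sorted lists of SSIDs.
    """
    volumes_by_ssid = {}
    for ssid, volume in entries:
        volumes_by_ssid.setdefault(ssid, set()).add(volume)

    to_stage, already_on_afd = [], []
    for ssid, vols in volumes_by_ssid.items():
        # Any entry on an AFD volume means the saveset must not be
        # handed to staging again.
        if vols & afd_volumes:
            already_on_afd.append(ssid)
        else:
            to_stage.append(ssid)
    return sorted(to_stage), sorted(already_on_afd)


if __name__ == "__main__":
    # Hypothetical data: 2002 is listed on both tape and the AFD.
    sample = [
        ("2001", "TAPE.001"),
        ("2002", "TAPE.001"),
        ("2002", "AFD.001"),
        ("2003", "AFD.001"),
    ]
    to_stage, skipped = split_by_location(sample, {"AFD.001"})
    print("stage:", to_stage)
    print("skip:", skipped)
```

Only the first list would go to nsrstage. The second list is worth keeping
too, since those are the entries where the media index looks suspect.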

The next activity wasn't really necessary, but I decided to do it anyway. I
noticed that I had made a mistake in the volume label given to the AFD, and
I wanted to correct it because it made me look up the name every time I
needed it (device =ADFD1, label=AFDF1). I created a new AFD on the same disk
volume and staged all savesets to the new AFD without any problems. Then I
relabelled the original AFD and started staging back. The reason for staging
back is that changing the AFD would also require many changes in the pool
resources. A few savegroups targeting the AFD were due to start, and I
estimated that there would be sufficient time to complete staging before the
backups started. That was a mistake: the staging did complete in time, but
all the savegroups hung, requesting a new volume to be labelled and mounted
on the AFD. I stopped the savegroups but that did not clear the message; in
the end I had to restart Networker to get control back.

The moral of the story: backup and staging targeting an AFD can be
hazardous to your health. It looks as if we will have to wait for another
major release before it becomes really usable.

Bye
Ernie
