Re: [Networker] LGTPA 30132

On Mon, 12 Aug 2002 12:04:54 -0400, colin_mcfadyen AT carleton DOT ca
<colin_mcfadyen AT CARLETON DOT CA> wrote:

>Davina,
>Where would I get the hotfix?  I am running 6.0.2 on W2K.

Legato should supply it on request as part of the support call you have
open with them. They sent it to me for the same problem, and it worked.
Perhaps I was luckier with the engineer who dealt with my call; you seem
to have been landed with one who doesn't know as much. Here is the info
that Legato sent me about the bug.

---snip---
Here is the information about LGTpa30132:

    Fact   NetWorker 6.0.2

    Fact   Backup works fine

    Symptom   Error: 'media emergency: Bad or missing record: save set ID
(#), low water mark (#), current offset (#)'

    Symptom   Error: 'nsrclone: error, bad or missing record: ssid (#), low
(#), current (#)'

    Symptom   Error: '(server_name): bad or missing record: ssid (#), low
(#), current (#)'

    Symptom   Error: 'recover: Unable to read checksum from save stream'

    Symptom   Error: 'recover: error recovering (filename)'

    Symptom   Recover operation fails and results in error

    Cause   This problem will only happen when multiplexing immediate and
non-immediate save streams onto the same tape drive.

During backup, nsrmmd updates the media database while it writes to tape.
With this bug, the data written to tape is correct, but the reference file
and record numbers are not updated properly when immediate and
non-immediate saves are multiplexed onto the same file and record on tape.
NetWorker is left with a discrepancy between the file and record numbers
that reflect the physical data on tape and those recorded in the media
database.

This problem exists in all versions except the following, which contain
the fix:
(1) NetWorker 5.5.3, 5.5.4 and above (for the 5.5.x series)
(2) NetWorker 6.1.1 and above (for all future releases)

    Fix   Known issue: resolved in LGTpa30132. Fix expected to be in
NetWorker 5.5.3, 5.5.4, 6.1.1 and above. If these versions are not
available yet, please contact Legato Customer Support to get temporary
hotfixes for the desired platform. Hotfixes are available only for 6.0.1,
6.0.2 and 6.1.

[Hotfix binaries]
nsrmmd
scanner
tapeexer

NOTE #1: The above hotfix takes care of the problem in two parts:
-- it ensures all future entries in the media database are updated
correctly from the day the hotfix is applied;
-- it ensures nsrmmd will try its best to recover from old media database
entries (containing invalid file and record numbers). Only in exceptional
cases, when the available information is not sufficient to recover the
data, is the end-user prompted in the daemon.log to run 'scanner -i
(tape_device)' or 'scanner -m (tape_device)'. This will definitely be able
to restore the required data, but running scanner takes extra time.

---snip---
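
If you do end up with a long list of errors, it may help to boil the log
down to the distinct save set IDs first. Below is a rough Python sketch
of that idea, not anything Legato supplied. It assumes each error sits on
a single log line and looks like one of the "bad or missing record"
symptoms quoted above; the function name and the sample lines are my own
for illustration.

```python
import re

# Matches the two message shapes from the bug note:
#   "... save set ID N, low water mark N, current offset N"
#   "... ssid N, low N, current N"
# Assumption: the whole message is on one log line.
BAD_RECORD = re.compile(
    r"bad or missing record:\s*"
    r"(?:save set ID|ssid)\s+(\d+),\s*"
    r"low(?: water mark)?\s+(\d+),\s*"
    r"current(?: offset)?\s+(\d+)",
    re.IGNORECASE,
)


def affected_ssids(lines):
    """Return the distinct SSIDs named in the errors, in first-seen order."""
    seen = {}
    for line in lines:
        match = BAD_RECORD.search(line)
        if match:
            seen.setdefault(match.group(1), None)
    return list(seen)


# Sample lines modelled on the messages quoted in this thread.
sample = [
    "nsrd: media emergency: Bad or missing record: save set ID 1540407553, "
    "low water mark 1824823152, current offset 1824790384",
    "nsrclone: error, bad or missing record: ssid 1144599809, "
    "low 139303368, current 139283040",
    "nsrd: media emergency: Bad or missing record: save set ID 1540407553, "
    "low water mark 1824823152, current offset 1824790384",
]
print(affected_ssids(sample))
```

To run it against a real log, pass an open file object instead of the
sample list; the function only needs something it can iterate line by
line.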

>
>On Mon, 12 Aug 2002 11:44:59 -0400, Davina Treiber <treiber AT HOTPOP DOT COM>
>wrote:
>
>>That seems like a rather involved procedure. Why not just install the
>>hotfix for the bug? It is supposed to allow you to scanner in any problem
>>save sets in most cases. Is this too obvious?
>>
>>On Mon, 12 Aug 2002 11:41:02 -0400, colin_mcfadyen AT carleton DOT ca
>><colin_mcfadyen AT CARLETON DOT CA> wrote:
>>
>>>Hi all,
>>>I am seeing the same behaviour as Mark.  I am willing to try to clean
>>>things up using the instructions listed below.
>>>
>>>However, when I use the mminfo command, it lists many, many instances
>>>rather than the single instance reported below.
>>>
>>>My question is, do I have to run scanner for each instance?  That will
>>>take forever.
>>>
>>>Thanks.
>>>
>>>On Thu, 9 Aug 2001 17:45:18 +0100, Mark Kilpatrick
>>><Kilpatrick AT XNET DOT IE> wrote:
>>>
>>>>Is this a known issue with Networker 6.0.1???  I've been receiving them
>>>>while cloning savesets but put it down to bad media.  I received the
>>>>following instructions from Legato Support....
>>>>
>>>>
>>>>...Solution involves having NetWorker position the device's heads at the
>>>>record & offset where the save set is located on the volume and begin
>>>>scanning from there.  The media database will be updated properly.
>>>>
>>>>First, get the SSID from the error message:
>>>>Error encountered by NSR server `myhost': Bad or missing record: save set ID
>>>>1144599809, low water mark  139303368, current offset 139283040
>>>>In this case the SSID is 1144599809.
>>>>
>>>>
>>>>Then get the file and record number for the SSID:
>>>>mminfo -avVot -q 'ssid=1144599809'
>>>>
>>>>
>>>>This will generate a report:
>>>>myhost{root}41: mminfo -avVot -q 'ssid=1144599809'
>>>>volume  client       size   level  name      ssid        save time  date      time      browse    retent
>>>>000911  moo.cow.com  20 MB  full   <19>/d06  1144599809  994319861  07/05/01  00:57:41  08/30/01  08/30/01
>>>>
>>>>first   last        file  rec    volid           total fl
>>>>  0     20555463    273   1468   1090286593      2048055264 hb
>>>>
>>>>
>>>>Then run scanner on the SSID at the specific file and record offset:
>>>>
>>>>scanner -i -f 273 -r 1468 /dev/rmt/0cbn
>>>>
>>>>You should get a message something like:
>>>>scanner: ssid 1144634369: 18 MB, 1301 file(s)
>>>>scanner: correcting overlapping fragment for ssid 1144599809, low 139283040
>>>>        got volid 1090286593, ffn 277, frn 1
>>>>        had volid 1090286593, ffn 277, frn 2
>>>>
>>>>
>>>>Do this for each SSID listed in the logs and then carry out recovery
>>>>procedures.  Please note that if the media and drives are functional, the
>>>>data should recover properly.
>>>>
>>>>It's realistic to expect that there may be more than one bad media
>>>>database entry on the tape, so please let scanner finish the tape.
>>>>
>>>>You could, at your own risk, hit Ctrl-C after the correction of the
>>>>overlapping fragment is completed for that entry AND the next record is
>>>>being looked at (give it 5 mins or so).  The recovery can be attempted at
>>>>that point.  Please note that if this method is used and there are other
>>>>occurrences of bad or missing records on the volume(s) in question, the
>>>>recovery will fail and the method will have to be repeated for those
>>>>save sets.
>>>>
>>>>
>>>>
>>>>
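
For anyone facing many SSIDs, the mminfo-then-scanner steps above could
be scripted along these lines. This is only a sketch in Python and it
makes assumptions: the report is assumed to have a header line containing
"file" and "rec" with the values on the very next line, as in Mark's
example, and the helper names are invented for illustration. It only
builds the command lines quoted above; it does not run them.

```python
import shlex


def mminfo_query(ssid):
    """Build the mminfo command from the thread for a single save set."""
    return ["mminfo", "-avVot", "-q", f"ssid={ssid}"]


def file_and_record(report):
    """Pull the 'file' and 'rec' values out of a report shaped like the example.

    Assumption: the header row naming the columns is immediately followed
    by the row holding the values.
    """
    lines = report.splitlines()
    for header, row in zip(lines, lines[1:]):
        cols = header.split()
        if "file" in cols and "rec" in cols:
            values = dict(zip(cols, row.split()))
            return int(values["file"]), int(values["rec"])
    raise ValueError("no file/rec columns found in report")


def scanner_command(file_no, record_no, device):
    """Build the scanner command from the thread for one file/record position."""
    return ["scanner", "-i", "-f", str(file_no), "-r", str(record_no), device]


# The second half of the report quoted above.
report = (
    "first   last        file  rec    volid           total fl\n"
    "  0     20555463    273   1468   1090286593      2048055264 hb\n"
)
file_no, record_no = file_and_record(report)
print(shlex.join(mminfo_query("1144599809")))
print(shlex.join(scanner_command(file_no, record_no, "/dev/rmt/0cbn")))
```

Keeping the commands as argument lists means they can be handed to
subprocess.run() unchanged once you are happy with what gets printed.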
>>>>-----Original Message-----
>>>>From: Mason, Andrew [mailto:Andrew.Mason AT GETRONICS DOT COM]
>>>>Sent: Thursday, August 09, 2001 10:06 AM
>>>>To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
>>>>Subject: Re: [Networker] LGTPA 30132
>>>>
>>>>
>>>>I've seen quite a lot of these errors but didn't have any ideas as to
>>>>what was causing them.  I have received them under 6.0.1 and 6.0.2.
>>>>
>>>>I'm sorry I can't answer any of your questions.
>>>>
>>>>-----Original Message-----
>>>>From: Nikiforuk, Kevin [mailto:knikifor AT EPCOR DOT CA]
>>>>Sent: 31 July 2001 16:05
>>>>To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
>>>>Subject: [Networker] LGTPA 30132
>>>>
>>>>
>>>>Is anyone else suffering from this 6.0.2 bug?  The symptom I've seen is:
>>>>"nsrd: media emergency: Bad or missing record: save set ID 1540407553,
>>>>low water mark 1824823152, current offset 1824790384" during cloning of
>>>>savesets.
>>>>
>>>>According to Legato, this is the result of a media index corruption
>>>>which causes restores and cloning to fail.  Right now I'm contemplating
>>>>rolling back my upgraded server, even though it's been running for three
>>>>weeks.  Has anyone found a way around this?
>>>>
>>>>If I roll back, am I going to lose my three weeks of indexes?  Does
>>>>anyone know if you can use scanner to read tapes that were written on
>>>>6.0.2 into a 5.1 server?
>>>>
>>>>Regards,
>>>>Kevin

--
Note: To sign off this list, send a "signoff" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.