Networker

Re: [Networker] Intermittent NDMP failures on NetApp

2009-09-28 09:55:10
Subject: Re: [Networker] Intermittent NDMP failures on NetApp
From: Thierry FAIDHERBE <thierry.faidherbe AT FOREM DOT BE>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 28 Sep 2009 15:50:38 +0200
You can try increasing the inactivity timeout (e.g. 60 minutes).
When backing up big files, for example, if the client file index
is not updated within that interval, NetWorker may wrongly
consider the save to be "dead", abort the current save
and then retry it.
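To make the mechanism concrete, here is a minimal Python sketch. It assumes, hypothetically, that the server notes each time the client file index is updated and declares the save dead once the gap since the last update exceeds the inactivity timeout. The function name `first_timeout` and the sample timestamps are illustrative only, not NetWorker internals.

```python
from typing import Optional, Sequence


def first_timeout(update_minutes: Sequence[float], end_minute: float,
                  timeout_minutes: float) -> Optional[float]:
    """Return the minute at which a watchdog would declare the save dead,
    or None if no gap between index updates exceeds the timeout.

    Hypothetical model, not NetWorker code:
    update_minutes -- times (minutes from save start) of index updates
    end_minute     -- time the save would actually have finished
    """
    last = 0.0  # the start of the save counts as the first activity
    for t in sorted(update_minutes) + [end_minute]:
        if t - last > timeout_minutes:
            # No index update arrived in time: the watchdog fires
            # timeout_minutes after the last observed activity.
            return last + timeout_minutes
        last = t
    return None


# Hypothetical dump of one huge file: the index is updated early, then
# nothing new is recorded for 40 minutes while that file streams.
updates = [1.0, 41.0, 45.0]
print(first_timeout(updates, end_minute=50.0, timeout_minutes=15))
print(first_timeout(updates, end_minute=50.0, timeout_minutes=60))
```

With a 15-minute timeout the healthy but "quiet" save is declared dead at minute 16; with 60 minutes it runs to completion. That is the reasoning behind raising the timeout rather than chasing a network fault.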

Various factors can cause speed variations or "freeze-like" stalls
(for example the drive, the media, NetApp load, backup server load,
a bootstrap being saved, ...) that may expose such a problem.

I have seen this behaviour not only with NDMP and NetApps but also with
big DB servers holding a small number of files.

HTH

Th


-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On
Behalf Of dmitri
Sent: Monday 28 September 2009 15:29
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: [Networker] Intermittent NDMP failures on NetApp

I am getting intermittent failures on some NetApp volumes backed up via
NDMP. It's not always the same volume, not even the same NetApp unit.
Every once in a blue moon NetWorker reports a network failure and aborts a
volume, usually in the middle of a FULL backup:

XXX:xxx 42617:nsrndmp_save: NDMP Service Log: DUMP: Sat Sep 26 19:53:54 2009 : We have written 35297975 KB.
XXX:xxx 42957:nsrdsa_save: Error : readlen = -1 (avail = 58648)
XXX:xxx read: Connection reset by peer
42950:nsrdsa_save: Save failed
XXX:xxx 42617:nsrndmp_save: NDMP Service Log: DUMP: Network communication error
XXX:xxx 42738:nsrndmp_save: Data server halted: The backup is aborted by operator.
XXX:xxx 42617:nsrndmp_save: NDMP Service Log: Aborted by client
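When triaging an intermittent failure like this, it can help to separate the last progress report from the first error in the sequence. The Python sketch below does that for the excerpt quoted here (wrapped lines re-joined). The keyword pattern in `FAILURE` and the helper name `triage` are hypothetical heuristics, not a NetWorker message catalogue.

```python
import re
from typing import List, Optional, Tuple

# The log excerpt from the report, with wrapped lines re-joined.
LOG = """\
XXX:xxx 42617:nsrndmp_save: NDMP Service Log: DUMP: Sat Sep 26 19:53:54 2009 : We have written 35297975 KB.
XXX:xxx 42957:nsrdsa_save: Error : readlen = -1 (avail = 58648)
XXX:xxx read: Connection reset by peer
42950:nsrdsa_save: Save failed
XXX:xxx 42617:nsrndmp_save: NDMP Service Log: DUMP: Network communication error
XXX:xxx 42738:nsrndmp_save: Data server halted: The backup is aborted by operator.
XXX:xxx 42617:nsrndmp_save: NDMP Service Log: Aborted by client
"""

WRITTEN = re.compile(r"We have written (\d+) KB")
# Hypothetical keyword heuristic for "this line reports a failure".
FAILURE = re.compile(r"error|failed|reset by peer|abort", re.IGNORECASE)


def triage(lines: List[str]) -> Tuple[Optional[int], Optional[str]]:
    """Return (last KB count reported by DUMP, first failure-like line)."""
    written_kb: Optional[int] = None
    first_failure: Optional[str] = None
    for line in lines:
        m = WRITTEN.search(line)
        if m:
            written_kb = int(m.group(1))
        elif first_failure is None and FAILURE.search(line):
            first_failure = line
    return written_kb, first_failure


kb, first = triage(LOG.splitlines())
print(f"Dump progress before failure: {kb} KB (~{kb / 1024**2:.1f} GiB)")
print("First failure line:", first)
```

On this excerpt the sketch reports about 33.7 GiB written and, in log order, the first failure line comes from `nsrdsa_save` on the NetWorker side, ahead of the filer's own "Network communication error". Log order alone does not prove which side caused the reset.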

I am running NetWorker 7.4.4 on a SPARC T2000; the NetApps are FAS3040s with ONTAP 7.3.3.
NDMP options:
HIST=y
DIRECT=y
UPDATE=y
OPTION=NT
UTF8=y

Inactivity timeout: 15 (minutes)
Client retries: 1
When such a failure happens, both attempts (the original save and the retry) fail, obviously.
I haven't noticed anything unusual during these failures: no network
problems (the rest of the volumes ran just fine) and no high load anywhere.

I was wondering whether any of you folks has ever observed such behaviour.

TIA
dmitri

+----------------------------------------------------------------------
|This was sent by dmitri_ryjikh AT ml DOT com via Backup Central.
|Forward SPAM to abuse AT backupcentral DOT com.
+----------------------------------------------------------------------

To sign off this list, send email to listserv AT listserv.temple DOT edu and
type "signoff networker" in the body of the email. Please write to
networker-request AT listserv.temple DOT edu if you have any problems with this
list. You can access the archives at
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER