Re: [Networker] xdr of Linux extended attributes failed for `/home/'

2011-08-22 18:47:21
Subject: Re: [Networker] xdr of Linux extended attributes failed for `/home/'
From: Tim Mooney <Tim.Mooney AT NDSU DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 22 Aug 2011 17:45:52 -0500
In regard to: Re: [Networker] xdr of Linux extended attributes failed for...:

Hi Tim,

The answers to your questions appear below:

On 08 16, 2011, at 1:05 PM, Tim Mooney wrote:

In regard to: [Networker] xdr of Linux extended attributes failed for...:

I am having trouble backing up a pair of new Red Hat Linux servers. The
only error message is the one in the subject of this message. It
happens with all the file systems on both clients, but not consistently.
I opened a case with EMC about this yesterday, but it is in research
status. While I am waiting for feedback from EMC, I am wondering if
anyone else on this list has run into this problem. The NetWorker server
is running Red Hat Linux. So is the client. The server is running Power
Edition 7.6 SP1 and the client has 7.6 SP2.

What version of RHEL is on the clients?

[root@prd-tds3 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)

What's the filesystem type that's in use on the clients?

ext3

Is SELinux active on the clients?

No.

What does

        lsattr -R /home/inst1ids | egrep -v '^-------------'

report?

So basically there's nothing "exotic" in play.

We back up lots of RHEL clients, the majority of which are now 5.7, and
I've never seen this particular error (at least on backup, I've seen xdr
errors when doing cross-UNIX file recovery).  Only our smaller "test"
NetWorker server is running 7.6(.2.2), though -- our production server is
still 7.5.3.  It will be a couple more weeks before I take that to 7.6.2.2.

The fact that it doesn't always happen on the same file (or directory)
also adds to the difficulty.

If EMC hasn't been able to come up with anything yet, I would probably
continue the debugging by looking for core files in /nsr/cores on the
client(s).  If there are some, especially for any of the components of
the backup process, I would prod EMC to examine them.
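
A minimal sketch of that core-file check, run as root on each client. The
helper name list_recent_cores and the 7-day window are assumptions made for
illustration; /nsr/cores is the directory mentioned above.

```shell
# Hypothetical helper: list files under a core directory that were
# modified within the last N days (defaults: /nsr/cores, 7 days).
list_recent_cores() {
    lrc_dir="${1:-/nsr/cores}"
    lrc_days="${2:-7}"
    if [ ! -d "$lrc_dir" ]; then
        echo "no such directory: $lrc_dir" >&2
        return 1
    fi
    # -mtime -N matches files modified less than N*24 hours ago
    find "$lrc_dir" -type f -mtime "-$lrc_days" -print
}

# Usage (on a client):
#   list_recent_cores /nsr/cores 7
```

Running file(1) on anything this turns up should report which program
produced the core, which helps decide whether it belongs to the backup
process.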

If there aren't any, my approach would be to attach to nsrexecd with
strace and follow (-f) forks and then initiate a backup.  That may give
you a clue about what's happening when the save is failing, though it's
going to be challenging to interpret.  Without binaries built with debugging
symbols, gdb isn't very helpful, and on RHEL 5 systemtap is pretty limited,
so strace is the most useful forensic probe you have.
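
The strace approach might look something like the sketch below. The helper
names, the output path /tmp/nsrexecd.strace, and the choice of the oldest
nsrexecd process are all assumptions, as is the guess that the failure shows
up in the extended-attribute system calls (getxattr, listxattr and their
l/f variants). Treat it as a starting point, not a recipe.

```shell
# Hypothetical helper: attach strace to the oldest running nsrexecd and
# follow any children it forks. Stop it with Ctrl-C after the backup ends.
trace_nsrexecd() {
    tn_out="${1:-/tmp/nsrexecd.strace}"
    tn_pid=$(pgrep -o -x nsrexecd) || {
        echo "nsrexecd is not running" >&2
        return 1
    }
    # -f follow forks, -tt timestamps, -o write to a file, -p attach to a pid
    strace -f -tt -o "$tn_out" -p "$tn_pid"
}

# Hypothetical helper: show only the extended-attribute calls that failed.
# Matches both complete lines and "<... resumed>" lines from strace -f.
xattr_failures() {
    grep -E '(get|list)xattr.*= -1 ' "${1:-/tmp/nsrexecd.strace}"
}

# Usage (as root on the client):
#   trace_nsrexecd /tmp/nsrexecd.strace    # then start the backup elsewhere
#   xattr_failures /tmp/nsrexecd.strace
```

Some of those calls fail harmlessly on files that simply have no extended
attributes, so what matters is the failures that line up in time with the
save error.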

It might be worth trying to initiate the backup with

        savegrp -vvv -c clientname groupname

but I have my doubts about how much useful information you'll get from
that.

Tim
--
Tim Mooney                                             Tim.Mooney AT ndsu DOT edu
Enterprise Computing & Infrastructure                  701-231-1076 (Voice)
Room 242-J6, IACC Building                             701-231-8541 (Fax)
North Dakota State University, Fargo, ND 58105-5164