I have cleaned up some of our “DNS” problems, although they were not the clients
in question, and will see how it goes tonight. It also turns out that the media
servers had “files dns” in /etc/nsswitch.conf whereas the master had “dns host”.
I’ve changed the master to match. They also changed the NICs on the master to a
fixed 1Gb instead of auto-negotiate. So we will see what happens tonight. If it
is a problem with hitting the DNS servers too hard, it should get worse
tonight. :)
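
For anyone wanting to compare the lookup order across the master and media servers, here is a minimal Python sketch. It only reads the "hosts:" line of an nsswitch.conf-style text and returns the sources in the order they are listed (so "files dns" means /etc/hosts is consulted before DNS). The function name hosts_sources is made up for illustration and is not part of NetBackup or any system tool.

```python
import re

def hosts_sources(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line, in listed order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("hosts:"):
            entry = line[len("hosts:"):]
            # Remove action items such as [NOTFOUND=return]; keep source names.
            entry = re.sub(r"\[[^\]]*\]", " ", entry)
            return entry.split()
    return []  # no hosts line found
```

Running it on the contents of /etc/nsswitch.conf from each server and comparing the returned lists should show quickly whether any host still resolves in a different order from the rest.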
Thank you all for your suggestions.
From: Bahnmiller, Bryan E. [mailto:bbahnmiller AT dtcc DOT com]
Sent: 27 September 2011 15:44
To: Patrick
Subject: RE: [Veritas-bu] HELP!!!!
Patrick,
That is strange. I’m wondering if something else is going on. I have seen
situations where you beef up your environment and it introduces you to other
problems that used to be masked by a limited environment. With that many drives
and that much memory, you are going to be able to queue up and run more jobs. If
you are creating jobs faster, I wonder if you are running into name resolution
problems now. Can you find out how loaded your DNS server is during the same
time frame? One of the older NBU environments I had was pounding the DNS servers
to the point that they were running at 100% CPU. I thought 6.x was much better
at this, but it could be related to the way your Linux servers are doing name
caching and how hard they hit the DNS servers.
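
One rough way to see whether name resolution slows down during the backup window is to time lookups from a media server. The following is only a sketch under assumptions: the function name time_lookups and the idea of feeding it a list of client hostnames are mine, and it goes through the normal system resolver (so it reflects whatever /etc/nsswitch.conf and any local caching do), not the DNS server directly.

```python
import socket
import time

def time_lookups(hostnames, resolver=socket.gethostbyname):
    """Resolve each name once; return (name, seconds, error-or-None) tuples."""
    results = []
    for name in hostnames:
        start = time.perf_counter()
        try:
            resolver(name)
            error = None
        except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
            error = str(exc)
        results.append((name, time.perf_counter() - start, error))
    return results
```

Running this against the same set of client names at, say, 15:00 and again at 23:30 and comparing the timings would give a first indication of whether lookups degrade at night.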
One other possibility would be the VTL. I’ve had better luck with the newer
Data Domains from EMC than their older “DLs”. It may be possible that they are
slow in responding to requests when they get busy, but I wouldn’t think those
would show up as error 47s.
Does /var/log/messages show anything around the same time frame?
Bryan
Hi All,
The situation is getting crazy. Last night 17% of our backups failed with error
code 47. It happened on only 6 of the 58 media servers. All the jobs were trying
to back up to one of two of the four VTL libraries. Looking at the <16> and <32>
errors in /usr/openv/netbackup/logs, I see CORBA errors on three of the six and
Robot Failures on the other three. While we have many 47 errors on the weekends,
this is a first of this magnitude for a weekday. The only change I am aware of
is: last week we increased the memory of 5 of the 6 media servers from 12GB to
32GB. Is it possible to have TOO much memory?
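
To line the <16> and <32> entries up against the 23:00 to 04:00 window, a small Python sketch like the one below can count them per hour. It assumes each log line starts in the shape "HH:MM:SS.mmm [pid] <severity> text"; that shape, the regular expression, and the function name errors_by_hour are assumptions for illustration and should be checked against the actual log files before trusting the counts.

```python
import re
from collections import Counter

# Assumed line shape: "HH:MM:SS.mmm [pid] <severity> text"
LINE_RE = re.compile(r"^(\d{2}):\d{2}:\d{2}\.\d+ \[\d+\] <(\d+)>")

def errors_by_hour(lines, severities=("16", "32")):
    """Count lines of the given severities, keyed by (hour, severity)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2) in severities:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Passing it an open log file (a file object iterates line by line) from each of the six media servers would show whether the errors really cluster in the same hours on all of them.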
Environment:
RedHat Linux 64-bit running 32-bit NetBackup 6.5.6
4 EMC VTL libraries (sorry, don’t know the model #), 164 drives configured on each.
The failing clients are both UNIX and Windoze, with one Oracle backup failure.
Oh, and the failures only seem to happen between 23:00 and 04:00
(approximately).
ANY suggestions would be greatly appreciated.
Regards,
Patrick Whelan
VERITAS Certified NetBackup Support Engineer for UNIX.
VERITAS Certified NetBackup Support Engineer for Windows.
netbackup AT whelan-consulting.co DOT uk