Re: [Veritas-bu] HELP!!!!

I would be careful with locking a 1GbE NIC. The definition of 1GbE (at least over copper, 1000BASE-T) mandates autonegotiation, so strictly speaking it is not valid to lock the speed. You can of course advertise only 1000/FDX, which leaves that as the only outcome autonegotiation can reach.

DNS lookups can be slowed down if you have lots of domains in the domain search list, as the resolver will try them all. You can avoid this by using fully-qualified names with a terminal dot (e.g. server.bigco.com.), but I must admit I don’t: it would confuse people who don’t know what the dot is for, and some tools/scripts will just break with it.
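To make the search-list cost concrete, here is a minimal Python sketch of the order in which a typical stub resolver tries names. It is a simplified model, not the resolver's actual code: the function name `candidate_names`, the `ndots` default of 1 and the example domains are assumptions for illustration only.

```python
def candidate_names(name, search_domains, ndots=1):
    """Return the names a stub resolver would try, in order (simplified model)."""
    # A trailing dot marks the name as fully qualified: the search list is skipped.
    if name.endswith("."):
        return [name]
    searched = [f"{name}.{domain}." for domain in search_domains]
    as_is = name + "."
    # Names with at least `ndots` dots are tried as-is first, otherwise last.
    if name.count(".") >= ndots:
        return [as_is] + searched
    return searched + [as_is]


if __name__ == "__main__":
    # Hypothetical search list; every entry is a potential extra query.
    domains = ["bigco.com", "eu.bigco.com", "lab.bigco.com"]
    for candidate in candidate_names("server", domains):
        print(candidate)
```

A short name with three search domains can therefore cost up to four queries before it resolves or fails, whereas the dotted form costs one.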

It may be worth running traceroute to your DNS servers and between your servers, to make sure traffic is using the NICs that you expect (if any server has more than one).
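As a quick complement to traceroute (not a replacement for it), the following Python sketch asks the kernel which local address it would use to reach a given destination, which in turn tells you which NIC the route goes out of. The function name `source_address_for` is made up for this example, it covers IPv4 only, and the port number is arbitrary because no packet is actually sent.

```python
import socket


def source_address_for(destination, port=53):
    """Return the local IPv4 address the kernel would use to reach `destination`."""
    # connect() on a UDP socket sends nothing; it only makes the kernel
    # choose a route, and with it a local source address.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((destination, port))
        return sock.getsockname()[0]


if __name__ == "__main__":
    # Replace with the address of a DNS server or media server to check.
    print(source_address_for("127.0.0.1"))
```

If the address printed belongs to a different interface from the one you expected, the routing table is worth a look before blaming DNS.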

You can use a tool like ‘ping plotter’ to see if there is something really slow in your network, but it is more aimed at WAN testing.

William D L Brown

From: veritas-bu-bounces AT mailman.eng.auburn DOT edu [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Patrick
Sent: 27 September 2011 15:17
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] HELP!!!!

I have cleaned up some of our “DNS” problems, although they were not on the clients in question, and will see how it goes tonight. It also turns out that the media servers had “files dns” in /etc/nsswitch.conf whereas the master had “dns host”. I’ve changed the master to match. They also changed the NICs on the master to a fixed 1Gb instead of autonegotiate. So we will see what happens tonight. If it is a problem with hitting the DNS servers too hard, it should get worse tonight. :-)
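For anyone wanting to compare the lookup order across many servers, here is a small Python sketch that pulls the sources from the "hosts:" line of an nsswitch.conf. The function name `hosts_lookup_order` is hypothetical, and the sample text in the usage part is illustrative, not copied from the servers discussed here.

```python
import re


def hosts_lookup_order(nsswitch_text):
    """Return the sources on the 'hosts:' line of nsswitch.conf, in order."""
    for raw_line in nsswitch_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            spec = line[len("hosts:"):]
            # Remove action items such as [NOTFOUND=return]
            spec = re.sub(r"\[[^\]]*\]", " ", spec)
            return spec.split()
    return []


if __name__ == "__main__":
    sample = "passwd: files\nhosts:  files dns   # local file first\n"
    print(hosts_lookup_order(sample))
```

With "files" listed first, names present in /etc/hosts never reach the DNS servers at all, which matters when thousands of jobs resolve the same handful of names.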

Thank all of you for your suggestions.

Regards,

Patrick Whelan

VERITAS Certified NetBackup Support Engineer for UNIX.

VERITAS Certified NetBackup Support Engineer for Windows.

netbackup AT whelan-consulting.co DOT uk

From: Bahnmiller, Bryan E. [mailto:bbahnmiller AT dtcc DOT com]
Sent: 27 September 2011 15:44
To: Patrick
Subject: RE: [Veritas-bu] HELP!!!!

Patrick,

That is strange. I’m wondering if something else is going on. I have seen situations where you beef up your environment and it exposes other problems that used to be masked by the limited environment. With that many drives and that much memory, you are going to be able to queue up and run more jobs. If you are creating jobs faster, I wonder if you are now running into name resolution problems. Can you find out how loaded your DNS server is during the same time frame? One of the older NBU environments I had was pounding the DNS servers to the point that they were running at 100% CPU. I thought 6.x was much better at this, but it could be related to the way your Linux servers are doing name caching and how hard they hit the DNS servers.
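One rough way to see whether name resolution gets slow during the backup window is to time lookups from a media server, as in the Python sketch below. It measures what the client experiences (including /etc/hosts and any local caching), not the CPU load on the DNS server itself. The function name `time_lookups` and the host names in the usage part are placeholders.

```python
import socket
import time


def time_lookups(hostnames, resolve=socket.getaddrinfo):
    """Return {hostname: seconds, or None if the lookup failed}."""
    timings = {}
    for host in hostnames:
        start = time.perf_counter()
        try:
            resolve(host, None)
            timings[host] = time.perf_counter() - start
        except socket.gaierror:
            timings[host] = None  # resolution failed
    return timings


if __name__ == "__main__":
    # Placeholder names; substitute the master, media servers and failing clients.
    for host, seconds in time_lookups(["localhost"]).items():
        print(host, "failed" if seconds is None else f"{seconds:.4f}s")
```

Run from cron every few minutes between 23:00 and 04:00, this would show whether lookup times climb at the same time as the failures.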

One other possibility would be the VTL. I’ve had better luck with the newer Data Domains from EMC than their older “DLs”. It may be that they are slow in responding to requests when they get busy, but I wouldn’t think those would show up as error 47s.

Does /var/log/messages show anything around the same time frame?

Bryan

From: veritas-bu-bounces AT mailman.eng.auburn DOT edu [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Patrick
Sent: Tuesday, September 27, 2011 4:17 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] HELP!!!!

Hi All,

The situation is getting crazy. Last night 17% of our backups failed with error code 47. It happened on only 6 of the 58 media servers. All the jobs were trying to back up to one of two of the four VTL libraries. Looking at the <16> and <32> errors in /usr/openv/netbackup/logs I see CORBA errors on three of the six and Robot Failures on the other three. While we have many 47 errors on the weekends, this is a first of this magnitude for a weekday. The only change I am aware of is: last week we increased the memory of 5 of the 6 media servers from 12GB to 32GB. Is it possible to have TOO much memory?
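To line the <16> and <32> entries up against the failure window, a sketch like the following Python could count them per hour. It assumes each log line starts with a time stamp, a bracketed process ID and then the severity in angle brackets; that line shape, the function name `errors_by_hour` and the sample lines are assumptions for illustration, so adjust the pattern to whatever the logs actually contain.

```python
import re
from collections import Counter

# Assumed line shape: "23:14:07.123 [4567] <16> some_process: message"
LINE_RE = re.compile(r"^(\d{2}):\d{2}:\d{2}\.\d+\s+\[\d+(?:\.\d+)?\]\s+<(\d+)>")


def errors_by_hour(lines, severities=("16", "32")):
    """Count matching lines, keyed by (hour, severity)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2) in severities:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, not taken from the real logs.
    sample = [
        "23:14:07.123 [4567] <16> example_proc: something failed",
        "23:15:00.000 [4567] <2> example_proc: informational",
        "01:00:00.500 [99] <32> example_proc: something worse",
    ]
    for (hour, severity), count in sorted(errors_by_hour(sample).items()):
        print(f"{hour}:00 <{severity}> x{count}")
```

Running this over the same log directory on a failing and a healthy media server would show whether the errors really cluster between 23:00 and 04:00.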

Environment:

RedHat Linux 64bit running 32Bit NetBackup 6.5.6

4 EMC VTL libraries (sorry, I don’t know the model number), with 164 drives configured on each.

The failing clients are both UNIX and Windoze with one Oracle backup failure.

OH, and they only seem to happen between 23:00 and 04:00 (approximately)

ANY suggestions would be greatly appreciated.

Regards,

Patrick Whelan



_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu