Subject: Re: [Veritas-bu] HELP!!!!
From: Rusty Major <rusty.major AT sungard DOT com>
To: Patrick <netbackup AT whelan-consulting.co DOT uk>, veritas-bu AT mailman.eng.auburn DOT edu
Date: Tue, 27 Sep 2011 17:25:57 -0500

Patrick,

What network do your NetBackup servers use for inter-server communication? Is it a shared backup network or a dedicated one? Either way, check how much bandwidth is being used.

You mentioned you set the NIC on the master to 1 Gb. If it wasn't running at 1 Gb before, you may well have been dropping packets, or they may have been timing out, because the network was overloaded or misconfigured (for example, half duplex). Some of those packets are the inter-NetBackup communication between the media servers, and they may be behind the CORBA errors you see.
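One way to confirm what the master's NIC actually negotiated is to read the `ethtool <interface>` output on Linux. The following is a minimal Python sketch. It assumes the `Speed:` and `Duplex:` lines that ethtool normally prints. The function names and the 1000Mb/s expectation are illustrative choices, not anything from NetBackup.

```python
import re
import subprocess


def parse_link(ethtool_output: str) -> dict:
    """Collect the Speed and Duplex fields from `ethtool <iface>` text output."""
    fields = {}
    for line in ethtool_output.splitlines():
        match = re.match(r"\s*(Speed|Duplex):\s*(\S+)", line)
        if match:
            fields[match.group(1).lower()] = match.group(2)
    return fields


def link_problems(fields: dict, expected_speed: str = "1000Mb/s") -> list:
    """Return warnings if the link is slower than expected or not full duplex."""
    problems = []
    speed = fields.get("speed", "unknown")
    duplex = fields.get("duplex", "unknown")
    if speed != expected_speed:
        problems.append(f"speed is {speed}, expected {expected_speed}")
    if duplex.lower() != "full":
        problems.append(f"duplex is {duplex}, expected Full")
    return problems


def check_interface(name: str) -> list:
    """Run ethtool on one interface (needs ethtool installed; may need root)."""
    result = subprocess.run(["ethtool", name], capture_output=True, text=True, check=True)
    return link_problems(parse_link(result.stdout))
```

For example, `check_interface("eth0")` (interface name hypothetical) on the master and on each media server returns an empty list only when the link reports 1000Mb/s at full duplex. A half-duplex result on either end of a link is the kind of misconfiguration described above.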

 

We have run into this problem a couple of times before and eventually settled on a separate network for inter-server communication. It works well, but it's a bit complex and troublesome if you don't understand how it was set up.

Have you considered working with support to run an Apparent scan on the network? (I think they changed the name of the tool, but it tests the network and gives you a report of what's wrong.)

-Rusty

From: veritas-bu-bounces AT mailman.eng.auburn DOT edu [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Patrick
Sent: Tuesday, September 27, 2011 9:17 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] HELP!!!!

 

I have cleaned up some of our "DNS" problems, although they did not involve the clients in question, and will see how it goes tonight. It also turns out that the media servers had "files dns" in /etc/nsswitch.conf, whereas the master had "dns host". I've changed the master to match. They also set the NICs on the master to a fixed 1 Gb instead of auto-negotiate. So we will see what happens tonight. If the problem is hitting the DNS servers too hard, it should get worse tonight. :-)
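A lookup-order mismatch like this is easy to miss across 58 media servers. The following is a minimal Python sketch for comparing the `hosts:` line across machines. It assumes you have already gathered each server's /etc/nsswitch.conf text into a dictionary; how you collect the files is up to you. The function names are hypothetical.

```python
import re


def hosts_lookup_order(nsswitch_text: str) -> list:
    """Return the sources on the 'hosts:' line in order, e.g. ['files', 'dns']."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if line.startswith("hosts:"):
            # Remove action items such as [NOTFOUND=return]; keep only source names.
            sources = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
            return sources.split()
    return []


def servers_out_of_step(configs: dict, reference: str) -> dict:
    """Given {server: nsswitch.conf text}, report servers whose hosts
    lookup order differs from the reference server's order."""
    expected = hosts_lookup_order(configs[reference])
    mismatched = {}
    for server, text in configs.items():
        order = hosts_lookup_order(text)
        if order != expected:
            mismatched[server] = order
    return mismatched
```

Picking one known-good media server as the reference keeps the result small: only the servers that need attention are listed, together with the order they currently use.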

 

Thank you all for your suggestions.

Regards,

Patrick Whelan

VERITAS Certified NetBackup Support Engineer for UNIX.

VERITAS Certified NetBackup Support Engineer for Windows.

 

netbackup AT whelan-consulting.co DOT uk

From: Bahnmiller, Bryan E. [mailto:bbahnmiller AT dtcc DOT com]
Sent: 27 September 2011 15:44
To: Patrick
Subject: RE: [Veritas-bu] HELP!!!!

 

Patrick,

That is strange. I'm wondering if something else is going on. I have seen situations where beefing up your environment exposes problems that the old, more limited environment used to mask. With that many drives and that much memory, you will be able to queue up and run more jobs. If you are now creating jobs faster, I wonder if you are running into name resolution problems. Can you find out how loaded your DNS servers are during the same time frame? In one of my older NBU environments, NetBackup was pounding the DNS servers so hard that they were running at 100% CPU. I thought 6.x was much better about this, but the problem could be related to the way your Linux servers cache names and how hard they hit the DNS servers.
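To illustrate why name caching matters when many jobs resolve the same hosts, here is a hypothetical Python sketch of a small time-limited cache in front of a resolver. It does not describe how NetBackup or the operating system caches lookups. It only counts how many queries would reach DNS with and without reuse. The class name and the 300-second TTL are assumptions.

```python
import socket
import time
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Wrap a lookup function and reuse each answer for `ttl` seconds (illustrative only)."""

    def __init__(self, lookup: Callable[[str], str] = socket.gethostbyname,
                 ttl: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self._lookup = lookup
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}
        self.upstream_queries = 0  # lookups that actually reached the resolver

    def resolve(self, host: str) -> str:
        now = self._clock()
        cached = self._cache.get(host)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]  # fresh cached answer: no DNS traffic
        address = self._lookup(host)
        self.upstream_queries += 1
        self._cache[host] = (address, now)
        return address
```

If 100 jobs resolve the same client inside the TTL window, only one query goes upstream. Without a cache, all 100 would. On Linux, a system-level cache such as nscd plays a similar role for every process on the host.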

 

One other possibility is the VTL. I've had better luck with EMC's newer Data Domain systems than with their older DL (Disk Library) units. They may be slow to respond to requests when they get busy, but I wouldn't expect that to show up as status 47 errors.

Does /var/log/messages show anything around the same time frame?

 

Bryan

From: veritas-bu-bounces AT mailman.eng.auburn DOT edu [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Patrick
Sent: Tuesday, September 27, 2011 4:17 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] HELP!!!!

 

Hi All,

The situation is getting crazy. Last night 17% of our backups failed with error code 47. It happened on only 6 of the 58 media servers, and all the failing jobs were trying to back up to one of two of the four VTL libraries. Looking at the <16> and <32> errors in /usr/openv/netbackup/logs, I see CORBA errors on 3 of the 6 servers and robot failures on the other 3. While we get many 47 errors on weekends, this is the first time we have seen them at this magnitude on a weekday. The only change I am aware of is that last week we increased the memory of 5 of the 6 media servers from 12 GB to 32 GB. Is it possible to have TOO much memory?

 

Environment:

Red Hat Linux 64-bit running 32-bit NetBackup 6.5.6

4 EMC VTL libraries (sorry, I don't know the model number), with 164 drives configured on each.

The failing clients are both UNIX and Windoze, with one Oracle backup failure.

 

OH, and they only seem to happen between 23:00 and 04:00 (approximately).

ANY suggestions would be greatly appreciated.

Regards,

Patrick Whelan

VERITAS Certified NetBackup Support Engineer for UNIX.

VERITAS Certified NetBackup Support Engineer for Windows.

 

netbackup AT whelan-consulting.co DOT uk

_____________________________________________________________

DTCC DISCLAIMER: This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify us immediately and delete the email and any attachments from your system. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu