Veritas-bu

Re: [Veritas-bu] Master server can't see media servers

2007-07-09 16:43:42
Subject: Re: [Veritas-bu] Master server can't see media servers
From: "Sponsler, Michael" <Michael.Sponsler AT ngc DOT com>
To: "Ed Wilts" <ewilts AT ewilts DOT org>
Date: Mon, 9 Jul 2007 16:27:43 -0400
Well, I fixed it.  Well... by "fixed" I mean I was forced to recover from a catalog backup.
 
My /usr/openv directory sits on a 9990 RAID, connected via fibre and running Veritas File System (VxFS), Storage Foundation 5.0.  Well, the file system was extremely corrupted.  I ran an fsck on the partition, and I got inode errors out the wazoo.  I recovered the catalog and got everything picked up, but I lost a weekend.....
 
Why can't things break on a Monday or Tuesday, when you can spend the whole week fixing them?  No, everything *has* to break on a Friday....
 
Thanks everyone.
 
--
Mike Sponsler
Northrop Grumman Information Technology
 


From: Ed Wilts [mailto:ewilts AT ewilts DOT org]
Sent: Friday, July 06, 2007 7:52 AM
To: Sponsler, Michael
Cc: VERITAS-BU AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] Master server can't see media servers

50+ media servers should not be that risky with 6.0; one of the driving reasons behind the 6.0 release was scalability, especially with a large number of media servers.

 

Are the PBX processes all running properly?  If they aren't, I could picture exactly the symptoms you're seeing: ssh and ping work, but NetBackup doesn't.
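Since ping and ssh only prove that ICMP and port 22 get through, one way to narrow this down is to attempt a TCP connection from the master to the NetBackup service ports on each media server. The following is a minimal Python sketch, assuming ports 1556 (PBX) and 13724 (vnetd) as commonly documented for NetBackup 6.x; confirm the port list for your release before relying on it. A False result means nothing answered on that port within the timeout, which fits the "pbx not running" theory.

```python
import socket

# Assumed ports: 1556 (PBX) and 13724 (vnetd), as commonly documented for
# NetBackup 6.x. Confirm against your release's port documentation.
DEFAULT_PORTS = (1556, 13724)


def probe(host, ports=DEFAULT_PORTS, timeout=3.0):
    """Return {port: bool} showing whether a TCP connect to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # The connection is closed immediately; we only care that it opened.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, unreachable, or timed out
            results[port] = False
    return results
```

For example, `probe("media01")` (a hypothetical host name) returns a dict such as `{1556: ..., 13724: ...}`; looping it over all 55 media servers shows at a glance which ones are not listening.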

 

               …/Ed

--

Ed Wilts, Mounds View, MN, USA

mailto:ewilts AT ewilts DOT org


 

From: veritas-bu-bounces AT mailman.eng.auburn DOT edu [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Sponsler, Michael
Sent: Friday, July 06, 2007 1:46 AM
To: Martin Ruslan; veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Master server can't see media servers

 

Yeah, I know having 50+ media servers is risky... but there really is no other way to do it properly.

 

I'm beginning to think it isn't NetBackup.  My /usr/openv directory is mounted via a direct-connect fibre link to a Sun 9990 RAID.  I'm running Veritas File System (VxFS) 5.0.  I'm also running Veritas Volume Replicator (VVR) to another Sun 9990 RAID at a DRS site.  Both sites are connected via a large, private pipe... so bandwidth between the two sites isn't an issue.  But I'm seeing this in my /var/adm/messages file:

 

vxio V-5-0-0 disconnecting rlink rlk_<hostname>_bkprvg due to exessive retries.
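To see how often these disconnects happen, and whether they line up with the hangs, a small Python sketch can count the rlink disconnect messages per rlink in /var/adm/messages. The regular expression is an assumption built only from the one vxio message quoted above, so other VVR message variants would not be counted.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the single vxio message quoted in this
# thread; other VVR message variants will not match.
RLINK_DISCONNECT = re.compile(r"disconnecting rlink (\S+) due to")


def rlink_disconnects(lines):
    """Count 'disconnecting rlink' messages per rlink name."""
    counts = Counter()
    for line in lines:
        match = RLINK_DISCONNECT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def scan_messages(path="/var/adm/messages"):
    """Read a syslog file and return per-rlink disconnect counts."""
    with open(path, errors="replace") as f:
        return rlink_disconnects(f)
```

Calling `scan_messages()` on the master returns a Counter keyed by rlink name; a steadily climbing count for the backup replication rlink would point at VVR rather than NetBackup.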

 

Also... when something hangs and I go into /usr/openv (or any directory under it), my terminal hangs.  I can ssh back into the box, and I'm okay until I navigate into /usr/openv.  I've done a full fsck of the VxFS file system; it wasn't clean the first time... but it has come back clean since.

 

So I may be having VxFS issues.  :-/  Yippee....

 

--

Mike Sponsler

Michael.Sponsler AT ngc DOT com

Northrop Grumman Information Technology

 

 


From: Martin Ruslan [mailto:mit.martin AT gmail DOT com]
Sent: Friday, July 06, 2007 1:49 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Cc: Sponsler, Michael
Subject: Re: [Veritas-bu] Master server can't see media servers

Yep...
it's odd. :)
Well, as far as I know, it's risky to have that many media servers,
because they are always communicating with each other, and when even one media server can't talk, the process will hang.

Have you already checked this:

- In /usr/openv/netbackup/bp.conf, are all the media servers registered with a "SERVER = media_server_name" line (without quotes)?

Yes.
On all of the media servers too?


Check /usr/openv/netbackup/bin/bpps -a for hung processes.
Try killing the hung process with kill -9 <PID of hung process>.
Then you'll know which media server has the problem.

Regards,
mTz

On 7/6/07, Sponsler, Michael <Michael.Sponsler AT ngc DOT com> wrote:

- Are all of your media servers registered in the /etc/hosts file on the master server?
Yes.  The environment had been working for several months, with no recent patch updates or changes (that I'm aware of).  The master server lost "NetBackup communication" with the media servers upon restarting the NetBackup daemons.

 

- Are the master server's name and IP address listed in /etc/hosts on all the media servers?

 Yes.  I can ping and ssh to all media servers, and vice versa.

 

- In /usr/openv/netbackup/bp.conf, are all the media servers registered with a "SERVER = media_server_name" line (without quotes)?

Yes.

 

- Check on the media server:  "/usr/openv/volmgr/bin/vmglob  get_gdbhost"

It gives me the master server's hostname.

 

 

It's odd, huh?

 

--

Mike Sponsler

Michael.Sponsler AT ngc DOT com

Northrop Grumman Information Technology

 

 


From: Martin Ruslan [mailto:mit.martin AT gmail DOT com ]
Sent: Friday, July 06, 2007 1:36 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Cc: Sponsler, Michael
Subject: Re: [Veritas-bu] Master server can't see media servers

- Are all of your media servers registered in the /etc/hosts file on the master server?
- Are the master server's name and IP address listed in /etc/hosts on all the media servers?
- In /usr/openv/netbackup/bp.conf, are all the media servers registered with a "SERVER = media_server_name" line (without quotes)?
- Check on the media server:  "/usr/openv/volmgr/bin/vmglob  get_gdbhost"
  Does it return the master server? If not, run: "/usr/openv/volmgr/bin/vmglob  set_gdbhost master_server_name"

Check these, and let us know the results. :)
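A minimal Python sketch of the /etc/hosts and bp.conf checks on the master server: it collects the SERVER entries from bp.conf and the names in /etc/hosts, then reports any expected media server missing from either. It assumes bp.conf uses "SERVER = name" lines and /etc/hosts uses the usual "address name [aliases]" layout; the function names (including check_files) are hypothetical, and the list of expected media servers has to come from you.

```python
def bpconf_servers(text):
    """Return the SERVER names from bp.conf text, in file order."""
    servers = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        # Only "SERVER = name" lines count; comments and other keys are skipped.
        if sep and key.strip().upper() == "SERVER" and value.strip():
            servers.append(value.strip())
    return servers


def hosts_names(text):
    """Return every host name and alias listed in /etc/hosts text (lowercased)."""
    names = set()
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) >= 2:  # address followed by one or more names
            names.update(name.lower() for name in fields[1:])
    return names


def check(expected_media, bpconf_text, hosts_text):
    """Report expected media servers missing from bp.conf or /etc/hosts."""
    expected = {name.lower() for name in expected_media}
    servers = {name.lower() for name in bpconf_servers(bpconf_text)}
    known = hosts_names(hosts_text)
    return {
        "missing_from_bp_conf": sorted(expected - servers),
        "missing_from_hosts": sorted(expected - known),
    }


def check_files(expected_media,
                bpconf_path="/usr/openv/netbackup/bp.conf",
                hosts_path="/etc/hosts"):
    """Hypothetical wrapper that reads the real files on the master server."""
    with open(bpconf_path) as f:
        bpconf_text = f.read()
    with open(hosts_path) as f:
        hosts_text = f.read()
    return check(expected_media, bpconf_text, hosts_text)
```

For example, `check_files(["media01", "media02"])` (hypothetical host names) returns two lists; both empty means every media server appears in bp.conf and /etc/hosts. Running the same check on each media server, with the master's name as the expected entry, covers the second question.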

Regards,
mTz

On 7/6/07, Sponsler, Michael <Michael.Sponsler AT ngc DOT com > wrote:

NetBackup 6.0 MP4, Solaris 10 master server; NetBackup 6.0 MP4, Solaris 8 SAN media servers

Roughly 55 media servers.

 

After I restarted the NetBackup daemons, I had the following issue:

The master server can ping and ssh to all media servers, but it cannot communicate with the media servers through NetBackup.  The vmoprcmd command hangs.  Any jobs for media servers that start up come back with "Media server is not active".  NetBackup is running on the master server and all media servers.  I tried rebooting the master server, with the same outcome.  There is no firewall between the master server and any of the media servers.

 

Something obviously changed before I restarted the NetBackup daemons... anyone have any ideas?

 

--

Mike Sponsler

Northrop Grumman Information Technology

 



--
Best Regards,
Martin Ruslan
Support Service Engineer
PT. Millennia Infokom Teknologi
Rahardjo Building 3rd Floor
Jl. Roa Malaka Utara No. 5 & 6
Jakarta 11230 - Indonesia
Tel     :     [62-21] 693-0380
Fax    :     [62-21] 692-2481
Mobile:     [62-81] 808888879  
E-mail:     martin AT mit.co DOT id
Information Made Powerfull

_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu