Veritas-bu

Re: [Veritas-bu] big issue with netbackup upgrade; any insight?

2008-03-12 13:56:16
Subject: Re: [Veritas-bu] big issue with netbackup upgrade; any insight?
From: rascal <rascal1981 AT gmail DOT com>
To: "Nardello, John" <john.nardello AT wamu DOT net>
Date: Wed, 12 Mar 2008 12:28:40 -0500
Sorry, I meant to mention that this is on our HOT catalog backups.  See correction below!

On Wed, Mar 12, 2008 at 11:24 AM, rascal <rascal1981 AT gmail DOT com> wrote:
Awesome John; this resolved our problems!  Thanks very much and I can tell you I am very excited about the prospect of more EMM aliases (read: sarcasm)!!

I had tried the aliases before, but I think my mistake was that I didn't use the command correctly. With your help, we got it resolved.

Now that the alias issue is fixed, we have another fun one that perhaps you, or anyone else on the list, can offer some insight on. Same details as the previous issue (recap):


1.  mixed environment; primarily AIX5xxx boxes, master/media servers, all have multiple NIC interfaces (pub/private/backup)
2.  going from NB5MP5 to NB651
3.  procedure went like so:
     a.  stop services
     b.  upgrade environment
     c.  run nbpushdata and note errors (go back and fix in nb5 if required)
     d.  patch
     e.  start services
     f.  push client
     g. test backups/restores
4.  Fixed alias issue

WRONG Catalog backups (cold) cause a bpdbm core dump (here are the contents of the log file):
RIGHT    Catalog backups (HOT) cause a bpdbm core dump (here are the contents of the log file):
 

(dbx) where
pthread_kill(??, ??) at 0xd0063288
_p_raise(??) at 0xd0062d20
raise.raise(??) at 0xd01fa0a0
abort.abort() at 0xd021a7c4
ut_onsig_sig_handler__FiPvT2(??, ??, ??) at 0xd05a281c
.() at 0x0
thread_lane_resources_manager__12TAO_ORB_CoreFv() at 0xd553bea4
lane_resources__12TAO_ORB_CoreFv() at 0xd553be18
leader_follower__12TAO_ORB_CoreFv() at 0xd553bdd8
reactor__12TAO_ORB_CoreFv() at 0xd55b5190
init__12TAO_ORB_CoreFRiPPc() at 0xd56030d0
ORB_init__5CORBAFRiPPcPCcR17CORBA_Environment() at 0xd55965cc
ORB_init__5CORBAFRiPPcPCc() at 0xd559589c
Init_3OrbFP17ACE_Timer_Queue_TXTP17ACE_Event_HandlerT39ACE_Event_Handler_Handle_Timeout_UpcallXT14ACE_Null_Mutex_T14ACE_Null_Mutex_() at 0xd2a87a68
initializeJmComm__FPCcPCcP12hidecorbaobjPFiPCcPCc_iPCc() at 0x1016d2ac
initializeJobInstInterface() at 0x1016f720
jmcomm_UpdateActionStatus() at 0x1016f87c
image_db() at 0x107ec704
process_request() at 0x107d6bf0
listen_loop() at 0x107d7c3c
bpdbm.main() at 0x107d55c4

Any ideas folks?


On Tue, Mar 11, 2008 at 11:44 AM, Nardello, John <john.nardello AT wamu DOT net> wrote:
I'm assuming you have multiple hostnames defined for the Master/Media servers, one per interface? If so, welcome to the fun world of EMM aliases. =)
 
# nbemmcmd -machinealias -help
NBEMMCMD, Version:6.0MP5(20060530)
Help requested.
Usage:
nbemmcmd -machinealias [-brief]
    [-addalias -alias <string> -machinename <string>]
    [-deletealias -alias <string>]
    [-deleteallaliases -machinename <string>]
    [-getaliases -machinename <string>]
    -machinetype <api | app_cluster | cluster | master | media | ndmp >
Command completed successfully.
I've seen this kind of thing when the Media Servers in particular have multiple interfaces and are attempting communication over a different one than normal. You basically just need to add an alias to authorize that interface.
 
So if your alternate interface is myhost-backup on your media server, then you'd run:
    nbemmcmd -machinealias -addalias -alias myhost-backup -machinename myhost -machinetype media
 
Then you can use the 'getaliases' option to list out what you've currently got defined and make sure all your interfaces are listed.
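
With several interfaces per server, the alias step above could be sketched as a small dry-run helper. This is only a sketch: it prints the nbemmcmd commands instead of running them, it uses only the options shown in the help output above, and the names myhost-private and print_alias_cmds are made up for illustration.

```shell
#!/bin/sh
# Dry run: print (do not execute) one nbemmcmd -addalias command per
# alternate interface name, then a -getaliases command to verify the result.
# All host and interface names used here are hypothetical.

# usage: print_alias_cmds MACHINENAME MACHINETYPE ALIAS...
print_alias_cmds() {
    machine=$1
    mtype=$2
    shift 2
    for ifname in "$@"; do
        echo "nbemmcmd -machinealias -addalias -alias $ifname -machinename $machine -machinetype $mtype"
    done
    # List the aliases afterwards so they can be compared with the interfaces.
    echo "nbemmcmd -machinealias -getaliases -machinename $machine -machinetype $mtype"
}

print_alias_cmds myhost media myhost-backup myhost-private
```

Review the printed commands first, then run them on the master server once they look right for your hostnames.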
 
Hope that helps.
John Nardello


From: veritas-bu-bounces AT mailman.eng.auburn DOT edu [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of rascal
Sent: Tuesday, March 11, 2008 8:38 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] big issue with netbackup upgrade; any insight?

Good morning/afternoon/evening all,

     So I have run into a fun, fun issue with my netbackup upgrade.  Here are the details:

1.  mixed environment; primarily AIX5xxx boxes, master/media servers, all have multiple NIC interfaces (pub/private/backup)
2.  going from NB5MP5 to NB651
3.  procedure went like so:
     a.  stop services
     b.  upgrade environment
     c.  run nbpushdata and note errors (go back and fix in nb5 if required)
     d.  patch
     e.  start services
     f.  push client
     g. test backups/restores

     So here is what is interesting. 

1.  Hot catalogs fail;  here is an error:
03/09/2008 11:00:20 - begin Catalog Backup
03/09/2008 11:00:24 - Error bpbackupdb (pid=5779600) jmcomm_RequestMultipleResources() failed with stat = 800
03/09/2008 11:00:24 - Error bpbackupdb (pid=5779600) NBJM returned an extended error status: resource request failed (800)
03/09/2008 11:00:24 - end Catalog Backup; elapsed time 0:00:04
03/09/2008 11:00:25 - Error bpbackupdb (pid=5779600) Offline catalog backup to media id XXXXXX FAILED
03/09/2008 11:00:22 - requesting resource XXXXXX
03/09/2008 11:00:22 - Error nbjm (pid=1810436) NBU status: 800, EMM status: The host is not defined in EMM
An extended error status has been encountered, check detailed status (252)

2.  Restores to locations other than the local server fail:

03/11/2008 09:09:25 - begin Restore
03/11/2008 09:09:30 - number of images required: 1
03/11/2008 09:09:30 - media needed: XXXXXX
03/11/2008 09:10:09 - restoring from image XXXXXXXX_1191236706
03/11/2008 09:10:13 - connecting
03/11/2008 09:10:16 - connected; connect time: 0:00:00
03/11/2008 09:10:17 - Error bptm (pid=4772066) NBJM returned an extended error status: resource request failed (800)
03/11/2008 09:10:15 - requesting resource XXXXXX
03/11/2008 09:10:15 - Error nbjm (pid=5107866) NBU status: 800, EMM status: The host is not defined in EMM
03/11/2008 09:10:15 - Error nbjm (pid=5107866) NBU status: 800, EMM status: The host is not defined in EMM
03/11/2008 09:10:18 - Error bptm (pid=1904656) The following files/folders were not restored:
03/11/2008 09:10:22 - Error bptm (pid=1904656) more than 10 files were not restored, remaining ones are shown in the progress log.
03/11/2008 09:10:23 - restored from image XXXXXXXX_1191236706; restore time: 0:00:14
03/11/2008 09:10:25 - Warning bprd (pid=5132452) Restore must be resumed prior to first image expiration on Sun Sep 29 07:05:06 EDT 2013
03/11/2008 09:10:26 - end Restore; elapsed time 0:01:01
the restore failed to recover the requested files (5)

So here I sit, trying to figure out what the issue is, but I can't pin it on anything in particular.  It seems to happen only on upgraded systems, though; these same boxes can be freshly installed with 6.5.1 (no upgrade) and everything works fine.  Has anyone else seen this?  Has anyone else run into the EMM 800 errors?
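
For anyone comparing notes, here is a rough way to count the "host is not defined in EMM" lines in job details saved to a text file. It is a sketch that assumes the details were copied out of the Activity Monitor by hand; the function name count_emm_errors and the temporary sample file are made up for illustration, and the sample lines are taken from the restore log above.

```shell
#!/bin/sh
# usage: count_emm_errors FILE
# Print how many lines in FILE report the EMM "host is not defined" error.
count_emm_errors() {
    grep -c 'EMM status: The host is not defined in EMM' "$1"
}

# Demo on a few lines from the restore log, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
03/11/2008 09:10:15 - requesting resource XXXXXX
03/11/2008 09:10:15 - Error nbjm (pid=5107866) NBU status: 800, EMM status: The host is not defined in EMM
03/11/2008 09:10:15 - Error nbjm (pid=5107866) NBU status: 800, EMM status: The host is not defined in EMM
03/11/2008 09:10:18 - Error bptm (pid=1904656) The following files/folders were not restored:
EOF
count_emm_errors "$sample"
rm -f "$sample"
```

A non-zero count on an upgraded server, next to zero on a freshly installed one, would at least confirm that the failures line up with the EMM host definitions.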

--

Matthew MCP, MCSA, MCTS, OCA
rascal1981 AT gmail DOT com

Define Trouble:
Why did you apply THAT patch??....



_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu