[Networker] Networker 7.3.3 on HACMP clustered AIX Host badly troubled

2008-01-14 02:17:37
Subject: [Networker] Networker 7.3.3 on HACMP clustered AIX Host badly troubled
From: bl02569 <networker-forum AT BACKUPCENTRAL DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Sat, 12 Jan 2008 05:51:37 -0800
We run a Networker Power Edition environment on an HACMP AIX cluster, back up
about 180 Networker clients, and use a Falconstor VTL as well as a SUN STK L180
in this setup. Our clients range from RHEL4 through Windows 2000 and Server 2003
to some AIX systems. We run about 8 storage nodes and save approx. 2 TB per
incremental and about 25 TB per full backup. The schedule is incremental
Saturday through Thursday and full on Fridays.

Before we had to upgrade from a 7.2 installation to 7.3.2 (because of version
dependencies with NMO 4.x for our Oracle clients), we had an absolutely stable
and well-performing Networker environment. Before that, the move in 2005 from a
Windows Networker server to a Networker server on an IBM P5 under AIX had
boosted performance and stability, and this environment kept my job pleasant.
Until 7.3.x came along, together with my aim to raise the availability of our
Networker environment by implementing an HACMP cluster. When we were struck by
instabilities after this unfortunate upgrade to 7.3.2, EMC recommended an
installation-wide upgrade to 7.3.3 (all clients and servers). This took place
in 09/2007. Ever since, we have had massive problems with crashing nsrexecd
processes on the client side, a crashing nsrjobd on the server side, and even a
potential risk of data loss because of another Networker code error that EMC
has acknowledged. And I don't want to go into all the NMC problems we have had
and are still having.

After some 25+ changes (for instance, we had to reconfigure the auth method of
all Networker clients to "old-Auth", a bunch of memory tuning had to be
implemented on the AIX systems, and so on), the operational impacts now occur
at longer intervals; before, all savegrps/clients hung almost daily with
"contacting client". Each time, these impacts forced us to stop the Networker
services, clear the /nsr/tmp directory on the server, and restart the Networker
services in order to restart the wrecked saves. Clearing /nsr/tmp means losing
all the status info for the hanging savegrps, so we had to write down the
status of all savesets manually before the cleanup; of course this info was
needed to restart the savegrps after the stop/start of the Networker services
on the server. EMC provided some 10 or so new "hotfixed" binaries. In the end
only two of them were usable, because all the other fixes led to new problems,
didn't fix anything, or simply dropped dead on first use. So EMC's
troubleshooting performance here was pretty poor too.

All the effort put into these issues (some 500+ hours here, mostly at night, on
weekends and of course on the holidays) led to quite some frustration in our
organisation, as you can surely imagine. We have a quite extensive Networker
support contract with TIM AG in Wiesbaden, who have done Networker support for
our company since 1997 or so. After the regular escalation via our supporter
TIM, with them putting one issue after another into EMC's Powerlink with no
solution in sight, I decided to give EMC a wake-up call by punching the whole
story into a customer feedback/satisfaction web form (and satisfied I was, I
can tell you) in late November 2007. At least this led to the activation of the
EMC Critical Response Team EMEA. Since then some progress has been made in
lowering the operational effort. The key problems, though, are not solved to
this day. The tech guys from TIM and one of the technicians from the EMC
Critical Response Team are doing quite a good job. Still, the troubleshooting
situation seems deadlocked, and even my proposal to switch to another platform,
OS ... whatever helps, did not lead to an EMC reaction that gives hope of a
solution.

To make matters worse, we heard on Monday that we are at risk of data loss: EMC
told us that successfully saved data may not be restorable because of another
problem they are having with the 7.3.3 code.

Well, all this forces one to ask some questions, doesn't it? The main question
we have to raise is: "Why save data with Networker, which is only possible with
all the extensive extra effort I have written about, if you can't recover data
with Networker when needed?" Another interesting one ... Isn't EMC interested
in Networker customers? Or does EMC for some sick reason want to push sales of
IBM TSM by bleeding out Networker installations and so forcing customers to the
competition?

My questions to the Forum are:

Does anyone else have a similar situation with EMC failing to provide a
solution for a problem for such a long time? You see, I am not talking about
all the other known but undocumented issues like misinterpretation of
schedules, VSS misbehaviour, all the NMC bugs and so on ... just the absolute
essential basics like no successful backup or recovery, where the problem is
known to EMC but simply not fixed for such a long time. I am also not talking
about the chaos EMC has managed to keep up with their licensing and entitlement
management ... Boy oh boy, ever since Legato was acquired by EMC they are so
messed up there in Powerlink ... I mean, the whole licensing and enabling thing
never really was a piece of cake before, or was it? But for some reason EMC
managed to mess up all of the multi-corporate enterprise licenses /
entitlements in their license management. The result is that ever since I have
been stored in the EMC license administration as the responsible person for the
multi-corporate enterprise "AVIVA Group plc.", to which Deltalloyd Germany, who
I work for, belongs. We have raised trouble tickets about this with EMC over
and over since September 2006!!! Can you believe that they still couldn't
manage to correct this? Well, that's a fact!

Are there AIX Networker Installations in a comparable environment out there 
without any problems?

I really would be thankful for some Feedback...


Peter Handloser 

System Engineer Networking / Storage / Backup-Recovery 

Deltalloyd Germany

+----------------------------------------------------------------------
|This was sent by peter.handloser AT deltalloyd DOT de via Backup Central.
|Forward SPAM to abuse AT backupcentral DOT com.
+----------------------------------------------------------------------

To sign off this list, send email to listserv AT listserv.temple DOT edu and 
type "signoff networker" in the body of the email. Please write to 
networker-request AT listserv.temple DOT edu if you have any problems with this 
list. You can access the archives at 
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
