Information for my NBU colleagues.
We came across something in our environment a month or more ago: a very
large number of jobs were failing with various network-related issues, on
random clients and random media servers. We went through and upgraded our
backup infrastructure, built a new media server, and so on, all with no good
results. Our issue started around the end of November, and we have tried
anything you can think of. We eventually had a Symantec consultant come
onsite for two weeks to find our issue. In conjunction with Symantec support,
we FINALLY believe we found our culprit… and here it is… McAfee McShield 8.5
patch 4. Broke us bad… real bad.
Simply disabling the McAfee services WILL NOT fix the issue; we had to
manually remove McAfee from all of our media servers and the master server.
Below is the tech note. The gentleman who found it reads these forums, so
publicly I'd like to thank you. I really appreciate your efforts.
McAfee is supposed to have fixes for this issue.
3RD PARTY: NetBackup Services randomly shut down.
Details:
Vendor/Product:
McAfee McShield 8.5 patch 3 or patch 4.
***NOTE*** If this issue has been confirmed, it is highly recommended that
the customer open a ticket directly with McAfee Support for the latest
update on how to handle this problem.
Detail/Symptom(s):
* NetBackup services randomly shut down including:
    NetBackup Resource Broker Service
    NetBackup Notification Service
    NetBackup Policy Execution Manager Service
    NetBackup Service Layer Service
* Active Jobs finish but tapes are not moved from drives back to slots
* Active Jobs which need to span media sit at "Waiting for next media: Any"
* Queued Jobs do not go active
The above symptoms can happen once or twice per day and can occur on the
smallest or largest installations.
We have been able to associate these symptoms with the existence of McAfee
McShield 8.5 patch 3 or patch 4 running concurrently with NetBackup 6.0.
The NetBackup Services are shutting themselves down because the inter-process
sockets are being disconnected. The processes attempt to reconnect, but
are unable to do so and the services shut down.
Log Files:
12/20/07 01:53:56.227 137 PID:7184 TID:7556 [TAO] ACE_Select_Reactor_Notify::notify
[handle=0x1f8]: write to notification pipe handle failed: An existing
connection was forcibly closed by the remote host. (10054)
12/20/07 01:53:56.227 137 PID:7184 TID:7556 [TAO] sleep_hook failed: An existing
connection was forcibly closed by the remote host.
12/20/07 01:53:56.242 137 PID:7184 TID:920 [TAO] handle_notify_pipe_close - taking
action REOPEN
12/20/07 01:54:17.336 137 PID:7184 TID:920 [TAO] handle_notify_pipe_close: failed to
re-open notification pipe: A connection attempt failed because the connected
party did not properly respond after a period of time, or established
connection failed because connected host has failed to respond.
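
If you want to check your own logs for the same pattern, here is a rough Python sketch. It only looks for message fragments copied from the excerpt above; the names SIGNATURES, scan_log and looks_like_pipe_failure are made up for illustration, and the tech note does not say which log file these lines come from, so the function just takes the log text you give it. A match only means the same socket symptom is present, not that McAfee is the cause.

```python
import re

# Message fragments copied from the log excerpt in the tech note.
# The dictionary keys are hypothetical labels, not NetBackup terms.
SIGNATURES = {
    "pipe_write_failed": "write to notification pipe handle failed",
    "reset_10054": "(10054)",
    "pipe_reopen_failed": "failed to re-open notification pipe",
}


def scan_log(text):
    """Count how often each signature fragment appears in the log text."""
    # Collapse all whitespace so a message wrapped across lines still matches.
    flat = re.sub(r"\s+", " ", text)
    return {name: flat.count(needle) for name, needle in SIGNATURES.items()}


def looks_like_pipe_failure(text):
    """True if the log shows both the failed pipe write and the failed re-open."""
    counts = scan_log(text)
    return counts["pipe_write_failed"] > 0 and counts["pipe_reopen_failed"] > 0
```

To use it, read the log file into a string (for example with open(path, errors="replace").read(), where path is wherever your logs live) and pass that string to looks_like_pipe_failure.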
Workaround:
There are several workarounds which can be implemented:
1. Uninstall McAfee McShield 8.5 and reboot. (simply stopping McShield services
is not sufficient)
2. Roll-back to McAfee McShield 8.0.
3. McAfee recommends:
   A. Opening a ticket with McAfee Support on this issue.
   B. Renaming the driver file MFETDIK.sys and rebooting. The functionality
      lost is Port Blocking access protection rules and identification of
      the source IP address for a remote attacker.
4. Implement these excludes (mixed success with this potential workaround).
McAfee McShield can exclude by directory structure or by process; it's up to
the customer:
Good luck, all. I figured I'd pass this along in case anyone out there is
running NBU 6.0 and McAfee…
Dan Cruice