Information for my NBU colleagues.
About a month or more ago we came across something in our environment where a
very large number of jobs failed with various network-related issues, on
random clients and random media servers. We went through and upgraded our
backup infrastructure, built new media servers, etc., all with no good
results. Our issue started around the end of November, and we have tried
anything you can think of. We eventually had a Symantec consultant come
onsite for two weeks to find our issue and, in conjunction with Symantec
support, we FINALLY believe we found our culprit… and here it is… McAfee
McShield 8.5 patch 4. Broke us bad… real bad.
And simply disabling the services WILL NOT fix the issue; we had to manually
remove McAfee from all of our media servers and the master server.
Below is the tech note. The gentleman who found it reads these forums, so
publicly I'd like to say thank you; I really appreciate your efforts.
McAfee is supposed to have fixes for this issue.
3RD PARTY: NetBackup Services randomly shut down.
Details:
Vendor/Product:
McAfee McShield 8.5 patch 3 or patch 4.
***NOTE*** If this issue is confirmed, it is highly recommended that the
customer open a ticket directly with McAfee Support for the latest update on
how to handle this problem.
Detail/Symptom(s):
* NetBackup services randomly shut down including:
NetBackup Resource Broker Service
NetBackup Notification Service
NetBackup Policy Execution Manager Service
NetBackup Service Layer Service
* Active Jobs finish but tapes are not moved from drives back to slots
* Active Jobs which need to span media sit at "Waiting for next media: Any"
* Queued Jobs do not go active
The above symptoms can happen once or twice per day and can occur on the
smallest or largest installations.
We have been able to associate these symptoms with the existence of McAfee
McShield 8.5 patch 3 or patch 4 running concurrently with NetBackup 6.0.
The NetBackup Services are shutting themselves down because the inter-process
sockets are being disconnected. The processes attempt to reconnect, but
are unable to do so and the services shut down.
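A quick way to spot the shutdowns described above is to look at the Windows service states. The following is only a rough sketch, not something from the tech note: it parses the text printed by the standard Windows command `sc query type= service state= all` and lists services that are not running. It assumes the NetBackup services have "NetBackup" somewhere in their display name (the exact display names on your server may differ from the list above), and the function names are made up for illustration.

```python
import re
import subprocess


def parse_sc_query(text):
    """Return {display_name: state} from `sc query` style output."""
    states = {}
    display = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("DISPLAY_NAME:"):
            display = line.split(":", 1)[1].strip()
        elif line.startswith("STATE") and display is not None:
            # Typical line: "STATE              : 4  RUNNING"
            match = re.search(r":\s*\d+\s+(\w+)", line)
            if match:
                states[display] = match.group(1)
            display = None
    return states


def stopped_netbackup_services(text):
    """List services with 'NetBackup' in the display name that are not RUNNING.

    Assumption: matching on the word 'NetBackup' is good enough to find them.
    """
    return sorted(
        name
        for name, state in parse_sc_query(text).items()
        if "netbackup" in name.lower() and state != "RUNNING"
    )


def query_all_services():
    """Run `sc query` on the local Windows machine and return its output."""
    result = subprocess.run(
        ["sc", "query", "type=", "service", "state=", "all"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a master or media server you could call `stopped_netbackup_services(query_all_services())` from a scheduled task and alert if the list is not empty.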
Log Files:
12/20/07 01:53:56.227 137 PID:7184 TID:7556 [TAO] ACE_Select_Reactor_Notify::notify [handle=0x1f8]: write to notification pipe handle failed: An existing connection was forcibly closed by the remote host. (10054)
12/20/07 01:53:56.227 137 PID:7184 TID:7556 [TAO] sleep_hook failed: An existing connection was forcibly closed by the remote host.
12/20/07 01:53:56.242 137 PID:7184 TID:920 [TAO] handle_notify_pipe_close - taking action REOPEN
12/20/07 01:54:17.336 137 PID:7184 TID:920 [TAO] handle_notify_pipe_close: failed to re-open notification pipe: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
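If you want to check your own logs for the same pattern, something like the following might help. It is a minimal sketch under a few assumptions: every log entry starts with a timestamp in the `MM/DD/YY HH:MM:SS.mmm` form seen in the excerpt above, long entries may wrap onto following lines, and the three signature strings (copied from the excerpt) are enough to identify the problem. The function names are hypothetical.

```python
import re

# Start of a log entry as it appears in the excerpt: "12/20/07 01:53:56.227"
ENTRY_START = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}\b")

# Substrings copied from the excerpt that mark the notification pipe problem
SIGNATURES = (
    "write to notification pipe handle failed",
    "sleep_hook failed",
    "handle_notify_pipe_close",
)


def join_entries(lines):
    """Glue wrapped continuation lines back onto the entry they belong to."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def find_pipe_failures(lines):
    """Return the log entries that contain any of the known signatures."""
    return [
        entry
        for entry in join_entries(lines)
        if any(sig in entry for sig in SIGNATURES)
    ]
```

To use it, pass any iterable of lines, for example an open file object for whichever NetBackup log you are checking, and print what comes back. The log file location is not given in the tech note, so that part is up to you.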
Workaround:
There are several workarounds which can be implemented:
1. Uninstall McAfee McShield 8.5 and reboot. (Simply stopping the McShield
services is not sufficient.)
2. Roll back to McAfee McShield 8.0.
3. McAfee recommends:
A. Opening a ticket with McAfee Support on this issue.
B. Renaming the driver file MFETDIK.sys and rebooting. The functionality lost
is Port Blocking access protection rules and identification of the source IP
address for a remote attacker.
4. Implement these excludes (mixed success with this potential workaround).
McAfee McShield can exclude by directory structure or by process; it's up to
the customer:
Good luck all. I figured I'd pass this along in the event anyone out there is
running NBU 6.0 and McAfee…
Dan Cruice.