[Veritas-bu] Tru64 NetBackup Performance

2010-03-09 14:12:03
Subject: [Veritas-bu] Tru64 NetBackup Performance
From: Heathe Yeakley <hkyeakley AT gmail DOT com>
To: Veritas-bu AT mailman.eng.auburn DOT edu
Date: Tue, 9 Mar 2010 13:11:49 -0600
--== Warning: Wall of text incoming ==--

I have a NetBackup environment consisting of:

-= Local Site =-
1 Red Hat Linux AS 4 Master running NBU 6.0 MP7
2 Red Hat Linux AS 4 Media Servers running NBU 6.0 MP7
3 Tru64 V5.1B (Rev. 2650) SAN Media Servers running NBU 6.0 MP7 (mix of
O/S patch kits 6 and 7)
1 Spectra Logic T380 with 12 IBM LTO4 drives running latest BlueScale
patches and drive firmware.
1 NetApp 1400 VTL running latest firmware.

-= DR Site =-
1 Red Hat Linux AS 4 Master running NBU 6.0 MP7
1 Tru64 V5.1B (Rev. 2650) SAN Media Server running NBU 6.0 MP7
1 Spectra Logic T200 with 12 IBM LTO4 drives running latest BlueScale
patches and drive firmware.

Last July we replaced our ADIC i2000 library (LTO2 drives) with a
Spectra Logic T380. Once we got the library deployed we noticed that
our Linux systems are able to write to the library at LTO4 speeds, and
the regular network clients even get decent throughput over a 1 Gb
Ethernet network. But the 3 Tru64 SAN media servers absolutely crawl.
In spite of the fact that I have the SAN media server license
installed, I can only get about 10 - 20 MB/s on the policies using the
Tru64 storage units.

Our main production database sits on a GS1280 (30 CPUs, 114 GB
memory), and we have an ES80 attached to another Spectra Logic library
at our DR site. Every Sunday morning, I back up an RMAN backup to tape,
mail the tapes to my DR site, and restore the RMAN files using a
Spectra Logic T200 attached to the ES80, which also has the SAN Media
Server software installed. My GS1280 system takes 15-20 hours to
back up, but my DR system can restore the same files in 6-7 hours
running at 80 - 110 MB/s. I'm completely baffled how the smaller
system gets such awesome throughput while my production box plods
along at sub-ethernet speeds.
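
As a rough sanity check on those numbers, the arithmetic can be sketched in a few lines of Python. This is only an estimate under stated assumptions: that the restore moves the same amount of data as the backup, that 80 - 110 MB/s is the aggregate restore rate, and that 1 MB = 10^6 bytes. The function names are made up for illustration.

```python
def data_size_tb(hours, mb_per_s):
    """Data moved in `hours` at a sustained rate of `mb_per_s`, in TB."""
    return hours * 3600 * mb_per_s / 1e6


def rate_mb_per_s(size_tb, hours):
    """Average rate in MB/s needed to move `size_tb` in `hours`."""
    return size_tb * 1e6 / (hours * 3600)


# Restore side (from the post): 6-7 hours at 80 - 110 MB/s.
smallest = data_size_tb(6, 80)
largest = data_size_tb(7, 110)
print(f"implied data size: {smallest:.2f} to {largest:.2f} TB")

# Backup side (from the post): the same data takes 15-20 hours.
slowest = rate_mb_per_s(smallest, 20)
fastest = rate_mb_per_s(largest, 15)
print(f"implied aggregate backup rate: {slowest:.0f} to {fastest:.0f} MB/s")
```

Under those assumptions the weekly RMAN set works out to somewhere around 1.7 to 2.8 TB, and the production box averages only about 24 to 51 MB/s across all of its streams combined.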

I've spent the past several months researching performance and tuning
suggestions, and I've applied settings one at a time whenever I can
get an outage.

To speed up testing, we have another GS1280 with half the CPUs and
memory of the production system, and it only runs test databases, so
it's easier to ask to reboot it if I want to try tuning a particular
kernel parm or what not. I installed the SAN media server software on
this second 1280 and I've been trying to tune it to NetBackup for the
last couple of months.

Within NetBackup, I've tuned the Size and Number of data buffers, and
it has no visible effect.
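
For context on what those two settings cost, here is a minimal sketch of the shared-memory estimate usually quoted for NetBackup media servers: buffer size times number of buffers, times drives, times multiplexing. The numbers below are illustrative placeholders, not the values used on these systems, and the function name is hypothetical.

```python
def shared_memory_bytes(size_data_buffers, number_data_buffers, drives, mpx=1):
    """Estimate shared memory used by the data buffers on one media server.

    size_data_buffers   -- bytes per buffer
    number_data_buffers -- buffers per stream
    drives              -- tape drives written to concurrently
    mpx                 -- multiplexing factor per drive (1 = no multiplexing)
    """
    return size_data_buffers * number_data_buffers * drives * mpx


# Placeholder values: 256 KiB buffers, 32 buffers, 12 drives, no multiplexing.
example = shared_memory_bytes(256 * 1024, 32, 12, mpx=1)
print(example // (1024 * 1024), "MiB")
```

The point of the estimate is that larger buffer settings only help if the operating system's shared-memory limits allow the resulting total to be allocated.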

I've used the hwmgr command to look at the driver and firmware level
of just about every piece of equipment on both systems, up to and
including the individual busses. The GS1280 has everything the ES80
does, it just has more of it.

I've verified the HBA drivers and firmware on all boxes, and all
appear to be at the latest levels.

I've asked my SAN guys to double check the zoning, LUN masking,
configuration and firmware levels on the SAN switches here and at my
DR site to see if there's anything that might be preventing Tru64 from
writing to either of my libraries at SAN speeds. They have checked and
everything seems to be in order on both SAN environments. Furthermore,
I've asked them to look at port utilization on the SAN switches during
test backups from the 1280 and they tell me that the HBAs are hardly
being utilized.

We recently deployed a NetApp VTL, and I was curious whether the
VTL would get better performance (which would indicate some type of
incompatibility between Tru64 and Spectra Logic). It doesn't, so I
can't find any such incompatibility. If I set up a test policy to
write to the VTL from my test GS1280 and let it write to all 80
virtual drives, no single stream exceeds about 10 - 20 MB/s.

Next, I looked at the fragmentation level of the AdvFS domains on both
systems. While some are heavily fragmented, the I/O performance on
both systems is 100% for every file domain I've checked.

The fact that all my clients (Windows, Linux and the handful of
Solaris 10) work well with both libraries makes me think that this is
something in Tru64. If that's true, then I'm trying to figure out what
is set correctly on my DR ES80 that's jacked up on my local 1280.

According to section 1.9 of the Tru64 tuning manual
(http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V51B_HTML/ARH9GCTE/TITLE.HTM)
the 5 most commonly tuned kernel subsystems are: vm, ipc, proc, inet,
and socket. Furthermore,
http://seer.entsupport.symantec.com/docs/235845.htm is a technote
advising Tru64 kernel changes for NetBackup. I have examined the
values across all my systems. In most cases, the values on both
systems meet or exceed tuning suggestions I have found in manufacturer
documentation. The two or three values I have tuned so far have had no
effect.
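
The comparison described above can be partly automated. The following is a minimal sketch that assumes the output of `sysconfig -q <subsystem>` has been saved to text on each host and has the usual shape of a `subsystem:` header followed by `attribute = value` lines. The helper names are hypothetical, and the sample values are placeholders apart from sem_msl = 500 on the local GS1280, which is the only one taken from this post.

```python
def parse_sysconfig(text):
    """Parse saved `sysconfig -q` style output into {(subsystem, attr): value}."""
    values = {}
    subsystem = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # Header line such as "ipc:" starts a new subsystem.
            subsystem = line[:-1]
        elif "=" in line:
            attr, value = (part.strip() for part in line.split("=", 1))
            values[(subsystem, attr)] = value
    return values


def diff_settings(a, b):
    """Return {key: (value_a, value_b)} for keys that differ or are missing."""
    out = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            out[key] = (a.get(key), b.get(key))
    return out


# Placeholder captures; only sem_msl = 500 on the GS1280 comes from the post.
gs1280 = parse_sysconfig("ipc:\nsem_mni = 1000\nsem_msl = 500\n")
es80 = parse_sysconfig("ipc:\nsem_mni = 1000\nsem_msl = 1000\n")

differences = diff_settings(gs1280, es80)
for (subsys, attr), (local, dr) in differences.items():
    print(f"{subsys}.{attr}: local={local} dr={dr}")
```

Running the same diff over the vm, ipc, proc, inet, and socket subsystems would list every attribute where the two hosts disagree, instead of checking values by eye.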

http://www.scribd.com/doc/19213788/Net-Backup-6
I found this TechNote which recommends setting the sem_mni and sem_msl
values to 1,000. sem_msl is currently set to 500 on my local 1280, and
I think this is perhaps the only kernel parm I have yet to tune. I'm
going to ask for an outage this week to increase this setting to
1,000. If that doesn't work, then I believe I will be officially
stumped.

I've also watched the EVM channels and the binary error log and
haven't seen anything alarming. The tape drives aren't throwing errors
and appear to be working fine.

This is leading me to believe that there is something not tuned
correctly between the Tru64 O/S and the NetBackup client. If it's not
in the kernel then I simply don't know where else to look.

I'll be posting this to the NetBackup forums on Symantec.com, the ITRC
forums on HP.com and the NetBackup mailing list.

Can anyone think of any stone I've left unturned? Thanks.
_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
