[Veritas-bu] Backups slow to a crawl

2005-03-23 13:32:27
Subject: [Veritas-bu] Backups slow to a crawl
From: jeffm AT nicusa DOT com (Jeff McCombs)
Date: Wed, 23 Mar 2005 13:32:27 -0500
Yeah, I originally thought that this might be a network problem myself.
However, I have checked the network settings on the Sun systems and the
Cisco switches in between. I'm even forcing 100FDX on the switch and system
just to be safe (auto negotiation never works, regardless of what the
vendors say).
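
For anyone wanting to confirm the forced setting on the Sun side, here is a rough sketch that interprets the three qfe values that the ndd commands quoted further down read back. The function name is made up for illustration, and it assumes that with autonegotiation off the driver uses the single 100 Mb mode left enabled; it does not look at the 10 Mb settings.

```shell
# Interpret three qfe "adv_*" values (each 0 or 1) as read back by ndd.
# Arguments: autoneg 100hdx 100fdx
# describe_link_caps is a hypothetical helper, not part of Solaris or NetBackup.
describe_link_caps() {
    autoneg=$1
    hdx=$2
    fdx=$3
    if [ "$autoneg" -eq 1 ]; then
        echo "autonegotiating"
    elif [ "$fdx" -eq 1 ] && [ "$hdx" -eq 0 ]; then
        echo "forced 100 full duplex"
    elif [ "$hdx" -eq 1 ] && [ "$fdx" -eq 0 ]; then
        echo "forced 100 half duplex"
    else
        echo "ambiguous: check the remaining adv_* settings too"
    fi
}

# On the Solaris host (not run here), using the ndd queries from this thread:
#   describe_link_caps "$(ndd -get /dev/qfe adv_autoneg_cap)" \
#                      "$(ndd -get /dev/qfe adv_100hdx_cap)" \
#                      "$(ndd -get /dev/qfe adv_100fdx_cap)"
```

With all three values at 1, as in Bill's output, this reports "autonegotiating"; a port forced the way I describe should come back as 0, 0, 1.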

Seems that this is an MPX thing. I did some further testing, backing up
systems without multiplexing enabled, and the problem goes away. The rmt/1
device stops sitting at 100% busy with 0 kw/s, and client full backups drop
back down into the 15-minute range...
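
Since the slowdown follows multiplexing, it may be worth checking how much shared memory the data buffers need once MPX is turned on. The rule of thumb usually quoted for NetBackup is NUMBER_DATA_BUFFERS x SIZE_DATA_BUFFERS x drives x MPX. Treat that formula as an assumption to verify against Veritas documentation; the function name and the buffer values in the example are made up for illustration and are not recommendations.

```shell
# Estimate shared memory (in bytes) needed for NetBackup data buffers.
# Arguments: number_of_buffers buffer_size_bytes tape_drives mpx
# Assumed rule of thumb: number x size x drives x MPX.
# buffer_shm_bytes is a hypothetical helper, not a NetBackup command.
buffer_shm_bytes() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# Example: 2 drives and MPX of 2 (as in this thread), with illustrative
# values of 16 buffers of 65536 bytes each.
buffer_shm_bytes 16 65536 2 2
```

The two buffer values would come from the NUMBER_DATA_BUFFERS and SIZE_DATA_BUFFERS files in /usr/openv/netbackup/db/config that Bill mentions below.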




On 3/23/05 10:56 AM, "Jorgensen, Bill" <Bill_Jorgensen AT csgsystems DOT com>
wrote:

> Jeff:
> 
> A few things to consider (assuming a Sun server as the NBU master):
> 
> 1.) Are you aware of anything that has changed on your NBU server?
> 2.) Are you aware of anything that has changed with your network?
> (Providing you are doing Ethernet-based backups. If not, what about the
> SAN?)
> 3.) Are you aware of any changes to the policies?
> 
> If no to the above try the following:
> 
> 1.) Find out what Veritas recommends for your environment for these two
> variables:
> NUMBER_DATA_BUFFERS
> SIZE_DATA_BUFFERS
> These are found in /usr/openv/netbackup/db/config. Veritas may not give
> you values if you open a ticket with the solution center (Professional
> Services). If they do not, ask around.
> 
> 2.) Check the network driver settings for a few things. This depends on
> the network type you are using. 100Mb-switched, 10Mb-switched, etc.
> 
> root[prod-backup:/]# ndd -get /dev/qfe adv_autoneg_cap
> 1
> root[prod-backup:/]# ndd -get /dev/qfe adv_100hdx_cap
> 1
> root[prod-backup:/]# ndd -get /dev/qfe adv_100fdx_cap
> 1
> The output above shows that the qfe driver is advertising 100 Mb half
> duplex, 100 Mb full duplex, and autonegotiation. Once you know how the
> network driver is configured, go to your network guys and ask them how
> the port on the switch is configured (unless you are the network guy).
> If the port is NOT set to 100-full or autonegotiate, have them set it
> accordingly.
> 
> 3.) Reseat the RJ-45 connectors for the physical connections.
> 
> These are some things that have bitten us in the past.
> 
> Good luck,
> 
> Bill
> 
> --------------------------------------------------------
>      Bill Jorgensen
>      CSG Systems, Inc.
>      (w) 303.200.3282
>      (p) 303.947.9733
> --------------------------------------------------------
>      UNIX... Spoken with hushed and
>      reverent tones.
> --------------------------------------------------------
> 
> -----Original Message-----
> From: veritas-bu-admin AT mailman.eng.auburn DOT edu
> [mailto:veritas-bu-admin AT mailman.eng.auburn DOT edu] On Behalf Of Jeff
> McCombs
> Sent: Wednesday, March 23, 2005 6:51 AM
> To: veritas-bu AT mailman.eng.auburn DOT edu
> Subject: [Veritas-bu] Backups slow to a crawl
> 
> Gurus,
> 
>     NB 5.0 MP4, single combination media/master server, Solaris 9.
> Overland Neo 2000 26-slot 2 drive DLT.
> 
>     I'm noticing that for some reason or another, all of my client
> backups have slowed to a _crawl_. A _cumulative_ (!) backup of local
> disk on a Sun V100 is taking somewhere on the order of 2 hours at this
> point, and with over 40 systems, I'm blowing past my window
> consistently.
> 
>     I'm not quite sure what's going on here, but as I sit and watch the
> output from 'iostat', I'm noticing that rmt/1 (the 2nd drive in the
> Neo) is fluctuating between 100% busy, with kw/s at close to zero, and
> busy @ 1-15% and kw/s up into the 1000's.
> 
>     rmt/0 seems to be fine, kw/s sits consistently up in the 1.8-2K
> range, while busy is anywhere from 2% - 30% on average. My other disks
> aren't working hard, CPU isn't loaded and I've got plenty of memory.
> 
>     The policy I'm using allows for multiple datastreams, no limits on
> jobs, and most schedules allow for an MPX of 2. I'm backing up
> ALL_LOCAL_DRIVES on all clients, and I'm not using any NEW_STREAM
> directives. I'm not seeing any errors on the media either.
> 
>     Can anyone shed some light on what might be happening here? Am I
> looking at a drive that might be having some problems, or am I barking
> up the wrong tree, and it's something else entirely?
> 
>     A small sample of iostat output covering the affected devices is
> below.
> 
> sample (extra disks removed from output):
> root@backup(pts/1):~# iostat -nx 1 100
>                     extended device statistics
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0    4.1    0.0  252.2  0.0  0.0    0.0    5.9   0   2 rmt/0
>     0.0    4.6    0.0  278.4  0.0  0.1    0.0   27.3   0  12 rmt/1
> 
>     0.0    4.1    0.0  252.3  0.0  0.0    0.0    5.9   0   2 rmt/0
>     0.0    4.6    0.0  278.4  0.0  0.1    0.0   27.3   0  12 rmt/1
> 
>     0.0   33.0    0.0 2076.4  0.0  0.2    0.0    5.8   0  19 rmt/0
>     0.0    2.0    0.0  125.8  0.0  1.0    0.0  490.0   0  98 rmt/1
> 
>     0.0   38.0    0.0 2394.0  0.0  0.2    0.0    5.4   0  21 rmt/0
>     0.0    8.0    0.0  504.0  0.0  1.0    0.0  124.9   0 100 rmt/1
> 
>     0.0   27.0    0.0 1701.1  0.0  0.2    0.0    6.5   0  17 rmt/0
>     0.0    2.0    0.0  126.0  0.0  1.0    0.0  499.9   0 100 rmt/1
> 
>     0.0   33.0    0.0 2078.9  0.0  0.2    0.0    5.3   0  18 rmt/0
>     0.0    0.0    0.0    0.0  0.0  1.0    0.0    0.0   0 100 rmt/1
> 
>     0.0   16.0    0.0 1008.0  0.0  0.1    0.0    6.2   0  10 rmt/0
>     0.0   13.0    0.0  819.0  0.0  0.6    0.0   48.4   0  63 rmt/1
> 
>     0.0   40.0    0.0 2520.1  0.0  0.2    0.0    5.9   0  24 rmt/0
>     0.0    0.0    0.0    0.0  0.0  1.0    0.0    0.0   0 100 rmt/1
> 
>     0.0   33.0    0.0 2078.9  0.0  0.2    0.0    5.3   0  18 rmt/0
>     0.0   10.0    0.0  630.0  0.0  1.0    0.0   99.9   0 100 rmt/1
> 
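
The stall pattern in the sample above (rmt/1 at 100 %b with 0.0 kw/s) can be picked out of a longer run with a small filter. This is only a sketch: the function name and the 95% default threshold are made up for illustration, and it assumes raw 11-column `iostat -nx` output as shown here, without the "> " quoting prefix.

```shell
# Print tape-device samples that look stalled: busy but writing nothing.
# Reads `iostat -nx` output on stdin. Field positions assume the 11-column
# layout shown in this thread (kw/s = $4, %b = $10, device = $11).
# find_tape_stalls is a hypothetical helper; $1 is the minimum %b that
# counts as "busy" (assumed default: 95).
find_tape_stalls() {
    awk -v min_busy="${1:-95}" '
        NF == 11 && $11 ~ /^rmt\// && $10 + 0 >= min_busy + 0 && $4 + 0 == 0 {
            printf "%s busy=%s%% kw/s=%s\n", $11, $10, $4
        }
    '
}

# On the backup server (not run here):
#   iostat -nx 1 100 | find_tape_stalls
```

Run against the sample above, this would flag the two rmt/1 samples that show 100 %b with 0.0 kw/s and skip the ones where the drive is busy but still writing.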

-- 
Jeff McCombs                 |                                    NIC, Inc
Systems Administrator        |                       http://www.nicusa.com
jeffm AT nicusa DOT com      |                                NASDAQ: EGOV
Phone: (703) 909-3277        |        "NIC - the People Behind eGovernment"
--
    "My favorite thing about the internet, is that you get to go into
     the private world of real creeps without having to smell them."
            - Penn Jillette.