[Veritas-bu] Problems since upgrading master server hardware (NB 4.5, MP3, Solaris8)

2003-01-21 21:55:16
Subject: [Veritas-bu] Problems since upgrading master server hardware (NB 4.5, MP3, Solaris8)
From: Len.Boyle AT sas DOT com (Len Boyle)
Date: Tue, 21 Jan 2003 21:55:16 -0500
Hello Suzi
 
You might want to issue the command netstat -a and look for a large number 
of sockets in CLOSE_WAIT or TIME_WAIT state. 
If so, you might be seeing the problem described in the following 
technote. It is a problem that we ran into after a hardware upgrade. 
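
As a rough way to check, the sketch below tallies sockets in those two states. 
It assumes the state name is the last field on each line of netstat -a output, 
as it is on Solaris; the function name count_wait_sockets is made up for this 
example. 

```shell
# Tally TIME_WAIT and CLOSE_WAIT sockets from `netstat -a` output on stdin.
# Assumption: the TCP state is the last whitespace-separated field of a line.
count_wait_sockets() {
    awk '
        $NF == "TIME_WAIT"  { tw++ }   # closed, waiting out the interval
        $NF == "CLOSE_WAIT" { cw++ }   # peer closed, local side has not yet
        END { printf "TIME_WAIT %d\nCLOSE_WAIT %d\n", tw, cw }
    '
}

# Usage (not run here):
#   netstat -a | count_wait_sockets
```

A large TIME_WAIT count would point toward the technote below. 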
 
len boyle
 
=============================================================================================

Symptom: 

Socket errors are produced when performing backups or restores. 

Exact Error Message: 

Status codes 13, 14, 23, 24, 25 may occur. 13 = file read failed, 14 = file 
write failed, 23 = socket read failed, 24 = socket write failed, 25 = cannot 
connect on socket 

Solution: 

Possibly the NetBackup server is getting a lot of traffic from clients; 
therefore, sockets are not becoming free soon enough to satisfy the demand.

For Solaris 2.6 or earlier, display the current value with the following command:
ndd -get /dev/tcp tcp_close_wait_interval

For Solaris 7 or above use the following command:
ndd -get /dev/tcp tcp_time_wait_interval

For HP-UX 11 use the following command:
ndd -get /dev/tcp tcp_time_wait_interval
 (NOTE:  The equivalent command on HP-UX 10 is "nettune" instead of "ndd".) 

These commands print the current value, typically a large number such as the 
default 240000 (the value is in milliseconds, so 240 seconds or 4 minutes).  This 
is how long the system waits after a socket is closed before it can be reused.  
In most cases it can be shortened to about 1 second (1000), which may alleviate 
the problem. 

The command to set it to 1000 on Solaris 2.6 and earlier versions is:
ndd -set /dev/tcp tcp_close_wait_interval 1000

The command to set it to 1000 on Solaris 7 and later versions is:
ndd -set /dev/tcp tcp_time_wait_interval 1000

The command to set it to 1000 on HP-UX 11 is:
ndd -set /dev/tcp tcp_time_wait_interval 1000

The "ndd" command makes the change immediately, without a need for a reboot.  
The setting reverts to the default after a reboot. To apply this value after 
each reboot, add the command to the appropriate TCP/IP startup script. On 
Solaris, this is /etc/rc2.d/S69inet; on HP-UX 11, see 
/etc/rc.config.d/nddconf for examples of how to set it.
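
Since the parameter name depends on the Solaris release, a startup script 
fragment could pick it from the output of uname -r. The following is a minimal 
sketch and is not taken from the technote: the function name tcp_wait_param is 
hypothetical, it relies on Solaris 2.6 reporting itself as SunOS 5.6 and 
Solaris 7 as SunOS 5.7, and it does not cover HP-UX.

```shell
# Print the ndd parameter name for a given `uname -r` release string.
# Mapping from the technote: Solaris 2.6 (SunOS 5.6) and earlier use
# tcp_close_wait_interval; Solaris 7 (SunOS 5.7) and later use
# tcp_time_wait_interval. The function name is hypothetical.
tcp_wait_param() {
    case "$1" in
        5.[0-6]|5.[0-6].*) echo tcp_close_wait_interval ;;  # e.g. 5.6, 5.5.1
        *)                 echo tcp_time_wait_interval ;;   # e.g. 5.7, 5.8
    esac
}

# Usage in a startup script (not run here; requires root on Solaris):
#   release=`uname -r`
#   ndd -set /dev/tcp `tcp_wait_param "$release"` 1000
```

The case patterns are used instead of arithmetic so that the fragment also 
works in the older Bourne shell that runs the Solaris rc scripts.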




Acknowledgments: 

Numerous prior cases 

  _____  

        -----Original Message----- 
        From: Suzi Archer [mailto:sarcher AT connect.com DOT au] 
        Sent: Tue 1/21/2003 8:44 PM 
        To: veritas-bu AT mailman.eng.auburn DOT edu 
        Cc: 
        Subject: [Veritas-bu] Problems since upgrading master server hardware (NB 4.5, MP3, Solaris8)
        
        

        Hi,
        
        We upgraded our hardware on our master server about a week ago, and we've had no
        end of grief ever since. (note that we upgraded from 3.4 to 4.5 at essentially
        the same time)
        
        The errors we are getting are mostly:
        
        Subject: Backup on fred - 134 started
        Status = unable to process request because the server resources are busy.
        
        but closely followed by:
        
        Subject: Backup on fred - 24 started
        Status = socket write failed.
        
        Subject: Backup on fred - 40 started
        Status = network connection broken.
        
        Subject: Backup on fred - 49 started
        Status = client did not start.
        
        Subject: Backup on fred - 13 started
        Status = file read failed.
        
        Subject: Backup on fred - 228 started
        Status = unable to process request.
        
        We are talking hundreds of messages/tries per day
        (3500-odd since last Wednesday),
        and it appears to be affecting all clients (sol2.6 and Sol6).
        
        Some backups end up completing after many tries, some simply fall out of their
        window and fail.
        
        
        Our hardware changed from:
        Sun Microsystems 280R, 2 x UltraSparcIII 750MHz, 2GB ram, GB eth
        To:
        Sun Microsystems 420R, 4 X UltraSPARC-II 450MHz, 4GB ram, GB eth
        
        IBM 6 Drive anaconda robot (LTO) attached by fibre to the same machine
        
        Could there possibly be some settings that are uber important that I missed?
        
        The things I can think of are (and match the previous server):
        
        bob[/]# head -15 /etc/system
        *ident  "@(#)system     1.18    97/06/27 SMI" /* SVR4 1.5 */
        *
        * SYSTEM SPECIFICATION FILE
        *
        set shmsys:shminfo_shmmax=4294967295
        set shmsys:shminfo_shmmin=1
        set shmsys:shminfo_shmmni=512
        set shmsys:shminfo_shmseg=100
        set semsys:seminfo_semmns=500
        set semsys:seminfo_semmni=150
        set semsys:seminfo_semmap=350
        set semsys:seminfo_semmnu=700
        set semsys:seminfo_semume=100
        set semsys:seminfo_semmsl=100
        
        bob[/]# cat /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS
        16
        bob[/]# cat /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
        131072
        
        Load on the server is not particularly high, there is plenty of memory and swap
        free, and the ethernet interface is nowhere near saturated.
        Is there a list of the resources that NB could possibly be referring to when it
        claims "server resources are busy"?
        As usual, the Veritas knowledgebase was useless :(
        
        Anyone spotting anything dead obvious here, or know where else I should look?
        
        TIA
        
        -suz
        
        --
        Suzi Archer               @connect.com.au          /"\
        Systems Group             180-188 Burnley Street   \ /  ASCII Ribbon Campaign
        sarcher AT connect.com DOT au    Richmond                  X   Against HTML Mail
        (03) 8686 2321            VIC,  3121               / \
                ** Note Address and Phone number changed on Nov 1 **
        _______________________________________________
        Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
        http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
        


