Veritas-bu

Re: [Veritas-bu] Digging into tape performance, long delays on duplications

2010-05-19 16:19:09
Subject: Re: [Veritas-bu] Digging into tape performance, long delays on duplications
From: Jim VandeVegt <Jim.Vandevegt AT physiciansmutual DOT com>
To: "veritas-bu AT mailman.eng.auburn DOT edu" <veritas-bu AT mailman.eng.auburn DOT edu>
Date: Wed, 19 May 2010 15:19:01 -0500
NUMBER_DATA_BUFFERS=64
SIZE_DATA_BUFFERS=262144
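
For reference, a rough sketch of what these two settings work out to in buffer memory. It assumes each active drive gets its own set of NUMBER_DATA_BUFFERS buffers and ignores multiplexing. The 5-drive count comes from the "other 4 drives" mentioned in this message. The helper name buffer_bytes is made up for illustration.

```python
# Values quoted in this message.
NUMBER_DATA_BUFFERS = 64
SIZE_DATA_BUFFERS = 262144  # bytes per buffer


def buffer_bytes(number, size, drives=1):
    """Total buffer memory, ASSUMING each drive gets number * size bytes."""
    return number * size * drives


MIB = 1024 * 1024
per_drive = buffer_bytes(NUMBER_DATA_BUFFERS, SIZE_DATA_BUFFERS)
five_drives = buffer_bytes(NUMBER_DATA_BUFFERS, SIZE_DATA_BUFFERS, drives=5)

print(SIZE_DATA_BUFFERS // 1024, "KiB per buffer")   # 256 KiB
print(per_drive // MIB, "MiB per drive")             # 16 MiB
print(five_drives // MIB, "MiB across 5 drives")     # 80 MiB
```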
 
Drive firmware is I24Z
 
I may have found the biggest problem. Continuing to look at the switch port statistics, I found that one drive was accumulating a few receive and decode errors, and I twice caught it going offline on the switch port. I reseated all the fiber on its connection to the switch, and it has run steadily since. I will continue to look for further problems, since the 'blocked' issue has plagued the other 4 drives as well.
------------------

Jim VandeVegt | Technical Integrator, ETG


From: Kevin Corley [Kevin.Corley AT apollogrp DOT edu]
Sent: Wednesday, May 19, 2010 15:03
To: Jim VandeVegt
Subject: RE: [Veritas-bu] Digging into tape performance, long delays on duplications

What are your NUMBER_DATA_BUFFERS and SIZE_DATA_BUFFERS files set to? Also, what firmware are the tape drives running?

 

How long do the 100% busy periods last?

 

From: veritas-bu-bounces AT mailman.eng.auburn DOT edu [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Jim VandeVegt
Sent: Wednesday, May 19, 2010 12:27 PM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] Digging into tape performance, long delays on duplications

 

Just implemented HP LTO5 tape drives, although I have seen this behavior on the older LTO2 drives as well.

 

8 Gbps QLogic switch. NetBackup 6.5.6 with Vault.

 

Sun T5240 master/media server, Solaris 10 with last Thursday's recommended patch set, 4 Gbps HBA (getting an 8 Gbps one soon).

 

The main monitoring tool I have at my disposal is 'iostat -xn' on Solaris. I like to use a 6-second interval.

 

I also consult the switch port statistics, which do not show an increasing error count.

 

When doing tape-to-tape duplication, the tape drive devices show periods of data movement (typically 15-350 MB/s, depending on the size of the image file being processed), followed by several successive iostat intervals of zero throughput in which the %b column reads a constant 100 (that is, 100% busy; the drive looks blocked). During these intervals the actv column shows 1.0 and every other column is zero. NetBackup also shows the amount of data transferred as unchanged. It is these periods I'm trying to understand.
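
The stall pattern described above (zero throughput, actv at 1.0, %b at 100) could be picked out of saved 'iostat -xn' output with something like the following. This is only a sketch. It assumes the usual Solaris column order (r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device) and that tape devices are listed as rmt/N. The sample rows are made up for illustration and are not real output from this system.

```python
def find_stalls(lines, device_prefix="rmt/"):
    """Yield (interval, device) for rows with no data moved, actv >= 1 and %b >= 100."""
    interval = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r/s":  # the column header starts a new interval
            interval += 1
            continue
        # Data rows have 10 numeric columns followed by the device name.
        if len(fields) != 11 or not fields[-1].startswith(device_prefix):
            continue
        try:
            values = [float(f) for f in fields[:10]]
        except ValueError:
            continue
        kr_s, kw_s, actv, pct_b = values[2], values[3], values[5], values[9]
        if kr_s == 0 and kw_s == 0 and actv >= 1.0 and pct_b >= 100:
            yield interval, fields[-1]


# Hypothetical sample: one busy interval, then one stalled interval.
SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  480.0    0.0 122880.0  0.0  0.9    0.0    1.9   0  92 rmt/4
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  1.0    0.0    0.0   0 100 rmt/4
""".splitlines()

stalls = list(find_stalls(SAMPLE))
print(stalls)
```

Multiplying the number of consecutive flagged intervals by the sampling interval (6 seconds here) gives a rough stall duration per drive.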

 

Specifically, how can I dig into this? Would syslog changes yield more information? Should I increase the verbosity of the NetBackup logs? Use DTrace?

 

Is it tape backhitching? (Seems to be far too long a time period.)

 

On a few occasions, after a few minutes, a message like this shows up in /var/adm/messages:

scsi: [ID 107833 kern.warning] WARNING: /pci@400/pci@0/pci@d/SUNW,qlc@0/fp@0,0/st@w500308c0a0c8b001,0 (st4):
        SCSI transport failed: reason 'timeout': giving up

(This seems to indicate a command got dropped somewhere.)

On most occasions, after a short period (maybe 15-60 seconds), the data just starts flowing again.
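
To see how often each drive logs this, the messages file could be tallied per st instance. A minimal sketch follows. It assumes the two-line layout quoted above (a WARNING line naming the device, then the 'SCSI transport failed' line). The function name and regular expressions are illustrative and not taken from any Solaris tool.

```python
import re
from collections import Counter

# WARNING line ending in a driver instance such as "(st4):".
WARNING_RE = re.compile(r"WARNING: \S+ \((?P<inst>[a-z]+\d+)\):")
FAILURE_RE = re.compile(r"SCSI transport failed: reason '(?P<reason>[^']+)'")
TAPE_RE = re.compile(r"st\d+")


def count_transport_failures(lines):
    """Count (st instance, reason) pairs for SCSI transport failures."""
    counts = Counter()
    current = None  # device named by the most recent WARNING line
    for line in lines:
        warning = WARNING_RE.search(line)
        if warning:
            current = warning.group("inst")
            continue
        failure = FAILURE_RE.search(line)
        if failure and current and TAPE_RE.fullmatch(current):
            counts[(current, failure.group("reason"))] += 1
            current = None
    return counts


# The message quoted above, used as sample input.
SAMPLE_LOG = [
    "scsi: [ID 107833 kern.warning] WARNING: "
    "/pci@400/pci@0/pci@d/SUNW,qlc@0/fp@0,0/st@w500308c0a0c8b001,0 (st4):",
    "        SCSI transport failed: reason 'timeout': giving up",
]

failures = count_transport_failures(SAMPLE_LOG)
print(failures)
```

Against the live file this would be called as count_transport_failures(open("/var/adm/messages")). Comparing the counts per drive may show whether the timeouts cluster on one path.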

 

Whatever it is, it seems to really kill the overall duplication rate. Thanks,

------------------

Jim VandeVegt | Technical Integrator, ETG

Physicians Mutual | 2600 Dodge Street | Omaha, NE 68131

402.930.2649 | PhysiciansMutual.com | Jim.VandeVegt AT PhysiciansMutual DOT com

____________________________________________________________
This message and any attachments are confidential, may contain privileged
information, and are intended solely for the recipient named above.
If you are not the intended recipient, or a person responsible for
delivery to the named recipient, you are notified that any review,
distribution, dissemination or copying is prohibited.  If you have
received this message in error, you should notify the sender by return
email and delete the message from your computer system.
 


This message is private and confidential. If you have received it in error, please notify the sender and remove it from your system.

_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu