Amanda-Users

Problems with HP SureStore VS80e (DLT1)

2004-11-25 08:51:08
Subject: Problems with HP SureStore VS80e (DLT1)
From: Andreas Haumer <andreas AT xss.co DOT at>
To: amanda-users AT amanda DOT org
Date: Thu, 25 Nov 2004 14:22:42 +0100
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

I have nasty problems with an HP SureStore VS80e DLT1 drive
which Amanda uses as its backup tape drive, and I'm
running out of ideas...

The situation: in March 2004 I installed a new HP
SureStore VS80e DLT1 backup drive to be used by Amanda-2.4.4p2
to store backups of a Linux file- and mailserver on a regular basis.
It was decided to do full backups on each run, so each run dumps
about 40-50GB of data to tape.

The tape drive is connected to the Linux fileserver.
The fileserver has a dual U320 Fusion MTP SCSI controller:

02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
02:04.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)

On the first channel there are four Seagate ST336607LC drives
in a software RAID5 configuration, the DLT1 drive is connected
on the second SCSI channel and is the only device there. The
drive uses its built-in active termination to terminate the
external SCSI bus.

The software RAID5 delivers a throughput of about 70MB/s on
sequential write and about 90MB/s on sequential read. The system
has been running rock solid since March 2004 and delivers good
performance as a fileserver (1GB of RAM, single Intel Xeon 2.4GHz
CPU with HT).

I tried to optimize the tape throughput by increasing buffer size.
So I set the buffer of the Linux st driver to 512k using the
following option in /etc/modules.conf:

options st buffer_kbs=512

This should make sure that the SCSI tape driver has enough buffer
space to store the data before it gets transferred to the drive.

I use the following Amanda tapetype definition to get a tape
block size of 128kB. The HP docs say that the tape block size
should be at least 64kB, so I decided to use 128kB (I compiled
Amanda myself with the "--with-maxtapeblocksize=1024"
configure option).

define tapetype DLT1 {
    comment "DLT1 VS80"
    length 80 gbytes
    filemark 1 byte
    speed  4 mbytes
    blocksize 128 kbytes
}

In amanda.conf I set tapebufs to 64 to have Amanda allocate
a buffer for the "taper" with a size of about 2MB or about
16 tape blocks
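
The buffer arithmetic can be checked with a few lines of Python. This
is only a sketch: how Amanda 2.4.4 sizes each taper buffer is an
assumption here, so both readings are computed. The function names are
made up for illustration. With 64 buffers of 32kB each (assumed to be
Amanda's usual default block size) the total is 2MB, or 16 blocks of
128kB; if each buffer instead holds one full 128kB tape block, the
total would be 8MB.

```python
# Back-of-the-envelope check of the taper buffer size.
# Assumption: the per-buffer size is not stated in the text, so two
# plausible values (32 KiB and 128 KiB) are compared.
KIB = 1024
MIB = 1024 * KIB


def taper_buffer_bytes(tapebufs: int, buffer_kib: int) -> int:
    """Total size in bytes of `tapebufs` buffers of `buffer_kib` KiB each."""
    return tapebufs * buffer_kib * KIB


def as_tape_blocks(total_bytes: int, blocksize_kib: int) -> float:
    """Express a byte count as a number of tape blocks."""
    return total_bytes / (blocksize_kib * KIB)


TAPEBUFS = 64          # tapebufs setting from amanda.conf
BLOCKSIZE_KIB = 128    # blocksize from the tapetype definition

# Reading 1 (assumption): each buffer is 32 KiB.
small = taper_buffer_bytes(TAPEBUFS, 32)
# Reading 2 (assumption): each buffer is one full 128 KiB tape block.
large = taper_buffer_bytes(TAPEBUFS, BLOCKSIZE_KIB)

for label, total in (("32 KiB buffers", small), ("128 KiB buffers", large)):
    blocks = as_tape_blocks(total, BLOCKSIZE_KIB)
    print(f"{label}: {total / MIB:.0f} MiB = {blocks:.0f} tape blocks")
```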

We have about 20 DLT1 cartridges which get rotated manually
and we insert the cleaning cartridge every Friday. No cartridge
is older than 9 months.

We use the drive's hardware compression function, and depending
on the file types about 50GB seem to fit on a DLT1 tape
(40GB native).

We also have a holding disk on the RAID5 array which is used
by Amanda to store the backups of the mail- and fileserver
before they get dumped to tape.

We currently have 18 DLEs in our disklist and the average
throughput for the "taper" is about 3.5MB/s with a minimum
of about 2.9MB/s and a maximum of about 5MB/s, depending on
size and type of the DLE. I think these are typical values
for a mixture of typical files (e-mail, Word and Excel documents,
PDF files, JPEG images, user profiles, etc.).
We have DDS4 drives on other installations which show the same
values, and we also have LTO drives which show a throughput of
about 8-10MB/s. According to the specifications of the drives,
this is what I would expect.

"amstatus" for a typical backup run shows the following:

SUMMARY          part      real  estimated
                           size       size
partition       :  18
estimated       :  18             45960630k
flush           :   0         0k
failed          :   0                    0k           (  0.00%)
wait for dumping:   0                    0k           (  0.00%)
dumping to tape :   0                    0k           (  0.00%)
dumping         :   0         0k         0k (  0.00%) (  0.00%)
dumped          :  18  45892830k  45960630k ( 99.85%) ( 99.85%)
wait for writing:   0         0k         0k (  0.00%) (  0.00%)
wait to flush   :   0         0k         0k (100.00%) (  0.00%)
writing to tape :   0         0k         0k (  0.00%) (  0.00%)
failed to tape  :   0         0k         0k (  0.00%) (  0.00%)
taped           :  18  45892830k  45960630k ( 99.85%) ( 99.85%)
  tape 1        :  18  45892830k  45960630k ( 54.71%) MO1
10 dumpers idle : not-idle
taper idle
network free kps:    215040
holding space   :  47440472k (100.00%)
 dumper0 busy   :  1:09:35  ( 29.51%)
 dumper1 busy   :  0:33:52  ( 14.36%)
 dumper2 busy   :  0:04:25  (  1.87%)
 dumper3 busy   :  0:00:41  (  0.29%)
 dumper4 busy   :  0:00:32  (  0.23%)
 dumper5 busy   :  0:02:42  (  1.14%)
 dumper6 busy   :  0:03:13  (  1.37%)
   taper busy   :  3:39:41  ( 93.15%)
 0 dumpers busy :  2:46:13  ( 70.48%)            not-idle:  2:46:13  (100.00%)
 1 dumper busy  :  0:35:43  ( 15.15%)  client-constrained:  0:32:22  ( 90.66%)
                                               start-wait:  0:01:59  (  5.60%)
                                                 not-idle:  0:01:20  (  3.75%)
 2 dumpers busy :  0:29:28  ( 12.50%)  client-constrained:  0:28:43  ( 97.46%)
                                               start-wait:  0:00:44  (  2.54%)
 3 dumpers busy :  0:00:41  (  0.29%)  client-constrained:  0:00:41  (100.00%)
 4 dumpers busy :  0:01:01  (  0.44%)  client-constrained:  0:00:46  ( 75.73%)
                                               start-wait:  0:00:15  ( 24.27%)
 5 dumpers busy :  0:02:09  (  0.91%)  client-constrained:  0:02:09  (100.00%)
 6 dumpers busy :  0:00:20  (  0.15%)          start-wait:  0:00:17  ( 85.85%)
                                       client-constrained:  0:00:02  ( 14.15%)
 7 dumpers busy :  0:00:11  (  0.08%)          start-wait:  0:00:11  (100.00%)
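
As a rough cross-check, the average taper throughput and the tape
capacity Amanda assumes can be derived from the figures in this
amstatus output. The sketch below takes the "taped" size, the
"taper busy" time and the "tape 1" percentage from the listing; the
helper name hms_to_seconds is made up for illustration, and "k" is
assumed to mean units of 1024 bytes.

```python
# Derive average taper throughput and implied tape capacity from the
# amstatus figures quoted above. Assumption: "k" means KiB (1024 bytes).

def hms_to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string such as '3:39:41' to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


TAPED_KIB = 45892830        # "taped" real size
TAPER_BUSY = "3:39:41"      # "taper busy" time
TAPE_USED_FRACTION = 0.5471  # "tape 1" usage (54.71%)

busy_seconds = hms_to_seconds(TAPER_BUSY)
rate_kib_s = TAPED_KIB / busy_seconds
capacity_gib = TAPED_KIB / TAPE_USED_FRACTION / (1024 * 1024)

print(f"taper busy: {busy_seconds} s")
print(f"average throughput: {rate_kib_s:.0f} KiB/s "
      f"({rate_kib_s / 1024:.2f} MiB/s)")
print(f"implied tape length: {capacity_gib:.1f} GiB")
```

The result lands close to the "about 3.5 MB/s" average mentioned
earlier, and the implied tape length matches the 80 gbytes given in
the tapetype definition.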


This setup worked fine for about 5 months. Then the backups
began to fail every now and then for no apparent reason and
the SCSI driver reported SCSI write errors like this:

SCSI Error: (1:5:0) Status=02h (CHECK CONDITION)
 Key=3h (MEDIUM ERROR); FRU=00h
 ASC/ASCQ=0Ch/00h ""
 CDB: 0A 00 01 00 00 00 - "WRITE(6)"

st0: Error with sense data: Info fld=0x0, Deferred st09:00: sense key Medium Error
Additional sense indicates Write error
st0: Error with sense data: Info fld=0x0, Current st09:00: sense key Medium Error
Additional sense indicates Write error
st0: Error on write filemark.
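
For readers who want to pick the error apart, here is a small Python
sketch that decodes the CDB and the sense codes from the log above.
It assumes the standard SCSI layout of WRITE(6) for sequential-access
devices (byte 1 bit 0 is the FIXED bit, bytes 2-4 are the transfer
length). The lookup tables contain only the codes seen in this log,
and the function name decode_write6 is made up for illustration.

```python
# Decode the WRITE(6) CDB and the sense data from the log above.
# Assumption: standard SCSI WRITE(6) layout for tape devices.

SENSE_KEYS = {0x3: "MEDIUM ERROR"}           # only the key seen in the log
ASC_ASCQ = {(0x0C, 0x00): "WRITE ERROR"}     # only the pair seen in the log


def decode_write6(cdb: bytes) -> dict:
    """Split a 6-byte tape WRITE(6) CDB into its fields."""
    if len(cdb) != 6 or cdb[0] != 0x0A:
        raise ValueError("not a WRITE(6) CDB")
    return {
        # FIXED=0: variable-block mode, transfer length is a byte count.
        # FIXED=1: fixed-block mode, transfer length is a block count.
        "fixed": bool(cdb[1] & 0x01),
        "transfer_length": int.from_bytes(cdb[2:5], "big"),
        "control": cdb[5],
    }


cdb = bytes.fromhex("0A 00 01 00 00 00")     # CDB line from the log
info = decode_write6(cdb)
unit = "blocks" if info["fixed"] else "bytes"

print(f"WRITE(6): {info['transfer_length']} {unit}, fixed={info['fixed']}")
print(f"sense key 3h: {SENSE_KEYS[0x3]}")
print(f"ASC/ASCQ 0Ch/00h: {ASC_ASCQ[(0x0C, 0x00)]}")
```

Under that layout, the logged CDB describes a single variable-block
write of 65536 bytes that the drive rejected with a medium error.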


The situation quickly became worse and at some point the tape drive refused
to eject the tape cartridge. We called HP support and they replaced the
drive without problems. With the new drive, the nightly backup again started
to work fine.

After about 2 weeks, the same errors occurred again. The nightly
backup runs began to fail up to the point where the drive refused
to eject the cartridge. So, HP replaced the drive again and we
started to use our third drive in about 7 months of usage.

The third tape drive worked well for about 2 months, but is
now starting to show the same errors as the two drives we had
before, and I'm afraid we will have to replace this drive as well.
It looks like we have to replace the drive more often than
the cartridges. I don't know what the HP support guy will tell
me about that... :-(

It's a mystery to me. I cannot believe that three tape drives
in a row are that bad. On the other hand, it looks like the rest
of the setup (SCSI bus, termination, Amanda configuration) is ok,
because in the beginning this setup works fine for weeks and
even months. Nothing in the setup is changed, and still after some
time the tape suddenly begins to show errors.

As mentioned we have several Amanda installations with different
tape drives and they all work fine. I never saw such a problem
there.

Questions:
*) Any comments on my backup configuration?
*) Any comments on the symptoms I see?
*) Does anyone have similar experience with this drive model?
*) Is it possible that some detail in my configuration wears
   out the drive so that it breaks within a short period of time?
*) Any other suggestion?

Any help is appreciated!

Regards,

- - andreas

- --
Andreas Haumer                     | mailto:andreas AT xss.co DOT at
*x Software + Systeme              | http://www.xss.co.at/
Karmarschgasse 51/2/20             | Tel: +43-1-6060114-0
A-1100 Vienna, Austria             | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFBpdyfxJmyeGcXPhERAtgMAJ9QNP+duJpBPfRbOq5Xxil/TyJE9QCeL6rw
43DTQvdQL4huwjIkEEBf3C8=
=AcAT
-----END PGP SIGNATURE-----

