Re: Strange NAS to Surestore backup behaviour
2005-08-10 23:43:29
--On Thursday, August 11, 2005 11:52:10 +1000 "Keenan, Greg John (Greg)** CTR **" <gjkeenan AT lucent DOT com> wrote:
> Hi,
>
> I'm having inconsistent problems with the backups of a NAS device. This
> backup uses the amanda-netapp-dump-0.1setuidump/dump utils.
>
> I apologise for the size of this email but I'm hoping someone with a
> similar setup might have some pointers, opinions, guesses...
>
> FreeBSD 2.0
> Amanda 2.4.4p4
> HP Surestore 20 slot DLT8000 Library
> Network Appliance FAS250 (approx 97GB data)
>
> 1. Sometimes there are no writes to the first tape, even though amcheck
> was OK, including the write test, e.g.:
>
> ---Start Mail Report---
> 002118D 0:00 0.0 0.0 0
> 002119D 0:18 4826.7 12.6 13
> NOTES:
> planner: Full dump of bkup02.anz.lucent.com:/dev/netapp/users promoted
> from 26 days ahead.
> taper: tape 002118D kb 0 fm 0 writing filemark: Input/output error
> taper: retrying bkup02.anz.lucent.com:/dev/netapp/users.0 on new tape:
> [writing filemark: Input/output error]
> taper: tape 002119D kb 4943264 fm 13 [OK]
> ---End Mail Report---
>
> I have put different length sleeps at the end of each function in the
> chg-chio script but this has made no difference.
>
>
>
> 2. Sometimes backups that could fit on 1 tape are spread over multiple
> tapes, using only a small percentage of each tape, e.g.:
>
> ---Start Mail Report---
> These dumps were to tapes 002104D, 002124D, 002111D.
> The next 4 tapes Amanda expects to used are: 002118D, 002119D, 002120D,
> 002121D.
>
> STATISTICS:
>                             Total      Full     Daily
>                          --------  --------  --------
> Estimate Time (hrs:min)      0:38
> Run Time (hrs:min)          36:03
> Dump Time (hrs:min)         35:02     35:00      0:03
> Output Size (meg)         35209.5   35209.4       0.1
> Original Size (meg)       88129.0   88127.4       1.6
> Avg Compressed Size (%)      40.0      40.0       3.9   (level:#disks ...)
> Filesystems Dumped             13        12         1   (1:1)
> Avg Dump Rate (k/s)         285.8     286.2       0.4
>
> Tape Time (hrs:min)          2:09      2:09      0:00
> Tape Size (meg)           35209.5   35209.4       0.1
> Tape Used (%)                91.9      91.9       0.0   (level:#disks ...)
> Filesystems Taped              13        12         1   (1:1)
> Avg Tp Write Rate (k/s)    4672.5    4673.9      29.7
>
> USAGE BY TAPE:
> Label       Time      Size      %   Nb
> 002104D     0:03     813.6    2.1    4
> 002124D     0:18    4820.5   12.6    1
> 002111D     1:48   29575.4   77.2    8
>
> taper: tape 002104D kb 833344 fm 4 writing filemark: Input/output
> error
> taper: retrying bkup02.anz.lucent.com:/dev/netapp/users.0 on new tape:
> [writing filemark: Input/output error]
> taper: tape 002124D kb 4936224 fm 1 writing filemark: Input/output
> error
> taper: retrying bkup02.anz.lucent.com:/dev/netapp/usr/jna.0 on new
> tape: [writing filemark: Input/output error]
Any time you see I/O errors at random offsets, you should first check for:
a) dirty heads on the tape drive
b) bad tapes (although it's not likely that many go bad at once, unless they
have all been heavily used)
c) SCSI errors (check your system logs) due to improper termination (none or
multiple), a bad cable or poor connection (try disconnecting and reconnecting
the cable), or possibly even a bad controller
d) a bad drive
You may be experiencing a totally different problem, but start with the easy
stuff first.
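One way to confirm that the failures really are at "random offsets" is to pull the `kb` and `fm` counts out of the taper lines in the reports. This is a minimal Python sketch; the regular expression is an assumption based only on the taper lines quoted in this thread, not on any documented Amanda log format.

```python
import re

# Pattern assumed from the taper lines quoted in this thread, e.g.
# "taper: tape 002104D kb 833344 fm 4 writing filemark: Input/output error"
TAPER_RE = re.compile(r"taper: tape (\S+) kb (\d+) fm (\d+) (.+)")


def failed_tapes(lines):
    """Return (label, kb, fm, message) for each taper line not ending in [OK]."""
    failures = []
    for line in lines:
        match = TAPER_RE.search(line)
        if match is None:
            continue  # not a per-tape taper summary line (e.g. "taper: retrying ...")
        label, kb, fm, message = match.groups()
        if message.strip() != "[OK]":
            failures.append((label, int(kb), int(fm), message.strip()))
    return failures


# Taper lines copied (unwrapped) from the two reports quoted in this thread.
report = [
    "taper: tape 002118D kb 0 fm 0 writing filemark: Input/output error",
    "taper: tape 002119D kb 4943264 fm 13 [OK]",
    "taper: tape 002104D kb 833344 fm 4 writing filemark: Input/output error",
    "taper: tape 002124D kb 4936224 fm 1 writing filemark: Input/output error",
    "taper: tape 002111D kb 30285632 fm 8 [OK]",
]

for label, kb, fm, message in failed_tapes(report):
    print(f"{label}: failed after {kb} KB and {fm} filemarks ({message})")
```

The three failures land at 0 KB, 833344 KB and 4936224 KB on three different tapes, which fits the random-offset pattern rather than one bad spot on one cartridge.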
Also, look into the 'columnspec' config option. It won't help your I/O errors,
but it will make your daily report easier to read.
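As a rough illustration of 'columnspec', this Python sketch builds a value from the widest numbers in the DUMP SUMMARY, so that 8-digit figures such as the ORIG-KB and OUT-KB values for /usr/jna get a column of their own. It assumes the comma-separated `Name=Space:Width` entry format and the column names (`OrigKB`, `OutKB`, `DumpTime`, ...) described in amanda.conf(5); check the man page for your Amanda version before pasting the result into a config.

```python
def columnspec(widths, space=1):
    """Join {column: width} into "Name=Space:Width,..." (format assumed from amanda.conf(5))."""
    return ",".join(f"{name}={space}:{width}" for name, width in widths.items())


# Widest value seen in each numeric column of the DUMP SUMMARY in this thread.
widest = {
    "OrigKB": "64714344",
    "OutKB": "26179304",
    "DumpTime": "1912:46",
    "DumpRate": "1116.4",
    "TapeTime": "93:50",
    "TapeRate": "5525.6",
}

# Make each column exactly as wide as its widest value, with one blank before it.
spec = columnspec({name: len(value) for name, value in widest.items()})
print(f'columnspec "{spec}"')
```

Running it prints `columnspec "OrigKB=1:8,OutKB=1:8,DumpTime=1:7,DumpRate=1:6,TapeTime=1:5,TapeRate=1:6"`; widen the numbers a little if larger dumps are expected.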
Frank
> taper: tape 002111D kb 30285632 fm 8 [OK]
>
> DUMP SUMMARY:
>                                        DUMPER STATS               TAPER STATS
> HOSTNAME     DISK        L  ORIG-KB   OUT-KB COMP%  MMM:SS   KB/s MMM:SS   KB/s
> ------------------------- -------------------------------------- --------------
> bkup02.anz.l -etapp/blah 0     3038      528  17.4    0:20   26.1   0:02  234.8
> bkup02.anz.l -netapp/etc 0   104291    46749  44.8    0:59  791.3   0:09 5089.9
> bkup02.anz.l -tapp/users 0 10152807  4936168  48.6   73:42 1116.4  17:34 4683.0
> bkup02.anz.l -/usr/hwcad 0 12308545  4105162  33.4   94:15  725.9  14:17 4792.7
> bkup02.anz.l -sr/include 0     1950      101   5.2    0:13    8.0   0:02   50.8
> bkup02.anz.l -pp/usr/jna 0 64714344 26179304  40.5 1912:46  228.1  93:50 4649.6
> bkup02.anz.l -pp/usr/lib 0     1610        8   0.5    0:11    0.6   0:02    3.7
> bkup02.anz.l -/usr/local 1     1627       65   4.0    2:40    0.4   0:02   29.7
> bkup02.anz.l -usr/lucent 0     1609        9   0.6    0:10    0.8   0:02    4.1
> bkup02.anz.l -pp/usr/ncd 0   200036    85204  42.6    1:51  766.5   0:15 5525.6
> bkup02.anz.l -pp/usr/net 0   111856    43823  39.2    1:08  648.4   0:08 5251.9
> bkup02.anz.l -pp/usr/nms 0     1612       10   0.6    0:11    0.8   0:02    4.6
> bkup02.anz.l -sr/swtools 0  2640785   657395  24.9   13:56  786.7   2:09 5083.7
> ---End Mail Report---
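The statement that this run could fit on one tape can be checked against the tapetype in the amanda.conf quoted here. A small Python calculation, assuming the tapetype's `length` and `filemark` values are accurate for these cartridges and (as a rough approximation) one filemark per filesystem taped:

```python
# Per-tape sizes in MB, copied from the USAGE BY TAPE table in the report.
usage_by_tape_mb = {"002104D": 813.6, "002124D": 4820.5, "002111D": 29575.4}

TAPE_LENGTH_MB = 38295  # "length 38295 mbytes" in the CUST-DLT8000 tapetype
FILEMARK_KB = 30        # "filemark 30 kbytes" in the same tapetype
FILEMARKS = 13          # assumption: roughly one filemark per filesystem taped

data_mb = sum(usage_by_tape_mb.values())
filemark_mb = FILEMARKS * FILEMARK_KB / 1024
one_tape_fraction = (data_mb + filemark_mb) / TAPE_LENGTH_MB

print(f"data written:      {data_mb:.1f} MB")
print(f"filemark overhead: {filemark_mb:.2f} MB")
print(f"share of one tape: {one_tape_fraction:.1%}")
```

The result, about 91.9% of a single tape, agrees with the report's own "Tape Used (%)" figure, so the spread over three tapes follows from the taper retrying on a new tape after each filemark write error, not from a lack of space.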
>
> 3. Dump times to the holding disk can sometimes vary greatly e.g.:
>
> Amanda Dump 20050606 Elapsed Time = 9:40:43
> Bandwidth = 25120 Final Status = TAPE ERROR
> Holding disk = 66560 Dumped/Failed = 13/0
> Tape Policy = FIRST Output data size = 25563
> Dumpers = 4 Estimated data size = 25608
> Driver alg = drain-ends At big end 0
>
> Amanda Dump 20050531 Elapsed Time = 32:34:04
> Bandwidth = 25120 Final Status = TAPE ERROR
> Holding disk = 66560 Dumped/Failed = 13/0
> Tape Policy = FIRST Output data size = 25589
> Dumpers = 4 Estimated data size = 25607
> Driver alg = drain-ends At big end 0
>
> Apart from the length of time for the dumps to complete I can see no
> difference between the sessions when they're running. Both the Amanda
> server and the NAS device appear to be running well with no CPU, memory
> or disk bottlenecks. There are no apparent network problems, though I
> have been unable to get LAN utilization stats.
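To put issue 3 in numbers, this Python sketch converts the two elapsed times to seconds and derives an overall rate. It assumes "Output data size" is in MB (the summary does not state the unit). Because "Elapsed Time" covers the whole run, including taping and the tape errors, the rate is an end-to-end figure, not the holding-disk dump rate alone.

```python
def to_seconds(hms):
    """Convert an "H:MM:SS" elapsed time (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Elapsed time and output data size for the two runs in the summaries.
# Assumption: "Output data size" is in MB; the summary does not say.
runs = {
    "20050606": ("9:40:43", 25563),
    "20050531": ("32:34:04", 25589),
}

for date, (elapsed, size_mb) in runs.items():
    secs = to_seconds(elapsed)
    rate_kb_s = size_mb * 1024 / secs
    print(f"{date}: {secs} s elapsed, roughly {rate_kb_s:.0f} KB/s end to end")

slow, fast = to_seconds("32:34:04"), to_seconds("9:40:43")
print(f"the 20050531 run took {slow / fast:.1f}x as long")
```

With nearly identical data sizes, the slower run averages about 223 KB/s against about 751 KB/s for the faster one, roughly 3.4 times as long for the same work.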
>
> ---Start amanda.conf---
> org "Toaster"
> mailto "backup"
> dumpuser "amanda"
>
> inparallel 4
> dumporder "BTBTBTBTBTBT"
> netusage 10000 Kbps
> dumpcycle 4 weeks
> runspercycle 20
> tapecycle 100 tapes
>
> bumpsize 20 Mb
> bumpdays 1
> bumpmult 4
>
> etimeout 3600
> dtimeout 6000
> ctimeout 30
> tapebufs 20
>
> runtapes 4
> tpchanger "chg-chio"
> tapedev "/dev/nrst1"
> rawtapedev "/dev/null"
> changerfile "/usr/pkg/etc/amanda/Toaster/changer.conf"
> changerdev "/dev/ch0"
> maxdumpsize -1
> tapetype CUST-DLT8000
> labelstr "^0021[0-9][0-9]D"
> amrecover_do_fsf yes
> amrecover_check_label yes
> amrecover_changer "/dev/nrst1"
>
> holdingdisk hd1 {
> comment "main holding disk"
> directory "/amanda/hd1/CH0"
> use 65Gb
> chunksize 35Gb
> }
>
> autoflush yes
>
> infofile "/var/amanda/Toaster/curinfo"
> logdir "/var/amanda/Toaster"
> indexdir "/var/amanda/Toaster/index"
>
> define tapetype CUST-DLT8000 {
> comment "DLT8000 Drive generated by amtapetype"
> length 38295 mbytes
> filemark 30 kbytes
> speed 5800 kps
> }
>
> define dumptype global {
> comment "Global definitions"
> # index yes
> # record no
> }
>
> define dumptype comp-high-fast {
> global
> comment "very important partitions on fast machines"
> compress client fast
> priority high
> }
>
> define interface local {
> comment "a local disk"
> use 10000 kbps
> }
>
> define interface fxp0 {
> comment "100 Mbps ethernet"
> use 5120 kbps
> }
> ---End amanda.conf---
>
>
> About 70% of the backup sessions run well but once or twice a week
> something goes wrong.
>
> I have a second Surestore library doing normal system backups and these
> run without problem though no config needs to span multiple tapes i.e.
> each backup config fits on 1 tape.
>
> Any ideas on where to start troubleshooting these problems greatly
> appreciated. Does the config file look OK or does anyone recommend
> changes?
>
> Thanks,
> Greg.
>
--
Frank Smith                                      fsmith AT hoovers DOT com
Sr. Systems Administrator Voice: 512-374-4673
Hoover's Online Fax: 512-374-4501