Hi,
I'm having intermittent problems with the backups of a NAS device. This backup uses the amanda-netapp-dump-0.1setuidump/dump utils.
I apologise for the size of this email but I'm hoping someone with a similar setup might have some pointers, opinions, guesses...
FreeBSD 2.0
Amanda 2.4.4p4
HP Surestore 20 slot DLT8000 Library
Network Appliance FAS250 (approx 97GB data)
1. Sometimes nothing is written to the first tape even though amcheck reported OK, including the write test, e.g.:
---Start Mail Report---
002118D 0:00 0.0 0.0 0
002119D 0:18 4826.7 12.6 13
NOTES:
planner: Full dump of bkup02.anz.lucent.com:/dev/netapp/users promoted from 26 days ahead.
taper: tape 002118D kb 0 fm 0 writing filemark: Input/output error
taper: retrying bkup02.anz.lucent.com:/dev/netapp/users.0 on new tape: [writing filemark: Input/output error]
taper: tape 002119D kb 4943264 fm 13 [OK]
---End Mail Report---
I have tried sleeps of various lengths at the end of each function in the chg-chio script, but this has made no difference.
2. Sometimes backups that could fit on 1 tape are spread over multiple tapes, using only a small percentage of each tape, e.g.:
---Start Mail Report---
These dumps were to tapes 002104D, 002124D, 002111D.
The next 4 tapes Amanda expects to used are: 002118D, 002119D, 002120D, 002121D.
STATISTICS:
                           Total       Full      Daily
                        --------   --------   --------
Estimate Time (hrs:min)     0:38
Run Time (hrs:min)         36:03
Dump Time (hrs:min)        35:02      35:00       0:03
Output Size (meg)        35209.5    35209.4        0.1
Original Size (meg)      88129.0    88127.4        1.6
Avg Compressed Size (%)     40.0       40.0        3.9   (level:#disks ...)
Filesystems Dumped            13         12          1   (1:1)
Avg Dump Rate (k/s)        285.8      286.2        0.4
Tape Time (hrs:min)         2:09       2:09       0:00
Tape Size (meg)          35209.5    35209.4        0.1
Tape Used (%)               91.9       91.9        0.0   (level:#disks ...)
Filesystems Taped             13         12          1   (1:1)
Avg Tp Write Rate (k/s)   4672.5     4673.9       29.7
USAGE BY TAPE:
Label       Time      Size     %    Nb
002104D     0:03     813.6   2.1     4
002124D     0:18    4820.5  12.6     1
002111D     1:48   29575.4  77.2     8
taper: tape 002104D kb 833344 fm 4 writing filemark: Input/output error
taper: retrying bkup02.anz.lucent.com:/dev/netapp/users.0 on new tape: [writing filemark: Input/output error]
taper: tape 002124D kb 4936224 fm 1 writing filemark: Input/output error
taper: retrying bkup02.anz.lucent.com:/dev/netapp/usr/jna.0 on new tape: [writing filemark: Input/output error]
taper: tape 002111D kb 30285632 fm 8 [OK]
DUMP SUMMARY:
                                        DUMPER STATS               TAPER STATS
HOSTNAME     DISK        L  ORIG-KB   OUT-KB COMP%  MMM:SS   KB/s  MMM:SS   KB/s
-------------------------- -------------------------------------- --------------
bkup02.anz.l -etapp/blah 0     3038      528  17.4    0:20   26.1    0:02  234.8
bkup02.anz.l -netapp/etc 0   104291    46749  44.8    0:59  791.3    0:09 5089.9
bkup02.anz.l -tapp/users 0 10152807  4936168  48.6   73:42 1116.4   17:34 4683.0
bkup02.anz.l -/usr/hwcad 0 12308545  4105162  33.4   94:15  725.9   14:17 4792.7
bkup02.anz.l -sr/include 0     1950      101   5.2    0:13    8.0    0:02   50.8
bkup02.anz.l -pp/usr/jna 0 64714344 26179304  40.5 1912:46  228.1   93:50 4649.6
bkup02.anz.l -pp/usr/lib 0     1610        8   0.5    0:11    0.6    0:02    3.7
bkup02.anz.l -/usr/local 1     1627       65   4.0    2:40    0.4    0:02   29.7
bkup02.anz.l -usr/lucent 0     1609        9   0.6    0:10    0.8    0:02    4.1
bkup02.anz.l -pp/usr/ncd 0   200036    85204  42.6    1:51  766.5    0:15 5525.6
bkup02.anz.l -pp/usr/net 0   111856    43823  39.2    1:08  648.4    0:08 5251.9
bkup02.anz.l -pp/usr/nms 0     1612       10   0.6    0:11    0.8    0:02    4.6
bkup02.anz.l -sr/swtools 0  2640785   657395  24.9   13:56  786.7    2:09 5083.7
---End Mail Report---
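A rough cross-check of the per-tape usage figures in the report above: the kb counts from the taper lines can be compared with the tapetype length of 38295 mbytes. This is only a sketch. The 1 MB = 1024 kB conversion and the helper name percent_used are assumptions for illustration, not anything taken from Amanda itself.

```python
# Tape capacity from the CUST-DLT8000 tapetype (length 38295 mbytes).
TAPE_LENGTH_MB = 38295

# kB written per tape, copied from the taper lines in the mail report.
taper_kb = {
    "002104D": 833344,
    "002124D": 4936224,
    "002111D": 30285632,
}


def percent_used(kb_written, tape_length_mb=TAPE_LENGTH_MB):
    """Share of the tape filled, ASSUMING 1 MB = 1024 kB (hypothetical helper)."""
    return 100.0 * kb_written / (tape_length_mb * 1024)


usage = {label: round(percent_used(kb), 1) for label, kb in taper_kb.items()}
for label, pct in usage.items():
    print(f"{label}: {pct:.1f}% of tape used")

# Would everything written in this run have fit on a single tape?
total_kb = sum(taper_kb.values())
fits_on_one_tape = total_kb <= TAPE_LENGTH_MB * 1024
print(f"total: {percent_used(total_kb):.1f}% of one tape, "
      f"fits on one tape: {fits_on_one_tape}")
```

Under these assumptions the per-tape percentages agree with the "%" column in USAGE BY TAPE, and the three tapes together amount to less than one tape's nominal length, so the spreading appears to come from the filemark write errors rather than from a lack of capacity.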
3. Dump times to the holding disk can sometimes vary greatly e.g.:
Amanda Dump 20050606 Elapsed Time = 9:40:43
Bandwidth = 25120 Final Status = TAPE ERROR
Holding disk = 66560 Dumped/Failed = 13/0
Tape Policy = FIRST Output data size = 25563
Dumpers = 4 Estimated data size = 25608
Driver alg = drain-ends At big end 0
Amanda Dump 20050531 Elapsed Time = 32:34:04
Bandwidth = 25120 Final Status = TAPE ERROR
Holding disk = 66560 Dumped/Failed = 13/0
Tape Policy = FIRST Output data size = 25589
Dumpers = 4 Estimated data size = 25607
Driver alg = drain-ends At big end 0
Apart from the time the dumps take to complete, I can see no difference between the sessions while they are running. Both the Amanda server and the NAS device appear to be running well, with no CPU, memory or disk bottlenecks. There are no apparent network problems, though I have been unable to get LAN utilization stats.
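To put a number on the difference between the two runs, the elapsed times and output sizes from the summaries above can be turned into average rates. This is a sketch only: it assumes that "Output data size" is in megabytes and that 1 MB = 1024 kB, and the helper names are made up for illustration.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' elapsed-time string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Figures copied from the two run summaries.
# ASSUMPTION: "Output data size" is reported in megabytes.
runs = {
    "20050606": {"elapsed": "9:40:43", "output_mb": 25563},
    "20050531": {"elapsed": "32:34:04", "output_mb": 25589},
}


def average_rate_kb_per_s(run):
    """Average output rate over the whole run in kB/s (1 MB = 1024 kB assumed)."""
    return run["output_mb"] * 1024 / to_seconds(run["elapsed"])


for date, run in runs.items():
    print(f"{date}: {to_seconds(run['elapsed'])} s elapsed, "
          f"about {average_rate_kb_per_s(run):.0f} kB/s on average")

# How much longer did the slow run take for nearly the same amount of data?
slowdown = (to_seconds(runs["20050531"]["elapsed"])
            / to_seconds(runs["20050606"]["elapsed"]))
print(f"the 20050531 run took {slowdown:.1f}x as long as the 20050606 run")
```

Since both runs produced almost the same amount of output, the ratio of elapsed times is effectively the ratio of average throughput between the two sessions.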
---Start amanda.conf---
org "Toaster"
mailto "backup"
dumpuser "amanda"
inparallel 4
dumporder "BTBTBTBTBTBT"
netusage 10000 Kbps
dumpcycle 4 weeks
runspercycle 20
tapecycle 100 tapes
bumpsize 20 Mb
bumpdays 1
bumpmult 4
etimeout 3600
dtimeout 6000
ctimeout 30
tapebufs 20
runtapes 4
tpchanger "chg-chio"
tapedev "/dev/nrst1"
rawtapedev "/dev/null"
changerfile "/usr/pkg/etc/amanda/Toaster/changer.conf"
changerdev "/dev/ch0"
maxdumpsize -1
tapetype CUST-DLT8000
labelstr "^0021[0-9][0-9]D"
amrecover_do_fsf yes
amrecover_check_label yes
amrecover_changer "/dev/nrst1"
holdingdisk hd1 {
comment "main holding disk"
directory "/amanda/hd1/CH0"
use 65Gb
chunksize 35Gb
}
autoflush yes
infofile "/var/amanda/Toaster/curinfo"
logdir "/var/amanda/Toaster"
indexdir "/var/amanda/Toaster/index"
define tapetype CUST-DLT8000 {
comment "DLT8000 Drive generated by amtapetype"
length 38295 mbytes
filemark 30 kbytes
speed 5800 kps
}
define dumptype global {
comment "Global definitions"
# index yes
# record no
}
define dumptype comp-high-fast {
global
comment "very important partitions on fast machines"
compress client fast
priority high
}
define interface local {
comment "a local disk"
use 10000 kbps
}
define interface fxp0 {
comment "100 Mbps ethernet"
use 5120 kbps
}
---End amanda.conf---
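A few values in the config can be cross-checked against the reports with simple arithmetic. The sketch below assumes that Python's re module interprets this simple labelstr the same way Amanda's own matcher does, and that the "Holding disk = 66560" figure in the run summaries is in megabytes. Neither assumption is confirmed by anything in the output above.

```python
import re

# labelstr from amanda.conf.
# ASSUMPTION: Python's re module treats this simple pattern the same way
# Amanda's own label matching does.
LABELSTR = r"^0021[0-9][0-9]D"

# Tape labels that appear in the two mail reports.
labels = ["002104D", "002111D", "002118D", "002119D",
          "002120D", "002121D", "002124D"]

unmatched = [label for label in labels if not re.match(LABELSTR, label)]
print("labels not matching labelstr:", unmatched)

# holdingdisk "use 65Gb" expressed in MB, for comparison with the
# "Holding disk = 66560" line in the run summaries (MB is an assumption).
holding_mb = 65 * 1024
print("holding disk (MB):", holding_mb)

# Nominal tape capacity available to one run: runtapes x tapetype length.
RUNTAPES = 4
TAPE_LENGTH_MB = 38295
run_capacity_mb = RUNTAPES * TAPE_LENGTH_MB
print("nominal tape capacity per run (MB):", run_capacity_mb)
```

One thing this makes visible is that the labelstr has no trailing `$`, so as a regular expression it would also accept longer labels that merely begin with the same characters.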
About 70% of the backup sessions run well but once or twice a week something goes wrong.
I have a second Surestore library doing normal system backups, and these run without problems, though no config there needs to span multiple tapes, i.e. each backup config fits on 1 tape.
Any ideas on where to start troubleshooting these problems would be greatly appreciated. Does the config file look OK, or does anyone recommend changes?
Thanks,
Greg.