Bacula-users

[Bacula-users] btape spouradically times out on test command

2010-05-26 05:43:25
Subject: [Bacula-users] btape spouradically times out on test command
From: Christoph Lehmann <lehmann AT schwedler DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Wed, 26 May 2010 11:40:21 +0200
Hello list,

i have a problem with btape (bacula-sd too) and writing on a tape. After
a while it times out sporadically and i can't find out the reason. The
problem is reproducable.

Filling the tape with tar works well, I tested all the commands (rewind,
fsr,bsr ...) by hand and there is no problem with that. The mtx-changer
script from bacula works well too.

My environment:

Hardware:
- a linux server running Debian Linux 2.6.32-bpo.4-xen-686 #1 SMP Tue
Apr 13 16:03:38 UTC 2010 i686 GNU/Linux
- a new Tapechanger Quantum Superloader 3 LTO4 with an Ultrium 4 drive
(2190)
- its coupled with an Adaptec 1045 SAS-adapter

Since kernel 2.6.30 the mvsas-module supports the adapter

Software:
- Bacula 5.0.1-1 installed from the Backports
- mt-st 1.1-4
- i recompiled the st driver with debugging: st: Version 20081215, fixed
bufsize 32768, s/g segs 256

backup:~# lsscsi 
[1:0:0:0]    tape    QUANTUM  ULTRIUM 4        2190  /dev/st0
[1:0:0:1]    mediumx QUANTUM  UHDL             0061  /dev/sch0

backup:~# lsmod ...stripped...
scsi_transport_sas     15093  2 mvsas,libsas
libata                114448  3 ata_piix,ata_generic,libsas
scsi_mod              101065  9
sg,st,osst,ch,sd_mod,mvsas,libsas,scsi_transport_sas,libata

backup:~# tapeinfo -f /dev/sg4
Product Type: Tape Drive
Vendor ID: 'QUANTUM '
Product ID: 'ULTRIUM 4       '
Revision: '2190'
Attached Changer API: No
SerialNumber: 'JN0933AMJ50492'
MinBlock: 1
MaxBlock: 16777215
SCSI ID: 0
SCSI LUN: 0
Ready: yes
BufferedMode: yes
Medium Type: Not Loaded
Density Code: 0x46
BlockSize: 0
DataCompEnabled: yes
DataCompCapable: yes
DataDeCompEnabled: yes
CompType: 0x1
DeCompType: 0x1
Block Position: 14111
Partition 0 Remaining Kbytes: 853050
Partition 0 Size in Kbytes: 853060
ActivePartition: 0
EarlyWarningSize: 0

backup:~# cat /etc/stinit.def
# Defaults, i added read-ahead and scsi2logical
{buffer-writes read-ahead can-bsr scsi2logical async-writes}
# Entry from EMC NetWorker Technical Note
manufacturer=QUANTUM model="ULTRIUM 4" {
auto-lock=0
timeout=800
long-timeout=3600
can-partition=0
scsi2logical=1
# This gives the highest performance, but with old versions of st module
it can cause problems with spaning tapes
async-writes=1
# linux does not have full LTO-4 support to set modeline properly
(should be 0x46 for LTO-4), so use device defaults instead
mode1 blocksize=0 density=0x00
}

In the document there was the option "drive-buffering", but it's not
available (and covered with buffer-writes?)

backup:~# mt-st -f /dev/nst0 stshowopt
The options set: buffer-writes async-writes read-ahead can-bsr
scsi2logical

backup:~# cat /proc/sys/kernel/hung_task_timeout_secs
300

Storage-Daemon configuration:

Autochanger {
  Name = wechsler
  Device = band
  Changer Command = "sudo /etc/bacula/scripts/mtx-changer %c %o %S %a %
d"
  Changer Device = /dev/sg5
}
Device {
  Name = band
  Drive Index = 0
  Media Type = LTO-4
  Archive Device = /dev/nst0
  Device Type = Tape
  Automatic Mount = yes;               # when device opened, read it
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  Maximum File Size = 1GB #2GB #5GB
  Changer Command = "sudo /etc/bacula/scripts/mtx-changer %c %o %S %a %
d"
  Changer Device = /dev/sg5
  AutoChanger = yes
  Maximum Changer Wait = 300
}

Okay, I can run btape test and it times out spouradically

command: strace -ftt -ostrace_btape4 btape -c /etc/bacula/bacula-sd.conf
band

dmesg output since loading the recompiled module, maybe interesting: 

[59066.912097] st: Version 20081215, fixed bufsize 32768, s/g segs 256
[59066.912504] st 1:0:0:0: Attached scsi tape st0
[59066.912548] st 1:0:0:0: st0: try direct i/o: yes (alignment 4 B)
[59066.973150] st0: Block limits 1 - 16777215 bytes.
[59066.973818] st0: Mode sense. Length 11, medium 0, WBS 10, BLL 8
[59066.973867] st0: Density 46, tape length: 0, drv buffer: 1
[59066.973914] st0: Block size: 0, buffer size: 4096 (1 blocks).
[59066.980191] st0: Block limits 1 - 16777215 bytes.
[59066.980856] st0: Mode sense. Length 11, medium 0, WBS 10, BLL 8
[59066.980905] st0: Density 46, tape length: 0, drv buffer: 1
[59066.980952] st0: Block size: 0, buffer size: 4096 (1 blocks).
[59066.981004] st0: Normal timeout set to 800 seconds.
[59066.981040] st0: Long timeout set to 3600 seconds.
[59066.981079] st0: Mode 0 options: buffer writes: 1, async writes: 1,
read ahead: 1
[59066.981138] st0:    can bsr: 1, two FMs: 0, fast mteom: 0, auto lock:
0,
[59066.981178] st0:    defs for wr: 0, no block limits: 0, partitions:
0, s2 log: 0
[59066.981237] st0:    sysv: 0 nowait: 0 sili: 0
[59066.981271] st0:    debugging: 1
[59066.981303] st0: Mode 0 options: buffer writes: 1, async writes: 1,
read ahead: 1
[59066.981363] st0:    can bsr: 1, two FMs: 0, fast mteom: 0, auto lock:
0,
[59066.981403] st0:    defs for wr: 0, no block limits: 0, partitions:
0, s2 log: 1
[59066.981463] st0:    sysv: 0 nowait: 0 sili: 0
[59066.981497] st0:    debugging: 1
[59066.981528] st0: Default block size set to 0 bytes.
[59066.981564] st0: Density default set to 0
[59066.981597] st0: Setting block size to 0 bytes.
[59066.981631] st0: Setting density code to 0.
[59111.178879] st0: Block limits 1 - 16777215 bytes.
[59111.179531] st0: Mode sense. Length 11, medium 0, WBS 10, BLL 8
[59111.179580] st0: Density 46, tape length: 0, drv buffer: 1
[59111.179627] st0: Block size: 0, buffer size: 4096 (1 blocks).
[59111.179787] st0: Rewinding tape.
[59111.190845] st0: Block limits 1 - 16777215 bytes.
[59111.191525] st0: Mode sense. Length 11, medium 0, WBS 10, BLL 8
[59111.191574] st0: Density 46, tape length: 0, drv buffer: 1
[59111.191621] st0: Block size: 0, buffer size: 4096 (1 blocks).
[59111.191753] st0: Locking drive door.
[59111.192529] st0: Setting block size to 0 bytes.
[59111.193439] st0: Mode 0 options: buffer writes: 1, async writes: 1,
read ahead: 1
[59111.193510] st0:    can bsr: 1, two FMs: 0, fast mteom: 0, auto lock:
0,
[59111.193554] st0:    defs for wr: 0, no block limits: 0, partitions:
0, s2 log: 1
[59111.193616] st0:    sysv: 0 nowait: 0 sili: 0
[59111.193650] st0:    debugging: 1
[59111.195264] st0: Block limits 1 - 16777215 bytes.
[59111.195921] st0: Mode sense. Length 11, medium 0, WBS 10, BLL 8
[59111.195970] st0: Density 46, tape length: 0, drv buffer: 1
[59111.196723] st0: Block size: 0, buffer size: 4096 (1 blocks).
[59111.196857] st0: Rewinding tape.
[59111.207772] st0: Block limits 1 - 16777215 bytes.
[59111.208423] st0: Mode sense. Length 11, medium 0, WBS 10, BLL 8
[59111.208471] st0: Density 46, tape length: 0, drv buffer: 1
[59111.208518] st0: Block size: 0, buffer size: 4096 (1 blocks).
[59111.208652] st0: Locking drive door.
[59111.209338] st0: Setting block size to 0 bytes.
[59111.210314] st0: Mode 0 options: buffer writes: 1, async writes: 1,
read ahead: 1
[59111.210375] st0:    can bsr: 1, two FMs: 0, fast mteom: 0, auto lock:
0,
[59111.210415] st0:    defs for wr: 0, no block limits: 0, partitions:
0, s2 log: 1
[59111.210474] st0:    sysv: 0 nowait: 0 sili: 0
[59111.210510] st0:    debugging: 1
[59140.637500] st0: Rewinding tape.
[59140.648091] st0: Setting block size to 0 bytes.
[59140.648129] st0: Setting density code to 0.
[59150.021243] st0: Writing 1 filemarks.
[59159.801495] st0: Writing 1 filemarks.
[59162.264608] st0: Rewinding tape.
[59182.339981] st0: Error: 8000002, cmd: 8 0 0 fc 0 0
[59182.340021] st0: Sense Key : No Sense [current] 
[59182.340068] Info fld=0xfc00, FMK
[59182.340099] st0: Add. Sense: Filemark detected
[59182.340143] st0: Sense: f0  0 80  0  0 fc  0 1e
[59182.340177] st0: EOF detected (0 bytes read).
[59182.340764] st0: EOF/EOM flag up (2). Bytes 0
[59182.341791] st0: EOF up (2). Left 64512, needed 64512.
[59191.023694] st0: Rewinding tape.
[59191.025548] st0: Spacing tape forward over 4 blocks.
[59191.033038] st0: Spacing tape forward over 195 blocks.
[59191.094513] st0: Spacing tape forward over 9798 blocks.
[59191.400620] st0: Spacing tape forward over 1 filemarks.
[59191.423898] st0: EOF/EOM flag up (2). Bytes 0
[59191.425424] st0: EOF up (2). Left 64512, needed 64512.
[59191.426159] st0: Spacing tape forward over 599 blocks.
[59191.574577] st0: Spacing tape forward over 9398 blocks.
[59191.962161] st0: Rewinding tape.
[59196.152816] st0: Writing 1 filemarks.
[59200.007350] st0: Writing 1 filemarks.
[59200.050531] st0: Writing 1 filemarks.
[59200.089635] st0: Unlocking drive door.
[59200.090563] st0: Number of r/w requests 40013, dio used in 40013,
pages 680221.
[59200.092058] st0: Block limits 1 - 16777215 bytes.
[59200.092714] st0: Mode sense. Length 11, medium 0, WBS 10, BLL 8
[59200.092765] st0: Density 46, tape length: 0, drv buffer: 1
[59200.092802] st0: Block size: 0, buffer size: 4096 (1 blocks).
[59200.092933] st0: Rewinding tape.
[59204.136462] st0: Block limits 1 - 16777215 bytes.
[59204.137134] st0: Mode sense. Length 11, medium 0, WBS 10, BLL 8
[59204.137187] st0: Density 46, tape length: 0, drv buffer: 1
[59204.137242] st0: Block size: 0, buffer size: 4096 (1 blocks).
[59204.137399] st0: Locking drive door.
[59204.138218] st0: Setting block size to 0 bytes.
[59204.139185] st0: Mode 0 options: buffer writes: 1, async writes: 1,
read ahead: 1
[59204.139247] st0:    can bsr: 1, two FMs: 0, fast mteom: 0, auto lock:
0,
[59204.139287] st0:    defs for wr: 0, no block limits: 0, partitions:
0, s2 log: 1
[59204.139346] st0:    sysv: 0 nowait: 0 sili: 0
[59204.139380] st0:    debugging: 1
[59204.139652] st0: Rewinding tape.
[59204.141046] st0: Spacing tape forward over 8388607 filemarks.
[59204.142287] st0: Error: 8000002, cmd: 11 1 7f ff ff 0
[59204.142334] st0: Sense Key : Blank Check [current] 
[59204.142403] Info fld=0x7ffffc
[59204.142433] st0: Add. Sense: End-of-data detected
[59204.427601] st0: Writing 1 filemarks.
[59206.061925] st0: Rewinding tape.
[59218.160250] st0: Error: 8000002, cmd: 8 0 0 fc 0 0
[59218.160299] st0: Sense Key : No Sense [current] 
[59218.160369] Info fld=0xfc00, FMK
[59218.160399] st0: Add. Sense: Filemark detected
[59218.160445] st0: Sense: f0  0 80  0  0 fc  0 1e
[59218.160479] st0: EOF detected (0 bytes read).
[59218.160960] st0: EOF/EOM flag up (2). Bytes 0
[59218.162043] st0: EOF up (2). Left 64512, needed 64512.
[59218.183776] st0: Error: 8000002, cmd: 8 0 0 fc 0 0
[59218.183824] st0: Sense Key : No Sense [current] 
[59218.183892] Info fld=0xfc00, FMK
[59218.183923] st0: Add. Sense: Filemark detected
[59218.183969] st0: Sense: f0  0 80  0  0 fc  0 1e
[59218.184004] st0: EOF detected (0 bytes read).
[59218.184800] st0: EOF/EOM flag up (2). Bytes 0
[59218.185862] st0: EOF up (2). Left 64512, needed 64512.
[59218.236914] st0: Error: 8000002, cmd: 8 0 0 fc 0 0
[59218.236962] st0: Sense Key : No Sense [current] 
[59218.237031] Info fld=0xfc00, FMK
[59218.237062] st0: Add. Sense: Filemark detected
[59218.237109] st0: Sense: f0  0 80  0  0 fc  0 1e
[59218.237143] st0: EOF detected (0 bytes read).
[59218.237657] st0: EOF/EOM flag up (2). Bytes 0
[59218.238703] st0: EOF up (2). Left 64512, needed 64512.
[59223.238547] st0: Error: 8000002, cmd: 8 0 0 fc 0 0
[59223.238596] st0: Sense Key : No Sense [current] 
[59223.238661] Info fld=0xfc00, FMK
[59223.238692] st0: Add. Sense: Filemark detected
[59223.238741] st0: Sense: f0  0 80  0  0 fc  0 1e
[59223.238775] st0: EOF detected (0 bytes read).
[59223.239319] st0: EOF/EOM flag up (2). Bytes 0
[59223.240723] st0: Error: 8000002, cmd: 8 0 0 fc 0 0
[59223.240760] st0: Sense Key : Blank Check [current] 
[59223.240809] Info fld=0xfc00
[59223.240838] st0: Add. Sense: End-of-data detected
[59223.240886] st0: Sense: f0  0  8  0  0 fc  0 1e
[59223.240920] st0: Tape error while reading.
[59223.240953] st0: Zero returned for first BLANK CHECK after EOF.
[59223.241790] st0: Rewinding tape.
[59223.243540] st0: Setting block size to 0 bytes.
[59223.243576] st0: Setting density code to 0.
[59227.330717] st0: Writing 1 filemarks.
[59231.179419] st0: Spacing tape backward over 1 filemarks.
[59235.158520] st0: Spacing tape backward over 1 blocks.
[59235.169386] st0: Rewinding tape.
[59239.568865] st0: Writing 1 filemarks.
[59243.425475] st0: Writing 1 filemarks.
[59243.468538] st0: Writing 1 filemarks.
[59243.510459] st0: Writing 1 filemarks.
[59243.551497] st0: Writing 1 filemarks.
[59243.590430] st0: Rewinding tape.
[59251.730032] st0: Spacing tape forward over 1 filemarks.
[59255.792254] st0: Spacing tape forward over 2 filemarks.
[59255.873336] st0: Rewinding tape.
[59260.470836] st0: Spacing tape forward over 4 filemarks.
[59260.476113] st0: Spacing tape forward over 1 filemarks.
[59287.652420] st0: Rewinding tape.
[59300.778049] st0: Writing 1 filemarks.
[59772.344077] INFO: task btape:15348 blocked for more than 300 seconds.
[59772.344132] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[59772.344206] btape         D c23e1668     0 15348  15347 0x00000001
[59772.344254]  e901d100 00200286 c23e0eec c23e1668 00008020 c144ed20
c144ed20 00000000
[59772.344348]  e901d2bc c45bed20 00000000 c10b6d5d 00000000 e98ba540
ed9f204c c2399079
[59772.344441]  c2399000 e901d2bc 00000000 c12880e8 00000000 ed9d47c1
c1005df4 00000000
[59772.344533] Call Trace:
[59772.344569]  [<c10b6d5d>] ? kmem_cache_alloc+0x79/0xe5
[59772.344626]  [<c12880e8>] ? _spin_lock_irq+0xb/0x21
[59772.344681]  [<ed9d47c1>] ? sas_queuecommand+0x1f2/0x211 [libsas]
[59772.344723]  [<c1005df4>] ? xen_force_evtchn_callback+0xc/0x10
[59772.344764]  [<c1287371>] ? schedule_timeout+0x20/0xb0
[59772.344803]  [<c112d1ac>] ? blk_peek_request+0x135/0x143
[59772.344861]  [<ed862987>] ? scsi_dispatch_cmd+0x185/0x1e5 [scsi_mod]
[59772.344920]  [<ed867387>] ? scsi_request_fn+0x3c1/0x47a [scsi_mod]
[59772.344961]  [<c128727a>] ? wait_for_common+0xa5/0x101
[59772.345000]  [<c10349b0>] ? default_wake_function+0x0/0x8
[59772.345040]  [<edf02984>] ? st_scsi_execute_end+0x0/0x45 [st]
[59772.345080]  [<edf03099>] ? st_do_scsi+0x2a2/0x2ca [st]
[59772.345118]  [<edf07828>] ? st_write+0x4f1/0x926 [st]
[59772.345157]  [<c1044dc3>] ? ptrace_stop+0xc0/0xf2
[59772.345195]  [<edf07337>] ? st_write+0x0/0x926 [st]
[59772.345232]  [<c10bc156>] ? vfs_write+0x7e/0xd6
[59772.345269]  [<c10bc246>] ? sys_write+0x3c/0x63
[59772.345306]  [<c1008c3c>] ? syscall_call+0x7/0xb

... repeats several times ...

[60130.146514] 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_i386_xen/drivers/scsi/mvsas/mv_sas.c
 1669:mvs_abort_task:rc= -5
[60154.292977] 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_i386_xen/drivers/scsi/mvsas/mv_sas.c
 1608:mvs_query_task:rc= -5
[60178.439443] 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_i386_xen/drivers/scsi/mvsas/mv_sas.c
 1669:mvs_abort_task:rc= -5
[60202.585918] 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_i386_xen/drivers/scsi/mvsas/mv_sas.c
 1608:mvs_query_task:rc= -5
[60226.732387] 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_i386_xen/drivers/scsi/mvsas/mv_sas.c
 1669:mvs_abort_task:rc= -5
[60250.878860] 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_i386_xen/drivers/scsi/mvsas/mv_sas.c
 1608:mvs_query_task:rc= -5
[60275.025323] 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_i386_xen/drivers/scsi/mvsas/mv_sas.c
 1669:mvs_abort_task:rc= -5
[60299.171787] 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_i386_xen/drivers/scsi/mvsas/mv_sas.c
 1608:mvs_query_task:rc= -5
[60323.318249] 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_i386_xen/drivers/scsi/mvsas/mv_sas.c
 1669:mvs_abort_task:rc= -5
[60347.464715] 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_i386_xen/drivers/scsi/mvsas/mv_sas.c
 1608:mvs_query_task:rc= -5
[60372.344064] INFO: task btape:15348 blocked for more than 300 seconds.
[60372.344119] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[60372.344193] btape         D c23e1668     0 15348  15347 0x00000001
[60372.344241]  e901d100 00200286 c23e0eec c23e1668 00008020 c144ed20
c144ed20 00000000
[60372.344334]  e901d2bc c45bed20 00000000 c10b6d5d 00000000 e98ba540
ed9f204c c2399079
[60372.344428]  c2399000 e901d2bc 00000000 c12880e8 00000000 ed9d47c1
c1005df4 00000000
[60372.344520] Call Trace:
[60372.344556]  [<c10b6d5d>] ? kmem_cache_alloc+0x79/0xe5
[60372.344613]  [<c12880e8>] ? _spin_lock_irq+0xb/0x21
[60372.344668]  [<ed9d47c1>] ? sas_queuecommand+0x1f2/0x211 [libsas]
[60372.344710]  [<c1005df4>] ? xen_force_evtchn_callback+0xc/0x10
[60372.344750]  [<c1287371>] ? schedule_timeout+0x20/0xb0
[60372.344790]  [<c112d1ac>] ? blk_peek_request+0x135/0x143
[60372.344848]  [<ed862987>] ? scsi_dispatch_cmd+0x185/0x1e5 [scsi_mod]
[60372.344907]  [<ed867387>] ? scsi_request_fn+0x3c1/0x47a [scsi_mod]
[60372.344948]  [<c128727a>] ? wait_for_common+0xa5/0x101
[60372.344987]  [<c10349b0>] ? default_wake_function+0x0/0x8
[60372.345026]  [<edf02984>] ? st_scsi_execute_end+0x0/0x45 [st]
[60372.345066]  [<edf03099>] ? st_do_scsi+0x2a2/0x2ca [st]
[60372.345105]  [<edf07828>] ? st_write+0x4f1/0x926 [st]
[60372.345143]  [<c1044dc3>] ? ptrace_stop+0xc0/0xf2
[60372.345181]  [<edf07337>] ? st_write+0x0/0x926 [st]
[60372.345219]  [<c10bc156>] ? vfs_write+0x7e/0xd6
[60372.345255]  [<c10bc246>] ? sys_write+0x3c/0x63
[60372.345292]  [<c1008c3c>] ? syscall_call+0x7/0xb

the INFO-part just says the process is waiting for something and repeats
every 300 seconds.

i traced the btape process using strace -ftt. Its very big, so i
stripped repeating parts and published it here:

http://pastebin.com/87HdsCRG

I'm thinking of too short ioctl-timeouts, but I could be wrong.

Can you help me or give me some hints?

Thanks for working on this,

Christoph Lehmann


--
Christoph Lehmann
Dr. Ralf Schwedler e.K.     Amtsgericht Pinneberg, HRA 5609 PI
Internet / Sicherheit       Bankverbindung:
                            Sparkasse Westholstein
Steinbr�ckstrasse 13b       BLZ 222 500 20
25524 Itzehoe, Germany      Konto-Nr. 11 00 84 323
T +49 4821 40 87 58         International Account Number
  0700-SCHWEDLER            IBAN DE05 2225 0020 0110 0843 23
F +49 4821 40 87 59         BIC HSHNDEH1IZH
----                        ----
www.schwedler.com           Steuer-Nr. 18019 34593
lehmann AT schwedler DOT com       USt.-Id. DE 214230181

------------------------------------------------------------------------------

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
<Prev in Thread] Current Thread [Next in Thread>
  • [Bacula-users] btape spouradically times out on test command, Christoph Lehmann <=