Bacula-users

Re: [Bacula-users] ERR=Input/output error / only some times

2008-10-14 14:38:14
Subject: Re: [Bacula-users] ERR=Input/output error / only some times
From: Arno Lehmann <al AT its-lehmann DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Tue, 14 Oct 2008 20:33:12 +0200
Hi,

14.10.2008 15:47, Stefan Lubitz wrote:
> Hi Arno,
> 
> thanks for the infos. I found the following messages in /var/log/messages.
> Could this show a problem?

Definitely.

> I am not familiar with this
> Until now I have not worked with smartmontools. I have installed them 
> and added the /dev/sg3 in the smartd.conf file.
> Let's see what happens.
> The Tapes I am using are new, also is the Library and all the cabels.

That doesn't guarantee that all connections are firmly done, and it 
also doesn't guarantee that the SCSI bus termination is correct. You 
should check those, probably removing all connections and 
re-connecting them.

> Thanks for further help,
> Stefan
> 
> Messages:
> Oct 14 14:04:56 BACKUP3 kernel: [101067.953408] mptscsih: ioc1: 
> attempting task abort! (sc=ffff81011bd15a80)

That looks like a SCSI problem... I haven't worked with the mptscsi 
driver, IIRC. Others report different texts...

Tapealert might tell you more, if the problem is triggered and noticed 
by the tape drive itself.

Unfortunately, these log messages only show that the HBA firmware 
tries to recover from some problem, but I don't see which the original 
problem was.

Arno

> Oct 14 14:04:56 BACKUP3 kernel: [101067.953413] ch 2:0:4:1: CDB: 
> Inquiry: 12 00 00 00 24 00
> Oct 14 14:04:58 BACKUP3 kernel: [101070.275853] mptbase: ioc1: 
> Initiating recovery
> Oct 14 14:05:00 BACKUP3 kernel: [101072.206197] st 2:0:4:0: mptscsih: 
> ioc1: completing cmds: fw_channel 0, fw_id 4, sc=ffff810119901580, mf = 
> ffff81011bd42bc0, idx=a
> Oct 14 14:05:00 BACKUP3 kernel: [101072.206203] ch 2:0:4:1: mptscsih: 
> ioc1: completing cmds: fw_channel 0, fw_id 4, sc=ffff81011bd15a80, mf = 
> ffff81011bd42c20, idx=b
> Oct 14 14:05:03 BACKUP3 kernel: [101076.299678] mptscsih: ioc1: Issue of 
> TaskMgmt failed!
> Oct 14 14:05:03 BACKUP3 kernel: [101076.299683] mptscsih: ioc1: task 
> abort: FAILED (sc=ffff81011bd15a80)
> Oct 14 14:05:03 BACKUP3 kernel: [101076.299685] mptscsih: ioc1: 
> attempting task abort! (sc=ffff810119901580)
> Oct 14 14:05:03 BACKUP3 kernel: [101076.299689] st 2:0:4:0: CDB: 
> Read(6): 08 00 00 fc 00 00
> Oct 14 14:05:03 BACKUP3 kernel: [101076.299712] mptscsih: ioc1: task 
> abort: SUCCESS (sc=ffff810119901580)
> Oct 14 14:05:03 BACKUP3 kernel: [101076.299670] mptscsih: ioc1: 
> attempting target reset! (sc=ffff81011bd15a80)
> Oct 14 14:05:03 BACKUP3 kernel: [101076.299670] ch 2:0:4:1: CDB: 
> Inquiry: 12 00 00 00 24 00
> Oct 14 14:05:04 BACKUP3 kernel: [101077.027907] mptscsih: ioc1: target 
> reset: SUCCESS (sc=ffff81011bd15a80)
> Oct 14 14:05:04 BACKUP3 kernel: [101077.027907] st0: Sense Key : Unit 
> Attention [current]
> Oct 14 14:05:04 BACKUP3 kernel: [101077.027907] st0: Add. Sense: Power 
> on, reset, or bus device reset occurred
> Oct 14 14:05:04 BACKUP3 kernel: [101077.401595]  target2:0:4: Beginning 
> Domain Validation
> Oct 14 14:05:04 BACKUP3 kernel: [101077.421570]  target2:0:4: Ending 
> Domain Validation
> Oct 14 14:05:04 BACKUP3 kernel: [101077.421570]  target2:0:4: FAST-80 
> WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 127)
> Oct 14 14:05:04 BACKUP3 kernel: [101077.636087]  target2:0:4: Beginning 
> Domain Validation
> Oct 14 14:05:04 BACKUP3 kernel: [101077.659128]  target2:0:4: Ending 
> Domain Validation
> Oct 14 14:05:04 BACKUP3 kernel: [101077.659128]  target2:0:4: FAST-80 
> WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 127)
> Oct 14 14:05:04 BACKUP3 kernel: [101077.659128]  target2:0:5: Beginning 
> Domain Validation
> Oct 14 14:05:04 BACKUP3 kernel: [101077.663128]  target2:0:5: Domain 
> Validation skipping write tests
> Oct 14 14:05:04 BACKUP3 kernel: [101077.663128]  target2:0:5: Ending 
> Domain Validation
> Oct 14 14:05:04 BACKUP3 kernel: [101077.663128]  target2:0:5: FAST-80 
> WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 127)
> Oct 14 14:16:35 BACKUP3 -- MARK --
> Oct 14 14:20:05 BACKUP3 kernel: [102122.127816] mptscsih: ioc1: 
> attempting task abort! (sc=ffff81011e54fe40)
> Oct 14 14:20:05 BACKUP3 kernel: [102122.127821] st 2:0:4:0: CDB: 
> Read(6): 08 00 00 fc 00 00
> Oct 14 14:20:05 BACKUP3 kernel: [102122.247790] mptscsih: ioc1: task 
> abort: FAILED (sc=ffff81011e54fe40)
> Oct 14 14:20:05 BACKUP3 kernel: [102122.247796] mptscsih: ioc1: 
> attempting target reset! (sc=ffff81011e54fe40)
> Oct 14 14:20:05 BACKUP3 kernel: [102122.247798] st 2:0:4:0: CDB: 
> Read(6): 08 00 00 fc 00 00
> Oct 14 14:20:07 BACKUP3 kernel: [102124.943766] mptbase: ioc1: 
> Initiating recovery
> Oct 14 14:20:09 BACKUP3 kernel: [102126.894740] st 2:0:4:0: mptscsih: 
> ioc1: completing cmds: fw_channel 0, fw_id 4, sc=ffff81011e54fe40, mf = 
> ffff81011bd46760, idx=a9
> Oct 14 14:20:12 BACKUP3 kernel: [102131.260388] mptscsih: ioc1: Issue of 
> TaskMgmt failed!
> Oct 14 14:20:12 BACKUP3 kernel: [102131.260391] mptscsih: ioc1: target 
> reset: FAILED (sc=ffff81011e54fe40)
> Oct 14 14:20:12 BACKUP3 kernel: [102131.260394] mptscsih: ioc1: 
> attempting bus reset! (sc=ffff81011e54fe40)
> Oct 14 14:20:12 BACKUP3 kernel: [102131.260396] st 2:0:4:0: CDB: 
> Read(6): 08 00 00 fc 00 00
> Oct 14 14:20:13 BACKUP3 kernel: [102132.196536] mptscsih: ioc1: bus 
> reset: SUCCESS (sc=ffff81011e54fe40)
> Oct 14 14:20:23 BACKUP3 kernel: [102143.700212] st0: Error 80000 (sugg. 
> bt 0x0, driver bt 0x0, host bt 0x8).
> Oct 14 14:20:23 BACKUP3 kernel: [102143.821719]  target2:0:4: Beginning 
> Domain Validation
> Oct 14 14:20:23 BACKUP3 kernel: [102144.177692]  target2:0:4: Ending 
> Domain Validation
> Oct 14 14:20:23 BACKUP3 kernel: [102144.177692]  target2:0:4: FAST-80 
> WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 127)
> Oct 14 14:20:23 BACKUP3 kernel: [102144.390932]  target2:0:4: Beginning 
> Domain Validation
> Oct 14 14:20:23 BACKUP3 kernel: [102144.398927]  target2:0:4: Ending 
> Domain Validation
> Oct 14 14:20:23 BACKUP3 kernel: [102144.398927]  target2:0:4: FAST-80 
> WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 127)
> Oct 14 14:20:23 BACKUP3 kernel: [102144.398927]  target2:0:5: Beginning 
> Domain Validation
> Oct 14 14:20:24 BACKUP3 kernel: [102144.422927]  target2:0:5: Domain 
> Validation skipping write tests
> Oct 14 14:20:24 BACKUP3 kernel: [102144.422927]  target2:0:5: Ending 
> Domain Validation
> 
> 
> 
> 
> Arno Lehmann schrieb:
>> Hi,
>>
>> 14.10.2008 11:54, Stefan Lubitz wrote:
>>  
>>> Hi all,
>>>
>>> I have a strange behavior when I run btape.
>>> The test abborts at different positions. Sometimes it runs 
>>> completely. Sometimes it stops after the rewind or after the "Got EOF 
>>> on tape".
>>> I have no idea where I can search what's wrong. The strange thing is, 
>>> that I was able to write (bconsole / run a job) on two tapes.
>>> After writing I tried to restore the job, but the restore failed. So 
>>> I started from beginning with testing my Autoloader.
>>> And guess what......... It failed.
>>>     
>>
>> Looks like a hardware-related problem.
>>
>>  
>>> If I write an EOF manually (mt -f /dev/nst0 weof), the the test will 
>>> work one time. The second time it fails again.
>>> Is this normal?
>>>
>>> I start btape with the following syntax:
>>> btape -c /dev/bacula/bacula-sd.conf -v -d 6 /dev/nst0
>>>
>>> Here ist the btape output:
>>> btape: btape.c:831 Wrote 1000 blocks of 64412 bytes.
>>> btape: btape.c:505 Wrote 1 EOF to "Drive-1" (/dev/nst0)
>>> btape: btape.c:847 Wrote 1000 blocks of 64412 bytes.
>>> btape: btape.c:505 Wrote 1 EOF to "Drive-1" (/dev/nst0)
>>> btape: btape.c:856 Rewind OK.
>>> 1000 blocks re-read correctly.
>>> Got EOF on tape.
>>>
>>> Here is the Error Message:
>>> btape: btape.c:856 Rewind OK.
>>> 13-Oct 15:08 btape JobId 0: Error: block.c:999 Read error on fd=3 at 
>>> file:blk 0:976 on device "Drive-1" (/dev/nst0). ERR=Input/output error.
>>> btape: btape.c:868 Read block 977 failed! ERR=Input/output error
>>>
>>> Next try, same result:
>>> 13-Oct 16:33 btape JobId 0: Error: block.c:999 Read error on fd=3 at 
>>> file:blk 1:897 on device "Drive-1" (/dev/nst0). ERR=Input/output error.
>>> btape: btape.c:868 Read block 1898 failed! ERR=Input/output error
>>>
>>> Can you please help me, where I can start to investigate?
>>>     
>>
>> Start by having a look into the system's log - there might be errors 
>> reported there.
>>
>> This might be a broken tape drive, bad cabling or termination, or a 
>> rancid tape.
>>
>> Also, tapelerts might be present for the tape drive - you can see them 
>> using smartmontools or the tapealert program.
>>
>> Arno
>>
>>  
>>> Thanks,
>>> Stefan
>>>
>>>     
>>
>>   
> 

-- 
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück
www.its-lehmann.de

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

<Prev in Thread] Current Thread [Next in Thread>