Bacula-users

Re: [Bacula-users] [Bacula-devel] Load slots timeout

2014-11-10 11:25:42
Subject: Re: [Bacula-users] [Bacula-devel] Load slots timeout
From: Kern Sibbald <kern AT sibbald DOT com>
To: Andrey Chebotarev <a AT 525 DOT su>, heitor AT bacula.com DOT br
Date: Mon, 10 Nov 2014 17:22:35 +0100
There is no need to modify the code.  Just modify the bacula-sd.conf file.  It is much easier.

Best regards,
Kern

On 11/10/2014 04:14 PM, Andrey Chebotarev wrote:
I've modified and recompiled, waiting for result till next drive cleaning...

Probably that's it:

stored/stored_conf.c:

   {"maximumchangerwait",    store_time,   ITEM(res_dev.max_changer_wait), 0, ITEM_DEFAULT, 5 * 60},


Here example what's going on.
Job logs:

31-Oct 12:35 baculasrv-dir JobId 10967: Start Backup JobId 10967, Job=StorwizeBSRVData2BJob1.2014-10-31_12.35.06_34
31-Oct 12:35 baculasrv-dir JobId 10967: Recycled volume "346AAJL6"
31-Oct 12:35 baculasrv-dir JobId 10967: Using Device "IBM_TS3200_2_drive1" to write.
31-Oct 12:35 baculasrv-sd JobId 10967: 3307 Issuing autochanger "unload slot 46, drive 1" command.
31-Oct 12:36 baculasrv-sd JobId 10967: 3304 Issuing autochanger "load slot 12, drive 1" command.
31-Oct 12:36 baculasrv-sd JobId 10967: 3305 Autochanger "load slot 12, drive 1", status is OK.
31-Oct 12:36 baculasrv-sd JobId 10967: Recycled volume "346AAJL6" on device "IBM_TS3200_2_drive1" (/dev/nst0), all previous data lost.
31-Oct 12:36 baculasrv-dir JobId 10967: Max Volume jobs=1 exceeded. Marking Volume "346AAJL6" as Used.
31-Oct 23:25 baculasrv-sd JobId 10967: End of Volume "346AAJL6" at 2555:8545 on device "IBM_TS3200_2_drive1" (/dev/nst0). Write of 64512 bytes got -1.
31-Oct 23:25 baculasrv-sd JobId 10967: Re-read of last block succeeded.
31-Oct 23:25 baculasrv-sd JobId 10967: End of medium on Volume "346AAJL6" Bytes=2,555,387,735,040 Blocks=39,611,044 at 31-Oct-2014 23:25.
31-Oct 23:25 baculasrv-sd JobId 10967: 3307 Issuing autochanger "unload slot 12, drive 1" command.
31-Oct 23:28 baculasrv-dir JobId 10967: Recycled volume "349AAJL6"
31-Oct 23:28 baculasrv-sd JobId 10967: 3304 Issuing autochanger "load slot 34, drive 1" command.
31-Oct 23:28 baculasrv-sd JobId 10967: 3305 Autochanger "load slot 34, drive 1", status is OK.
31-Oct 23:28 baculasrv-sd JobId 10967: Recycled volume "349AAJL6" on device "IBM_TS3200_2_drive1" (/dev/nst0), all previous data lost.
31-Oct 23:28 baculasrv-dir JobId 10967: Max Volume jobs=1 exceeded. Marking Volume "349AAJL6" as Used.
31-Oct 23:28 baculasrv-sd JobId 10967: New volume "349AAJL6" mounted on device "IBM_TS3200_2_drive1" (/dev/nst0) at 31-Oct-2014 23:28.
01-Nov 10:25 baculasrv-sd JobId 10967: End of Volume "349AAJL6" at 2541:9287 on device "IBM_TS3200_2_drive1" (/dev/nst0). Write of 64512 bytes got -1.
01-Nov 10:25 baculasrv-sd JobId 10967: Re-read of last block succeeded.
01-Nov 10:25 baculasrv-sd JobId 10967: End of medium on Volume "349AAJL6" Bytes=2,541,436,498,944 Blocks=39,394,786 at 01-Nov-2014 10:25.
01-Nov 10:25 baculasrv-sd JobId 10967: 3307 Issuing autochanger "unload slot 34, drive 1" command.
01-Nov 10:27 baculasrv-dir JobId 10967: Recycled volume "225AAAL6"
01-Nov 10:27 baculasrv-sd JobId 10967: 3304 Issuing autochanger "load slot 3, drive 1" command.
01-Nov 10:28 baculasrv-sd JobId 10967: 3305 Autochanger "load slot 3, drive 1", status is OK.
01-Nov 10:28 baculasrv-sd JobId 10967: Recycled volume "225AAAL6" on device "IBM_TS3200_2_drive1" (/dev/nst0), all previous data lost.
01-Nov 10:28 baculasrv-dir JobId 10967: Max Volume jobs=1 exceeded. Marking Volume "225AAAL6" as Used.
01-Nov 10:28 baculasrv-sd JobId 10967: New volume "225AAAL6" mounted on device "IBM_TS3200_2_drive1" (/dev/nst0) at 01-Nov-2014 10:28.
01-Nov 23:20 baculasrv-sd JobId 10967: End of Volume "225AAAL6" at 2652:7176 on device "IBM_TS3200_2_drive1" (/dev/nst0). Write of 64512 bytes got -1.
01-Nov 23:20 baculasrv-sd JobId 10967: Re-read of last block succeeded.
01-Nov 23:20 baculasrv-sd JobId 10967: End of medium on Volume "225AAAL6" Bytes=2,652,293,210,112 Blocks=41,113,175 at 01-Nov-2014 23:20.
01-Nov 23:20 baculasrv-sd JobId 10967: 3307 Issuing autochanger "unload slot 3, drive 1" command.
01-Nov 23:22 baculasrv-dir JobId 10967: Recycled volume "345AAJL6"
01-Nov 23:22 baculasrv-sd JobId 10967: 3304 Issuing autochanger "load slot 11, drive 1" command.
01-Nov 23:23 baculasrv-sd JobId 10967: 3305 Autochanger "load slot 11, drive 1", status is OK.
01-Nov 23:23 baculasrv-sd JobId 10967: Recycled volume "345AAJL6" on device "IBM_TS3200_2_drive1" (/dev/nst0), all previous data lost.
01-Nov 23:23 baculasrv-dir JobId 10967: Max Volume jobs=1 exceeded. Marking Volume "345AAJL6" as Used.
01-Nov 23:23 baculasrv-sd JobId 10967: New volume "345AAJL6" mounted on device "IBM_TS3200_2_drive1" (/dev/nst0) at 01-Nov-2014 23:23.
02-Nov 10:21 baculasrv-sd JobId 10967: End of Volume "345AAJL6" at 2554:12632 on device "IBM_TS3200_2_drive1" (/dev/nst0). Write of 64512 bytes got -1.
02-Nov 10:22 baculasrv-sd JobId 10967: Re-read of last block succeeded.
02-Nov 10:22 baculasrv-sd JobId 10967: End of medium on Volume "345AAJL6" Bytes=2,554,651,459,584 Blocks=39,599,631 at 02-Nov-2014 10:22.
02-Nov 10:22 baculasrv-sd JobId 10967: 3307 Issuing autochanger "unload slot 11, drive 1" command.
02-Nov 10:24 baculasrv-dir JobId 10967: Recycled volume "163AAJL6"
02-Nov 10:24 baculasrv-sd JobId 10967: 3304 Issuing autochanger "load slot 22, drive 1" command.
02-Nov 10:29 baculasrv-sd JobId 10967: Fatal error: 3992 Bad autochanger "load slot 22, drive 1": ERR=Child died from signal 15: Termination.
Results=Program killed by Bacula (timeout)

02-Nov 10:22 sqcompose-fd JobId 10967: Fatal error: backup.c:1019 Network send error to SD. ERR=аЁаОаЕаДаИаНаЕаНаИаЕ баБб€аОбˆаЕаНаО аДб€бƒаГаОаЙ бб‚аОб€аОаНаОаЙ
02-Nov 10:29 baculasrv-sd JobId 10967: Elapsed time=45:43:17, Transfer rate=62.55 M Bytes/second
02-Nov 10:29 baculasrv-dir JobId 10967: Error: Bacula baculasrv-dir 5.2.13 (19Jan13):
  Build OS:               x86_64-unknown-linux-gnu debian 6.0.8
  JobId:                  10967
  Job:                    StorwizeBSRVData2BJob1.2014-10-31_12.35.06_34
  Backup Level:           Full
  Client:                 "storwize-b-srv-fd" 5.0.0 (26Jan10) x86_64-redhat-linux-gnu,redhat,Enterprise release
  FileSet:                "StorwizeBSRVData2-FS" 2014-05-29 15:24:24
  Pool:                   "BigBackupPool3" (From Job resource)
  Catalog:                "BackupCatalog" (From Client resource)
  Storage:                "baculasrv-IBM_TS3200_Autoldr2" (From Job resource)
  Scheduled time:         31-Oct-2014 12:34:50
  Start time:             31-Oct-2014 12:35:08
  End time:               02-Nov-2014 10:29:50
  Elapsed time:           1 day 21 hours 54 mins 42 secs
  Priority:               10
  FD Files Written:       181,053
  SD Files Written:       181,053
  FD Bytes Written:       10,296,099,095,544 (10.29 TB)
  SD Bytes Written:       10,296,128,383,049 (10.29 TB)
  Rate:                   62294.1 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):         346AAJL6|349AAJL6|225AAAL6|345AAJL6
  Volume Session Id:      291
  Volume Session Time:    1412326895
  Last Volume Bytes:      1 (1 B)
  Non-fatal FD errors:    0
  SD Errors:              1
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***

mtx-changeg debug logs:

20141102-10:23:28 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 unload 11 /dev/nst0 1
20141102-10:23:28 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 unload 11 1
20141102-10:24:38 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 loaded 22 /dev/nst0 1
20141102-10:24:38 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 1 -- to find what is loaded
20141102-10:24:39 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 loaded 22 /dev/nst0 1
20141102-10:24:39 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 1 -- to find what is loaded
20141102-10:24:39 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 load 22 /dev/nst0 1
20141102-10:24:39 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 load 22 1
20141102-10:40:38 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 unload 2 /dev/nst1 0
20141102-10:40:38 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 unload 2 0
20141102-10:41:50 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 loaded 22 /dev/nst0 1
20141102-10:41:50 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 1 -- to find what is loaded
20141102-10:41:50 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 loaded 22 /dev/nst1 0
20141102-10:41:50 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 0 -- to find what is loaded
20141102-10:41:51 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 loaded 22 /dev/nst1 0
20141102-10:41:51 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 0 -- to find what is loaded
20141102-10:41:51 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 unload 22 /dev/nst0 1
20141102-10:41:51 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 unload 22 1
20141102-10:42:43 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 load 22 /dev/nst1 0
20141102-10:42:43 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 load 22 0
20141102-23:34:58 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 unload 22 /dev/nst1 0
20141102-23:34:58 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 unload 22 0
20141102-23:36:12 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 loaded 22 /dev/nst1 0
20141102-23:36:12 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 0 -- to find what is loaded
20141102-23:36:12 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 loaded 22 /dev/nst1 0
20141102-23:36:12 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 0 -- to find what is loaded
20141102-23:36:12 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 loaded 22 /dev/nst0 1
20141102-23:36:12 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 1 -- to find what is loaded
20141102-23:36:13 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 load 22 /dev/nst1 0
20141102-23:36:13 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 load 22 0
20141102-23:36:53 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 unload 22 /dev/nst1 0
20141102-23:36:53 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 unload 22 0
20141102-23:37:38 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 loaded 1 /dev/nst1 0
20141102-23:37:38 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 0 -- to find what is loaded
20141102-23:37:38 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 loaded 1 /dev/nst0 1
20141102-23:37:38 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 1 -- to find what is loaded
20141102-23:37:39 Parms: /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 load 1 /dev/nst1 0
20141102-23:37:39 Doing mtx -f /dev/tape/by-id/scsi-1IBM_3573-TL_00L4U78T6368_LL0 load 1 0


At the time library was performing cleaning.

Hi,
I've already tried this(I've increased timeout to 1800), but it nothing changed.

Mr. Andrey,

/etc/bacula/scripts/mtx-changer:

wait_for_drive() {
i=0
while [ $i -le 300 ]; do        # Wait max 300 seconds
if mt -f $1 status 2>&1 | grep "${ready}" >/dev/null 2>&1; then
break
fi
debug "Device $1 - not ready, retrying..."
sleep 1
i=`expr $i + 1`
done
}


Regards,
=========================================================================
Heitor Medrado de Faria
Faltam poucos dias - Treinamento Telepresencial Bacula: http://www.bacula.com.br/?p=2174
==========================================================================


De: "Andrey Chebotarev" <a AT 525 DOT su>
Para: "Dan Langille" <dan AT langille DOT org>
Cc: bacula-devel AT lists.sourceforge DOT net, "bacula-users" <bacula-users AT lists.sourceforge DOT net>
Enviadas: Segunda-feira, 10 de novembro de 2014 10:24:48
Assunto: Re: [Bacula-devel] Load slots timeout

I wanted to change the code by myself. I just wanted to know where is in code place where defined 5 minutes timeout.




On Nov 10, 2014, at 3:33 AM, Andrey Chebotarev <a AT 525 DOT su> wrote:

Hi.
It's in devel because the question about editing sources, but not about configuration.
And I think solve the question could only developer.

The conclusion to change the code is premature. 



On Nov 4, 2014, at 6:55 AM, Andrey Chebotarev <a AT 525 DOT su> wrote:


Hi guys.

I use bacula 5.2.13 with IBM TS3200 library.

Periodically I face problem with load slot time out. In logs it looks lilke:

22-Sep 00:35 baculasrv-dir JobId 10471: Recycled volume "220AAAL6"
22-Sep 00:35 baculasrv-sd JobId 10471: 3304 Issuing autochanger "load slot 1, drive 1" command.
22-Sep 00:40 baculasrv-sd JobId 10471: Fatal error: 3992 Bad autochanger "load slot 1, drive 1": ERR=Child died from signal 15: Termination.
Results=Program killed by Bacula (timeout)
22-Sep 00:32 sqcompose-fd JobId 10471: Fatal error: backup.c:1019 Network send error to SD.

As I understood, bacula tries to load slot for 5 minutes and if it's not successful stops job.
I started investigation why bacula sometimes doesn't manage to load slot in 5 minutes and find out library stops responding to commands and starts cleaning drive procedure which takes more than 5 minutes(Library was configured to clean drive automatically)

So how can I solve the problem?
I've found place in mtx-changer script where declared 300 seconds to wait. I've increased it value to 1800 seconds. Is it only place or I have to change somthing else in sources?


Hi everybody.
Increasing timeout in mtx-changer script hasn't helped, I have the same issue. Where in sources I can increase timeout to 30 minutes?

I don’t see why this is on devel  It should be on users, which is cc’d here.





------------------------------------------------------------------------------

_______________________________________________
Bacula-devel mailing list
Bacula-devel AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-devel




------------------------------------------------------------------------------


_______________________________________________
Bacula-devel mailing list
Bacula-devel AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-devel



------------------------------------------------------------------------------


_______________________________________________
Bacula-devel mailing list
Bacula-devel AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-devel



------------------------------------------------------------------------------


_______________________________________________
Bacula-devel mailing list
Bacula-devel AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-devel



------------------------------------------------------------------------------


_______________________________________________
Bacula-devel mailing list
Bacula-devel AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-devel

------------------------------------------------------------------------------
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
<Prev in Thread] Current Thread [Next in Thread>