Bacula-users

[Bacula-users] 2.4.1 SD freezes frequently

2008-08-04 05:16:24
Subject: [Bacula-users] 2.4.1 SD freezes frequently
From: Leszek Urbański <tygrys AT moo DOT pl>
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 4 Aug 2008 11:16:01 +0200
Hello,

I've upgraded from bacula 2.2.8 to 2.4.1 and the SD freezes frequently.

System: Debian, i386
Bacula versions: dir: 2.4.1, sd: 2.4.1, all fds: 2.2.7.
Tape library: Dell ML6020 2-drive autochanger.


Once every 4 to 8 jobs bacula-sd freezes up. A new job starts and then:

- the job is listed as "running" in "status dir".

- "status storage" shows that no data has been received from the fd yet:

*status storage=ML6020
Connecting to Storage daemon ML6020 at backup1:9103

backup1-sd Version: 2.4.1 (07 July 2008) i486-pc-linux-gnu debian
Daemon started 04-Aug-08 10:34, 4 Jobs run since started.
 Heap: heap=413,696 smbytes=266,548 max_bytes=333,225 bufs=151 max_bufs=175
 Sizes: boffset_t=8 size_t=4 int32_t=4 int64_t=8
 
 Running Jobs:
 Writing: Incremental Backup job SomeJob JobId=17965 Volume="000044"
     pool="DailyPoolA" device="Drive-2" (/dev/nst1)
     spooling=0 despooling=0 despool_wait=0
     Files=0 Bytes=0 Bytes/sec=0
     FDReadSeqNo=6 in_msg=6 out_msg=4 fd=7
====
                

- "status storage" output waits indefinitely at this point:

In Use Volume status:


- at this point the only thing that helps is killing (SIGTERM) bacula-sd and
restarting it.


- below is a level 200 debug output from SD:

backup1-sd: dircmd.c:173-0 Conn: Hello Director backup1-dir calling
backup1-sd: dircmd.c:188-0 Got a DIR connection at 04-Aug-2008 01:01:29
backup1-sd: jcr.c:603-0 OnEntry JobStatus=backup1-sd: jcr.c:623-0 OnExit 
JobStatus=C set=C
backup1-sd: cram-md5.c:73-0 send: auth cram-md5 <cut> ssl=0
backup1-sd: cram-md5.c:133-0 cram-get received: auth cram-md5 <cut> ssl=0
backup1-sd: cram-md5.c:152-0 sending resp to challenge: cut
backup1-sd: dircmd.c:210-0 Message channel init completed.
backup1-sd: dircmd.c:217-0 <dird: JobId=17914 job=Job.2008-08-04_00.30.15 
job_name=Job client_name=Client type=66 level=73 FileSet=ClientRootFileSet 
NoAttr=0 SpoolAttr=0 FileSetMD5=80/qX7cGYW+Id7AWRR/ZpA SpoolData=0 
WritePartAfterJob=1 PreferMountedVols=0
backup1-sd: dircmd.c:231-0 Do command: JobId=
backup1-sd: job.c:92-0 <dird: JobId=17914 job=Job.2008-08-04_00.30.15 
job_name=Job client_name=client type=66 level=73 FileSet=ClientRootFileSet 
NoAttr=0 SpoolAttr=0 FileSetMD5=80/qX7cGYW+Id7AWRR/ZpA SpoolData=0 
WritePartAfterJob=1 PreferMountedVols=0
backup1-sd: job.c:156-0 >dird jid=17914: 3000 OK Job SDid=46 SDtime=1217790409 
Authorization=cut
backup1-sd: pythonlib.c:237-0 No startup module.
backup1-sd: dircmd.c:217-0 <dird: use storage=ML6020 media_type=LTO3 
pool_name=DailyPoolA pool_type=Backup append=1 copy=0 stripe=0

backup1-sd: dircmd.c:231-0 Do command: use storage=
backup1-sd: reserve.c:695-0 jid=17914 <dird: use storage=ML6020 media_type=LTO3 
pool_name=DailyPoolA pool_type=Backup append=1 copy=0 stripe=0
backup1-sd: reserve.c:724-0 jid=17914 <dird device: use device=Autochanger
backup1-sd: reserve.c:741-0 jid=17914 Storage=ML6020 media_type=LTO3 
pool=DailyPoolA pool_type=Backup append=1
backup1-sd: reserve.c:743-0 jid=17914     Device=Autochanger
backup1-sd: reserve.c:919-0 jid=17914 PrefMnt=0 exact=0 suitable=0 chgronly=1
backup1-sd: reserve.c:1084-0 jid=17914 search res for Autochanger
backup1-sd: reserve.c:1087-0 jid=17914 Try match changer res=Autochanger
backup1-sd: reserve.c:1098-0 jid=17914 Try changer device Drive-1
backup1-sd: reserve.c:1161-0 jid=17914 chk MediaType device=LTO3 request=LTO3
backup1-sd: reserve.c:1184-0 try reserve Drive-1
backup1-sd: reserve.c:1198-0 jid=17914 have_vol=0 vol=
backup1-sd: reserve.c:1381-0 jid=17914 reserve_append device is "Drive-1" 
(/dev/nst0)
backup1-sd: reserve.c:1459-0 jid=17914 PrefMnt=0 exact=0 suitable=1 chgronly=1 
any=0
backup1-sd: reserve.c:1430-0 MaxJobs=0 Jobs=0 reserves=0 Status= Vol=
backup1-sd: reserve.c:1406-0 OK dev: "Drive-1" (/dev/nst0) num_writers=0, 
reserved, pool matches
backup1-sd: reserve.c:481-0 Inc reserve=0 dev="Drive-1" (/dev/nst0)
backup1-sd: reserve.c:1208-0 jid=17914 Reserved=1 dev_name=Autochanger 
mediatype=LTO3 pool=DailyPoolA ok=1
backup1-sd: askdir.c:259-0 dir_find_next_appendable_volume: reserved=1 Vol=
backup1-sd: askdir.c:276-0 >dird CatReq Job=Job.2008-08-04_00.30.15 FindMedia=1 
pool_name=DailyPoolA media_type=LTO3
backup1-sd: askdir.c:182-0 <dird 1000 OK VolName=000044 VolJobs=349 
VolFiles=902 VolBlocks=10255803 VolBytes=661622427648 VolMounts=453 VolErrors=0 
VolWrites=140167474 MaxVolBytes=0 VolCapacityBytes=0 VolStatus=Append Slot=4 
MaxVolJobs=0 MaxVolFiles=0 InChanger=1 VolReadTime=44066182 
VolWriteTime=17330364980 EndFile=901 EndBlock=3554 VolParts=0 LabelType=0 
MediaId=4
backup1-sd: askdir.c:205-0 do_reqest_vol_info return true slot=4 Volume=000044
backup1-sd: reserve.c:472-0 jid=17914 find_vol=000044 found=1
backup1-sd: reserve.c:189-0 jid=17914 List find_volume: 000041 in_use=0 on 
device "Drive-1" (/dev/nst0)
backup1-sd: reserve.c:189-0 jid=17914 List find_volume: 000044 in_use=0 on 
device "Drive-2" (/dev/nst1)
backup1-sd: reserve.c:645-0 jid=17914 Vol=000044 on "Drive-2" (/dev/nst1) we 
have "Drive-1" (/dev/nst0)
backup1-sd: reserve.c:649-0 jid=17914 Vol=000044 dev="Drive-2" (/dev/nst1) not 
busy.
backup1-sd: askdir.c:285-0 Call reserve_volume. Vol=000044
backup1-sd: reserve.c:316-0 jid=17914 enter reserve_volume=000044 
drive="Drive-1" (/dev/nst0)
backup1-sd: reserve.c:189-0 jid=17914 List begin reserve_volume: 000041 
in_use=0 on device "Drive-1" (/dev/nst0)
backup1-sd: reserve.c:189-0 jid=17914 List begin reserve_volume: 000044 
in_use=0 on device "Drive-2" (/dev/nst1)
backup1-sd: reserve.c:331-0 jid=17914 Vol attached=000041, newvol=000044 
volinuse=0 on "Drive-1" (/dev/nst0)
backup1-sd: reserve.c:348-0 jid=17914 reserve_vol free vol=000041 at 9e5cea8
backup1-sd: reserve.c:582-0 === clear in_use vol=000041
backup1-sd: reserve.c:585-0 jid=17914 === remove volume 000041 dev="Drive-1" 
(/dev/nst0)
backup1-sd: reserve.c:189-0 jid=17914 List free_volume: 000044 in_use=0 on 
device "Drive-2" (/dev/nst1)
backup1-sd: reserve.c:189-0 jid=17914 List reserve_vol free: 000044 in_use=0 on 
device "Drive-2" (/dev/nst1)
backup1-sd: reserve.c:236-0 jid=17914 new Vol=000044 at 9e5cea8 dev="Drive-1" 
(/dev/nst0)
backup1-sd: reserve.c:363-0 jid=17914 Found vol=000044 dev-same=0
backup1-sd: reserve.c:370-0 jid=17914 reserve_vol free-tmp vol=000044 at 9e65520
backup1-sd: reserve.c:388-0 ==== jid=17914 Swap vol=000044 from dev="Drive-2" 
(/dev/nst1) to "Drive-1" (/dev/nst0)
backup1-sd: reserve.c:575-0 jid=17914 No vol on dev "Drive-1" (/dev/nst0)
backup1-sd: reserve.c:418-0 jid=17914 === set in_use. vol=000044 dev="Drive-1" 
(/dev/nst0)
backup1-sd: reserve.c:189-0 jid=17914 List end new volume: 000044 in_use=1 on 
device "Drive-1" (/dev/nst0)
backup1-sd: askdir.c:292-0 dir_find_next_appendable_volume return true. 
vol=000044
backup1-sd: reserve.c:1220-0 jid=17914 looking for Volume=000044
backup1-sd: reserve.c:1279-0 jid=17914 >dird changer: 3000 OK use device 
device=Drive-1
backup1-sd: reserve.c:1107-0 jid=17914 Device Drive-1 reserved=1 for append.
backup1-sd: reserve.c:1053-0 jid=17914 available device found=Autochanger
backup1-sd: reserve.c:1068-0 OK dev found. Vol=000044
backup1-sd: dircmd.c:217-0 <dird: run
backup1-sd: dircmd.c:231-0 Do command: run
backup1-sd: job.c:171-0 Run_cmd: run
backup1-sd: jcr.c:603-0 OnEntry JobStatus=C set=F
backup1-sd: jcr.c:623-0 OnExit JobStatus=F set=F
backup1-sd: job.c:190-0 Job.2008-08-04_00.30.15 waiting 1800 sec for FD to 
contact SD key=cut
backup1-sd: bnet.c:667-0 who=client host=10.0.0.1 port=36643
backup1-sd: dircmd.c:173-0 Conn: Hello Start Job Job.2008-08-04_00.30.15
backup1-sd: dircmd.c:179-0 Got a FD connection at 04-Aug-2008 01:01:29
backup1-sd: job.c:237-0 Found Job Job.2008-08-04_00.30.15
backup1-sd: cram-md5.c:73-0 send: auth cram-md5 <cut> ssl=0
backup1-sd: cram-md5.c:133-0 cram-get received: auth cram-md5 <cut> ssl=0
backup1-sd: cram-md5.c:152-0 sending resp to challenge: cut
backup1-sd: job.c:254-0 OK Authentication jid=17914 Job Job.2008-08-04_00.30.15
backup1-sd: job.c:209-0 Running job Job.2008-08-04_00.30.15
backup1-sd: fd_cmds.c:122-0 Start run Job=Job.2008-08-04_00.30.15
backup1-sd: jcr.c:603-0 OnEntry JobStatus=F set=R
backup1-sd: jcr.c:623-0 OnExit JobStatus=R set=R
backup1-sd: fd_cmds.c:160-0 <filed: append open session
backup1-sd: fd_cmds.c:228-0 Append open session: append open session
backup1-sd: fd_cmds.c:238-0 >filed: 3000 OK open ticket = 46
backup1-sd: fd_cmds.c:160-0 <filed: append data 46
backup1-sd: fd_cmds.c:190-0 Append data: append data 46
backup1-sd: fd_cmds.c:192-0 <bfiled: append data 46
backup1-sd: append.c:73-0 Start append data. res=1
backup1-sd: acquire.c:366-0 acquire_append device is tape
backup1-sd: acquire.c:398-0 jid=17914 Do mount_next_write_vol
backup1-sd: mount.c:77-0 Enter mount_next_volume(release=0) dev="Drive-1" 
(/dev/nst0)
backup1-sd: mount.c:88-0 mount_next_vol retry=0
backup1-sd: mount.c:520-0 Swap unloading slot=4 "Drive-2" (/dev/nst1)
backup1-sd: acquire.c:539-0 JobId=17913 broadcast wait_device_release at 
04-Aug-2008 01:01:32
backup1-sd: bnet.c:667-0 who=client host=10.0.0.2 port=36643
backup1-sd: dircmd.c:173-0 Conn: Hello Director backup1-dir calling
backup1-sd: dircmd.c:188-0 Got a DIR connection at 04-Aug-2008 01:10:01
backup1-sd: jcr.c:603-0 OnEntry JobStatus=backup1-sd: jcr.c:623-0 OnExit 
JobStatus=C set=C
backup1-sd: cram-md5.c:73-0 send: auth cram-md5 <cut> ssl=0
backup1-sd: cram-md5.c:133-0 cram-get received: auth cram-md5 <cut> ssl=0
backup1-sd: cram-md5.c:152-0 sending resp to challenge: cut
backup1-sd: dircmd.c:210-0 Message channel init completed.
backup1-sd: dircmd.c:217-0 <dird: status
backup1-sd: dircmd.c:231-0 Do command: status

And that's it. After that, the SD hangs.



-- 
Leszek "Tygrys" Urbański, SCSA, SCNA
 "UN and OAS, they have their place, I guess,
  But first, send the Marines!" -- Tom Lehrer
     http://cygnus.moo.pl/ -- Cygnus High Altitude Balloon

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

<Prev in Thread] Current Thread [Next in Thread>