Re: [Bacula-users] job scheduling snafoo

2010-12-15 10:21:15
Subject: Re: [Bacula-users] job scheduling snafoo
From: Hugo <mail AT hugo DOT ro>
To: Martin Simmons <martin AT lispworks DOT com>
Date: Tue, 14 Dec 2010 16:54:12 +0100
On Tue, Dec 14, 2010 at 12:31:07PM +0000, Martin Simmons wrote:
> >>>>> On Tue, 14 Dec 2010 09:26:40 +0100, Hugo  said:
> > 
> > hi list
> > 
> > I had to do a full backup of our archive, which is about 5.5 TB big. I ran
> > into some problems a few times, like the hardcoded 6 days job running
> > limit, so I split the archive, moved one of the folders out of my backup
> > dir and started the job. after that, I moved it back, and started the
> > backup again, thinking that bacula will back it up incrementally. I was
> > wrong about that, since it is an archive, so the dates on the files are
> > all older than the date the last job was run. then I tried an
> > "Accurate = on" on the job, but I got a lot of DB errors like:
> > ----
> > 10-Dec 13:20 appendix-dir JobId 80: Fatal error: sql_get.c:998 sql_get.c:998 query SELECT
> > MediaId,VolumeName,VolJobs,VolFiles,VolBlocks,VolBytes,VolMounts,VolErrors,VolWrites,MaxVolBytes,VolCapacityBytes,MediaType,VolStatus,PoolId,VolRetention,VolUseDuration,MaxVolJobs,MaxVolFiles,Recycle,Slot,FirstWritten,LastWritten,InChanger,EndFile,EndBlock,VolParts,LabelType,LabelDate,StorageId,Enabled,LocationId,RecycleCount,InitialWrite,ScratchPoolId,RecyclePoolId,VolReadTime,VolWriteTime,ActionOnPurge
> > FROM Media WHERE VolumeName='BA1014L2' failed:
> > server sent data ("D" message) without prior row description ("T" message)
> > could not receive data from server: Operation now in progress
> > ----
> > I think they happened because the SELECT took a really long time (2.11min.)
> > to get the backed up files from the DB (lots of files, 3.22TB) and bacula
> > somehow choked on the data.
> > so, I decided to set "Accurate = no" back again, and touch all the files in
> > the folder i moved back (1.9TB) to the archive dir to 2010.12.12 19:44. I
> > then manually started a job this sunday at 2010.12.12 19:55. it completed
> > yesterday 2010.12.13 at 22:17.
> > my problem is, since it should be an incremental archive backup, I
> > scheduled it to run on mondays at 06:66 AM, so the job scheduled itself
> > again while it was running. it was a funny listed job, it kept throwing an
> > error if I tried to get details, saying that I should try to list the jobs
> > with the option "catalog=all". the manually called job completed
> > successfully, but the other one, auto-scheduled on 6AM a day before,
> > started just after that. now I canceled it, but it already wrote on 4
> > tapes (LTO2).
> > I suspect it started because bacula thought that the touched files weren't
> > saved, since it didn't yet complete the job.
> > shouldn't bacula take the last file dates from when the job starts, not
> > when it was scheduled?
> 
> It does take the date from when it starts, but I suspect the second job
> started and was then suspended until the first job finished.  It doesn't
> reread the date in that case.
> 
> 
> >                what should I do with the last 4 written tapes? should I now
> > purge the files written by the job, or even delete the job? or just ignore
> > it, and live with the fact that i wasted 4 tapes..
> > I just want to bring the tapes to the bank and be done with it already :)
> 
> Yes, you can use the delete jobid command to delete the unwanted incremental
> job and the purge volume command to make the 4 tapes reusable.
> 
> However, I would seriously worry about the integrity of the backup, given this
> confusion about dates.  Consider redoing everything with two separate jobs and
> two separate filesets, split so that the jobs are small enough to run in a
> reasonable time.
> 
> __Martin
> 
> ------------------------------------------------------------------------------
> Lotusphere 2011
> Register now for Lotusphere 2011 and learn how
> to connect the dots, take your collaborative environment
> to the next level, and enter the era of Social Business.
> http://p.sf.net/sfu/lotusphere-d2d
> _______________________________________________
> Bacula-users mailing list
> Bacula-users AT lists.sourceforge DOT net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
> 
hmm, I decided to let it be, and ignore the 4 tapes :)
I started the job again, and it made a correct backup of
the files that had a date newer than the last time the
job was run, so everything is ok so far.
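
The date comparison behind this can be sketched roughly as follows. This is only an illustration, not Bacula's code: it assumes an incremental job picks every file modified after the start of the last completed job, the file names are made up, and the job start times are the ones from the job listing further down. It also shows why a job that fixed its reference time before the manual job had finished would pick up the touched files a second time.

```python
from datetime import datetime

def select_incremental(files, since):
    """Return the names of files whose modification time is later than `since`."""
    return sorted(name for name, mtime in files.items() if mtime > since)

# File names are hypothetical; the timestamps come from the thread.
files = {
    "archive/old/report.pdf": datetime(2009, 3, 1, 12, 0),     # hypothetical untouched archive file
    "archive/moved/scan.tif": datetime(2010, 12, 12, 19, 44),  # touched before the manual job
}

job89_start = datetime(2010, 12, 12, 18, 18, 40)  # last job completed before the manual run
job90_start = datetime(2010, 12, 12, 19, 56, 1)   # the manual job

# Reference time fixed while the manual job was still running: touched file is selected again.
stale = select_incremental(files, job89_start)
# Reference time taken after the manual job completed: nothing left to back up.
fresh = select_incremental(files, job90_start)
```

With these values, `stale` contains the touched file and `fresh` is empty, which matches what happened with the queued job and with the later run.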
but...
I wanted to test a restore from one job, and then "the bat"
just dies, with a lot of errors from DB SELECT statements,
like the one I described before.
I get the same error when I try to restore from bconsole:

---
5: Select the most recent backup for a client
Select item:  (1-13): 5
Automatically selected Client: appendix-fd
Automatically selected FileSet: Archiv
+-------+-------+-----------+-------------------+---------------------+------------+
| jobid | level | jobfiles  | jobbytes          | starttime           | volumename |
+-------+-------+-----------+-------------------+---------------------+------------+
|    84 | F     | 1,297,329 | 3,462,784,586,845 | 2010-12-10 18:00:02 | BA1000L2   |
|    84 | F     | 1,297,329 | 3,462,784,586,845 | 2010-12-10 18:00:02 | BA1001L2   |
|    84 | F     | 1,297,329 | 3,462,784,586,845 | 2010-12-10 18:00:02 | BA1002L2   |
|    84 | F     | 1,297,329 | 3,462,784,586,845 | 2010-12-10 18:00:02 | BA1003L2   |
|    84 | F     | 1,297,329 | 3,462,784,586,845 | 2010-12-10 18:00:02 | BA1004L2   |
|    84 | F     | 1,297,329 | 3,462,784,586,845 | 2010-12-10 18:00:02 | BA1005L2   |
|    84 | F     | 1,297,329 | 3,462,784,586,845 | 2010-12-10 18:00:02 | BA1006L2   |
|    84 | F     | 1,297,329 | 3,462,784,586,845 | 2010-12-10 18:00:02 | BA1007L2   |
|    84 | F     | 1,297,329 | 3,462,784,586,845 | 2010-12-10 18:00:02 | BA1008L2   |
|    84 | F     | 1,297,329 | 3,462,784,586,845 | 2010-12-10 18:00:02 | BA1009L2   |
|    84 | F     | 1,297,329 | 3,462,784,586,845 | 2010-12-10 18:00:02 | BA1010L2   |
|    84 | F     | 1,297,329 | 3,462,784,586,845 | 2010-12-10 18:00:02 | BA1011L2   |
|    84 | F     | 1,297,329 | 3,462,784,586,845 | 2010-12-10 18:00:02 | BA1012L2   |
|    84 | F     | 1,297,329 | 3,462,784,586,845 | 2010-12-10 18:00:02 | BA1013L2   |
|    85 | I     |       379 |        94,578,029 | 2010-12-12 13:42:55 | BA1013L2   |
|    86 | I     |       380 |        94,600,045 | 2010-12-12 14:20:34 | BA1013L2   |
|    87 | I     |       409 |        99,082,925 | 2010-12-12 17:15:27 | BA1013L2   |
|    88 | I     |       380 |        94,578,029 | 2010-12-12 17:58:30 | BA1013L2   |
|    89 | I     |       378 |        94,578,029 | 2010-12-12 18:18:40 | BA1013L2   |
|    90 | I     |   402,618 | 2,062,750,781,319 | 2010-12-12 19:56:01 | BA1013L2   |
|    90 | I     |   402,618 | 2,062,750,781,319 | 2010-12-12 19:56:01 | BA1014L2   |
|    90 | I     |   402,618 | 2,062,750,781,319 | 2010-12-12 19:56:01 | BA1015L2   |
|    90 | I     |   402,618 | 2,062,750,781,319 | 2010-12-12 19:56:01 | BA1016L2   |
|    90 | I     |   402,618 | 2,062,750,781,319 | 2010-12-12 19:56:01 | BA1017L2   |
|    90 | I     |   402,618 | 2,062,750,781,319 | 2010-12-12 19:56:01 | BA1018L2   |
|    90 | I     |   402,618 | 2,062,750,781,319 | 2010-12-12 19:56:01 | BA1019L2   |
|    90 | I     |   402,618 | 2,062,750,781,319 | 2010-12-12 19:56:01 | BA1020L2   |
|    90 | I     |   402,618 | 2,062,750,781,319 | 2010-12-12 19:56:01 | BA1021L2   |
|    92 | I     |       378 |        94,578,029 | 2010-12-14 09:19:17 | BA1024L2   |
+-------+-------+-----------+-------------------+---------------------+------------+
You have selected the following JobIds: 84,85,86,87,88,89,90,92

Building directory tree for JobId(s) 84,85,86,87,88,89,90,92 ...
  Query failed: SELECT Path.Path, Filename.Name, Temp.FileIndex, 
Temp.JobId, LStat, MD5 FROM ( SELECT DISTINCT ON (FilenameId, 
PathId) StartTime, JobId, FileId, FileIndex, PathId, FilenameId, 
LStat, MD5 FROM (SELECT FileId, JobId, PathId, FilenameId, FileIndex,
 LStat, MD5 FROM File WHERE JobId IN (84,85,86,87,88,89,90,92) 
UNION ALL SELECT File.FileId, File.JobId, PathId, FilenameId, 
File.FileIndex, LStat, MD5 FROM BaseFiles JOIN File USING (FileId) 
WHERE BaseFiles.JobId IN (84,85,86,87,88,89,90,92) ) 
AS T JOIN Job USING (JobId) ORDER BY FilenameId, PathId, StartTime DESC
 ) AS Temp JOIN Filename ON (Filename.FilenameId = Temp.FilenameId) 
JOIN Path ON (Path.PathId = Temp.PathId) WHERE FileIndex > 0 ORDER 
BY Temp.JobId, FileIndex ASC: 
 ERR=could not receive data from server: Operation now in progress

For one or more of the JobIds selected, no files were found,
so file selection is not possible.
Most likely your retention policy pruned the files.

Do you want to restore all the files? (yes|no): no
---
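
For readers who don't want to untangle the failing query: as far as I can read it, the DISTINCT ON (FilenameId, PathId) part ordered by StartTime DESC keeps, for every file, only the record from the most recently started job, and the outer part drops rows with FileIndex 0 and sorts by JobId and FileIndex. A minimal Python sketch of that logic is below. The JobIds and start times are from the listing above; the path and filename ids are invented for illustration, and the function is a stand-in, not anything Bacula ships.

```python
from typing import NamedTuple

class FileRow(NamedTuple):
    job_id: int
    start_time: str   # "YYYY-MM-DD HH:MM:SS", which sorts correctly as plain text
    path_id: int      # hypothetical id
    filename_id: int  # hypothetical id
    file_index: int

def latest_versions(rows):
    """Keep, per (filename_id, path_id), the row from the most recently started job."""
    newest = {}
    for row in rows:
        key = (row.filename_id, row.path_id)
        if key not in newest or row.start_time > newest[key].start_time:
            newest[key] = row
    # Mirror the outer "WHERE FileIndex > 0 ORDER BY Temp.JobId, FileIndex ASC".
    kept = [r for r in newest.values() if r.file_index > 0]
    return sorted(kept, key=lambda r: (r.job_id, r.file_index))

rows = [
    FileRow(84, "2010-12-10 18:00:02", path_id=1, filename_id=1, file_index=1),
    FileRow(84, "2010-12-10 18:00:02", path_id=1, filename_id=2, file_index=2),
    FileRow(90, "2010-12-12 19:56:01", path_id=1, filename_id=2, file_index=1),
]
result = latest_versions(rows)
```

Here the second file was saved by both JobId 84 and JobId 90, so only the JobId 90 record survives. With roughly 1.7 million file records across the selected jobs, the real query has to sort and return a very large result set in one go.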

what seems to be the problem here? is the director
choking on the SQL data from the server? it is a rather
large response, the PostgreSQL server needs around 2 minutes to
answer it, and if I pipe it into a file, it is a few hundred
MB big.
I tested bacula before, with another dir, and I succeeded in restoring
both with file selection (the bat under windows) and a full restore.
this is pretty inconvenient if we ever have to restore something
from the archive and we know only the file/dir name.
perhaps it runs ok if we restore a full job, and it doesn't have to list
the files first, but if not, I would have to get the file list from
postgres first, and then restore only those files with method:
    11: Enter a list of directories to restore for found JobIds
that sucks :(
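
The workaround would look roughly like the sketch below: look up the directories that contain a given file name, then feed those to option 11. This is a sketch under assumptions. It uses an in-memory SQLite database as a stand-in for the PostgreSQL catalog, a cut-down schema containing only the File, Path and Filename columns that appear in the failing query above, and made-up paths and file names. The helper find_directories is hypothetical.

```python
import sqlite3

# Stand-in catalog: only the columns seen in the query above, with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Path (PathId INTEGER PRIMARY KEY, Path TEXT);
CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE File (FileId INTEGER PRIMARY KEY, JobId INTEGER,
                   PathId INTEGER, FilenameId INTEGER, FileIndex INTEGER);
INSERT INTO Path VALUES (1, '/archive/2008/'), (2, '/archive/2009/');
INSERT INTO Filename VALUES (1, 'report.pdf'), (2, 'scan.tif');
INSERT INTO File VALUES (1, 84, 1, 1, 1), (2, 84, 2, 2, 2), (3, 90, 2, 1, 1);
""")

def find_directories(conn, name, job_ids):
    """Return the distinct directories holding `name` in any of the given jobs."""
    placeholders = ",".join("?" for _ in job_ids)
    sql = (
        "SELECT DISTINCT Path.Path FROM File "
        "JOIN Filename ON Filename.FilenameId = File.FilenameId "
        "JOIN Path ON Path.PathId = File.PathId "
        f"WHERE Filename.Name = ? AND File.JobId IN ({placeholders}) "
        "ORDER BY Path.Path"
    )
    return [row[0] for row in conn.execute(sql, [name, *job_ids])]

dirs = find_directories(conn, "report.pdf", [84, 85, 86, 87, 88, 89, 90, 92])
```

Because the lookup filters on a single file name, the result stays small, unlike the full directory tree query that fails above.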

greets
hugo.-


