Subject: Re: [Bacula-users] ERROR Spooling/Backups with large amounts of data from windows server 2012
From: lst_hoe02 AT kwsoft DOT de
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 21 Nov 2013 16:58:22 +0100
Quoting Hans Thueminger <bacula AT ipf.tuwien.ac DOT at>:

> lst_hoe02 AT kwsoft DOT de wrote, On 21.11.2013 10:07:
>> [...]
>> Up to Windows 2008 R2 the supported volume size was 16TB; with
>> Windows 2012 it is 64TB. Note that there are other constraints with
>> VSS when used with volumes containing many files or under heavy load
>> while taking snapshots. That said, you should always be able to back
>> up without VSS, but open files will get you in trouble in that case.
> I didn't find a way to make a backup without VSS, neither with the
> Windows Server Backup program included in the operating system nor
> with Bacula. But while writing this sentence I had an idea: what
> happens if I back up a mount point instead of a drive letter?
> Now I've just mounted the 120TB filesystem as a mount point in C:
> (which is a 300GB filesystem), and look at this:
>
> 21-Nov 12:49 bacula-sd JobId 292: Spooling data ...
> 21-Nov 12:49 fs2-fd JobId 292: Generate VSS snapshots. Driver="Win64  
> VSS", Drive(s)="C"
>
> and the status says:
>
> JobId 292 Job fs2-PHOTO.2013-11-21_12.46.00_11 is running.
>     VSS Full Backup Job started: 21-Nov-13 12:47
>     Files=31,223 Bytes=308,963,855,104 Bytes/sec=89,192,798 Errors=0
>     Files Examined=31,223
>     Processing file:  
> C:/PHOTO/Projects/PHOTO-Projects/09_ALS_Kaernten/GailtalLatschur/Dif-Gailtal-Latschur/s577_s592_p02.tif
>
> Projects is the mountpoint for the 120TB filesystem!
>
> Until now, when trying to back up G:/PHOTO-Projects/ (which is the
> same 120TB filesystem as above), I always received the following error:
>
> 17-Sep 15:54 fs2-fd JobId 36: Generate VSS snapshots. Driver="Win64  
> VSS", Drive(s)="G"
> 17-Sep 15:55 fs2-fd JobId 36: Fatal error: CreateSGenerate VSS  
> snapshots failed. ERR=The operation completed successfully.
>
> It seems that this is a way to trick Bacula and Windows :-) Of
> course with this workaround we still have the problems with open
> files, but the actual problem which I want to discuss with you is
> the error I receive after creating the second or third or sometimes
> a later spool file:

Uhm, no. The idea was to set "enable vss = no", but as said, this only
more or less works if the volume in question has no open files.
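
For reference, "Enable VSS" is a FileSet directive, so that would amount
to a FileSet resource roughly like the following (resource name and file
path are only illustrative, untested here):

   FileSet {
     Name = "Photo-NoVSS"      # example name
     Enable VSS = no           # no snapshot; open files may fail to back up
     Include {
       Options {
         signature = MD5
       }
       File = "G:/"            # the 120TB volume
     }
   }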

>>> Now I'm glad to be able to make backups and restores of files on our
>>> 4x40TB filesystems with Bacula if they are not too big (< 5TB). That
>>> works fine. If they are too big (> 15TB) I always get an error after
>>> creating the second or third or sometimes a later spool file
>>> (Error: lib/bsock.c...). Never for the first spool file! I've tried
>>> several spool sizes (from 500GB to 16TB) and different network
>>> settings. In the attachment (bacula_mailing_list_some_error_logs.txt)
>>> you can find some logs from when the error occurred. What I have also
>>> tried:
>>> - using different network interfaces (at the moment an Ethernet cable
>>> is directly connected (no switch) between the file server and the
>>> backup server, and this connection is used for the backups (checked
>>> with netstat))
>>> - heartbeat: enabled on the SD (60 seconds), and
>>> net.ipv4.tcp_keepalive_time also set to 60
>>> - AllowCompression Yes/No
>>> - many, many hours of trials
>> So you really have single files 15TB in size? The following would be
>> worth a try:
> Sorry, that was badly worded on my part. It's not the size of one
> file, it's the total size of all files. So what I wanted to write was:
> "if the amount of files to be backed up is not too large (< 5TB) it
> works. If the amount of files to be backed up is larger than 15TB it
> always fails!"
>

This might point to another kind of problem. We had a similar problem on
our main filer: backups up to around 2TB succeeded, while anything above
had a ~50% chance of failing with network errors. We switched the NIC to
an Intel plug-in card and the problem went away.
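
(Side note: the heartbeat setting you tried goes into the Storage
resource of bacula-sd.conf; roughly, with the values from your mail:

   # bacula-sd.conf, inside the Storage resource:
   Heartbeat Interval = 60

   # Linux side, the matching kernel keepalive:
   sysctl -w net.ipv4.tcp_keepalive_time=60

That keeps otherwise idle data connections alive, e.g. while the SD is
busy despooling.)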


>>
>> - Increase the "maximum file size" for the tape drive. The default is
>> 1G, and it limits how big the "blocks" on tape are between EOF markers.
>> Maybe the per-file counter is an int and you therefore have trouble
>> with 15TB files?
> I guess the spool file is written as one file to the tape, which
> would mean that for every spool file only one EOF marker would be
> written? Can you confirm that, or am I wrong? What would you
> suggest setting "maximum file size" to?

No, the spool file is only there to decouple the streaming to tape from
network and client speed and delays. There is not much buffering when
writing to tape, so the spool area needs to be able to constantly
deliver data faster than the tape can consume it. The data is still
written to tape with "maximum block size" per transfer and an EOF
marker every "maximum file size".

>> - You should increase the default block size written to tape by
>> setting "maximum block size" to, for example, 2M. Warning: you cannot
>> read already-written tapes with non-matching block sizes.
> Ok and thank you for the warning, that's rather good to know!
>
>> - The spool area doesn't need to be that big, but it must be really
>> fast in order to saturate the tape drive and keep it streaming.
>> Recommended is something like a fast SSD or similar.
> The tapes we are using have a native capacity of 4TB. I thought that
> should be the minimum spool size, to prevent start-and-stop
> operations of the drives. With hardware compression sometimes more
> than 9TB are written to a tape, so I decided the ideal spool size is
> about 16TB?! So where is my error in reasoning?
>

There is no problem in cutting the job into pieces of some GB with the
tape being idle in between. The problem arises if you can deliver
data, but not fast enough: the tape drive tries to reach its maximum
speed, tries to settle to a lower speed if the data comes too slowly,
and stops and rewinds if nothing helps. To prevent this you need data
chunks deliverable at (much) more than the rate the tape can handle,
so that within a chunk you never have an "underrun" of data. IMHO a
smaller, faster spool is always better than a bigger, slower spool.
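
To sketch what that means in the config (paths and sizes are only
illustrative):

   # bacula-sd.conf, inside the tape Device resource:
   Spool Directory = /ssd/bacula-spool   # fast SSD storage
   Maximum Spool Size = 500G             # small and fast beats huge and slow

   # bacula-dir.conf, inside the Job resource:
   Spool Data = yes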

Regards

Andreas

