Bacula-users

Subject: Re: [Bacula-users] ERROR Spooling/Backups with large amounts of data from windows server 2012
From: Hans Thueminger <bacula AT ipf.tuwien.ac DOT at>
To: lst_hoe02 AT kwsoft DOT de
Date: Thu, 21 Nov 2013 18:28:54 +0100
lst_hoe02 AT kwsoft DOT de wrote, On 21.11.2013 16:58:
> Quoting Hans Thueminger <bacula AT ipf.tuwien.ac DOT at>:
>
>> lst_hoe02 AT kwsoft DOT de wrote, On 21.11.2013 10:07:
>>> [...]
>>> Up to Windows 2008 R2 the supported volume size was 16TB; with
>>> Windows 2012 it is 64TB. Note that there are other constraints with
>>> VSS when it is used with volumes containing many files or under
>>> heavy load while doing snapshots. That said, you should always be
>>> able to back up without VSS, but open files will get you in trouble
>>> in that case.
>> I didn't find a way to make a backup without VSS, neither with the
>> Windows Server Backup program included in the operating system nor
>> with Bacula. But while writing this sentence I had an idea: what
>> happens if I back up a mount point instead of a drive letter?
>> Now I've just mounted the 120TB filesystem as a mount point in C:
>> (which is a 300GB filesystem), and look at this:
>>
>> 21-Nov 12:49 bacula-sd JobId 292: Spooling data ...
>> 21-Nov 12:49 fs2-fd JobId 292: Generate VSS snapshots. Driver="Win64
>> VSS", Drive(s)="C"
>>
>> and the status says:
>>
>> JobId 292 Job fs2-PHOTO.2013-11-21_12.46.00_11 is running.
>>      VSS Full Backup Job started: 21-Nov-13 12:47
>>      Files=31,223 Bytes=308,963,855,104 Bytes/sec=89,192,798 Errors=0
>>      Files Examined=31,223
>>      Processing file:
>> C:/PHOTO/Projects/PHOTO-Projects/09_ALS_Kaernten/GailtalLatschur/Dif-Gailtal-Latschur/s577_s592_p02.tif
>>
>> Projects is the mountpoint for the 120TB filesystem!
>>
>> Until now, when trying to back up G:/PHOTO-Projects/ (which is the
>> same 120TB filesystem as above), I always received the following error:
>>
>> 17-Sep 15:54 fs2-fd JobId 36: Generate VSS snapshots. Driver="Win64
>> VSS", Drive(s)="G"
>> 17-Sep 15:55 fs2-fd JobId 36: Fatal error: CreateSGenerate VSS
>> snapshots failed. ERR=The operation completed successfully.
>>
>> It seems that this is a way to trick Bacula and Windows :-) Of
>> course with this workaround we still have the problem with open
>> files, but the actual problem which I want to discuss with you is
>> the error I receive after creating the second or third or sometimes
>> a subsequent spool file:
> Uhm, no. The idea was to set "enable vss = no", but as said this only
> works, more or less, if the volume in question does not have open files.
OK, now it's clear why it makes no difference whether Enable VSS is set 
to yes or no: there are always open files on these filesystems (they are 
already in use)...
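
For reference, a FileSet for the mount-point workaround would look 
roughly like the sketch below. The resource name and the exact mount 
point path are placeholders (the path is guessed from the job log 
above), and only the directives relevant to this discussion are shown:

   FileSet {
     Name = "fs2-PHOTO-fileset"     # placeholder name
     Enable VSS = no                # no snapshot; open files would then fail or be skipped
     Include {
       Options {
         Signature = MD5
       }
       File = "C:/PHOTO/Projects"   # mount point of the 120TB filesystem
     }
   }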


>
>>>> Now I'm glad to be able to make backups and restores of files from
>>>> our 4x40TB filesystems with Bacula if they are not too big (< 5TB).
>>>> That works fine. If they are too big (> 15TB) I always get an error
>>>> after creating the second or third or sometimes a subsequent spool
>>>> file (Error: lib/bsock.c...). Never for the first spool file! I've
>>>> tried several spool sizes (from 500GB to 16TB) and different network
>>>> settings. In the attachment (bacula_mailing_list_some_error_logs.txt)
>>>> you can find some logs from when the error occurred. What I have also
>>>> tried:
>>>> - using different network interfaces (at the moment an Ethernet cable
>>>> is connected directly (no switch) between the file server and the
>>>> backup server, and this connection is used for the backups (checked
>>>> with netstat))
>>>> - heartbeat: enabling it on the SD (60 seconds) and also setting
>>>> net.ipv4.tcp_keepalive_time to 60
>>>> - AllowCompression yes/no
>>>> - many, many hours of trials
>>> So you really have files 15TB in size? The following would be worth
>>> a try:
>> Sorry, that was badly written of me. It's not the size of one file,
>> it's the total size of all files. So what I wanted to write was: "if
>> the amount of data to be backed up is not too large (< 5TB) it works.
>> If the amount of data to be backed up is larger than 15TB it always
>> fails!"
>>
> This might point to another problem case. We had a similar problem on
> our main filer: backups up to around 2TB succeeded, but anything above
> that had a ~50% chance of failing with network errors. We switched the
> NIC to an Intel plug-in card and the problem went away.
I thought I could rule out such hardware problems, because I have already 
tried different network interfaces. But it's good to know that such an 
error can be caused by a NIC too!
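
For completeness, the heartbeat/keepalive settings I mentioned earlier 
look roughly like this (the resource name is a placeholder, and only the 
relevant lines are shown):

   # bacula-sd.conf, Storage resource
   Storage {
     Name = backupserver-sd        # placeholder name
     ...
     Heartbeat Interval = 60       # keep the FD connection alive during long quiet phases
   }

   # on the Linux backup server (sysctl)
   net.ipv4.tcp_keepalive_time = 60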


>
>>> - Increase the "maximum file size" for the tape drive. The default is
>>> 1G and it limits how big the "blocks" on tape are between EOF markers.
>>> Maybe the counter per file is an int and you therefore have trouble
>>> with 15TB files?
>> I guess the spool file is written as one file to the tape, which
>> would mean that for every spool file only one EOF marker would be
>> written? Can you confirm that, or am I wrong? What would you
>> suggest setting "maximum file size" to?
> No, the spool file is only there to decouple the streaming to tape from
> network and client speed and delays. There is not much buffering when
> writing to tape, so the spool area needs to be able to constantly
> deliver data faster than the tape can consume it. The data is still
> written to tape with "maximum block size" per transfer and an EOF
> marker every "maximum file size".
>
>>> - You should increase the default block size written to tape by
>>> setting "maximum block size" to, for example, 2M. Warning: you cannot
>>> read already written tapes with a non-matching block size.
>> Ok and thank you for the warning, that's rather good to know!
>>
>>> - The spool area doesn't need to be that big, but it must be really
>>> fast to saturate the tape drive and keep it streaming. Something
>>> like a fast SSD or similar is recommended.
>> The tapes we are using have a native capacity of 4TB. I thought that
>> should be the minimum spool size, to prevent start and stop
>> operations of the drives. With hardware compression sometimes more
>> than 9TB is written to a tape, so I decided the ideal spool size is
>> about 16TB?!  So where is my error in reasoning?
>>
> There is no problem in cutting the job into pieces of a few GB with
> the tape being idle in between. The problem arises if you can deliver
> data, but not fast enough. Then the tape drive tries to reach its
> maximum speed, tries to settle down if the data comes in too slowly,
> and stops and rewinds if nothing helps. To prevent this you need data
> chunks that can be delivered at (much) more than the tape can handle,
> so that within a chunk you never have an "underrun" of data. IMHO a
> smaller, faster spool is always better than a bigger, slower spool.
Thank you for this explanation. Then we will rebuild the backup server, 
remove some of the HDs, and put some SSDs in the server...
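
So something like the following sketch is what I have in mind for the 
tape device and job. All names, paths and sizes are only examples; the 
2M block size is taken from your suggestion, and as you warned, tapes 
already written with another block size can then no longer be read:

   # bacula-sd.conf, tape Device resource
   Device {
     Name = LTO-Drive                  # placeholder name
     Archive Device = /dev/nst0        # placeholder device
     Media Type = LTO
     Maximum Block Size = 2M           # larger blocks on tape, as suggested
     Maximum File Size = 5G            # EOF marker every 5G instead of the 1G default
     Spool Directory = /backup/spool   # to be placed on the fast SSDs
     Maximum Spool Size = 500G         # smaller but fast spool instead of 16TB on slow disks
   }

   # bacula-dir.conf, Job resource
   Job {
     Name = fs2-PHOTO                  # placeholder name
     ...
     Spool Data = yes                  # spool to SSD first, then despool to tape at full speed
   }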


Regards

Hans



_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users