Hi,
Is anyone backing up total volumes of this order? And if so, what sort of
scaling, design and hardware?
I take it that's the size of your filesystems, not the estimated size
of the backup set (i.e. all cycles in the retention period)?
Assuming it is:
Yes - about 700TB and still growing.
Keeping the individual filesets to 1TB so that no single tape run is
excessive.
Largish changer - I'm about to retire a 500-slot NEO 8000 with 7
LTO5 drives in favour of a 120-slot Scalar i500 with 6 LTO6s.
If you don't have enough slots you'll be feeding it multiple times
during long weekends (we can easily peak at 20 tapes/day if multiple
fulls get kicked off).
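For a rough sense of scale, here's a back-of-envelope sketch (Python;
the cartridge capacities are native/uncompressed assumptions, 1.5TB
for LTO5 and 2.5TB for LTO6, the other figures are from above):

    # Rough tape and slot arithmetic, assuming native (uncompressed)
    # cartridge capacities.
    TOTAL_TB = 700                 # current data volume
    LTO5_TB, LTO6_TB = 1.5, 2.5    # native capacity per cartridge

    tapes_per_full_lto5 = TOTAL_TB / LTO5_TB   # ~467 cartridges
    tapes_per_full_lto6 = TOTAL_TB / LTO6_TB   # ~280 cartridges

    # How long a freshly loaded changer lasts at the 20 tapes/day peak:
    SLOTS, PEAK_TAPES_PER_DAY = 120, 20
    days_between_feeds = SLOTS / PEAK_TAPES_PER_DAY   # ~6 days

    print(f"Full pass on LTO5: ~{tapes_per_full_lto5:.0f} tapes")
    print(f"Full pass on LTO6: ~{tapes_per_full_lto6:.0f} tapes")
    print(f"{SLOTS} slots at peak: ~{days_between_feeds:.0f} days between feeds")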
If you don't have enough drives you won't keep up, let alone cope
with the inevitable drive failures and a 2-day turnaround for a
replacement. You absolutely must have at least one more drive than you
think you need to cope with the backup load. Apart from anything
else, it means you can run urgent restores without interrupting
backups in progress.
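To put numbers on "keeping up", a sketch assuming ~160MB/s native
streaming per LTO6 drive and no compression (the 60TB weekend load is
a hypothetical figure, not from the setup above):

    LTO6_MBPS = 160          # assumed native streaming rate, MB/s
    DRIVES = 6

    # One 1TB fileset on a single drive:
    hours_per_fileset = 1e6 / LTO6_MBPS / 3600                     # ~1.7 h

    # A hypothetical weekend where 60TB of fulls all land at once:
    weekend_tb = 60
    hours_total = weekend_tb * 1e6 / (LTO6_MBPS * DRIVES) / 3600   # ~17 h

    print(f"1TB fileset on one LTO6 drive: ~{hours_per_fileset:.1f} h")
    print(f"{weekend_tb}TB of fulls across {DRIVES} drives: ~{hours_total:.0f} h")
    # ...and that assumes every drive is healthy and streaming flat out.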
Large data safes. You'll need something like a Phoenix FS1903,
probably a couple (these hold about 800 LTOs apiece) and a strong
floor for them to sit on.
The tapes, safes and changer should all sit in close proximity in a
temperature-controlled _clean_ environment, preferably in their own
room, which is accessed as infrequently as possible. Dust kills
drives, and human skin is one of the worst contaminants: it's greasy,
while most other dust types are abrasive. Consider an air
scrubber and clean-room "flypaper" sticky sheets on the door
threshold.
Large (200GB+), high-performance SSD for spool. Consumer drives
become a bottleneck.
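Rough arithmetic on why a slow spool hurts (a sketch, again assuming
~160MB/s per LTO6 drive, not a measurement):

    LTO6_MBPS = 160
    DRIVES = 6
    SPOOL_GB = 200

    drain_mbps = LTO6_MBPS * DRIVES               # ~960MB/s if all drives stream
    minutes_of_buffer = SPOOL_GB * 1000 / drain_mbps / 60   # ~3.5 minutes

    print(f"Aggregate despool rate: {drain_mbps} MB/s")
    print(f"{SPOOL_GB}GB spool drained in ~{minutes_of_buffer:.1f} min at full tilt")
    # A consumer SSD that can't sustain reads at this rate (while new jobs
    # are still spooling in) leaves the drives stopping and starting.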
Something similar (RAID1) for the database, 500GB or so.
PostgreSQL - it just works. MySQL doesn't scale this large very well -
it will work, but you'll be constantly fighting with it.
LOTS of RAM for the DB box. I have 48GB in a 5-year-old machine. It's
due for an upgrade, but just about anything less than 5 years old with
an E5 CPU or better will do the job nicely.
10Gb/s connectivity. You can fudge it with LACP on 1Gb/s but it
becomes a bottleneck. Ditto on the fileservers themselves.
A decent network switch. The Huawei 6800 series is nicely specced
(1Tb/s-class switching throughput) and runs rings around equivalently
priced Cisco/Juniper kit - most of which uses the same Broadcom
Trident2/2+/3 chipsets anyway.
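The link-speed vs drive-speed arithmetic, as a sketch (ballpark
assumptions: ~90% of link bandwidth usable as payload, ~160MB/s
native per LTO6 drive):

    def usable_mbps(link_gbps, efficiency=0.9):
        """Approximate usable payload for a link, in MB/s."""
        return link_gbps * 1000 / 8 * efficiency

    LTO6_MBPS = 160
    for link_gbps in (1, 10):
        rate = usable_mbps(link_gbps)
        print(f"{link_gbps}Gb/s link: ~{rate:.0f} MB/s usable, "
              f"enough for ~{rate / LTO6_MBPS:.1f} LTO6 drives")
    # 1Gb/s can't even keep one drive streaming; 10Gb/s feeds several.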
We run 14-month retention on the backup cycle, with a full every 3
months, nightly incrementals and 4-weekly differentials. Rapidly
changing data in smaller sets gets monthly full backups. Thankfully
this is science data; financial data may need to be retained for up
to 7 years.
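That cycle is also why the backup set ends up much bigger than the
filesystems (see the question at the top). A rough count of what the
retention window holds per fileset, as a sketch that ignores calendar
edge cases:

    RETENTION_MONTHS = 14
    FULL_EVERY_MONTHS = 3
    DIFF_EVERY_WEEKS = 4

    fulls_kept = RETENTION_MONTHS // FULL_EVERY_MONTHS               # ~4
    diffs_kept = int(RETENTION_MONTHS * 4.33 / DIFF_EVERY_WEEKS)     # ~15
    incs_kept = int(RETENTION_MONTHS * 30.4)                         # ~425

    print(f"Per fileset: ~{fulls_kept} fulls, ~{diffs_kept} differentials, "
          f"~{incs_kept} incrementals retained")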
The most common restore is for accidental deletions, but we've had to
pull a few full fileset restores over the years - usually because someone
cheaped out and didn't RAID their box on the basis that "it's easy to
rebuild".
It never is unless it's a cookie-cutter setup - which they never are
after a week of operation - and it's less disruptive to change a dead
drive in a RAID set anyway (this can be done hot on Linux systems
using mdraid).
There's only ever been one major central store restore and that was
a runaway rm -rf. Unfortunately one group has a 200TB system which
is out of warranty but isn't being replaced for budget reasons. It's
being driven hard, and sooner or later it's going to drop its bundle.
I'm not looking forward to that day.
Regarding the data safes: people say "Iron Mountain", but backups
are not archives. You're going to cycle the tapes, and retrieving
them is much easier if they're local. A good fire safe will survive
an intense fire for 60 minutes and a 10-metre drop (simulating
building collapse) with the insides not going above 50°C, but it's
best to site your safes where they're least likely to get that kind
of experience, and pipe the data to them and the tape library.
Your single biggest hurdle is getting enough budget for the job.
Management usually won't spend enough on decent storage systems, and
they'll heavily resist spending on backup systems. "RAID is not
backup" usually doesn't sink in unless they've been burned a few
times.