Re: [Networker] VTL/Dedup

2007-06-29 07:20:46
Subject: Re: [Networker] VTL/Dedup
From: Stuart Whitby <swhitby AT DATAPROTECTORS.CO DOT UK>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 29 Jun 2007 12:17:39 +0100
Hi Joel,
 
An unlimited slot license would be handy in this situation, and immediately 
removes any need for "manual media management" of your VTL in future.  
Personally, with 30TB and an unlimited license, I'd go for a 500 slot library, 
initially "filled" with 300 LTO1 tapes.  The media type makes no difference to 
the library, but the 100GB capacity makes things much simpler for quick 
calculations :)  This also gives you room for decent expansion in the number of 
disk trays without any hassle.  Keeping a large number of small tapes will 
allow for better utilisation though - 50GB tapes would probably be a better bet 
(to still keep the easy calculation factor).  Tapes can be set to a maximum 
fixed size in the CDL, where we generally use 96GB rather than 100 to allow for 
direct Vtape-tape cloning if an equivalent drive is ever attached to the back 
of the CDL.  This is likely to save any hassles with running out of space on 
the physical tape.
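The sizing arithmetic above is easy to check. The sketch below assumes decimal units (1TB = 1000GB) and uses only the figures from this message: ~30TB of VTL disk, 100GB, 96GB or 50GB virtual tapes, and a 100GB native LTO1 cartridge as the clone target. The function names are hypothetical.

```python
# Rough virtual-tape sizing for the ~30TB VTL discussed above.
# Assumption: decimal units (1TB = 1000GB); the CDL may report sizes differently.

def tapes_for_capacity(capacity_gb: int, tape_gb: int) -> int:
    """Whole virtual tapes of tape_gb that fit into capacity_gb of disk."""
    return capacity_gb // tape_gb


def clone_headroom_gb(vtape_gb: int, physical_gb: int) -> int:
    """Spare space left on a physical cartridge after cloning one full vtape to it."""
    return physical_gb - vtape_gb


CAPACITY_GB = 30_000   # ~30TB of VTL disk
LTO1_NATIVE_GB = 100   # native LTO1 cartridge size

for vtape_gb in (100, 96, 50):
    count = tapes_for_capacity(CAPACITY_GB, vtape_gb)
    spare = clone_headroom_gb(vtape_gb, LTO1_NATIVE_GB)
    print(f"{vtape_gb}GB vtapes: {count} tapes, {spare}GB spare per LTO1 clone")
```

At 50GB, covering all 30TB would take 600 tapes, so the virtual library would need more than the 500 slots suggested above.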
 
Out of interest, I've just been reading through the 7.4 release notes, and 
your jukebox license will no longer work with a VTL from this version onwards.  
"Each VTL hardware frame requires one VTL frame license and will support an 
unlimited number of VTLs on that frame."  Temporary enablers are provided in 
the release notes, but you'll need to relicense for a VTL from that version.  
Unless you're being ambitious and starting with it ;)
 
Capacity utilisation when I arrived here was near 100%, and that was without 
the cloning which they wanted (all backups being held on site isn't the best 
idea).  The biggest problem was that this had been installed using what I'd 
reckon to be a "standard" backup policy, with regular full backups.  The other 
big problem was that they were doing snapshot backups on about 14TB of disk on 
an incremental policy.  Even shifting this to differential didn't help, as 
differential backup still relies on the archive bit for backup.  The archive 
bit may have been reset on the snapshot, but wasn't touched on the source 
volume, so the file was backed up again the following night.  Setting the 
NSR_AVOID_ARCHIVE environment variable to Yes (not the "No" as per the admin 
guide) solved this and brought down the amount of data hugely.  I don't have 
particularly good data on the amount backed up per month.  The last 3 months 
show around 30TB according to NMC, but there's a big difference in the 
percentage of successful backups now, which is near 100%.  Previously, the 
system would be running out of space on a very regular basis and backups 
regularly failing for a variety of reasons.
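The snapshot problem comes down to where the archive bit gets cleared. The toy model below assumes that a snapshot copies the source's archive bits, that an archive-bit backup clears the bit only on what it read (the snapshot), and that NSR_AVOID_ARCHIVE=Yes makes selection fall back to modification times. That last point is a reading of this post, not confirmed NetWorker internals, and FileState, take_snapshot and the two backup functions are hypothetical.

```python
import copy
from dataclasses import dataclass


@dataclass
class FileState:
    archive_bit: bool  # set by Windows when the file is written
    mtime: int         # modification time on an arbitrary integer clock


def take_snapshot(volume):
    # A snapshot is a point-in-time copy, archive bits included.
    return copy.deepcopy(volume)


def archive_bit_backup(snapshot):
    """Back up files with the archive bit set, then clear it (on the snapshot only)."""
    chosen = sorted(name for name, f in snapshot.items() if f.archive_bit)
    for name in chosen:
        snapshot[name].archive_bit = False
    return chosen


def mtime_backup(snapshot, last_backup_time):
    """Time-based selection that never reads or writes the archive bit."""
    return sorted(name for name, f in snapshot.items() if f.mtime > last_backup_time)


source = {
    "report.doc": FileState(archive_bit=True, mtime=1),
    "old.log": FileState(archive_bit=False, mtime=0),
}

# Two nights with no changes to the source volume in between.
night1 = archive_bit_backup(take_snapshot(source))
night2 = archive_bit_backup(take_snapshot(source))
print(night1, night2)  # report.doc is picked up again on night 2

t1 = mtime_backup(take_snapshot(source), last_backup_time=0)
t2 = mtime_backup(take_snapshot(source), last_backup_time=1)
print(t1, t2)  # time-based selection skips it on night 2
```

Because the source volume's bit is never reset, every new snapshot inherits it, and the unchanged file is reselected each night.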
 
All levels are currently kept in the VTL.  That's why the "differential only" 
policy.  There is no point in doing a weekly full to a VTL given the protection 
of RAID and a reasonable cloning policy.  
 
I'll be leaving this site in a couple of weeks.  There are Celerras, but no 
Unix servers in play here whatsoever, and it's not going to be left in the 
hands of NetWorker "experts".  With the Celerras, any filesystems which are 
added will have their paths added to their NetWorker client resource, and the 
pool will pick up the fact it's a level 0 backup and back it up to the yearly 
pool.  That wouldn't be a problem for setting up a Unix pool either, and this 
can be left entirely automated.  The problem with Windows backups in this 
situation is the fact that an incremental or lower-level differential backup 
will still back up SYSTEM STATE:\ etc. as full backups, so I can't simply 
direct all full backups from Windows systems to the yearly pool, or the yearly 
tapes will fill with junk data that takes up space on the disk and is never 
freed up.  (I'm tempted to set up another pool for this data, but that 
means a large number of tapes in use unless I limit the devices and throw this 
stuff over the network.)
 
Ideally, I'd like to leave it so that users can add filesystems or clients 
and the backups simply work.  In reality, they'll need to add this 
to a "NewClients" group, run that, and move the system to the correct groups 
once it's known to be working - and coincidentally has a full backup in the 
yearly pool.  
 
Ideally, I'd also like to leave only the bare essential resources on the 
system.  However, it's going to have to be left with a bunch of irrelevant 
schedules, directives, pools, groups, label templates etc.  Nothing to do with 
this topic, just a HUGE pet peeve of mine about NetWorker - WHY CAN'T I DELETE 
THESE, AND WHY CAN'T I CHOOSE WHICH POOL SHOULD BE "DEFAULT"?!?!?!?!  Sorry.  
Will pick up my toys and put them back in the pram now.
 
What will be set up here, once more datacentres are set up, is offsite backup 
direct to another datacentre, with cloning of the yearly and monthly data back 
to the original site.  This leaves them able to recover to the last night's 
backup in the event of datacentre loss, and recover for discovery reasons in 
the event of the loss of the other datacentre (and hope that a user doesn't 
require their file back from the local datacentre after the entire loss of the 
other site).
 
Cheers,
 
Stuart.
 

________________________________

From: Joel Fisher [mailto:jfisher AT wfubmc DOT edu]
Sent: Wed 27/06/2007 18:07
To: Stuart Whitby; EMC NetWorker discussion
Subject: RE: [Networker] VTL/Dedup



Hey Stuart!

 

Thanks for the time spent in replying!

 

We have an unlimited autochanger license just sitting around doing nothing, so 
that would probably alleviate some of the growth issues you mentioned.

 

We do currently have 30TB on adv_file devices, but I'm less than thrilled with 
them, to be honest.  I basically have to keep them less than 80% full, otherwise 
I run into problems.  In my current environment, that is ~6TB wasted, and the 
wasted space will keep growing as the environment grows.  That in part is why I 
was thinking VTL.   What has your capacity utilization been with your VTL?  
Another thought was to use adv_file on ZFS then I could add incremental amounts 
of space to the device as needed.
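The ~6TB figure is the 20% headroom on 30TB, and it scales linearly with capacity. A short sketch, taking the 80% ceiling from the message as a fixed rule of thumb rather than a NetWorker setting; the function name and the 45TB/60TB growth points are hypothetical.

```python
def reserved_headroom_tb(capacity_tb: int, max_fill_pct: int) -> float:
    """Space that sits unused if a device can only safely be filled to max_fill_pct."""
    return capacity_tb * (100 - max_fill_pct) / 100


# 30TB is today's figure; 45TB and 60TB are hypothetical growth points.
for capacity_tb in (30, 45, 60):
    print(f"{capacity_tb}TB at 80% max fill: {reserved_headroom_tb(capacity_tb, 80)}TB unused")
```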

 

So do you keep all your levels in the VTL?  Unless we go with dedup, I would 
have to dump fulls to a silo, otherwise I would need a massive VTL.

 

If I go with a VTL, and choose one that backends to a real silo, then I'll just 
emulate the same type of drives I have now.  If I chose one that has to clone 
through networker, then I'll probably set it up to emulate dlt7000s or 
something else small.  My thinking is that then the vtapes won't be tied up as 
much, so there will be fewer conflicts.

 

Why do you say to separate the client types?

 

How much data do you backup a month?

 

Anyone else have any experience to share about these technologies?

 

Thanks!

 

Joel

 

 

 

________________________________

From: Stuart Whitby [mailto:swhitby AT dataprotectors.co DOT uk] 
Sent: Wednesday, June 27, 2007 10:54 AM
To: EMC NetWorker discussion; Joel Fisher
Subject: RE: [Networker] VTL/Dedup

 

No experience of any but the EMC Clariion Disk Library (which I hear is 
Falconstor based), but I'm not keen on VTLs.  If you're looking at a disk based 
solution, then I'd recommend a decent sized disk based Advanced File staging 
area with a physical tape library on the back end.

 

Going on my experience of the CDL, VTLs themselves aren't bad.  There are a 
couple of niggles with the CDL GUI, but it's behaved almost flawlessly with 
7.3.  However, the environment I'm currently working in was set up with a 25TB 
CDL as a 256 slot LTO1 library.  With a basic 256 slot autochanger license, 
this splits the 25TB nicely and uses all the space.  Then we added more disk.

 

Now we need to do media management tasks to move expired tapes into and out of 
the library, since there isn't enough space in the slots to hold the 35TB any 
more.  This is standard practice for a tape library, but shouldn't be necessary 
with a VTL (and wouldn't be if not for NW's slot-based licensing model).  We 
could change to LTO2, but that means having 200GB volumes set up, so we only 
get 175 tapes which won't expire as regularly since they hold more data.  
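The slot-license squeeze can be worked through with the numbers given: 256 licensed slots, ~35TB of disk, and 100GB (LTO1) or 200GB (LTO2) virtual tapes. Decimal units are assumed, and the helper names are hypothetical.

```python
SLOTS = 256           # basic 256-slot autochanger license
CAPACITY_GB = 35_000  # ~35TB after the disk expansion (decimal units assumed)


def licensed_capacity_gb(slots: int, tape_gb: int) -> int:
    """Disk the library can expose if every licensed slot holds one full tape."""
    return slots * tape_gb


def tapes_needed(capacity_gb: int, tape_gb: int) -> int:
    """Tapes required to cover capacity_gb, rounding up to a whole tape."""
    return -(-capacity_gb // tape_gb)  # ceiling division on integers


for tape_gb in (100, 200):  # LTO1 vs LTO2 native sizes
    needed = tapes_needed(CAPACITY_GB, tape_gb)
    print(f"{tape_gb}GB tapes: {needed} needed, "
          f"{licensed_capacity_gb(SLOTS, tape_gb)}GB reachable, "
          f"fits in {SLOTS} slots: {needed <= SLOTS}")
```

With 100GB tapes, 350 are needed but only 256 fit (25.6TB reachable), hence the manual media management; 200GB tapes fit in 175 slots, at the cost of fewer, larger tapes that take longer to expire.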

 

By going purely to a disk based solution, we've also had to move to a 
differential only policy.  To do a full backup on a weekly basis was killing us 
in terms of the disk space used, so we're down to a full once a year to be kept 
for 25 years, L1 once a month to be kept for a year, and L2-8 through the week 
to be kept for a month.  Sounds good until you add a filesystem to one of your 
servers.  Now you have a level 0 backup on your daily pool which will remain 
until all monthlies based on it expire in a year's time, and that's taking up 
200GB of space in your library.  I could put all full backups to the yearly 
pool, but SYSTEM_FILES, SYSTEM_DB and ASR savesets are always full.  This is 
impossible to filter given the pure "OR" logic of pool selection criteria.
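The new-filesystem problem follows from how backup levels chain together. Assuming the usual dump-style rule that a level-N backup captures changes since the most recent earlier backup at a lower level (full counts as 0), restoring any backup needs that whole chain. restore_chain and the sample history are hypothetical.

```python
from typing import List, Tuple

Backup = Tuple[str, int]  # (pool name, level); level 0 means full


def restore_chain(history: List[Backup], index: int) -> List[int]:
    """Indices of every backup needed to restore history[index], oldest first."""
    chain = [index]
    level = history[index][1]
    # Walk backwards, picking up the nearest earlier backup at a lower level each time.
    for i in range(index - 1, -1, -1):
        if level == 0:
            break
        if history[i][1] < level:
            chain.append(i)
            level = history[i][1]
    return list(reversed(chain))


# A filesystem added mid-cycle: its first backup runs as a level 0 in the Daily pool.
history = [
    ("Daily", 0),    # new filesystem's first (full) backup
    ("Daily", 2),
    ("Daily", 3),
    ("Monthly", 1),  # month-end level 1
    ("Daily", 2),
    ("Monthly", 1),  # next month-end level 1
]

for idx in (3, 5):
    pools = [history[i][0] for i in restore_chain(history, idx)]
    print(f"backup {idx} needs: {pools}")
```

Every monthly L1 taken before the next yearly full depends on that one Daily-pool level 0, which is why it cannot expire with the rest of the daily data.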

 

The other huge problem of going for a pure VTL environment (which you don't 
specify here but I figure I'd throw in anyway) is that the VTL *NEEDS* to be 
based off-site.  Without doing that, you are at massive risk of losing all data 
in the event of datacentre loss.  The plus side is that it's a great get-out if 
the boss needs to start a fire to put paid to your next Enron scandal ;)

 

So if you're going to go down the VTL route, I'd recommend the following:

 

- Size your VTL requirements very carefully and configure appropriately to 
allow plenty of tapes without vaulting.

- 3 pools: Yearly (full), Monthly (L1) and Daily (L2-8).  Maybe a weekly as 
well if you need (L2).  

- Still have a tape library on the back end.  Yearly and monthly (& weekly?) 
data should have 2 tape copies - one clone for the event of RAID failure and 
another in archive for the event of site loss.  The tape library can be a small 
unit given the reliance on the VTL for recoveries.

- Yearly and monthly backups need to be well spaced out to allow the systems to 
use as few tapes as possible.  Or they should be sent across the network to the 
server if possible, allowing it to maintain only 2 part-used tapes for yearly 
backups.  This removes one of the VTL's main benefits of being able to provide 
large numbers of tape drives to multiple systems - unless you share the drives with DDS (Dynamic Drive Sharing), and 
what's the point in that?  Another option to keep the data available and within 
the VTL is to clone to tape and back to a yearly clone pool with the original 
yearly tapes recycled.  

- Separate Unix, Windows and NDMP pools, with Unix and NDMP daily groups having 
their L0 backups going to the yearly pool.  Can't do this with Windows unless 
there's a directive to skip ASR etc.  

- Set strict savegroup parallelism values to cut down on the number of 
appendable tapes per pool.  If you have 50 tapes in use across a large number 
of servers in a big group, it's going to take a long time for all those tapes 
to fill up and, eventually, expire.
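The pool layout above amounts to routing each saveset by level, with one exception for Windows system savesets. This sketch only spells out the intended rule; as noted above, NetWorker's pool selection criteria cannot express the exclusion, so treat it as a description of the desired behaviour, not a NetWorker configuration. The pool names, the split between scheduled and actual level, and the saveset names in ALWAYS_FULL_SAVESETS are assumptions for illustration.

```python
# Hypothetical saveset names, modelled on the ones mentioned in the thread.
ALWAYS_FULL_SAVESETS = {"SYSTEM STATE:\\", "SYSTEM FILES:\\", "SYSTEM DB:\\", "ASR:\\"}

TIER_BY_LEVEL = {"full": "Yearly", "1": "Monthly"}  # any other level goes to Daily


def choose_pool(client_os: str, scheduled_level: str, actual_level: str, saveset: str) -> str:
    """Pick a pool name such as "Unix Yearly" for one saveset (illustrative only)."""
    level = actual_level
    if client_os == "Windows" and saveset in ALWAYS_FULL_SAVESETS:
        # These savesets run as full whatever the schedule says, so route them
        # by the scheduled level instead of letting them fill the Yearly pool.
        level = scheduled_level
    tier = TIER_BY_LEVEL.get(level, "Daily")
    return f"{client_os} {tier}"


# A new Unix filesystem promoted to full lands in Yearly; a Windows
# SYSTEM STATE saveset on a daily run stays in Daily.
print(choose_pool("Unix", "3", "full", "/data/newfs"))
print(choose_pool("Windows", "3", "full", "SYSTEM STATE:\\"))
```

Routing the always-full savesets by scheduled level keeps them with the run that produced them, so yearly media only ever holds genuine yearly data.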

 

Should give you something to think about, at least :)

 

Cheers,

 

Stuart.

 

 

________________________________

From: EMC NetWorker discussion on behalf of Joel Fisher
Sent: Wed 27/06/2007 14:27
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: [Networker] VTL/Dedup

Hey Guys,

Can anyone comment about their experience good/bad on the below products
in a Networker environment?

Diligent Protectier
Data Domain (any dedup/vtl product)
Copan Revolution(falconstor)
Sun STK VTL(falconstor)
Any other VTL/Dedup solutions

VTLs in general?  Prefer VTL or adv_file?

Thanks!

Joel




To sign off this list, send email to listserv AT listserv.temple DOT edu and 
type "signoff networker" in the body of the email. Please write to 
networker-request AT listserv.temple DOT edu if you have any problems with this 
list. You can access the archives at 
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
