ADSM-L

Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file systems for pirmary pool

2011-10-04 09:39:37
Subject: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file systems for pirmary pool
From: Stefan Folkerts <stefan.folkerts AT GMAIL DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 4 Oct 2011 15:23:30 +0200
My understanding is that each core in a TSM server can identify around 60 GB
of undeduplicated data per hour.
This puts a quad core at about 3-4 TB of new undeduplicated data per day
(not running at night).
Of course CPU architecture and memory/storage make a difference as well,
but IBM says they see about 60 GB/hour per core and advise not to go
above 3-4 TB per day on a FILEPOOL with dedupe on TSM.
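
The arithmetic behind those figures can be sketched roughly as follows. This
is only an illustration: it assumes binary units (1 TB = 1024 GB) and
perfectly linear scaling across cores, and the function names are made up
for the example.

```python
GB_PER_TB = 1024  # assumption: binary units


def identify_rate_gb_per_hour(cores, gb_per_core_hour=60):
    """Aggregate identify rate, assuming it scales linearly with core count."""
    return cores * gb_per_core_hour


def hours_to_identify(tb_per_day, cores, gb_per_core_hour=60):
    """Hours of identify processing needed for one day's worth of new data."""
    return tb_per_day * GB_PER_TB / identify_rate_gb_per_hour(cores, gb_per_core_hour)


if __name__ == "__main__":
    for tb in (3, 4):
        print(f"{tb} TB/day on 4 cores: {hours_to_identify(tb, 4):.1f} hours")
```

Under these assumptions a quad core needs roughly 13 to 17 hours for 3-4 TB,
which fits a process that does not run at night.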


On Fri, Sep 30, 2011 at 12:29 AM, Colwell, William F.
<bcolwell AT draper DOT com> wrote:

> Hi Daniel,
>
> My main point was that your previous posts seemed to say that dedup
> storagepools were recommended to be 6 TB in size at most.  It is my
> understanding that the 6 TB recommendation was a daily server throughput
> maximum design target when dedup is in use.
>
> I agree, a processor at 100% is not good and I have been adjusting the
> server design to reduce the load.
>
> I started re-hosting our backup service on v6 as soon as v6 was available.
> I started out deduping everything but quickly ran into performance
> problems.  To solve them I started excluding classes of data from dedup:
> all Oracle backups, all Outlook PST files and any other file larger than
> 1 GB.  Over 12 months I also replaced all the disks I started with and
> greatly expanded the total storage.
>
> Where the Redbook says that expiration is much improved, that is only
> partly true.  If dedup is involved, a hidden process starts after the
> visible expiration process is done and runs on for quite a while longer.
> This process has to check whether a chunk in an expired file can truly be
> removed from storage, because other files may still point to that chunk.
> You can see the process by entering 'show dedupdeleteinfo' after
> expiration completes.
>
> The thing about big files is that they are broken into lots of chunks.
> When a big file is expired, this hidden process takes a long time to
> complete and can bog down the system.  This is the real reason I exclude
> some files from dedup.
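
The chunk check Bill describes can be pictured as reference counting. The
following is a minimal sketch only, not TSM's actual implementation:
ChunkStore, store and expire are hypothetical names, and fixed-size chunks
are a simplification chosen to keep the example short.

```python
import hashlib


class ChunkStore:
    """Toy deduplicating store: identical chunks are shared between files."""

    def __init__(self):
        self.refcount = {}  # chunk digest -> number of references to it
        self.files = {}     # file name -> list of chunk digests

    def store(self, name, data, chunk_size=4):
        """Split data into chunks and record one reference per chunk."""
        digests = []
        for i in range(0, len(data), chunk_size):
            digest = hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            self.refcount[digest] = self.refcount.get(digest, 0) + 1
            digests.append(digest)
        self.files[name] = digests

    def expire(self, name):
        """Drop a file; return how many chunks could actually be freed."""
        freed = 0
        for digest in self.files.pop(name):
            self.refcount[digest] -= 1
            if self.refcount[digest] == 0:  # no other file points here
                del self.refcount[digest]
                freed += 1
        return freed


if __name__ == "__main__":
    pool = ChunkStore()
    pool.store("a", b"AAAABBBB")
    pool.store("b", b"AAAACCCC")
    print(pool.expire("a"))  # the shared AAAA chunk must stay
    print(pool.expire("b"))
```

Every chunk of an expired file needs its own check, so a file made of very
many chunks means a correspondingly long cleanup pass.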
>
> As for SATA, I have been using some big arrays (twenty 2 TB disks, RAID
> 6), 8 such arrays, for 18 months and have had only 1 disk fail.  But I try
> not to abuse them.  Backups first go onto JBOD disks (15K rpm, 600 GB)
> and all the dedup activity is done there.  The storagepools on those
> disks are then migrated to storagepools on the SATA arrays.  It is a
> mostly sequential process.
>
> I can only suggest that if your customer does storagepool backup from the
> SATA arrays after migration or reclaim, and the copypool is not dedup,
> then there would be a lot of random requests to the SATA storagepools to
> rehydrate the backups.
>
> Regards,
>
> Bill Colwell
> Draper Lab
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Daniel Sparrman
> Sent: Thursday, September 29, 2011 1:24 AM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus
> file systems for pirmary pool
>
> Like it says in the document, it's a recommendation and not a technical
> limit.
>
> However, having the server running at 100% utilization all the time
> doesn't seem like a healthy scenario.
>
> Why aren't you deduplicating files larger than 1 GB? In my experience,
> datafiles from SQL, Exchange and such have a very large de-dup ratio,
> while TSM's deduplication skips files smaller than 2 KB?
>
> I have a customer up north who used this configuration on an HP EVA based
> box with SATA disks. The disks were breaking down so fast that the arrays
> within the box were in a constant "rebuild" phase. HP claimed it was TSM
> dedup that was breaking the disks (they actually claimed TSM was writing
> so often that the disks broke), a scenario I find very hard to believe.
>
> Best Regards
>
> Daniel
>
>
>
> Daniel Sparrman
> Exist i Stockholm AB
> Switchboard: 08-754 98 00
> Fax: 08-754 97 30
> daniel.sparrman AT exist DOT se
> http://www.existgruppen.se
> Posthusgatan 1 761 30 NORRTÄLJE
>
>
>
> -----"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote: -----
>
> To: ADSM-L AT VM.MARIST DOT EDU
> From: "Colwell, William F." <bcolwell AT DRAPER DOT COM>
> Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
> Date: 09/28/2011 20:43
> Subject: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file
> systems for pirmary pool
>
> Hi Daniel,
>
> I remember hearing about a 6 TB limit for dedup in a webinar or conference
> call, but what I recall is that it was a daily throughput limit.  In the
> same section of the Redbook as you quote is this paragraph:
>
>
>
> Experienced administrators already know that Tivoli Storage Manager
> database expiration was one of the more processor-intensive activities on
> a Tivoli Storage Manager Server. Expiration is still processor intensive,
> albeit less so in Tivoli Storage Manager V6.1, but this is now second to
> deduplication in terms of consumption of processor cycles. Calculating
> the MD5 hash for each object and the SHA1 hash for each chunk is a
> processor intensive activity.
>
>
>
> I can say this is absolutely correct; my processor is frequently running
> at or near 100%.
>
> I have gone way beyond 6 TB of storage for dedup storagepools, as this SQL
> shows for the 2 instances on my server:
>
>
>
> select cast(stgpool_name as char(12)) as "Stgpool", -
>       cast(sum(num_files)     / 1024 / 1024 as decimal(4,1)) as "Mil Files", -
>       cast(sum(physical_mb)   / 1024 / 1024 as decimal(4,1)) as "Physical_TB", -
>       cast(sum(logical_mb)    / 1024 / 1024 as decimal(4,1)) as "Logical_TB", -
>       cast(sum(reporting_mb)  / 1024 / 1024 as decimal(4,1)) as "Reporting_TB" -
>  from occupancy -
>  where stgpool_name in (select stgpool_name from stgpools where deduplicate = 'YES') -
>  group by stgpool_name
>
>
>
>
>
> Stgpool            Mil Files      Physical_TB      Logical_TB      Reporting_TB
> -------------     ----------     ------------     -----------     -------------
> BKP_2                  368.0              0.0            30.0              95.8
> BKP_2X                 341.0              0.0            23.9              58.6
>
> Stgpool            Mil Files      Physical_TB      Logical_TB      Reporting_TB
> -------------     ----------     ------------     -----------     -------------
> BKP_2                  224.0              0.0            35.7              74.1
> BKP_FS_2                49.0              0.0            21.0              45.5
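
One way to read the two result tables is as a space-savings ratio per pool.
The sketch below rests on an assumption the post does not state: that
Reporting_TB is the occupancy before deduplication and Logical_TB the
occupancy after it. The helper name is hypothetical; the figures are copied
from the tables.

```python
# (table, pool, logical_tb, reporting_tb) copied from the two result tables.
ROWS = [
    ("first table", "BKP_2", 30.0, 95.8),
    ("first table", "BKP_2X", 23.9, 58.6),
    ("second table", "BKP_2", 35.7, 74.1),
    ("second table", "BKP_FS_2", 21.0, 45.5),
]


def dedup_savings(logical_tb, reporting_tb):
    """Fraction of space saved, ASSUMING reporting = before and logical = after dedup."""
    return 1.0 - logical_tb / reporting_tb


if __name__ == "__main__":
    for table, pool, logical, reporting in ROWS:
        saved = dedup_savings(logical, reporting)
        print(f"{table:<13} {pool:<9} {saved:.1%} saved")
```

If the column interpretation is wrong, the ratios mean something else, so
treat the output as a reading aid rather than a measured result.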
>
>
>
>
>
> Also, I am not using any random disk pool; all the disk storage is scratch
> allocated file class volumes.  There is also a tape library (LTO5) for
> files larger than 1 GB, which are excluded from deduplication.
>
>
>
>
>
> Regards,
>
>
>
> Bill Colwell
>
> Draper Lab
>
>
>
>
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Daniel Sparrman
> Sent: Wednesday, September 28, 2011 3:49 AM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus
> file systems for pirmary pool
>
>
>
> To be honest, it doesn't really say. The information is from the Tivoli
> Storage Manager Technical Guide:
>
>
>
> Note: In terms of sizing Tivoli Storage Manager V6.1 deduplication, we
> currently recommend using Tivoli Storage Manager to deduplicate up to 6 TB
> total of storage pool space for the deduplicated pools. This is a rule of
> thumb only and exists solely to give an indication of where to start
> investigating VTL or filer deduplication. The reason that a particular
> figure is mentioned is for guidance in typical scenarios on commodity
> hardware. If more than 6 TB of real diskspace is to be duplicated, you can
> either use Tivoli Storage Manager or a hardware deduplication device. The
> 6 TB is in addition to whatever disk is required by non-deduplicated
> storage pools. This rule of thumb will change as processor and disk
> technologies advance, because the recommendation is not an architectural,
> support, or testing limit.
>
>
>
> http://www.redbooks.ibm.com/redbooks/pdfs/sg247718.pdf
>
>
>
> I'm guessing it's server-side, since client-side shouldn't use any
> resources on the server. I'm also guessing you could do 8 TB or 10, but
> not 60 TB.
>
>
>
> Best Regards
>
>
>
> Daniel Sparrman
>
>
>
>
>
>
>
> Daniel Sparrman
> Exist i Stockholm AB
> Switchboard: 08-754 98 00
> Fax: 08-754 97 30
> daniel.sparrman AT exist DOT se
> http://www.existgruppen.se
> Posthusgatan 1 761 30 NORRTÄLJE
>
>
>
>
>
>
>
> -----"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote: -----
>
> To: ADSM-L AT VM.MARIST DOT EDU
> From: Hans Christian Riksheim <bullhcr AT GMAIL DOT COM>
> Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
> Date: 09/28/2011 09:56
> Subject: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file
> systems for pirmary pool
>
>
>
> Does this 6 TB supported limit for a deduplicated FILEPOOL apply when one
> does client-side deduplication only?
>
> Just wondering, since I have just set up a 30 TB FILEPOOL for this purpose.
>
>
>
> Regards
>
>
>
> Hans Chr.
>
>
>
> On Tue, Sep 27, 2011 at 8:44 PM, Daniel Sparrman
> <daniel.sparrman AT exist DOT se> wrote:
>
> > Just to put an end to this discussion, we're kinda running out of limits
> > here:
> >
> > a) No VTL solution, neither DD, nor Sepaton, nor anyone else, is a
> > replacement for random diskpools. It doesn't matter if you can configure
> > 50 drives, 500 drives or 5000 drives; the way TSM works, you're gonna
> > make the system go bad, since the system is designed to have random
> > pools in front and sequential pools in the back. A sequential device is
> > not gonna replace that, regardless of whether it is a sequential file
> > pool or a VTL (or, for that matter, a tape library).
> >
> > b) VTLs were invented because most backup software (I've only worked
> > with TSM, Legato & Veritas aka Symantec) is used to working with
> > sequential devices. That hasn't changed, and won't change in the near
> > future. VTLs (and the file device option) are just a replacement.
> > Performance-wise, VTLs are gonna win all the time compared to a file
> > device; the question you need to ask yourself is: do I need the VTL, or
> > can I get along with using file devices? According to the TSM manual (I
> > don't have the link, but if you want I'll find it) the maximum supported
> > file device pool for deduplication is 6 TB, so if you're thinking of
> > replacing a VTL with a seq. file pool, keep that in mind. The limit
> > exists because of the amount of resources TSM needs to do the file
> > deduplication, or as the manual says, "until new technologies are
> > available".
>
> >
>
> > The discussion here, where people are actually planning on having just a
> > sequential pool (since no one is actually mentioning a random pool in
> > front), is plain scary. No sequential device is gonna have the time of
> > its life with a fileserver serving 50K blocks at a time.
>
> >
>
> > So my last 50 cents worth is:
> >
> > a) Have a random pool in front
> >
> > b) Depending on the size of your environment, you're either gonna go
> > with a filepool and use de-dup (the limit is 6 TB for each pool, and you
> > might not want to de-dup everything), or you're gonna go with a
> > full-scale VTL. The choice here is size vs. cost.
>
> >
>
> > I've seen a lot of posts here lately about the disadvantages of VTLs.
> > Well, I haven't seen one so far with mine. I have a colleague who bought
> > a XXXX VTL and found out he needed another VTL just to do the de-dup,
> > since one VTL wasn't a supported configuration for de-dup. I have
> > another colleague who bought a very cheap VTL solution (from a name
> > mentioned a lot around here) and ended up with identical hashes for
> > different data, leaving him with unrestorable data.
> >
> > Comparing eggs to apples just isn't fair. Different manufacturers of
> > VTLs do different things, meaning both performance and availability are
> > completely different.
> >
> > Just to sum up, we've had both 3584s and (back in the day) 3575s, and
> > I've never been happier with our VTL (and yes, we do restore tests).
>
> >
>
> > Best Regards
>
> >
>
> > Daniel
>
> >
>
> >
>
> >
>
> > Daniel Sparrman
> > Exist i Stockholm AB
> > Switchboard: 08-754 98 00
> > Fax: 08-754 97 30
> > daniel.sparrman AT exist DOT se
> > http://www.existgruppen.se
> > Posthusgatan 1 761 30 NORRTÄLJE
>
> >
>
> >
>
> >
>
> > -----"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote: -----
> >
> > To: ADSM-L AT VM.MARIST DOT EDU
> > From: Rick Adamson <RickAdamson AT WINN-DIXIE DOT COM>
> > Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
> > Date: 09/27/2011 18:02
> > Subject: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file systems for
> > pirmary pool
>
> >
>
> > Interesting. Every VTL based solution, including Data Domain, that I
> > looked at had limits on the number of drives that could be emulated,
> > which were nowhere near a hundred, let alone a thousand. Perhaps it's
> > time to revisit this.
> >
> > The license is a Data Domain fee, and a hefty one at that.
> >
> > The bigger question I have is: since file based storage is native to
> > TSM, why exactly is using file based storage not supported?
>
> >
>
> > ~Rick
>
> >
>
> >
>
> > -----Original Message-----
> > From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Daniel Sparrman
> > Sent: Tuesday, September 27, 2011 10:30 AM
> > To: ADSM-L AT VM.MARIST DOT EDU
> > Subject: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file systems for pirmary pool
>
> >
>
> > Not really sure where the general idea that a VTL will limit the number
> > of available mount points comes from.
> >
> > I'm not familiar with Data Domain, but generally speaking, the number of
> > virtual tape drives that can be configured within a VTL is usually in
> > the thousands. Not sure why you'd want that many though; I always prefer
> > having a small diskpool in front of whatever sequential pool I have, and
> > letting the bigger files bypass the diskpool and go straight to the seq.
> > pool.
> >
> > As for LAN-free, the only available option I know of is SANergy. And
> > going down that road (concerning both price & complexity) will probably
> > make the VTL look cheap.
> >
> > Not sure what kind of licensing you're talking about concerning VTL, but
> > I assume it's a Data Domain license and not a TSM license?
>
> >
>
> > Best Regards
>
> >
>
> > Daniel Sparrman
>
> >
>
> >
>
> >
>
> > Daniel Sparrman
> > Exist i Stockholm AB
> > Switchboard: 08-754 98 00
> > Fax: 08-754 97 30
> > daniel.sparrman AT exist DOT se
> > http://www.existgruppen.se
> > Posthusgatan 1 761 30 NORRTÄLJE
>
> >
>
> >
>
> >
>
> > -----"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote: -----
> >
> > To: ADSM-L AT VM.MARIST DOT EDU
> > From: Rick Adamson <RickAdamson AT WINN-DIXIE DOT COM>
> > Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
> > Date: 09/27/2011 16:52
> > Subject: Re: [ADSM-L] vtl versus file systems for pirmary pool
>
> >
>
> > A couple of things that I did not see mentioned here, which I
> > experienced: for Data Domain the VTL is an additional license, and it
> > does limit the available mount points (or emulated drives), where a TSM
> > file based pool does not. Like Wanda stated earlier, it depends on what
> > you can afford!
> >
> > I myself have grown fond of the file based approach: easy to manage,
> > easy to configure, and I never worry about an available tape drive
> > (virtual or otherwise). The LAN-free issue is something to consider, but
> > from what I have heard lately it can still be accomplished using file
> > based storage. If anyone has any info on it I would appreciate it.
>
> >
>
> > ~Rick
>
> > Jax, Fl.
>
> >
>
> > -----Original Message-----
> > From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Tim Brown
> > Sent: Monday, September 26, 2011 4:05 PM
> > To: ADSM-L AT VM.MARIST DOT EDU
> > Subject: [ADSM-L] vtl versus file systems for pirmary pool
>
> >
>
> > What advantage does VTL emulation on a disk primary storage pool have
> > compared to a disk storage pool that is non-VTL?
> >
> > It appears to me that a non-VTL system would not require the daily
> > reclamation process and would also allow more client backups to occur
> > simultaneously.
>
> >
>
> >
>
> >
>
> > Thanks,
>
> >
>
> >
>
> >
>
> > Tim Brown
>
> > Systems Specialist - Project Leader
>
> > Central Hudson Gas & Electric
>
> > 284 South Ave
>
> > Poughkeepsie, NY 12601
>
> > Email: tbrown AT cenhud DOT com
>
> > Phone: 845-486-5643
>
> > Fax: 845-486-5921
>
> > Cell: 845-235-4255
>
