Like it says in the document, it's a recommendation and not a technical limit.
However, having the server running at 100% utilization all the time doesn't
seem like a healthy scenario.

Why aren't you deduplicating files larger than 1GB? In my experience,
data files from SQL, Exchange and the like have a very high dedup ratio,
whereas TSM's deduplication skips files smaller than 2KB.

I have a customer up north who used this configuration on an HP EVA based box
with SATA disks. The disks were breaking down so fast that the arrays within
the box were in a constant "rebuild" phase. HP claimed it was TSM dedup that
was breaking the disks (they actually claimed TSM was writing so often that
the disks broke), a scenario I find very hard to believe.

Best Regards
Daniel
Daniel Sparrman
Exist i Stockholm AB
Switchboard: 08-754 98 00
Fax: 08-754 97 30
daniel.sparrman AT exist DOT se
http://www.existgruppen.se
Posthusgatan 1 761 30 NORRTÄLJE
-----"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote: -----
To: ADSM-L AT VM.MARIST DOT EDU
From: "Colwell, William F." <bcolwell AT DRAPER DOT COM>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
Date: 09/28/2011 20:43
Subject: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file systems for pirmary pool

Hi Daniel,

I remember hearing about a 6 TB limit for dedup in a webinar or conference
call, but what I recall is that it was a daily throughput limit. In the same
section of the redbook as you quote is this paragraph -

Experienced administrators already know that Tivoli Storage Manager database
expiration was one of the more processor-intensive activities on a Tivoli
Storage Manager Server. Expiration is still processor intensive, albeit less
so in Tivoli Storage Manager V6.1, but this is now second to deduplication in
terms of consumption of processor cycles. Calculating the MD5 hash for each
object and the SHA1 hash for each chunk is a processor intensive activity.

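To make the cost described in that paragraph concrete, here is a minimal Python sketch of the hashing pattern it names: one MD5 over the whole object plus one SHA-1 per chunk. It is only an illustration, not TSM's implementation. The fixed 256 KiB chunk size and the names `fingerprint` and `unique_chunk_ratio` are assumptions made up for this example; how TSM actually splits objects into chunks is not described in this thread.

```python
import hashlib

# Assumed chunk size, for illustration only. The thread does not say how
# TSM sizes its chunks.
CHUNK_SIZE = 256 * 1024


def fingerprint(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return (MD5 of the whole object, list of SHA-1 digests per chunk)."""
    object_md5 = hashlib.md5(data).hexdigest()
    chunk_sha1 = [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]
    return object_md5, chunk_sha1


def unique_chunk_ratio(chunk_digests):
    """Fraction of chunks that are distinct (lower means more duplication)."""
    if not chunk_digests:
        return 1.0
    return len(set(chunk_digests)) / len(chunk_digests)
```

Every byte of the object passes through both hash functions, which is why the work scales with the amount of data ingested rather than with the number of files.
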
I can say this is absolutely correct; my processor is frequently running at
or near 100%.

I have gone way beyond 6 TB of storage for dedup storagepools, as this SQL
shows for the 2 instances on my server -

select cast(stgpool_name as char(12)) as "Stgpool", -
cast(sum(num_files) / 1024 / 1024 as decimal(4,1)) as "Mil Files", -
cast(sum(physical_mb) / 1024 / 1024 as decimal(4,1)) as "Physical_TB", -
cast(sum(logical_mb) / 1024 / 1024 as decimal(4,1)) as "Logical_TB", -
cast(sum(reporting_mb) / 1024 / 1024 as decimal(4,1)) as "Reporting_TB" -
from occupancy -
where stgpool_name in (select stgpool_name from stgpools where deduplicate = 'YES') -
group by stgpool_name

Stgpool        Mil Files   Physical_TB   Logical_TB   Reporting_TB
-------------  ----------  ------------  -----------  -------------
BKP_2              368.0           0.0         30.0           95.8
BKP_2X             341.0           0.0         23.9           58.6

Stgpool        Mil Files   Physical_TB   Logical_TB   Reporting_TB
-------------  ----------  ------------  -----------  -------------
BKP_2              224.0           0.0         35.7           74.1
BKP_FS_2            49.0           0.0         21.0           45.5

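As a rough reading aid for the figures above, the following Python sketch turns each Logical_TB / Reporting_TB pair into a space-savings fraction. It assumes that Reporting_TB is the pre-deduplication size and Logical_TB is what remains stored after deduplication; that interpretation of the occupancy columns is an assumption here, not something stated in this thread. The function name `dedup_savings` and the "table 1" / "table 2" labels are hypothetical.

```python
def dedup_savings(logical_tb: float, reporting_tb: float) -> float:
    """Fraction of space saved, assuming reporting = before dedup and
    logical = after dedup (an assumption about the occupancy columns)."""
    if reporting_tb <= 0:
        raise ValueError("reporting_tb must be positive")
    return 1.0 - logical_tb / reporting_tb


# (Logical_TB, Reporting_TB) copied from the two result tables in the post.
pools = {
    ("table 1", "BKP_2"): (30.0, 95.8),
    ("table 1", "BKP_2X"): (23.9, 58.6),
    ("table 2", "BKP_2"): (35.7, 74.1),
    ("table 2", "BKP_FS_2"): (21.0, 45.5),
}

for (table, pool), (logical, reporting) in pools.items():
    pct = 100 * dedup_savings(logical, reporting)
    print(f"{table} {pool:<9} saved {pct:.1f}% of {reporting} TB")
```

Under that assumption, every pool listed stores well over 6 TB after deduplication, which is the point being made about the rule of thumb.
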
Also, I am not using any random disk pool; all the disk storage is
scratch-allocated file class volumes. There is also a tape library (LTO5)
for files larger than 1GB, which are excluded from deduplication.

Regards,
Bill Colwell
Draper Lab
-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Daniel Sparrman
Sent: Wednesday, September 28, 2011 3:49 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file systems for pirmary pool

To be honest, it doesn't really say. The information is from the Tivoli
Storage Manager Technical Guide:

Note: In terms of sizing Tivoli Storage Manager V6.1 deduplication, we
currently recommend using Tivoli Storage Manager to deduplicate up to 6 TB
total of storage pool space for the deduplicated pools. This is a rule of
thumb only and exists solely to give an indication of where to start
investigating VTL or filer deduplication. The reason that a particular figure
is mentioned is for guidance in typical scenarios on commodity hardware. If
more than 6 TB of real diskspace is to be deduplicated, you can either use
Tivoli Storage Manager or a hardware deduplication device. The 6 TB is in
addition to whatever disk is required by non-deduplicated storage pools. This
rule of thumb will change as processor and disk technologies advance, because
the recommendation is not an architectural, support, or testing limit.

http://www.redbooks.ibm.com/redbooks/pdfs/sg247718.pdf

I'm guessing it's server-side, since client-side shouldn't use any resources
on the server. I'm also guessing you could do 8 TB or 10, but not 60 TB.

Best Regards
Daniel Sparrman
Daniel Sparrman
Exist i Stockholm AB
Switchboard: 08-754 98 00
Fax: 08-754 97 30
daniel.sparrman AT exist DOT se
http://www.existgruppen.se
Posthusgatan 1 761 30 NORRTÄLJE
-----"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote: -----
To: ADSM-L AT VM.MARIST DOT EDU
From: Hans Christian Riksheim <bullhcr AT GMAIL DOT COM>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
Date: 09/28/2011 09:56
Subject: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file systems for pirmary pool

Does this 6 TB supported limit for a deduplicated FILEPOOL apply when one
does client-side deduplication only?

Just wondering, since I have just set up a 30 TB FILEPOOL for this purpose.

Regards
Hans Chr.
On Tue, Sep 27, 2011 at 8:44 PM, Daniel Sparrman
<daniel.sparrman AT exist DOT se> wrote:
> Just to put an end to this discussion, we're kinda running out of limits here:
>
> a) No VTL solution, neither DD, nor Sepaton, nor anyone else, is a
> replacement for random diskpools. It doesn't matter if you can configure 50
> drives, 500 drives or 5000 drives; the way TSM works, you're gonna make the
> system go bad, since the system is designed around having random pools in
> front and sequential pools in the back. A sequential device is not gonna
> replace that, regardless of whether it's a sequential file pool or a VTL
> (or, for that matter, a tape library).
>
> b) VTLs were invented because most backup software (I've only worked with
> TSM, Legato & Veritas aka Symantec) is used to working with sequential
> devices. That hasn't changed, and won't change in the near future. VTLs (and
> the file device option) are just a replacement. Performance-wise, VTLs are
> gonna win all the time compared to a file device; the question you need to
> ask yourself is: do I need the VTL, or can I get along with file devices?
> According to the TSM manual (don't have the link, but if you want I'll find
> it) the maximum supported file device pool for deduplication is 6TB... so if
> you're thinking of replacing a VTL with a seq. file pool, keep that in mind.
> The limit exists because the resources available to TSM for file
> deduplication are limited, or as the manual says, "until new technologies
> are available".
>
> The discussion here, where people are actually planning on having just a
> sequential pool (since no one is actually mentioning a random pool in
> front), is plain scary. No sequential device is gonna have the time of its
> life with a file server sending 50K blocks at a time.
>
> So my last 50 cents worth is:
>
> a) Have a random pool in front
>
> b) Depending on the size of your environment, you're either gonna go with a
> filepool and use de-dup (limit is 6TB for each pool, you might not want to
> de-dup everything), or you're gonna go with a fullscale VTL. Choice here is
> size vs costs.
>
> I've seen a lot of posts here lately about the disadvantages of VTLs...
> well, I haven't seen one so far with mine. I have a colleague who bought a
> XXXX VTL and found out he needed another VTL just to do the de-dup, since
> one VTL wasn't a supported configuration for de-dup. I have another
> colleague who bought a very cheap VTL solution (from a name mentioned a lot
> around here) and ended up with identical hashes for different data, leaving
> him with unrestorable data.
>
> Comparing eggs to apples just isn't fair. Different manufacturers of VTLs do
> different things, meaning both performance and availability are completely
> different.
>
> Just to sum up, we've had both 3584s and (back in the day) 3575s, and I've
> never been happier than with our VTL (and yes, we do restore tests).
>
> Best Regards
>
> Daniel
>
>
>
> Daniel Sparrman
> Exist i Stockholm AB
> Switchboard: 08-754 98 00
> Fax: 08-754 97 30
> daniel.sparrman AT exist DOT se
> http://www.existgruppen.se
> Posthusgatan 1 761 30 NORRTÄLJE
>
>
>
> -----"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote: -----
>
> To: ADSM-L AT VM.MARIST DOT EDU
> From: Rick Adamson <RickAdamson AT WINN-DIXIE DOT COM>
> Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
> Date: 09/27/2011 18:02
> Subject: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file systems for pirmary pool
>
> Interesting. Every VTL-based solution, including Data Domain, that I looked
> at had limits on the number of drives that could be emulated, which were
> nowhere near a hundred, let alone a thousand. Perhaps it's time to revisit
> this.
>
> The license is a Data Domain fee, and a hefty one at that.
>
> The bigger question I have is: since file-based storage is native to TSM,
> why exactly is using file-based storage not supported?
>
> ~Rick
>
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Daniel Sparrman
> Sent: Tuesday, September 27, 2011 10:30 AM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file systems for pirmary pool
>
> Not really sure where the general idea that a VTL will limit the number of
> available mount points comes from.
>
> I'm not familiar with Data Domain, but generally speaking, the number of
> virtual tape drives that can be configured within a VTL is usually in the
> thousands. Not sure why you'd want that many though; I always prefer having
> a small diskpool in front of whatever sequential pool I have, and letting
> the bigger files bypass the diskpool and go straight to the seq. pool.
>
> As for LAN-free, the only available option I know of is SANergy. And
> going down that road (concerning both price & complexity) will probably make
> the VTL look cheap.
>
> Not sure what kind of licensing you're talking about concerning VTL, but I
> assume it's a Data Domain license and not a TSM license?
>
> Best Regards
>
> Daniel Sparrman
>
>
>
> Daniel Sparrman
> Exist i Stockholm AB
> Switchboard: 08-754 98 00
> Fax: 08-754 97 30
> daniel.sparrman AT exist DOT se
> http://www.existgruppen.se
> Posthusgatan 1 761 30 NORRTÄLJE
>
>
>
> -----"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote: -----
>
> To: ADSM-L AT VM.MARIST DOT EDU
> From: Rick Adamson <RickAdamson AT WINN-DIXIE DOT COM>
> Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
> Date: 09/27/2011 16:52
> Subject: Re: [ADSM-L] vtl versus file systems for pirmary pool
>
> A couple of things that I did not see mentioned here, which I experienced:
> for Data Domain the VTL is an additional license, and it does limit the
> available mount points (or emulated drives), where a TSM file-based pool
> does not. Like Wanda stated earlier, it depends on what you can afford!
>
> I myself have grown fond of the file-based approach: easy to manage, easy
> to configure, and never a worry about an available tape drive (virtual or
> otherwise). The LAN-free issue is something to consider, but from what I
> have heard lately it can still be accomplished using file-based storage.
> If anyone has any info on it I would appreciate it.
>
> ~Rick
> Jax, Fl.
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Tim Brown
> Sent: Monday, September 26, 2011 4:05 PM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: [ADSM-L] vtl versus file systems for pirmary pool
>
> What advantage does VTL emulation on a disk primary storage pool have
> as compared to a disk storage pool that is non-VTL?
>
> It appears to me that a non-VTL system would not require the daily
> reclamation process and would also allow more client backups to occur
> simultaneously.
>
> Thanks,
>
> Tim Brown
> Systems Specialist - Project Leader
> Central Hudson Gas & Electric
> 284 South Ave
> Poughkeepsie, NY 12601
> Email: tbrown AT cenhud DOT com
> Phone: 845-486-5643
> Fax: 845-486-5921
> Cell: 845-235-4255