VTL TS7650G ProtecTIER factoring ratio

lapfrank
Active Newcomer · Joined Nov 22, 2005 · Messages: 46 · Location: Providence, RI, USA
Hello All,

I was wondering if there is a way to get the factoring ratio of specific cartridges instead of a global view?

I'm getting a pretty bad global factoring ratio, and I'm pretty sure it's caused by my RMAN backups, which are sent to disk compressed and then backed up through a dsmc incremental, but I can't "prove" it.

I'd like management to buy TDP for Oracle so the backups can be sent to TSM directly, uncompressed.

I was looking at the long-term statistics spreadsheet, but I can't figure out most of what's in there.

Does anyone have an idea?

Thanks !

Frank.
 
Compression will probably cause a severe reduction in the dedupe ratio. With TDPO and no compression you can expect a dedupe ratio between 10:1 and 50:1 after the first backup (depending on whether you back up the archive logs to ProtecTIER -- archive logs don't dedupe well). You'll also want to make sure your DBAs are using filesperset=1. If you don't set filesperset=1, RMAN will combine the datafiles, which mixes up the data and confuses ProtecTIER (this can reduce the dedupe ratio to less than 2:1).

If you want to view the dedupe ratio for an individual backup, it is usually easier to run a backup during the day when there is no other write activity on ProtecTIER (stop reclamation, migrations to PT, other backups, etc.) and then run analyze_sessions:

/opt/dtc/app/utils/analyze_sessions

That will generate a CSV file in /pt_work that shows, for each specific time period, the amount of data backed up, the dedupe ratio, etc. I believe ProtecTIER starts a new 'time period' after it sees approximately 10 minutes of inactivity.
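To get a feel for the CSV, here is a minimal Python sketch that parses one data row. The column meanings are my assumption, inferred from the sample rows quoted later in this thread (not from the ProtecTIER documentation); check the header page of your own CSV before relying on them.

```python
# Parse one data row from an analyze_sessions CSV (header page skipped).
# Column meanings are inferred from examples in this thread -- verify
# them against the header page of your own CSV.

def parse_session(line):
    parts = line.split(",")
    return {
        "period":          parts[0],
        "nominal_tb":      float(parts[1]),             # data backed up, TB
        "raw_units":       float(parts[2]),             # units unclear; see CSV header
        "change_rate_pct": float(parts[3].rstrip("%")),
        "dedupe_ratio":    float(parts[4]),             # HyperFactor (dedupe only)
        "stored_gb":       float(parts[5]),             # compressed bytes written, GB
        "start":           parts[6],
        "end":             parts[7],
    }

sample = ("2011-5-20 16:09:45 to 2011-5-20 16:56:40,"
          "0.161054,1.7293e+08,99.995%,1.00005,33.7245,"
          "2011-5-20 16:09:45,2011-5-20 16:56:40")
s = parse_session(sample)
print(f"backed up {s['nominal_tb'] * 1024:.1f} GB, "
      f"dedupe {s['dedupe_ratio']:.3f}, stored {s['stored_gb']:.1f} GB")
```

(The TB-to-GB factor of 1024 is also an inference: 0.161054 TB matches the 165 GB figure discussed below only with binary units.)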

If compression is turned off (RMAN and TSM client) and filesperset is set to 1, then backing the files up with the TSM BA client should provide the same 10-50:1 deduplication ratio.
 
Hi,

Thank you so much for this detailed answer! I made sure filesperset was set. I'm running ProtecTIER 2.5.2, and I get the following info from analyze_sessions when I run both a compressed backup through dsmc incr and then an uncompressed one through tdpo with filesperset=1:

--> compressed RMAN on disk, then dsmc incr:
2011-5-20 14:17:46 to 2011-5-20 14:27:34,0.0142426,1.52929e+07,99.9183%,1.00082,13.9298,2011-5-20 14:17:46,2011-5-20 14:27:34

--> uncompressed RMAN tdpo directly to tape:
2011-5-20 16:09:45 to 2011-5-20 16:56:40,0.161054,1.7293e+08,99.995%,1.00005,33.7245,2011-5-20 16:09:45,2011-5-20 16:56:40

Not sure if I'm reading the correct column, though; there are a couple more columns than what's described in the user's guide and in the redbook. If I understand correctly, the factoring ratio is the 6th column. That would mean I'm getting a factoring ratio of 13.9 with the compressed RMAN files and 33.7 with the uncompressed one. Does that make sense?

The number of TB (column #3) shows the size prior to the factoring ratio, so I suppose I would need to divide that by 13 and 33 respectively to get the size.

Since RMAN already got a compression ratio of 9 before, from what I'm reading here I'm not gaining anything by using TDPO?
Am I misreading something??

Thanks!

Frank.
 
Your change rate right now looks to be 99% -- you're not deduping in either situation (the dedupe/factoring ratio shows as 1.00082). You would need to run two TDPO backups to see any deduplication, since ProtecTIER sees the first backup as 'new' data.

Are these the same databases that you backed up? The first (compressed) one is only 14.5 GB but the one using TDPO is 165 GB -- that would mean a compression ratio of 11:1, which I've never seen before (typical is 2-4:1 compression).
 
OK, so I was looking at the wrong columns!
I see I'm getting almost the same 1.000x factoring ratio from ProtecTIER on both backups. You are right, I only did one backup with TDPO to test, so dedupe would not be working. Still, I would expect something higher than 1:1 just from the compression ratio itself, right? Or is this number only showing the dedupe ratio, not taking the compression ratio into account? Or maybe ProtecTIER is just unable to compress the Oracle data and can only dedupe it?

I just asked the DBAs about the compression ratio they got, and they confirm that it was 11:1. It was the same database that I backed up twice.
 
How dumb am I... I just did a tail -100 on the CSV without looking at the first page, with all the headers explaining what each column actually is!
 
When you compress the data, it will look different to ProtecTIER than the uncompressed data, so you won't see any dedupe. ProtecTIER is compressing the backups -- the first one compressed at about 1.05:1 and the TDPO one at about 4.9:1.

I'd think that ProtecTIER should get about the same compression as the RMAN-compressed backup, though. Were these both level 0 cumulative? Also, you might want to verify that the filesperset parameter is working. You can check it by connecting with RMAN and running a 'list backup completed between ...' command -- this shows the datafiles associated with each backup set, and you should only have one per backup set.
 
OK, I must still be missing something or not understanding. You say the TDPO backup compressed at about 4.9:1. How did you come to this number?

Uncompressed size was: 165 GB
Change rate: 99%
Factoring ratio: 1.0005
Compressed bytes: 33.72 GB

Does it mean that the factoring ratio doesn't include the compression ratio and that I must "add" it to the factoring ratio?
Is the 33.72 GB of compressed bytes what is actually stored on the VTL?

So, when I see the following results:

2011-5-24 16:20:44 to 2011-5-24 17:04:59,0.160098,1.71904e+08,81.433%,1.228,30.4965,2011-5-24 16:20:44,2011-5-24 17:04:59

It means I have a 163.94 GB database, an 81% change rate, factoring (dedupe only) of 1.23, and a compressed size of 30.49 GB, so an "actual" global factoring ratio of about 5.4?

I'm still confused a bit.

Thanks!
 
ProtecTIER first dedupes the data, then compresses what is written to its filesystem.

Uncompressed size was: 165 GB
Change rate: 99%
Factoring ratio: 1.0005
Compressed bytes: 33.72 GB

For the above, you backed up 165 GB, had a dedupe ratio of 1.0005, and the compressed bytes written to the PT filesystem were 33.72 GB.

You can look at it like this: Hyperfactor ratio (Dedupe+Compression) = 165 / 33.72 = 4.9

Compression Ratio = Hyperfactor / Dedupe = 4.9 / 1.0005 ≈ 4.9

It would also be good to verify via RMAN with 'list backup completed between ...'; this will show which datafiles are in each backup set.

For this one:

2011-5-24 16:20:44 to 2011-5-24 17:04:59,0.160098,1.71904e+08,81.433%,1.228,30.4965,2011-5-24 16:20:44,2011-5-24 17:04:59

You backed up 164 GB, deduped at 1.228, and the compressed bytes written to PT are 30.496 GB:

Hyperfactor Ratio = 164/30.496 = 5.4
Compression Ratio = 5.4/1.228 = 4.4
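The arithmetic above can be sketched in a couple of lines of Python, using the ratio definitions from this post and the numbers from the two sessions quoted earlier in the thread:

```python
# Overall HyperFactor ratio = nominal data backed up / bytes stored on the
# ProtecTIER filesystem; compression is whatever dedupe doesn't account for.

def ratios(nominal_gb, stored_gb, dedupe_ratio):
    hyperfactor = nominal_gb / stored_gb
    compression = hyperfactor / dedupe_ratio
    return hyperfactor, compression

# First TDPO backup: 165 GB in, 33.72 GB stored, dedupe ratio 1.0005
hf, comp = ratios(165.0, 33.72, 1.0005)
print(f"hyperfactor {hf:.1f}:1, compression {comp:.1f}:1")

# Second session: 164 GB in, 30.496 GB stored, dedupe ratio 1.228
hf2, comp2 = ratios(164.0, 30.496, 1.228)
print(f"hyperfactor {hf2:.1f}:1, compression {comp2:.1f}:1")
```

This reproduces the roughly 4.9 hyperfactor/compression figures for the first backup and the 5.4 / 4.4 split for the second one.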

This looks like the second backup with TDPO? It doesn't look like filesperset is working, then. I've found that the settings in the RMAN script sometimes don't take effect if the backups aren't separated properly.

As an example, we separate the archive log backups from the full backups:
run {
  allocate channel t1 type 'sbt_tape';
  setlimit channel t1 kbytes 2000000 maxopenfiles 1 readrate 200;
  allocate channel t2 type 'sbt_tape';
  setlimit channel t2 kbytes 2000000 maxopenfiles 1 readrate 200;
  backup incremental level = 0 cumulative
    filesperset = 1
    database keep until time 'sysdate +30' logs include current controlfile;
  release channel t1;
  release channel t2;
}

run {
  sql 'alter system archive log current';
  allocate channel t1 type 'sbt_tape';
  setlimit channel t1 kbytes 2096640 maxopenfiles 32 readrate 200;
  allocate channel t2 type 'sbt_tape';
  setlimit channel t2 kbytes 2096640 maxopenfiles 32 readrate 200;
  backup archivelog all
    delete input;
  release channel t1;
  release channel t2;
}

You'd also need to run two backups once filesperset starts working to determine the dedupe ratio.
 
Thank you so much for such a thorough answer. It now seems very clear to me and this was really helpful!

You are right, it was the second backup through tdpo. I'm working with the DBA to change the RMAN script to send the archive logs in a separate run block and to always force filesperset=1 in the run command. We'll see how the 3rd backup reacts. If I don't see a big improvement in the global hyperfactor ratio after 4-5 backups, I'm not sure the customer will want to invest in:
- new hardware for the XIV behind the 7650
- the extra $$$ for TDP.

It seems I was confusing the factoring ratio (dedup) shown in the analyze_sessions output with the global hyperfactor ratio.
 
So I did a third test just now on a UAT database that hasn't changed in two days, and I get a dedupe ratio of 1.00072! I'll now try with the archive logs outside the first run block, as you proposed. For now, the DBA launched it like this, with 4 allocated channels:

backup database plus archive log filesperset=1;

results:
2011-5-27 11:28:59 to 2011-5-27 13:04:53,0.313983,3.37137e+08,99.9283%,1.00072,68.6825,2011-5-27 11:28:59,2011-5-27 13:04:53

Is it important that maxopenfiles be set to 1, or are you just doing that for performance, with no specific effect on the dedupe ratio?
 
Running this command (backup database plus archive log filesperset=1;) usually results in multiple datafiles being lumped together in each backup set -- we saw the same thing and had low (less than 2:1) dedupe ratios. maxopenfiles, I believe, is more for speed, but I don't have a comparison of how it affects dedupe vs. performance.

Have your DBA run a "list backup completed between" for the different backup settings, and he should be able to verify whether multiple dbf files are being used for each backup set. For best results with dedupe, you need just one file per backup set.

I haven't played around that much with the backup commands in RMAN; I just know what worked and didn't work for us. For some reason, when everything is backed up under the same command, filesperset gets ignored and the dedupe ratios are very poor.
 
You are right. I just checked through RMAN, and the only things sent into separate backup sets were the archived logs. The actual datafiles were sent together.

I just did a backup now as you described, without the archived logs, and I get 1 backup set per datafile. I still get the same dedupe of 1.000xx and compression of about 4.7, but it was the first run like this. I'll run another one tomorrow morning and see if it improves drastically. Here is how it looks from an RMAN perspective:

BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
384 Full 393.50M SBT_TAPE 00:00:21 31-MAY-11
BP Key: 384 Status: AVAILABLE Compressed: NO Tag: TAG20110531T125508
Handle: backup_bqmdnd12_1_1 Media:
List of Datafiles in backup set 384
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- --------- ----
4 Full 10228558536 31-MAY-11 /xxx/xxx.dbf

BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
385 Full 394.75M SBT_TAPE 00:00:21 31-MAY-11
BP Key: 385 Status: AVAILABLE Compressed: NO Tag: TAG20110531T125508
Handle: backup_brmdnd12_1_1 Media:
List of Datafiles in backup set 385
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- --------- ----
5 Full 10228558537 31-MAY-11 /yyy/yyy.dbf
....

Results from VTL analyze_sessions

- 2011-5-31 12:56:14 to 2011-5-31 14:32:19,0.312989,3.36069e+08,99.7962%,1.00204,68.1976,2011-5-31 12:56:14,2011-5-31 14:32:19
 
Hi,

Just wanted to leave you some feedback and thank you again for your help. With filesperset = 1 correctly set up, I ran a second backup of the same database (which didn't change at all since yesterday), and I got a really impressive dedupe ratio this time. I think I'll be able to convince my management to buy TDP now with those kinds of numbers!

- VTL:
- 2011-6-2 10:50:44 to 2011-6-2 12:25:17,0.312989,3.36069e+08,1.73062%,57.7828,1.16533,2011-6-2 10:50:44,2011-6-2 12:25:17
- sent: 320.50 GB
- global hyperfactor ratio: 273.93
- size after hyperfactor: 1.17 GB

Thanks again.

Frank.
 
Hello,
Is there a way to see the deduplication ratio for just one virtual library? I have a customer with DB2-for-SAP backups, and the deduplication ratio for those servers is quite low (about 4:1). At another customer with Oracle-for-SAP backups, I believe the ratio is about 9:1, but I'm not sure. Is there a way to display the deduplication ratio for one specific virtual library only?

Thanks!
 