Bad restore performance with TDP for SAP

benoit16

Good morning,

We are testing TDP for SAP.

We have a TSM server connected over FC to Jaguar Generation 4 drives.

When we test the restore function of a DB of 250 GB, the restore rate is around 50 MB/s.
On the same host, when I test EMC Networker restore, I have around 300 MB/s.

Could you please tell me how I can diagnose where the bottleneck is for this TDP restore?

Thanks in advance for your answers.
 
How many drives are working during the restore? LAN Free?

If the DB is on just one tape, then the restore runs as a single stream. Unless you have defined multiple streams and are working LAN Free, a restore from one tape will really be slow.

For non-LAN Free, 50 MB/sec translates to 3 GB/minute or 180 GB/hour, which is about average for a setup like this.
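
To put the conversion above next to the 250 GB database from the first post, here is a small Python sketch. It assumes decimal units (1 GB = 1000 MB), and the function names are made up for illustration only.

```python
def mb_per_sec_to_gb_per_hour(rate_mb_s: float) -> float:
    """Convert MB/s to GB/h using decimal units (1 GB = 1000 MB)."""
    return rate_mb_s * 3600 / 1000


def restore_minutes(size_gb: float, rate_mb_s: float) -> float:
    """Minutes needed to move size_gb at a sustained rate of rate_mb_s."""
    return size_gb * 1000 / rate_mb_s / 60


if __name__ == "__main__":
    # 50 MB/s is the observed TDP restore rate, 250 MB/s the drive rating
    # and 300 MB/s the Networker rate quoted in the thread.
    for rate in (50, 250, 300):
        print(f"{rate} MB/s = {mb_per_sec_to_gb_per_hour(rate):.0f} GB/h, "
              f"250 GB in {restore_minutes(250, rate):.1f} min")
```

At 50 MB/s the 250 GB restore needs well over an hour, which is the gap the original poster is asking about.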
 
How long did the backup take? How many tablespaces were backed up at once? How many I/O streams? What network speed is available? Which database is it?

I have a DB2 database where we do 34 tablespaces at a time, 3 I/O streams over 10G network and we do about 1.2TB per hour to backup. Restore time should be similar though we haven't done one in production in a while.
 
Hello,

From what I see one drive is used during the restore.

1. As there is a storage area in between, can I configure multiplexing?

The LAN is free and should not be a problem, since it is not a problem with Networker.

2. Why do you say that 50 MB/s is about average for a setup like this?
A Generation 4 Jaguar drive does 250 MB/s without compression, and DB files compress well.
250 MB/s is far from 50 MB/s.
Could you please give me more details on this point?

Regards,
 
Hello,

The backup takes less than 8 minutes:
BKI1215I: Average transmission rate was 1662.964 GB/h (473.021 MB/sec).
BKI1227I: Average compression factor was 1.000.
BKI0020I: End of program at: Thu 15 Sep 2016 04:40:16 PM CEST .
BKI0021I: Elapsed time: 07 min 45 sec .
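
For comparing backup and restore runs, the two figures that matter in this log can be pulled out with a short Python sketch. The regular expressions are written only against the two message lines quoted above; other BKI message layouts (for example an elapsed time that includes hours) are not handled, and the function names are hypothetical.

```python
import re

# The two relevant lines, copied from the backup log above.
LOG = """\
BKI1215I: Average transmission rate was 1662.964 GB/h (473.021 MB/sec).
BKI0021I: Elapsed time: 07 min 45 sec .
"""


def parse_rate_gb_h(log: str) -> float:
    """Return the average transmission rate in GB/h from a BKI1215I line."""
    m = re.search(r"BKI1215I: Average transmission rate was ([\d.]+) GB/h", log)
    if m is None:
        raise ValueError("no BKI1215I line found")
    return float(m.group(1))


def parse_elapsed_seconds(log: str) -> int:
    """Return the elapsed time in seconds from a 'NN min NN sec' BKI0021I line."""
    m = re.search(r"BKI0021I: Elapsed time: (\d+) min (\d+) sec", log)
    if m is None:
        raise ValueError("no BKI0021I line in 'min sec' form found")
    return int(m.group(1)) * 60 + int(m.group(2))


def gb_h_to_mb_s(rate_gb_h: float) -> float:
    """GB/h to MB/s; the logged pair of figures agrees with 1 GB = 1024 MB."""
    return rate_gb_h * 1024 / 3600


if __name__ == "__main__":
    rate = parse_rate_gb_h(LOG)
    secs = parse_elapsed_seconds(LOG)
    print(f"backup: {gb_h_to_mb_s(rate):.1f} MB/s for {secs} s")
    print(f"backup is {gb_h_to_mb_s(rate) / 50:.1f}x the 50 MB/s restore rate")
```

Running the same extraction on the brrestore log would give a like-for-like comparison of the two directions.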

As I posted in the Oracle forum, the DB is Oracle.

How can I know how many tablespaces were backed up at once?

What do you mean by I/O streams? Do you mean backup sessions?

The network interfaces are 10 Gbps on both sides.

What I would also like to know is how I can locate the bottleneck.

Thanks in advance for your answers.
 
Questions:

Do you have a disk pool that caches data during backup? Or does the backup go directly to tape? If it goes directly to tape, how many sessions?

I/O streams=number of sessions.

To determine whether the restore is using LAN Free sessions, issue the command "show sessions" on the TSM Server CLI while a restore is running. If you see sessions reporting that LAN Free is YES, then the restore is LAN Free. Also, during a restore, the command "q session" should indicate whether the restore is LAN Free: if it is, the bytes sent do not grow in proportion to the restored data. If they do, the restore is not LAN Free and will take a long time.
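
The "bytes sent versus restored data" check can be written down as a rough rule of thumb. The Python sketch below assumes you note the numbers by hand at two points in time (bytes sent as shown by "q session", and the amount restored as reported on the client side); the function name and the 10% threshold are arbitrary assumptions, not anything TSM defines.

```python
def looks_lan_free(server_sent_start: int, server_sent_end: int,
                   restored_start: int, restored_end: int,
                   max_ratio: float = 0.1) -> bool:
    """Guess whether a restore is LAN Free from two manual samples.

    If the bytes sent by the server session grow roughly as fast as the
    restored data, the data is flowing through the server over the LAN.
    If they barely grow, the data is probably taking the LAN Free path.
    max_ratio is an arbitrary cut-off chosen for this illustration.
    """
    restored = restored_end - restored_start
    if restored <= 0:
        raise ValueError("no data was restored between the two samples")
    sent_by_server = server_sent_end - server_sent_start
    return sent_by_server / restored <= max_ratio


if __name__ == "__main__":
    # 10 GB restored while the server session sent only about 5 MB.
    print(looks_lan_free(0, 5 * 10**6, 0, 10**10))
    # 10 GB restored and the server session also sent about 10 GB.
    print(looks_lan_free(0, 10**10, 0, 10**10))
```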
 
Good afternoon,
Yes, we have a disk pool that caches data during backup.
This is what I call a staging area. :)

Could you please provide me more information on what LAN free is?
Could you please tell me why LAN free is so important?

Thanks in advance for your answers.
 
Troubleshooting performance issues can be lengthy and complex. This is a good start:
diagnose_bkup_restore_perf.gif

source: https://www.ibm.com/support/knowled...ibm.itsm.perf.doc/t_ptg_bkup_rstore_diag.html
 
Ok, so you are not using LAN Free. In that case restores will really be slow. What is happening is that the backup data is first cached on disk, which accounts for the higher backup throughput. After that, the data is migrated to tape. The data is (in all likelihood) stored on one tape. This is why the restore is slow.

If you have LAN Free, the total input streams=output streams and restores will mostly be near backup speeds (number of backup sessions will equal number of restore sessions which equal number of tape and tape drives used).
 
Hello,

When I restore, data is read from tape and sent to the network.
From what we see with backups, the bottleneck is not the network or the disk I/O.

A Jaguar Generation 4 drive has a throughput of 250 MB/s.
Would you know why it is sending at only 50 MB/s?

Thanks in advance for your answer.
 
As I said, this is average: data is read, organized, transferred, and verified, then the cycle repeats. This is why restores are slow. The rated throughput is a sustained transfer rate and does not account for verification such as CRC or for pauses during the restore for data reorganization, etc.

If you truly need fast restores, do LAN Free with devclass=file. However, this solution requires disk acting like tapes (sequential) and can totally negate your tapes and tape drives.
 
Hello,

I have one more question.
In my case, could I increase the restore speed by changing the multiplexing value?

Thanks in advance for your answer.
 
If you mean by multiplexing=number of streams, then yes.

See 'help define stgpool' for primary random and primary sequential. THIS DOES NOT APPLY TO COPY POOLS.

The idea is to set collocation=node on the primary sequential pool and to set migration, on both the primary random and primary sequential pools, to the same value as your number of streams. When you restore from primary sequential (the online tape pool), you should get more than one stream.

However, how much faster? There is no simple way to calculate this. Definitely, restore speeds will not be equal or even get near backup speeds.

As mentioned, use devclass=file and you can have near backup speeds for restores.
 
Hello,

I am just a TSM user.
Using Lan Free has implications that are out of the scope of just my TDP backups/restores.

I have discussed with my TSM admin colleague.
They are not in favour of using LAN Free.

This is why I am interested in optimisations that would fit in a classic (non-LAN Free) environment.

Regards,
 
Hello,
See 'help define stgpool' for primary random and primary sequential. THIS DOES NOT APPLY TO COPY POOLS.
Do you mean that, in case of using cache disk, multiplexing is not relevant?

Regards,
 
Increasing the number of streams definitely makes the backups run faster, and is true for restores to some extent.

The ideal setup is restore speed=backup speed, which is hard to achieve when your gauge is backup to disk and restore from tape. Restoring from sequential media is really slow. Random devices are really fast.

You can make restores run faster if you keep the backup on disk for as long as you think it may be needed for restores. As an example, if you do restores once a week, you can set up the disk pool NOT to empty out for a week when data is moved to tape. Just be aware that you will need a HUGE disk to hold this cached data.
 
Hey moon-buddy.

I think you are wrong on some points here. If we were talking about the restore of a directory or server, you'd be spot on. This is the restore of a database direct from tape. The data was saved sequentially and will be restored sequentially, and the tape drives should be able to restore at near their max speed if the network and the Oracle database disks don't hold it up. Restoring off tape will be far quicker than disk. They managed to back it up at 1.6TB per hour.

Remember also that this is TDP not the BA client so it will behave a little different.

Benoit16:
Looking at our init<sid>.utl file, we have the following parameters.

MAX_SESSIONS 4 - This equates to the number of I/O sessions in DB13 in SAP. Essentially, in DB13 you set how many tape drives you want to use, although this cannot be higher than the MAX_SESSIONS value in the init<sid>.utl file or the maxnummp setting for the node.

SESSIONS 34 - This is the number of tablespaces backed up at the same time and relates to the parallelism setting in DB13.

When restoring the database, you will want to use settings similar to the ones used to back it up, and I think a mismatch here is likely to be why you are getting a poor result. Do you use backom to do the restore? I would compare the backup settings with the restore settings and look for differences there.

My experience with the TDP is with DB2 so the config may be different for Oracle.
 
Hello,
What is backom?
For Oracle DB, I can only use brrestore, which uses TDP backint, or the file manager (backfm).

In my case I set the same number of sessions for restore and backup.
But the restore uses a single session because the backup is stored on a single tape.

In the case of Oracle, from what I have read, SESSIONS and MAX_SESSIONS are similar.
SESSIONS is set in a server subsection and MAX_SESSIONS is a global setting.
But both specify the number of connections between the client and the TSM server.

On your side, what bandwidth do you get per tape drive when you back up and restore?

Thanks in advance for your answers.
 
But the restore uses a single session because the backup is stored on a single tape.
To use multiple sessions, you will need to do one of the following:
Option 1: do as Moon-buddy said here: https://adsm.org/forum/index.php?th...formances-with-tdp-for-sap.31286/#post-129945
Option 2: do as Moon-buddy said here: https://adsm.org/forum/index.php?th...formances-with-tdp-for-sap.31286/#post-129942
Option 3: back up directly to tape so that the data is written to multiple tapes and therefore restored from multiple tapes (optionally, use LAN Free with this for even better performance)

As long as the data is on one tape, you will only have 1 session for the restore; there's no way around it.
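
The limits discussed in this thread (configured sessions, maxnummp for the node, available drives, and the number of tapes the backup actually sits on) can be combined into one small Python sketch. This is only an illustration of the reasoning above, not how TSM itself schedules sessions, and the function and parameter names are made up.

```python
def effective_restore_sessions(configured_sessions: int,
                               maxnummp: int,
                               free_drives: int,
                               tapes_holding_backup: int) -> int:
    """Upper bound on parallel restore sessions, per the rules in the thread.

    The restore cannot run more streams than the sessions configured in the
    TDP profile, the node's maxnummp, the tape drives that are free, or the
    number of tapes the backup data is spread over.
    """
    return max(0, min(configured_sessions, maxnummp,
                      free_drives, tapes_holding_backup))


if __name__ == "__main__":
    # Backup staged on disk, then migrated to a single tape.
    print(effective_restore_sessions(4, 4, 2, tapes_holding_backup=1))
    # Same settings, but the backup was written directly to four tapes.
    print(effective_restore_sessions(4, 4, 2, tapes_holding_backup=4))
```

In the first case the single tape is the limit no matter how many sessions are configured; in the second case the two free drives become the limit.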
 