Need Some Ideas

GregE

Based on this post... http://adsm.org/forum/showthread.ph...store-Question&p=108229&viewfull=1#post108229

....I need to rework my MS SQL backup strategy. Currently everything goes to disk (no LAN-Free) and then migrates to tape. And based on the way that tape restore behaved, I should keep that primary target disk-based.

Based on the issue in the above post, I have expanded my primary disk pool and enabled caching so there is a copy of all MS SQL data in that disk pool at all times. The crawling tape restore could have been disastrous had it been needed, and now I do NOT trust SQL restore from tape. My tape pool is collocated by filespace; I don't have enough tapes for every filespace to get its own, but it should still keep each filespace's data close together.
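
For reference, the server-side changes were done with dsmadmc commands roughly like these (the pool names are just placeholders for my actual disk and tape pools):

update stgpool SQLDISKPOOL cache=yes
update stgpool SQLTAPEPOOL collocate=filespace

CACHE=YES only applies to the random-access disk pool; cached copies of migrated files stay there until the space is needed for new backups.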

What are some of your recommendations beyond what I have already done? And have you also seen this type of crawling restore from tape? Remember too that it was the same data that had been restored earlier in the day at a normal pace, and no reclamation had run on those tapes between restore #1 and #2, so no data had moved (though expiration HAD run).
 
In contrast, I still believe in tape as a medium to restore from.

Can you set your collocation to 'by node' so the data is concentrated onto fewer tapes than by filespace?

SQL won't restore in parallel so there isn't much use for collocation by filespace.
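
If you do want to try node collocation again, it is a one-line change on the tape pool, something like the below (pool name is only an example), and you can confirm the setting afterwards with a detailed query. Keep in mind it only affects where new data lands; existing volumes keep their current layout until the data is moved or reclaimed.

update stgpool TAPEPOOL collocate=node
query stgpool TAPEPOOL format=detail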

I am not a big fan of data being stored only on disk - what happens during a real DR? A lot of folks have moved over to a VTL environment for both primary and copy pools but never thought through DR situations. They simply don't have a failover site to capitalize on a non-tape environment.

I still suggest that you go ahead with tape for offsite, and implement a dual primary pool strategy: disk and tape. This way you are well covered.
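
To make the dual primary idea concrete, one possible layout (pool names below are only examples, not your actual pools) is a cached disk pool that migrates to an onsite tape pool, with both backed up to an offsite copy pool:

update stgpool SQLDISKPOOL cache=yes nextstgpool=SQLTAPEPOOL
backup stgpool SQLDISKPOOL OFFSITECOPYPOOL
backup stgpool SQLTAPEPOOL OFFSITECOPYPOOL

Routine restores then come from disk (or onsite tape), and the offsite copy pool is only touched in a real DR.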
 
Thanks Ed. DR is currently tape. But onsite, what happened last week was a scary situation because the restore was running at 150 Kb/sec during that very slow run. I can't do tape-only after experiencing that, unless I know exactly why it happened and can definitely avoid it again.

I had collocation set to "node" not long ago, but changed it to "filespace" because I was doing a restore (for testing) to another server and experienced that same horrible performance. It makes no sense. I thought collocation by filespace had fixed things until I ran into that crawler.

Have you seen anything like this when restoring from tape? My DR is in jeopardy if this is how restore from tape is going to function.
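
For my own testing I have started checking how spread out the data is on tape before kicking off a restore; something like this (node name is a placeholder) lists which volumes hold the node's data:

query nodedata SQLNODE stgpool=SQLTAPEPOOL

If a filespace were scattered across many volumes I would at least expect mount and seek delays, though that alone would not seem to explain the crawl I saw.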
 
Yes - I have seen this with TSM for VE, and it was the TSM for VE client software that was causing the very, very slow restore.

It may be that you need to update the TDP (SQL) version or apply maintenance patches.
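
If it helps, you can check the level you are on from the Data Protection for SQL command line; something like the following should print the version banner along with the current settings (exact output varies by release), which you can then compare against the latest maintenance level on IBM's support site:

tdpsqlc query tdp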
 
Opened a ticket with IBM. Full packet reads are not happening...

05/31/2012 18:07:26.501 [019752] [28160] : session.cpp (1411): Recv Verb: ...
05/31/2012 18:07:26.501 [019752] [28160] : commtcp.cpp (1831): TcpRead: Upper level requested 4 bytes
05/31/2012 18:07:26.501 [019752] [28160] : commtcp.cpp (1950): TcpReadAvailable: Issuing recv for 4 bytes.
05/31/2012 18:07:26.501 [019752] [28160] : commtcp.cpp (2091): TcpReadAvailable: 4 bytes read.
05/31/2012 18:07:26.501 [019752] [28160] : commtcp.cpp (1870): TcpRead: 4 bytes read of 4 requested.
05/31/2012 18:07:26.501 [019752] [28160] : session.cpp (1456): sessRecvVerb(): length=8000, verb=07, magic=a5
05/31/2012 18:07:26.501 [019752] [28160] : commtcp.cpp (1831): TcpRead: Upper level requested 32764 bytes
05/31/2012 18:07:26.501 [019752] [28160] : commtcp.cpp (1950): TcpReadAvailable: Issuing recv for 32764 bytes.
05/31/2012 18:07:26.501 [019752] [28160] : commtcp.cpp (2091): TcpReadAvailable: 2522 bytes read.
05/31/2012 18:07:26.501 [019752] [28160] : commtcp.cpp (1870): TcpRead: 2522 bytes read of 32764 requested.
05/31/2012 18:07:26.501 [019752] [28160] : commtcp.cpp (1950): TcpReadAvailable: Issuing recv for 30242 bytes.
05/31/2012 18:07:26.689 [019752] [28160] : commtcp.cpp (2091): TcpReadAvailable: 7230 bytes read.
05/31/2012 18:07:26.689 [019752] [28160] : commtcp.cpp (1870): TcpRead: 7230 bytes read of 30242 requested.
05/31/2012 18:07:26.689 [019752] [28160] : commtcp.cpp (1950): TcpReadAvailable: Issuing recv for 23012 bytes.
05/31/2012 18:07:26.892 [019752] [28160] : commtcp.cpp (2091): TcpReadAvailable: 7230 bytes read.
05/31/2012 18:07:26.892 [019752] [28160] : commtcp.cpp (1870): TcpRead: 7230 bytes read of 23012 requested.
05/31/2012 18:07:26.892 [019752] [28160] : commtcp.cpp (1950): TcpReadAvailable: Issuing recv for 15782 bytes.
05/31/2012 18:07:27.095 [019752] [28160] : commtcp.cpp (2091): TcpReadAvailable: 7230 bytes read.
05/31/2012 18:07:27.095 [019752] [28160] : commtcp.cpp (1870): TcpRead: 7230 bytes read of 15782 requested.
05/31/2012 18:07:27.095 [019752] [28160] : commtcp.cpp (1950): TcpReadAvailable: Issuing recv for 8552 bytes.
05/31/2012 18:07:27.298 [019752] [28160] : commtcp.cpp (2091): TcpReadAvailable: 7230 bytes read.
05/31/2012 18:07:27.298 [019752] [28160] : commtcp.cpp (1870): TcpRead: 7230 bytes read of 8552 requested.
05/31/2012 18:07:27.298 [019752] [28160] : commtcp.cpp (1950): TcpReadAvailable: Issuing recv for 1322 bytes.
05/31/2012 18:07:27.501 [019752] [28160] : commtcp.cpp (2091): TcpReadAvailable: 1322 bytes read.
05/31/2012 18:07:27.501 [019752] [28160] : commtcp.cpp (1870): TcpRead: 1322 bytes read of 1322 requested.

and from IBM support.
"...during the normal part of the bad restore when TSM issues a read to TCPIP for a full packet (32764 bytes) it gets the full packet (then it gives it to the TDP and the TDP gives it to SQL) but during the bad part of the restore that same read gets only a partial return and has to be repeated till the full packet has been read. The problem ranges from bad (15-30 milliseconds) to truly horrible (12 seconds or more)...."

Thus far, the suggested (though long-shot) Windows registry settings were already in place. Another long shot was to raise the TDP client's TCPWindowSize to 256 or 512. I did that and the problem continues.
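
For anyone following along, the TCPWindowSize change went into the options file used by the TDP's API node (dsm.opt on Windows, in my case); the value is in kilobytes:

* dsm.opt used by the TDP for SQL node - TCPWINDOWSIZE is in KB
TCPWINDOWSIZE 512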
 