ADSM-L

Re: [ADSM-L] Select or Query help

2012-07-12 14:30:07
Subject: Re: [ADSM-L] Select or Query help
From: Alex Paschal <apaschal5 AT FRONTIER DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 12 Jul 2012 11:23:40 -0700
Hi, Margaret, Geoff.

If you select the key sequence for the backups table, you can optimize
your select so it doesn't kill your TSM instance.

v5: select colname, index_keyseq from syscat.columns where tabname='BACKUPS'
v6: I think it's keyseq instead of index_keyseq

If you then add clauses to your "where" predicate, such that you keep
the key sequence intact as low as possible, you minimize the row
scanning you do.

For example, in the v5 tabschema (I don't have a v6 tabschema saved),
the index_keyseq order for the BACKUPS table is node_name,
filespace_name, filespace_id, state, type.  In this case below, two
selects will actually run faster than one select.

select * from backups where
     node_name='NODEA' and
     filespace_name='/mnt1' and
     filespace_id='10' and
     state='ACTIVE_VERSION' and
     type='FILE' and
     ll_name='myfile'
select * from backups where
     node_name='NODEA' and
     filespace_name='/mnt1' and
     filespace_id='10' and
     state='INACTIVE_VERSION' and
     type='FILE' and
     ll_name='myfile'

runs faster than:

select * from backups where
     node_name='NODEA' and
     filespace_name='/mnt1' and
     type='FILE' and
     ll_name='myfile'

Note the removal of the FILESPACE_ID and the STATE clauses from the
WHERE predicate.  This will cause more row scanning because it's not
indexing as deeply; it stops after FILESPACE_NAME.  Even though TYPE is
used as a clause, because of the break in the indexing, it's not used as
an index.  Or, at least, that used to be the case in v5.  I suppose it's
possible v6/db2 handles skips in the index more gracefully, but someone
else would have to chime in on that.  :-)

Pro-tip 1:  For select testing, do a "query occupancy" to find a
filespace with fairly few objects and test on that filespace.
Pro-tip 2:  For large selects that aren't friendly to indexing, build a
shell/perl/whatever script to parse the occupancy table and issue a
billion small indexed selects.  I find it runs orders of magnitude
faster than one large select that kills your instance. Again, Y(v6)MMV.

Alex


On 7/12/2012 10:41 AM, Clark, Margaret wrote:
Well, since you asked for "any suggestions"...

It seems to me you could get this report using a command like

select archive_date,filespace_name from archives a,file_name from contents c where 
archive_date >= current_timestamp - 1 days and  a.node_name = c.node_name and 
a.filespace_name = c.filespace_name

or

select backup_date,filespace_name from backups b,file_name from contents c where 
backup_date >= current_timestamp - 1 days and  b.node_name = c.node_name and 
b.filespace_name = c.filespace_name

However, I haven't tested these commands, and I'm not about to.
On our system, any such command would run forever and probably die before it 
completed.
I have long since given up trying to use select statements using either the 
contents or backups tables, they are way too big.
If this report is not produced regularly, maybe it's because your predecessor 
gave up too!

P.S.  HL_NAME and LL_NAME  are not what you need... those are the IP address 
and port number for a client.

- Margaret

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Geoff Gill
Sent: Thursday, July 12, 2012 9:13 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] Select or Query help

Good day all,

Below is a section of a file I was sent. I'm told the person who did this is 
long gone and nobody knows what command he used to get it. I was hoping someone 
might recognize it and help recreate it since I'm not sure we'll even be able 
to contact that person for help.

There are no headers for the columns but they seem obvious. You can see from 
the output the date, file://server/drive, file\path and filename. The filename 
and data in the file seems to indicate that this command is run to output one 
specific dates worth of data of every file a specific node backed up.

If anyone has any siggestions I would be eternally grateful because I'm being 
hammered to get an answer so I can run it and get the data.

       2011-10-17         \\bpsnt91\c$     \DOCUMENTS AND         NTUSER.DAT
02:36:24.000000                          SETTINGS\AFSADEV\
       2011-10-17         \\bpsnt91\c$     \DOCUMENTS AND         NTUSER.DAT.LOG
02:36:24.000000                          SETTINGS\AFSADEV\

      2011-10-17         \\bpsnt91\c$     \DOCUMENTS AND         NTUSER.INI
02:36:24.000000                          SETTINGS\AFSADEV\ Thank You Geoff Gill


<Prev in Thread] Current Thread [Next in Thread>