Filespace collocation groups versus collocation groups?

ldmwndletsm
If you have a lot of file systems on a client and you want to collocate them, is it better to create a bunch of file space collocation groups for the same node (each would obviously list only one file space)? Or is it better to create multiple nodes (multiple stanzas), where each stanza handles a specific set of file systems (e.g. 'domain /filesystem' in dsm.sys), and each node is then added to a different node collocation group?
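To make the comparison concrete, I picture the two approaches looking something like this in admin commands (all group and node names here are made up):

    /* Option 1: one node, file space collocation group members */
    DEFINE COLLOCGROUP BIGHOST_FSGRP
    DEFINE COLLOCMEMBER BIGHOST_FSGRP BIGHOST FILESPACE=/fs01

    /* Option 2: multiple nodes, each in a different node collocation group */
    DEFINE COLLOCGROUP GROUP_A
    DEFINE COLLOCMEMBER GROUP_A BIGHOST_A
    DEFINE COLLOCGROUP GROUP_B
    DEFINE COLLOCMEMBER GROUP_B BIGHOST_B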

We have a lot of file systems on this one host. Half of them will be written to their own storage pool (A), shared with only one or two other clients that have little data. The other half will use a different pool (B), along with a number of other clients, so pool B will see heavy use. We will be using two stanzas (two nodes) for this, and each stanza will use 'domain /filesystem' statements to explicitly list what is to be backed up.
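Roughly like this in dsm.sys (server, node, and address names are made up):

    * Stanza for the node that writes to pool A
    SErvername tsm_pool_a
       NODename          bighost_a
       TCPServeraddress  tsm.example.com
       DOMain            /fs01
       DOMain            /fs02

    * Stanza for the node that writes to pool B
    SErvername tsm_pool_b
       NODename          bighost_b
       TCPServeraddress  tsm.example.com
       DOMain            /fs10
       DOMain            /fs11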

I'm not too worried right now about pool A, since there are so few nodes on it, and most of the file systems are rarely written to after they're first created; most daily activity occurs on the newest file system. Once that one is close to filling up, a new file system is created and we move on from there, so we just keep adding more. A little space is left on each one so that later changes can (and do) occur, but again, most of the current activity is on the latest file system.

My concern with pool B is this: when we first start out, the first-time incrementals, while the largest, will complete okay, since we won't add more file systems to the corresponding stanza than we can finish in one night. But as more and more file systems are added over time, walking a file system for the nightly incremental, backing it up, moving to the next file system, walking that one, and so on, might eventually take too long.

Could we expedite this by using collocation? We will be writing to disk first, and the data will then be moved to tape (no disk cache), but it will not stay on disk long (maybe a day or so). So is collocation moot, given that the data goes to disk first?
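For reference, the hierarchy would be roughly this (pool names are made up); my understanding is that collocation settings on the tape pool are still honored when migration moves the data off disk, so the stop-over on disk shouldn't defeat it:

    /* Random-access landing pool, no caching, drains to tape */
    DEFINE STGPOOL DISKPOOL_B DISK CACHE=NO NEXTSTGPOOL=TAPEPOOL_B HIGHMIG=70 LOWMIG=0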

Splitting things out by file space seems a mess, since there are so many, and all the file spaces in a given collocation group have to belong to the same node, right? So we'd end up with one collocation group per file space, each listing the same node and a unique file system. Kinda messy.
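If per-file-space separation is really all we need, my understanding is that pool-level file space collocation would avoid defining groups entirely, something like (pool name made up):

    UPDATE STGPOOL TAPEPOOL_B COLLOCATE=FILESPACE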
 
The time it takes to scan and process a client filesystem has nothing* to do with where the actual backed up data is stored, as the client is querying the TSM database.

There are client options you can use to speed up the backup process, such as RESOURCEUTILIZATION. Client performance tuning is more of an art than a science.
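For example, raising RESOURCEUTILIZATION in the client options file lets the client run more producer/consumer sessions in parallel; the value below is illustrative, not a recommendation:

    * dsm.sys (or dsm.opt on Windows) -- illustrative value only
    RESOURceutilization  8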

Collocation is designed to help you use your backup server storage more efficiently, facilitate faster restores, or get the 'best of both', so to speak. For example, I have a set of servers that don't perform any data reduction and rarely delete data; I defined them into a collocation group to get the best capacity out of my tape resources, so I'm not always running a reclaim on 60%-utilized tapes. Here's a good doc on collocation: https://www.ibm.com/support/knowled...0/com.ibm.itsm.srv.doc/t_colloc_planning.html
By using groups I was able to free up 20 or so LTO6 volumes in my primary tape pool.
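The setup itself is only a few commands, something along these lines (group, node, and pool names are made up):

    /* Group the low-churn servers so they fill shared tapes */
    DEFINE COLLOCGROUP LOWCHURN DESCRIPTION="Servers with little data reduction or deletion"
    DEFINE COLLOCMEMBER LOWCHURN NODE1,NODE2,NODE3

    /* Tell the tape pool to collocate by group */
    UPDATE STGPOOL LTO6POOL COLLOCATE=GROUP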

If I'm understanding your concern correctly, you are worried about a client that has, say, 20 file systems with several million files under each, all going to one node name. Resource utilization will help with scanning and backing up those files. However, there may come a point where you need to set up multiple agents and tie them together with proxy and target node definitions, then have each proxy agent process a subset of those file systems, assuming the server and disk I/O can keep up.
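The proxy wiring itself is just a grant on the server plus an asnodename option on each agent (all names here are made up):

    /* Server side: allow two agent nodes to write as the target node */
    REGISTER NODE BIGHOST_AGENT1 secretpw
    REGISTER NODE BIGHOST_AGENT2 secretpw
    GRANT PROXYNODE TARGET=BIGHOST AGENT=BIGHOST_AGENT1,BIGHOST_AGENT2

    * Client side, in each agent's stanza, so data lands under BIGHOST:
    ASNODEname BIGHOST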

There are also third-party products that help parallelize TSM backup operations across many file systems. One that I am familiar with, but do not have deployed, is this: http://www.general-storage.com/PRODUCTS/dsmISI-MAGS/dsmisi-mags.html

Hope this helps.

*Unless we're talking about some TDP products; then yes, where certain control files (for, say, VMs) are stored can affect backup performance.
 