Can you move a file space between nodes?

marclant · Feb 25, 2020

ldmwndletsm said:
If you change it to 4, would it still send two file systems to the same drive? Or would it be one per drive and thus force another tape to be loaded?

You would have 2 data sessions, and each session would require 1 drive. Tape is written sequentially, so multiple sessions can't write to the same drive at once. For filesystem backups, you are best off landing on a random-access disk pool and cranking the sessions up higher than 4. I'd go to 10 if you have that many filesystems. At 10, you would get 4 producers, 4 consumers and some data sessions (resourceutil is weird in the way it works, but higher = more sessions, assuming there's enough work).

ldmwndletsm · Feb 25, 2020

Okay, yes, we are using a disk volume for the initial backups, and then it gets moved to tape later ('next storage pool').

But what happens when it gets copied from the disk volume to the copy pool tape or migrated to the primary pool tape? Will the resourceutil value from the node's stanza still apply? Or does that only apply when it's first sending to the disk volume? If not, then won't you be in the same boat when you have to offload the disk volume to tape? Then again, maybe not, since it all lives on the disk volume and the tape library is right there, so there's no network bottleneck, etc.?

That's interesting about not being able to write multiple sessions to the same drive. I'm used to doing that with EMC, wherein you have, for example, a drive parallelism of, say, 6, and you have a client parallelism of maybe 10, so the client will send up to 10 file systems simultaneously to the server, 6 of which will go to one drive, assuming there's one available that's idle, and the other 4 will go to another drive. A file system will not be split between drives, however. All file systems coming into the drive will be wrapped (multiplexed) together. Higher drive parallelism can help to keep the drive streaming optimally, and thus gives faster backup times but slower recover times since it has to unmultiplex all that. Higher parallelism on the client can get backups done faster, but again, with a similar drawback. Obviously, in this case, if a drive is running 4 sessions then it could only accept 2 more until one freed up.
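
The fill-one-drive-then-the-next behaviour described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the post (10 client streams, drive parallelism of 6); the function name `assign_streams` is hypothetical, and the real product may pick drives differently.

```python
def assign_streams(num_streams, drive_parallelism, num_drives):
    """Fill each drive up to drive_parallelism before using the next one.

    Returns (streams_per_drive, streams_left_waiting).
    Assumption: drives are filled greedily in order, as described in the post.
    """
    per_drive = []
    remaining = num_streams
    for _ in range(num_drives):
        take = min(drive_parallelism, remaining)
        per_drive.append(take)
        remaining -= take
    return per_drive, remaining


# The example from the post: client parallelism 10, drive parallelism 6.
per_drive, waiting = assign_streams(num_streams=10, drive_parallelism=6, num_drives=2)
print(per_drive, waiting)  # [6, 4] 0
```

With only one drive available, the same call would place 6 streams and leave 4 waiting for a free slot.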

marclant · Feb 25, 2020

The resourceutil only affects the client backups to the server by controlling the number of sessions. Once in the disk storage pool, it doesn't matter how it got there, it's migrated to tape just the same.

You are right that a constant stream of data better maximizes the tape drives. That's why, as a general rule with Spectrum Protect, we send file system backups to a disk pool. The incremental-forever nature of backups causes the clients to not send a steady stream of data, and you end up with a lot of shoe-shining if going to tape. With everything going to disk first, the migration assures a steady stream to tape. That leaves your tape drives free for database backups to go directly to tape, because they usually consist of a small number of large objects.

ldmwndletsm · Feb 26, 2020

So just to recap, two nodes will work, but it's usually not used for this purpose. That said, the dual node method achieves a kind of faux parallelism in the sense that each stanza can be associated with a different schedule and the second schedule isn't hobbled by the first. So if you had one node (no proxies) and one schedule, and it took 8 hours for everything to finish then that could be bad, whereas if you could have two schedules that could overlap (something dual nodes allows; 2 dsmcad processes) then even if the second was started an hour later, maybe both could finish in 5 hours, with the data split between them.
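
As a rough illustration of the overlap idea above, here is a minimal Python sketch. The 8-hour single schedule comes from the example in this post; the 5-hour and 4-hour durations for the two overlapping schedules are assumed numbers chosen only to show the calculation, and `wall_clock_hours` is a hypothetical helper.

```python
def wall_clock_hours(schedules):
    """schedules is a list of (start_hour, duration_hours) tuples.

    Returns the elapsed time from the earliest start to the latest finish.
    """
    first_start = min(start for start, _ in schedules)
    last_end = max(start + duration for start, duration in schedules)
    return last_end - first_start


one_schedule = [(0, 8)]            # everything in a single 8-hour run
two_schedules = [(0, 5), (1, 4)]   # assumed split, second starts an hour later

print(wall_clock_hours(one_schedule))   # 8
print(wall_clock_hours(two_schedules))  # 5
```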

But as you pointed out, you could use proxies for this and just use a single stanza, upping the utilization beyond 2, as necessary, thus achieving true parallelism in the sense of the word.

So if you use two stanzas, and you use the 'resourceutilization' setting, then does this apply per stanza, as in you could have resourceutilization 4 in both stanzas and each would use 4, or is it 4 for the whole system?

Also, with two stanzas, if resourceutilization isn't set then it would default to 2 each so each will have one walker and process one file system at a time versus if you had it set to 4 wherein each might do two at a time?

Finally, if you use proxies, how do you tell TSM which file systems will be managed by which one?

And if the default value of 2 is sufficient then does each automatically get that, or do you need to specify the resourceutilization for each? If you set it to 4 each then it's really 4, not each swapping back and forth for the 4, right? Guess that would be self defeating.

marclant · Feb 26, 2020

ldmwndletsm said:
So if you use two stanzas, and you use the 'resourceutilization' setting, then does this apply per stanza, as in you could have resourceutilization 4 in both stanzas and each would use 4, or is it 4 for the whole system?

It applies to each invocation of the client using that specific stanza. So if scheduler1 is using stanza1, that applies to scheduler1. If you also start dsmc using stanza1, it applies to dsmc.

ldmwndletsm said:
Finally, if you use proxies, how do you tell TSM which file systems will be managed by which one?

Same as you do now: each scheduler for each proxy would have its own stanza.

If your backups are working now with 2 nodes but don't finish fast enough, just start by increasing the resourceutil to get parallelism before you start changing which node backs up which filesystem. Like you said, multiple nodes on a single machine can make it complicated to figure out which node to use to do the restore. If one has more data to process than the other, you can increase one more than the other.

ldmwndletsm · Feb 26, 2020

Got it.

I guess I'm still a little confused here, though, with the proxies versus the nodes.

Option 1. If we used a single stanza and one node then we could add all the 'domain /filesystem' statements, that we would otherwise have split between the two stanzas, into just one stanza, and we'd also have to use the sundry includes to send the respective file systems to the required management class. But if we did this then it would seem that we could do it all with just one schedule, although we might need to adjust the resourceutil value since we would not otherwise be running two or more file systems at a time (by default) like we would with two stanzas.

Option 2. Have two stanzas and two proxies (one per stanza), divide the file systems up as we would with two nodes, use the resourceutil option (per stanza), as necessary (if necessary) and use two schedules. And this could all be done with just one node wherein the data would be recoverable from a single node as far as the database is concerned?

I thought two schedules required two nodes? Or is it that two schedules require either two nodes or two proxies?

If we use two proxies, then we'd have to have two schedules, right?

Also, when using proxies, does two schedules require two dsmcad processes (one for each respective .opt file) like it does when using the two nodes/two stanzas approach?

ldmwndletsm · Feb 28, 2020

A question here on database speed in regards to one versus two nodes.

With over 150 1.7 TB file systems for the one management class and 100+ file systems for the second management class, that's a lot of files. The 100+ file systems vary in size, but most are well under 500 GB, and collectively, this group is a fraction of the data for the 1.7 TB file systems.

If all this data is backed up using a single node, I wonder if the number of entries that TSM is tracking in its database for the one node could slow things down during the file system walks, recovers, queries, etc. versus if it was split between two nodes.

I know little about how the database works internally, and I would think it would be well optimized. Moreover, the database is spread out across multiple disks or file systems for better performance. But is it possible that if we had to query node 1 (handling the 100+ file systems) that it might be more efficient since it might not have to search through as many entries (a fraction of them) as it would if node 2's data (150*1.7 TB) was also stored with it, all as one node?

marclant · Mar 2, 2020

No difference.

Data is stored hierarchically by node -> filespaces -> files and directories. ALL the space is consumed by the latter. During the backup, queries are done at the filespace level. So it doesn't matter if it's under a single or 2 nodes. It just queries the filespace it is backing up now.

So if you have a single node and a high resourceutil, you will have multiple producer threads querying the server for multiple filespaces, scanning the corresponding filesystems for changes and sending those to the consumers for backup. Whatever resourceutil you would have used with 2 nodes, use the sum of that with a single node.

If you do the same with 2 nodes, the only difference is that you will have 2 processes running (1 for each node), with multiple producer threads across the 2 processes querying the server for multiple filespaces, scanning the corresponding filesystems for changes and sending those to the consumers for backup. Whatever resourceutil you would use for a single node, split that between the 2 nodes.

The first method is better because it's easier to manage. The second method works, but offers no benefits unless the 2 nodes were on different machines; that way you have more hardware working for you. As you keep increasing the resourceutil, you will hit a bottleneck at some point. That's the maximum number of threads that it will be able to handle. Let's assume for the sake of argument it's 16. 16 threads in a single process is the same as 2 processes with 8 threads each. You still just have 16 threads processing data.
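
To make the arithmetic concrete, here is a minimal Python sketch. It treats the value purely as a thread budget, which is a simplification (resourceutil does not map one-to-one to threads), and `split_budget` is a hypothetical helper, not anything in the client.

```python
def split_budget(total_threads, processes):
    """Split a thread budget as evenly as possible across client processes."""
    base, extra = divmod(total_threads, processes)
    return [base + (1 if i < extra else 0) for i in range(processes)]


single_node = split_budget(16, 1)
two_nodes = split_budget(16, 2)

print(single_node, sum(single_node))  # [16] 16
print(two_nodes, sum(two_nodes))      # [8, 8] 16
```

Either way the total is the same 16 threads, which is the point: splitting across processes on one machine does not add capacity.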

ldmwndletsm · Mar 2, 2020

Okay, thank you very much for your responses.

This raises an important question. If you chose the proxy method then can you have the target node be one of the two already existing nodes and then create a third node to replace the one listed in the first stanza?

For example, if system_data was the node listed in stanza 1 of 2, and system_data_arc was listed for stanza 2 of 2, and we'd already run some backups for system_data, then if we created the target node system_data_all and kept system_data and system_data_arc as the two agent nodes, we would have to rerun those backups, correct?

But if we instead changed system_data to system_data_nonarc, and made the target node: system_data then the target node would already have that previously backed up data listed for it. Any more backups of those same file systems under system_data_nonarc would continue as they were since they're still being tracked as from system_data. Otherwise, anything new would be backed up for the first time as always.

Would this work? If not, we don't have that much data that we've backed up so re-running backups is not a big deal, but if this would save us the time, that would be nice.

On another issue with proxies, since both agent nodes would have all their data being tracked by the target node then do both use the settings for the target node configuration, e.g. MAXNUMMP and not the settings in their individual node configurations?

marclant · Mar 2, 2020

ldmwndletsm said:
This raises an important question. If you chose the proxy method then can you have the target node be one of the two already existing nodes and then create a third node to replace the one listed in the first stanza?

Yes.

ldmwndletsm said:
On another issue with proxies, since both agent nodes would have all their data being tracked by the target node then do both use the settings for the target node configuration, e.g. MAXNUMMP and not the settings in their individual node configurations?

Yes

But if you are going to back up the data from the 2nd node back into the original node via a proxy, why not try the single node method with a higher resourceutil (sum of both CADs)? Add proxies later if you feel it's needed, but I can't see it. If you increase the resourceutil to the point you maximize the network pipe, having 2 schedulers maximizing the network pipe will not be quicker or slower than a single scheduler maximizing the network pipe.

ldmwndletsm · Mar 2, 2020

We just wanted two schedules for better control over when the backups for each set of data (two sets) start, and we wanted to allow overlap, which is a critical concern, so we needed two nodes. Perhaps the proxy method might be preferred if we later decided to add data on another host to that same collection, wherein it could be restored and tracked from the same target, so the proxy method seems more scalable from that perspective. However, after thinking more about that, if you stuck with two nodes (A and B) and no proxies, and you later decided to add data on some other host (node C), and you wanted that tracked to node B, then it seems that you could just make node B a target node and make C a proxy to it, and go from there?

In this case, it wouldn't matter that node B had never been a target node? You could now make it one, and not lose any of the data associated with it from before? So from this perspective, could we adapt later down the road?

RecoveryOne · Mar 4, 2020

marclant said:
Add proxies later if you feel it's needed, but I can't see it. If you increase the resourceutil to the point you maximize the network pipe, having 2 schedulers maximizing the network pipe will not be quicker or slower than a single scheduler maximizing the network pipe

I can't speak for ldmwndletsm, but I found having two independent schedulers is faster in some scenarios. The example I can give is my home and department shares. Millions of files and folders, set up like E:\Share and E:\Users on the same drive. Even with a resourceutil of 10, it takes a while to traverse the 'Share' and all the millions of sub-folders/files before it even thinks about hitting the 'Users' area.

It's not always about maxing the network pipe. Even with client-side data reduction turned off, there's a lot of time spent on scanning the files. So now I have two schedulers with a resourceutil of 10, each one performing work on a specific directory. Now, running two schedulers from the same node can lead to some bandwidth issues and disk I/O issues, but it allows a more efficient backup in a shorter amount of time overall. Or at least in my case.

As to ldmwndletsm's most recent post, I've never attempted that scenario, and I'm afraid the answer is beyond my knowledge of the product.

marclant · Mar 4, 2020

ldmwndletsm said:
In this case, it wouldn't matter that node B had never been a target node? You could now make it one, and not lose any of the data associated with it from before? So from this perspective, could we adapt later down the road?

When you connect as a proxy, you need the asnode option to tell it you are connecting on behalf of the target. Without that option, you are connecting as the node itself. You can switch back and forth without losing data.

marclant · Mar 4, 2020

RecoveryOne said:
I can't speak for ldmwndletsm, but I found having two independent schedulers is faster in some scenarios. The example I can give is my home and department shares. Millions of files and folders, set up like E:\Share and E:\Users on the same drive. Even with a resourceutil of 10, it takes a while to traverse the 'Share' and all the millions of sub-folders/files before it even thinks about hitting the 'Users' area.

In that scenario, it makes sense because you are also splitting a large filesystem in two. So you get more producers working through the data, and also more consumers. But in the OP's case with 150 filesystems, getting more filesystems processed at the same time is easily done with increasing resourceutil.

RecoveryOne · Mar 4, 2020

marclant said:
But in the OP's case with 150 filesystems, getting more filesystems processed at the same time is easily done with increasing resourceutil.

You are correct, but there still might be a point of diminishing returns with the resourceutil setting, depending on how deep their filesystems are. In that case, running two (or more) schedulers to tackle /fs1 to /fs20 and then /fs21 to /fs40, as an example, along with a high resourceutil could work best. It really just depends on where the active data is living within those filesystems.
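
A minimal Python sketch of that split, assuming the 40 filesystems are named /fs1 through /fs40 as in the example; `partition` is a hypothetical helper, and in practice you might want to balance by data volume or change rate rather than by count.

```python
def partition(filesystems, schedulers):
    """Split a list of filesystems into contiguous, near-equal groups."""
    size, extra = divmod(len(filesystems), schedulers)
    groups, start = [], 0
    for i in range(schedulers):
        end = start + size + (1 if i < extra else 0)
        groups.append(filesystems[start:end])
        start = end
    return groups


filesystems = [f"/fs{n}" for n in range(1, 41)]
groups = partition(filesystems, 2)

print(groups[0][0], groups[0][-1])  # /fs1 /fs20
print(groups[1][0], groups[1][-1])  # /fs21 /fs40
```

Each group would then go into the domain statements of the stanza used by its scheduler.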

That's the great thing about TSM, you can bend it to your will (most of the time).
