Can you move a file space between nodes?

ldmwndletsm

ADSM.ORG Senior Member
PREDATAR Control23

Let's say you have multiple stanzas, so maybe you have three nodes for the same physical box, wherein each node handles a different list of file systems. And let's suppose that stanza 1 (node A) backs up /filesystem1. If you later change the stanzas so that stanza 3 (node C) now handles /filesystem1, and no longer stanza 1 (and let's say the storage pools and all that stay the same), then what?

It would seem that if you were recovering a directory under /filesystem1 that had never been backed up by node A, then no problem, since you'd specify the correct .opt file when you run dsmc and it would know to restore from node C. Likewise, if it had been backed up by node A at some point in the past, and then also later by node C, and you needed an older copy from node A, then you could just specify the corresponding .opt file for node A, BUT:

1. What if you didn't know that it had ever been backed up by node A?

2. What if you need to rebuild it the way it looked yesterday, and this requires tapes from when it was handled by node A and also node C?

How would you tell TSM to do this when the directory's backup history spans two nodes? I thought that when restoring data you have to use the node (i.e., specify the corresponding .opt file, unless it's the default dsm.opt) that corresponds to the stanza that managed the data?

Would you have to first recover it from node A and then from node C and overwrite any duplicates?
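(For context, by "specify the corresponding .opt file" I just mean something along these lines, where the path and .opt name are made up for illustration -- point dsmc at node A's client user options file and use -inactive/-pick to choose the older version:

dsmc restore "/filesystem1/somedir/*" -subdir=yes -inactive -pick -optfile=/opt/tivoli/tsm/client/ba/bin/dsm_nodeA.opt )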
 
PREDATAR Control23

You can't move data from one node to another.

When you say 'move' do you mean trying to make some other stanza, for another node, now back that up instead, or do you mean running some kind of 'move' command, or some such thing, to literally update the database?

Just guessing here ... but ... is this correct? Once the data has been backed up by a given node, the only ways to change to another node would be to: 1. move the file system to another stanza for another node and run a selective backup; 2. same as 1, but delete the file space and run an incr (BAD, since you'd lose the old information); or 3. rename the file system, add that to another stanza and run an incr, thereby forcing the equivalent of a selective backup as in 1.
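For option 1, I'm picturing roughly this (node names and paths invented): in dsm.sys, add the file system to node C's stanza,

domain /filesystem1

and then run a selective while pointing dsmc at node C's option file:

dsmc selective "/filesystem1/*" -subdir=yes -optfile=/path/to/dsm_nodeC.opt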
 
PREDATAR Control23

Your initial question was whether you can move a filespace between nodes. The answer is no.


What you are describing is not a move at all. You are creating a new node and doing a new backup to a new filespace. That data may have been backed up previously under another node/filespace, but that's not relevant.
 
PREDATAR Control23

Yes, I see your point. So the file space for a given node cannot span nodes. It's all or nothing. And if you did what I proposed, you'd end up re-backing up all the data again. And if you wanted to do that, you could just list the file system under another node's stanza and run an 'incr' -- no need to run a selective. But at that point it's for a different node, so it's not the same file space (even though it has the same path name), and all the information in the database from the old node would still remain.

But if you did that, wouldn't you have to remember that fact if you ever needed to recover a version of a file backed up under the old node? If some file were deleted before you could back everything up again under the new node, you might not know about it if someone later wanted it restored and it wasn't showing up when querying the backup file space on the new node. Maybe it would be prudent to first unmount and remount the file system read-only before backing it up to the new node, as a precaution?
 
PREDATAR Control23

Generally speaking, you don't use multiple nodes on the same system. There are exceptions, like if it's clustered, where you'd have one node to back up the local filesystems and one node per cluster resource to back up the shared filesystems. Or a case where a group needs to do backup/restore of application data without root access, outside the regular filesystem backup.

Other than those 2 cases, creating multiple nodes just creates unnecessary complexity, but it will work.

But if you did that, wouldn't you have to remember that fact if you ever needed to recover a version of a file backed up under the old node? If some file were deleted before you could back everything up again under the new node, you might not know about it if someone later wanted it restored and it wasn't showing up when querying the backup file space on the new node. Maybe it would be prudent to first unmount and remount the file system read-only before backing it up to the new node, as a precaution?
I don't follow that scenario. If you want the filesystem to be backed up by a different node, just start backing it up daily with the new node, and stop the daily backups of that filesystem with the old node. Just be conscious of the date when doing restores so you pick the right node.
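If you're ever in doubt about which node holds a given version, you can always query both, e.g. something like this against the old node's option file (the path and .opt name are only examples):

dsmc query backup "/filesystem1/somedir/*" -subdir=yes -inactive -optfile=/path/dsm_oldnode.opt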
 
PREDATAR Control23

Generally speaking, you don't use multiple nodes on the same system. There are exceptions, like if it's clustered, where you'd have one node to back up the local filesystems and one node per cluster resource to back up the shared filesystems. Or a case where a group needs to do backup/restore of application data without root access, outside the regular filesystem backup.

Other than those 2 cases, creating multiple nodes just creates unnecessary complexity, but it will work.

We have two nodes on the same system because 1. the data that each is backing up needs to go to different storage pools with different management classes, copy groups, etc. The requirements are different. The two sets of data could have been put on different systems, but that's the hand that was dealt. 2. we need to be able to run concurrent schedules, overlapping between the two, making it difficult to predict a startup window, and we don't want the second node's data to wait until the first has been completed.

I don't follow that scenario. If you want the filesystem to be backed up by a different node, just start backing it up daily with the new node, and stop the daily backups of that filesystem with the old node. Just be conscious of the date when doing restores so you pick the right node.

Right. What I was saying was that the first incr backup on the new node will have to do a first-time full, so what if that doesn't complete before something gets deleted? How would you know that that file was backed up on a previous node? How would you track that later down the road if someone wanted to restore it? How would you know which node to pick or even that it ever existed on another node?
 
PREDATAR Control23

We have two nodes on the same system because 1. the data that each is backing up needs to go to different storage pools with different management classes, copy groups, etc. The requirements are different. The two sets of data could have been put on different systems, but that's the hand that was dealt.
You don't need 2 nodes for that, just additional include statements to send different files to different management classes.
2. we need to be able to run concurrent schedules, overlapping between the two, making it difficult to predict a startup window, and we don't want the second node's data to wait until the first has been completed.
The only way to run 2 concurrent schedules is with 2 nodes. However, when data is going to a single node, you typically go with only 1 schedule to back up everything, but there are exceptions.
What I was saying was that the first incr backup on the new node will have to do a first-time full, so what if that doesn't complete before something gets deleted? How would you know that that file was backed up on a previous node? How would you track that later down the road if someone wanted to restore it? How would you know which node to pick or even that it ever existed on another node?
You don't. So on day 1 of the first backup with a new node, I would also at the same time do one last backup with the old node. That way you are covered. The only time you would need to look at both is if restoring from that day. A day later, new node, a day prior, old node.
 
PREDATAR Control23

Actually, you have a similar issue if someone creates and deletes a file between 2 backups.
 
PREDATAR Control23

You don't need 2 nodes for that, just additional include statements to send different files to different management classes.

We're using domain statements to explicitly enumerate the file systems that we want backed up. There are a lot, so we're only going to add as many per night to the dsm.sys stanza as we're confident we can complete in a night. This is easier for me, and clearer, than using excludes to filter out everything other than what we want on each night. Once the first-time full has run, the file system stays listed and we add the next one. Since the domain statements must be in dsm.sys and cannot be in an include-exclude file referenced from dsm.sys, then, as an example, if /filesystem1 is to go to mgmtclassA and /filesystem2 is to go to mgmtclassB, would it be something like this ???:

include /filesystem1/* mgmtclassA
domain /filesystem1
include /filesystem2/* mgmtclassB
domain /filesystem2


You don't. So on day 1 of the first backup with a new node, I would also at the same time do one last backup with the old node. That way you are covered. The only time you would need to look at both is if restoring from that day. A day later, new node, a day prior, old node.

I still don't follow. Yes, you will still have that last backup under the old node, but if it's a large file system then what if by the time the new node has reached a particular file, in its initial walk-through, that file has already been deleted from disk? As a result, the first-time backup to the new node will not have it. If later you need to restore that file then how will you know that it was ever backed up on that other node? How would you know to go look there unless you made a note of it? In other words, if some user came to you and wanted you to restore /path/myfile.txt, and it never made it onto the backup for the new node then how would you know to restore it from the old node?

Clearly, this is a problem that exists with any backup software if data is later backed up by a different client or host, but I was looking at this from a situation wherein it's still on the same physical client, just no longer handled by node 1 but instead by node 2 (both stanzas on the same client). But maybe I have it all wrong. There's probably a good chance of that since I'm new to TSM. If not, is there a way in TSM to run a query to see which nodes the database has for a given name space (path)? For example, which nodes has /filesystem_unique ever been backed up to?
 
PREDATAR Control23

I still don't follow. Yes, you will still have that last backup under the old node, but if it's a large file system then what if by the time the new node has reached a particular file, in its initial walk-through, that file has already been deleted from disk?
Make the last backup on the old node after the first backup on the new node. No gaps this way.
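In other words, on cutover day, run the new node's first (long) incremental and then one last quick incremental under the old node, e.g. (the .opt file names here are just placeholders):

dsmc incremental /filesystem1 -optfile=/path/dsm_newnode.opt
dsmc incremental /filesystem1 -optfile=/path/dsm_oldnode.opt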
How would you know to go look there unless you made a note of it? In other words, if some user came to you and wanted you to restore /path/myfile.txt, and it never made it onto the backup for the new node then how would you know to restore it from the old node?
By the date the user gives you. You just need to remember the date of the cutover. And with every day that goes by, it becomes less likely there will be requests to restore files from the old backup.
 
PREDATAR Control23

For example, which nodes has /filesystem_unique ever been backed up to?
Filesystems are represented as filespaces on the server. QUERY FILESPACE gives you the list of all nodes and their filespaces.
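For example, from an administrative client (dsmadmc):

query filespace * /filesystem_unique

That lists every node that has a filespace named /filesystem_unique.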
 
PREDATAR Control23

Ah, now I'm there. That all makes sense. And the last piece of the puzzle for me was the 'q filespace * /filesystem' command. *IF* you've forgotten the earlier node, or you forgot the file system was ever handled by that node, you could find it that way. That would clue you in that *maybe* it might be worth checking that other node.

Of course, if it was a common file system name like /home, that would be arduous, since you wouldn't know which node to query for some file that was backed up on that other node and subsequently deleted long before you ever switched the file system to the new node (again, I'm coming at this particular case from the perspective of two or more stanzas, which would not be a standard scenario).
 
PREDATAR Control23

You still have not convinced me that you need 2 different nodes. And with all our back and forth yesterday, we just demonstrated the downsides of using 2 nodes for the same system.

If you are trying to run more parallel backups by running 2 concurrent schedules, that's the wrong approach. I'm guessing that's why you asked this: https://adsm.org/forum/index.php?threads/question-on-parallel-streams.33043/

Explain to me again what's the problem you are trying to solve with 2 nodes?
 
PREDATAR Control23

Thank you for your patience with this mess. :) I can think of at least two problems that we're trying to solve with two nodes, and if this has already been debunked, then allow me to explain further so you can better understand where I may have gone awry.

PROBLEM 1: We have 150+ file systems of archive data, all living on one system (one physical box; let's call it systemA). We do not want to use the archive feature, however, because that takes a full each time. We only want ongoing incrementals. This data is to be written to its own set of tapes using a different management class, storage pool, copy group, etc., because it requires different policies from the non-archive data. Collocation will not satisfy that requirement. Now, as for the benefits of collocation, we could collocate by file space, but that would be ridiculous. Instead, we will collocate by node or group. There might be another system in that same group, but it has little data. We then have 100+ file systems of non-archive data (also on systemA) that likewise require their own management class, storage pool, copy group, etc. We will collocate that by group as well. We also have a number of VMs. Each of these will be a separate node with one stanza, and they will use the same management class, etc. as the non-archive data on systemA. We will use collocation by group for those, so we might end up with two or three groups total for all systems, including systemA.

Now, it was my original understanding (from well before this thread) that to keep the backups of systemA properly split between the two management classes, copy groups and such, we needed two nodes and thus two stanzas, wherein the archive stanza would specify the following lines (I'm overriding the default behavior for directories and files):

DIRMC managementclass_archive
include * managementclass_archive
domain /some_filesystem1
domain /some_filesystem2
...
domain /some_filesystem150

And the non-archive stanza would look like this:

DIRMC managementclass_nonarchive
include * managementclass_nonarchive
domain /some_filesystem151
domain /some_filesystem152
...
domain /some_filesystem250

Something like that. BUT as you pointed out yesterday, while this will work, we could instead accomplish it using just one stanza on systemA (one node), wherein instead of specifying only a 'domain /filesystem' line for each file system, we also specify the appropriate management class for that file system. So the single stanza would look something like this:

DIRMC managementclass_archive
include /some_filesystem1/* managementclass_archive
domain /some_filesystem1
include /some_filesystem2/* managementclass_archive
domain /some_filesystem2
include /some_filesystem3/* managementclass_nonarchive
domain /some_filesystem3
include /some_filesystem4/* managementclass_nonarchive
domain /some_filesystem4

and so on and so forth. *HOWEVER*, it's not clear to me how to deal with 'DIRMC managementclass_archive' versus 'DIRMC managementclass_nonarchive'. Wouldn't these need to be in their own separate stanzas? Alternatively, if you just left DIRMC out altogether, the default behavior applies to directories, wherein they will be bound to the management class with the longest retention period. We have two different copy groups, policies and pools. How is that going to work with one stanza?

PROBLEM 2: We *thought* that if we had only one node and one schedule, we would not be able to walk all of these file systems in a single night, never mind the backup time. More to the point, there's a limit to the number of objects that can be specified in a schedule, so we can't do it that way. We have to rely on the dsm.sys file to list them. So we thought that if we instead set up two dsmcad processes (which we did: one pointing to dsm.opt and the other to dsm_arc.opt; each references a different server stanza, and each stanza is for the corresponding node), we could then have a separate schedule for each, allowing overlap so neither has to wait for the other to complete before it starts, and we don't have to deal with backup windows and all that.
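(Roughly, the setup we built looks like this -- the server and node names below are placeholders, not our real ones:

dsm.sys:
   SERVERNAME server_std
      NODENAME systemA
      ...
   SERVERNAME server_arc
      NODENAME systemA_arc
      ...

dsm.opt:      SERVERNAME server_std
dsm_arc.opt:  SERVERNAME server_arc

dsmcad -optfile=/path/dsm.opt
dsmcad -optfile=/path/dsm_arc.opt )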

At this time, we will be starting off *only* with the non-archive data, and the archive data will come later, so we're not backed into a corner yet. If there's a way to make this work with one stanza then I'm fine with it, and the archive data could be added to that later. I just need a way to ensure that the directories (files, too) for the archive file systems will NOT be assigned to the same management class as the non-archive.
 
PREDATAR Control23

I have not seen a compelling case for 2 nodes yet. Unless when you say archive, you mean that one schedule has action=incremental and the other has action=archive. It's not possible for a single schedule to have 2 different actions, and because 1 node cannot run 2 schedules at the same time, that would require 2 nodes. You could still use proxies so that ALL the data is still owned by one node, and each proxy would run a schedule (https://www.ibm.com/support/knowledgecenter/SSEQVQ_8.1.8/client/t_bac_clientnodeproxy.html )
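Roughly (node names made up): on the server,

grant proxynode target=SYSTEMA agent=SYSTEMA_SCHED1
grant proxynode target=SYSTEMA agent=SYSTEMA_SCHED2

and in each agent's dsm.sys stanza you set asnodename so the data is stored under the one target node:

nodename   SYSTEMA_SCHED1
asnodename SYSTEMA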

If that's not the case, and it sounds like you are trying to get more parallelism from multiple schedules, I would instead increase RESOURCEUTIL. It can go higher than 10, but you'd have to test, going up slowly until you get to the point of diminishing returns. That will enable you to process multiple filespaces in parallel, which will help with the overall time.
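For example, in the dsm.sys stanza (the value here is just a starting point to test with):

resourceutilization 5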


Off topic:
and so and so forth. *HOWEVER*, it's not clear to me how to deal with 'DIRMC managementclass_archive' versus 'DIRMC managementclass_nonarchive'?
I would not use DIRMC; there hasn't been a requirement to use it since earlier versions of ADSM. Directories are automatically bound to the management class with the longest retention. Historically, directories had to be restored before files, so using a disk-based DIRMC was faster. But sometime in the 90s, that was changed: if the directory has not been restored yet, instead of mounting a different tape for the directory, the client creates the directory, and later in the restore the directory itself is actually restored.
 
PREDATAR Control23

I have not seen a compelling case for 2 nodes yet. Unless when you say archive, you mean that one schedule has action=incremental and the other has action=archive. It's not possible for a single schedule to have 2 different actions, and because 1 node cannot run 2 schedules at the same time, that would require 2 nodes. You could still use proxies so that ALL the data is still owned by one node, and each proxy would run a schedule (https://www.ibm.com/support/knowledgecenter/SSEQVQ_8.1.8/client/t_bac_clientnodeproxy.html ).

No, each schedule would be an incr, not an archive. Where I said "archive" (just in case), I simply meant that the "archive" data has stricter requirements than the "non-archive" data, e.g. retention, etc. We wanted the capability to run two schedules (one for archive, one for non-archive) at the same time, thus two nodes. Of course, the schedules would be started at different times, and in all probability, most of the time, the first would finish before the second started, and, yes, we would set a long enough startup window for the second to better allow for that. But due to the large number of file systems, and the time to walk them, I was concerned that in some cases the first would not complete soon enough and the second would be missed once its startup window had expired. Moreover, I didn't want to risk backups spilling over into the next day, as in late morning.

I didn't know anything about proxies before I set this up and asked all these questions. I will look into that. I was just going based on a non-proxy state of mind, and the two node method seemed the most logical approach since that's all I knew at the time.

I'm not aware that any proxies are being used on the TSM server at this time for other hosts that I'm not managing, but if so, that would be interesting to view. How would I determine whether any proxies are currently configured on the backup server?

If that's not the case, and it sounds like you are trying to get more parallelism from multiple schedules, I would instead increase RESOURCEUTIL. It can go higher than 10, but you'd have to test, going up slowly until you get to the point of diminishing returns. That will enable you to process multiple filespaces in parallel, which will help with the overall time.

That was part of the madness, but as I said, it was also because I thought you had to do that in order to specify a different management class for one set of file systems versus another. I didn't realize that you could do it on a per file system basis, e.g. include /some_filesystem/* some_managementclass

What is the default value for the RESOURCEUTIL setting?

Also, I checked around, and I don't see any such setting in any dsm.sys file on any client (even ones I'm not managing). I even checked the client options file on the server, but that only has this:

Option          Sequence number    Use Option Set Value (FORCE)    Option Value
SCHEDMODE       100                No                              Prompted
TXNBYTELIMIT    0                  No                              2097152

so it would appear that every node is using the default?

Off topic:

I would not use DIRMC; there hasn't been a requirement to use it since earlier versions of ADSM. Directories are automatically bound to the management class with the longest retention. Historically, directories had to be restored before files, so using a disk-based DIRMC was faster. But sometime in the 90s, that was changed: if the directory has not been restored yet, instead of mounting a different tape for the directory, the client creates the directory, and later in the restore the directory itself is actually restored.

Well, the reason I used it is that I want the directories to be bound to the same management class as the files, not some other management class. I have seen cases on hosts that I'm not managing wherein the management class reported for the directories was 'management_classA' while the one reported for the files was 'DEFAULT', and management_classA did NOT have 'Default management class' set to 'Yes'. Instead, some other management class was the default. That sounds like a possible problem. Perhaps someone should change that.

BUT for us they would be the same, so no, I would not have to use this option. Moreover, I wouldn't even need the include option to force the files either, but I didn't like seeing 'DEFAULT' reported for the files, since I found that confusing, so I forced the files using the single 'include * management_class' statement in the stanza. Again, though, for us none of this was necessary from a functional perspective, since the management class for the directories is the default, and the appropriate management class was reported for the directories. I guess I decided to use the DIRMC option because, after I forced the files to be bound to a specific management class so they would report as such (yes, it was the same as the DEFAULT), I then decided, for consistency, to do likewise for the directories, even though the management class reported prior to this change was the same.

The IBM documentation said: "the server chooses the management class with the longest retention period in the backup copy group (retention period for the only backup version). When two or more management classes have the same, "longest" retention period, the Tivoli® Storage Manager client selects the management class whose name is last in alphabetical order."

I didn't want it to select the wrong one.

Regardless, I'm still not clear on how you're going to ensure that the correct management class is used for the directories if you have everything in a single stanza and you have two different management classes, pools and copy groups for the data. For the files, yes, since you can specify 'include /filesystem/* management_class_name'. But for the directories? What would prevent it from choosing the wrong one when you want one for one set of file systems and a different one for the others? Could you provide an example of that?
 
PREDATAR Control23

BTW, there's nothing inherently wrong with using multiple nodes
What is the default value for the RESOURCEUTIL setting?
2, meaning there are 2 threads: a producer that queries the server and scans the filesystem, and a consumer that actually moves the data. The producer scans 1 filesystem at a time. So the higher the value, the more producers and consumers you have, and the more parallel work gets done.
But for the directories? What would prevent it from choosing the wrong one when you want one for one set of file systems and a different one for the others? Could you provide an example of that?
Most people just don't bother with DIRMC; it falls where it falls, and because it's bound to the management class with the longest retention, it won't expire before the files. Just make sure that the management class with the longest retention is not using slow storage compared to the rest of the data.
 
PREDATAR Control23

I'm not aware that any proxies are being used on the TSM server at this time for other hosts that I'm not managing, but if so, that would be interesting to view. How would I determine whether any proxies are currently configured on the backup server?
QUERY PROXY

Most people who use proxy nodes, though, use multiple machines to spread the workload. So you could have a large file server, and multiple machines mounting different shares and backing them up as proxies. That way you have more horsepower, because you have more machines.

If you're on a single machine, you do more concurrent work by increasing the resourceutil. Using proxies would be a good fit if you need to run two different backup schedules simultaneously for the same node: you have 2 proxies that each run a schedule for the same node, so the node still owns the data.
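A rough sketch of that, with invented names: each agent node gets its own schedule, each agent's stanza carries its own DOMAIN list plus asnodename SYSTEMA, and all the data lands under SYSTEMA:

define schedule STANDARD SYSTEMA_STD action=incremental starttime=20:00
define schedule STANDARD SYSTEMA_ARC action=incremental starttime=20:00
define association STANDARD SYSTEMA_STD SYSTEMA_SCHED1
define association STANDARD SYSTEMA_ARC SYSTEMA_SCHED2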
 
PREDATAR Control23

BTW, there's nothing inherently wrong with using multiple nodes

Okay. I didn't think so, but again, I wasn't planning to get too carried away with it and create a lot of them, probably just the two. I'll take a look at proxies, though, and see what I can learn.

2, meaning there are 2 threads: a producer that queries the server and scans the filesystem, and a consumer that actually moves the data. The producer scans 1 filesystem at a time. So the higher the value, the more producers and consumers you have, and the more parallel work gets done.

Then I guess all the nodes are using the default since I see no evidence to the contrary, i.e. no mention of that option in any dsm.sys file.

If you change it to 4, would it still send two file systems to the same drive? Or would it be one per drive and thus force another tape to be loaded?

Most people just don't bother with DIRMC; it falls where it falls, and because it's bound to the management class with the longest retention, it won't expire before the files.

Maybe I'm misunderstanding, but I see a problem here. Suppose the copy group had something like 10 10 50 365 for 'versions data exists', 'versions data deleted', 'retain extra versions' and 'retain only versions' respectively, and the management class for that was MC_AB. Let's say another copy group has 50 50 50 365, but it's called MC_A. They both have 365 for 'retain only versions', but MC_AB sorts last alphabetically, so that's the one that the directories will be bound to if you don't use DIRMC to force them to MC_A.

That could be bad, right, since it would mean different limits for the first two of the four version settings. If your files are bound correctly to the right management class, that doesn't mean the directories are. For our stuff, based on what I'm seeing, if we go with the default all is well. But looking at some other nodes, they're reporting MC_AB for the directories and DEFAULT for the files, yet MC_AB is not the default; MC_A is. It sounds like the directories then have different values for the first two of the four version settings than the files do. Am I missing something here?

Just make sure that the management class with the longest retention is not using slow storage compared to the rest of the data.

Ah, yes. Good point. I think it's the same for all of our stuff regardless.
 