channdeep
Newcomer
- Joined
- Apr 5, 2018
Dear Community,
This is my first post, and I apologise for its length. I am fairly new to TSM, so please forgive any obvious mistakes or misunderstandings. I mostly use the Operations Center (OC) GUI to administer and monitor TSM, and use the CLI only for some basic tasks.
My objective is for the TSM OC (and the Daily Protection Report) to show all TSM nodes in green, with zero "At Risk" nodes. I have already worked with the respective server owners on a few nodes that needed complete decommissioning, OPT file reconfiguration, and so on, and those have been sorted out. I am still left with a few nodes that fall into the "At Risk" category intermittently, and a few that do so permanently.
The challenge is that TSM seems inconsistent in deciding when a node is "At Risk". For the same error or warning message, TSM shows some nodes as green and others as "At Risk". I watched the server logs in the OC for several days in a row but could not reach a conclusion, so I am asking for your help.
Earlier, my understanding was that this was caused by open files on a node, which TSM would treat as "At Risk" because it cannot copy them due to unsaved changes and the like. That understanding seems to be wrong, or at least I am still confused. Below are my notes and current understanding/observations:
-------
a. “At Risk” nodes are different from “Warning” nodes.
b. It is not correct to assume that every “Warning” puts a node in the “At Risk” category.
c. To reproduce the message “the object is in use by another process”, I intentionally kept a file open with unsaved changes on SERVER_1: “Testing_TSM_Open_File_Copy.txt”. The log line is below. The log still shows the file as [Sent], even though the “Open File support” option was not configured, and the node still appears green in the report.
i. 05/04/2018 21:12:32 Normal File--> 26 \\ SERVER_1\e$\Backups \Testing_TSM_Open_File_Copy.txt [Sent]
d. Today I also noticed the same warning message in the server logs of a few other, randomly chosen Windows servers (SERVER_2 and SERVER_3), yet both appear green in the Daily Protection Report/OC. So it looks like we should not spend too much effort on this specific message, “the object is in use by another process”. Do you agree?
e. It is possible that, on a given day, one node has both 1) an “At Risk” error and 2) a “Warning” message. In that case it falls in the “At Risk” category only because of the “At Risk” error.
f. We therefore now believe that we need to focus only on the real errors that make a node fall into the “At Risk” category and spoil our daily report. From a few days of observation, these are generally:
i. file not found
ii. Object changed during processing. Object skipped.
iii. file is temporarily unavailable
iv. Object name '/backup1//characterizations/20117 - PSC-1pct Pt-TiSi-H2Z2-048811, tørret prøve.pdf' contains one or more unrecognized characters and is not valid.
v. Node not communicating with TSM at all.
-------
Please find attached a screenshot showing how I see the server logs view.
I then did some more research, categorising the errors as below (server names have been changed; I have kept a copy of the actual names for later correlation):
------
All nodes with errors in the server logs, grouped by error:
1) ANE4037E: Object changed during processing. Object skipped.
Server_C
Server_E
2) ANE4008E: file is temporarily unavailable
Server_F
3) ANE4987E: the object is in use by another process
Server_G
Server_D
4) ANE4005E: file not found
Server_A
5) ANE4042E: file name contains unrecognized characters and is not valid
Server_B
Nodes shown as "At Risk":
Server_A
Server_B
Server_C
Server_D
So, again, I am confused:
1) If Server_C appears as "At Risk" for error ANE4037E, why does Server_E not appear the same way (or vice versa)?
2) If Server_D appears as "At Risk" for error ANE4987E, why does Server_G not appear the same way (or vice versa)?
3) Why is Server_F not shown as "At Risk"?
Because of this, I cannot sort the error codes into two groups and tell myself: treat these codes as "At Risk", and these codes as not "At Risk".
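To make the mismatch concrete, the two lists above can be cross-checked with a small Python sketch. The dictionary and the set simply restate the anonymised node names from this post; the function name classify_codes is my own illustration and is not part of TSM or the OC. It only shows that no single error code lines up cleanly with the "At Risk" status in this sample.

```python
# Error code -> nodes that logged it (copied from the list in this post).
errors_by_code = {
    "ANE4037E": ["Server_C", "Server_E"],
    "ANE4008E": ["Server_F"],
    "ANE4987E": ["Server_G", "Server_D"],
    "ANE4005E": ["Server_A"],
    "ANE4042E": ["Server_B"],
}

# Nodes the OC shows as "At Risk" (copied from the list in this post).
at_risk = {"Server_A", "Server_B", "Server_C", "Server_D"}


def classify_codes(errors_by_code, at_risk):
    """Label each error code by whether its nodes are at risk.

    Returns code -> (verdict, at-risk nodes, green nodes), where verdict is
    "always", "never", or "mixed" for this sample only.
    """
    result = {}
    for code, nodes in errors_by_code.items():
        flagged = sorted(n for n in nodes if n in at_risk)
        green = sorted(n for n in nodes if n not in at_risk)
        if flagged and green:
            verdict = "mixed"
        elif flagged:
            verdict = "always"
        else:
            verdict = "never"
        result[code] = (verdict, flagged, green)
    return result


summary = classify_codes(errors_by_code, at_risk)
for code, (verdict, flagged, green) in summary.items():
    print(f"{code}: {verdict:6s} at risk={flagged} green={green}")
```

The codes that come out as "mixed" are exactly the ones behind questions 1) and 2), which suggests that something other than the error code alone decides the status.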
------
Many thanks in advance for any comments or guidance.
Best regards,
channdeep.