How to get rid of pesky exit 4 on log files?

foobar2devnull

Hi guys,

I'm trying to clean things up a bit and I am seeing a large number of systems "failing" with an exit code of 4 (ANS4037E). This is usually on '/var/log', as one would expect. I thought I'd solved this by creating a dynamic copy pool and then creating a clopt on the TSM server that forces all '/var/log' files to be backed up using the mentioned copy pool, as you can see below.


tsm: TSMINST1>q copy os active PF01_62DY f=d

Policy Domain Name: OS
Policy Set Name: ACTIVE
Mgmt Class Name: PF01_62DY
Copy Group Name: STANDARD
Copy Group Type: Backup
Versions Data Exists: No Limit
Versions Data Deleted: 0
Retain Extra Versions: 62
Retain Only Version: 0
Copy Mode: Modified
Copy Serialization: Dynamic
Copy Frequency: 0
Copy Destination: PF01
Table of Contents (TOC) Destination:
Last Update by (administrator): JDOE
Last Update Date/Time: 02/15/13 10:37:15
Managing profile:
Changes Pending: No


tsm: TSMINST1>q clopt linux

Optionset            Description                  Last Update by    Managing profile     Replica Option
                                                  (administrator)                        Set
-------------------  ---------------------------  ----------------  -------------------  --------------
LINUX                General Linux definitions    JDOE                                    No

Option               Sequence  Use Option  Option Value
                     number    Set Value
                               (FORCE)
-------------------  --------  ----------  --------------------------------------------------
ARCHSYMLINKASFILE         110  Yes         YES
COMPRESSION               100  Yes         No
DIRMC                     120  No          PF01_31SS
DOMAIN                    130  Yes         ALL-LOCAL
INCLEXCL                  200  No          Include '/var/log/.../* PF01_62DY'
INCLEXCL                  210  No          Exclude.dir '/proc'
INCLEXCL                  220  No          Exclude.dir '/selinuxc'
INCLEXCL                  230  No          Exclude.dir '/sys'
INCLEXCL                  240  No          Exclude.dir '/.../tmp'
INCLEXCL                  250  No          Exclude.dir '/mnt'
INCLEXCL                  260  No          Exclude.dir '/media'
INCLEXCL                  270  No          Exclude.dir '/dev'
[...]
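For what it's worth, the include line is what binds the /var/log files to PF01_62DY; each entry was added with DEFINE CLIENTOPT, roughly along these lines (the sequence numbers were just whatever was free at the time):

tsm: TSMINST1>define clientopt linux inclexcl "include /var/log/.../* PF01_62DY" seqnumber=200
tsm: TSMINST1>define clientopt linux inclexcl "exclude.dir /proc" seqnumber=210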

Sadly, I still get this on most of my linux boxes:

11/13/2016 22:08:48 ANS1228E Sending of object '/var/log/foo/bar.log' failed.
11/13/2016 22:08:48 ANS4037E Object '/var/log/foo/bar.log' changed during processing. Object skipped.

Am I doing something wrong? How can I get rid of these errors and finish with an exit code of 0?

My servers are running 7.1.5.200 and the clients are between 6.4 and 7.1.6; all show the same symptoms.

Thanks for your help!
 
This is usually on '/var/log', as one would expect. I thought I'd solved this by creating a dynamic copy pool and then creating a clopt on the TSM server that forces all '/var/log' files to be backed up using the mentioned copy pool, as you can see below.
The copypool doesn't make any difference here (neither does the primary for that matter).

I see that the backup copygroup's Copy Serialization is set to Dynamic. But the backup doesn't fail because the file is open, it fails because the file changed. If the file is changing while TSM is copying it, the backup will fail. You may want to try SHRDYnamic instead so that it retries a few times.
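Something along these lines should do it; note that you have to change it in the editable policy set, not ACTIVE (which is just a copy), so your_policyset below is a placeholder for whatever yours is called, and then re-activate:

tsm: TSMINST1>update copygroup os your_policyset pf01_62dy standard type=backup serialization=shrdynamic
tsm: TSMINST1>validate policyset os your_policyset
tsm: TSMINST1>activate policyset os your_policyset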
 
Thanks for your answer.

I thought files such as logs were considered to be open and that Dynamic serialization meant the file would simply be backed up as-is. I did not realise it would throw an error if the file changed during the backup; I thought it would just back it up as is.

Does this mean log files and any other files that keep getting accessed on a regular basis are prone to throwing an error, or is there a better way to back up files like logs?

I mentioned the copy pool because it is referenced in the clopt for /var/log and the Linux boxes use the Linux copy pool.
 
Does this mean log files and any other files that keep getting accessed on a regular basis are prone to throwing an error, or is there a better way to back up files like logs?
It's not that they are accessed or open, it's that the file changed while it was being backed up. Let's say the file is 10 MB in size: if the file is open but there are no changes while it's being backed up, the backup will succeed with Dynamic. However, if TSM starts to read the file from disk to send it to the server and, after reading 4 MB, the file changes, that's when the warning is issued. SHRDYnamic could potentially help here because it retries, so there is a chance that in one of the retries the file will not change.
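On the client side, the number of retries for a file that changes during backup is controlled by the CHANGINGRETRIES option (the default is 4, if I remember right), either in dsm.sys or pushed through the option set:

* dsm.sys - retry a file that changes during backup up to 4 times
CHANGINGRETRIES 4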
 
Thank you, it's clear now. What I do not understand is how you can manage your systems if they keep throwing exit code 4. Is there no way for me to say "back up the file regardless of the change" so I get an exit 0 unless it was unable to back up the file at all, and only then send me exit code X? Any log file that is heavily used (tsmsched.log, ...) will always throw an error.

I would like to distinguish between a log file and a database file (bad example, I know) so that an exit code 4 prompts me to investigate, rather than being ignored altogether because all boxes have the same "issue".
 
What I do not understand is how you can manage your systems if they keep throwing exit code 4.
Most people don't consider exit code 4 a failure because it's usually a different set of files that are open and skipped each day, and there is usually a good backup at some point. That's usually after reviewing the patterns, though.

Is there no way for me to say "back up the file regardless of the change" so I get an exit 0 unless it was unable to back up the file at all, and only then send me exit code X?
That's the problem: it can't. What happens is exactly the same as when an application that has the file open makes a change while you are in the middle of copying it; the copy fails. Unix/Linux doesn't have open file support the way Windows does.
I would like to distinguish between a log file and a database file (bad example, I know) so that an exit code 4 prompts me to investigate, rather than being ignored altogether because all boxes have the same "issue".
If the same log files fail every day (even after you try SHRDYnamic), then you may as well exclude them.
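For example, something along these lines in the LINUX option set would drop a log that keeps failing (the sequence number here is just an example, pick one after your existing entries):

tsm: TSMINST1>define clientopt linux inclexcl "exclude /var/log/foo/bar.log" seqnumber=280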

For databases, a dynamic backup creates a fuzzy backup, meaning that you capture what's on disk, but not what's in memory and not yet committed to the DB. Plus, some database files are in a state when open such that they are not usable if restored in that same state. You are better off doing an online backup of the database, either using TDP if available or, if not, an online database backup using the database tools to a file, and then backing up that file instead of the actual DB. If neither is an option, then you should do an offline backup by using a Preschedulecmd/Postschedulecmd to quiesce the database before the backup (pre) and start it again after (post).
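In dsm.sys that would look something like this (the quiesce/restart scripts are placeholders for whatever your DBA gives you):

* dsm.sys - stop the DB before the scheduled backup and start it again after
PRESCHEDULECMD  "/usr/local/bin/stop_mydb.sh"
POSTSCHEDULECMD "/usr/local/bin/start_mydb.sh"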
 
Thanks merclant... I did say that the database was a bad example for those exact reasons! ;)

I will try to reduce the "fail" count by implementing SHRD. The problem is that I implemented the OC to give a graphical view of the backups, and now people are up in arms because 35 boxes don't return exit 0... Oh well, my silly fault.

Thanks a lot for your help in clarifying Dynamic and SHRD!
 
The problem is that I implemented the OC to give a graphical view of the backups, and now people are up in arms because 35 boxes don't return exit 0... Oh well, my silly fault.
Are they looking at the "at-risk" status or the actual schedule result? If the "at-risk" status, for less critical nodes you can increase the "at-risk" criteria to 2 or 3 days. That way, if the backup is successful most of the time, the node will not show up as at risk unless it returns non-zero 2 or 3 days in a row. Leave the critical nodes at 1 day, because if they fail one day, it's important to know.
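If I remember right, on 7.1 servers you can set that per node with something like the command below (node name is a placeholder, interval is in hours; check "help set nodeatriskinterval" for the exact parameters):

tsm: TSMINST1>set nodeatriskinterval somenode type=custom interval=72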
 
Sorry for the late reply. It's the "at-risk" status. The OC generates a general report which I forward. I tried SHRD with no improvement. The next idea is simply to not back up /var/log on the servers, seeing as we forward the logs to a log server. Of course, should the forwarding fail, we'd be left with nada.
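If we go that route, I'd guess it boils down to dropping the include at sequence 200 and adding an exclude.dir, something like this (assuming I recall the delete syntax correctly):

tsm: TSMINST1>delete clientopt linux inclexcl seqnumber=200
tsm: TSMINST1>define clientopt linux inclexcl "exclude.dir /var/log" seqnumber=290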
 