False "At Risk" Reporting due to ANE4037E Error

JasonA1969

PREDATAR Control23

Hello,
Without using Bypass, I would like to fix my issue so the Executive Summary report does not reflect nodes as "At Risk" when ANE4037E is the only error.
I have up to 10 nodes that report as "At Risk: Data", with the description "A backup completed with skipped files or failures during the configured status interval (by default, 24 hours)." I have the risk interval custom set to 3 days.
Only a subset of the 10 nodes ever alerts, even though I have other nodes that are registered the same way, run the same version of SP, and give the same ANE4037E error. Copy Serialization is set the same for all nodes as well (Shared Static).

An example of two of the issues:
05/02/17 22:14:41 ANE4037E (Session: 44030, Node: SURPRISE) Object
'/rhnsat/pgsql/data/base/16384/628831' changed during
processing. Object skipped. (SESSION: 44030)
05/02/17 22:16:55 ANE4037E (Session: 44037, Node: LTC-RHN72) Object
'/var/lib/mongodb/journal/j._6' changed during
processing. Object skipped. (SESSION: 44037)

Thanks in advance for any help!
Jason
 

Without using Bypass, I would like to fix my issue so the Executive Summary report does not reflect nodes as "At Risk" when ANE4037E is the only error.
You can't change the report based on the type of error you get; you have to fix what is causing the error instead. In that regard, here are a few suggestions:
- increase the CHANGINGRETRIES value on the client, if it is not already at the default of 4
- if the files in question change constantly, you may need to switch to shared dynamic in order to get a backup
- if the files change constantly and you can never get a backup, you may as well exclude them
- depending on what application these files belong to, try to determine if there is a better time to back them up and adjust the backup schedule accordingly
- if possible, run a preschedule command to quiesce the application that modifies those files and a postschedule command to restart the application.
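
As a rough illustration of how the client-side suggestions above might look as option lines, here is a minimal Python sketch that renders them as text. It assumes the retry option is the one the client documentation calls CHANGINGRETRIES (documented range 0 through 4, as far as I know) and uses the EXCLUDE, PRESCHEDULECMD and POSTSCHEDULECMD client options. The function name, the script paths and the exclude pattern are hypothetical placeholders, and the output should be checked against your own client documentation before it goes anywhere near dsm.sys.

```python
def client_option_lines(changing_retries=4, excludes=(), pre_cmd=None, post_cmd=None):
    """Render candidate client option lines as plain strings.

    Nothing is written to disk; the caller decides where the lines go.
    """
    # Assumption: the documented range for CHANGINGRETRIES is 0 through 4.
    if not 0 <= changing_retries <= 4:
        raise ValueError("CHANGINGRETRIES is expected to be between 0 and 4")

    lines = [f"CHANGINGRETRIES {changing_retries}"]
    # One EXCLUDE statement per pattern that can never be backed up cleanly.
    lines += [f"EXCLUDE {pattern}" for pattern in excludes]
    # Optional commands to quiesce and restart the application around the backup.
    if pre_cmd:
        lines.append(f'PRESCHEDULECMD "{pre_cmd}"')
    if post_cmd:
        lines.append(f'POSTSCHEDULECMD "{post_cmd}"')
    return lines


if __name__ == "__main__":
    # Hypothetical pattern and script paths, for illustration only.
    for line in client_option_lines(
        excludes=["/var/lib/mongodb/journal/*"],
        pre_cmd="/usr/local/bin/quiesce_app.sh",
        post_cmd="/usr/local/bin/resume_app.sh",
    ):
        print(line)
```

Switching to shared dynamic is a server-side copy group setting rather than a client option, so it is deliberately left out of this sketch.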
 

'/rhnsat/pgsql/data/base/16384/628831'
By the name, this file appears to belong to a database product. See if you can back up the DB from within the application to a flat file. Then exclude the database directory and include the flat file in your backup.
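
A minimal sketch of the dump-to-flat-file step, assuming the database is PostgreSQL (the "pgsql" in the path suggests this, but it is only a guess) and that pg_dump is on the PATH. The database name and output path are hypothetical, and a product that embeds its own database may ship a dedicated backup tool that should be preferred over calling pg_dump directly.

```python
import subprocess


def pg_dump_command(db_name, out_file):
    """Build a pg_dump command line that writes a custom-format dump file."""
    return ["pg_dump", "--format=custom", f"--file={out_file}", db_name]


def dump_database(db_name, out_file):
    """Run pg_dump and raise CalledProcessError if it exits non-zero."""
    subprocess.run(pg_dump_command(db_name, out_file), check=True)


if __name__ == "__main__":
    # Hypothetical names; only prints the command instead of running it.
    print(" ".join(pg_dump_command("exampledb", "/var/backups/exampledb.dump")))
```

A script like this could be run before the scheduled backup, with the live database directory excluded and the dump file included.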
 

Outside of fixing the underlying problem, which may or may not be possible...

If you go into the Operations Center settings, there is a check box for "Consider warnings and skipped files as 'at risk' conditions". If you uncheck it, those nodes go to a "warning" state rather than "at risk". I had the same problem, and this is the way to get that report out of the red.
 

I have already unchecked the box "Consider warnings and skipped files as 'at risk' conditions". Skipped files should not make the backup show as at risk when the Ops Center box is unchecked. The backups look to be completing with errors, yet other nodes completing with errors are not showing as at risk. CHANGINGRETRIES should be 4 by default, right? So I am looking at moving the backup window. Why are only some nodes reporting as at risk?

Here is a node with 6 files that changed during backup, but the backup looks like it completed successfully.
05/03/17 22:21:09 ANE4037E (Session: 47211, Node: SURPRISE) Object
'/rhnsat/pgsql/data/base/16384/628930.2' changed during
processing. Object skipped. (SESSION: 47211)
05/03/17 22:35:26 ANE4037E (Session: 47211, Node: SURPRISE) Object
'/rhnsat/pgsql/data/base/16384/630271.1' changed during
processing. Object skipped. (SESSION: 47211)
05/03/17 22:38:39 ANR0403I Session 47211 ended for node SURPRISE (Linux
x86-64). (SESSION: 47211)
05/03/17 22:39:20 ANR0406I Session 47287 started for node SURPRISE (Linux
x86-64) (Tcp/Ip surprise.raleigh.ibm.com(48899)). (SESSION: 47287)
05/03/17 22:47:54 ANE4037E (Session: 47287, Node: SURPRISE) Object
'/var/log/httpd/access_log' changed during processing.
Object skipped. (SESSION: 47287)
05/03/17 22:48:04 ANE4037E (Session: 47219, Node: SURPRISE) Object
'/var/log/httpd/ssl_access_log' changed during
processing. Object skipped. (SESSION: 47219)
05/03/17 22:48:16 ANE4037E (Session: 47287, Node: SURPRISE) Object
'/var/log/httpd/ssl_request_log' changed during
processing. Object skipped. (SESSION: 47287)
05/03/17 22:48:18 ANE4037E (Session: 47219, Node: SURPRISE) Object
'/var/log/rhn/rhn_server_xmlrpc.log' changed during
processing. Object skipped. (SESSION: 47219)
05/03/17 22:48:24 ANR0403I Session 47206 ended for node SURPRISE (Linux
x86-64). (SESSION: 47206)
05/03/17 22:48:38 ANR0406I Session 47299 started for node SURPRISE (Linux
x86-64) (Tcp/Ip surprise.raleigh.ibm.com(48941)).
(SESSION: 47299)
05/03/17 22:48:43 ANR0403I Session 47299 ended for node SURPRISE (Linux
x86-64). (SESSION: 47299)
05/03/17 22:52:56 ANR0403I Session 47287 ended for node SURPRISE (Linux
x86-64). (SESSION: 47287)
05/03/17 22:53:06 ANR0403I Session 47219 ended for node SURPRISE (Linux
x86-64). (SESSION: 47219)
05/03/17 23:16:39 ANR0406I Session 47355 started for node SURPRISE (Linux
x86-64) (Tcp/Ip surprise.raleigh.ibm.com(49064)).
(SESSION: 47355)
05/03/17 23:16:40 ANR0403I Session 47355 ended for node SURPRISE (Linux
x86-64). (SESSION: 47355)
05/03/17 23:16:40 ANE4952I (Session: 47200, Node: SURPRISE) Total number
of objects inspected: 2,817,316 (SESSION: 47200)
05/03/17 23:16:40 ANE4954I (Session: 47200, Node: SURPRISE) Total number
of objects backed up: 286,172 (SESSION: 47200)
05/03/17 23:16:40 ANE4958I (Session: 47200, Node: SURPRISE) Total number
of objects updated: 3 (SESSION: 47200)
05/03/17 23:16:40 ANE4960I (Session: 47200, Node: SURPRISE) Total number
of objects rebound: 0 (SESSION: 47200)
05/03/17 23:16:40 ANE4957I (Session: 47200, Node: SURPRISE) Total number
of objects deleted: 0 (SESSION: 47200)
05/03/17 23:16:40 ANE4970I (Session: 47200, Node: SURPRISE) Total number
of objects expired: 2,000 (SESSION: 47200)
05/03/17 23:16:40 ANE4959I (Session: 47200, Node: SURPRISE) Total number
of objects failed: 6 (SESSION: 47200)
05/03/17 23:16:40 ANE4977I (Session: 47200, Node: SURPRISE) Total number
of bytes inspected: 2.61 TB (SESSION: 47200)
05/03/17 23:16:40 ANE4961I (Session: 47200, Node: SURPRISE) Total number
of bytes transferred: 58.06 GB (SESSION: 47200)
05/03/17 23:16:40 ANE4963I (Session: 47200, Node: SURPRISE) Data transfer
time: 458.11 sec (SESSION: 47200)
05/03/17 23:16:40 ANE4966I (Session: 47200, Node: SURPRISE) Network data
transfer rate: 132,905.16 KB/sec (SESSION:
47200)
05/03/17 23:16:40 ANE4967I (Session: 47200, Node: SURPRISE) Aggregate data
transfer rate: 15,232.28 KB/sec (SESSION: 47200)
05/03/17 23:16:40 ANE4968I (Session: 47200, Node: SURPRISE) Objects
compressed by: 56% (SESSION:
47200)
05/03/17 23:16:40 ANE4976I (Session: 47200, Node: SURPRISE) Total data
reduction ratio: 97.84% (SESSION: 47200)
05/03/17 23:16:40 ANE4964I (Session: 47200, Node: SURPRISE) Elapsed
processing time: 01:06:37 (SESSION: 47200)
05/03/17 23:16:40 ANR0403I Session 47200 ended for node SURPRISE (Linux
x86-64). (SESSION: 47200)
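
To compare nodes that alert with nodes that do not, it may help to tally the skipped objects per node from activity log output like the above. The following is a minimal Python sketch under a few assumptions: entries start with an MM/DD/YY HH:MM:SS stamp, wrapped continuation lines do not, and the ANE4037E and ANE4959I message texts look like the ones in this thread. The function names are my own and not part of any IBM tool.

```python
import re
from collections import defaultdict

# A new activity log entry starts with a date and time stamp.
ENTRY_START = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")
SKIPPED = re.compile(
    r"ANE4037E \(Session: (\d+), Node: ([^)]+)\) "
    r"Object '([^']+)' changed during processing"
)
FAILED_TOTAL = re.compile(
    r"ANE4959I \(Session: (\d+), Node: ([^)]+)\) "
    r"Total number of objects failed:\s*([\d,]+)"
)


def join_wrapped(lines):
    """Merge wrapped continuation lines back into one string per entry."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            # Continuation of the previous entry; rejoin with a single space.
            entries[-1] += " " + line
    return entries


def summarize(lines):
    """Return (skipped paths per node, reported failed-object total per node)."""
    skipped = defaultdict(list)
    failed = {}
    for entry in join_wrapped(lines):
        match = SKIPPED.search(entry)
        if match:
            skipped[match.group(2)].append(match.group(3))
            continue
        match = FAILED_TOTAL.search(entry)
        if match:
            # Keeps the last total seen for each node.
            failed[match.group(2)] = int(match.group(3).replace(",", ""))
    return dict(skipped), failed
```

Feeding it the saved output of an activity log query for an alerting node and a non-alerting node gives two lists to compare side by side. One caveat: a path that itself wraps across lines would be rejoined with an extra space, so treat the paths as approximate in that case.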
 