ADSM-L

[no subject]

2015-10-04 17:47:37

What types of problems have others encountered with OnBar backups and restores? I am trying to prove that the OnBar/ADSM combination is reliable by performing multiple restores. I am working with the following platform:

IDS 7.24.UC1
AIX 4.3.2
ADSM Server Version 3, Release 1, Level 1.5
ADSM Client Version 2, Release 1, Level 0.6

The current testing has resulted in 3 problems that might be of
interest to OnBar users. Two of the 3 problems have a workaround.

1. When performing backups I would receive the following error message
multiple times in the bar_act.log:
1999-01-19 21:49:32 43090  16884 ERROR: Unable to open connection to
server: could not fork server connection.
1999-01-19 21:49:32 16884  12948 The ON-Bar process 43090 exited with
a problem (exit code 130 (0x82), signal -1).
This problem was caused by the BAR_MAX_BACKUP variable being set to 0
or 'unlimited'. I was able to determine through testing that a value
of 10 worked. The limitation was not at the ADSM server, but appeared
to be an IDS/ADSM XBSA API limitation. I was able to run 2 backups for
2 instances on the same box. These 2 instances generated 20 threads or
connections to ADSM. The WATCH OUT here is that onbar -b DOES NOT
RETURN AN ERROR CODE != 0 when these CRITICAL ERRORS occur. It has
been suggested that I increase the BAR_RETRY variable, but I assume
failures would still be logged, making automated failure
notification difficult.
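
Since onbar -b returns 0 even when these errors occur, one way to get automated notification is to scan bar_act.log after each backup. The following is only a rough sketch: the line layout (date, time, two process ids, message) and the two failure markers are assumptions taken from the sample lines quoted above, not from any documented log format, and find_failures is a hypothetical helper name.

```python
import re

# Layout assumed from the sample lines above:
# "<date> <time> <pid>  <second pid> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<pid>\d+)\s+(?P<ppid>\d+)\s+(?P<msg>.*)$"
)

# Assumed failure markers, taken only from the two sample messages.
FAILURE_MARKERS = ("ERROR:", "exited with a problem")


def find_failures(log_text, since=None):
    """Return (timestamp, pid, message) tuples for entries that look like failures.

    `since` is an optional "YYYY-MM-DD HH:MM:SS" string; older entries are skipped.
    """
    entries = []
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw)
        if m:
            entries.append([m.group("ts"), int(m.group("pid")), m.group("msg").strip()])
        elif entries and raw.strip():
            # A line without a timestamp is treated as a wrapped
            # continuation of the previous message.
            entries[-1][2] += " " + raw.strip()

    failures = []
    for ts, pid, msg in entries:
        if since is not None and ts < since:
            continue  # timestamps in this format compare correctly as strings
        if any(marker in msg for marker in FAILURE_MARKERS):
            failures.append((ts, pid, msg))
    return failures
```

A wrapper script could record the time before calling onbar -b, pass it as `since`, and exit non-zero if the returned list is not empty.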

2. When performing a restore that requires a log to be salvaged, onbar
writes 2 entries into the ixbar file. The first entry in the file is
for a failed log backup while the second entry is for a successful
backup. If a second restore is performed after the completion of the
first, oncheck -ce indicates overlapping extents. Removing the failed
entry appears to be a correct workaround. Below are examples of the
salvaged log entries (the FAILED BACKUP and SUCCESSFUL labels are mine):
 csadmshm 404 L 0 866 0 0 20451203 1999-02-09 21:13:43 1
 csadmshm 405 L 0 0 0 0 20451205 1999-02-09 21:14:51 1 FAILED BACKUP
 csadmshm 405 L 0 871 0 0 20451205 1999-02-09 23:28:06 1 SUCCESSFUL
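
Finding these entries by eye is error-prone, so a small script can list the candidates before anything is removed by hand. This is only a sketch under a loud assumption: judging purely from the three sample lines above, the failed entry is the one whose fifth column is 0 while another entry for the same server and log number has a non-zero fifth column. I do not know the documented meaning of that column, and suspect_salvage_entries is a hypothetical helper name. The script only reports; it does not modify the ixbar file.

```python
from collections import defaultdict


def suspect_salvage_entries(ixbar_text):
    """Return (line_number, line) pairs for log entries that look like failed duplicates.

    Assumption (from the sample lines only): columns are
    server, log number, type, ..., and a fifth column that is 0
    for the failed entry and non-zero for the successful one.
    """
    by_log = defaultdict(list)
    for lineno, raw in enumerate(ixbar_text.splitlines(), start=1):
        fields = raw.split()
        # Only consider logical-log entries (type "L") with enough columns.
        if len(fields) < 11 or fields[2] != "L":
            continue
        key = (fields[0], fields[1])  # (server, log number)
        by_log[key].append((lineno, fields[4], raw.strip()))

    suspects = []
    for entries in by_log.values():
        has_good = any(col5 != "0" for _, col5, _ in entries)
        if len(entries) > 1 and has_good:
            suspects.extend((n, text) for n, col5, text in entries if col5 == "0")
    return sorted(suspects)
```

Run against a copy of the ixbar file first, and compare the reported lines with the bar_act.log entries for the salvage before removing anything.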

3. This issue is not resolved yet. A high percentage of the time
oncheck -cr indicates address and page errors with the log dbspace
after a restore.
Log page error: invalid address.  Log number 115, addr 0
Log page error: invalid page type.  Log number 115, addr 0
Log page error: invalid address.  Log number 0, addr 0
Log page error: invalid page type.  Log number 0, addr 0
I have been able to prove that all the appropriate data changes are in
the database after the restore. I have also been able to eliminate
this error in log file 115 by cycling a new log into log file 115. The
error for log number 0 was removed by dropping the appropriate log
file. The oncfg.<server> file gave me clues as to which log file(s)
might have a problem. Tech support is trying to reproduce this
problem. I am trying to determine how critical this is.
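
When running many back-to-back restores it helps to summarize which log numbers oncheck -cr complains about after each one. A minimal sketch, assuming the message format is exactly as in the four sample lines above (summarize_log_page_errors is a hypothetical helper name, and the text to parse is whatever you captured from oncheck -cr):

```python
import re
from collections import defaultdict

# Pattern modeled on the sample messages, e.g.
# "Log page error: invalid address.  Log number 115, addr 0"
LOG_ERR_RE = re.compile(
    r"Log page error:\s*(?P<kind>.+?)\.\s+Log number (?P<log>\d+), addr (?P<addr>\w+)"
)


def summarize_log_page_errors(oncheck_output):
    """Map each log number to the sorted list of error kinds reported for it."""
    errors = defaultdict(set)
    for line in oncheck_output.splitlines():
        m = LOG_ERR_RE.search(line)
        if m:
            errors[int(m.group("log"))].add(m.group("kind"))
    return {log: sorted(kinds) for log, kinds in sorted(errors.items())}
```

Comparing the summaries from successive restores shows whether the same log numbers are affected each time, which may be useful detail for tech support.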

What kind of restore testing has been performed at other sites? I have
performed multiple test scenarios on 2 different servers to get this
far. My test scenario consists of 1 Level 0 and 2 Level 1's. Three log
files separate the archives. Each log file contains at least 1
transaction and 1 checkpoint. Six restores are performed back to back
in the scenario. The first restore point is the highest logid. Each
successive restore point is to a lower logid. The last restore point
is the logid where the level 0 archive ended.

Any comments or information would be appreciated,
Doug Hayes


Doug Hayes
dhayes AT senco DOT com
DBA, Senco Fastening Systems
