ADSM-L

[no subject]

2015-10-04 17:48:52

This one drove us nuts as well.  We experienced the same problems on a Solaris
system.  We corrected the semaphore system statements (they were a little
off), we tried changing the ADSM client parms, and we checked the system
kernel settings (max_uprocs, etc.).  Nothing fixed the problem.
I ran a truss against the ON-Bar process and handed the output to Informix.
The only thing that fixed it was Informix personnel "cleaning up" the
database after it had been corrupted by ON-Bar.  Do not ask me how an XBSA
backup system can mess up a database.

The initial cause in our case was a log being dumped via the alert script
to a different client nodename (we have HA boxes with multiple client nodes
on each box).  I identified that problem and they changed the alert script,
but the damage was done: one log was not available.  They were testing
recovery of their system, and after they attempted that restore (which
failed because they could not roll forward through the missing log), the
recovery and backup processes would no longer function.  Could not fork!
Even the truss didn't really show what was failing in that regard.  VERY
frustrating.

We have not had problems with waiting for resources.  One of our projects
that heavily uses ON-Bar/ADSM/Informix has approx. 7 machines in 3 clusters.
They kick off ON-Bar backups, sometimes at the same time.  We only have 4
drives on the server side, so resource contention and wait periods happen
all the time.  But it doesn't seem to adversely affect the backups.

David Hendrix
dmhendri AT fedex DOT com





"Hayes, Doug" <DHayes AT SENCO DOT COM> on 01/25/99 02:25:41 PM

Please respond to "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>

 To:      ADSM-L AT VM.MARIST DOT EDU
 cc:      (bcc: DAVID HENDRIX/216832/ITD/FEDEX)
 Subject: Restore Errors with Informix OnBar

Platform Versions:
IDS 7.24.UC1
AIX 4.3.2
ADSM Server Version 3, Release 1, Level 1.5
ADSM Client Version 2, Release 1, Level 0.6

I have been performing backup and recovery testing under various scenarios
with the above configuration, and the results have been unpredictable. I have
been running onbar -b -L 0 and onbar -b -L 1 backups and using onbar -r -n
<logid> for my restores. After my first restore, several dbSpaces
were not restored (there were no entries for these dbSpaces in either the
online.log or bar_act.log), and they were marked down after the
restore. Informix support told me that the following error messages in
my bar_act.log indicated a CRITICAL failure during my backup:

1999-01-19 21:49:32 43090  16884 ERROR: Unable to open connection to server: could not fork server connection.
1999-01-19 21:49:32 16884  12948 The ON-Bar process 43090 exited with a problem (exit code 130 (0x82), signal -1).

These critical failures created a corrupt backup for the entire server. To
complicate matters, the onbar command issued (onbar -b -L 1) exited with a
return code of 0. I was told this was to be expected!
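
Since the return code alone apparently cannot be trusted, one workaround is to scan bar_act.log after each backup. The following is only a rough sketch: the line layout (timestamp, pid, parent pid, message) is inferred from the two entries quoted above, and the failure markers, `find_failures`, and `backup_is_trustworthy` are hypothetical names chosen for illustration, not anything supplied by Informix.

```python
import re

# Assumed layout of a bar_act.log entry, inferred from the quoted lines:
# "YYYY-MM-DD HH:MM:SS <pid> <parent pid> <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\d+)\s+(\d+)\s+(.*)$"
)

# Hypothetical markers, taken only from the two messages quoted above.
FAILURE_MARKERS = ("ERROR:", "exited with a problem")


def find_failures(lines, since=""):
    """Return (timestamp, pid, message) for suspicious entries at or after `since`."""
    failures = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # wrapped continuation line or unknown format
        timestamp, pid, _parent, message = match.groups()
        if timestamp < since:
            continue  # timestamps in this format sort correctly as strings
        if any(marker in message for marker in FAILURE_MARKERS):
            failures.append((timestamp, int(pid), message))
    return failures


def backup_is_trustworthy(exit_code, lines, since=""):
    """Treat a backup as good only if onbar exited 0 AND the log shows no failures."""
    return exit_code == 0 and not find_failures(lines, since)
```

In practice one would record the time just before starting onbar, pass it as `since`, and alert on any entry returned, rather than relying on the exit status.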

Has anyone else experienced these types of problems? What did you do to
work around the issue?

I also find it unusual that onbar will not wait for additional ADSM server
resources to become available. I was told to control this by lowering the
BAR_MAX_BACKUP onconfig parameter. I have found this difficult, as the
number of available ADSM resources can change from time to time. Does
anyone have any ideas for this issue?
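
One possible approach, sketched here under assumptions, is to adjust BAR_MAX_BACKUP from a wrapper script just before each backup. The sketch assumes the onconfig file uses "NAME value" lines; `pick_bar_max_backup` and `set_onconfig_param` are hypothetical helpers, and the number of free ADSM mount points has to come from some site-specific source that is not shown.

```python
import re


def pick_bar_max_backup(free_mount_points, configured_max):
    """Cap parallelism at the drives believed free, but never go below 1."""
    return max(1, min(free_mount_points, configured_max))


def set_onconfig_param(text, name, value):
    """Return onconfig text with `name` set to `value`; other lines are untouched.

    Assumes "NAME value [trailing comment]" lines. Commented-out entries
    (lines starting with '#') are left alone.
    """
    pattern = re.compile(r"^(\s*" + re.escape(name) + r"\s+)(\S+)(.*)$")
    out, found = [], False
    for line in text.splitlines():
        match = pattern.match(line)
        if match and not found:
            # Swap only the value, keeping spacing and any trailing comment.
            line = f"{match.group(1)}{value}{match.group(3)}"
            found = True
        out.append(line)
    if not found:
        out.append(f"{name} {value}")  # parameter was absent: append it
    return "\n".join(out) + "\n"
```

For example, with 2 drives free and a configured ceiling of 4, `set_onconfig_param(cfg, "BAR_MAX_BACKUP", pick_bar_max_backup(2, 4))` would set the value to 2. This does not make onbar wait for resources; it only reduces how many it asks for.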

Thanks In Advance,
Doug Hayes
DBA, Senco Products Inc
Dhayes AT senco DOT com





