ADSM-L

SQL Backtrack and ADSM - hanging problem

1999-04-19 15:49:42
Subject: SQL Backtrack and ADSM - hanging problem
From: "van Roosmalen, Naomi" <naomi.vanroosmalen AT NBTEL.NB DOT CA>
Date: Mon, 19 Apr 1999 16:49:42 -0300

Hi everyone,
I have a very strange problem that has been going on for far too long now,
and am hoping someone may have a solution for me.

Configuration:

Server:
ADSM server 3.1.1.5, running on AIX 4.2.1

Client:
Sun Solaris 2.5.1
ADSM client 2.1.0.? (Not sure if it was 2.1.0.7)
Sybase database version 11.0.3
SQL BackTrack 3.1.1

There is a dedicated 100 Mbps ethernet connection between the two.

Sequence of events:
1. For year 2000 compliance, and to keep up to date on our software, it was
decided it was time to upgrade the ADSM client to version 3, and SQL BT to
version 4.0.50.

2. The ADSM client is upgraded to version 3.1.0.6.

3. The backup of the databases fails, but it turns out that the sys admin
accidentally did not use the original dsm.sys file, which meant ADSM was using
the wrong ethernet interface.

4. Original dsm.sys file is restored.

5. Backup seems to run ok, but then starts to not complete on some days.
After the backup should have long been done, SQL BT processes are still
showing up in the ps -ef output. There are no error codes in any log files,
no messages indicating why the backup hung. Sybase logs, ADSM logs, SQL BT
logs are checked, but nothing unusual is showing up. dtwatch does not
provide any insight either; it just shows a current status, which does not
change.

6. A week and a half later the sys admin notifies me that SQL BT is hogging
semaphores, and that the system has run out. The number of semaphores is
doubled, and the system is rebooted. The first few backups work, but then
backups start hanging again.

7. Because this did not help, SQL BT is upgraded from version 3.1.1 to
version 4.0.50. This does not help either; backups continue to hang.

I don't remember if point 6 happened before point 7; those two could be
reversed.

8. Last week I downgraded the ADSM client from v 3.1.0.6 to 3.1.0.5. This
did not change anything.

The difficulty with these hangs is that there are no messages in any log
files indicating what the problem is. All reporting simply stops. I have run
the backup with the -debug option and the -query option, but this did not
provide BMC support with any insight.
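
Since the hang itself leaves no trace in any log, one option is to wrap the
backup in a small watchdog that at least records when a run overran. The
following is only a sketch under assumptions: run_with_deadline is a
hypothetical helper name, the command and deadline are whatever the site
uses, and whether killing a stuck backup is acceptable is a local decision.
Note that killing a process does not by itself remove System V semaphore
sets it created; those stay until removed (for example with ipcrm) or until
a reboot.

```python
import subprocess
import sys
import time


def run_with_deadline(cmd, deadline_s):
    """Run cmd and return (status, elapsed_seconds).

    status is the process exit code, or the string "hung" if the
    deadline passed and the process had to be killed.
    """
    start = time.time()
    proc = subprocess.Popen(cmd)
    try:
        status = proc.wait(timeout=deadline_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the child so it does not linger as a zombie
        status = "hung"
    elapsed = time.time() - start
    # One timestamped line per run, so overruns can be found afterwards.
    print(f"{time.ctime()} status={status} elapsed={elapsed:.0f}s",
          file=sys.stderr)
    return status, elapsed
```

Calling it once per database, rather than once for the whole backup, would
show which database each hang occurred on.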

The backups will partially complete, but then stop. There does not seem to
be a pattern in where the backup would stop. Some days it would complete one
database, other days it would do 3 databases, and some days it would
complete everything.

I have tried everything I can think of, and looked at everything I can think
of: network traffic, Sybase logs, SQL BT logs, ADSM server logs (no entries
show there either), ADSM client logs. I have asked various people who work
with this system if anything had changed in its environment. No changes have
been reported.

It somehow looks like it is semaphore related, but increasing the number of
semaphores did not help. Eventually all of them end up in use anyway (we
know this because there is another application that is dependent on
available semaphores, and this application stops working when there are none
left).
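
To see who actually holds the semaphores before they run out, the output of
ipcs -s can be saved at intervals and tallied per owner. The sketch below
assumes the Solaris-style column order (T ID KEY MODE OWNER GROUP); other
systems lay the columns out differently, so the index may need adjusting.
The function name and the sample text are made up for illustration and are
not output from the system described here.

```python
from collections import Counter


def semaphore_sets_by_owner(ipcs_text):
    """Count semaphore sets per owner in `ipcs -s` style output.

    Assumes each semaphore line starts with "s" and has the columns
    T ID KEY MODE OWNER GROUP; adjust the owner index otherwise.
    """
    counts = Counter()
    for line in ipcs_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0] == "s":
            counts[fields[4]] += 1
    return counts


# Made-up sample for illustration only.
SAMPLE = """\
IPC status from <running system> as of Mon Apr 19 16:00:00 1999
T         ID      KEY        MODE        OWNER    GROUP
Semaphores:
s          0   0x000187cf --ra-ra-ra-     root     root
s          1   0x00000000 --ra-------   sybase   sybase
s          2   0x00000000 --ra-------   sybase   sybase
"""

print(semaphore_sets_by_owner(SAMPLE).most_common())
# prints [('sybase', 2), ('root', 1)]
```

Comparing the tallies from a snapshot taken before a backup with one taken
after a hang would show whether the count for the backup's user keeps
growing from run to run.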

Has anyone had a similar situation, and if so, what did you do to resolve it?

Thanks,
Naomi van Roosmalen
NBTel Inc.