Hi,
I upgraded a Solaris 8 Master/Media server from NetBackup 4.5FP3 to 4.5FP6
last week and have been struggling with bpsched core dumps ever since.
Backups that are kicked off still finish normally, but at some point I
realized that the DB backups, which are set to run after the scheduled
backups complete, were never running. Further investigation showed that
one of the bpsched processes is always left running, while another (I
believe it is the -mainempty process) dumps core.
ipcs -qa shows:
IPC status from <running system> as of Wed Dec 31 14:57:10 CET 2003
T  ID  KEY         MODE         OWNER  GROUP  CREATOR  CGROUP  CBYTES  QNUM  QBYTES  LSPID  LRPID  STIME     RTIME     CTIME
Message Queues:
q  0   0x4c544952  -Rrw-rw-rw-  root   root   root     root    0       0     65536   282    224    14:57:01  14:57:01  13:01:48
q  1   0x412a4250  -Rrw-------  root   root   root     root    0       0     65536   498    553    13:11:58  13:11:58  13:08:53
q  2   0x4f564e42  --rw-------  root   root   root     root    532     1     65536   516    516    13:12:33  13:12:33  13:08:53
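In the listing above, queue 2 is the only one with a message still pending (QNUM 1, CBYTES 532). A minimal sketch for picking out such queues, assuming ipcs prints each queue on a single line with the sixteen columns named in the header (the function name pending_queues is made up for illustration):

```shell
# Hypothetical helper: read "ipcs -qa" style output on stdin and print
# ID, QNUM and CBYTES for every message queue that still holds messages.
# Assumes the column order T ID KEY MODE OWNER GROUP CREATOR CGROUP
# CBYTES QNUM QBYTES ..., i.e. CBYTES is field 9 and QNUM is field 10.
pending_queues() {
    awk '$1 == "q" && $10 > 0 { print $2, $10, $9 }'
}

# Intended use on the affected server:
#   ipcs -qa | pending_queues
```

If the column layout differs on another release, the field numbers would need adjusting.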
To begin with, only some client backups were showing this behavior.
After some time, I noticed that I was not meeting the "minimum
requirements" from Veritas, so I installed the latest patch cluster from
Sun and added the following to /etc/system:
* Message queues
set msgsys:msginfo_msgmap=512
set msgsys:msginfo_msgmax=8192
set msgsys:msginfo_msgmnb=65536
set msgsys:msginfo_msgmni=256
set msgsys:msginfo_msgssz=16
set msgsys:msginfo_msgtql=512
set msgsys:msginfo_msgseg=8192
* Semaphores
set semsys:seminfo_semmap=64
set semsys:seminfo_semmni=1024
set semsys:seminfo_semmns=1024
set semsys:seminfo_semmnu=1024
set semsys:seminfo_semmsl=300
set semsys:seminfo_semopm=32
set semsys:seminfo_semume=64
* Shared memory
set shmsys:shminfo_shmmax=16777216
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=220
set shmsys:shminfo_shmseg=100
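With this many entries, one thing that may be worth ruling out is a tunable that is set twice in the file, for example once by an older edit and once by the block above. A rough sketch of such a check, assuming comment lines start with `*` as they do above (the function name dup_tunables is made up for illustration):

```shell
# Hypothetical helper: print every tunable that is set more than once in
# an /etc/system style file. Lines whose first non-blank character is '*'
# are treated as comments and skipped.
dup_tunables() {
    awk '
        /^[ \t]*\*/ { next }
        $1 == "set" {
            split($2, kv, "=")              # kv[1] is module:variable
            if (++seen[kv[1]] == 2) print kv[1]
        }
    ' "$1"
}

# Intended use on the affected server:
#   dup_tunables /etc/system
```

No output means no tunable appears twice. This only checks the file itself; it does not show which values the running kernel actually picked up after the reboot.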
The only result was that now all client backups seem to end in bpsched
core dumps instead of just some of them... :-(
This does, however, lead me to believe that the problem is to be found
in the system configuration and not necessarily NetBackup.
I have not been able to find anything suspicious in the logs, and no
non-zero exit status is reported. I also replaced my customized notify
scripts with the originals from the distribution, but that did not help
either.
Has anyone else observed this behavior? Any ideas on what may be wrong?
Any help is greatly appreciated.
Thanks,
Bruce