We recently moved some MSSQL databases from server A (16 GB RAM) to server B
(4 GB RAM). The backup scripts call for $ALL.
Of the 15 databases, the job tries to start all 15 at once. It starts 8 (which
back up successfully) and fails on 9 through 15, and the parent job just hangs;
no error code is returned to NetBackup. If we lower buffers from 3 to 2 and
stripes from 2 to 1, it works fine.
Our questions are: 1) how should buffers and stripes be calculated, and 2) why
is the job allowed to lock up and fail without returning an exit code?
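For question 1, one rough way to compare the two settings is simply to count
buffers. This sketch assumes that each stripe gets its own set of buffers and
that all 15 databases run concurrently, as described above. The function names
and the buffer_size_kb parameter are hypothetical; the real per-buffer size
depends on the transfer-size setting in the backup script, which is not stated
here.

```python
def total_buffers(databases: int, stripes: int, buffers_per_stripe: int) -> int:
    """Buffers in use if every database runs at once (assumes buffers are per stripe)."""
    return databases * stripes * buffers_per_stripe


def buffer_memory_mb(databases: int, stripes: int, buffers_per_stripe: int,
                     buffer_size_kb: int) -> float:
    """Approximate buffer memory in MB; buffer_size_kb is a hypothetical input."""
    return total_buffers(databases, stripes, buffers_per_stripe) * buffer_size_kb / 1024


if __name__ == "__main__":
    original = total_buffers(15, stripes=2, buffers_per_stripe=3)  # 15 * 2 * 3
    reduced = total_buffers(15, stripes=1, buffers_per_stripe=2)   # 15 * 1 * 2
    print(f"original: {original} buffers, reduced: {reduced} buffers")
```

Under these assumptions the reduced setting uses 30 buffers instead of 90, one
third as many. Whether buffer memory is actually what runs out on the 4 GB
server would need to be confirmed separately.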
Here are details from the log, with comments from Symantec support:
I think I found the root cause of the backup hanging. I looked through the
dbclient log and saw the following:
---
15:14:18.320 [7976.4920] <16> writeToServer: ERR - send() to server on socket failed:
15:14:18.320 [7976.4920] <16> dbc_put: ERR - failed sending data to server
15:14:18.445 [7976.4920] <16> VxBSASendData: ERR - Could not do a bsa_put().
15:14:18.445 [7976.4920] <16> DBthreads::dbclient: ERR - Error in VxBSASendData: 1.
---
Above is a socket failure. It causes the thread update to fail, which sets up
the failure below:
---
15:14:18.445 [7976.4920] <16> CDBbackrec::ProcessVxBSAerror: ERR - Error in DBthreads::dbclient: 6.
15:14:18.445 [7976.4920] <1> CDBbackrec::ProcessVxBSAerror: CONTINUATION: - The system cannot find the file specified.
15:14:18.445 [7976.4920] <16> DBthreads::dbclient: ERR - Error in VxBSAEndData: 6.
15:14:18.445 [7976.4920] <1> DBthreads::dbclient: CONTINUATION: - The handle used to associate this call with a previous VxBSAInit() call is invalid.
---
At this point the application panics. See the entries below:
---
15:14:18.461 [7976.7632] <16> DBthreads::dbclient: ERR - Error in CompleteCommand: 0x80770004.
15:14:18.461 [7976.7632] <16> DBthreads::dbclient: ERR - A panic close was issued to dbclient #2.
15:14:18.461 [7976.6932] <16> DBthreads::dbclient: ERR - Error in CompleteCommand: 0x80770004.
15:14:18.523 [7976.6932] <16> DBthreads::dbclient: ERR - A panic close was issued to dbclient #1.
---
I'm not sure this can be called a bug, although the code could be more robust
by setting a timeout on the bsa_put() and/or VxBSAInit() calls.
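The timeout suggestion can be illustrated outside NetBackup. This is a generic
Python sketch, not NetBackup code: send_with_timeout and main are hypothetical
helpers, with send_with_timeout standing in for a call like bsa_put(). It turns
a send that stalls (because the peer has stopped reading) into an error that is
reported as a nonzero exit code instead of a hang.

```python
import socket
import sys


def send_with_timeout(sock: socket.socket, data: bytes, timeout_s: float) -> None:
    """Send all of `data`, raising TimeoutError if it takes longer than timeout_s.

    Since Python 3.5 the timeout bounds the whole sendall() call, not each chunk.
    """
    sock.settimeout(timeout_s)
    try:
        sock.sendall(data)
    except socket.timeout as exc:
        raise TimeoutError(f"send did not complete within {timeout_s} s") from exc


def main() -> int:
    # A connected pair where the receiving end never reads, so the sender's
    # buffer fills up and sendall() would block forever without a timeout.
    sender, receiver = socket.socketpair()
    try:
        send_with_timeout(sender, b"x" * (32 * 1024 * 1024), timeout_s=0.5)
    except TimeoutError as err:
        print(f"ERR - {err}", file=sys.stderr)
        return 1  # report the failure to the caller instead of hanging
    finally:
        sender.close()
        receiver.close()
    return 0
```

A wrapper script would call sys.exit(main()) so that the scheduler sees exit
status 1 rather than a job that never finishes.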
David McMullin
_______________________________________________
Veritas-bu maillist - Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu