Re: [Veritas-bu] Strange hung job problem
2010-04-20 06:03:37
Well i dont think the tape has unloaded, but it's getting stranger .
This is the final 20 odd lines from bpbkar log on the windows client.
This to be says the job has finished and ties in with the start time of
22:00.
22:32:46.684: [4448.2988] <2> tar_base::backup_finish: TAR - backup:
13574 files
22:32:46.684: [4448.2988] <2> tar_base::backup_finish: TAR - backup:
file data: 650849865 bytes 6 gigabytes
22:32:46.684: [4448.2988] <2> tar_base::backup_finish: TAR - backup:
image data: 703843328 bytes 6 gigabytes
22:32:46.684: [4448.2988] <2> tar_base::backup_finish: TAR - backup:
elapsed time: 1590 secs 4494522 bps
22:32:46.700: [4448.2988] <2> tar_base::V_vTarMsgW: INF - EXIT STATUS 0:
the requested operation was successfully completed
22:33:09.700: [4448.2988] <4> tar_backup::OVPC_WaitForServerComplete:
INF - 'INF - Server status = 0' received
22:33:10.028: [4448.2988] <2> tar_base::V_vTarMsgW: INF - Active
Directory: transcation logs truncated
22:33:10.028: [4448.2988] <16> dtcp_write: TCP - failure: send socket
(1840) (TCP 10058: Can't send after socket shutdown)
22:33:10.028: [4448.2988] <16> dtcp_write: TCP - failure: attempted to
send 6 bytes
22:33:10.028: [4448.2988] <16> dtcp_write: TCP - failure: send socket
(1840) (TCP 10058: Can't send after socket shutdown)
22:33:10.028: [4448.2988] <16> dtcp_write: TCP - failure: attempted to
send 45 bytes
22:33:10.028: [4448.2988] <4> OVStopCmd: INF - EXIT - status = 0
22:33:10.028: [4448.2988] <2> ov_log::V_GlobalLog: INF - BEDS_Term()
Enter InitFlags:0x1
22:33:10.090: [4448.2988] <4> OVShutdown: INF - Finished process
22:33:10.137: [4448.2988] <4> WinMain: INF - Exiting C:\Program
Files\VERITAS\NetBackup\bin\bpbkar32.exe
22:33:12.137: [4448.2988] <4> ov_log::OVClose: INF - Closing log file:
C:\Program Files\VERITAS\NetBackup\logs\BPBKAR\041910.LOG
Yet on the Master the job is still active and says its about 90% through
(although i don't use the gui a great deal).
Anyway on the gui job details it says it started loading a new tape
P805L2 around 01:18 this morning???
There are also WRN messages from bperror on the master for the client in
question with time stamps after 22:33.
The messages i have seen before and am not too concerned about, but how
can the master still be doing things when the client bpbkar process has
finished?
# bperror -all -client <client> -U
--extract---
04/19/2010 23:14:44 <master> <client> from client <client>: WRN - Removable
Storage Management: unable to export database (WIN32 21:
Unknown error)
04/19/2010 23:14:45 <master> <client> from client <client>: ERR - failure
reading file: C:\WINDOWS\system32\NtmsData\NTMSDATA
(WIN32
33: Unknown error)
04/19/2010 23:18:08 <master> <client> from client <client>: ERR - failure
reading file:
C:\WINDOWS\system32\wbem\Logs\FrameWork.log
(WIN32 13: Unknown error)
04/20/2010 01:18:34 <master> <client> begin writing backup id
<client>_1271711139, copy 1, fragment 180, to media id
P805L2 on drive index 0
These messages keep repeating in the bprd log on the master every 10 odd
minutes which i'm not sure if is related.:-
10:35:30.456 [5622] <2> launch: /usr/openv/netbackup/bin/bpsched, pid=465
10:35:31.822 [5622] <2> childterm: pid=465 exit=200, signo=0 core=no
10:35:31.822 [5622] <2> schild: child(ren) not ready to terminate
10:35:31.822 [5622] <2> listen_loop: select() interrupted
10:45:29.742 [5622] <2> get_behavior_rec: ?
10:45:29.743 [5622] <2> get_behavior_rec: MD5 digest line
10:45:30.826 [5622] <2> launch: /usr/openv/netbackup/bin/bpsched, pid=698
10:45:32.188 [5622] <2> childterm: pid=698 exit=200, signo=0 core=no
10:45:32.188 [5622] <2> schild: child(ren) not ready to terminate
10:45:32.188 [5622] <2> listen_loop: select() interrupted
Jobs are being multiplexed and everything else seems to have finished
apart from this client.
There are a few "db_lock_media: unable to lock media at offset 14
(P843L2)" type messages for the first tape in use for the backups in
bptm log
It successfully mounts the next tape it requires P805L2 (must have ran
out of space on P843) and other jobs which were still multiplexing seem
to write ok and finish.
Any other areas to look?
Jeff Cleverley wrote:
> Dave,
>
> Do you know if the tape library completes the media unload properly?
> I believe that the log may show the backup completed but the session
> for the master doesn't complete until the media has been unloaded.
> You may check things like the bptm or bprd logs on your master also.
>
> Jeff
>
> On Mon, Apr 19, 2010 at 8:24 AM, Dave Markham <dave.markham AT fjserv DOT
> net> wrote:
>
>> Guys anyone remember seeing something like this back on NBU 5.
>>
>> I'm running NBU 5.0MP7 on a Unix Server attached to L25 tape library.
>>
>> I have a windows client which works fine some times and then other times
>> just sits there and hangs everything up. It never finishes and needs
>> cancelling manually.
>>
>> Windows client is Win Server 2003 SP2 and has NBU 5.0mp7 client.
>>
>> Now according to the bpkar log on the client the job has finished
>> successfully, but on the master/media server the job stays hung as
>> though its not registered it finishing.
>>
>> I have some logs if anyone could help out??
>>
>> Cheers
>>
>> _______________________________________________
>> Veritas-bu maillist - Veritas-bu AT mailman.eng.auburn DOT edu
>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
>>
>>
>
>
>
>
_______________________________________________
Veritas-bu maillist - Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
|
|
|