Veritas-bu

Re: [Veritas-bu] Strange hung job problem

2010-04-20 06:03:37
Subject: Re: [Veritas-bu] Strange hung job problem
From: Dave Markham <dave.markham AT fjserv DOT net>
To: "veritas-bu AT mailman.eng.auburn DOT edu" <veritas-bu AT mailman.eng.auburn DOT edu>
Date: Tue, 20 Apr 2010 11:01:55 +0100
Well i dont think the tape has unloaded, but it's getting stranger .

This is the final 20 odd lines from bpbkar log on the windows client. 
This to be says the job has finished and ties in with the start time of 
22:00.

22:32:46.684: [4448.2988] <2> tar_base::backup_finish: TAR - backup: 
13574 files
22:32:46.684: [4448.2988] <2> tar_base::backup_finish: TAR - backup:  
file data:  650849865 bytes  6 gigabytes
22:32:46.684: [4448.2988] <2> tar_base::backup_finish: TAR - backup:  
image data:  703843328 bytes  6 gigabytes
22:32:46.684: [4448.2988] <2> tar_base::backup_finish: TAR - backup:  
elapsed time:       1590 secs      4494522 bps
22:32:46.700: [4448.2988] <2> tar_base::V_vTarMsgW: INF - EXIT STATUS 0: 
the requested operation was successfully completed
22:33:09.700: [4448.2988] <4> tar_backup::OVPC_WaitForServerComplete: 
INF - 'INF - Server status = 0' received
22:33:10.028: [4448.2988] <2> tar_base::V_vTarMsgW: INF - Active 
Directory: transcation logs truncated
22:33:10.028: [4448.2988] <16> dtcp_write: TCP - failure: send socket 
(1840) (TCP 10058: Can't send after socket shutdown)
22:33:10.028: [4448.2988] <16> dtcp_write: TCP - failure: attempted to 
send 6 bytes
22:33:10.028: [4448.2988] <16> dtcp_write: TCP - failure: send socket 
(1840) (TCP 10058: Can't send after socket shutdown)
22:33:10.028: [4448.2988] <16> dtcp_write: TCP - failure: attempted to 
send 45 bytes
22:33:10.028: [4448.2988] <4> OVStopCmd: INF - EXIT - status = 0
22:33:10.028: [4448.2988] <2> ov_log::V_GlobalLog: INF - BEDS_Term() 
Enter InitFlags:0x1
22:33:10.090: [4448.2988] <4> OVShutdown: INF - Finished process
22:33:10.137: [4448.2988] <4> WinMain: INF - Exiting C:\Program 
Files\VERITAS\NetBackup\bin\bpbkar32.exe
22:33:12.137: [4448.2988] <4> ov_log::OVClose: INF - Closing log file: 
C:\Program Files\VERITAS\NetBackup\logs\BPBKAR\041910.LOG


Yet on the Master the job is still active and says its about 90% through 
(although i don't use the gui a great deal).

Anyway on the gui job details it says it started loading a new tape 
P805L2 around 01:18 this morning???
There are also WRN messages from bperror on the master for the client in 
question with time stamps after 22:33.
The messages i have seen before and am not too concerned about, but how 
can the master still be doing things when the client bpbkar process has 
finished?

# bperror -all -client <client> -U  

--extract---
04/19/2010 23:14:44 <master> <client>  from client <client>: WRN - Removable
                    Storage Management: unable to export database (WIN32 21:
                    Unknown error)
04/19/2010 23:14:45 <master> <client>  from client <client>: ERR - failure
                    reading file: C:\WINDOWS\system32\NtmsData\NTMSDATA 
(WIN32
                    33: Unknown error)
04/19/2010 23:18:08 <master> <client>  from client <client>: ERR - failure
                    reading file: 
C:\WINDOWS\system32\wbem\Logs\FrameWork.log
                    (WIN32 13: Unknown error)
04/20/2010 01:18:34 <master> <client>  begin writing backup id
                    <client>_1271711139, copy 1, fragment 180, to media id
                    P805L2 on drive index 0


These messages keep repeating in the bprd log on the master every 10 odd 
minutes which i'm not sure if is related.:-

10:35:30.456 [5622] <2> launch: /usr/openv/netbackup/bin/bpsched, pid=465
10:35:31.822 [5622] <2> childterm: pid=465 exit=200, signo=0 core=no
10:35:31.822 [5622] <2> schild: child(ren) not ready to terminate
10:35:31.822 [5622] <2> listen_loop: select() interrupted
10:45:29.742 [5622] <2> get_behavior_rec: ?
10:45:29.743 [5622] <2> get_behavior_rec: MD5 digest line
10:45:30.826 [5622] <2> launch: /usr/openv/netbackup/bin/bpsched, pid=698
10:45:32.188 [5622] <2> childterm: pid=698 exit=200, signo=0 core=no
10:45:32.188 [5622] <2> schild: child(ren) not ready to terminate
10:45:32.188 [5622] <2> listen_loop: select() interrupted

Jobs are being multiplexed and everything else seems to have finished 
apart from this client.

There are a few "db_lock_media: unable to lock media at offset 14 
(P843L2)" type messages for the first tape in use for the backups in 
bptm log

It successfully mounts the next tape it requires P805L2 (must have ran 
out of space on P843) and other jobs which were still multiplexing seem 
to write ok and finish.


Any other areas to look?



Jeff Cleverley wrote:
> Dave,
>
> Do you know if the tape library completes the media unload properly?
> I believe that the log may show the backup completed but the session
> for the master doesn't complete until the media has been unloaded.
> You may check things like the bptm or bprd logs on your master also.
>
> Jeff
>
> On Mon, Apr 19, 2010 at 8:24 AM, Dave Markham <dave.markham AT fjserv DOT 
> net> wrote:
>   
>> Guys anyone remember seeing something like this back on NBU 5.
>>
>> I'm running NBU 5.0MP7 on a Unix Server attached to L25 tape library.
>>
>> I have a windows client which works fine some times and then other times
>> just sits there and hangs everything up. It never finishes and needs
>> cancelling manually.
>>
>> Windows client is Win Server 2003 SP2 and has NBU 5.0mp7 client.
>>
>> Now according to the bpkar log on the client the job has finished
>> successfully, but on the master/media server the job stays hung as
>> though its not registered it finishing.
>>
>> I have some logs if anyone could help out??
>>
>> Cheers
>>
>> _______________________________________________
>> Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
>>
>>     
>
>
>
>   


_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

<Prev in Thread] Current Thread [Next in Thread>