Veritas-bu

Re: [Veritas-bu] Strange hung job problem

2010-04-20 06:21:36
Subject: Re: [Veritas-bu] Strange hung job problem
From: "WEAVER, Simon \(external\)" <simon.weaver AT astrium.eads DOT net>
To: <dave.markham AT fjserv DOT net>, <veritas-bu AT mailman.eng.auburn DOT edu>
Date: Tue, 20 Apr 2010 11:21:07 +0100
Dave
Is it just this one client?
Wondered if the client needed a bounce? Sounds like the activity from
the client is not being passed back to the Master.

If its one client - Reboot it perhaps if possible? 
Simon 

-----Original Message-----
From: veritas-bu-bounces AT mailman.eng.auburn DOT edu
[mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Dave
Markham
Sent: Tuesday, April 20, 2010 11:02 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Strange hung job problem

Well i dont think the tape has unloaded, but it's getting stranger .

This is the final 20 odd lines from bpbkar log on the windows client. 
This to be says the job has finished and ties in with the start time of
22:00.

22:32:46.684: [4448.2988] <2> tar_base::backup_finish: TAR - backup: 
13574 files
22:32:46.684: [4448.2988] <2> tar_base::backup_finish: TAR - backup:  
file data:  650849865 bytes  6 gigabytes
22:32:46.684: [4448.2988] <2> tar_base::backup_finish: TAR - backup:  
image data:  703843328 bytes  6 gigabytes
22:32:46.684: [4448.2988] <2> tar_base::backup_finish: TAR - backup:  
elapsed time:       1590 secs      4494522 bps
22:32:46.700: [4448.2988] <2> tar_base::V_vTarMsgW: INF - EXIT STATUS 0:

the requested operation was successfully completed
22:33:09.700: [4448.2988] <4> tar_backup::OVPC_WaitForServerComplete: 
INF - 'INF - Server status = 0' received
22:33:10.028: [4448.2988] <2> tar_base::V_vTarMsgW: INF - Active
Directory: transcation logs truncated
22:33:10.028: [4448.2988] <16> dtcp_write: TCP - failure: send socket
(1840) (TCP 10058: Can't send after socket shutdown)
22:33:10.028: [4448.2988] <16> dtcp_write: TCP - failure: attempted to
send 6 bytes
22:33:10.028: [4448.2988] <16> dtcp_write: TCP - failure: send socket
(1840) (TCP 10058: Can't send after socket shutdown)
22:33:10.028: [4448.2988] <16> dtcp_write: TCP - failure: attempted to
send 45 bytes
22:33:10.028: [4448.2988] <4> OVStopCmd: INF - EXIT - status = 0
22:33:10.028: [4448.2988] <2> ov_log::V_GlobalLog: INF - BEDS_Term()
Enter InitFlags:0x1
22:33:10.090: [4448.2988] <4> OVShutdown: INF - Finished process
22:33:10.137: [4448.2988] <4> WinMain: INF - Exiting C:\Program
Files\VERITAS\NetBackup\bin\bpbkar32.exe
22:33:12.137: [4448.2988] <4> ov_log::OVClose: INF - Closing log file: 
C:\Program Files\VERITAS\NetBackup\logs\BPBKAR\041910.LOG


Yet on the Master the job is still active and says its about 90% through
(although i don't use the gui a great deal).

Anyway on the gui job details it says it started loading a new tape
P805L2 around 01:18 this morning???
There are also WRN messages from bperror on the master for the client in
question with time stamps after 22:33.
The messages i have seen before and am not too concerned about, but how
can the master still be doing things when the client bpbkar process has
finished?

# bperror -all -client <client> -U  

--extract---
04/19/2010 23:14:44 <master> <client>  from client <client>: WRN -
Removable
                    Storage Management: unable to export database (WIN32
21:
                    Unknown error)
04/19/2010 23:14:45 <master> <client>  from client <client>: ERR -
failure
                    reading file: C:\WINDOWS\system32\NtmsData\NTMSDATA
(WIN32
                    33: Unknown error)
04/19/2010 23:18:08 <master> <client>  from client <client>: ERR -
failure
                    reading file: 
C:\WINDOWS\system32\wbem\Logs\FrameWork.log
                    (WIN32 13: Unknown error) 04/20/2010 01:18:34
<master> <client>  begin writing backup id
                    <client>_1271711139, copy 1, fragment 180, to media
id
                    P805L2 on drive index 0


These messages keep repeating in the bprd log on the master every 10 odd

minutes which i'm not sure if is related.:-

10:35:30.456 [5622] <2> launch: /usr/openv/netbackup/bin/bpsched,
pid=465
10:35:31.822 [5622] <2> childterm: pid=465 exit=200, signo=0 core=no
10:35:31.822 [5622] <2> schild: child(ren) not ready to terminate
10:35:31.822 [5622] <2> listen_loop: select() interrupted
10:45:29.742 [5622] <2> get_behavior_rec: ?
10:45:29.743 [5622] <2> get_behavior_rec: MD5 digest line
10:45:30.826 [5622] <2> launch: /usr/openv/netbackup/bin/bpsched,
pid=698
10:45:32.188 [5622] <2> childterm: pid=698 exit=200, signo=0 core=no
10:45:32.188 [5622] <2> schild: child(ren) not ready to terminate
10:45:32.188 [5622] <2> listen_loop: select() interrupted

Jobs are being multiplexed and everything else seems to have finished 
apart from this client.

There are a few "db_lock_media: unable to lock media at offset 14 
(P843L2)" type messages for the first tape in use for the backups in 
bptm log

It successfully mounts the next tape it requires P805L2 (must have ran 
out of space on P843) and other jobs which were still multiplexing seem 
to write ok and finish.


Any other areas to look?



Jeff Cleverley wrote:
> Dave,
>
> Do you know if the tape library completes the media unload properly?
> I believe that the log may show the backup completed but the session
> for the master doesn't complete until the media has been unloaded.
> You may check things like the bptm or bprd logs on your master also.
>
> Jeff
>
> On Mon, Apr 19, 2010 at 8:24 AM, Dave Markham
<dave.markham AT fjserv DOT net> wrote:
>   
>> Guys anyone remember seeing something like this back on NBU 5.
>>
>> I'm running NBU 5.0MP7 on a Unix Server attached to L25 tape library.
>>
>> I have a windows client which works fine some times and then other
times
>> just sits there and hangs everything up. It never finishes and needs
>> cancelling manually.
>>
>> Windows client is Win Server 2003 SP2 and has NBU 5.0mp7 client.
>>
>> Now according to the bpkar log on the client the job has finished
>> successfully, but on the master/media server the job stays hung as
>> though its not registered it finishing.
>>
>> I have some logs if anyone could help out??
>>
>> Cheers
>>
>> _______________________________________________
>> Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
>>
>>     
>
>
>
>   


_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

This email (including any attachments) may contain confidential
and/or privileged information or information otherwise protected
from disclosure. If you are not the intended recipient, please
notify the sender immediately, do not copy this message or any
attachments and do not use it for any purpose or disclose its
content to any person, but delete this message and any attachments
from your system. Astrium disclaims any and all liability if this
email transmission was virus corrupted, altered or falsified.
-o-
Astrium Limited, Registered in England and Wales No. 2449259
Registered Office:
Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2AS, England
_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

<Prev in Thread] Current Thread [Next in Thread>