Veritas-bu

Re: [Veritas-bu] Catalog Backup fails with network connection timed out(41)

2013-02-07 01:20:05
Subject: Re: [Veritas-bu] Catalog Backup fails with network connection timed out(41)
From: mian71 AT gmail DOT com
To: Matthew Carr <matthew.carr AT canterbury.ac DOT nz>
Date: Thu, 7 Feb 2013 07:19:57 +0100
Hi Matthew

I have experienced that defragmentation of the netbackup catalog disk can help catalog backup performance which some times is the problem

Another thing worth trying is to increase CLIENT_READ_TIMEOUT and CLIENT_CONNECT_TIMEOUT

Regards
Michael

Den 07/02/2013 kl. 03.05 skrev Matthew Carr <matthew.carr AT canterbury.ac DOT nz>:

Gidday,

 

We recently manifested an issues where our daily full catalog backup started to fail with a network connection timed out(41) error. After a lot of research, trouble shooting, restarting, bouncing services etc I came up empty. In desperation, I froze the current tape it was using for the catalog backups, assigned some new ones to the pool and it magically started to work again (!?!?).

 

However, it began to periodically fail again, and now fails consistently. I’ tried the old trick of assigned some new tapes to sue, but the issue persists and I am completely at a loose end. Any ideas on where to look for the cause?

 

Master: Win2k3 R2 Sp2, NBU 7.5.0.3 Ent

Media: Win2k8 R2 Sp1, NBU 7.5.0.3 Ent

 

Console logs below:

 

CATALOG_DRIVEN_BACKUP staging begins.

 

7/02/2013 1:26:30 p.m. - Info bpdbm(pid=4796) staging relational database files for catalog backup     

7/02/2013 1:26:30 p.m. - Info bpdbm(pid=4796) staging NBAZDB backup to D:\VERITAS\NetBackupDB\staging       

7/02/2013 1:26:30 p.m. - Info bpdbm(pid=4796) done staging NBAZDB backup to D:\VERITAS\NetBackupDB\staging      

7/02/2013 1:26:30 p.m. - Info bpdbm(pid=4796) staging NBDB backup to D:\VERITAS\NetBackupDB\staging       

7/02/2013 1:26:49 p.m. - Info bpdbm(pid=4796) done staging NBDB backup to D:\VERITAS\NetBackupDB\staging      

7/02/2013 1:28:32 p.m. - Info bpdbm(pid=4796) validating NBAZDB backup in D:\VERITAS\NetBackupDB\staging       

7/02/2013 1:28:34 p.m. - Info bpdbm(pid=4796) done validating NBAZDB backup in D:\VERITAS\NetBackupDB\staging      

7/02/2013 1:28:34 p.m. - Info bpdbm(pid=4796) validating NBDB backup in D:\VERITAS\NetBackupDB\staging       

7/02/2013 1:28:46 p.m. - Info bpdbm(pid=4796) done validating NBDB backup in D:\VERITAS\NetBackupDB\staging      

the requested operation was successfully completed(0)

 

D:\VERITAS\NetBackupDB\staging\DARS_DATA.db writes to tape:

 

7/02/2013 1:26:52 p.m. - Info nbjm(pid=7348) starting backup job (jobid=4698153) for client ucbackupmaster, policy UCCatalog, schedule Full 

7/02/2013 1:26:52 p.m. - Info nbjm(pid=7348) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=4698153, request id:{7CD98CB4-9847-4DF3-8F08-F9A593FE4EDC}) 

7/02/2013 1:26:52 p.m. - requesting resource UCBackupMedia1-SL500-STU

7/02/2013 1:26:52 p.m. - requesting resource ucbackupmaster.NBU_CLIENT.MAXJOBS.ucbackupmaster

7/02/2013 1:26:52 p.m. - requesting resource ucbackupmaster.NBU_POLICY.MAXJOBS.UCCatalog

7/02/2013 1:26:52 p.m. - granted resource ucbackupmaster.NBU_CLIENT.MAXJOBS.ucbackupmaster

7/02/2013 1:26:52 p.m. - granted resource ucbackupmaster.NBU_POLICY.MAXJOBS.UCCatalog

7/02/2013 1:26:52 p.m. - granted resource UC0514

7/02/2013 1:26:52 p.m. - granted resource HP.ULTRIUM4-SCSI.001

7/02/2013 1:26:52 p.m. - granted resource UCBackupMedia1-SL500-STU

7/02/2013 1:26:54 p.m. - estimated 0 Kbytes needed

7/02/2013 1:26:54 p.m. - Info nbjm(pid=7348) started backup (backupid=ucbackupmaster_1360196813) job for client ucbackupmaster, policy UCCatalog, schedule Full on storage unit UCBackupMedia1-SL500-STU

7/02/2013 1:26:55 p.m. - started process bpbrm (7484)

7/02/2013 1:26:56 p.m. - connecting

7/02/2013 1:26:57 p.m. - Info bpbrm(pid=7484) ucbackupmaster is the host to backup data from    

7/02/2013 1:26:57 p.m. - Info bpbrm(pid=7484) reading file list from client       

7/02/2013 1:26:58 p.m. - Info bpbrm(pid=7484) listening for client connection        

7/02/2013 1:27:07 p.m. - Info bpbrm(pid=7484) INF - Client read timeout = 300     

7/02/2013 1:27:07 p.m. - connected; connect time: 00:00:11

7/02/2013 1:27:08 p.m. - Info bpbrm(pid=7484) accepted connection from client        

7/02/2013 1:27:08 p.m. - Info bpbrm(pid=7484) start bpbkar on client        

7/02/2013 1:27:12 p.m. - Info bphdb(pid=5596) Backup started          

7/02/2013 1:27:12 p.m. - Info bphdb(pid=4796) Backup started          

7/02/2013 1:27:12 p.m. - Info bptm(pid=9240) start           

7/02/2013 1:27:12 p.m. - Info bptm(pid=9240) using 262144 data buffer size       

7/02/2013 1:27:13 p.m. - Info bptm(pid=9240) setting receive network buffer to 1048576 bytes     

7/02/2013 1:27:13 p.m. - Info bptm(pid=9240) using 64 data buffers        

7/02/2013 1:27:14 p.m. - Info bptm(pid=9240) start backup          

7/02/2013 1:27:14 p.m. - Info bptm(pid=9240) backup child process is pid 8256.13428      

7/02/2013 1:27:14 p.m. - Info bptm(pid=9240) Waiting for mount of media id UC0514 (copy 1) on server ucbackupmedia1.

7/02/2013 1:27:14 p.m. - Info bptm(pid=8256) start           

7/02/2013 1:27:14 p.m. - mounting UC0514

7/02/2013 1:27:58 p.m. - Info bptm(pid=9240) media id UC0514 mounted on drive index 4, drivepath {8,0,0,0}, drivename HP.ULTRIUM4-SCSI.001, copy 1

7/02/2013 1:27:58 p.m. - mounted; mount time: 00:00:44

7/02/2013 1:28:03 p.m. - positioning UC0514 to file 1

7/02/2013 1:28:09 p.m. - positioned UC0514; position time: 00:00:06

7/02/2013 1:28:09 p.m. - begin writing

7/02/2013 1:28:16 p.m. - Info bptm(pid=9240) waited for full buffer 439 times, delayed 445 times   

7/02/2013 1:28:16 p.m. - Info bphdb(pid=4796) done. status: 0         

7/02/2013 1:28:24 p.m. - Info bptm(pid=9240) EXITING with status 0 <----------       

7/02/2013 1:28:24 p.m. - Info bpbrm(pid=7484) validating image for client ucbackupmaster       

7/02/2013 1:28:30 p.m. - end writing; write time: 00:00:21

the requested operation was successfully completed(0)

 

The third step seems to fail:

 

7/02/2013 1:28:50 p.m. - Info nbjm(pid=7348) starting backup job (jobid=4698159) for client ucbackupmaster, policy UCCatalog, schedule Full 

7/02/2013 1:28:50 p.m. - Info nbjm(pid=7348) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=4698159, request id:{0B669D3A-E130-40F6-BC95-B1E9AC203379}) 

7/02/2013 1:28:50 p.m. - requesting resource UCBackupMedia1-SL500-STU

7/02/2013 1:28:50 p.m. - requesting resource ucbackupmaster.NBU_CLIENT.MAXJOBS.ucbackupmaster

7/02/2013 1:28:50 p.m. - requesting resource ucbackupmaster.NBU_POLICY.MAXJOBS.UCCatalog

7/02/2013 1:28:50 p.m. - granted resource ucbackupmaster.NBU_CLIENT.MAXJOBS.ucbackupmaster

7/02/2013 1:28:50 p.m. - granted resource ucbackupmaster.NBU_POLICY.MAXJOBS.UCCatalog

7/02/2013 1:28:50 p.m. - granted resource UC0514

7/02/2013 1:28:50 p.m. - granted resource HP.ULTRIUM4-SCSI.001

7/02/2013 1:28:50 p.m. - granted resource UCBackupMedia1-SL500-STU

7/02/2013 1:28:51 p.m. - estimated 107257414 Kbytes needed

7/02/2013 1:28:51 p.m. - Info nbjm(pid=7348) started backup (backupid=ucbackupmaster_1360196930) job for client ucbackupmaster, policy UCCatalog, schedule Full on storage unit UCBackupMedia1-SL500-STU

7/02/2013 1:28:52 p.m. - started process bpbrm (9968)

7/02/2013 1:28:54 p.m. - Info bpbrm(pid=9968) ucbackupmaster is the host to backup data from    

7/02/2013 1:28:54 p.m. - Info bpbrm(pid=9968) reading file list from client       

7/02/2013 1:28:54 p.m. - connecting

7/02/2013 1:28:56 p.m. - Info bpbrm(pid=9968) starting bpbkar32 on client        

7/02/2013 1:28:56 p.m. - connected; connect time: 00:00:02

7/02/2013 1:28:58 p.m. - Info bpbkar32(pid=7520) Backup started          

7/02/2013 1:28:58 p.m. - Info bptm(pid=13836) start           

7/02/2013 1:28:58 p.m. - Info bptm(pid=13836) using 262144 data buffer size       

7/02/2013 1:28:58 p.m. - Info bptm(pid=13836) setting receive network buffer to 1048576 bytes     

7/02/2013 1:28:58 p.m. - Info bptm(pid=13836) using 64 data buffers        

7/02/2013 1:28:59 p.m. - Info bptm(pid=13836) start backup          

7/02/2013 1:28:59 p.m. - Info bptm(pid=13836) backup child process is pid 6932.5972      

7/02/2013 1:28:59 p.m. - Info bptm(pid=13836) media id UC0514 mounted on drive index 4, drivepath {8,0,0,0}, drivename HP.ULTRIUM4-SCSI.001, copy 1

7/02/2013 1:28:59 p.m. - Info bptm(pid=6932) start           

7/02/2013 1:28:59 p.m. - mounted

7/02/2013 1:28:59 p.m. - positioning UC0514 to file 2

7/02/2013 1:29:01 p.m. - positioned UC0514; position time: 00:00:02

7/02/2013 1:29:01 p.m. - begin writing

7/02/2013 1:34:10 p.m. - end writing; write time: 00:05:09

network connection timed out(41)

 

I’ve tried increasing timeouts to no avail.

 

Cheer,

 

Matthew

 

This email may be confidential and subject to legal privilege, it may
not reflect the views of the University of Canterbury, and it is not
guaranteed to be virus free. If you are not an intended recipient,
please notify the sender immediately and erase all copies of the message
and any attachments.

Please refer to http://www.canterbury.ac.nz/emaildisclaimer for more
information.

_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
<Prev in Thread] Current Thread [Next in Thread>