Amanda-Users

Re: Index Tees - Data Timeouts

2002-08-14 15:33:33
Subject: Re: Index Tees - Data Timeouts
From: Jim Summers <jsummers AT bachman.cs.ou DOT edu>
To: amanda-users <amanda-users AT amanda DOT org>
Date: 14 Aug 2002 14:19:16 -0500
In an effort to debug this problem, is there a way I can interactively
run the command(s) that amanda would run to see if anything is dumped to
stdout?  If so, are these the commands in runtar and sendbackup?  Or
would it be better to comment out all filesystems except one of the ones
having problems and run amdump?

Thanks again,
Jim
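
A note on the "Broken pipe" and "signal 13" messages quoted below: signal 13 is SIGPIPE, which a process receives when it writes to a pipe or socket whose reading end has been closed. One plausible reading (an assumption, not something the logs confirm) is that the server-side dumper gave up after the data timeout and closed the connection, leaving tar, gzip and the index tee on the client writing into a closed pipe. The Python sketch below only illustrates that mechanism; the function name is hypothetical and nothing in it is Amanda code.

```python
import errno
import os


def write_after_reader_closes():
    """Write to a pipe whose read end is already closed; return the errno."""
    read_fd, write_fd = os.pipe()
    # Closing the read end stands in for the remote side dropping the
    # data connection (assumed here to be what follows a data timeout).
    os.close(read_fd)
    try:
        # Python ignores SIGPIPE by default, so the failed write surfaces
        # as BrokenPipeError (EPIPE) instead of killing the process.
        # A C program with default signal handling, such as tar, would be
        # terminated by SIGPIPE at this point.
        os.write(write_fd, b"dump data")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return None


if __name__ == "__main__":
    print(write_after_reader_closes() == errno.EPIPE)
```

If that reading is right, the broken pipes are a symptom of the timeout rather than its cause, so the thing to chase is why the data stream stalls.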


On Wed, 2002-08-14 at 10:41, Jim Summers wrote:
> On Wed, 2002-08-14 at 09:44, Joshua Baker-LePain wrote:
> > On 14 Aug 2002 at 9:26am, Jim Summers wrote
> > 
> > > On Wed, 2002-08-14 at 08:23, Joshua Baker-LePain wrote:
> > > > On 14 Aug 2002 at 8:09am, Jim Summers wrote
> > > > 
> > > > > I am running Amanda 2.4.2p2 on Red Hat Linux 7.3 as my Amanda
> > > > > server.  The clients are mostly Solaris.  I have been backing up
> > > > > the server and adding clients one at a time.  Everything was
> > > > > working well with one server and two clients; then I added a third
> > > > > client.  Now I am getting data timeouts and "index tee" broken pipe
> > > > > messages in my Amanda reports and in the system log files.
> > > > 
> > > > From which systems?  The actual error messages would be most helpful.
> > > > 
> > > From one of the working systems, a Sun E250 running Solaris 8, and from
> > > the newly added system, a Sun Ultra10 running Solaris 8.  I will send the
> > > Amanda report when I get the next one.
> > 
> > You said you had messages in the system log files -- what are those?  You 
> Here are the messages in my system log file:
> 
> Aug 14 01:14:14 turing sendbackup[17657]: [ID 702911 auth.notice] index tee cannot write [Broken pipe]
> Aug 14 01:14:14 turing sendbackup[17655]: [ID 702911 auth.notice] error [/usr/local/bin/tar got signal 13, compress returned 1]
> Aug 14 02:06:34 turing sendbackup[17740]: [ID 702911 auth.notice] index tee cannot write [Broken pipe]
> 
> 
> 
> 
> > could also try increasing dtimeout...
> I have twiddled with that one: I went from 1800 to 3600, then back to
> 1800, and I am currently at 2400.
> 
> > 
> > How do your dumprates look?
> Here is the last Amanda report I received.  Note that I mistakenly used
> the wrong dump type on the /usr/oracle filesystem.
> 
> > 
> These dumps were to tapes daily09, daily10.
> The next 2 tapes Amanda expects to used are: daily11, daily12.
> 
> FAILURE AND STRANGE DUMP SUMMARY:
>   tarjan     /opt lev 0 FAILED [data timeout]
>   turing     /cs/turing/home2 lev 1 FAILED [data timeout]
>   turing     /opt lev 0 FAILED [data timeout]
>   tarjan     /usr/oracle lev 0 FAILED [data timeout]
>   turing     /usr lev 0 STRANGE
> 
> 
> STATISTICS:
>                           Total       Full      Daily
>                         --------   --------   --------
> Estimate Time (hrs:min)    0:50
> Run Time (hrs:min)        10:21
> Dump Time (hrs:min)        4:26       4:08       0:17
> Output Size (meg)       13542.1    11766.4     1775.7
> Original Size (meg)     26361.5    24555.9     1805.6
> Avg Compressed Size (%)    47.9       47.9        7.3   (level:#disks ...)
> Filesystems Dumped           13          4          9   (1:9)
> Avg Dump Rate (k/s)       870.0      808.9     1742.9
> 
> Tape Time (hrs:min)        3:23       2:56       0:27
> Tape Size (meg)         13542.5    11766.5     1776.0
> Tape Used (%)             116.7      101.4       15.3   (level:#disks ...)
> Filesystems Taped            13          4          9   (1:9)
> Avg Tp Write Rate (k/s)  1137.5     1141.5     1111.7
> 
> 
> FAILED AND STRANGE DUMP DETAILS:
> 
> /-- tarjan     /opt lev 0 FAILED [data timeout]
> sendbackup: start [tarjan:/opt level 0]
> sendbackup: info BACKUP=/usr/local/bin/tar
> sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc |/usr/local/bin/tar -f... -
> sendbackup: info COMPRESS_SUFFIX=.gz
> sendbackup: info end
> ? 
> \--------
> 
> /-- turing     /cs/turing/home2 lev 1 FAILED [data timeout]
> sendbackup: start [turing:/cs/turing/home2 level 1]
> sendbackup: info BACKUP=/usr/local/bin/tar
> sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc |/usr/local/bin/tar -f... -
> sendbackup: info COMPRESS_SUFFIX=.gz
> sendbackup: info end
> ? 
> \--------
> 
> /-- turing     /opt lev 0 FAILED [data timeout]
> sendbackup: start [turing:/opt level 0]
> sendbackup: info BACKUP=/usr/sbin/ufsdump
> sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc |/usr/sbin/ufsrestore -f... -
> sendbackup: info COMPRESS_SUFFIX=.gz
> sendbackup: info end
> |   DUMP: Writing 32 Kilobyte records
> |   DUMP: Date of this level 0 dump: Wed Aug 14 01:54:45 2002
> |   DUMP: Date of last level 0 dump: the epoch
> |   DUMP: Dumping /dev/rdsk/c0t0d0s5 (turing:/opt) to standard output.
> |   DUMP: Mapping (Pass I) [regular files]
> |   DUMP: Mapping (Pass II) [directories]
> |   DUMP: Estimated 6949216 blocks (3393.17MB) on 0.05 tapes.
> |   DUMP: Dumping (Pass III) [directories]
> |   DUMP: Dumping (Pass IV) [regular files]
> | 
> ? gzip: stdout: Broken pipe
> ? sendbackup: index tee cannot write [Broken pipe]
> |   DUMP: Broken pipe
> |   DUMP: The ENTIRE dump is aborted.
> ? index returned 1
> sendbackup: error [/usr/sbin/ufsdump returned 3, compress returned 1]
> \--------
> 
> /-- tarjan     /usr/oracle lev 0 FAILED [data timeout]
> sendbackup: start [tarjan:/usr/oracle level 0]
> sendbackup: info BACKUP=/usr/sbin/ufsdump
> sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc |/usr/sbin/ufsrestore -f... -
> sendbackup: info COMPRESS_SUFFIX=.gz
> sendbackup: info end
> |   DUMP: Writing 32 Kilobyte records
> |   DUMP: Date of this level 0 dump: Wed Aug 14 01:55:12 2002
> |   DUMP: Date of last level 0 dump: the epoch
> |   DUMP: Dumping /dev/rdsk/c0t0d0s3 (tarjan:/usr) to standard output.
> |   DUMP: Mapping (Pass I) [regular files]
> |   DUMP: Mapping (Pass II) [directories]
> |   DUMP: Estimated 6191034 blocks (3022.97MB) on 0.04 tapes.
> |   DUMP: Dumping (Pass III) [directories]
> |   DUMP: Dumping (Pass IV) [regular files]
> | 
> ? gzip: stdout: Broken pipe
> ? sendbackup: index tee cannot write [Broken pipe]
> |   DUMP: Broken pipe
> |   DUMP: The ENTIRE dump is aborted.
> ? index returned 1
> sendbackup: error [/usr/sbin/ufsdump returned 3, compress returned 1]
> \--------
> 
> /-- turing     /usr lev 0 STRANGE
> sendbackup: start [turing:/usr level 0]
> sendbackup: info BACKUP=/usr/local/bin/tar
> sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc |/usr/local/bin/tar -f... -
> sendbackup: info COMPRESS_SUFFIX=.gz
> sendbackup: info end
> ? gtar: ./local/var/amanda/gnutar-lists/tarjan_dbfiles_1.new: Warning: Cannot stat: No such file or directory
> | Total bytes written: 7120732160 (6.6GB, 1.7MB/s)
> sendbackup: size 6953840
> sendbackup: end
> \--------
> 
> 
> NOTES:
>   planner: Adding new disk tarjan:/usr/oracle.
>   planner: Incremental of turing:/cs/turing/facstaff1 bumped to level 3.
>   taper: tape daily09 kb 11904800 fm 13 writing file: No space left on device
>   taper: retrying turing:/cs/turing/facstaff1.0 on new tape: [writing file: No space left on device]
>   taper: tape daily10 kb 8512896 fm 1 [OK]
> 
> 
> DUMP SUMMARY:
>                                    DUMPER STATS              TAPER STATS
> HOSTNAME DISK        L  ORIG-KB  OUT-KB COMP% MMM:SS   KB/s MMM:SS   KB/s
> ---------------------- ------------------------------------ -------------
> bachman  /etc        1       90      32  35.6   0:01   62.0   0:08    8.2
> bachman  /var/mail   0   491100  188864  38.5   9:41  325.0   2:47 1131.8
> suman    /usr        0  2135250  731424  34.3   9:01 1352.0  10:54 1119.3
> tarjan   /dbfiles    1  1657248 1657248    --   8:10 3380.0  24:14 1140.2
> tarjan   /dblogs     1   158624  158624    --   0:42 3746.0   2:25 1095.1
> tarjan   /etc        1       90      32  35.6   0:02   21.3   0:02   27.3
> tarjan   /opt        0 FAILED ---------------------------------------
> tarjan   /usr/oracle 0 FAILED ---------------------------------------
> turing   -ng/dbfiles 1       32      32    --   0:00   87.3   0:02   33.0
> turing   -ing/dblogs 1       32      32    --   0:00   65.3   0:02   32.9
> turing   -/facstaff1 0 15565100 8512864  54.7 163:19  868.7 124:04 1143.6
> turing   -ring/home1 1    15720    1312   8.3   7:57    2.8   0:08  163.3
> turing   -ring/home2 1 FAILED ---------------------------------------
> turing   /etc        1      130      32  24.6   0:01   44.0   0:13    5.0
> turing   /opt        0 FAILED ---------------------------------------
> turing   /usr        0  6953840 2615648  37.6  66:14  658.2  38:11 1141.7
> turing   /var        1    16970     992   5.8   0:30   33.1   0:02  444.0
> 
> (brought to you by Amanda version 2.4.2p2)
> 
> Thanks again!
> Jim
> 
> 
> 
> 
> > -- 
> > Joshua Baker-LePain
> > Department of Biomedical Engineering
> > Duke University
> > 
> 
> 


