Amanda-Users

Re: Problems with Overland Library and Solaris 8

2004-10-11 09:25:09
Subject: Re: Problems with Overland Library and Solaris 8
From: Joshua Baker-LePain <jlb17 AT duke DOT edu>
To: bwkstuttgart AT yahoo DOT de
Date: Mon, 11 Oct 2004 09:20:10 -0400 (EDT)
On Mon, 11 Oct 2004 at 9:26am, bwkstuttgart AT yahoo DOT de wrote:

> dumpcycle 1 weeks
> runspercycle 5
> tapecycle 5 tapes
*snip*
> runtapes 2

Note that, in principle, this is a problem.  Tapecycle should be >= 
runspercycle*runtapes, which with your settings is 5*2 = 10 tapes, not 5.  
Since you only have 117G worth of data, you could easily run with 
runtapes=1 and be OK.  But as your data expands, you'll need more tapes.
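
To make the arithmetic concrete, here is a minimal Python sketch of that 
rule of thumb.  The function names are made up for illustration and are 
not part of amanda; the numbers are the ones from your config.

```python
def min_tapecycle(runspercycle: int, runtapes: int) -> int:
    """Smallest tapecycle satisfying tapecycle >= runspercycle * runtapes."""
    return runspercycle * runtapes


def tapecycle_ok(tapecycle: int, runspercycle: int, runtapes: int) -> bool:
    """True if the configured tapecycle meets the rule of thumb."""
    return tapecycle >= min_tapecycle(runspercycle, runtapes)


if __name__ == "__main__":
    # Values quoted from the config: tapecycle 5, runspercycle 5, runtapes 2.
    print(min_tapecycle(5, 2))    # 10
    print(tapecycle_ok(5, 5, 2))  # False
    print(tapecycle_ok(5, 5, 1))  # True
```

So with runtapes=2 you'd want at least 10 tapes in rotation; with 
runtapes=1 your current 5 are enough.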

> It was horrible. The complete SUN-Server crashed!!
> There was no response to pings and no chance to get a console over 
> the rsc-board....

Amanda can stress a system pretty well -- there are (sometimes multiple) 
dump streams coming over the network and being written to disk, tape 
getting written to (and thus disk reads occurring), and CPU work 
compressing the indices.  As others have suggested, this smells like an 
OS/server hardware issue.  Stress test the system.  Start some tape 
writes, run bonnie on the disks, and start some wgets (all at the same 
time, of course), and see if you can catch some system messages if/when 
the server dies.  Remember that amanda really isn't doing anything other 
than calling other standard *nix utilities, and so if amanda can cause a 
crash, you should be able to cause one by hand as well.
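
If you'd rather script that than juggle several terminals, something 
along these lines should do.  It is only a sketch: run_concurrently is a 
name I made up, and the three commands are harmless placeholders that you 
would replace with your real tape write, bonnie run, and wget for your 
own devices and paths.

```python
import subprocess
import sys


def run_concurrently(commands):
    """Start every command at once, wait for all of them, return exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Placeholder loads only. Swap in the real tape write, disk benchmark,
    # and network download commands for the machine under test.
    loads = [
        [sys.executable, "-c", "print('tape write goes here')"],
        [sys.executable, "-c", "print('disk benchmark goes here')"],
        [sys.executable, "-c", "print('network download goes here')"],
    ]
    print(run_concurrently(loads))
```

Keep a console or syslog tail open on another box while it runs, so that 
any last messages survive if the server goes down.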

> FAIL dumper B /dev/sda10 20041009 0 [data timeout]
*snip*
>   | dump: ACLs in inode #3997763 won't be dumped: Invalid argument
>   ?   DUMP: bread: lseek fails
> ... HUNDREDS OF THESE LINES ...
>   ?   DUMP: bread: lseek fails
>   |   DUMP: 100.00% done at 6154 kB/s, finished in 0:00
>   |   DUMP: 100.00% done at 6474 kB/s, finished in 0:00
>   |   DUMP: 100.00% done at 6598 kB/s, finished in 0:00
>   |   DUMP: 100.00% done at 6682 kB/s, finished in 0:00
>   |   DUMP: 100.00% done at 6774 kB/s, finished in 0:00
>   |   DUMP: 100.00% done at 6843 kB/s, finished in 0:00
>   |   DUMP: 100.00% done at 6888 kB/s, finished in 0:00
>   |   DUMP: 100.00% done at 6929 kB/s, finished in 0:00
>   | dump: ACLs in inode #4849729 won't be dumped: Invalid argument

I'm willing to say that this problem is unrelated to your amanda server 
crashing.  That looks like a sick FS to me, or a bad version of dump, or 
both.  Be sure to use the latest dump/restore (available at 
dump.sourceforge.net) or switch to tar (which won't get the ACLs, if 
you're using those, but I don't think RH9 supports them anyway...).

Good luck.  And if this post seems somewhat stream of consciousness, 
understand that my coffee hasn't kicked in yet...

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
