amanda saved my butt this week

2007-06-15 17:25:43
Subject: amanda saved my butt this week
From: Chris Hoogendyk <hoogendyk AT bio.umass DOT edu>
To: AMANDA users <amanda-users AT amanda DOT org>
Date: Fri, 15 Jun 2007 17:20:45 -0400

I just thought I would let the list in on my adventures this week.

Tue 15:00: amcheck runs out of cron and checks everything.

Tue 15:50: I get an email from amanda reporting tape failure (multiple
attempts all resulting in timeout). At first I didn't read it that way.
But, after looking at amanda debug logs, dropping back to amtape, and
then further back to just mt, and finally going to the tape library
console, I realized that I had a tape stuck in the drive (which is
sealed up back inside the library).

Tue 17:00: I call Sony tech support. They walk me through some steps
to try to get the tape out -- repeatedly "Eject Error". Then they have me
run a trace using the sonytape utility. It generates a 1.4MB txt file
that I send to them. By this time it's 5:40pm. I head home. Listen for my
cell phone, keep an eye on my email, fix supper. Having faith in amanda,
I sleep.

Wed 00:45: *amanda runs backups*. No tape access. Saves all backups to
spool drives. Reports backup results.

Wed 08:30: I call Sony tech support. They got the nori-trace.txt file.
They escalate to engineering.

Wed 08:50: Sony support engineer calls while I'm biking in to work. I
ask him to call me back in 15 minutes.

Wed 09:15: He calls back. Explains tensioning problem with tape reported
in trace file. Walks me through a series of procedures to get the tape
out. It works. Tape actually looks alright. I load a previous tape. It
seems to work. He points me to a firmware upgrade, and I load it using
the sonytape utility. I tell him I'll have to do some testing and could
he call me back in a couple of hours.

Wed morning: amrecover gives me an error. I'm not sure if it's hardware,
something with the firmware upgrade that confused amanda, or something
I've done in the configuration since last time I used amrecover on this
machine. Drop back to mt and dd. I can read the tape label and the
first file header. Cool. The header actually lists the full set of unix commands
required to read the file off the tape. So, perhaps foolishly, but
watching the clock, I decide to reload the original tape (which "looked"
alright) and see if amflush will push the data out to it. It jams
again, and now even the extra steps the engineer gave me won't get it
out. Big mistake.
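
The mt and dd check above can be sketched roughly as follows. This is a
sketch under assumptions, not the exact commands used that morning:
/dev/nst0 is a hypothetical non-rewinding tape device (the name differs
by OS), the helper names show_header and inspect_tape are made up for
illustration, and the 32 KB block size is assumed to match the amanda
default of the time.

```shell
#!/bin/sh
# Sketch only: TAPE is a hypothetical non-rewinding device path; adjust it.
TAPE="${TAPE:-/dev/nst0}"

# Print the text in the first 32 KB block of the given source, dropping
# any NUL padding. Works on a tape device or on a plain file copy.
show_header() {
    dd if="$1" bs=32k count=1 2>/dev/null | tr -d '\000'
}

# Read the tape label (tape file 0), then the header of the first dump
# image (tape file 1). Stops at the first command that fails.
inspect_tape() {
    mt -f "$TAPE" rewind || return 1
    show_header "$TAPE" || return 1
    mt -f "$TAPE" rewind || return 1
    mt -f "$TAPE" fsf 1 || return 1
    show_header "$TAPE"
}
```

On a tape that reads cleanly, the first block should show the tape
label and the second the dump header with its restore instructions.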

Wed afternoon: I had to take off on another project. Sony engineer
called. I gave him a status update.

Thu 00:45: *amanda runs backups*. No tape access. Saves all backups to
spool drives. Reports backup results.

Thu morning: I repeatedly try a number of things that all fail.

Thu afternoon: I call Sony tech support again. Engineer calls me right
back. We have a long session with him on speaker phone as he guides me
through disassembling the library, pulling the drive out, and manually
operating some gears on the circuit board to push the tape carriage out.
Once the tape is pushed out enough, I can grab it and pull it out.
There's about 6 inches of tape hanging out where it was stuck. Put
everything back together. Load tape. Looks alright. Now I have to test
everything.

Thu 16:00: Checked my amanda-client.conf on this machine. There was one
difference I fixed. Run amrecover. Look for something whose last full
was at least 3 days ago and thus on tape. Recover it. It works. Since
I also have backups sitting on the spool drive, I try an amrecover of
something that's there. That works too. Cool.

Thu 17:15: Position to next tape. Run amcheck. Nope. tapecycle is 30,
and I now only have 29 tapes. Barcode a new tape. Insert it in now empty
slot in library (bad tape will be sent back for replacement and never
reused). Run amlabel. Run amcheck. Everything is ready.

Fri 00:45: *amanda runs backups*. Flushes everything to tape. Reports
backup results.

Fri 09:15: Checking all reports. Amanda is still expecting that bad tape
next. Review procedures. Run amrmtape. Completely back in business.
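
The tape replacement steps from Thursday evening and Friday morning can
be sketched as one small script. This is an illustration under
assumptions: the config name, tape labels and slot number are
hypothetical, the run and replace_tape helpers are made up here, and
amrmtape is run right after amcheck rather than the next morning.
Setting DRY_RUN prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only. The config name, labels and slot number passed in are
# hypothetical; substitute your own.

# Echo the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: replace_tape CONFIG NEW_LABEL SLOT BAD_LABEL
replace_tape() {
    config=$1 new_label=$2 slot=$3 bad_label=$4
    # Label the new tape sitting in the slot the bad tape came out of.
    run amlabel "$config" "$new_label" slot "$slot" || return 1
    # Confirm amanda can find a usable tape for the next run.
    run amcheck "$config" || return 1
    # Remove the bad tape from the tapelist so amanda stops expecting it.
    run amrmtape "$config" "$bad_label"
}
```

For example, DRY_RUN=1 followed by "replace_tape daily BIO-031 12 BIO-017"
would only print the three commands, which is a cheap way to review them
before touching the tapelist.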

-----------------------

Through that whole episode, extended by my foolishness in trying to
reload a tape that had already given me trouble, amanda never even
hiccuped. I didn't miss a single backup. AND I didn't have to do
anything about it. I didn't have to go in and ask amanda to back up
anyway, or to save stuff on spool, or to flush it when it finally had
access to a tape. Amanda just did everything it needed to do to keep my
backups running. All I had to do was get the tape library working and
replace the bad tape. Gotta love it.

-----------------------

Coincidentally, there was simultaneously a somewhat related episode
going back and forth on the bacula list. It wasn't a tape failure, it
was a DVD issue. The answers were basically: "That's not how bacula does
it"; no, you won't have backups on hard disk or spool; bacula won't back
up until the problem is resolved; if you're on vacation, you're out of luck.

I didn't think it was appropriate to comment on their list, and I
imagine that someone could figure out a way of working around these
issues with more complex configurations of bacula. But I thought it was
worth mentioning here, since this is the amanda list, and we can
appreciate what we have.



---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator
  c/ /'_ --- Biology & Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst 

<hoogendyk AT bio.umass DOT edu>

--------------- 

Erdös 4


