Re: Amanda Runs Hanging
2009-01-15 12:09:01
Jim Summers wrote:
Hello All,
My nightly backups have started hanging. They had been working
fine until Thursday of last week.
The server is amanda-2.5.2p1 on rhel4 and the client that I believe is
involved is amanda-2.6.0p2 on centos5. There are other clients, and they
are also amanda-2.5.2p1, on rhel4 or fc8.
It never seems to complete the estimates or come out of the
planner phase, and it never actually dumps or does any backup. The next
day, when the scheduled amcheck runs, I get an email saying that
amdump or amflush is running and that I need to run amcleanup if needed.
Running amcleanup returns "Results Missing" for all of the DLEs for
each of the hosts.
So I am not sure where the issue is. I have increased the etimeout from
300 to 3600 and the results are the same. I also re-compiled the
client and server to use only ipv4. Still the same results.
I found the following error in the amandad.xxxxx.debug file on the client:
>>>>>
1231933529.123236: amandad: dgram_send_addr(addr=0x5b524d0,
dgram=0x2af76db07d08)
1231933529.123257: amandad: (sockaddr_in *)0x5b524d0 = { 2, 889,
129.15.11.211 }
1231933529.123274: amandad: dgram_send_addr: 0x2af76db07d08->socket = 0
1231933835.099966: amandad: /usr/local/libexec/amanda/sendsize timed out
waiting for REP data
1231933835.100027: amandad: sending NAK pkt:
<<<<<
ERROR timeout on reply pipe
>>>>>
1231933835.100056: amandad: dgram_send_addr(addr=0x5b524d0,
dgram=0x2af76db07d08)
1231933835.100076: amandad: (sockaddr_in *)0x5b524d0 = { 2, 889,
129.15.11.211 }
1231933835.100093: amandad: dgram_send_addr: 0x2af76db07d08->socket = 0
1231933835.100219: amandad: security_close(handle=0x5b52490,
driver=0x2af76dafe3e0 (BSD))
1231933864.098431: amandad: pid 19716 finish time Wed Jan 14 05:51:04 2009
==================
and then at the end of the planner.xxxxxx.debug file on the tape server
I see:
planner: time 11076.021: (sockaddr_in *)0x627530 = { 2, 10080,
129.15.11.173 }
planner: time 11092.842: dgram_recv(dgram=0x617544, timeout=0,
fromaddr=0x627530)
planner: time 11092.842: (sockaddr_in *)0x627530 = { 2, 10080,
129.15.11.173 }
planner: time 21324.684: dgram_recv(dgram=0x617544, timeout=0,
fromaddr=0x627530)
planner: time 21324.705: (sockaddr_in *)0x627530 = { 2, 10080,
129.15.11.173 }
planner: time 21630.663: dgram_recv(dgram=0x617544, timeout=0,
fromaddr=0x627530)
planner: time 21630.663: (sockaddr_in *)0x627530 = { 2, 10080,
129.15.11.173 }
==================
The odd thing is that there is no dgram_recv after the last entry in
the log. From there everything just seems to stop.
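
As a rough way to see where the client went quiet, the timestamps in the
amandad debug excerpt above can be compared directly. The following is only
a minimal sketch: the three log lines are copied from that excerpt (with the
wrapped line rejoined), and the helper name largest_gap is made up for this
illustration. It needs at least two timestamped lines to work.

```python
import re

# Lines copied from the amandad debug excerpt above (wrapped line rejoined).
LOG = """\
1231933529.123274: amandad: dgram_send_addr: 0x2af76db07d08->socket = 0
1231933835.099966: amandad: /usr/local/libexec/amanda/sendsize timed out waiting for REP data
1231933835.100027: amandad: sending NAK pkt:
"""

# amandad debug lines start with an epoch timestamp followed by ": ".
STAMP = re.compile(r"^(\d+\.\d+): ")


def largest_gap(text):
    """Return (gap_seconds, line_before, line_after) for the longest pause."""
    stamped = []
    for line in text.splitlines():
        m = STAMP.match(line)
        if m:
            stamped.append((float(m.group(1)), line))
    # Compare each timestamped line with the one that follows it.
    pairs = zip(stamped, stamped[1:])
    (t0, before), (t1, after) = max(pairs, key=lambda p: p[1][0] - p[0][0])
    return t1 - t0, before, after


gap, before, after = largest_gap(LOG)
print(f"{gap:.1f} s of silence before: {after}")
```

For these lines the longest pause is roughly 306 seconds, and it ends at the
"sendsize timed out waiting for REP data" message.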
Any ideas or suggestions?
Please let me know and I can provide more debug if needed.
TIA
I found a thread suggesting that the estimate might be timing out and that
using the calcsize program for the estimate could help. I made that switch
and amdump is now running.
Thanks
--
Jim Summers
School of Computer Science-University of Oklahoma
-------------------------------------------------