I want to start by saying that I appreciate all the help and
suggestions y'all have given on something that's obviously not your
problem. :) Unfortunately, it looks like this problem is (1) far
more interesting than I thought and (2) might be in BackupPC itself.
On Tue, Dec 15, 2009 at 11:29:46AM -0900, Chris Robertson wrote:
> My guess would be that your firewalls are set up to close
> "inactive" TCP sessions. Try adding "-o ServerAliveInterval=60"
> to your RsyncClientCmd (so it looks something like "$sshPath -C -q
> -x -o ServerAliveInterval=60 -l root $host $rsyncPath $argList+")
> and see if that solves your problem.
Also suggested by several other people.
Well, umm, it kept the connection open. Now I've got [counts] 8
BackupPC_dump processes doing absolutely nothing.
The one I was stracing and tcpduming and so on looks fine: ssh is
sending the keepalives, and they're being responded to
appropriately. The rsync is up on the client, the dump is up on the
server, all procs are connected to each other correctly.
They're just not *doing* anything. Nothing has errored out; BackupPC
thinks everything is fine.
Some of the places this is happening are very small backups that
usually take a matter of minutes.
Suddenly this isn't looking like a networking problem anymore; the
networking appears to be just fine. This is looking like a BackupPC
problem. My version, by the way, is 3.1.0. It's like at some point
BackupPC forgot that it was supposed to be trying to back anything
up.
Is it possible the nightly jobs are configured in some weird way
that broke things?
Anyways, the one that has the problem consistently *also* always has
it in exactly the same place; I was watching it in basically every
way possible, so here comes the debugging stuff. As will become
obvious, I probably need to turn the ssh debugging down and the
BackupPC debugging up.
If, for some reason, you want raw copies of any of this, let me know
and I'll post them.
On the machine in question, we're backing up /pictures/.
/pictures/agate/shared/pictures/ has about a thousand subdirs that
each have about a thousand subdirs; the *vast* majority of the files
on the file system are in there.
The backup always stops doing anything right after lstat of the last
file in /pictures/agate/shared/pictures/0216/2987/. According to ls
-U, /pictures/agate/shared/pictures/0216/ is the last directory in
/pictures/agate/shared/pictures/, and
/pictures/agate/shared/pictures/0216/2987/ is the last directory in
/pictures/agate/shared/pictures/0216/. Now, there are other
directories in /pictures/agate/shared/ that ls -U after pictures/,
but with rsync 3.0 on the client, I'm assuming those are handled in
parallel.
So it looks like the backup is stalling right after finishing
checking all the files.
No "new" dir is made. There's a NewFileList, but it's empty.
XferLog.z is mostly useless; too many -v in my ssh call apparently,
but here's the last bit anyways (copied from less):
- -------------
browse.jpg<CE>^P^@^@w<85>@I:1^Kdisplay.jpgp<A3>^@^@y<85>@I:1^Ksegment.jpg<8D>^G^@^@t<85>@I:$^Y3187/DSCN1886_gallery.jpg<D4>^R^@^@}^V@I:2^Pfor_cropping.jpg<94>Y^@^@<88>^V@I:2^Klibrary.jpg<8A>
^@^@<8A>^V@I:2^Qmarket_banner.jpgN<C1>^@^@<8C>^V@I:2
browse.jpg~^H^@^@<80>^V@I:1^D.JPG^T<A0>^I^@<8C>^V@I:1
_badge.jpg5@^@<87>^V@I:2^Lselector.jpg'^E^@^@<84>^V@I:3^Nhop_banner.jpg<F7><88>^@^@<89>^V@I:2^Kdisplay.jpg<BF>x^@^@<82>^V@I:2^Ksegment.jpgl^D^@^@^?^V@I:%(968/food-wrapping-mr-gallery-x_badge.jpg<D5>b^@^@^\$@I:D^Kgallery.jpg^NB^@^@^Z$@I:D^Qmarket_banner.jpg<A8><D1>^@^@^^$@I:D^Klibrary.jpg<AB>^^^@^@^]$@I<BA>D^Oshop_banner.jpg<8C><CB>^@^@:D
browse.jpg&^Z^@^@@I:C^D.jpg<E3>^C^A^@^^$@I:C^L_segment.jpg<AD>
^@^@@I:D^Pfor_cropping.jpg!*^A^@^Z$@I:D^Lselector.jpg<96>^M^@^@^\$@I:D^Kdisplay.jpgR1^A^@@I:%^K539/061.JPG<DC>f^@^@d
@I:,^L_gallery.jpgr^X^@^@a @I:-^Klibrary.jpg<D7>^I^@^@c
@I:-^Pfor_cropping.jpg<F9>U^@^@` @I:-^Lselector.jpg<8C>^E^@^@b
@I:-^Qmarket_banner.jpg<D2>l^@^@d @I:-^Oshop_banner.jpg^Hp^@^@c @I:-
browse.jpg<EE>^I^@^@b @I:-^Ksegment.jpg<8D>^D^@^@a @I:-^Kdisplay.jpgF@^@^@b
@I:-^Ibadge.jpg]$^@^@c @I:$^T7026/blog_browse.jpg<D9>^O
^@^@<D7>o@I:.^Klibrary.jpg<EC>^R^@^@<D9>o@I:.^Lselector.jpg=^H^@^@<D8>o@I:.^Kgallery.jpg,&^@^@<D5>o@I:.^Qmarket_banner.jpgl
- -------------
The only relevant, and last, LOG line is: 2009-12-16 00:12:00 incr backup
started back to 2009-12-12 00:00:08 (backup #0) for directory /
Current time on the box is Wed Dec 16 10:54:14 PST 2009
(boxes agree to within a second; didn't test beyond flipping back
and forth and typing "date").
Last bit of strace of the dump proc on the SERVER: backuppc 4607 24499 0
00:12 ? 00:00:07 /usr/bin/perl /usr/local/bin/BackupPC_dump -i
ey02-s00416_pictures
- -------------
04:34:54.081334 read(7,
"nner.jpgDX\1\0mW\342I\272-\vlibrary.jpg\177\24\0\0:,\4.JPGN
\t\0kW\342I:,\22_market_banner.jpg^\253\1\0lW\342I:$\0376998/P1010001_1_shop_banner.jpg\234\274\0\0\257F\342I:3\4.JPG\25j\t\0\255F\342I:3\f_library.jpg\340\n\0\0\256F\342I:4\21market_banner.jpg\255\356\0\0\255F\342I:4\vgallery.jpg\0\31\0\0\257F\342I:4\vsegment.jpg<\5\0\0\260F\342I\2726\nlector.jpgK\6\0\0:4\vdisplay.jpg\212\214\0\0\256F\342I:4\nbrowse.jpgU\v\0\0\257F\342I:5\10adge.jpg]$\0\0\256F\342I:$%0716/skpstonespecanbowl4_selector.jpg\10\6\0\0\347F\341I:<\4.jpg\252\313\4\0\344F\341I:<\f_gallery.jpgP\27\0\0\347F\341I:=\21market_banner.jpg\360\235\0\0\345F\341I:=\17shop_banner.jpgC\230\0\0\346F\341I:=\tbadge.jpgG\"\0\0\345F\341I:=\vlibrary.jpg\346\v\0\0\346F\341I:=\nbrowse.jpg+\n\0\0\347F\341I:=\vsegment.jpg\326\4\0\0\350F\341I:=\vdisplay.jpgh\223\0\0\345F\341I:$\0319342/CIMG8427_library.jpgd\16\0\0\1[\342I:2\fselector.jpg@\6\0\0\3[\342I:2\tbadge.jpg<0\0\0\1[\342I\2722\21market_banner.jpg\350\260\0\0:2\vgallery.jpgV\37\0\0\3[\342I:1\4.JPGR\210\2\0\0[\342I:1\20_shop_banner.jpg\260\243\0\0\2[\342I:2\vdisplay.jpg\212\311\0\0\1[\342I:2\vsegment.jpg\32\5\0\0\3[\342I\2722\nbrowse.jpg\363\v\0\0:$&7348/fuchsia_on_blue_2.JPG_display.jpg\346<\1\0\351K\342I\272?\tbadge.jpgH[\0\0:?\vsegment.jpg\305\t\0\0\353K\342I:>\4.jpg\257\360\22\0\350K\342I:>\f_library.jpg:\26\0\0\352K\342I\272?\vgallery.jpgf<\0\0:?\fselector.jpgS\f\0\0\353K\342I:@\16hop_banner.jpgA|\1\0\352K\342I:?\nbrowse.jpg\227\27\0\0\353K\342I:?\21market_banner.jpg`\301\1\0\351K\342I:$$0516/IMG_7919-copy_market_banner.jpg\363\366\0\0*D\341I:7\fselector.jpg4\n\0\0,D\341I:6\4.jpg\312y\10\0)D\341I:6\f_segment.jpg\371\7\0\0,D\341I\2727\nbrowse.jpg3\24\0\0:7\17shop_banner.jpg^\343\0\0+D\341I:7\vdisplay.jpgv\201\1\0*D\341I\2727\tbadge.jpg,K\0\0:7\vgallery.jpg\0223\0\0+D\341I\2727\vlibrary.jpg\352\23\0\0:$\0379142/CIMG8401_market_banner.jpg\205\351\0\0\253X\342I:2\vgallery.jpg\352#\0\0\254X\342I\2722\nbrowse.jpgq\r\0\0\2722\17shop_banner.jpg\264\330\0\0\2722\tbadge.jpg\3677\0\0\2722\vlibrary.jpg\225\17\0\0:2\fselector.jpg\271\6\0\0\255X\342I:1\4.JPG~\333\2\0\253X\342I:1\f_display.jpg\207\371\0\0\254X\342I:2\vsegment.jpg[\5\0\0\255X\342I:$)3457/ByNanasHands_437_gal"...,
65536) = 4092
04:34:54.081334 select(8, [7], NULL, [7], NULL) = 1 (in [7])
04:35:06.849495 read(7, "\200\f\0\7"..., 65536) = 4
04:35:06.849495 select(8, [7], NULL, [7], NULL) = 1 (in [7])
04:35:06.849495 read(7,
".jpg\3306\0\0002]\342I:7\21market_banner.jpg\323<\1\0000]\342I:$\0333057/IMG_3601new_browse.jpg\216\t\0\0\244\205\341I:5\vdisplay.jpg\274S\0\0\240\205\341I:5\vsegment.jpg\346\4\0\0\246\205\341I:7\nlector.jpg\315\5\0\0\245\205\341I:4\4.jpgy\304\22\0\235\205\341I:4\22_market_banner.jpg\r\252\0\0\236\205\341I:5\vgallery.jpgk\23\0\0\243\205\341I:5\tbadge.jpg\17\33\0\0\237\205\341I:5\vlibrary.jpg'\n\0\0\241\205\341I:5\17shop_banner.jpg\331u\0\0\242\205\341I:$\0265867/banner_browse.jpg\225\f\0\0\17*\342I\2720\vsegment.jpg\330\5\0\0\2720\vdisplay.jpg\30!\0\0\2720\21market_banner.jpg}\\\0\0\2720\vgallery.jpg\357\26\0\0\2720\vlibrary.jpgC\5\0\0\272/\4.JPG\t/\0\0\272/\r_selector.jpg\2\7\0\0\2720\tbadge.jpg\233\36\0\0\2720\17shop_banner.jpg\300?\0\0:$\0262887/williamsburg3.JPG\327v\5\0.{\341I:6\f_display.jpg\323\267\0\0/{\341I\2727\tbadge.jpg\34.\0\0:7\17shop_banner.jpg\227\344\0\0000{\341I:8\negment.jpg&\6\0\0001{\341I:9\nlector.jpg|\7\0\0000{\341I\2727\nbrowse.jpg\330\f\0\0:7\vlibrary.jpg\267\16\0\0/{\341I:7\21market_banner.jpg'\f\1\0.{\341I:7\vgallery.jpg\236\36\0\0000{\341I:$\0267919/Joy_2_segment.jpg-\7\0\0\312O\342I\272/\vdisplay.jpg\266\315\0\0\272/\20for_cropping.jpgg\227\0\0\272/\tbadge.jpg\2762\0\0\2720\trowse.jpg\27\20\0\0\272.\4.JPGZ\256\2\0\272.\20_shop_banner.jpgY\270\0\0\2720\velector.jpg\360\10\0\0\272/\vlibrary.jpg\234\22\0\0\272/\vgallery.jpg.#\0\0\272/\21market_banner.jpg\332\311\0\0:$\0320316/P1010406_selector.jpg\352\5\0\0\276B\341I:2\nbrowse.jpg\346\n\0\0\275B\341I:2\vdisplay.jpgY\275\0\0\273B\341I:2\vsegment.jpg\24\5\0\0\277B\341I:3\16hop_banner.jpg\3649\1\0\274B\341I:1\4.JPG\214$\20\0\271B\341I:1\f_library.jpg~\v\0\0\274B\341I:2\tbadge.jpg\r/\0\0\272B\341I\2722\21market_banner.jpg\310\261\1\0:2\vgallery.jpgb\35\0\0\275B\341I:$\0343257/S-54b_market_banner.jpg\245\17\0\0\233\223\341I:/\vlibrary.jpg6\v\0\0\234\223\341I\272/\vgallery.jpg\325\23\0\0\272/\17shop_banner.jpg\240\17\0\0\272/\vdisplay.jpg*[\0\0\272/\vsegment.jpg\310\4\0\0\272/\nbrowse.jpg\320\t\0\0:.\4.jpg\255T\1\0\233\223\341I:.\r_selector.jpg\33\6\0\0\234\223\341I\272/\tbadge.jpg\215\33\0\0:$\0309742/Jophiel_segment.jpg\364\5\0\0\0i\342I:1\vdisplay.jpg\23V\0\0\354h\342I:1\tbadge.jpg\321\"\0\0\377h\342I:1\17shop_banner.jpg\235\\\0\0\354h\342I:2\velector.jpgm\7\0\0\0i\342I:0\4.jpg\30a\2\0\353h\342"...,
65536) = 3200
04:35:06.849495 select(8, [7], NULL, [7], NULL
- -------------
Note the select that's been sitting there for 6+ hours.
lsof, same proc, on the SERVER:
- -------------
xc88-s00071 ~ # lsof -p 4607
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
BackupPC_ 4607 backuppc cwd DIR 8,1 624 2 /
BackupPC_ 4607 backuppc rtd DIR 8,1 624 2 /
BackupPC_ 4607 backuppc txt REG 8,1 1104232 14724
/usr/bin/perl5.8.8
BackupPC_ 4607 backuppc mem REG 8,1 115056 76305
/lib64/ld-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 93392 75381
/lib64/libpthread-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 84904 76357
/lib64/libnsl-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 14632 76341
/lib64/libdl-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 526568 75404
/lib64/libm-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 22728 75346
/lib64/libcrypt-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 10592 76264
/lib64/libutil-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 1305848 76332
/lib64/libc-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 31136 29639
/usr/lib64/perl5/5.8.8/x86_64-linux/auto/Data/Dumper/Dumper.so
BackupPC_ 4607 backuppc mem REG 8,1 10288 29623
/usr/lib64/perl5/5.8.8/x86_64-linux/auto/Cwd/Cwd.so
BackupPC_ 4607 backuppc mem REG 8,1 14288 29711
/usr/lib64/perl5/5.8.8/x86_64-linux/auto/Fcntl/Fcntl.so
BackupPC_ 4607 backuppc mem REG 8,1 22592 29891
/usr/lib64/perl5/5.8.8/x86_64-linux/auto/Socket/Socket.so
BackupPC_ 4607 backuppc mem REG 8,1 18440 29666
/usr/lib64/perl5/5.8.8/x86_64-linux/auto/Digest/MD5/MD5.so
BackupPC_ 4607 backuppc mem REG 8,1 40080 29690
/usr/lib64/perl5/5.8.8/x86_64-linux/auto/Encode/Encode.so
BackupPC_ 4607 backuppc mem REG 8,1 18592 29610
/usr/lib64/perl5/5.8.8/x86_64-linux/auto/IO/IO.so
BackupPC_ 4607 backuppc mem REG 8,1 26904 209613
/usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux/auto/List/Util/Util.so
BackupPC_ 4607 backuppc mem REG 8,1 18560 29648
/usr/lib64/perl5/5.8.8/x86_64-linux/auto/File/Glob/Glob.so
BackupPC_ 4607 backuppc mem REG 8,1 67944 209571
/usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux/auto/Compress/Raw/Zlib/Zlib.so
BackupPC_ 4607 backuppc mem REG 8,1 84368 76323
/lib64/libz.so.1.2.3
BackupPC_ 4607 backuppc mem REG 8,1 22688 211593
/usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux/auto/File/RsyncP/Digest/Digest.so
BackupPC_ 4607 backuppc mem REG 8,1 52064 211597
/usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux/auto/File/RsyncP/FileList/FileList.so
BackupPC_ 4607 backuppc mem REG 8,1 31544 76340
/lib64/libnss_compat-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 43496 75398
/lib64/libnss_nis-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 43448 75374
/lib64/libnss_files-2.6.1.so
BackupPC_ 4607 backuppc 0w CHR 1,3 607
/dev/null
BackupPC_ 4607 backuppc 1w FIFO 0,6 15292289 pipe
BackupPC_ 4607 backuppc 2w FIFO 0,6 15292289 pipe
BackupPC_ 4607 backuppc 3w REG 8,33 2726 430785
/backups/pc/ey02-s00416_pictures/LOG.122009
BackupPC_ 4607 backuppc 4w REG 8,33 44026496 11706115
/backups/pc/ey02-s00416_pictures/XferLOG.z
BackupPC_ 4607 backuppc 5w REG 8,33 0 12301398
/backups/pc/ey02-s00416_pictures/NewFileList
BackupPC_ 4607 backuppc 7u unix 0xffff88002497f5c0 15292306 socket
- -------------
Last bits of strace of the ssh/rsyc proc on the SERVER: backuppc 4610 4607 0
00:12 ? 00:00:01 /usr/bin/ssh -p 8416 -i
/engineyard/backuppc/.ssh/id_dsa -o StrictHostKeyChecking=no -o
ServerAliveInterval=60 -vvv -x -l root 65.74.174.196 /usr/bin/rsync --server
--sender --numeric-ids --perms --owner --group -D --links --times
--block-size=2048 --recursive . /
- -------------
10:54:07.576492 write(3,
"\221\221\241\"\310\217\321\26\3\302\f\255ne8.\235\17\350~\201\234\227\305\34\200D\271\323\2463K\35\355C\355\3226\340\252hm\336\310\267-J!?\273\362\301\333\335\237'H\32\361F\370\262W\373"...,
64) = 64
10:54:07.576492 select(7, [3 4], [], NULL, {60, 0}) = 1 (in [3], left {60, 0})
10:54:07.576492 read(3,
"\3377\335\202\35\5\212_\361\205R\2176\vJ>\306Q\352\370(\224\342+I\211\234\231\250<\220L"...,
8192) = 32
10:54:07.576492 select(7, [3 4], [], NULL, {60, 0}) = 0 (Timeout)
10:55:07.577250 select(7, [3 4], [3], NULL, {60, 0}) = 1 (out [3], left {60, 0})
10:55:07.577250 write(3,
"\320\212\6\230!\253\314>j\300\356\1^\306\351\217\230f\3044G\f\235y\263>\223\352i\222\262\361\34\344G\231\v\215EyBs\300$\343\325\200\23R\216\332E\264e\217\304\336\314Q]\227$BM"...,
64) = 64
10:55:07.577250 select(7, [3 4], [], NULL, {60, 0}) = 1 (in [3], left {60, 0})
10:55:07.577250 read(3,
"$1\277{\241\253\3k\351\35\225\242\30\2\2770o\335\v\35\342\333%\251\312[\262x#y9O"...,
8192) = 32
10:55:07.577250 select(7, [3 4], [], NULL, {60, 0}) = 0 (Timeout)
10:56:07.578008 select(7, [3 4], [3], NULL, {60, 0}) = 1 (out [3], left {60, 0})
10:56:07.578008 write(3,
"\350]\373a\246G\21\255\352\246\317\27\"\252\207\256\243\337\33\343\334N2\232\350\316B\316'\204\345pq!\5\3R\217mH\225\335?e\302wI\25\0327\357V\373\350\376\235\330_\222\250\270\6\371\350"...,
64) = 64
10:56:07.578008 select(7, [3 4], [], NULL, {60, 0}) = 1 (in [3], left {60, 0})
10:56:07.578008 read(3,
"~\f\325\212\323u\227\375X\226W`>\3258G\2471\311\306S;\3\37\273\23M\263q$\v\324"...,
8192) = 32
10:56:07.578008 select(7, [3 4], [], NULL, {60, 0}
- -------------
Note the rather different timestamps.
lsof, same proc, on the SERVER:
- -------------
xc88-s00071 ~ # lsof -p 4607
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
BackupPC_ 4607 backuppc cwd DIR 8,1 624 2 /
BackupPC_ 4607 backuppc rtd DIR 8,1 624 2 /
BackupPC_ 4607 backuppc txt REG 8,1 1104232 14724
/usr/bin/perl5.8.8
BackupPC_ 4607 backuppc mem REG 8,1 115056 76305
/lib64/ld-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 93392 75381
/lib64/libpthread-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 84904 76357
/lib64/libnsl-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 14632 76341
/lib64/libdl-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 526568 75404
/lib64/libm-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 22728 75346
/lib64/libcrypt-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 10592 76264
/lib64/libutil-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 1305848 76332
/lib64/libc-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 31136 29639
/usr/lib64/perl5/5.8.8/x86_64-linux/auto/Data/Dumper/Dumper.so
BackupPC_ 4607 backuppc mem REG 8,1 10288 29623
/usr/lib64/perl5/5.8.8/x86_64-linux/auto/Cwd/Cwd.so
BackupPC_ 4607 backuppc mem REG 8,1 14288 29711
/usr/lib64/perl5/5.8.8/x86_64-linux/auto/Fcntl/Fcntl.so
BackupPC_ 4607 backuppc mem REG 8,1 22592 29891
/usr/lib64/perl5/5.8.8/x86_64-linux/auto/Socket/Socket.so
BackupPC_ 4607 backuppc mem REG 8,1 18440 29666
/usr/lib64/perl5/5.8.8/x86_64-linux/auto/Digest/MD5/MD5.so
BackupPC_ 4607 backuppc mem REG 8,1 40080 29690
/usr/lib64/perl5/5.8.8/x86_64-linux/auto/Encode/Encode.so
BackupPC_ 4607 backuppc mem REG 8,1 18592 29610
/usr/lib64/perl5/5.8.8/x86_64-linux/auto/IO/IO.so
BackupPC_ 4607 backuppc mem REG 8,1 26904 209613
/usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux/auto/List/Util/Util.so
BackupPC_ 4607 backuppc mem REG 8,1 18560 29648
/usr/lib64/perl5/5.8.8/x86_64-linux/auto/File/Glob/Glob.so
BackupPC_ 4607 backuppc mem REG 8,1 67944 209571
/usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux/auto/Compress/Raw/Zlib/Zlib.so
BackupPC_ 4607 backuppc mem REG 8,1 84368 76323
/lib64/libz.so.1.2.3
BackupPC_ 4607 backuppc mem REG 8,1 22688 211593
/usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux/auto/File/RsyncP/Digest/Digest.so
BackupPC_ 4607 backuppc mem REG 8,1 52064 211597
/usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux/auto/File/RsyncP/FileList/FileList.so
BackupPC_ 4607 backuppc mem REG 8,1 31544 76340
/lib64/libnss_compat-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 43496 75398
/lib64/libnss_nis-2.6.1.so
BackupPC_ 4607 backuppc mem REG 8,1 43448 75374
/lib64/libnss_files-2.6.1.so
BackupPC_ 4607 backuppc 0w CHR 1,3 607
/dev/null
BackupPC_ 4607 backuppc 1w FIFO 0,6 15292289 pipe
BackupPC_ 4607 backuppc 2w FIFO 0,6 15292289 pipe
BackupPC_ 4607 backuppc 3w REG 8,33 2726 430785
/backups/pc/ey02-s00416_pictures/LOG.122009
BackupPC_ 4607 backuppc 4w REG 8,33 44026496 11706115
/backups/pc/ey02-s00416_pictures/XferLOG.z
BackupPC_ 4607 backuppc 5w REG 8,33 0 12301398
/backups/pc/ey02-s00416_pictures/NewFileList
BackupPC_ 4607 backuppc 7u unix 0xffff88002497f5c0 15292306 socket
- -------------
rsync/ssh on the CLIENT:
root 21661 3254 0 00:12 ? 00:00:01 sshd: root@notty
root 21663 21661 30 00:12 ? 03:18:21 /usr/bin/rsync --server
--sender --numeric-ids --perms --owner --group -D --links --times
--block-size=2048 --recursive . /
last bits of strace from rsync/ssh on the CLIENT:
- -------------
04:34:54.502841
lstat("pictures/agate/shared/pictures/0216/2987/Sapphire_Pearl_Pendant_segment.jpg",
{st_mode=S_IFREG|0644, st_size=1152, ...}) = 0
04:34:54.503024
lstat("pictures/agate/shared/pictures/0216/2987/Sapphire_Pearl_Pendant.JPG",
{st_mode=S_IFREG|0644, st_size=88733, ...}) = 0
04:34:54.503208
lstat("pictures/agate/shared/pictures/0216/2987/Sapphire_Pearl_Pendant_market_banner.jpg",
{st_mode=S_IFREG|0644, st_size=21168, ...}) = 0
04:34:54.503392 getdents(3, /* 0 entries */, 4096) = 0
04:34:54.503459 close(3) = 0
04:34:54.503523 gettimeofday({1260966894, 503548}, NULL) = 0
04:34:54.503583 gettimeofday({1260966894, 503608}, NULL) = 0
04:34:54.503648 mmap(NULL, 20447232, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b3d5b87c000
04:35:06.795465 munmap(0x2b3d5b87c000, 20447232) = 0
04:35:06.844403 select(2, NULL, [1], [1], {60, 0}) = 1 (out [1], left {60, 0})
04:35:06.844557 write(1, "\200\f\0\7", 4) = 4
04:35:06.844633 select(2, NULL, [1], [1], {60, 0}) = 1 (out [1], left {60, 0})
04:35:06.844739 write(1,
".jpg\3306\0\0002]\342I:7\21market_banner.jpg\323<\1\0000]\342I:$\0333057/IMG_3601new_browse.jpg\216\t\0\0\244\205\341I:5\vdisplay.jpg\274S\0\0\240\205\341I:5\vsegment.jpg\346\4\0\0\246\205\341I:7\nlector.jpg\315\5\0\0\245\205\341I:4\4.jpgy\304\22\0\235\205\341I:4\22_market_banner.jpg\r\252\0\0\236\205\341I:5\vgallery.jpgk\23\0\0\243\205\341I:5\tbadge.jpg\17\33\0\0\237\205\341I:5\vlibrary.jpg\'\n\0\0\241\205\341I:5\17shop_banner.jpg\331u\0\0\242\205\341I:$\0265867/banner_browse.jpg\225\f\0\0\17*\342I\2720\vsegment.jpg\330\5\0\0\2720\vdisplay.jpg\30!\0\0\2720\21market_banner.jpg}\\\0\0\2720\vgallery.jpg\357\26\0\0\2720\vlibrary.jpgC\5\0\0\272/\4.JPG\t/\0\0\272/\r_selector.jpg\2\7\0\0\2720\tbadge.jpg\233\36\0\0\2720\17shop_banner.jpg\300?\0\0:$\0262887/williamsburg3.JPG\327v\5\0.{\341I:6\f_display.jpg\323\267\0\0/{\341I\2727\tbadge.jpg\34.\0\0:7\17shop_banner.jpg\227\344\0\0000{\341I:8\negment.jpg&\6\0\0001{\341I:9\nlector.jpg|\7\0\0000{\341I\2727\nbrowse.jpg\330\f\0\0:7\vlibrary.jpg\267\16\0\0/{\341I:7\21market_banner.jpg\'\f\1\0.{\341I:7\vgallery.jpg\236\36\0\0000{\341I:$\0267919/Joy_2_segment.jpg-\7\0\0\312O\342I\272/\vdisplay.jpg\266\315\0\0\272/\20for_cropping.jpgg\227\0\0\272/\tbadge.jpg\2762\0\0\2720\trowse.jpg\27\20\0\0\272.\4.JPGZ\256\2\0\272.\20_shop_banner.jpgY\270\0\0\2720\velector.jpg\360\10\0\0\272/\vlibrary.jpg\234\22\0\0\272/\vgallery.jpg.#\0\0\272/\21market_banner.jpg\332\311\0\0:$\0320316/P1010406_selector.jpg\352\5\0\0\276B\341I:2\nbrowse.jpg\346\n\0\0\275B\341I:2\vdisplay.jpgY\275\0\0\273B\341I:2\vsegment.jpg\24\5\0\0\277B\341I:3\16hop_banner.jpg\3649\1\0\274B\341I:1\4.JPG\214$\20\0\271B\341I:1\f_library.jpg~\v\0\0\274B\341I:2\tbadge.jpg\r/\0\0\272B\341I\2722\21market_banner.jpg\310\261\1\0:2\vgallery.jpgb\35\0\0\275B\341I:$\0343257/S-54b_market_banner.jpg\245\17\0\0\233\223\341I:/\vlibrary.jpg6\v\0\0\234\223\341I\272/\vgallery.jpg\325\23\0\0\272/\17shop_banner.jpg\240\17\0\0\272/\vdisplay.jpg*[\0\0\272/\vsegment.jpg\310\4\0\0\272/\nbrowse.jpg\320\t\0\0:.\4.jpg\255T\1\0\233\223\341I:.\r_selector.jpg\33\6\0\0\234\223\341I\272/\tbadge.jpg\215\33\0\0:$\0309742/Jophiel_segment.jpg\364\5\0\0\0i\342I:1\vdisplay.jpg\23V\0\0\354h\342I:1\tbadge.jpg\321\"\0\0\377h\342I:1\17shop_banner.jpg\235\\\0\0\354h\342I:2\velector.jpgm\7\0\0\0i\342I:0\4.jpg\30a\2\0\353h"...,
3200) = 3200
04:35:06.845208 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
04:36:06.844621 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
04:37:06.949968 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
04:38:06.950773 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
04:39:06.951859 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
04:40:06.952896 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
04:41:06.953942 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
04:42:07.181208 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
[snip]
10:53:10.572467 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
10:54:11.041651 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
10:55:11.282693 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
10:56:11.523736 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
10:57:11.524637 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
10:58:11.525718 select(1, [0], [], NULL, {60, 0}
- -------------
lsof of same on CLIENT:
- -------------
ey02-s00416 ~ # lsof -p 21663
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
rsync 21663 root cwd DIR 8,1 760 2 /
rsync 21663 root rtd DIR 8,1 760 2 /
rsync 21663 root txt REG 8,1 372408 519144
/usr/bin/rsync
rsync 21663 root mem REG 8,1 114952 78335
/lib64/ld-2.6.1.so
rsync 21663 root mem REG 8,1 31296 65085
/lib64/libacl.so.1.1.0
rsync 21663 root mem REG 8,1 31424 35956
/usr/lib64/libpopt.so.0.0.0
rsync 21663 root mem REG 8,1 1305744 78362
/lib64/libc-2.6.1.so
rsync 21663 root mem REG 8,1 18696 99779
/lib64/libattr.so.1.1.0
rsync 21663 root 0u unix 0xffff8800b4546c40 175163806 socket
rsync 21663 root 1u unix 0xffff8800b4546c40 175163806 socket
rsync 21663 root 2u unix 0xffff880044935740 175163808 socket
- -------------
And, finally, tcpdump on the CLIENT:
- -------------
04:34:53.525749 IP (tos 0x8, ttl 56, id 19535, offset 0, flags [DF], proto TCP
(6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 >
ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0xf81e (correct), 54385:54385(0)
ack 115876872 win 501 <nop,nop,timestamp144817076 513742551>
04:34:53.525792 IP (tos 0x8, ttl 64, id 35494, offset 0, flags [DF], proto TCP
(6), length 1284) ey02-s00416.ey02.engineyard.com.ssh >
129.177.74.65.static.xc88.engineyard.com.42585: P 115878320:115879552(1232) ack
54385 win 132 <nop,nop,timestamp 513742551 144817076>
04:34:53.526944 IP (tos 0x8, ttl 56, id 19536, offset 0, flags [DF], proto TCP
(6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 >
ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0xeda6 (correct), 54385:54385(0)
ack 115879552 win 501 <nop,nop,timestamp144817076 513742551>
04:34:54.074567 IP (tos 0x8, ttl 64, id 35495, offset 0, flags [DF], proto TCP
(6), length 100) ey02-s00416.ey02.engineyard.com.ssh>
129.177.74.65.static.xc88.engineyard.com.42585: P 115879552:115879600(48) ack
54385 win 132 <nop,nop,timestamp 513742688 144817076>
04:34:54.075133 IP (tos 0x8, ttl 64, id 35496, offset 0, flags [DF], proto TCP
(6), length 2948) ey02-s00416.ey02.engineyard.com.ssh >
129.177.74.65.static.xc88.engineyard.com.42585: . 115879600:115882496(2896) ack
54385 win 132 <nop,nop,timestamp 513742688 144817076>
04:34:54.076020 IP (tos 0x8, ttl 56, id 19537, offset 0, flags [DF], proto TCP
(6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 >
ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0xe6bc (correct), 54385:54385(0)
ack 115881048 win 501 <nop,nop,timestamp144817213 513742688>
04:34:54.076051 IP (tos 0x8, ttl 64, id 35498, offset 0, flags [DF], proto TCP
(6), length 1284) ey02-s00416.ey02.engineyard.com.ssh >
129.177.74.65.static.xc88.engineyard.com.42585: P 115882496:115883728(1232) ack
54385 win 132 <nop,nop,timestamp 513742689 144817213>
04:34:54.077032 IP (tos 0x8, ttl 56, id 19538, offset 0, flags [DF], proto TCP
(6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 >
ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0xdc44 (correct), 54385:54385(0)
ack 115883728 win 501 <nop,nop,timestamp144817213 513742688>
04:35:06.844658 IP (tos 0x8, ttl 64, id 35499, offset 0, flags [DF], proto TCP
(6), length 100) ey02-s00416.ey02.engineyard.com.ssh>
129.177.74.65.static.xc88.engineyard.com.42585: P 115883728:115883776(48) ack
54385 win 132 <nop,nop,timestamp 513745881 144817213>
04:35:06.845281 IP (tos 0x8, ttl 64, id 35500, offset 0, flags [DF], proto TCP
(6), length 2948) ey02-s00416.ey02.engineyard.com.ssh >
129.177.74.65.static.xc88.engineyard.com.42585: . 115883776:115886672(2896) ack
54385 win 132 <nop,nop,timestamp 513745881 144817213>
04:35:06.846300 IP (tos 0x8, ttl 56, id 19539, offset 0, flags [DF], proto TCP
(6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 >
ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0xbd7a (correct), 54385:54385(0)
ack 115885224 win 501 <nop,nop,timestamp144820406 513745881>
04:35:06.846353 IP (tos 0x8, ttl 64, id 35502, offset 0, flags [DF], proto TCP
(6), length 404) ey02-s00416.ey02.engineyard.com.ssh>
129.177.74.65.static.xc88.engineyard.com.42585: P 115886672:115887024(352) ack
54385 win 132 <nop,nop,timestamp 513745881 144820406>
04:35:06.847088 IP (tos 0x8, ttl 56, id 19540, offset 0, flags [DF], proto TCP
(6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 >
ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0xb672 (correct), 54385:54385(0)
ack 115887024 win 501 <nop,nop,timestamp144820406 513745881>
04:36:06.846945 IP (tos 0x8, ttl 56, id 19541, offset 0, flags [DF], proto TCP
(6), length 116) 129.177.74.65.static.xc88.engineyard.com.42585 >
ey02-s00416.ey02.engineyard.com.ssh: P 54385:54449(64) ack 115887024 win 501
<nop,nop,timestamp 144835406 513745881>
04:36:06.846970 IP (tos 0x8, ttl 64, id 35503, offset 0, flags [DF], proto TCP
(6), length 52) ey02-s00416.ey02.engineyard.com.ssh >
129.177.74.65.static.xc88.engineyard.com.42585: ., cksum 0x4273 (correct),
115887024:115887024(0) ack 54449 win 132 <nop,nop,timestamp 513760881 144835406>
04:36:06.847037 IP (tos 0x8, ttl 64, id 35504, offset 0, flags [DF], proto TCP
(6), length 84) ey02-s00416.ey02.engineyard.com.ssh >
129.177.74.65.static.xc88.engineyard.com.42585: P 115887024:115887056(32) ack
54449 win 132 <nop,nop,timestamp 513760881 144835406>
04:36:06.886586 IP (tos 0x8, ttl 56, id 19542, offset 0, flags [DF], proto TCP
(6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 >
ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0x40d8 (correct), 54449:54449(0)
ack 115887056 win 501 <nop,nop,timestamp144835416 513760881>
04:37:06.847775 IP (tos 0x8, ttl 56, id 19543, offset 0, flags [DF], proto TCP
(6), length 116) 129.177.74.65.static.xc88.engineyard.com.42585 >
ey02-s00416.ey02.engineyard.com.ssh: P 54449:54513(64) ack 115887056 win 501
<nop,nop,timestamp 144850406 513760881>
04:37:06.847878 IP (tos 0x8, ttl 64, id 35505, offset 0, flags [DF], proto TCP
(6), length 84) ey02-s00416.ey02.engineyard.com.ssh >
129.177.74.65.static.xc88.engineyard.com.42585: P 115887056:115887088(32) ack
54513 win 132 <nop,nop,timestamp 513775881 144850406>
04:37:06.848515 IP (tos 0x8, ttl 56, id 19544, offset 0, flags [DF], proto TCP
(6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 >
ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0xcb51 (correct), 54513:54513(0)
ack 115887088 win 501 <nop,nop,timestamp144850406 513775881>
[snip]
10:53:07.575258 IP (tos 0x8, ttl 56, id 20295, offset 0, flags [DF], proto TCP
(6), length 116) 129.177.74.65.static.xc88.engineyard.com.42585 >
ey02-s00416.ey02.engineyard.com.ssh: P 78513:78577(64) ack 115899088 win 501
<nop,nop,timestamp 150490516 519400964>
10:53:07.575383 IP (tos 0x8, ttl 64, id 35883, offset 0, flags [DF], proto TCP
(6), length 84) ey02-s00416.ey02.engineyard.com.ssh >
129.177.74.65.static.xc88.engineyard.com.42585: P 115899088:115899120(32) ack
78577 win 132 <nop,nop,timestamp 519415964 150490516>
10:53:07.576056 IP (tos 0x8, ttl 56, id 20296, offset 0, flags [DF], proto TCP
(6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 >
ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0x1e64 (correct), 78577:78577(0)
ack 115899120 win 501 <nop,nop,timestamp 150490516 519415964>
10:54:07.576336 IP (tos 0x8, ttl 56, id 20297, offset 0, flags [DF], proto TCP
(6), length 116) 129.177.74.65.static.xc88.engineyard.com.42585 >
ey02-s00416.ey02.engineyard.com.ssh: P 78577:78641(64) ack 115899120 win 501
<nop,nop,timestamp 150505516 519415964>
10:54:07.576514 IP (tos 0x8, ttl 64, id 35884, offset 0, flags [DF], proto TCP
(6), length 84) ey02-s00416.ey02.engineyard.com.ssh >
129.177.74.65.static.xc88.engineyard.com.42585: P 115899120:115899152(32) ack
78641 win 132 <nop,nop,timestamp 519430964 150505516>
- -------------
Suggestions welcome, but it sure doesn't look like any kind of
networking problem to me. The only place I can see where there
might have been some kind of communication breakdown is between
BackupPC_dump on the server and the ssh/rsync on the server; it
looks like BackupPC_dump is waiting for more data but rsync is done
sending it.
The last thing the client rsync sent was:
04:35:06.844739 write(1,
".jpg\3306\0\0002]\342I:7\21market_banner.jpg\323<\1\0000[snip]"..., 3200) =
3200
And the last thing BackupPC_dump saw before the select that never ends was:
04:35:06.849495 read(7,
".jpg\3306\0\0002]\342I:7\21market_banner.jpg\323<\1\0000[snip]"..., 65536) =
3200
So clearly their last communication was in accord.
It just looks like BackupPC_dump has no idea that rsync is done. So
after a while, the NAT deletes the apparently "dead" connection,
except with keepalives that doesn't happen, so we just sit here
forever.
The other 7 BackupPC_dump sessions seem to be in the same state:
select stuck on FD 7.
I'm quite stumped. Any ideas?
-Robin
--
They say: "The first AIs will be built by the military as weapons."
And I'm thinking: "Does it even occur to you to try for something
other than the default outcome?" See http://shrunklink.com/cdiz
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/
------------------------------------------------------------------------------
This SF.Net email is sponsored by the Verizon Developer Community
Take advantage of Verizon's best-in-class app development support
A streamlined, 14 day to market process makes app distribution fast and easy
Join now and get one step closer to millions of Verizon customers
http://p.sf.net/sfu/verizon-dev2dev
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
|