[Networker] all sessions slow at some time during backup
2012-11-25 07:50:31
I've checked the SAN and the network.
There are no errors on any links that I can find, and the links are not
saturated either.
The backup server has an interface on the same segment as the servers,
so there is no router or firewall in between either.
SRDF is indeed mirroring the storage arrays, but since we're only reading from
them that shouldn't matter much. In any case, the SRDF link is not saturated either.
I've tested an FTP transfer from the host to the server and got tens of
megabytes per second of throughput.
Here are some more traces (strace -tt, tshark -ta):
strace on the client:
13:34:09.117441 read(18, "<?xml version=\"1.0\" encoding=\"UT"..., 65536) = 7608
13:34:09.117551 writev(5,
[{"\200\0\0\0\0\0(l?\206\264:\0\0\0\0\0\0\0\2\0\5\363\330"..., 116},
{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\3\27X\3\0\0\0\0\0\0\0"..., 10240}], 2
Here it stalls.
Meanwhile, Wireshark on the server:
13:34:09.092376 client -> server TCP 59561 > 8094 [ACK] Seq=1519849 Ack=1
Win=1460 Len=7240 TSV=301505899 TSER=153465352
13:34:09.092505 client -> server TCP 59561 > 8094 [ACK] Seq=1527089 Ack=1
Win=1460 Len=3512 TSV=301505899 TSER=153465352
13:34:09.131955 server -> client TCP [TCP ZeroWindow] 8094 > 59561 [ACK]
Seq=1 Ack=1530601 Win=0 Len=0 TSV=153465362 TSER=301505899
13:34:09.348311 client -> server TCP [TCP Keep-Alive] 59561 > 8094 [ACK]
Seq=1530600 Ack=1 Win=1460 Len=0 TSV=301505963 TSER=153465362
13:34:09.348317 server -> client TCP [TCP ZeroWindow] 8094 > 59561 [ACK]
Seq=1 Ack=1530601 Win=0 Len=0 TSV=153465416 TSER=301505899
13:34:09.784111 client -> server TCP [TCP Keep-Alive] 59561 > 8094 [ACK]
Seq=1530600 Ack=1 Win=1460 Len=0 TSV=301506071 TSER=153465416
13:34:09.784126 server -> client TCP [TCP ZeroWindow] 8094 > 59561 [ACK]
Seq=1 Ack=1530601 Win=0 Len=0 TSV=153465525 TSER=301505899
13:34:10.648015 client -> server TCP [TCP Keep-Alive] 59561 > 8094 [ACK]
Seq=1530600 Ack=1 Win=1460 Len=0 TSV=301506287 TSER=153465525
13:34:10.648026 server -> client TCP [TCP ZeroWindow] 8094 > 59561 [ACK]
Seq=1 Ack=1530601 Win=0 Len=0 TSV=153465741 TSER=301505899
13:34:12.375904 client -> server TCP [TCP Keep-Alive] 59561 > 8094 [ACK]
Seq=1530600 Ack=1 Win=1460 Len=0 TSV=301506720 TSER=153465741
13:34:12.375919 server -> client TCP [TCP ZeroWindow] 8094 > 59561 [ACK]
Seq=1 Ack=1530601 Win=0 Len=0 TSV=153466172 TSER=301505899
13:34:15.835685 client -> server TCP [TCP Keep-Alive] 59561 > 8094 [ACK]
Seq=1530600 Ack=1 Win=1460 Len=0 TSV=301507584 TSER=153466172
13:34:15.835702 server -> client TCP [TCP ZeroWindow] 8094 > 59561 [ACK]
Seq=1 Ack=1530601 Win=0 Len=0 TSV=153467037 TSER=301505899
13:34:22.747271 client -> server TCP [TCP Keep-Alive] 59561 > 8094 [ACK]
Seq=1530600 Ack=1 Win=1460 Len=0 TSV=301509312 TSER=153467037
13:34:22.747286 server -> client TCP [TCP ZeroWindow] 8094 > 59561 [ACK]
Seq=1 Ack=1530601 Win=0 Len=0 TSV=153468765 TSER=301505899
...
13:34:28.249264 server -> client TCP [TCP Window Update] 8094 > 59561 [ACK]
Seq=1 Ack=1530601 Win=47 Len=0 TSV=153470141 TSER=301505899
13:34:28.249730 client -> server TCP 59561 > 8094 [ACK] Seq=1530601 Ack=1
Win=1460 Len=7240 TSV=301510689 TSER=153470141
13:34:28.249749 client -> server TCP 59561 > 8094 [ACK] Seq=1537841 Ack=1
Win=1460 Len=2896 TSV=301510689 TSER=153470141
13:34:28.249760 client -> server TCP 59561 > 8094 [ACK] Seq=1540737 Ack=1
Win=1460 Len=1448 TSV=301510689 TSER=153470141
13:34:28.254236 server -> client TCP 8094 > 59561 [ACK] Seq=1 Ack=1542185
Win=94 Len=0 TSV=153470142 TSER=301510689
13:34:28.254595 client -> server TCP 59561 > 8094 [ACK] Seq=1542185 Ack=1
Win=1460 Len=832 TSV=301510690 TSER=153470142
13:34:28.254633 client -> server TCP 59561 > 8094 [ACK] Seq=1543017 Ack=1
Win=1460 Len=1448 TSV=301510690 TSER=153470142
13:34:28.254658 client -> server TCP 59561 > 8094 [ACK] Seq=1544465 Ack=1
Win=1460 Len=2896 TSV=301510690 TSER=153470142
13:34:28.254717 server -> client TCP 8094 > 59561 [ACK] Seq=1 Ack=1547361
Win=94 Len=0 TSV=153470142 TSER=301510690
13:34:28.255104 client -> server TCP 59561 > 8094 [ACK] Seq=1547361 Ack=1
Win=1460 Len=2896 TSV=301510690 TSER=153470142
13:34:28.255112 client -> server TCP 59561 > 8094 [ACK] Seq=1550257 Ack=1
Win=1460 Len=1448 TSV=301510690 TSER=153470142
Back on the client at that time:
13:34:28.253909 close(18) = 0
13:34:28.254017 getuid() = 0
13:34:28.254066 setreuid(65534, 4294967295) = 0
... next file
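Wireshark tags the client's zero-length segments as [TCP Keep-Alive] because their sequence number is one below the last acknowledged byte (Seq=1530600 vs Ack=1530601), but since the server is advertising Win=0 they act as zero-window probes. The spacing between them roughly doubles each time, which is consistent with the sender's persist timer backing off exponentially, and the window stays closed for about 19 s, roughly the same gap as between the writev() at 13:34:09 and the close() at 13:34:28 on the client. A small Python sketch that recomputes those numbers from timestamps copied out of the capture above (any probes hidden behind the "..." are not included; the function names are just for illustration):

```python
from datetime import datetime

# Timestamps copied from the server-side tshark capture (port 8094).
ZERO_WINDOW = "13:34:09.131955"    # first [TCP ZeroWindow] sent by the server
WINDOW_UPDATE = "13:34:28.249264"  # [TCP Window Update] that reopens the window
PROBES = [                         # client segments Wireshark tagged [TCP Keep-Alive]
    "13:34:09.348311",
    "13:34:09.784111",
    "13:34:10.648015",
    "13:34:12.375904",
    "13:34:15.835685",
    "13:34:22.747271",
]


def ts(s: str) -> datetime:
    """Parse a tshark -ta time-of-day stamp (the date part is irrelevant here)."""
    return datetime.strptime(s, "%H:%M:%S.%f")


def probe_gaps(stamps):
    """Seconds between consecutive timestamps, rounded to milliseconds."""
    times = [ts(s) for s in stamps]
    return [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]


def stall_seconds(start: str, end: str) -> float:
    """How long the receive window stayed at zero."""
    return round((ts(end) - ts(start)).total_seconds(), 3)


if __name__ == "__main__":
    # Start from the first ZeroWindow so the first probe interval is included.
    gaps = probe_gaps([ZERO_WINDOW] + PROBES)
    print("probe intervals (s):", gaps)
    print("growth factor:", [round(b / a, 2) for a, b in zip(gaps, gaps[1:])])
    print("window closed for (s):", stall_seconds(ZERO_WINDOW, WINDOW_UPDATE))
```

The point is that the stall originates on the receiving side: the server's kernel has nowhere to put more data because the process reading the socket isn't draining it, and the client is merely waiting politely.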
It *may* be a defective drive and/or tape, but we have this problem all the
time. I will have to check whether it is always the same drive that stalls.
There are no errors on record, however.
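Whatever the root cause (a slow or stalled drive, or the process writing to it), the mechanism seen in both traces is ordinary back-pressure: once the reader stops consuming, the kernel buffers fill and the writer's blocking writev() simply waits. A minimal Python sketch of that effect, assuming a local socketpair as a hypothetical stand-in for the client/server connection. On most platforms socketpair() gives a local (AF_UNIX) stream rather than TCP, so there is no advertised window here; the kernel's buffer limit plays the same role. Nothing in it is NetWorker-specific.

```python
import socket
from typing import Tuple


def stall_then_resume(chunk: int = 10240) -> Tuple[int, int]:
    """Fill a connection whose reader is idle, then let the reader catch up.

    Returns (bytes accepted before the writer stalled,
             bytes accepted right after the reader drained the backlog).
    """
    # Hypothetical local stand-in for the client -> server backup connection.
    writer, reader = socket.socketpair()
    try:
        # Non-blocking so the demo reports the stall instead of hanging,
        # which is what the client's blocking writev() does in the strace.
        writer.setblocking(False)
        payload = b"\0" * chunk

        queued = 0
        while True:
            try:
                queued += writer.send(payload)
            except BlockingIOError:
                break  # buffers are full: the writer cannot make progress

        # The reader (think: the process feeding the drive) starts consuming again.
        drained = 0
        while drained < queued:
            drained += len(reader.recv(65536))

        # With the backlog gone, the writer's data is accepted again right away.
        resumed = writer.send(payload)
        return queued, resumed
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    before, after = stall_then_resume()
    print(f"accepted {before} bytes before stalling, then {after} bytes after draining")
```

The exact byte counts depend on the local kernel's buffer sizes; what matters is that the writer makes no progress until the reader drains, exactly like the client sitting in writev() until the server's Window Update.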
My gut feeling, however, is that something is probably wrong with the Networker
configuration. Maybe someone can explain in more detail what happens when there
is trouble with indexing etc., as was mentioned earlier.
+----------------------------------------------------------------------
|This was sent by jm+backupcentral AT roth DOT lu via Backup Central.
|Forward SPAM to abuse AT backupcentral DOT com.
+----------------------------------------------------------------------