[Networker] all sessions slow at some time during backup

2012-11-25 07:50:31
Subject: [Networker] all sessions slow at some time during backup
From: jeronimo <networker-forum AT BACKUPCENTRAL DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Sun, 25 Nov 2012 04:50:17 -0800
I've checked the SAN and the network.
There are no errors on any links that I can find, and the links are not 
saturated either.
The backup server has an interface on the same segment as the servers, 
meaning there is no router or firewall in between.

SRDF is indeed mirroring the storage, but since we are only reading from it, 
that should not matter much. In any case, the SRDF link is not saturated either.

I've tested using FTP from host to server: tens of megabytes per second 
throughput.

Here are some more traces (strace -tt, tshark -ta):

strace on the client:

13:34:09.117441 read(18, "<?xml version=\"1.0\" encoding=\"UT"..., 65536) = 7608
13:34:09.117551 writev(5, 
[{"\200\0\0\0\0\0(l?\206\264:\0\0\0\0\0\0\0\2\0\5\363\330"..., 116}, 
{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\3\27X\3\0\0\0\0\0\0\0"..., 10240}], 2
here it stalls

meanwhile, wireshark on the server:

13:34:09.092376   client -> server  TCP 59561 > 8094 [ACK] Seq=1519849 Ack=1 
Win=1460 Len=7240 TSV=301505899 TSER=153465352
13:34:09.092505   client -> server  TCP 59561 > 8094 [ACK] Seq=1527089 Ack=1 
Win=1460 Len=3512 TSV=301505899 TSER=153465352
13:34:09.131955  server -> client   TCP [TCP ZeroWindow] 8094 > 59561 [ACK] 
Seq=1 Ack=1530601 Win=0 Len=0 TSV=153465362 TSER=301505899
13:34:09.348311   client -> server  TCP [TCP Keep-Alive] 59561 > 8094 [ACK] 
Seq=1530600 Ack=1 Win=1460 Len=0 TSV=301505963 TSER=153465362
13:34:09.348317  server -> client   TCP [TCP ZeroWindow] 8094 > 59561 [ACK] 
Seq=1 Ack=1530601 Win=0 Len=0 TSV=153465416 TSER=301505899
13:34:09.784111   client -> server  TCP [TCP Keep-Alive] 59561 > 8094 [ACK] 
Seq=1530600 Ack=1 Win=1460 Len=0 TSV=301506071 TSER=153465416
13:34:09.784126  server -> client   TCP [TCP ZeroWindow] 8094 > 59561 [ACK] 
Seq=1 Ack=1530601 Win=0 Len=0 TSV=153465525 TSER=301505899
13:34:10.648015   client -> server  TCP [TCP Keep-Alive] 59561 > 8094 [ACK] 
Seq=1530600 Ack=1 Win=1460 Len=0 TSV=301506287 TSER=153465525
13:34:10.648026  server -> client   TCP [TCP ZeroWindow] 8094 > 59561 [ACK] 
Seq=1 Ack=1530601 Win=0 Len=0 TSV=153465741 TSER=301505899
13:34:12.375904   client -> server  TCP [TCP Keep-Alive] 59561 > 8094 [ACK] 
Seq=1530600 Ack=1 Win=1460 Len=0 TSV=301506720 TSER=153465741
13:34:12.375919  server -> client   TCP [TCP ZeroWindow] 8094 > 59561 [ACK] 
Seq=1 Ack=1530601 Win=0 Len=0 TSV=153466172 TSER=301505899
13:34:15.835685   client -> server  TCP [TCP Keep-Alive] 59561 > 8094 [ACK] 
Seq=1530600 Ack=1 Win=1460 Len=0 TSV=301507584 TSER=153466172
13:34:15.835702  server -> client   TCP [TCP ZeroWindow] 8094 > 59561 [ACK] 
Seq=1 Ack=1530601 Win=0 Len=0 TSV=153467037 TSER=301505899
13:34:22.747271   client -> server  TCP [TCP Keep-Alive] 59561 > 8094 [ACK] 
Seq=1530600 Ack=1 Win=1460 Len=0 TSV=301509312 TSER=153467037
13:34:22.747286  server -> client   TCP [TCP ZeroWindow] 8094 > 59561 [ACK] 
Seq=1 Ack=1530601 Win=0 Len=0 TSV=153468765 TSER=301505899
...
13:34:28.249264  server -> client   TCP [TCP Window Update] 8094 > 59561 [ACK] 
Seq=1 Ack=1530601 Win=47 Len=0 TSV=153470141 TSER=301505899
13:34:28.249730   client -> server  TCP 59561 > 8094 [ACK] Seq=1530601 Ack=1 
Win=1460 Len=7240 TSV=301510689 TSER=153470141
13:34:28.249749   client -> server  TCP 59561 > 8094 [ACK] Seq=1537841 Ack=1 
Win=1460 Len=2896 TSV=301510689 TSER=153470141
13:34:28.249760   client -> server  TCP 59561 > 8094 [ACK] Seq=1540737 Ack=1 
Win=1460 Len=1448 TSV=301510689 TSER=153470141
13:34:28.254236  server -> client   TCP 8094 > 59561 [ACK] Seq=1 Ack=1542185 
Win=94 Len=0 TSV=153470142 TSER=301510689
13:34:28.254595   client -> server  TCP 59561 > 8094 [ACK] Seq=1542185 Ack=1 
Win=1460 Len=832 TSV=301510690 TSER=153470142
13:34:28.254633   client -> server  TCP 59561 > 8094 [ACK] Seq=1543017 Ack=1 
Win=1460 Len=1448 TSV=301510690 TSER=153470142
13:34:28.254658   client -> server  TCP 59561 > 8094 [ACK] Seq=1544465 Ack=1 
Win=1460 Len=2896 TSV=301510690 TSER=153470142
13:34:28.254717  server -> client   TCP 8094 > 59561 [ACK] Seq=1 Ack=1547361 
Win=94 Len=0 TSV=153470142 TSER=301510690
13:34:28.255104   client -> server  TCP 59561 > 8094 [ACK] Seq=1547361 Ack=1 
Win=1460 Len=2896 TSV=301510690 TSER=153470142
13:34:28.255112   client -> server  TCP 59561 > 8094 [ACK] Seq=1550257 Ack=1 
Win=1460 Len=1448 TSV=301510690 TSER=153470142
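
In the capture above, the server advertises Win=0 from 13:34:09.131955 until 
the Window Update at 13:34:28.249264, so the client cannot send for about 19 
seconds. To measure such stalls across a whole capture, something like the 
following Python sketch might help. It is only an illustration under some 
assumptions: each packet is on a single line (not wrapped as in this mail), the 
lines come from one TCP connection, and the capture does not cross midnight. 
The name zero_window_stalls is made up for this example.

```python
import re
from datetime import datetime

# Matches one-line tshark summaries such as
# "13:34:09.131955  server -> client   TCP [TCP ZeroWindow] 8094 > 59561 [ACK] ..."
EVENT = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d+)\s.*\[TCP (ZeroWindow|Window Update)\]")

def zero_window_stalls(lines):
    """Return (start, end, seconds) for each ZeroWindow -> Window Update span.

    Hypothetical helper: assumes a single connection and unwrapped lines.
    """
    stalls = []
    start = None
    for line in lines:
        m = EVENT.match(line)
        if not m:
            continue  # data segments, keep-alive probes, etc.
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        if m.group(2) == "ZeroWindow":
            if start is None:  # only the first ZeroWindow opens a stall
                start = ts
        elif start is not None:  # a Window Update closes the stall
            stalls.append((start.time(), ts.time(), (ts - start).total_seconds()))
            start = None
    return stalls

# A few packets from the trace above, joined onto single lines.
sample = [
    "13:34:09.131955  server -> client   TCP [TCP ZeroWindow] 8094 > 59561 [ACK] Seq=1 Ack=1530601 Win=0 Len=0",
    "13:34:09.348311   client -> server  TCP [TCP Keep-Alive] 59561 > 8094 [ACK] Seq=1530600 Ack=1 Win=1460 Len=0",
    "13:34:22.747286  server -> client   TCP [TCP ZeroWindow] 8094 > 59561 [ACK] Seq=1 Ack=1530601 Win=0 Len=0",
    "13:34:28.249264  server -> client   TCP [TCP Window Update] 8094 > 59561 [ACK] Seq=1 Ack=1530601 Win=47 Len=0",
]

for start, end, secs in zero_window_stalls(sample):
    print(f"{start} -> {end}: stalled {secs:.3f} s")
```

Run over a full session, this would show whether the stalls line up with a 
particular drive or time of day.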

back on the client at that time:
13:34:28.253909 close(18)               = 0
13:34:28.254017 getuid()                = 0
13:34:28.254066 setreuid(65534, 4294967295) = 0
... next file

It *may* be a defective drive and/or tape, but we have this problem all the 
time. I will have to check whether it is always the same drive that stalls. 
However, there are no errors on record.
My feeling, however, is that something is probably wrong with the Networker 
config. Maybe someone can explain in more detail what happens when there is 
trouble with indexing etc., as was noted before.

+----------------------------------------------------------------------
|This was sent by jm+backupcentral AT roth DOT lu via Backup Central.
|Forward SPAM to abuse AT backupcentral DOT com.
+----------------------------------------------------------------------
