Amanda-Users

Re: Troubleshooting a slowdown problem?

2004-06-18 11:45:17
Subject: Re: Troubleshooting a slowdown problem?
From: KEVIN ZEMBOWER <KZEMBOWE AT jhuccp DOT org>
To: amanda-users AT amanda DOT org
Date: Fri, 18 Jun 2004 11:39:03 -0400
Frank, thanks, again, for your analysis.

When you mentioned the connection speed, I remembered that I had to ask our 
network administrators to change the speed and auto-negotiation settings on 
the Cisco switch port that the old centernet host was plugged into, fixing the 
speed at 100baseTx-FD and turning off auto-negotiation. I forgot to do that 
for the new centernet host. On the admin host, the settings were okay:

admin:~ # mii-diag
Using the default interface 'eth0'.
Basic registers of MII PHY #1:  2100 780d 02a8 0154 05e1 0000 0000 0000.
 Basic mode control register 0x2100: Auto-negotiation disabled, with
 Speed fixed at 100 mbps, full-duplex.
 You have link beat, and everything is working OK.
 Link partner information is not exchanged when in fixed speed mode.
   End of basic transceiver information.

admin:~ # 
admin:~ # ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:90:27:B6:FB:E7  
          inet addr:172.16.2.7  Bcast:172.16.255.255  Mask:255.255.0.0
          inet6 addr: fe80::290:27ff:feb6:fbe7/10 Scope:Link
          inet6 addr: fe80::90:27b6:fbe7/10 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:22539412 errors:0 dropped:0 overruns:0 frame:0
          TX packets:16228564 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          RX bytes:768887232 (733.2 Mb)  TX bytes:1301570470 (1241.2 Mb)
          Interrupt:21 Base address:0x8000 

admin:~ # uptime
 11:00am  up 2 days, 18:06,  1 user,  load average: 2.01, 1.97, 1.59
admin:~ # date
Fri Jun 18 11:00:23 EDT 2004
admin:~ # 
 
Even though the admin host has only been up 2 days, it shows zero collisions 
and zero carrier errors. When I tested the file transfer speed between 
centernet and admin, before making any changes, I also noticed that on 
centernet auto-negotiation was on and the link was not set to full duplex:

cn2:~# mii-diag
Using the default interface 'eth0'.
Basic registers of MII PHY #1:  3000 782d 02a8 0154 05e1 4081 0003 0000.
 The autonegotiated capability is 0080.
The autonegotiated media type is 100baseTx.
 Basic mode control register 0x3000: Auto-negotiation enabled.
 You have link beat, and everything is working OK.
 Your link partner advertised 4081: 100baseTx.
   End of basic transceiver informaion.

cn2:~# 

kevinz@cn2:~/dblogs$ ncftpput -u kevinz -p xxxxxx admin ~/ 
20040610.popline..wpd 
20040610.popline.wpd:                                  988.09 MB   36.13 kB/s  
ncftpput 20040610.popline.wpd: data transfer aborted by local user.
kevinz@cn2:~/dblogs$ 

After fixing the speed to 100baseTx-FD and turning auto-negotiation off, the 
transfer rate improved roughly 300-fold (from 36.13 kB/s to 11.04 MB/s):

cn2:~# mii-diag -F 100baseTx-FD
Using the default interface 'eth0'.
Setting the speed to "fixed", Control register 2100.
Basic registers of MII PHY #1:  2100 780d 02a8 0154 05e1 4081 0001 0000.
 The autonegotiated capability is 0080.
The autonegotiated media type is 100baseTx.
 Basic mode control register 0x2100: Auto-negotiation disabled, with
 Speed fixed at 100 mbps, full-duplex.
 You have link beat, and everything is working OK.
 Your link partner advertised 4081: 100baseTx.
   End of basic transceiver informaion.

cn2:~#

cn2:~# mii-diag   
Using the default interface 'eth0'.
Basic registers of MII PHY #1:  2100 780d 02a8 0154 05e1 4081 0001 0000.
 The autonegotiated capability is 0080.
The autonegotiated media type is 100baseTx.
 Basic mode control register 0x2100: Auto-negotiation disabled, with
 Speed fixed at 100 mbps, full-duplex.
 You have link beat, and everything is working OK.
 Your link partner advertised 4081: 100baseTx.
   End of basic transceiver informaion.

kevinz@cn2:~/dblogs$ ncftpput -u kevinz -p xxxxxx admin ~/ 
20040610.popline..wpd 
20040610.popline.wpd:                                  988.09 MB   11.04 MB/s  
ncftpput 20040610.popline.wpd: data transfer aborted by local user.
kevinz@cn2:~/dblogs$ 
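
As a rough check on that figure, the two ncftpput rates can be compared 
directly. This is only a sketch: the function name speedup is made up for 
illustration, and it assumes that ncftpput's MB means 1024 kB.

```shell
# Hypothetical helper: ratio of an "after" rate in MB/s to a "before" rate
# in kB/s. Assumes 1 MB = 1024 kB, which may not match ncftpput exactly.
speedup() {
    awk -v before_kb="$1" -v after_mb="$2" \
        'BEGIN { printf "%.0f\n", after_mb * 1024 / before_kb }'
}

# Rates taken from the two transfers above.
speedup 36.13 11.04   # prints 313
```

With 1 MB taken as 1000 kB instead, the ratio comes out a little lower, still 
on the order of 300.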

I should also have noticed the large number of collisions and carrier errors 
on centernet:

cn2:~# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:B0:D0:49:55:20  
          inet addr:172.16.2.4  Bcast:172.16.255.255  Mask:255.255.0.0
          IPX/Ethernet 802.3 addr:00000958:00B0D0495520
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:43966452 errors:0 dropped:0 overruns:0 frame:0
          TX packets:50236123 errors:0 dropped:0 overruns:25 carrier:10736563
          collisions:11145079 txqueuelen:100 
          RX bytes:1206617378 (1.1 GiB)  TX bytes:4000069366 (3.7 GiB)
          Interrupt:16 Base address:0x5000 

cn2:~# uptime
 11:00:06 up 45 days, 13 min,  1 user,  load average: 0.00, 0.04, 0.13
cn2:~# date
Fri Jun 18 11:00:07 EDT 2004
cn2:~# 
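
To put the counters above in proportion, here is a rough sketch that expresses 
collisions and carrier errors as a percentage of transmitted packets. The 
function name tx_error_rates is hypothetical, and the parsing assumes the old 
ifconfig layout shown in this message (fields such as carrier:N and 
collisions:N); other ifconfig versions print these differently.

```shell
# Hypothetical helper: reads old-style `ifconfig` output on stdin and prints
# collisions and carrier errors as a percentage of TX packets.
tx_error_rates() {
    awk '
        /TX packets:/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^packets:/) { split($i, a, ":"); tx = a[2] }
                if ($i ~ /^carrier:/) { split($i, a, ":"); carrier = a[2] }
            }
        }
        /collisions:/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^collisions:/) { split($i, a, ":"); coll = a[2] }
        }
        END {
            if (tx > 0)
                printf "collisions %.1f%% carrier %.1f%%\n",
                       100 * coll / tx, 100 * carrier / tx
        }'
}

# Intended use (not run here): ifconfig eth0 | tx_error_rates
```

Fed the cn2 output above, this should report about 22.2% collisions and 21.4% 
carrier errors relative to transmitted packets.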

I compared this with the OLD centernet host, which is still running and has 
been up much longer than either admin or the new centernet:
OLD centernet:
centernet:~ # ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:A0:C9:E4:5B:D5  
          inet addr:172.16.2.6  Bcast:172.16.255.255  Mask:255.255.0.0
          inet6 addr: fe80::a0:c9e4:5bd5/10 Scope:Link
          inet6 addr: fe80::2a0:c9ff:fee4:5bd5/10 Scope:Link
          IPX/Ethernet 802.3 addr:00000958:00A0C9E45BD5
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:244996240 errors:0 dropped:0 overruns:0 frame:0
          TX packets:178882777 errors:0 dropped:0 overruns:0 carrier:2
          collisions:2 txqueuelen:100 
          Interrupt:14 Base address:0x2000 

centernet:~ # uptime
 11:06am  up 228 days, 5 min,  1 user,  load average: 0.99, 0.97, 0.91
centernet:~ # date
Fri Jun 18 11:06:55 EDT 2004
centernet:~ # 

So, based on all this, I think the main problem was the transfer speed. I've 
taken the steps above to correct that, although I still need to work out how 
to make the NIC come up at this fixed speed, with auto-negotiation off, at 
boot. I'm also going to increase the interface le0 speed in amanda.conf from 
400 to 4000 kbps, as the backup runs at night when the network is lightly 
loaded. I don't know why the 33G of holding disk space either isn't enough or 
isn't getting used. I'll swap the order of holding disks hd1 (only 8G) and hd2 
(33G), so maybe it'll use the larger one first.

Thanks, again, for all your help. I'll address the problem you spotted with the 
last full dump of admin://db/f$ being overwritten in another message to the 
list, as I'm confused about this, too.

-Kevin