Folks:
I just brought my ampr host back online after moving to a new facility. My initial configuration is to enable internet <-> amprnet connectivity. So I have all traffic in both incoming and outgoing directions routed IPIP'd via UCSD. The host is Linux Fedora based (will be transitioning to CentOS sometime...) and I used the guidelines set on the wiki.
One unexpected thing is packet loss. I am seeing very slow performance using this approach and rather poor packet loss. Are there any issues or QoS policies implemented on the amprnet router at UCSD?
Thank you, Assi kk7kx/4x1kx
AMPRNet host: kk7kx.ampr.org (icmp & http only for now) Internet host: phantom.kiloxray.com (hosts kk7kx)
Assi et al.,
On Mon, 2015-07-20 at 08:41 -0700, Assi Friedman wrote:
I am seeing very slow performance using this approach and rather poor packet loss.
2 weeks ago I noticed a botnet sweeping 44/8 with pings of death trying to exploit LogJam. Welcome to the global internet.
Hello Assi,
Can you elaborate on "slow performance"? Slow is significantly different than packet loss and if you're seeing loss, all performance bets are off. What does traceroute show you in terms of latency, etc? Tools like mtr can be very helpful too:
https://www.digitalocean.com/community/tutorials/how-to-use-traceroute-and-m...
--David KI6ZHD
On 2015-07-20 17:41, Assi Friedman wrote:
Performance issues might be related to the tunnel's reduced MTU and hence reduced TCP MSS. You want to "clamp" the TCP MSS. You should also make sure that you do not filter ICMP DTB (Datagram Too Big) messages, just in case some systems/software make use of the DF bit.
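The arithmetic behind MSS clamping on an IPIP tunnel can be sketched as follows. This is a minimal illustration, assuming a 1500-byte physical MTU, plain IPIP encapsulation (one extra 20-byte outer IPv4 header), and IPv4/TCP headers without options. Other link overheads, such as PPPoE, shrink the numbers further. On Linux the clamp itself is commonly applied with the iptables TCPMSS target (--clamp-mss-to-pmtu or --set-mss).

```python
"""Back-of-the-envelope MSS for an IPIP tunnel (illustrative assumptions)."""

IPV4_HEADER = 20  # bytes: IPv4 header with no options
TCP_HEADER = 20   # bytes: TCP header with no options
IPIP_OVERHEAD = IPV4_HEADER  # IPIP (IP protocol 4) prepends one outer IPv4 header


def tunnel_mtu(link_mtu: int, overhead: int = IPIP_OVERHEAD) -> int:
    """Largest inner packet that fits in one outer packet without fragmentation."""
    return link_mtu - overhead


def clamped_mss(link_mtu: int, overhead: int = IPIP_OVERHEAD) -> int:
    """MSS to advertise so inner TCP segments fit inside the tunnel MTU."""
    return tunnel_mtu(link_mtu, overhead) - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    for mtu in (1500, 1492):  # 1492 is a common PPPoE link MTU
        print(f"link MTU {mtu}: tunnel MTU {tunnel_mtu(mtu)}, MSS {clamped_mss(mtu)}")
```

Without clamping, a host on a 1500-byte Ethernet would advertise an MSS of 1460. Its full-size segments (1500 bytes with headers) would then exceed the 1480-byte tunnel MTU, so they would depend on fragmentation or on path MTU discovery, which in turn depends on the ICMP messages getting through.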
Packet loss is a different issue; obviously performance will be poor while you have packet loss. So in any case you will have to fix the packet loss first.
73 de Marc, LX1DUC
Folks: I have set up Iperf on my side to do some testing in both TCP and UDP modes as follows:
TCP -> port 7000
UDP -> port 7001
If you have Iperf installed on your Linux machine, do you mind doing a couple of runs and reporting on the results?
Thanks, Assi kk7kx/4x1kx
On Mon, Jul 20, 2015 at 9:42 AM, Marc, LX1DUC lx1duc@laru.lu wrote:
44Net mailing list 44Net@hamradio.ucsd.edu http://hamradio.ucsd.edu/mailman/listinfo/44net
Oops, left out the host name: kk7kx.ampr.org
TCP -> port 7000
UDP -> port 7001
Thanks, Assi kk7kx/4x1kx
On Mon, Jul 20, 2015 at 8:32 PM, Assi Friedman assi@kiloxray.com wrote:
On 2015-07-20 17:41, Assi Friedman wrote:
and rather poor packet loss. Are there any issues or QoS policies implemented on the amprnet router at UCSD?
There's no QoS policy in effect, but 'amprgw' is getting hammered at the moment, with inbound packet drops peaking in the 25% range so performance is going to be horrible.
It's hard to see precisely what's happening but it looks like multiple hosts (possibly a botnet) are sweeping through the 44/8 range looking for something.
There's not much we can do about this in the short term. Long term includes a higher-performance machine with faster network interfaces. - Brian
we need a white-hat bot-net to shut down that black-hat bot-net!! ;-)
On 15-07-21 02:32 AM, Brian Kantor wrote:
On Mon, Jul 20, 2015 at 10:32 PM, Brian Kantor Brian@ucsd.edu wrote:
There's no QoS policy in effect, but 'amprgw' is getting hammered at the moment, with inbound packet drops peaking in the 25% range so performance is going to be horrible.
It's hard to see precisely what's happening but it looks like multiple hosts (possibly a botnet) are sweeping through the 44/8 range looking for something.
There's not much we can do about this in the short term. Long term includes a higher-performance machine with faster network interfaces. - Brian
In the short term, why not blackhole our unused IP space? If they're sweeping, this should significantly cut the inbound traffic.
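Blackholing the unused space amounts to routing the complement of the allocated subnets to a discard interface. The sketch below computes that complement with Python's standard ipaddress module. The allocation list is made up for illustration; a real one would have to be derived from the AMPRNet portal/DNS data.

```python
"""Compute the unallocated parts of a supernet, i.e. candidate null routes."""
import ipaddress
from typing import Iterable, List


def unused_blocks(supernet: str, allocated: Iterable[str]) -> List[ipaddress.IPv4Network]:
    remaining = [ipaddress.ip_network(supernet)]
    for alloc in map(ipaddress.ip_network, allocated):
        next_remaining = []
        for block in remaining:
            if alloc.subnet_of(block):
                # Split the block around the allocation (yields nothing if equal).
                next_remaining.extend(block.address_exclude(alloc))
            elif block.subnet_of(alloc):
                continue  # block is entirely allocated; nothing to blackhole
            else:
                next_remaining.append(block)  # non-nested CIDR blocks are disjoint
        remaining = next_remaining
    return sorted(ipaddress.collapse_addresses(remaining))


if __name__ == "__main__":
    # Hypothetical allocations, for illustration only.
    for net in unused_blocks("44.0.0.0/8", ["44.24.240.0/20", "44.8.0.0/16"]):
        print(f"route {net} -> blackhole")
```

The result is a short list of aggregated prefixes rather than one route per unused address, which is what makes the approach cheap to apply in a routing table.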
In the long term, if we mitigate attacks like this, it will make CAIDA's research much less interesting. If that is a problem for them, maybe they can put some of that research grant money towards an upgraded amprgw.
Another long term solution is moving to a system of regional AMPR gateways. (I believe this has already been discussed.) This would divide 44/8 between enough routers that the aggregate inbound traffic capacity would be much higher.
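A naive version of dividing 44/8 among regional gateways is an equal split into power-of-two blocks. The sketch below assumes hypothetical gateway names and a purely mechanical split. Real regional boundaries would presumably follow where allocations actually live, which this does not attempt.

```python
"""Toy partition of 44/8 among regional gateways (hypothetical names)."""
import ipaddress
from typing import Dict, List


def regional_split(supernet: str, gateways: List[str]) -> Dict[ipaddress.IPv4Network, str]:
    if not gateways:
        raise ValueError("need at least one gateway")
    # Smallest k with 2**k >= number of gateways; extra blocks wrap around.
    bits = (len(gateways) - 1).bit_length()
    blocks = ipaddress.ip_network(supernet).subnets(prefixlen_diff=bits)
    return {block: gateways[i % len(gateways)] for i, block in enumerate(blocks)}


def gateway_for(address: str, plan: Dict[ipaddress.IPv4Network, str]) -> str:
    addr = ipaddress.ip_address(address)
    for block, gw in plan.items():
        if addr in block:
            return gw
    raise LookupError(f"{address} is outside the partitioned space")


if __name__ == "__main__":
    plan = regional_split("44.0.0.0/8", ["gw-a", "gw-b", "gw-c", "gw-d"])
    for block, gw in plan.items():
        print(block, "->", gw)
```

Each gateway then receives only the inbound traffic for its own block, so the aggregate inbound capacity scales with the number of gateways.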
Tom KD7LXL
On Tue, Jul 21, 2015 at 11:00 AM, Tom Hayward esarfl@gmail.com wrote:
Another long term solution is moving to a system of regional AMPR gateways. (I believe this has already been discussed.) This would divide 44/8 between enough routers that the aggregate inbound traffic capacity would be much higher.
Wouldn't that also provide for fail-over redundancy? If the region or home gateway went down - the other would keep traffic flowing.
Bill
On Tue, Jul 21, 2015 at 11:13:53AM -0700, Bill Vodall wrote:
On Tue, Jul 21, 2015 at 11:00 AM, Tom Hayward esarfl@gmail.com wrote:
Another long term solution is moving to a system of regional AMPR gateways. (I believe this has already been discussed.) This would divide 44/8 between enough routers that the aggregate inbound traffic capacity would be much higher.
Wouldn't that also provide for fail-over redundancy? If the region or home gateway went down - the other would keep traffic flowing.
I believe Tom is speaking of dividing up the address space among several routers, which would require some tricky route advertisement to provide any kind of redundancy. - Brian
On Tue, Jul 21, 2015 at 11:18 AM, Brian Kantor Brian@ucsd.edu wrote:
Wouldn't that also provide for fail-over redundancy? If the region or home gateway went down - the other would keep traffic flowing.
I believe Tom is speaking of dividing up the address space among several routers, which would require some tricky route advertisement to provide any kind of redundancy. - Brian
Not really tricky... Just advertise the same address space from multiple locations. The users of that address space would then need to set up tunnels to ALL of the locations the address space is advertised from. You could set it up so that the address space was only advertised when the tunnel was operational.
For example, HamWAN advertises 44.24.240/20 from a handful of locations. Those locations all have tunnels to each other, so no matter what location the inbound traffic comes in on, it is tunneled to the appropriate internal address. If one of these edge routers goes down, the address space stops getting advertised and inbound traffic starts going to the other locations instead. This is a pretty vanilla use of BGP.
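The "advertise only while the tunnel is up" behaviour can be modelled in a few lines. This is a toy sketch with hypothetical site names. The "preference" list stands in for a remote network's BGP best-path choice, which in reality depends on AS paths and local policy. It is not HamWAN's actual configuration.

```python
"""Toy model of 'advertise the prefix only while the tunnel is up'."""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EdgeSite:
    name: str
    tunnel_up: bool


def advertising_sites(sites: List[EdgeSite]) -> List[str]:
    # A site originates the shared prefix only while its tunnel is healthy.
    return [s.name for s in sites if s.tunnel_up]


def ingress_site(sites: List[EdgeSite], preference: List[str]) -> Optional[str]:
    """Return the site where inbound traffic lands, or None if withdrawn everywhere."""
    up = set(advertising_sites(sites))
    for name in preference:
        if name in up:
            return name
    return None


if __name__ == "__main__":
    sites = [EdgeSite("site-a", True), EdgeSite("site-b", True)]
    print(ingress_site(sites, ["site-a", "site-b"]))
    sites[0].tunnel_up = False  # site-a withdraws; traffic shifts to site-b
    print(ingress_site(sites, ["site-a", "site-b"]))
```

Failover falls out of the withdrawal: once a site stops advertising, remote routers simply pick the next-best path, with no coordination between sites beyond the tunnels.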
Tom KD7LXL
On 7/21/15 1:32 AM, Brian Kantor wrote:
There's not much we can do about this in the short term. Long term includes a higher-performance machine with faster network interfaces.
What is the configuration of the UCSD gateway?
I've asked this before and never received an answer, other than that it uses 10G ports on some "ancient" version of FreeBSD, but the switch/router it's uplinked to is a 1G-only switch. From what I know, the gateway software does not do anything in kernel mode, but again, the software is not public, so we can't work to improve it or study it.
Why is it so hard to get details on how this stuff works?
I'm not saying we need to have this level of detail, https://wikitech.wikimedia.org/wiki/Main_Page but it would be awesome to have some details on it.
Get me a listing of the physical interconnects and a source tarball and I'll happily draw/write/format it up and add it on the wiki.
On Tue, Jul 21, 2015 at 02:26:36PM -0400, Bryan Fields wrote:
What is the configuration of the UCSD gateway?
I answered that in a previous email earlier today. Again: it's a dual-core 3.2 GHz Xeon processor with two 1 GbE ports. The port 'em0' is connected to a 1G switch which is in turn connected at 10GbE to the building infrastructure switch/router. Port 'em1' is output-only to the network 'telescope'. The system never swaps or pages.
It does all the packet filtering, selection, and diversion using kernel-mode 'ipfw'. The very few packets which are destined for legitimate AMPR hosts are forwarded and encapsulated by a user-mode program. That program consumes almost no resource because there are so few packets headed to or from legitimate AMPR hosts and that's all it's given to handle.
Statistics and experiments show that the bottleneck is the IP input routines processing the ipfw rules. Since this is single-threaded inside the kernel, more cores over the effective 4 we have now will probably not help. As you can see from the snapshot below, the task queue for the input interface is full and that is where the packets are being dropped.
                 /0%  /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
root em0 taskq   XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle: cpu2  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle: cpu3  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle: cpu1  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle: cpu0  X
root em1 taskq   X
root ipipd       X
The relevant ipfw rules are these (table 1 contains the list of legitimate hosts derived by ANDing the gateways with the DNS. Socket 4444 is the ipip daemon's input; 192.168.44.252 is the network telescope.)
# known addresses go to the encapsulating router socket: ipipd
ipfw add divert 4444 ip from not 10.0.0.0/8,172.16.0.0/12,169.254.0.0/16,192.168.0.0/16 to 'table(1)' in not dst-port 135-139,445,1025-1028
# other 44 addresses go next door for analysis
ipfw add forward 192.168.44.252 all from any to 44.0.0.0/8
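To make the first-match behaviour of these two rules concrete, here is a toy Python model of how they sort packets. It is not ipfw and not amprgw's code. Two assumptions apply: table(1) is modelled as a list of prefixes, and a packet without a destination port is treated as not being in the excluded port list. How ipfw itself handles that last case is not covered here.

```python
"""Toy model of the two amprgw ipfw rules (first match wins)."""
import ipaddress
from dataclasses import dataclass
from typing import List, Optional

NOT_FROM = [ipaddress.ip_network(n) for n in
            ("10.0.0.0/8", "172.16.0.0/12", "169.254.0.0/16", "192.168.0.0/16")]
EXCLUDED_DST_PORTS = set(range(135, 140)) | {445} | set(range(1025, 1029))
AMPRNET = ipaddress.ip_network("44.0.0.0/8")


@dataclass
class Packet:
    src: str
    dst: str
    inbound: bool = True
    dst_port: Optional[int] = None  # None models a packet without ports


def classify(pkt: Packet, table1: List[ipaddress.IPv4Network]) -> str:
    src = ipaddress.ip_address(pkt.src)
    dst = ipaddress.ip_address(pkt.dst)
    # Rule 1: known hosts go to the ipipd divert socket (4444).
    if (pkt.inbound
            and not any(src in net for net in NOT_FROM)
            and any(dst in net for net in table1)
            and pkt.dst_port not in EXCLUDED_DST_PORTS):
        return "divert 4444"
    # Rule 2: every other 44/8 destination goes to the telescope.
    if dst in AMPRNET:
        return "forward 192.168.44.252"
    return "no match (later rules decide)"
```

In this model, a packet for a legitimate host on an excluded port (445, say) does not match rule 1, so it falls through to rule 2 and is forwarded to the telescope like any other 44/8 traffic.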
Turning off the filtering/diversion ('ipfw disable firewall') almost immediately ends the congestion with the em0 taskq sitting below 50% and packets no longer get dropped. Turning it back on resumes the problem. Of course, when it's off, no ipip is processed. - Brian
Would breaking the rule out into separate smaller rules help improve the bottleneck?
It looks like ipfw uses "first match wins" semantics, so perhaps re-ordering could help: filter all the bogon and RFC 1918 IPs out first, then filter out the NetBIOS traffic and anything else that globally shouldn't be allowed, leaving a simplified divert rule at the end.
On 7/21/15 3:51 PM, Brian Kantor wrote:
On Tue, Jul 21, 2015 at 02:26:36PM -0400, Bryan Fields wrote:
What is the configuration of the UCSD gateway?
I answered that in a previous email earlier today. Again: it's a dual-core 3.2 GHz Xeon processor with two 1 GbE ports. The port 'em0' is connected to a 1G switch which is in turn connected at 10GbE to the building infrastructure switch/router. Port 'em1' is output-only to the network 'telescope'. The system never swaps or pages.
It's a partial answer. What version of FBSD is it running, how much RAM does it have, which southbridge, and which chipset are the NICs? All these things matter immensely in a software router.
It does all the packet filtering, selection, and diversion using kernel-mode 'ipfw'. The very few packets which are destined for legitimate AMPR hosts are forwarded and encapsulated by a user-mode program. That program consumes almost no resource because there are so few packets headed to or from legitimate AMPR hosts and that's all it's given to handle.
Cool, where is the source of the gateway program? If it's not open source, why not?
It would be cool to have some NetFlow or real-time metering of the legit AMPRnet traffic over the gateway, a la an AMPRnet dashboard.
Statistics and experiments show that the bottleneck is the IP input routines processing the ipfw rules. Since this is single-threaded inside the kernel, more cores over the effective 4 we have now will probably not help. As you can see from the snapshot below, the task queue for the input interface is full and that is where the packets are being dropped.
                 /0%  /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
root em0 taskq   XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle: cpu2  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle: cpu3  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle: cpu1  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle: cpu0  X
root em1 taskq   X
root ipipd       X
<snip>
Turning off the filtering/diversion ('ipfw disable firewall') almost immediately ends the congestion with the em0 taskq sitting below 50% and packets no longer get dropped. Turning it back on resumes the problem. Of course, when it's off, no ipip is processed.
So this is interesting: FBSD had some issues with single-threading in older releases, which is why knowing the release it's running would help. In the newer releases the netperf team has really improved ipfw performance via parallelization. My first-hand experience with FBSD is a bit lacking; it's been a few years since I've touched a FBSD box other than as an Olive.
TBQH, Linux has a better networking stack in terms of performance now. Most of my experience is with the newer Linux kernels, and in the internal testing I've seen at work they've been able to handle 4xQSFP in and out (160G full duplex). The iptables filter scales nicely using SMP, too.
I'm not sure about DPDK on FBSD either, but Linux is able to make use of it for packet filtering now.
73's
Well, I found and fixed the cause of the packet loss problem.
It turns out that the single-threaded nature of IP processing in the FreeBSD kernel means that when ipfw was told to forward the packet to the network telescope, the process blocked for a significant period of time while the outgoing packet was rewritten and enqueued. This caused the inbound work queue to lengthen to the point where incoming packets were ignored and dropped, which played Hob with the throughput for udp and tcp connections.
With the cooperation of the CAIDA people, we stopped forwarding packets to the telescope and will instead feed it off a network switch mirror port. They will filter out our legitimate subnets, leaving the IBR (Internet background radiation) that they want to study.
That this was the cause is shown by now lossless pinging of various end destinations on AMPRNet. For example,
--- kk7kx.ampr.org ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 99120ms
rtt min/avg/max/mdev = 26.457/30.128/189.503/17.459 ms
And is also shown by the reduced task queue on the input interface
                 /0%  /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
root idle: cpu3  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle: cpu2  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle: cpu1  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle: cpu0  XXXXXXXXXXXXXXXXXXXXXXXXX
root em0 taskq   XXXXXXXXXXXXXXXXXXXXX
root ipipd       X
At the moment, input packet rates are running around 12 MB/s, which the amprgw seems to be handling easily. It's possible that the remaining ipfw rules could be optimized somewhat to reduce the input queue even further, but I think I'll call it a day for now. - Brian
Thanks for the feedback, and great job!
Pardon my brevity, as I'm on a smartphone. Sent through via axMail-fax by N1URO.
On July 21, 2015 11:15:38 PM Brian Kantor Brian@UCSD.Edu wrote:
Wow, what a difference. Here's iperf results from home to kk7kx.ampr.org:
[root@raptor phantom]# iperf -c kk7kx.ampr.org -p 7000 -i 1
------------------------------------------------------------
Client connecting to kk7kx.ampr.org, TCP port 7000
TCP window size: 19.1 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.12 port 51468 connected with 44.8.0.160 port 7000
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 1.12 MBytes 9.44 Mbits/sec
[ 3] 1.0- 2.0 sec 1.00 MBytes 8.39 Mbits/sec
[ 3] 2.0- 3.0 sec 1.12 MBytes 9.44 Mbits/sec
[ 3] 3.0- 4.0 sec 896 KBytes 7.34 Mbits/sec
[ 3] 4.0- 5.0 sec 896 KBytes 7.34 Mbits/sec
[ 3] 5.0- 6.0 sec 768 KBytes 6.29 Mbits/sec
[ 3] 6.0- 7.0 sec 512 KBytes 4.19 Mbits/sec
[ 3] 7.0- 8.0 sec 640 KBytes 5.24 Mbits/sec
[ 3] 8.0- 9.0 sec 640 KBytes 5.24 Mbits/sec
[ 3] 9.0-10.0 sec 512 KBytes 4.19 Mbits/sec
[ 3] 0.0-10.2 sec 8.12 MBytes 6.70 Mbits/sec
[root@raptor phantom]# iperf -c kk7kx.ampr.org -p 7001 -i 1 -u
------------------------------------------------------------
Client connecting to kk7kx.ampr.org, UDP port 7001
Sending 1470 byte datagrams
UDP buffer size: 122 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.12 port 53954 connected with 44.8.0.160 port 7001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 129 KBytes 1.06 Mbits/sec
[ 3] 1.0- 2.0 sec 128 KBytes 1.05 Mbits/sec
[ 3] 2.0- 3.0 sec 128 KBytes 1.05 Mbits/sec
[ 3] 3.0- 4.0 sec 128 KBytes 1.05 Mbits/sec
[ 3] 4.0- 5.0 sec 128 KBytes 1.05 Mbits/sec
[ 3] 5.0- 6.0 sec 128 KBytes 1.05 Mbits/sec
[ 3] 6.0- 7.0 sec 129 KBytes 1.06 Mbits/sec
[ 3] 7.0- 8.0 sec 128 KBytes 1.05 Mbits/sec
[ 3] 8.0- 9.0 sec 128 KBytes 1.05 Mbits/sec
[ 3] 9.0-10.0 sec 128 KBytes 1.05 Mbits/sec
[ 3] 0.0-10.0 sec 1.25 MBytes 1.05 Mbits/sec
[ 3] Sent 893 datagrams
[ 3] Server Report:
[ 3] 0.0-10.0 sec 1.25 MBytes 1.05 Mbits/sec 1.073 ms 0/ 893 (0%)
Note the 0% packet loss in UDP mode. MTR results are also consistent with this.
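The UDP summary can be cross-checked from its raw counts: 893 datagrams of 1470 bytes over 10 seconds. The sketch assumes iperf 2's usual units, with 2^20-byte MBytes for transfer and 10^6-bit Mbits for bandwidth. Those units reproduce the reported 1.25 MBytes and 1.05 Mbits/sec. Note that the iperf 2 UDP client sends at a target rate of 1 Mbit/s unless -b is given. This run therefore shows the loss at about 1 Mbit/s of offered load, not the capacity of the path.

```python
"""Cross-check an iperf 2 UDP summary from its raw counts (illustrative)."""


def udp_report(datagrams: int, lost: int, seconds: float, size: int = 1470):
    total_bytes = datagrams * size
    mbytes = total_bytes / 2**20                   # transfer column: 1 MByte = 2**20 bytes
    mbits_per_s = total_bytes * 8 / seconds / 1e6  # bandwidth column: 1 Mbit = 10**6 bits
    loss = lost / datagrams
    return mbytes, mbits_per_s, loss


if __name__ == "__main__":
    mb, rate, loss = udp_report(datagrams=893, lost=0, seconds=10.0)
    print(f"{mb:.2f} MBytes  {rate:.2f} Mbits/sec  {loss:.0%} lost")
```

To probe for loss at higher rates, a UDP run with a larger -b value would be needed.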
Thanks Brian! Assi
On Tue, Jul 21, 2015 at 8:15 PM, Brian Kantor Brian@ucsd.edu wrote:
Right on! Excellent work, Brian, and a learning experience too... thanks for sharing your progress :-) Fewer watts consumed thanks to your work, too!
Hello Brian,
Thanks for working on this issue. Few questions:
1. Was this "telescope" experiment a recent change to the system or has this been there for a long time?
2. Is there a specific reason why you're using FreeBSD vs. Linux? I would assume that linux's iptables is threaded and could perform better but I don't know for sure.
3. I liked Tom Hayward's idea to automatically filter netblocks that aren't activated in the portal / DNS. That seems like a very cheap way to knock out known bogus traffic. Ideally this would be done at the farthest edge of the network to prevent the traffic from ever even reaching the Dell server.
--David KI6ZHD
On 07/21/2015 08:15 PM, Brian Kantor wrote:
On Wed, Jul 22, 2015 at 08:37:25AM -0700, David Ranch wrote:
- Was this "telescope" experiment a recent change to the system or
has this been there for a long time?
That configuration was there for many years; it's only in the face of the tremendous number of scans we're seeing that it became a problem. It's a non-linear thing: as long as the amount of incoming crud stays below a threshold the system doesn't lose packets, and when the crud exceeds that threshold the system suffers congestive collapse.
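A toy bounded-queue model shows the threshold behaviour described here. Loss stays at zero while the offered load is at or below the service rate. Once the load exceeds that rate, the queue fills and the excess is dropped. All numbers are made up and none are amprgw measurements. The model also does not capture the extra per-packet blocking of the old telescope forwarding, which effectively lowered the service rate itself.

```python
"""Toy bounded-queue model of loss at an input queue (made-up numbers)."""


def drop_fraction(arrivals: int, service: int, limit: int, ticks: int) -> float:
    """Per tick: offer 'arrivals' packets, keep at most 'limit' queued, drain 'service'."""
    queue = dropped = 0
    for _ in range(ticks):
        accepted = min(arrivals, limit - queue)  # tail-drop whatever doesn't fit
        dropped += arrivals - accepted
        queue += accepted
        queue -= min(queue, service)             # drain at the service rate
    return dropped / (arrivals * ticks)


if __name__ == "__main__":
    for load in (80, 95, 100, 110, 125, 150):
        frac = drop_fraction(load, 100, 1000, 10_000)
        print(f"offered {load}% of capacity -> {frac:.1%} dropped")
```

Below capacity the queue never builds. Above it, the loss converges toward (load - capacity) / load, so the queue size only delays the onset of loss.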
- Is there a specific reason why you're using FreeBSD vs. Linux?
I would assume that linux's iptables is threaded and could perform better but I don't know for sure.
I don't know either. The existing system was designed when Linux was still a toy and so it wasn't a consideration. I don't know if Linux would be superior in this precise environment; I know that in tests I've made, Linux has shown poorer network performance than FreeBSD.
And historically, the UC system invented BSD and as a result I know it much better than I know Linux. Perhaps someone with enough time on their hands could implement this configuration on both systems and make a definitive comparison.
- I liked Tom Hayward's idea to automatically filter netblocks
that aren't activated in the portal / DNS. That seems like a very cheap way to knock out known bogus traffic. Ideally this would be done at the farthest edge of the network to prevent the traffic from ever even reaching the Dell server.
It's a good idea but unfortunately impractical; to do so requires administrative access to the campus border router that we don't have. - Brian
On 7/22/15 11:17 AM, Brian Kantor wrote:
- Is there a specific reason why you're using FreeBSD vs. Linux?
I would assume that linux's iptables is threaded and could perform better but I don't know for sure.
I don't know either. The existing system was designed when Linux was still a toy and so it wasn't a consideration. I don't know if Linux would be superior in this precise environment; I know that in tests I've made, Linux has shown poorer network performance than FreeBSD.
pf and ipfw on FreeBSD are true stateful firewalls, whereas no Linux firewall that I'm aware of is truly stateful. iptables treats each packet individually, while pf/ipfw will add it as a flow and track bi-directional traffic for the duration of the connection. This is why pf/ipfw are not threaded; however, they do automatically optimize rule sets when you load them to be as efficient as possible.
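The idea of tracking a connection as one bidirectional flow can be sketched with a direction-independent 5-tuple key, so that A->B and B->A packets land in the same entry. This is a conceptual illustration only. It is not how pf or ipfw implement their state tables, and it omits TCP state transitions and timeouts. The addresses in the usage lines are placeholders.

```python
"""Minimal bidirectional flow table keyed on a direction-independent 5-tuple."""
from typing import Dict, Tuple

FlowKey = Tuple[str, int, str, int, str]


def flow_key(src: str, sport: int, dst: str, dport: int, proto: str) -> FlowKey:
    # Order the two endpoints so A->B and B->A map to the same key.
    a, b = (src, sport), (dst, dport)
    lo, hi = (a, b) if a <= b else (b, a)
    return (lo[0], lo[1], hi[0], hi[1], proto)


class FlowTable:
    def __init__(self) -> None:
        self.packets: Dict[FlowKey, int] = {}

    def observe(self, src: str, sport: int, dst: str, dport: int, proto: str = "tcp") -> bool:
        """Record a packet; return True if it belongs to an already-known flow."""
        key = flow_key(src, sport, dst, dport, proto)
        known = key in self.packets
        self.packets[key] = self.packets.get(key, 0) + 1
        return known


if __name__ == "__main__":
    table = FlowTable()
    print(table.observe("198.51.100.7", 51468, "44.8.0.160", 7000))  # new flow
    print(table.observe("44.8.0.160", 7000, "198.51.100.7", 51468))  # reply, same flow
```

A firewall that keeps such a table can make a single rule decision when the flow is created and then match later packets with one lookup.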
- I liked Tom Hayward's idea to automatically filter netblocks
that aren't activated in the portal / DNS. That seems like a very cheap way to knock out known bogus traffic. Ideally this would be done at the farthest edge of the network to prevent the traffic from ever even reaching the Dell server.
It's a good idea but unfortunately impractical; to do so requires administrative access to the campus border router that we don't have.
Filtering at a router is a surefire way to bring throughput to a crawl. Proper campus routers are designed with ASICs optimized for routing in hardware, and firewalling is done in software. I have seen enterprise small-office routers handle 450-500 Mbps of straight routing but max out around 40 Mbps when firewalling because it's CPU-bound. The results are similar when stepping up to large chassis routers.
A better option, in my opinion, is splitting up tunnelling and firewalling onto separate machines. This would allow whichever system handles firewalling or tunnelling best to be configured for each task, and it would increase throughput capacity. Of course, this does require more rack space, power, cooling, another system to configure, and someone with the time and energy to set it up.
On Wed, Jul 22, 2015 at 12:05 PM, Will Gwin N5KH@n5kh.org wrote:
- I liked Tom Hayward's idea to automatically filter netblocks
that aren't activated in the portal / DNS. That seems like a very cheap way to knock out known bogus traffic. Ideally this would be done at the farthest edge of the network to prevent the traffic from ever even reaching the Dell server.
It's a good idea but unfortunately impractical; to do so requires administrative access to the campus border router that we don't have.
Filtering at a router is a sure fire way to bring throughput to a crawl. Proper campus routers are designed with ASICs optimized for routing in hardware, and fire-walling is done in software. I have seen enterprise small office routers handle 450~500mbps of straight routing but max out around 40mbps when fire-walling because it's CPU bound. The results are similar when stepping up to large chassis routers.
Recall that the original suggestion was to null route unused subnets. This is a routing operation, not a filtering operation. The ASICs should handle it fine.
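The point about null routing is that the router performs the same longest-prefix lookup it does for every packet; unused space simply resolves to a discard route. A toy Python model of that lookup, assuming a hypothetical routing table with placeholder next hops (real routers do this in a hardware FIB, not with a linear scan):

```python
import ipaddress
from typing import Optional

# Hypothetical table: a catch-all null route for 44.0.0.0/8 plus
# more-specific routes for subnets that have tunnel endpoints.
# The next hops are placeholder documentation addresses.
ROUTES = {
    ipaddress.ip_network("44.0.0.0/8"): "discard",
    ipaddress.ip_network("44.2.10.0/29"): "tunnel via 192.0.2.1",
    ipaddress.ip_network("44.24.240.0/20"): "tunnel via 198.51.100.7",
}


def lookup(dst: str) -> Optional[str]:
    """Longest-prefix match: the most specific matching route wins."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None  # not covered here; would follow the default route
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


if __name__ == "__main__":
    for dst in ("44.2.10.3", "44.24.241.9", "44.99.1.1"):
        print(dst, "->", lookup(dst))
```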
Better logic would be to use an IGP to only advertise valid subnets. This way traffic without a destination would be dropped at UCSD's edge (or wherever the IGP reached). Brian mentioned that administrative access to the campus border router would be required--this isn't completely true. To be effective, the IGP would only have to reach beyond the bottleneck (in this case, put it right in front of amprgw instead of all the way back at the border). If you request traffic for 44/8, you're going to get all of it. If you only request traffic for a few subnets, that'll be a lot less data to send through your filter rules.
Tom KD7LXL
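Advertising only valid subnets implies building a list of prefixes to announce toward the upstream router. This sketch only computes that list; it does not speak OSPF or any other IGP, and the input subnets are hypothetical. Aggregating adjacent and overlapping subnets keeps the advertisement small.

```python
import ipaddress
from typing import Iterable, List


def prefixes_to_advertise(activated: Iterable[str]) -> List[str]:
    """Merge adjacent or overlapping activated subnets into a minimal list."""
    nets = [ipaddress.ip_network(p) for p in activated]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


if __name__ == "__main__":
    # Hypothetical activated subnets, e.g. as exported from the portal.
    activated = ["44.2.10.0/29", "44.2.10.8/29",
                 "44.24.240.0/20", "44.24.241.0/24"]
    print(prefixes_to_advertise(activated))
    # prints ['44.2.10.0/28', '44.24.240.0/20']
```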
On 22 Jul 2015, at 9:05 PM, Will Gwin N5KH@n5kh.org wrote:
On 7/22/15 11:17 AM, Brian Kantor wrote:
- Is there a specific reason why you're using FreeBSD vs. Linux?
I would assume that linux's iptables is threaded and could perform better but I don't know for sure.
I don't know either. The existing system was designed when Linux was still a toy and so it wasn't a consideration. I don't know if Linux would be superior in this precise environment; I know that in tests I've made, Linux has shown poorer network performance than FreeBSD.
pf and ipfw on FreeBSD are true stateful firewalls, where no Linux firewall that I'm aware of is truly stateful. iptables treats each packet individually where pf/ipfw will add it as a flow and track bi-directional traffic for the duration of the connection. This is why pf / ipfw are not threaded, however they do automatically optimize rule sets when you load them to be as efficient as possible.
iptables (Netfilter) has had stateful connection tracking from day one (AFAIK, for about 14 years now).
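To make the stateful/stateless distinction concrete: a stateful filter applies its policy to the first packet of a flow, then admits later packets in either direction by looking them up in a table. A minimal Python model, assuming a hypothetical "hosts inside 44/8 may open new flows" policy. Real connection tracking (Netfilter conntrack, pf states) also follows TCP state and expires idle entries, which this omits.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Set, Tuple

FlowKey = Tuple[str, str, int, str, int]


@dataclass(frozen=True)
class Packet:
    proto: str  # e.g. "tcp" or "udp"
    src: str
    sport: int
    dst: str
    dport: int


def flow_key(p: Packet) -> FlowKey:
    """Direction-independent key, so a reply maps to the same flow."""
    a, b = (p.src, p.sport), (p.dst, p.dport)
    lo, hi = (a, b) if a <= b else (b, a)
    return (p.proto, lo[0], lo[1], hi[0], hi[1])


class StatefulFilter:
    def __init__(self, allow_new: Callable[[Packet], bool]) -> None:
        self.allow_new = allow_new  # policy applied only to a flow's first packet
        self.flows: Set[FlowKey] = set()

    def check(self, p: Packet) -> bool:
        key = flow_key(p)
        if key in self.flows:
            return True  # established flow, either direction
        if self.allow_new(p):
            self.flows.add(key)
            return True
        return False


AMPRNET = ipaddress.ip_network("44.0.0.0/8")

# Hypothetical policy: only hosts inside 44/8 may open new flows.
fw = StatefulFilter(lambda p: ipaddress.ip_address(p.src) in AMPRNET)
```

An outbound connection from 44.2.10.3 is admitted and recorded, its reply is then admitted from the table, and an unsolicited probe toward the same host is dropped.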
Both Linux and BSD network stacks are very mature in both features and performance. Differences exist (I believe FreeBSD is more efficient per packet) but won't matter for AMPRNet (ever) since the kernels are now being tuned to keep up with the fastest NICs on the market (that is, 100Gb/s, which is probably faster than all AMPRNet tunnels combined for the foreseeable future). At work, my colleague runs the 10Gb/s-capable firewall for the whole division with IPFW (no issues, so no need to re-write the policy for PF):
http://stats.meraka.csir.co.za/cacti/graph_view.php?action=tree&tree_id=...
Filtering at a router is a sure fire way to bring throughput to a crawl. Proper campus routers are designed with ASICs optimized for routing in hardware, and fire-walling is done in software. I have seen enterprise small office routers handle 450~500mbps of straight routing but max out around 40mbps when fire-walling because it's CPU bound. The results are similar when stepping up to large chassis routers.
Does the gateway need stateful filtering?
If not, this can be done at line-rate at the router with ACLs.
I would be curious to know if the current gateway is configured to track connection states, and if so, how many concurrent connections it peaks at?
On Wed, Jul 22, 2015 at 11:32:13PM +0200, Simeon Miteff wrote:
I would be curious to know if the current gateway is configured to track connection states, and if so, how many concurrent connections it peaks at?
It is not, it's a simple packet forwarder with no statistics gathering instrumentation. - Brian
Brian et al;
It is not, it's a simple packet forwarder with no statistics gathering instrumentation.
You could monitor it with a simple stats tool such as MRTG or if you want deeper stats look into Ganglia.
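MRTG-style graphing boils down to polling an interface octet counter (via SNMP) at a fixed interval and dividing the difference by the interval. A Python sketch of that arithmetic with made-up sample values. Note that a 32-bit counter wraps in about 34 seconds at 1 Gb/s, so with 5-minute polling on a 1G port like em0 the 64-bit counters (ifHCInOctets) are needed; the modulo below can only absorb a single wrap.

```python
def rate_bps(prev_octets: int, curr_octets: int, interval_s: float,
             counter_bits: int = 32) -> float:
    """Bits per second from two samples of an interface octet counter.

    The modulo handles at most one counter wrap between samples.
    """
    delta = (curr_octets - prev_octets) % (2 ** counter_bits)
    return delta * 8 / interval_s


def wrap_time_s(link_bps: float, counter_bits: int = 32) -> float:
    """Seconds for an octet counter to wrap at full line rate."""
    return (2 ** counter_bits) * 8 / link_bps


if __name__ == "__main__":
    print(rate_bps(4_294_967_000, 1_000, 300))  # sample spanning one wrap
    print(f"{wrap_time_s(1e9):.1f} s")          # 32-bit counter at 1 Gb/s
```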
Hello Will,
pf and ipfw on FreeBSD are true stateful firewalls, where no Linux firewall that I'm aware of is truly stateful. iptables treats each packet individually
Linux's ipfwadm and ipchains were stateless "packet filters" but iptables has been fully stateful for many many years. We are now at the cusp of nftables on Linux which makes things even more programmable though I don't know about the performance.
where pf/ipfw will add it as a flow and track bi-directional traffic for the duration of the connection. This is why pf / ipfw are not threaded, however they do automatically optimize rule sets when you load them to be as efficient as possible.
There are many ways to thread stateful firewalls. A common way is to pin each flow (identified by its 5-tuple) to its own thread via a hash or other methods. This is how commercial vendors have been doing it, getting very high throughput with network processors (just a sea of simple cores).
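A sketch of the hashing approach, assuming N worker threads: sort the two endpoints before hashing so that both directions of a flow land on the same worker and its state never has to be shared. BLAKE2 is used here only as a convenient stable hash; NICs commonly do this in hardware with a Toeplitz hash (RSS), which this does not reproduce.

```python
import hashlib


def worker_for(proto: str, src: str, sport: int, dst: str, dport: int,
               n_workers: int) -> int:
    """Map a flow to a worker so both directions land on the same one."""
    lo, hi = sorted([(src, sport), (dst, dport)])
    key = f"{proto}|{lo[0]}|{lo[1]}|{hi[0]}|{hi[1]}".encode()
    # hashlib gives a stable hash; Python's built-in hash() of str is
    # randomized per process.
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % n_workers


if __name__ == "__main__":
    fwd = worker_for("tcp", "44.2.10.3", 40000, "93.184.216.34", 80, 8)
    rev = worker_for("tcp", "93.184.216.34", 80, "44.2.10.3", 40000, 8)
    print(fwd, rev, fwd == rev)
```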
Filtering at a router is a sure fire way to bring throughput to a crawl. Proper campus routers are designed with ASICs optimized for routing in hardware, and fire-walling is done in software.
Modern ASIC-based firewalls can handle hundreds of thousands of stateless filters on a per-interface basis. I don't think we would need anything stateful here, as blocking all non-registered AMPR addresses would probably knock out 90% of the DDoS/scanning traffic being seen.
I have seen enterprise small office routers handle 450~500mbps of straight routing but max out around 40mbps when fire-walling because it's CPU bound. The results are similar when stepping up to large chassis routers.
It depends on the class of devices you're buying. There are many inexpensive enterprise-grade firewalls (always stateful) that can run at many hundreds of Mbps, and a few thousand dollars will get you into the 10G+ range.
A better option in my opinion is splitting up tunnelling and fire-walling onto separate machines. This would allow which ever system can handle fire-walling or tunnelling best to be configured for each task and would increase throughput capacity. Of course this does require more rack-space, power, cooling, another system to configure, and someone with the time and energy to set it up.
Yeah.. but we don't need that throughput or scale. Just statelessly filtering at the border edge with a modern router would solve much of these issues.
--David
On 7/22/15 10:27 PM, David Ranch wrote:
Linux's ipfwadm and ipchains were stateless "packet filters" but iptables has been fully stateful for many many years. We are now at the cusp of nftables on Linux which makes things even more programmable though I don't know about the performance.
My mistake. It's been years since I was actively comparing the different fire-walling methods in use by Linux. I went hardware for years and only within the last few years went to software, when I went OpenBSD due to native IPsec support as well as pf.
Filtering at a router is a sure fire way to bring throughput to a crawl. Proper campus routers are designed with ASICs optimized for routing in hardware, and fire-walling is done in software.
Modern ASIC based firewalls can handle 100,000s of stateless filters on a per interface basis.
Note I said 'router', not 'firewall'. Routers are designed from the silicon up to forward packets, reduce broadcast domains and connect networks. Firewalls are designed from the silicon up to restrict the flow of packets. Yes firewalls will forward packets from one network to another, but their primary purpose is inspection and restriction.
I have seen enterprise small office routers handle 450~500mbps of straight routing but max out around 40mbps when fire-walling because it's CPU bound. The results are similar when stepping up to large chassis routers.
It depends on the class of devices you're buying. There are many inexpensive Enterprise grade firewalls (always stateful) that can run many 100s of Megabit and a few thousand dollars will get you into the 10G+ range.
Again, please note that I said 'router', not 'firewall'. The router I was referring to in that specific example was a Cisco enterprise branch router. Campus and data center grade routers do minimal traffic filtering, if any, due to the CPU hit it incurs, hence why large hardware firewalls exist. Proper tool for the job.
Yeah.. but we don't need that throughput or scale.
The current configuration was choking, hence the discussion. Brian has worked with CAIDA and resolved the congestion for now.
Just statelessly filtering at the border edge with a modern router would solve much of these issues.
Please note that router and firewall are not the same thing. They can do the same job, but not as effectively as the device purpose built for the job. Also Brian already stated:
The port 'em0' is connected to a 1G switch which is in turn connected at 10GbE to the building infrastructure switch/router.
and
to do so requires administrative access to the campus border router that we don't have.
Fire-walling is done at the AMPR edge, but traffic was overwhelming the current configuration. Moving filtering to the provider router is technologically improper and operationally restricted, hence my suggestion to split filtering and tunneling onto separate machines to increase capacity.
The suggestion Tom made of running an IGP to selectively advertise only subnets which have valid destinations via the tunnels would also restrict the amount of traffic that will ultimately be blocked from reaching the firewall. This type of routing combined with a large null route is a common practice in large enterprise networks. Reducing the amount of traffic that is going to get blocked from reaching the AMPR edge will help system load but won't help with the timeouts due to slow [or down] tunnel peers.
As this thread has demonstrated, there are a few different ways to increase capacity of the AMPR gateway. While it may not be necessary at this time, it's still useful information to have for whoever is going to be responding next time there is an issue. -- Will
Hello Will,
Modern ASIC based firewalls can handle 100,000s of stateless filters on a per interface basis.
Note I said 'router', not 'firewall'. Routers are designed from the silicon up to forward packets, reduce broadcast domains and connect networks. Firewalls are designed from the silicon up to restrict the flow of packets. Yes firewalls will forward packets from one network to another, but their primary purpose is inspection and restriction.
Oops... and I misspoke too. Modern ASIC "ROUTERS" can handle 100k filter rules without much impact on interface performance. That does depend on the grade of the ASIC router, though.
Again, please note that I said 'router', not 'firewall'. As to the type of router I was referring to in that specific example was a Cisco enterprise branch router. Campus and data center grade routers do minimal traffic filtering if any due to the CPU hit they incur, hence why large hardware firewalls exist. Proper tool for the job.
Branch routers aren't always ASIC based and many can be CPU based still. Also consider that many Cisco enterprise products are really switches with some level of L3/L4 routing included. When I talk about ASIC-based routers, I'm specifically talking about service provider class like Juniper MX, Cisco ASR, etc.
Just statelessly filtering at the border edge with a modern router would solve much of these issues.
Please note that router and firewall are not the same thing. They can do the same job, but not as effectively as the device purpose built for the job. Also Brian already stated:
Absolutely agreed, and I personally think that for this role at UCSD, a stateless packet filter allowing only the specific ACTIVATED AMPR routes would work well. Tom also suggested running an IGP routing protocol on the FreeBSD box as a way to update the FIB routes on the upstream UCSD routers. That would also work well, dropping traffic through the lack of a route rather than through a stateless packet filter.
Fire-walling is done at the AMPR edge, but traffic was overwhelming the current configuration. Moving filtering to the provider router is technologically improper and operationally restricted, hence my suggestion to split filtering and tunneling onto separate machines to increase capacity.
This is a problem that has been solved for a LONG time now. There are ways to programmatically update remote routers with dynamic filters, etc. One method to do this is BGP Flow Specification (FlowSpec). There are other mechanisms to do this as well.
The suggestion Tom made of running an IGP to selectively advertise only subnets which have valid destinations via the tunnels would also restrict the amount of traffic that will ultimately be blocked from reaching the firewall. This type of routing combined with a large null route is a common practice in large enterprise networks. Reducing the amount of traffic that is going to get blocked from reaching the AMPR edge will help system load but won't help with the timeouts due to slow [or down] tunnel peers.
Agreed but this specific issue wasn't about tunnel peers and like any other Internet based network, the end to end performance depends on every link in the chain.
As this thread has demonstrated, there are a few different ways to increase capacity of the AMPR gateway. While it may not be necessary at this time, it's still useful information to have for whoever is going to be responding next time there is an issue.
While I agree the current problem has been mitigated, I think that a few additional improvements could go a LONG way to prevent this from happening again in the face of, say, DDoS attacks.
--David KI6ZHD