Subject: Re: [44net] Tunnel mesh is (mostly) down
From: Eric Fort eric.fort@gmail.com
Date: 01/03/2015 05:35 PM
To: AMPRNet working group 44net@hamradio.ucsd.edu
How about eliminating this issue permanently and moving to voluntary peering between gateways? "Know thy neighbor and be responsible for your peers and routes" seems to work really well for everyone else, yet AMPRNet still relies upon route distribution from a single source.
Eric AF6EP
It is still not clear to me what exactly happened or how it was resolved, but what I saw here is that the number of tunnel routes decreased dramatically, which disconnected the stations on IPIP, including myself.
We are already announced via BGP. That is not the problem. As a properly set up gateway, we are on both BGP and the IPIP tunnel mesh, and we configure the latter using RIP (ampr-ripd). What apparently happened is that we received RIP broadcasts containing only a very small subset of the active routes, and over time most routes got deleted. So we still had full internet connectivity for net 44.137.0.0/16 and our statically connected subnets, but the tunnel routes inside the country and to the rest of the world mostly vanished.
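To illustrate why a run of incomplete broadcasts slowly empties the table rather than breaking it at once, here is a minimal sketch of RIP-style route aging. It assumes a simplified model where each route is refreshed whenever it appears in a broadcast and deleted once it has not been refreshed for longer than a timeout; the class name, the timeout value and the prefixes are hypothetical, and real RIP (RFC 2453) and ampr-ripd may handle timers and garbage collection differently.

```python
from dataclasses import dataclass, field


@dataclass
class RouteTable:
    # Hypothetical timeout in seconds; not taken from ampr-ripd.
    timeout: float = 600.0
    # prefix -> time (seconds) of the last broadcast that contained it
    last_seen: dict = field(default_factory=dict)

    def receive_broadcast(self, prefixes, now):
        """Refresh every prefix contained in one received RIP broadcast set."""
        for prefix in prefixes:
            self.last_seen[prefix] = now

    def expire(self, now):
        """Delete routes not refreshed within the timeout and return them."""
        stale = [p for p, t in self.last_seen.items() if now - t > self.timeout]
        for prefix in stale:
            del self.last_seen[prefix]
        return stale


table = RouteTable(timeout=600)
# A full broadcast at t=0, then only a small subset at t=300.
table.receive_broadcast(["44.137.10.0/24", "44.137.20.0/24", "44.140.0.0/24"], now=0)
table.receive_broadcast(["44.137.10.0/24"], now=300)
dropped = table.expire(now=700)
```

In this model nothing is lost while the short broadcasts are arriving. Only the routes missing from every recent broadcast silently age out, which matches the gradual disappearance described above.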
We don't really need tunnels to everywhere in the world, but we run them to remain compatible with the rest of the system. It would be sufficient for us to have tunnels only to our local users and have the remainder of the traffic routed over the plain internet.
However:
- there is no way in the portal to specify that you want to use tunnel routes only for your own subnets
- there are access lists in use at other stations that would block traffic going outside the tunnel system, because they want to limit traffic to only net-44, so we would get obscure routing problems
So for now we will keep running a tunnel mesh system, and I hope that everyone else who values functionality over other considerations will do the same. (I fail to see how a single-point-of-failure solution can be a worse choice than a configuration that does not work *at all* even when everything is up and running.)
In the meantime, I hope some people can find time to debug what was going on here. I have seen a similar problem in the RIP broadcasts before: a set of routes that appears and disappears at random. They appear in some RIP broadcast sets, are missing from others, then reappear, and so on. There must be a problem somewhere, but it is unclear whether it is in the RIP server or in the code that delivers information to it. Maybe this morning's problem was caused by the same bug, as it appears to have affected only a subset of all routes.
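One way to start that debugging would be to record the prefixes in each received broadcast set and look for routes that vanish and later come back. The sketch below assumes the broadcast sets have already been captured as Python sets of prefix strings, one per broadcast, in arrival order; the function name and sample data are hypothetical, and capturing the broadcasts is not shown.

```python
def find_flapping(snapshots):
    """Return prefixes that are missing from some broadcast set between
    their first and last appearance, i.e. they disappear and reappear.

    snapshots: list of sets of prefix strings, one per broadcast, in order.
    """
    flapping = set()
    for prefix in set().union(*snapshots):
        presence = [prefix in snap for snap in snapshots]
        first = presence.index(True)
        last = len(presence) - 1 - presence[::-1].index(True)
        # A gap between first and last sighting means the route flapped;
        # a route that merely disappears at the end is not counted.
        if not all(presence[first:last + 1]):
            flapping.add(prefix)
    return flapping


broadcasts = [
    {"44.137.10.0/24", "44.137.20.0/24", "44.140.0.0/24"},
    {"44.137.10.0/24", "44.140.0.0/24"},
    {"44.137.10.0/24", "44.137.20.0/24", "44.140.0.0/24"},
    {"44.137.10.0/24"},
]
suspects = find_flapping(broadcasts)
```

Comparing the flapping set against the source data for each broadcast could then help narrow down whether routes are dropped in the RIP server or earlier, in the code that feeds it.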
Rob