Subject: Re: [44net] Tunnel mesh is (mostly) down
From: Eric Fort <eric.fort(a)gmail.com>
Date: 01/03/2015 05:35 PM
To: AMPRNet working group <44net(a)hamradio.ucsd.edu>
How about eliminating this issue permanently by moving to voluntary
peering between gateways? "Know thy neighbor and be responsible for
your peers and routes" seems to work really well for everyone else,
yet AMPRNet still relies upon route distribution from a single source.
Eric
AF6EP
It is still not clear to me what exactly happened, and how it was resolved,
but what I saw here is that the number of tunnel routes decreased
dramatically and this disconnected the stations on IPIP, including myself.
We already are BGP announced. That is not the problem.
But as a properly set up gateway, we are on both BGP and the IPIP tunnel
mesh, and the latter we configured using RIP (ampr-ripd).
What apparently happened is that we received RIP broadcasts with only a very
small subset of the active routes, and over time most routes got deleted.
So we still had full connectivity to the internet for net 44.137.0.0/16 and
our statically connected subnets, but the tunnel routes inside the country
and to the rest of the world mostly vanished.
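
To illustrate the failure mode described above, here is a minimal sketch of
how a route table that expires unrefreshed entries shrinks when the updates
only carry a subset of the routes. This is not ampr-ripd's actual code: the
function names, the prefix labels and the timeout of 3 update cycles are all
assumptions made up for the example.

```python
# Hypothetical sketch, NOT ampr-ripd's implementation.
# ROUTE_TIMEOUT is an assumed value: the number of update cycles a route
# survives without being refreshed by a RIP update.
ROUTE_TIMEOUT = 3


def apply_update(table, announced, now):
    """Refresh routes present in this update, then expire stale ones."""
    for prefix in announced:
        table[prefix] = now  # remember the cycle in which the route was last seen
    stale = [p for p, seen in table.items() if now - seen >= ROUTE_TIMEOUT]
    for prefix in stale:
        del table[prefix]  # the route silently drops out of the tunnel table
    return table


def simulate(updates):
    """Feed a sequence of updates (one set of prefixes per cycle) and
    return the table size after each cycle."""
    table = {}
    sizes = []
    for now, announced in enumerate(updates):
        apply_update(table, announced, now)
        sizes.append(len(table))
    return sizes
```

With two complete updates followed by updates that only carry one route, the
table stays full for a few cycles and then collapses, which matches the
"over time most routes got deleted" behaviour.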
We don't really need those tunnels to everywhere in the world, but we run
them to remain compatible with the rest of the system. It would be
sufficient for us to have tunnels only to our local users, and have the
remainder of the traffic routed over the plain internet.
However:
- there is no way in the portal to specify that you want to use tunnel
  routes only for your own subnets
- there are access lists in use at other stations that would block traffic
  going outside the tunnel system, because they want to limit traffic to
  only net-44, so we would get obscure routing problems
So for now we will keep running a tunnel mesh system, and I hope that everyone else who
prefers functionality over other reasons will do the same.
(I fail to see how a single-point-of-failure solution can be a worse choice
than a configuration that does not work *at all* even when everything is up
and running)
In the meantime, I hope some people can find some time to debug what was going on here.
I have seen a similar problem in the RIP broadcasts before, a set of routes
that appears and disappears at random. They appear in some RIP broadcast
sets, then do not appear in some, then re-appear, etc. There must be a
problem somewhere, but it is unclear if it is in the RIP server or in the
code that delivers info to it.
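
For anyone who wants to debug this, a rough sketch of how such flapping
routes could be picked out of captured data follows. It assumes you have
already collected, per broadcast cycle, the complete set of prefixes that was
announced; how to capture that is not shown, and the function name is made up
for the example.

```python
def find_flapping(broadcast_sets):
    """Return the prefixes that were announced, then missing from a later
    broadcast set, then announced again.

    broadcast_sets: iterable of sets of prefixes, one per broadcast cycle,
    in the order they were received (assumed input format).
    """
    flapping = set()
    seen_before = set()  # prefixes present in any earlier set
    missing = set()      # prefixes seen earlier but currently absent
    for current in broadcast_sets:
        current = set(current)
        flapping |= current & missing  # came back after being absent
        missing = (missing | (seen_before - current)) - current
        seen_before |= current
    return flapping
```

A route that disappears and never comes back is not reported; only routes
that re-appear after a gap are.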
Maybe the problem of this morning is caused by the same bug, as it appears
to have affected only a subset of all routes.
Rob