Brian Kantor wrote:
On Sun, Jan 04, 2015 at 09:23:08PM +0100, Rob Janssen wrote:
The ampr-ripd has a route lifetime of only 600 seconds. Routes are
announced every 300 seconds, so when two consecutive announcements are
incomplete we lose the route.
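To make the timing concrete, here is a minimal sketch of that expiry rule. The function name and the simplified model (one check per announcement slot) are illustrative assumptions, not taken from the ampr-ripd source; only the 600 and 300 second figures come from the message above.

```python
ROUTE_LIFETIME = 600     # seconds, the lifetime quoted above
ANNOUNCE_INTERVAL = 300  # seconds between announcements, as quoted above


def route_lost_at(received, lifetime=ROUTE_LIFETIME,
                  interval=ANNOUNCE_INTERVAL):
    """Return the time in seconds at which the route expires, or None.

    received[i] is True when the announcement sent at i * interval
    contained our route. Simplified model: expiry is only evaluated at
    announcement times.
    """
    last_seen = None
    for i, ok in enumerate(received):
        now = i * interval
        if ok:
            last_seen = now          # route refreshed, timer restarts
        elif last_seen is not None and now - last_seen >= lifetime:
            return now               # two misses in a row: lifetime used up
    return None


# One missed announcement is survivable, two in a row are not.
print(route_lost_at([True, False, True, False, True]))
print(route_lost_at([True, False, False, True]))
```

With a longer lifetime (for example lifetime=1800) the second case survives as well, at the cost of keeping stale routes around for longer.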
It happened again this morning at 08:05 local (07:05 UTC). My route again was lost, and
recovered at 08:10.
I am at somewhat of a loss to explain why this might have happened; the
rip sender logged that it was fetching the proper number of subnet routes
(428) from the routing database, and generating the proper number of rip
packets. No transmission errors were logged at the time you mention.
It is possible that you did not receive all the packets. They are sent as datagrams
so there is nothing to retry or notice if one of them goes missing in transit.
Perhaps it would have been smarter to use a connected mode (TCP) to transmit the routing
information. We could convert to doing that, with some significant effort.
I agree that making the timeout much longer than 10 minutes is wise. It
might also be wise to control for a large delta in routes received.
Logging the number of packets and subnet routes received to syslog might
provide some additional data if/when this happens again.
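A rough sketch of what such a receiver-side check could look like. The function name, the logger name and the 10% threshold are assumptions made for illustration; nothing here is taken from ampr-ripd. It uses Python's standard logging module, to which a syslog handler can be attached.

```python
import logging

log = logging.getLogger("ripd-stats")  # hypothetical logger name


def check_update(packets, routes, previous_routes, max_drop=0.10):
    """Log one update cycle and report whether it looks complete.

    max_drop is an assumed threshold: a cycle with more than 10% fewer
    routes than the previous cycle is treated as suspect.
    """
    log.info("RIP update: %d packets, %d routes", packets, routes)
    if previous_routes and routes < previous_routes * (1 - max_drop):
        log.warning("route count fell from %d to %d, update looks incomplete",
                    previous_routes, routes)
        return False
    return True
```

A caller could keep the old routes alive whenever check_update() returns False, rather than letting them time out on a partial update.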
- Brian
What I have observed in the past is that there is a small subset of the
routes that appear and disappear in my list quite regularly. I discovered
this when I made an auto-adapting filter that allows tunnel traffic only
from registered gateways. New items are always inserted at the top of that
filter, and when I list it there are a few gateways that regularly appear
at the top of the list. (It is initially loaded in sorted numeric sequence,
so this is quite apparent.)
For example, 44.140.0.1 is always amongst these. I mentioned it on this
mailing list but there was no follow-up on it.
So probably there is something going on that is a bit more systematic than
just random packet loss.
It could be that your RIP server sends out all packets in one burst without
any delay in between, that there is some queue length limit somewhere
(either locally in your system or along the path to here), and that the
later packets in the burst therefore have a high chance of getting dropped.
That could probably be fixed by putting a small usleep between the packet
transmissions, so that the queues can drain.
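As a sketch of the idea (in Python rather than C, with time.sleep standing in for usleep): the function name and the 10 ms default gap are assumptions, and sock is any UDP socket object that provides sendto().

```python
import time


def send_paced(sock, packets, dest, gap=0.01):
    """Send each datagram to dest, pausing `gap` seconds between sends.

    The pause spreads the burst out so that queues along the path get a
    chance to drain. Returns the number of datagrams sent.
    """
    for i, payload in enumerate(packets):
        if i:
            time.sleep(gap)  # no pause needed before the first packet
        sock.sendto(payload, dest)
    return len(packets)
```

With roughly 20 packets per announcement, a 10 ms gap adds only about 0.2 seconds to each cycle, which is negligible against a 300 second interval.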
Such a change would be much easier than switching to TCP. TCP is of course
the more robust solution, but a protocol like RIP should survive some
random packet loss. Systematic packet loss is a different story.
Maybe something else you can do is drop the extra transmission of RIP
packets from the public IP address. I think nobody is really using those
(if only because of the odd destination port number), and they only add to
this problem by putting even more data in the queue.
Rob