Re: [44net] Tunnel mesh is (mostly) down

5 Jan 2015

      Brian Kantor wrote:
...
On Sun, Jan 04, 2015 at 09:23:08PM +0100, Rob Janssen wrote:
...
The ampr-ripd has a route lifetime of only 600 seconds.  Routes are announced every
300 seconds, so when two consecutive announcements are incomplete we lose the route.
It happened again this morning at 08:05 local (07:05 UTC).  My route again was lost, and
recovered at 08:10.
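
A minimal sketch of the timing described above, for illustration only. The 600-second lifetime and 300-second announce interval are the figures quoted in this message; the function name and the assumption that a route expires once its age reaches the lifetime are hypothetical, not taken from the ampr-ripd source.

```python
ROUTE_LIFETIME = 600     # seconds, as stated in the message above
ANNOUNCE_INTERVAL = 300  # seconds, as stated in the message above


def route_survives(received, lifetime=ROUTE_LIFETIME, interval=ANNOUNCE_INTERVAL):
    """Return True if a route stays installed across a run of announce cycles.

    `received` holds one bool per cycle: True if the route was present in
    the packets that actually arrived in that cycle.
    Assumption: the route is fresh at the start and expires as soon as its
    age reaches `lifetime`.
    """
    age = 0
    for got_it in received:
        age += interval
        if got_it:
            age = 0          # refreshed by this announcement
        elif age >= lifetime:
            return False     # expired before the next refresh arrived
    return True
```

With the quoted values, one missed announcement is tolerated but two in a row are not. Under the same assumption, a lifetime of 1800 seconds would tolerate five consecutive misses.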
I am at somewhat of a loss to explain why this might have happened; the rip sender logged
that it was fetching the proper number of subnet routes (428) from the routing database, and
generating the proper number of rip packets. No transmission errors were logged at the
time you mention.
It is possible that you did not receive all the packets.  They are sent as datagrams
so there is nothing to retry or notice if one of them goes missing in transit.
Perhaps it would have been smarter to use a connected mode (TCP) to transmit the routing
information.  We could convert to doing that, with some significant effort.
I agree that making the timeout much longer than 10 minutes is wise.  It might also be wise
to control for a large delta in routes received.  Logging the number of packets and subnet
routes received to syslog might provide some additional data if/when this happens again.
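
A rough sketch of the two suggestions above: log what was received, and treat a sudden large drop in the route count as suspect. Everything here is an assumption for illustration: the logger name, the function name, the 20% threshold, and the choice to simply reject a suspicious update are not from ampr-ripd.

```python
import logging

log = logging.getLogger("rip-receiver")  # hypothetical logger name


def accept_update(previous_count, packets_received, routes_received,
                  max_drop_fraction=0.2):
    """Decide whether a received RIP update looks complete enough to apply.

    previous_count is the number of routes held after the last good update.
    Returns False when the route count fell by more than max_drop_fraction
    of previous_count, which hints at lost datagrams rather than a real change.
    """
    log.info("RIP update: %d packets, %d routes",
             packets_received, routes_received)
    if previous_count == 0:
        return True  # nothing to compare against yet
    drop = previous_count - routes_received
    if drop > previous_count * max_drop_fraction:
        log.warning("route count fell from %d to %d; treating update as incomplete",
                    previous_count, routes_received)
        return False
    return True
```

A receiver could keep its existing routes when this returns False instead of letting them age out. Attaching a `logging.handlers.SysLogHandler` to the logger would send these lines to syslog.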

Brian

What I have observed in the past is that there is a small subset of the routes that appear and disappear
in my list quite regularly.   I discovered this when I made an auto-adapting filter that allows tunnel traffic
only from registered gateways, where new items are always inserted at the top, and when I list that
filter there are a few gateways that regularly appear at the top of the list.
(it is initially loaded in sorted numeric sequence so this is quite apparent)
For example, 44.140.0.1 is always amongst these.  I mentioned it on this mailing list but there was no
follow-up on it.
So probably there is something going on that is a bit more systematic than just random packet loss.
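
To illustrate why flapping entries stand out in such a filter, here is a small sketch. It is not the actual filter; the class name, the update logic, and the addresses in the usage note are made up for the example. The list is loaded in numeric order, entries that vanish are removed, and anything new (or returning) is inserted at the top.

```python
import ipaddress


class GatewayFilter:
    """Ordered allow-list of gateway addresses (hypothetical sketch)."""

    def __init__(self, initial):
        # Initial load is in sorted numeric sequence.
        self.entries = sorted(initial, key=ipaddress.IPv4Address)

    def update(self, current):
        """Sync the list with the gateways currently announced."""
        current = set(current)
        # Drop gateways that are no longer announced, keeping the order.
        self.entries = [g for g in self.entries if g in current]
        # Insert new or returning gateways at the top of the list.
        known = set(self.entries)
        for g in sorted(current - known, key=ipaddress.IPv4Address):
            self.entries.insert(0, g)
```

A gateway that disappears in one update and returns in the next ends up at the top, out of numeric order, so repeated flapping is easy to spot when listing the filter.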
It could be that your RIP server sends out all packets in one burst without any delay in between, that
there is some queue length limit somewhere (either locally in your system or along the path to here), and
that the later packets in the burst have a high chance of getting dropped.
That could probably be fixed by putting a small usleep between the packet transmissions, so that the
queues can drain.
Such a change would be much easier than switching to TCP.  Of course TCP is a more stable
solution, but a protocol like RIP should survive some random packet loss.   Systematic packet loss
is a different story.
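
The pacing idea above (a small sleep between packet transmissions) could look roughly like this. It is a sketch only, not the code of the actual RIP sender; the function name, the 10 ms default gap, and the optional `sock` parameter are assumptions.

```python
import socket
import time


def send_paced(packets, dest, gap_seconds=0.01, sock=None):
    """Send each datagram in `packets` to `dest`, pausing between sends.

    The pause gives queues along the path time to drain instead of
    receiving the whole burst at once. Returns the number of datagrams sent.
    """
    own_socket = sock is None
    if own_socket:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sent = 0
        for i, payload in enumerate(packets):
            if i:
                time.sleep(gap_seconds)  # no delay before the first packet
            sock.sendto(payload, dest)
            sent += 1
        return sent
    finally:
        if own_socket:
            sock.close()
```

For example, `send_paced(rip_packets, ("192.0.2.1", 520))` would spread the transmission over a short interval; the address and port here are placeholders. The right gap depends on the queue that is overflowing, which is not known from this thread.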
Maybe something else you can do is drop the extra transmission of RIP packets from the public IP
address.  I think nobody is really using those (if only because of the funny destination port number), and
they only add to this problem by putting even more data in the queue.
Rob
