Fri, Jun 2, 2017 at 5:12 PM
I found a possible source of memory corruption - a hash routine
might have returned a negative number, causing the flow stats
gathering to index off the beginning of a large array that
appears in memory adjacent to the routing table. That could have
overwritten a route entry, though I can't confirm it. Either way,
I've fixed the hash routine so it can no longer return a negative
or out-of-range value. We'll see if that prevents the problem
from recurring. I'd hate to have to go through 16 million route
entries in a core dump.
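
For illustration only, here is a minimal sketch of how a hash routine
can be kept inside the bounds of a stats array. It is not the actual
router code: the names flow_hash, flow_stats_bump, flow_counts and
FLOW_BUCKETS, the table size, and the choice of FNV-1a as the hash are
all assumptions. The idea is to do the arithmetic in unsigned types
and reduce modulo the table size, so the index can be neither negative
nor too large.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLOW_BUCKETS 4096u                  /* hypothetical table size */

static uint32_t flow_counts[FLOW_BUCKETS];  /* hypothetical flow stats array */

/*
 * Hash a source/destination address pair into a bucket index.
 * Uses 32-bit FNV-1a (an assumed stand-in for the real hash) with
 * unsigned arithmetic throughout, then reduces modulo the table size,
 * so the result is always in the range 0 .. FLOW_BUCKETS-1.
 */
static size_t flow_hash(uint32_t src, uint32_t dst)
{
    uint32_t words[2] = { src, dst };
    uint32_t h = 2166136261u;               /* FNV-1a offset basis */

    for (size_t w = 0; w < 2; w++) {
        for (unsigned shift = 0; shift < 32; shift += 8) {
            h ^= (words[w] >> shift) & 0xffu;   /* mix in one byte */
            h *= 16777619u;                     /* FNV prime, wraps mod 2^32 */
        }
    }
    return (size_t)(h % FLOW_BUCKETS);
}

/* Count one packet for the flow; the assert is a defensive bounds check. */
static void flow_stats_bump(uint32_t src, uint32_t dst)
{
    size_t i = flow_hash(src, dst);

    assert(i < FLOW_BUCKETS);
    flow_counts[i]++;
}
```

With a signed hash, the % operator in C can yield a negative result,
which is the sort of thing that would index off the front of the array.
Keeping everything unsigned avoids that case by construction.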
- Brian
On Fri, Jun 02, 2017 at 11:12:16AM -0700, Brian Kantor wrote:
> On Fri, Jun 02, 2017 at 02:03:56PM -0400, lleachii--- via 44Net wrote:
> > From my perspective, I stop seeing all inbound Internet traffic from AMPRGW
> > to 44.60.44.0/24, except for the intermittent data to another subnet (now
> > 44.62.1.81). I transmit, and never receive replies.
> > Although, I'm still able to send and receive traffic to/from the other 44
> > GWs.
>
> Thanks, that helps. I checked, and the on-disk copy of the routing
> table is still correct even when this is happening, so I'm beginning
> to suspect memory corruption in the router software itself.
>
> As you know, that's difficult to find, but I'm looking through the
> code to make sure I haven't made any of the usual errors. The next
> time it happens, I'll take a core dump of the running process and
> see if that tells me anything. Be sure to let me know.
> - Brian