We've identified the problem that caused this morning's outage. It seems that when amprgw is rebooted, as it was this morning to apply a kernel security patch, the machine comes back up just fine, but the Ethernet switch it's plugged into crashes.
For some months there's been a project to replace that switch with a 10 GbE switch (already on hand, I'm told); I'll see if we can get the installation a higher priority now that we know the old one is a problem source.
May you live in interesting times... - Brian