Hello Will,
pf and ipfw on FreeBSD are true stateful firewalls, whereas no Linux firewall that I'm aware of is truly stateful. iptables treats each packet individually
Linux's ipfwadm and ipchains were stateless "packet filters", but iptables has been fully stateful for many, many years. We are now on the cusp of nftables on Linux, which makes things even more programmable, though I don't know about its performance.
whereas pf/ipfw will add it as a flow and track bidirectional traffic for the duration of the connection. This is why pf/ipfw are not threaded; however, they do automatically optimize rule sets when you load them to be as efficient as possible.
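To make the stateless/stateful distinction concrete, here is a minimal sketch of flow tracking in Python. It is not how pf, ipfw, or iptables are implemented; the `StateTable` class and its method names are hypothetical, and a real firewall would also track TCP state, timeouts, and sequence numbers.

```python
from typing import Set, Tuple

# (src_ip, src_port, dst_ip, dst_port, proto) -- illustrative flow key
Flow = Tuple[str, int, str, int, int]


class StateTable:
    """Hypothetical flow table: remembers outbound flows, admits their replies."""

    def __init__(self) -> None:
        self.flows: Set[Flow] = set()

    def outbound(self, src_ip: str, src_port: int,
                 dst_ip: str, dst_port: int, proto: int) -> bool:
        # Record the flow so that return traffic can be matched later.
        self.flows.add((src_ip, src_port, dst_ip, dst_port, proto))
        return True

    def inbound(self, src_ip: str, src_port: int,
                dst_ip: str, dst_port: int, proto: int) -> bool:
        # Permit only packets that are the reverse direction of a known flow.
        return (dst_ip, dst_port, src_ip, src_port, proto) in self.flows
```

A stateless filter, by contrast, would decide on each packet's header fields alone, with no table lookup.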
There are many ways to thread stateful firewalls. A common way is to dedicate the 5-tuple pair to its own thread via a hash or other methods. This is how commercial vendors have been doing it and getting very high throughput with network processors (just a sea of simple cores).
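As a rough sketch of the hashing idea (not any particular vendor's method), the Python below maps a flow's 5-tuple to a worker index. The `FiveTuple` type, the `worker_for` function, and the choice of CRC32 are all assumptions made for illustration. The endpoints are sorted first so that both directions of a connection land on the same worker, which keeps each flow's state local to one thread.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    proto: int
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def worker_for(pkt: FiveTuple, n_workers: int) -> int:
    """Pick a worker index for this packet's flow (illustrative only)."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    # Order the two endpoints so A->B and B->A produce the same key.
    lo, hi = sorted((a, b))
    key = f"{pkt.proto}|{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}".encode()
    return zlib.crc32(key) % n_workers
```

Because each flow is pinned to one worker, the per-flow state needs no locking between threads in this scheme.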
Filtering at a router is a surefire way to bring throughput to a crawl. Proper campus routers are designed with ASICs optimized for routing in hardware, while firewalling is done in software.
Modern ASIC-based firewalls can handle hundreds of thousands of stateless filters on a per-interface basis. I don't think we would need anything stateful here, as blocking out all non-registered AMPR addresses would probably knock out 90% of the DDoS and scanning traffic being seen.
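The stateless check I have in mind amounts to a per-packet prefix lookup, roughly like the Python sketch below. The prefixes in `REGISTERED` are made-up placeholders rather than real allocations, and `permit` is a hypothetical name; a hardware filter would do the equivalent match in a lookup table rather than a linear scan.

```python
import ipaddress

# Placeholder prefixes for illustration only; not actual registered subnets.
REGISTERED = [
    ipaddress.ip_network(p)
    for p in ("44.2.10.0/24", "44.60.44.0/28")
]


def permit(dst: str) -> bool:
    """Stateless decision: pass the packet only if dst is in a registered prefix."""
    addr = ipaddress.ip_address(dst)
    return any(addr in net for net in REGISTERED)
```

No per-connection memory is involved: each packet is judged on its destination address alone.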
I have seen enterprise small-office routers handle 450-500 Mbps of straight routing but max out around 40 Mbps when firewalling, because firewalling is CPU-bound. The results are similar when stepping up to large chassis routers.
It depends on the class of devices you're buying. There are many inexpensive enterprise-grade firewalls (always stateful) that can handle many hundreds of megabits per second, and a few thousand dollars will get you into the 10G+ range.
A better option, in my opinion, is splitting up tunnelling and firewalling onto separate machines. This would allow whichever system handles firewalling or tunnelling best to be configured for each task, and would increase throughput capacity. Of course, this does require more rack space, power, cooling, another system to configure, and someone with the time and energy to set it up.
Yeah, but we don't need that throughput or scale. Just statelessly filtering at the border edge with a modern router would solve many of these issues.
--David