My estimate of 128MB was based on 4 bytes per entry and 2 tables for convenient updating (update one table, then toggle a single indicator or pointer to make the updated table active). With 16 million entries that is 64MB per table. Of course the tables grow when you need more bytes per entry, but even 8-16 bytes per entry (256-512MB for both tables) should still fit comfortably in a modern machine.
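To make the arithmetic and the toggle idea concrete, here is a rough Python sketch. The names (table_bytes, DoubleBufferedTable) are made up for illustration, and I am assuming "16 million" means 2^24 entries and that an update rebuilds the spare table from scratch; treat it as one possible way to do it, not a finished design.

```python
from array import array

ENTRIES = 1 << 24  # 16,777,216 entries; assumed from the "16 million" figure


def table_bytes(bytes_per_entry, copies=2, entries=ENTRIES):
    """Total memory needed for `copies` full tables."""
    return entries * bytes_per_entry * copies


class DoubleBufferedTable:
    """Two equal tables: lookups use the active one, updates go to the spare."""

    def __init__(self, entries):
        # array("I") is an unsigned int, 4 bytes on common platforms
        self.tables = [array("I", [0]) * entries for _ in range(2)]
        self.active = 0  # the single indicator that selects the live table

    def lookup(self, index):
        return self.tables[self.active][index]

    def update(self, new_entries):
        """Rebuild the spare table from {index: value}, then toggle."""
        spare = 1 - self.active
        table = self.tables[spare]
        table[:] = array("I", [0]) * len(table)  # clear the stale contents
        for index, value in new_entries.items():
            table[index] = value
        self.active = spare  # one assignment makes the new table live
```

For example, table_bytes(4) gives 134217728 (128MB) and table_bytes(16) gives 536870912 (512MB). The lookups never see a half-updated table because they only ever read the one selected by `active`.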
Another possibility would be to have a 16-million-entry array of short integers holding a "gateway number" (starting at 1) for each IP address, and a separate table of gateways holding all the other info you want to keep per gateway (e.g. counters).
Processing a packet would then first use the destination IP as an index into the first array to retrieve the gateway number (0 means drop the packet), and then use that number as an index into the gateway table to access the per-gateway data, including the endpoint address and the counters. With 2-byte entries this requires only slightly more than 32MB of memory for the tables, which should be no problem without any tricks.
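A minimal Python sketch of that two-step lookup follows. Everything named here (GatewayRouter, Gateway, assign, route, the example addresses) is hypothetical, and I am assuming the address space is one contiguous network, so the array index is simply the destination address minus the network base.

```python
from array import array
from dataclasses import dataclass
import ipaddress


@dataclass
class Gateway:
    endpoint: str     # where packets for this gateway are sent
    packets: int = 0  # per-gateway counters
    octets: int = 0


class GatewayRouter:
    def __init__(self, network):
        self.net = ipaddress.ip_network(network)
        self.base = int(self.net.network_address)
        # one short integer per address; a /8 gives 16M entries, about 32MB
        self.index = array("H", [0]) * self.net.num_addresses
        self.gateways = [None]  # slot 0 is unused: number 0 means "drop"

    def add_gateway(self, endpoint):
        """Register a gateway and return its number (starting at 1)."""
        self.gateways.append(Gateway(endpoint))
        return len(self.gateways) - 1

    def assign(self, subnet, gw_number):
        """Point every address in `subnet` at the given gateway number."""
        sub = ipaddress.ip_network(subnet)
        if not sub.subnet_of(self.net):
            raise ValueError("subnet lies outside the routed network")
        start = int(sub.network_address) - self.base
        count = sub.num_addresses
        self.index[start:start + count] = array("H", [gw_number]) * count

    def route(self, dst, length):
        """Return the endpoint for `dst`, or None if the packet is dropped."""
        offset = int(ipaddress.ip_address(dst)) - self.base
        if not 0 <= offset < len(self.index):
            return None
        gw_number = self.index[offset]      # first lookup: IP -> number
        if gw_number == 0:
            return None
        gw = self.gateways[gw_number]       # second lookup: number -> data
        gw.packets += 1
        gw.octets += length
        return gw.endpoint
```

Usage would look like: create GatewayRouter("10.0.0.0/8"), call add_gateway("192.0.2.1") to get number 1, assign("10.1.0.0/16", 1), and then route("10.1.2.3", 1500) returns the endpoint and bumps the counters. Note that a 16-bit gateway number limits this to 65535 gateways.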
Rob