You're absolutely right. I dropped some decimal places. Thanks! However, gateway addresses don't fit in a byte, they're stored in an unsigned long, which is 4 bytes each. 4 * 16 million, right? And I think you need counters and flags. So it's an array of structs of some modest size each.
My estimate of 128MB was based on having 4 bytes per entry and 2 tables for convenient updating (update one table then toggle a single indicator or pointer to make the updated table active). Of course when you require more bytes per entry the table will expand, but 8-16 bytes per entry should still fit comfortably in a modern machine.
But I see a complication: You will have to have n entries per route, which will make loading the table a little less straightforward.
Well, searching a datastructure for the correct route isn't straightforward either... remember when there is a tunnel to e.g. 44.137.0.0/16 and another one to 44.137.1.80/28 (an example from the current table) then any traffic to 44.137.1.81 should go to the tunnel for 44.137.1.80/28, not something that you can easily do with a binary search. However, a lookup in the table populated with entries for 44.137.0.0/16 and then overwritten with entries for 44.137.1.80/28 is easy. It will find the correct gateway with a single array index operation. So you only need to build the table starting from the subnets sorted by number of subnet bits to get it right.
Indeed it is best to make the program multithreaded, or you could put the lookup table in shared memory and have a process doing the routing and another process doing the updating. I would still opt for having a "current" and a "next" table where the routing code always has a working table and the switch to the next version is instaneous. Of course you can also do a quick check before starting the laborious update process, to see if the new encap table is different from the previous one.
Rob