> The delay is entirely caused by clearing the array.  I'm using bzero()
> to do that, as that's the fastest way I know of to zero an existing
> array, but it still takes 25ms to zero a vector of 16 million shorts.
> (Stepping through it with a for loop takes roughly 5 times as long.)
Well, this is where you could gain some time using mmap, depending on various
factors. When you do a fresh mmap of /dev/zero (or MAP_ANONYMOUS) every time you
need a new clear array, that executes much more quickly than clearing all that
space, because it only sets up page table entries that all point to the same
already-zeroed memory block.
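
A minimal sketch of that approach, assuming Linux (the ENTRIES constant and
the function name are mine, not from your program):

    /* Return a fresh zeroed 16M-entry short array via an anonymous
     * mmap.  No page is actually allocated or zeroed here; the kernel
     * maps everything to its shared zero page until first write. */
    #include <sys/mman.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define ENTRIES (1 << 24)        /* 2**24 = 16M shorts */

    short *fresh_zeroed_array(void)
    {
        void *p = mmap(NULL, ENTRIES * sizeof(short),
                       PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            exit(1);
        }
        return p;
    }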
Then, when you start populating it, you of course lose some of that advantage:
the first write into each page causes a page fault, at which point a new memory
block is allocated, zeroed, and inserted into the page table.
Only experimentation can show how the total time of the mmap plus the
copy-on-write page faults compares to the bzero(). It will depend on the
density of the routed AMPRnet space; a quick benchmark along the lines of the
sketch below would settle it.
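
Something like this, for instance (assumptions: Linux, 4K pages, and an
arbitrary 10% page-touch density standing in for the routed space; error
checks omitted, all names made up for illustration):

    #include <sys/mman.h>
    #include <strings.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <time.h>

    #define ENTRIES (1 << 24)
    #define BYTES   (ENTRIES * sizeof(short))

    static double now(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        short *a = mmap(NULL, BYTES, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        double t0 = now();
        bzero(a, BYTES);                  /* method 1: clear in place */
        double t1 = now();

        munmap(a, BYTES);                 /* method 2: remap fresh    */
        a = mmap(NULL, BYTES, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        for (size_t i = 0; i < ENTRIES; i += 10 * 2048)
            a[i] = 1;                     /* touch ~10% of the 4K pages */
        double t2 = now();

        printf("bzero: %.3f ms, mmap + 10%% touch: %.3f ms\n",
               (t1 - t0) * 1e3, (t2 - t1) * 1e3);
        return 0;
    }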
So, you would change array[2][2**24] into *array[2] (two pointers to 16M
entries each) and mmap/munmap them every time you need them cleared, roughly
as sketched below.
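
Roughly this (clear_plane is my name for it, same assumptions as above):

    #include <sys/mman.h>

    #define ENTRIES (1 << 24)
    #define BYTES   (ENTRIES * sizeof(short))

    static short *array[2];   /* two pointers to 16M entries each */

    /* "Clear" one plane by throwing away its pages and remapping;
     * the new mapping reads as all zeros at page-table cost only. */
    static void clear_plane(int n)
    {
        if (array[n])
            munmap(array[n], BYTES);
        array[n] = mmap(NULL, BYTES, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        /* error handling omitted in this sketch */
    }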
Rob