I think what you call a word is a 128 byte block.
Sure, it's a 1024-bit word. I think of it that way because it's the word size of the memory on the ideal hardware for scrypt(N=1).
Whatever order you generate could be trivially stored in a map of N indices,
so you could still leave out alternate blocks and use the map to find the block from which to generate a missing one?!
True, but only if the word size is much larger than the memory's address size -- otherwise the "map" takes more memory than you save.
So, therefore, make the memory have 2^N words where N is the number of bits read or written in each memory transaction. So, for example, 2^32 entries each being 32 bits. Or 2^16 entries of 16 bits each. Then the "backward pointers" cost as much space as they save.
Worse, N/(lookup gap), so for lookup gap 4 this map will take only 256 bytes.
What is a lookup gap?