Sure, it's a 1024-bit word. I think of it that way because it's the word size of the memory on the ideal hardware for scrypt(N=1).
So, therefore, make the memory have 2^N words where N is the number of bits read or written in each memory transaction. So, for example, 2^32 entries each being 32 bits. Or 2^16 entries of 16 bits each. Then the "backward pointers" cost as much space as they save.
Hmm, I see what you mean.
Maybe you don't even need separate filling-memory and reading-memory phases. Something like
(assuming a hash function mapping 1024 bits to 1024 bits):
state = hash(header)
do 2^32 times {
interpret state as 16 pairs (i,v) of 32-bit values; for each pair set mem[i] := v
state = hash(state)
interpret state as vector of 32 bit indices; mix in all 32 mem[i]
state = hash(state)
}
Hmm; that still needs an initialization phase, unless we assume the memory starts out all 0.
But I still don't see these schemes as viable PoWs due to the lack of quick verifiability...