Okay, I was able to make a first pass. I want to call out three really awesome contributions you've made:
1. Identifying cycles as the structure to be sought (this is a big deal, more on this later)
2. Making the memory size be (at least the high-order part of) the difficulty. This avoids having to pick magic numbers like "16GB" out of the air.
3. Using siphash() -- very cool, I've updated my initial post to mention this.
Typo: second page, "proof-of-work system. which". I think the period should be a comma.
I think cycles could be substituted by other structures, like cliques. Obviously you'd need to use
plain instead of bipartite graphs, and a larger fraction of edges to get something
like a 10-clique occurring with non-negligible probability.
I originally used sha256 to generate edges. I was very happy when I later realized I could use
this lightweight function instead. Being keyed made its use even more natural.
I introduced several typos when hastily rewriting parts of the paper to reflect the shift of focus
from memory-hardness (since it's not as tmto proof as i thought it was) to being graph-theoretical.
So please just gloss over the obvious grammatical mistakes. They'll get fixed in the weeks ahead.
Disclaimer: I didn't read the paper on cuckoo hash tables because your description of the proof-of-work (an undirected cycle of length >L in the graph of the converse composition of two hash functions with disjoint ranges) made sense right away. Let me know if I missed anything as a result. Or if I misunderstood the POW, but I think I got it.
I don't know what you mean by converse composition, given that the two hash functions are not injective.
Also, I look for cycles of length exactly L rather than > L.
One thing that nagged me was that the size of the proof scales with L. Since the proof has to go in the block headers, which even SPV clients need, it's the most space-sensitive part of a cryptocoin. Could one look for a cycle instead in the graph of a hash function postcomposed with "modulo N" where N is the targeted memory size? Then you can prove a cycle of length L by giving any element in the cycle, and a verifier can check your proof with O(log N)-memory and O(L)-time. This is partly why I'm excited about cycles as the proof. Also, in a sense this is what scrypt is, except that it asks for a path rather than a cycle and the hash function is H(n)=salsa^n(0) (where salsa^0(x)=x and salsa^(n+1)(x)=salsa(salsa^n(x))) -- the write-phase is a sort of Ackermann construction.
Yes, proof size of 168 bytes, or Lx4 bytes in general, is a downside.
If you're suggesting looking for cycles in a hash function with equal domain and range, that
doesn't need any memory. You can use either Floyd's or Brent's cycle detection algorithm in constant space.