Re: Altcoin POW innovations headed in the wrong direction. Let a chip designer pick.

Quote from: eldentyrell on April 12, 2014, 03:12:18 AM

So glad you actually wrote a whitepaper.

I'm still going through the paper (should finish this weekend) but just one nitpick for now:

Okay, I was able to make a first pass. I want to call out three really awesome contributions you've made:

1. Identifying cycles as the structure to be sought (this is a big deal, more on this later)

2. Making the memory size be (at least the high-order part of) the difficulty. This avoids having to pick magic numbers like "16GB" out of the air.

3. Using siphash() -- very cool, I've updated my initial post to mention this.

Typo: second page, "proof-of-work system. which". I think the period should be a comma.

Disclaimer: I didn't read the paper on cuckoo hash tables because your description of the proof-of-work (an undirected cycle of length >L in the graph of the converse composition of two hash functions with disjoint ranges) made sense right away. Let me know if I missed anything as a result. Or if I misunderstood the POW, but I think I got it.

One thing that nagged me was that the size of the proof scales with L. Since the proof has to go in the block headers, which even SPV clients need, it's the most space-sensitive part of a cryptocoin. Could one look for a cycle instead in the graph of a hash function postcomposed with "modulo N" where N is the targeted memory size? Then you can prove a cycle of length L>L_min by giving any element in the cycle, and a verifier can check your proof with O(log N)-memory and O(L)-time. This is partly why I'm excited about cycles as the proof. Also, in a sense this is what scrypt is, except that it asks for a path rather than a cycle and the hash function is H(n)=salsa^n(0) (where salsa^0(x)=x and salsa^(n+1)(x)=salsa(salsa^n(x))) -- the write-phase is a sort of Ackermann construction. (okay that's not exactly scrypt -- the read phase maintains an accumulator which gets xored in before each salsa20/8).

The cycle-in-a-single-hash-function scheme proposed in the previous paragraph is best implemented with N-many log(L_max)-bit words of memory, so it sort of assumes something close to the "liveness bitvector" approach like Dave used in his comments on the cuckoo cycle it isn't so much an attack as the intended implementation. It can't be effectively parallelized since the amount of data you'd need to communicate is proportional to the amount of computation (total) the communication ends up costing more than the computation you're parallelizing. But the real drawback is that the memory accesses are not in any way serial. So the ideal hardware looks like a tiny loop running the hash over and over, blasting addresses at a custom memory. The memory always writes a "1" to the address given, and reports "hit" when a "1" is written over an existing "1" (memory initializes with all zeroes). So there's perfect pipelining between the memory and the computation. That's a big problem. I suspect the way around it is some Ackermann-like construction.

The other reason I'm excited about cycles is that they don't yield well to divide-and-conquer. Two halves of an L-cycle are an X-path and Y-path where (X+Y=L) but long paths are so commonplace that the coordination between two parallel path-finders to see if any of their paths form a cycle is likely to be as expensive as just looking for the cycle.