Verification that a hash output is valid for a given input should be orders of magnitude more efficient than computing the hash.
The validation ratio in a proof-of-work system also depends on how fast the hash is, because validation requires only one hash, whereas solving the proof-of-work requires as many hashes as the collective difficulty demands, or more relevantly, the pool's minimum share difficulty.
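To make that asymmetry concrete, here is a minimal hashcash-style sketch (a plain hash-preimage PoW, not Cuckoo Cycle): the solver performs on the order of 2^256/target hash evaluations, while the verifier performs exactly one. The names solve and verify are illustrative, not from any cited implementation.

```python
import hashlib
import itertools

def pow_hash(header: bytes, nonce: int) -> int:
    # Interpret double-SHA256 of header||nonce as a big integer.
    h = hashlib.sha256(hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()).digest()
    return int.from_bytes(h, "big")

def solve(header: bytes, target: int) -> int:
    # Expected work is roughly (2**256 / target) hash evaluations.
    for nonce in itertools.count():
        if pow_hash(header, nonce) < target:
            return nonce

def verify(header: bytes, nonce: int, target: int) -> bool:
    # Verification is a single hash evaluation, regardless of difficulty.
    return pow_hash(header, nonce) < target

if __name__ == "__main__":
    target = 2 ** 244            # low difficulty so the demo finishes quickly
    header = b"example block header"
    nonce = solve(header, target)
    assert verify(header, nonce, target)
    print("solved with nonce", nonce)
```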
Cuckoo Cycle is not a hash.
I was using the term 'hash' to mean a proof-of-work problem with a verifier, in the sense defined here:
https://eprint.iacr.org/2015/946.pdf#page=2
I wasn't focused on defining terms (in an informal internal paper for my own analysis that I never prepared for publishing), but rather on analyzing the issues pertaining to memory hardness, such as resistance to parallelization and/or trading computation for space, as should be obvious from my analysis of Scrypt.
The Cuckoo Cycle hash [19] significantly increases the asymmetric validation ratio, but has unprovable security because its algorithm is based on permuted orderings which do not incorporate diffusion and confusion [20].
Diffusion and confusion are properties of hash functions. Cuckoo Cycle is not a hash function.
You need to get your basics straight.
Careful of that condescending attitude. You are starting to act like Shen-noether, Gregory Maxwell and other condescending jerks that have come before you.
What I am saying is that the entropy of your problem space is large but limited, which is because the confusion and diffusion injected into the memory is not fully randomized over the entire memory space allocated to the PoW computation. Duh. That is precisely what Andersen discovered when he broke your Cuckoo Cycle, as I warned would be the case. Quoting from the above paper:
A more promising scheme was proposed by Tromp [37] as the Cuckoo-cycle PoW. The prover must find a cycle of certain length in a directed bipartite graph with N vertices and O(N) edges. It is reasonably efficient (only 10 seconds to fill 1 GB of RAM with 4 threads) and allows very fast verification. The author claimed prohibitive time-memory tradeoffs. However, the original scheme was broken by Andersen [6]: a prover can reduce the memory by the factor of 50 with time increase by the factor of 2 only. Moreover, Andersen demonstrated a simple time-memory tradeoff, which allows for the constant time-memory product (reduced by the factor of 25 compared to the original). Thus the actual performance is closer to 3-4 minutes per GB. Apart from Andersen's analysis, no other tradeoffs were explored for the problem in [37], there is no evidence that the proposed cycle-finding algorithm is optimal, and its amortization properties are unknown.
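For intuition on the "very fast verification" claim: a Cuckoo Cycle proof is just a short list of edge indices (nonces), and the verifier only hashes those few nonces and checks that the derived edges close into a single cycle of the required length, touching no other memory. The sketch below is a simplified stand-in; the reference implementation derives edges with siphash, and plain SHA-256 is used here purely for illustration.

```python
import hashlib

def edge(key: bytes, nonce: int, n: int) -> tuple:
    # Hypothetical edge derivation: map a nonce to an edge (u, v) of a
    # bipartite graph with n nodes on each side.
    h = hashlib.sha256(key + nonce.to_bytes(4, "big")).digest()
    return int.from_bytes(h[:8], "big") % n, int.from_bytes(h[8:16], "big") % n

def verify_cycle(key: bytes, nonces: list, n: int, cycle_len: int) -> bool:
    # The proof is cycle_len nonces; verification hashes each exactly once.
    if len(nonces) != cycle_len or len(set(nonces)) != cycle_len:
        return False
    edges = [edge(key, x, n) for x in nonces]
    # Every touched vertex must have degree exactly 2.
    adj = {}
    for i, (u, v) in enumerate(edges):
        adj.setdefault(("U", u), []).append(i)
        adj.setdefault(("V", v), []).append(i)
    if any(len(e) != 2 for e in adj.values()):
        return False
    # Walk the edges to confirm they form one single cycle, not several
    # disjoint ones.
    visited = set()
    side, node, prev = "U", edges[0][0], None
    for _ in range(cycle_len):
        e = next(i for i in adj[(side, node)] if i != prev)
        visited.add(e)
        prev = e
        side, node = ("V", edges[e][1]) if side == "U" else ("U", edges[e][0])
    return len(visited) == cycle_len
```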
Although its entropy is large, i.e. the factorial permutation of all buckets in the hash space, it doesn't require that its entire memory space be accessed; thus bit flags could possibly be employed to reduce the memory used and make it faster.
There is no "entire memory space". Cuckoo Cycle uses memory for data structures, not for filling up with random garbage like scrypt does.
You lack imagination and abstract conceptualization skills. Think about the algorithm from the analogy that applies, and discover new insights.
Think about the entropy of the memory allocated for the data structures. If the entropy is not maximal, one can in theory find a way to shrink that memory space and trade computation for memory, and then your algorithm is no longer memory hard. As well, if one can parallelize the computation within the same memory structures, it is no longer memory hard. As well, if the entropy is not maximized, there is no evidence that a faster algorithm can't be found.
The "random garbage" is precisely necessary to maximize entropy and eliminate the possibility to trade memory for computation and/or parallelize the computation. Even the above paper's proposal seems to give up parallelization to the GPU (although they claim only by an order-of-magnitude and that is not including the GPU has multi-GB of memory and can run more than one instance if not memory bandwidth bounded).
I agree that Cuckoo Cycle can be parallelized. In fact the GPU solver works best with 16384 threads. But it's only marginally faster than with 4096 threads, because they're saturating the random-access memory bandwidth.
Not just parallelization, but also trading memory for computation time, as I predicted and as is now confirmed:
"Andersen [6]: a prover can reduce the memory by the factor of 50 with time increase by the factor of 2 only."