Board: Altcoin Discussion
Re: The Ethereum Paradox
by tromp on 09/06/2016, 13:03:26 UTC
Quote
Cuckoo Cycle doesn't rely on the cryptographic security of its underlying hash function,
and Dan Bernstein himself attested that its use in Cuckoo Cycle is perfectly sound.
Please provide me a copy of that discussion, because I believe I have already shown it to be potentially unsound, in reasoning I have written down but not yet revealed.
I spoke to him in person at BITCOIN'15, where I presented Cuckoo Cycle.


Quote
Are you referring to the latest version of your code? Because I am referring to the code that was in the Appendix of your December 31, 2014 white paper, which references int buckets.

The bucketing in an earlier version was inessential to the algorithm; it just turned
out to benefit performance. Last year I replaced it with prefetching in commit
https://github.com/tromp/cuckoo/commit/5aef59659e77c599c730ece6a42a7a2597de80da
which proved even more beneficial.
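
Schematically, the prefetching works as follows (a simplified sketch, not the literal code from that commit; the toy hash stands in for siphash-2-4 and sizes are shrunk for illustration):

Code:
#include <cstdint>

// Simplified sketch of prefetch-based trimming: hash a small batch of
// nonces first, prefetch the counter locations they will touch, then do
// the updates once those cache lines are (hopefully) already loaded.
const uint32_t LOGNNODES = 20;   // illustration only; the discussion uses 2^29
const uint32_t NNODES    = 1u << LOGNNODES;
const int      BATCH     = 64;

static uint8_t counters[NNODES];                 // per-node degree counters

static uint32_t node_hash(uint64_t nonce) {      // toy mixer, not siphash-2-4
  nonce *= 0x9E3779B97F4A7C15ull;
  return (uint32_t)(nonce >> (64 - LOGNNODES));
}

void trim_batch(uint64_t base) {
  uint32_t node[BATCH];
  for (int i = 0; i < BATCH; i++) {              // pass 1: hash and prefetch
    node[i] = node_hash(base + i);
    __builtin_prefetch(&counters[node[i]]);      // GCC/Clang builtin
  }
  for (int i = 0; i < BATCH; i++)                // pass 2: updates now hit cache
    counters[node[i]]++;
}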

Quote
Is this edge trimming coming from a suggestion from Dave Andersen?
Yes, implemented back in April 2014.
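
For anyone unfamiliar with the technique: edge trimming alternates counting the endpoints of all live edges with discarding edges whose endpoint occurs only once, since a degree-1 node cannot lie on a cycle. One round looks roughly like this (a simplified single-threaded sketch, not the actual code, which packs 2-bit counters and trims both sides of the bipartite graph in turn):

Code:
#include <cstdint>
#include <vector>

// One edge-trimming round: count the degree of each edge endpoint,
// then keep only the edges whose endpoint was seen at least twice.
const uint32_t LOGNNODES = 20;   // illustration only; the discussion uses 2^29
const uint32_t NNODES    = 1u << LOGNNODES;

static uint32_t node_hash(uint64_t nonce) {      // toy stand-in for siphash-2-4
  nonce *= 0x9E3779B97F4A7C15ull;
  return (uint32_t)(nonce >> (64 - LOGNNODES));
}

std::vector<uint64_t> trim_round(const std::vector<uint64_t>& alive) {
  std::vector<uint8_t> degree(NNODES, 0);
  for (uint64_t nonce : alive) {                 // pass 1: count endpoints
    uint8_t& d = degree[node_hash(nonce)];
    if (d < 2) d++;                              // saturate at 2
  }
  std::vector<uint64_t> kept;
  for (uint64_t nonce : alive)                   // pass 2: drop degree-1 edges
    if (degree[node_hash(nonce)] == 2)
      kept.push_back(nonce);
  return kept;
}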

Quote
In one round of edge trimming, Cuckoo Cycle typically updates 2^29 counters
spread over 2^8 memory banks. Each memory bank thus holds 2^21 counters,
only 2^14 of which typically fit in a single row. This is where the latency comes in.
To avoid latency, you could increase the number of memory banks by a factor of 2^7,
but this similarly increases die area, hardware costs, and energy usage.
Or increase the number of h/w threads to 2^15 (2^8 banks multiplied by the 2^7 rows per bank, i.e. 2^21 counters per bank divided by 2^14 counters per row), with some efficient h/w mechanism for threads to wait until they can be synchronized on a memory page/row.

I don't see why you multiply by the number of rows that Cuckoo uses per memory bank.
Your 2^15 threads will each hash some nonce, and then most will want to access a unique row.
Then what? There is little to synchronize.
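
Concretely (an illustrative address split only; the actual bank/row assignment is decided by the memory controller): there are 2^8 x 2^7 = 2^15 counter-holding rows in total, so 2^15 threads hashing uniformly random nonces will mostly land on 2^15 distinct rows, leaving nothing to batch up.

Code:
#include <cstdint>

// Illustrative split of a 29-bit counter index into the 2^8 banks and
// 2^7 counter-holding rows per bank from the figures above.
struct BankRow { uint32_t bank, row; };

BankRow locate(uint32_t node29) {        // node index in [0, 2^29)
  uint32_t bank = node29 >> 21;          // top 8 bits: one of 2^8 banks
  uint32_t row  = (node29 >> 14) & 0x7F; // next 7 bits: one of 2^7 rows
  return { bank, row };                  // low 14 bits: offset within the row
}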

Quote
Alternatively, the algorithm parameter PART_BITS allows for a reduction in the number
of counters in use at the same time, which is what your proposal essentially amounts to.
Setting this to 21 will require only 2^8 counters,
one per memory bank. But now your hash computations have increased by a factor of 2^21,
over 2 million.
No, that is not equivalent to increasing the number of h/w threads and syncing them to pause until 2^13 of them are queued up to read a shared memory page/row.
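
To restate what the PART_BITS quoted above actually does (a simplified sketch; the actual code in my repo differs in its details):

Code:
#include <cstdint>
#include <cstring>

// The PART_BITS trade-off: with PART_BITS = B, counting runs in 2^B
// passes, each keeping counters only for nodes whose top B bits equal
// the current partition. Counter memory shrinks by a factor of 2^B, but
// every nonce is rehashed in every pass, so hashing work grows by the
// same factor -- the "over 2 million" for B = 21.
const uint32_t LOGNNODES = 29;
const uint32_t NNODES    = 1u << LOGNNODES;
const uint32_t PART_BITS = 21;
const uint32_t REM_BITS  = LOGNNODES - PART_BITS;  // 2^8 counters live at once

static uint8_t counters[1u << REM_BITS];

static uint32_t node_hash(uint64_t nonce) {        // toy stand-in for siphash-2-4
  nonce *= 0x9E3779B97F4A7C15ull;
  return (uint32_t)(nonce >> (64 - LOGNNODES));
}

void count_in_parts() {
  for (uint32_t part = 0; part < (1u << PART_BITS); part++) {
    std::memset(counters, 0, sizeof counters);     // fresh counters per pass
    for (uint64_t nonce = 0; nonce < NNODES; nonce++) {
      uint32_t node = node_hash(nonce);            // rehashed in every pass
      if (node >> REM_BITS == part)                // only the current partition
        counters[node & ((1u << REM_BITS) - 1)]++;
    }
    // ...trimming of this partition's degree-1 edges would happen here...
  }
}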


In my code, a thread is something that processes a sequence of nonces, hashing each one
and then atomically updating the corresponding counter, without any coordination.
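
In outline (a stripped-down sketch, not the actual code; the repo packs 2-bit counters and hashes with siphash-2-4):

Code:
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Each thread strides through the nonce space, hashes each nonce to a
// node, and atomically bumps that node's counter. No coordination.
const uint32_t LOGNNODES = 20;   // illustration only; the discussion uses 2^29
const uint32_t NNODES    = 1u << LOGNNODES;

static std::vector<std::atomic<uint8_t>> counters(NNODES);

static uint32_t node_hash(uint64_t nonce) {        // toy stand-in for siphash-2-4
  nonce *= 0x9E3779B97F4A7C15ull;
  return (uint32_t)(nonce >> (64 - LOGNNODES));
}

void trim_thread(unsigned id, unsigned nthreads) {
  for (uint64_t nonce = id; nonce < NNODES; nonce += nthreads)
    counters[node_hash(nonce)].fetch_add(1, std::memory_order_relaxed);
}

int main() {
  unsigned nthreads = 8;
  std::vector<std::thread> pool;
  for (unsigned id = 0; id < nthreads; id++)
    pool.emplace_back(trim_thread, id, nthreads);
  for (auto& t : pool) t.join();
}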

Are you proposing a thread for each nonce? Half a billion threads?
I really don't understand what you are proposing, or how this syncing is supposed to be implemented.

Quote
Btw, electrical power cost of SRAM is only one order-of-magnitude increase:

I'm talking about the dollar cost of SRAM vs DRAM.