Re: The Ethereum Paradox

Quote from: tromp on June 10, 2016, 03:25:38 PM

Quote from: iamnotback on June 10, 2016, 02:15:52 PM

We could also consider enumerating multiple nonces and discarding those that don't match a subset of page rows, i.e. processing the cuckoo table in chunks with multiple passes.

As mentioned before, this is exactly what higher values of PART_BITS achieve, as you can see in cuckoo_miner.h and cuda_miner.cu.

Each trimming round uses 2^PART_BITS passes with the number of counters reduced by the same factor. With sufficient reduction, you end up using only one row in each memory bank.

What I meant before is that it isn't the same tradeoff as the h/w threads because, as you said, it increases the number of hashes computed:

Quote from: iamnotback on June 09, 2016, 07:41:57 AM

Quote from: tromp on June 09, 2016, 07:09:40 AM

Alternatively, the algorithm parameter PART_BITS allows for a reduction in the number of counters in use at the same time, which is what your proposal essentially amounts to. Setting this to 21 will require only 2^8 counters, one per memory bank. But now your hash computations have increased by a factor of 2^21, over 2 million.

No, that is not equivalent to increasing the number of h/w threads and syncing them to pause until 2^13 of them are queued up to read a shared memory page/row.

The point is the ASIC can balance some of the two strategies to find the optimum. The CPU can't effectively leverage either strategy.
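
To make the PART_BITS tradeoff quoted above concrete, here is a rough Python sketch of one trimming round split into 2^PART_BITS passes. It is a toy model, not the code from cuckoo_miner.h: the names endpoint, trim_round and counters_and_hash_factor are made up for illustration, blake2b stands in for the miner's real hash, and the 2^29 baseline counter count is only what the quoted figures imply (2^8 counters times 2^21 passes).

```python
import hashlib


def endpoint(nonce: int, n_nodes: int) -> int:
    """Hypothetical stand-in for the miner's hash: map an edge nonce to a node."""
    digest = hashlib.blake2b(nonce.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % n_nodes


def trim_round(alive, n_nodes, part_bits):
    """One trimming round on one side of the graph, in 2**part_bits passes.

    n_nodes is assumed to be a power of two.
    Returns (surviving nonces, hashes computed, counters held at once).
    """
    n_parts = 1 << part_bits
    mask = n_parts - 1
    n_counters = n_nodes >> part_bits  # counters shrink by the same factor
    keep = set()
    hashes = 0
    for part in range(n_parts):
        counters = [0] * n_counters
        # Count node degrees, but only for nodes that fall in this partition.
        for nonce in alive:
            node = endpoint(nonce, n_nodes)
            hashes += 1
            if node & mask == part:
                idx = node >> part_bits
                counters[idx] = min(2, counters[idx] + 1)  # saturate at 2
        # Keep edges whose endpoint is shared with at least one other edge.
        for nonce in alive:
            node = endpoint(nonce, n_nodes)
            hashes += 1
            if node & mask == part and counters[node >> part_bits] >= 2:
                keep.add(nonce)
    return sorted(keep), hashes, n_counters


def counters_and_hash_factor(total_counter_bits, part_bits):
    """Counters in use at once, and the factor by which hashing grows."""
    return 1 << (total_counter_bits - part_bits), 1 << part_bits


if __name__ == "__main__":
    edges = list(range(1024))
    for bits in (0, 3):
        survivors, hashes, counters = trim_round(edges, 1024, bits)
        print(bits, len(survivors), hashes, counters)
    # Assumed 2^29 baseline, PART_BITS = 21, as implied by the quoted numbers.
    print(counters_and_hash_factor(29, 21))
```

In this toy, every pass rehashes all live edges, so the same edges survive for any PART_BITS while the hash count grows by 2^PART_BITS and the counters in use shrink by the same factor. That is the cost that distinguishes it from queuing up more h/w threads, which reduces row switching without recomputing any hashes.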