Topic: Re: [neㄘcash, ᨇcash, net⚷eys, or viᖚes?] Name AnonyMint's vapor coin?
Board: Altcoin Discussion
Post by TPTB_need_war on 30/01/2016, 19:14:57 UTC
What I am saying is that the entropy of your problem space is large but limited, which is indeed because the confusion and diffusion injected into the memory is not randomized over the entire memory space allocated to the PoW computation. Duh. That is precisely what Andersen discovered when he broke your Cuckoo Cycle, as I warned would be the case. Quoting from the above paper:

You're living in the past quoting 2 papers that both focus on an early 2014 version of Cuckoo Cycle.

What David Andersen did in April 2014 was to reduce memory consumption by a factor of 32, which became part of the reference miner in May 2014, well before my Cuckoo Cycle paper was published at BITCOIN 2015.

The paper says:

Quote
The project webpage [37] claims Andersen's optimizations to be integrated into the miner, but the performance numbers are mainly unchanged since before the cryptanalysis appeared


What are the performance per Watt and the performance per hardware dollar comparing CPU and GPU now for the reference miners?

It should be possible to use the superior FLOPS of the GPU to trade less memory for more computation and/or parallelization, thus giving the GPU an advantage over the CPU. Maintaining parity for the CPU was the entire point of a memory-hard PoW algorithm. Also, the more parallelization, the lower the effective latency of the GPU's memory, because latency gets masked by computation proceeding in parallel, up to the limit of memory bandwidth (which is very high on the GPU, as you know).
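
To make the latency-masking point concrete, here is a toy model (all numbers are my own illustrative assumptions, not measurements of any actual miner or device): with K requests kept in flight, the effective per-access latency shrinks roughly K-fold until the memory-bandwidth bound takes over.

Code:
#include <algorithm>
#include <cstdio>

// Toy model: N random accesses with K kept in flight; run time is the worse
// of the latency bound and the bandwidth bound. All numbers are illustrative.
int main() {
    const double accesses      = 1e9;    // assumed number of random lookups
    const double bytes_each    = 4;      // assumed bytes touched per lookup
    const double latency_s     = 400e-9; // assumed DRAM latency per access
    const double bandwidth_Bps = 200e9;  // assumed GPU memory bandwidth (~200 GB/s)

    for (double in_flight : {1.0, 64.0, 4096.0, 65536.0}) {
        double latency_bound   = accesses * latency_s / in_flight;  // masked latency
        double bandwidth_bound = accesses * bytes_each / bandwidth_Bps;
        printf("in-flight %7.0f: latency bound %8.4f s, bandwidth bound %.4f s -> %.4f s\n",
               in_flight, latency_bound, bandwidth_bound,
               std::max(latency_bound, bandwidth_bound));
    }
    return 0;
}

Past a few tens of thousands of in-flight requests the bandwidth bound dominates, which is exactly the "up to the limit of memory bandwidth" caveat.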

Edit: I will need to study this when I am not so sleepy, to give it proper thought.

Edit#2: you added the following to your post after I replied to it:

But you don't need to read that paper to learn of the linear time-memory trade-off, which is stated right on the project page:

"I claim that trading off memory for running time, as implemented in tomato_miner.h, incurs at least one order of magnitude extra slowdown".

Btw, there is maximum entropy in the bitmap of alive edges once 50% of them have been eliminated.
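
For anyone who wants to check that claim, it is just the Shannon entropy of a biased bit, nothing Cuckoo-specific: a bitmap whose bits are alive with probability p carries H(p) = -p·log2(p) - (1-p)·log2(1-p) bits per edge, which peaks at exactly 1 bit per edge when half the edges have been eliminated.

Code:
#include <cmath>
#include <cstdio>

// Shannon entropy (in bits) of a bit that is 1 with probability p.
double H(double p) { return -p * std::log2(p) - (1.0 - p) * std::log2(1.0 - p); }

int main() {
    for (double alive : {0.9, 0.75, 0.5, 0.25, 0.1})
        printf("alive fraction %.2f -> %.3f bits per edge\n", alive, H(alive));
    return 0;
}

So at the 50% elimination point the alive-edge bitmap is essentially incompressible.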

But the GPU gets that computation for free because it is masked by the latency for the random accesses. That is why I asked for some performance figures above comparing CPU and GPU. I haven't looked at your project page for 2+ years.

Edit#3: so apparently about a 3X advantage in rate per Watt, since the TDP of the GPU you cited is 165W (including 4 GB of RAM) and the i7's AFAIR is about 125W:

https://github.com/tromp/cuckoo#1-memory-bank--1-virtual-core--1-vote
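
Back-of-envelope on that figure, using only the two TDPs quoted above (the raw GPU:CPU rate ratio below is inferred from the ~3X-per-Watt number, not a measurement):

Code:
#include <cstdio>

int main() {
    const double gpu_tdp_w          = 165.0; // GPU TDP cited above (incl. 4 GB RAM)
    const double cpu_tdp_w          = 125.0; // rough i7 TDP cited above
    const double per_watt_advantage = 3.0;   // the ~3X per-Watt figure

    // raw rate ratio (GPU/CPU) implied by that per-Watt advantage
    printf("implied raw GPU:CPU rate ratio ~= %.1fX\n",
           per_watt_advantage * gpu_tdp_w / cpu_tdp_w);  // ~4.0X
    return 0;
}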

Was that running all threads of the i7 versus all compute units of the GPU? Did you maximize the number of instances each could run with its available compute units, i.e. give the GPU 8 or 16 GB if necessary to max out its FLOPS? I see you said you maxed out the memory bandwidth, but what about trading some memory for 10X more computation until the memory-bandwidth bound and the computation (FLOPS) bound are matched?
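
To spell out what I mean by matching the two bounds, here is a toy model (a naively linear trade with purely illustrative numbers; the real trade in Cuckoo Cycle may be far worse, per the project-page quote above): shrink the working set k-fold, assume memory traffic drops ~k-fold while hashing grows ~k-fold, and look for the k where the compute time catches up with the memory time.

Code:
#include <algorithm>
#include <cstdio>

int main() {
    // Purely illustrative assumptions, not figures for any real miner:
    const double base_traffic_bytes = 400e9; // memory traffic at full memory
    const double base_hashes        = 1e9;   // hash evaluations at full memory
    const double bandwidth_Bps      = 200e9; // assumed GPU memory bandwidth
    const double hash_rate          = 10e9;  // assumed spare hashes/s from GPU FLOPS

    for (double k : {1.0, 2.0, 4.0, 8.0, 16.0}) {
        double mem_time  = (base_traffic_bytes / k) / bandwidth_Bps; // drops with k
        double comp_time = (base_hashes * k) / hash_rate;            // grows with k
        printf("k=%2.0f: memory %.2f s, compute %.2f s -> %.2f s\n",
               k, mem_time, comp_time, std::max(mem_time, comp_time));
    }
    return 0;
}

Under those toy numbers the two bounds meet around k of 4 to 5; the question is where (or whether) that crossover exists for the actual reference miners.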