In fact siphash-2-4 is already overkill for my purposes, with siphash-4-8 being close to cryptographically secure. Your attack is about as conceivable to me as P being equal to NP, in which case there is no cryptographically secure hash function anyway.
Fact? Based on what published cryptanalysis?
I meant to say: In fact I believe that
You need millions to find 2^12 of them wanting to access a single row.
With 2^15 threads you expect 1 access per row.
Correct on 2^15 but not on the conclusion.
With 2^27 you expect 2^12 accesses per row. That's over a hundred million threads (2^27 is about 134 million).
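To make the arithmetic explicit, here is a small sketch. It assumes accesses spread uniformly over 2^15 rows (the row count implied by "2^15 threads, 1 access per row"); the function name is made up for illustration.

```python
def threads_for_accesses_per_row(rows_log2: int, accesses_log2: int) -> int:
    """Threads needed so that each row expects 2^accesses_log2 accesses,
    assuming accesses are spread uniformly over 2^rows_log2 rows."""
    return 1 << (rows_log2 + accesses_log2)

ROWS_LOG2 = 15  # assumption: 2^15 rows, as implied by the figures above

print(threads_for_accesses_per_row(ROWS_LOG2, 0))   # 32768 (2^15): 1 access per row
print(threads_for_accesses_per_row(ROWS_LOG2, 12))  # 134217728 (2^27): 2^12 accesses per row
```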
A CUDA GPU can apparently do 671 million threads
Not in hardware. It can only run a few thousand in hardware (simultaneously).
So the 671 million have to run in thousands of batches, one after the other.
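A rough sketch of that batching, assuming purely for illustration that 4096 threads are resident in hardware at once (the only figure given here is "a few thousand"; the real number depends on the GPU):

```python
def batches_needed(total_threads: int, resident_threads: int) -> int:
    """Number of sequential batches needed when only resident_threads
    can execute simultaneously (ceiling division)."""
    return -(-total_threads // resident_threads)

TOTAL_THREADS = 671_000_000  # the "671 million threads" figure quoted above
RESIDENT_THREADS = 4096      # hypothetical stand-in for "a few thousand"

print(batches_needed(TOTAL_THREADS, RESIDENT_THREADS))  # 163819
```

Under this assumption the threads run as well over a hundred thousand sequential batches, so they are nowhere near simultaneous.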
I don't think it is safe to assume that ASICs can't be designed to support millions of very efficient threads for this very customized computation
Your ASIC would require an insanely huge die and be orders of magnitude more expensive than the DRAM it tries to optimize access to.
and again sharing the computation transistors amongst only the threads that aren't stalled, thus not needing millions of instances of the compute units.
How would you know which threads to stall without doing their computation first?
Also remember that for instant transactions I need a very fast proving time, which means that at 2^20 counters it could be trivially parallelized with only thousands of threads, losing ASIC resistance.
For such small instances, latency is not that relevant as it all fits in SRAM cache.