Re: [ANN][POOL] Profit switching pool - wafflepool.com
Board: Pools (Altcoins)
by tachyon_john on 20/03/2014, 19:19:18 UTC
I can see what you are saying. Just as LTC was the answer to BTC's difficulty getting too high because of ASICs, we have to wait and see what the answer to LTC will be against the coming ASICs. If there is such a coin that stands a chance, and pools can adapt to it, that would be the new standard...

An easy way of making ASICs unprofitable is to design algorithms that require large memory buffers and whose performance is bound by memory bandwidth rather than arithmetic.  ASICs provide the greatest benefit for algorithms that are arithmetic-bound, and the least benefit for algorithms that are bound by memory bandwidth.  By combining a large memory buffer with random access patterns, we would get a level playing field that evolves very slowly.  Today's GPUs have 200-300 GB/s of memory bandwidth, which has only increased by a small margin from generation to generation.  GPUs are expected to get a nice jump in bandwidth when memory technologies like die-stacked memory show up in a few years, but after that, bandwidth growth will be very slow again.

A large part of the complexity and cost in a GPU is the memory system, and it is only feasible to build because millions of GPUs are sold per week.  By developing an algorithm that requires a hardware capability that is only cost-feasible in commodity devices manufactured in quantities of several million or more, you would push ASICs completely out and keep them out for a very long time, perhaps indefinitely.  It's one thing to fab an ASIC chip; it's another thing to couple it to a high-capacity, high-bandwidth memory system.  If you design an algorithm that uses the "memory wall" as a fundamental feature, it will make ASICs no better than any other hardware approach.
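
(A rough sketch of what such a memory-bound inner loop could look like, in C.  This is purely illustrative and not any real coin's algorithm; the 1 GiB buffer size, round count, and mixing constants below are arbitrary choices.  The point is the data-dependent random walk: each read determines the next address, so the loop spends its time waiting on the memory system rather than on the ALUs.)

#include <stdint.h>
#include <stdlib.h>

#define BUF_BYTES (1ULL << 30)                /* 1 GiB working set          */
#define BUF_WORDS (BUF_BYTES / sizeof(uint64_t))
#define ROUNDS    (1u << 22)                  /* number of dependent reads  */

uint64_t memory_bound_mix(const uint8_t seed[32])
{
    uint64_t *buf = malloc(BUF_BYTES);
    if (!buf) return 0;

    /* Fill the buffer with cheap pseudo-random data derived from the seed. */
    uint64_t x = 0x9E3779B97F4A7C15ULL;
    for (size_t i = 0; i < 32; i++)
        x = (x ^ seed[i]) * 0xBF58476D1CE4E5B9ULL;
    for (size_t i = 0; i < BUF_WORDS; i++) {
        x ^= x >> 30; x *= 0x94D049BB133111EBULL; x ^= x << 13;
        buf[i] = x;
    }

    /* Data-dependent random walk: each read decides where the next read
       goes, so the chain cannot be prefetched or cached effectively. */
    uint64_t acc = x;
    size_t idx = (size_t)(acc % BUF_WORDS);
    for (uint32_t r = 0; r < ROUNDS; r++) {
        acc = (acc ^ buf[idx]) * 0xD6E8FEB86659FD93ULL;
        idx = (size_t)(acc % BUF_WORDS);
    }

    free(buf);
    return acc;
}

An ASIC running a loop like this still has to wait on the same DRAM accesses as a GPU or CPU, which is the whole argument above.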

Great post, and so true...

If they want a level playing field for mining, that should be the way...

Best Regards,

LPC

Yeah, so there are already coins that do this.  YACoin was the first, and currently takes 4 MB per thread to complete a calculation.  That will be 8 MB on May 31st.  All the other scrypt-chacha coins will get there eventually, but YAC is the trailblazer Cheesy
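
(For reference, here is where the 4 MB and 8 MB figures come from, assuming the scrypt-jane convention N = 2^(Nfactor+1) and the usual scrypt scratchpad size of 128 * r * N bytes with r = 1; a tiny C check:)

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const unsigned r = 1;                         /* scrypt block-size parameter    */
    for (unsigned nfactor = 14; nfactor <= 15; nfactor++) {
        uint64_t N     = 1ULL << (nfactor + 1);   /* scrypt-jane: N = 2^(Nfactor+1) */
        uint64_t bytes = 128ULL * r * N;          /* scratchpad per hash            */
        printf("Nfactor %u: N = %llu, scratchpad = %llu MiB\n",
               nfactor, (unsigned long long)N,
               (unsigned long long)(bytes >> 20));
    }
    return 0;
}

With Nfactor 14 this prints a 4 MiB scratchpad per hash, and with Nfactor 15 an 8 MiB one, matching the figures quoted above.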

Sorry, but 4 MB isn't a lot of memory.  1 GB or more would start to be the size of memory I'm talking about.  Anything that's just a few megabytes in size is small enough that someone who wanted it badly enough could just put SRAM on-die.  CPUs and GPUs already have aggregate on-chip cache sizes that are 10 times that size, so 4 MB is nowhere near large enough.  The data size has to be large enough that the on-chip caches are useless, and remain useless over at least a 10-year period.  I would put that at something over 1 GB.

We'll have to disagree on what constitutes "a lot", but even in YACoin, the effects of 4 MB hashes are taking their toll.  You can't parallelize as many threads on today's GPUs as you can at lower N-factors.  A Radeon R9 290 with 2560 shaders would need 40 GB (no, not 4, 40!) to fully utilize the card.  Luckily, OpenCL is flexible, and we can adapt the code and recompile the OpenCL kernel to use lookup-gap, trading extra computation for memory so that each hash needs less of it, which lets us run more threads.  If we were unable to change the lookup-gap, performance would degrade MUCH faster than 50% for every N-factor change.  An ASIC is, by definition, an algorithm hard-coded into silicon.  If it used lookup-gap, the gap would have to be fixed in the design, and it would then be a balance between computation speed and the amount of memory included.  But then it would only work for a given N-factor, so you'd have to switch to a different coin eventually.  How much DRAM can you fit on an ASIC die?  I would guess not enough to do more than a couple of hashes at a time, and unless the chip is significantly faster than today's GPU cores, I think we're still a long way off from ASICs for high-memory (even 4 MB, N-factor 14) coins.
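
(To illustrate the lookup-gap trade-off mentioned above, here is a toy C model of the scrypt ROMix loop.  toy_mix() is just a stand-in for the real chacha-based block mix; what matters is how storing only every GAP-th scratchpad entry cuts memory by a factor of GAP at the cost of recomputing the skipped steps on each random access:)

#include <stdint.h>
#include <stdlib.h>

#define N    32768u          /* e.g. Nfactor 14 -> N = 2^15              */
#define GAP  2u              /* store every 2nd entry: half the memory   */

static uint64_t toy_mix(uint64_t x)           /* stand-in for BlockMix   */
{
    x ^= x >> 33; x *= 0xFF51AFD7ED558CCDULL; x ^= x >> 29;
    return x;
}

uint64_t romix_lookup_gap(uint64_t block)
{
    /* Only N/GAP scratchpad entries are kept instead of N. */
    uint64_t *v = malloc((N / GAP) * sizeof(uint64_t));
    if (!v) return 0;

    /* First loop: remember every GAP-th intermediate value. */
    for (uint32_t i = 0; i < N; i++) {
        if (i % GAP == 0) v[i / GAP] = block;
        block = toy_mix(block);
    }

    /* Second loop: random accesses; missing entries are recomputed. */
    for (uint32_t i = 0; i < N; i++) {
        uint32_t j = (uint32_t)(block % N);      /* data-dependent index  */
        uint64_t vj = v[j / GAP];                /* nearest stored entry  */
        for (uint32_t k = 0; k < j % GAP; k++)   /* redo the skipped steps*/
            vj = toy_mix(vj);
        block = toy_mix(block ^ vj);
    }

    free(v);
    return block;
}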


I'm not convinced that the only way to parallelize these algorithms is the way it's being done currently.  It is often possible to write GPU code where several threads, or even a whole warp/wavefront, work collectively on an algorithm step.  I haven't looked at the details of scrypt-chacha specifically, but I wouldn't be surprised if there are alternative algorithm formulations other than the one you refer to.  The top-end GPUs today have 8 GB to 12 GB of RAM.  In the next two years, there will be GPUs and other GPU-like hardware (e.g. Xeon Phi) with significantly more memory than they have now, likely in the range of 32 GB.  I've read analyst articles expecting Intel to put at least 16 GB of eDRAM onto the next Xeon Phi (though likely on its own separate die), a much larger-scale variant of what Intel is already doing for integrated graphics.  Next week is NVIDIA's GPU conference; perhaps there will be some public announcements about what they're doing for their next-gen GPUs.
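
(As one concrete example of threads cooperating on a single step: within each ChaCha round, the four quarter-rounds touch disjoint words of the 16-word state, so four lanes of a warp/wavefront could each take one of them.  The C snippet below is only a CPU model of that partitioning, not an actual GPU kernel or anyone's existing miner code:)

#include <stdint.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

static void quarter_round(uint32_t s[16], int a, int b, int c, int d)
{
    s[a] += s[b]; s[d] ^= s[a]; s[d] = ROTL32(s[d], 16);
    s[c] += s[d]; s[b] ^= s[c]; s[b] = ROTL32(s[b], 12);
    s[a] += s[b]; s[d] ^= s[a]; s[d] = ROTL32(s[d], 8);
    s[c] += s[d]; s[b] ^= s[c]; s[b] = ROTL32(s[b], 7);
}

void chacha_double_round_4lanes(uint32_t s[16])
{
    /* Column step: the four quarter-rounds are independent of each other. */
    static const int col[4][4]  = {{0,4, 8,12},{1,5, 9,13},{2,6,10,14},{3,7,11,15}};
    /* Diagonal step: again four independent quarter-rounds. */
    static const int diag[4][4] = {{0,5,10,15},{1,6,11,12},{2,7, 8,13},{3,4, 9,14}};

    for (int lane = 0; lane < 4; lane++)      /* on a GPU: lane = thread id */
        quarter_round(s, col[lane][0], col[lane][1], col[lane][2], col[lane][3]);
    /* (a barrier/sync between the two steps would go here on real hardware) */
    for (int lane = 0; lane < 4; lane++)
        quarter_round(s, diag[lane][0], diag[lane][1], diag[lane][2], diag[lane][3]);
}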