Post
Topic
Board Mining
Re: Intel 50-core Knights Corner
by gat3way on 12/06/2012, 21:24:23 UTC
Quote
Well this is quite dishonest, you're claiming one scrypt operation can keep each KC-core utilized? I doubt it. It will probably require more than 4 scrypt operations per KC-core to keep the hardware utilized, so the memory shoots up to 12.8 GB... also out of practical limits for an add-on board.

It can. In the CPU world you don't hide latency by scheduling another thread on a core while a memory-bound thread is stalled on a memory access. Quite the opposite: context switches are expensive (hyperthreading is a special case, but there you have two register sets per core, so things are different). Since you are a "1337 c0d3r", I assume you've written a compute-intensive multithreaded application at some point. Did increasing the thread count beyond the number of CPUs improve performance because you somehow "utilized" the cores better? Or did just the opposite happen, because all you did was introduce scheduling contention?

Quote
You can decouple the fetching and decoding from the execution. Instructions do not execute until they are ready.

Again, pipelines can't help much when an instruction depends on the result of a previous one. SHA256 has 64 steps, and each step depends on the result of the previous one. There are a number of independent instructions within each step, but that number is limited.
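
To make the dependency concrete, here is a rough Python sketch of the SHA256 compression loop. It is for illustration only, not how a miner would be written; the function names (compress, sha256, rotr and the underscore helpers) are my own, and the round constants are derived with integer roots instead of being typed in as a table. Inside one round, the four values big_s1, ch, big_s0 and maj do not depend on each other and could be issued side by side, but the new a and e need all of them, and the next round cannot start without the new a and e.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # keep every value within 32 bits


def _primes(n):
    """Return the first n primes by trial division (n is tiny here)."""
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


def _icbrt(x):
    """Integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Round constants: first 32 fractional bits of the cube roots of the first
# 64 primes. Initial state: same idea with square roots of the first 8 primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
H0 = [math.isqrt(p << 64) & MASK for p in _primes(8)]


def rotr(x, n):
    """Rotate a 32-bit value right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """Run the 64 rounds over one 64-byte block and return the new state."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)

    a, b, c, d, e, f, g, h = state
    for t in range(64):
        # These four are independent of each other within the round.
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        # These two need the values above, and round t+1 needs their sums.
        t1 = (h + big_s1 + ch + K[t] + w[t]) & MASK
        t2 = (big_s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (
            (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)

    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(msg):
    """Pad msg, feed it block by block through compress, return the digest."""
    padded = msg + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", len(msg) * 8)
    state = H0
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)


if __name__ == "__main__":
    # Sanity check against the standard library implementation.
    print(sha256(b"abc") == hashlib.sha256(b"abc").digest())
```

The only parallelism a single hash offers is the handful of operations inside one iteration of the round loop; anything beyond that has to come from working on several independent hashes at once.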

Quote
This is a disadvantage of VLIW4/5 and has nothing to do with GPGPU.

Inadequate register/shared memory is a disadvantage of any GPU, not only VLIW ones. That makes them much less suitable for memory-intensive algorithms, even embarrassingly parallel ones. Moreover, resource-limited occupancy is a general GPGPU problem, neither bitcoin-specific nor VLIW-specific; it is a problem even for ALU-bound kernels like the bitcoin one.
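
A back-of-the-envelope sketch of what "resource-limited occupancy" means, in Python. The default figures (a budget of 256 vector registers per lane and a cap of 10 resident wavefronts per SIMD) are assumptions picked for illustration and should be replaced with the numbers for whatever chip you care about; waves_per_simd is a made-up helper name, not part of any vendor tool.

```python
def waves_per_simd(vgprs_per_work_item, vgpr_budget=256, max_waves=10):
    """Estimate how many wavefronts fit on one SIMD when registers are the limit.

    vgpr_budget and max_waves are illustrative assumptions, not values
    taken from any specific hardware manual.
    """
    if vgprs_per_work_item <= 0:
        raise ValueError("a kernel must use at least one register")
    # Each resident wavefront reserves its registers for its whole lifetime,
    # so the register file caps how many wavefronts can be resident at once.
    return min(max_waves, vgpr_budget // vgprs_per_work_item)


if __name__ == "__main__":
    for regs in (24, 64, 128, 256):
        print(regs, "registers per work-item ->", waves_per_simd(regs), "wavefronts")
```

Under these assumed limits a kernel using 24 registers per work-item still reaches the cap, while one using 128 leaves room for only two wavefronts to cover each other's stalls, whether the kernel is ALU-bound or not.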


Quote
Of course you would increase the GPR if you increase the number of ALU per CU. But note that a CU generates a structural dependency, a dependency that we created in order to accommodate GPGPU.

What makes you think you would increase the register count if you increase the number of ALUs? I see... mmmm... no relation between the two.


Quote
If you were to make an ASIC miner, you sure as hell don't need a crapton of GPRs or CUs or "wavefront"... all you would need is tons and tons of ALUs.

Care to elaborate on what "ALU" means in terms of an ASIC?