Topic: Re: [ANN] sgminer v5 - optimized X11/X13/NeoScrypt/Lyra2RE/etc. kernel-switch miner
Board: Mining (Altcoins)
Posted by damm315er on 09/01/2015, 23:01:11 UTC
Here is a small NeoScrypt kernel improvement for free, since I am mostly mining X11 anyway.
It gave me a 5.8% speedup on my reference R9 290 card (with the Stilt BIOS),
from 290.2 to 307 kH/s at 800/1500 MHz core/memory clock on Ubuntu 12.04 with stock drivers.
I didn't try it on my R9 280X cards, so please post your results if you try this.
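For reference, the 5.8% figure follows directly from the two hashrates quoted above:

```latex
\frac{307 - 290.2}{290.2} = \frac{16.8}{290.2} \approx 0.058 = 5.8\%
```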

You will have to modify the kernel as shown below.
The bottleneck in this kernel is the way it stores the 128 intermediate results of the ChaCha and Salsa passes in global memory.
The change below stores the two 128-byte halves of each result in separate regions instead of back to back. That reduces stalls/latency, because the reads/writes no longer go to the same/adjacent memory banks.
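To make the layout change concrete: each stored entry X is 256 bytes, i.e. two 128-byte ulong16 halves, and idx runs from 0 to 127. Writing i for idx and h in {0, 1} for the half, the byte offsets inside the scratchpad follow from the indexing in the two versions of ScratchpadStore (this only describes where the data lands, not how the GPU's memory banks are mapped):

```latex
\text{old: } \mathrm{off}(i,h) = (2i + h)\cdot 128\,\mathrm{B}, \qquad
\text{new: } \mathrm{off}(i,h) = (i + 128h)\cdot 128\,\mathrm{B}, \qquad i \in \{0,\dots,127\},\ h \in \{0,1\}

\mathrm{off}(i,1) - \mathrm{off}(i,0) = 128\,\mathrm{B} \ \text{(old)} \quad\text{vs.}\quad 128 \cdot 128\,\mathrm{B} = 16384\,\mathrm{B} \ \text{(new)}

\text{total in both cases: } 256 \cdot 128\,\mathrm{B} = 32768\,\mathrm{B} = 32\,\mathrm{KiB}
```

So the scratchpad size does not change; only the distance between the two halves of an entry does.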

Change:
void ScratchpadStore(__global void *V, void *X, uchar idx)
{
   ((__global ulong16 *)V)[idx << 1] = ((ulong16 *)X)[0];
   ((__global ulong16 *)V)[(idx << 1) + 1] = ((ulong16 *)X)[1];
}

void ScratchpadMix(void *X, const __global void *V, uchar idx)
{
   ((ulong16 *)X)[0] ^= ((__global ulong16 *)V)[idx << 1];
   ((ulong16 *)X)[1] ^= ((__global ulong16 *)V)[(idx << 1) + 1];
}

To:
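void ScratchpadStore(__global void *V, void *X, uchar idx)
{
   // first 128-byte half of entry idx goes to slots 0..127
   ((__global ulong16 *)V)[idx] = ((ulong16 *)X)[0];
   // second 128-byte half goes to slots 128..255 (16 KB further on)
   ((__global ulong16 *)V)[idx + 128] = ((ulong16 *)X)[1];
}

void ScratchpadMix(void *X, const __global void *V, uchar idx)
{
   // read both halves back from the same split layout
   ((ulong16 *)X)[0] ^= ((const __global ulong16 *)V)[idx];
   ((ulong16 *)X)[1] ^= ((const __global ulong16 *)V)[idx + 128];
}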


CORRECTION:

That made an 8 kH/s increase on my 290s, from 341 to 349 kH/s. (Dumb azz me, I forgot to delete the cached kernel .bin file.)
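For comparison with the earlier 5.8% figure, the corrected numbers work out to:

```latex
\frac{349 - 341}{341} = \frac{8}{341} \approx 0.023 = 2.3\%
```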