Board: Beginners & Help
Re: My initial Radeon HD 7970 mining benchmarks
by DiabloD3 on 09/01/2012, 23:24:37 UTC
Code:
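    // x: the kernel's 16-wide uint vector type (assumed to be uint16 here)
    int16 selection = XG2 == (x)(0x136032ED);   // -1 in every lane whose hash matches, 0 elsewhere
    if (any(selection))                         // skip the store when no lane matched
    {
       x mask = Xnonce & 0xF;                   // low 4 bits of each lane's nonce
       // select(a, b, c) returns b where c is set: keep Xnonce in matching lanes, 0 elsewhere.
       // shuffle() is a gather: temp[i] = kept[mask[i]]
       x temp = shuffle(select((x)(0), Xnonce, selection), mask);
       vstore16(temp, 0, output);               // writes all 16 slots of output
    }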

That "if" might be totally unnecessary, and I still don't quite understand how the output array works, but it might give you a better idea of what I was trying to do to avoid all those branches.
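The branch-free idea behind the snippet can be sketched as a plain C emulation of the OpenCL semantics. This is a hypothetical illustration, not the miner's code: `match_lanes` and `LANES` are made-up names, and the loop merely stands in for the 16 SIMD lanes. Each lane turns its comparison into an all-ones or all-zeros mask (as an OpenCL vector compare does) and ANDs it with the nonce, so the per-lane decision needs no `if`.

```c
#include <stdint.h>

#define LANES 16 /* hypothetical: one iteration per lane of a 16-wide vector */

/* Emulates, lane by lane: selection = hash == target; kept = select(0, nonce, selection).
 * Returns 1 if any lane matched (like OpenCL any()), else 0. */
static int match_lanes(const uint32_t hash[LANES], const uint32_t nonce[LANES],
                       uint32_t target, uint32_t kept[LANES])
{
    uint32_t any = 0;
    for (int i = 0; i < LANES; i++) {
        /* 0xFFFFFFFF if this lane's hash matches, 0 otherwise, without a branch */
        uint32_t lane_mask = 0u - (uint32_t)(hash[i] == target);
        /* Keep the nonce in matching lanes, zero it everywhere else */
        kept[i] = nonce[i] & lane_mask;
        any |= lane_mask;
    }
    return any != 0;
}
```

Note that `any` is computed from the masks rather than from `kept`, so a matching nonce that happens to be 0 is still reported.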

I'll add official 8- and 16-wide vector support in a bit; it should be useful on, say, AVX if you manually enable CPU mining in the code. From what I've heard, SDK 2.6's CPU compiler has gotten a lot better.

I'll be watching the repository, then. It should almost certainly help with more modern CPUs and Larrabee/Intel MIC.

The output array is basically a massive hack to keep multiple results from overwriting each other, although the chance of getting multiple results is extremely low. The current array size is massive overkill, but it also seems to be a strangely optimal size for the hardware.
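The slotting idea can be sketched in C. This is a hypothetical illustration, not DiabloMiner's actual code: it assumes 16 slots indexed by the low four bits of the nonce (taken from the `Xnonce & 0xF` mask in the snippet above), and `record_hit`, `collect_hits` and `OUTPUT_SLOTS` are made-up names. Two hits in the same dispatch only overwrite each other if their low four bits match.

```c
#include <stdint.h>
#include <string.h>

#define OUTPUT_SLOTS 16 /* assumption: matches the & 0xF mask in the snippet */

/* Hypothetical device-side write: the slot is chosen by the nonce's low 4 bits. */
static void record_hit(uint32_t output[OUTPUT_SLOTS], uint32_t nonce)
{
    output[nonce & (OUTPUT_SLOTS - 1)] = nonce;
}

/* Hypothetical host-side read-back: copy out every nonzero slot, then clear
 * the array for the next dispatch. Returns the number of nonces found.
 * Caveat: a genuine nonce of 0 is indistinguishable from an empty slot. */
static int collect_hits(uint32_t output[OUTPUT_SLOTS], uint32_t found[OUTPUT_SLOTS])
{
    int n = 0;
    for (int i = 0; i < OUTPUT_SLOTS; i++) {
        if (output[i] != 0)
            found[n++] = output[i];
    }
    memset(output, 0, OUTPUT_SLOTS * sizeof output[0]);
    return n;
}
```

For example, nonces `0x12345671` and `0xABC2` land in slots 1 and 2 and both survive; `0x11` and `0x21` would both map to slot 1, and the later write would win.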

Now, what would benefit me most is some way of sorting the outputs in a single cycle, so that the {nonce, H} pairs would instantly give me the best nonce and I'd only evaluate that one. There seems to be no way to do this (and yes, that implies reverting that one bit of math so that the H == 0 check is literally done at the end again, which makes it much easier to sort on shit). The nonces themselves can't be sorted because they're completely random; they're essentially meaningless values.
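Picking only the best pair doesn't actually require a full sort; a single minimum search over the {nonce, H} pairs is enough. The C sketch below is a plain host-side linear scan, not the single-cycle on-GPU operation wished for above. It assumes that a smaller H means a better candidate (H being the word compared against zero), and `candidate` and `best_candidate` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical candidate record: a nonce plus the hash word H it produced. */
typedef struct {
    uint32_t nonce;
    uint32_t h;
} candidate;

/* Return the index of the candidate with the smallest H, or -1 if n == 0.
 * One linear pass: only the best entry is needed, so nothing is fully sorted. */
static ptrdiff_t best_candidate(const candidate *c, size_t n)
{
    if (n == 0)
        return -1;
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (c[i].h < c[best].h)
            best = i;
    }
    return (ptrdiff_t)best;
}
```

On the GPU itself, the analogous operation would be a parallel min-reduction, which takes a logarithmic number of steps rather than a single cycle.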