Board: Mining (Altcoins)
Topic: Re: [ANN] sgminer v5 - optimized X11/X13/NeoScrypt/Lyra2RE/etc. kernel-switch miner
Post by BitmoreCoin on 26/01/2015, 15:46:52 UTC
This is worth 20 KH/s on my 280X: from 343 KH/s to 363 KH/s at a 1020 MHz clock.
Now somebody needs to find 20 KH/s more for me. :)
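Working the two 280X figures through, the 20 KH/s gain corresponds to the "almost 6%" mentioned further down (plain arithmetic on the numbers quoted here):

```latex
\frac{363 - 343}{343} = \frac{20}{343} \approx 0.0583 \approx 5.8\,\%
```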

Change the XORBytesInPlace call from
Code:
XORBytesInPlace(B + bufidx, input, BLAKE2S_OUT_SIZE);
to
Code:
XORBytesInPlace(B + bufidx, input, bufidx);
and change the function itself to do some byte-alignment checking:
Code:
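//
// a bit of byte alignment checking goes a long way...
//
// Always XORs BLAKE2S_OUT_SIZE (32) bytes of src into dst, using the widest
// access the destination offset allows. mod is the byte offset of dst inside
// its buffer (bufidx at the call site), not a length.
void XORBytesInPlace(void *restrict dst, const void *restrict src, uint mod)
{
  switch(mod % 4)
  {
  case 0:
    // offset is a multiple of 4: 4 x uint2 (8 bytes each) = 32 bytes
    // note: uint2 formally needs 8-byte alignment, so this relies on the
    // offset also being a multiple of 8, or on the GPU tolerating a
    // 4-byte-aligned uint2 access
    #pragma unroll 2
    for(int i = 0; i < 4; i += 2)
    {
      ((uint2 *)dst)[i]   ^= ((uint2 *)src)[i];
      ((uint2 *)dst)[i+1] ^= ((uint2 *)src)[i+1];
    }
    break;

  case 2:
    // offset is 2 mod 4: 16 x uchar2 (2 bytes each) = 32 bytes
    #pragma unroll 8
    for(int i = 0; i < 16; i += 2)
    {
      ((uchar2 *)dst)[i]   ^= ((uchar2 *)src)[i];
      ((uchar2 *)dst)[i+1] ^= ((uchar2 *)src)[i+1];
    }
    break;

  default:
    // odd offset: single bytes, 8 iterations x 4 = 32 bytes
    #pragma unroll 8
    for(int i = 0; i < 32; i += 4)
    {
      ((uchar *)dst)[i]   ^= ((uchar *)src)[i];
      ((uchar *)dst)[i+1] ^= ((uchar *)src)[i+1];
      ((uchar *)dst)[i+2] ^= ((uchar *)src)[i+2];
      ((uchar *)dst)[i+3] ^= ((uchar *)src)[i+3];
    }
    break;
  }
}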
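To sanity-check that every branch of the dispatch XORs exactly the same 32 bytes, here is a rough host-side model in plain C. It is a sketch, not sgminer code: the names xor32_ref, xor32_dispatch, dispatch_matches_ref and check_all_offsets are hypothetical, BLAKE2S_OUT_SIZE is assumed to be 32 (the BLAKE2s digest length), and memcpy stands in for the uint2/uchar2 vector accesses so the model stays well-defined on any CPU regardless of alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLAKE2S_OUT_SIZE 32  /* assumed: BLAKE2s digest length in bytes */

/* Reference: plain byte-wise XOR of 32 bytes of src into dst. */
void xor32_ref(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < BLAKE2S_OUT_SIZE; i++)
        dst[i] ^= src[i];
}

/* Model of the kernel's dispatch: choose an access width from the
   destination offset (mod), then XOR 32 bytes using that width. */
void xor32_dispatch(uint8_t *dst, const uint8_t *src, unsigned mod)
{
    switch (mod % 4) {
    case 0: /* 8-byte chunks, like the uint2 path: 4 x 8 = 32 bytes */
        for (int i = 0; i < BLAKE2S_OUT_SIZE; i += 8) {
            uint64_t d, s;
            memcpy(&d, dst + i, 8);
            memcpy(&s, src + i, 8);
            d ^= s;
            memcpy(dst + i, &d, 8);
        }
        break;
    case 2: /* 2-byte chunks, like the uchar2 path: 16 x 2 = 32 bytes */
        for (int i = 0; i < BLAKE2S_OUT_SIZE; i += 2) {
            uint16_t d, s;
            memcpy(&d, dst + i, 2);
            memcpy(&s, src + i, 2);
            d ^= s;
            memcpy(dst + i, &d, 2);
        }
        break;
    default: /* odd offsets: one byte at a time */
        xor32_ref(dst, src);
        break;
    }
}

/* Returns 1 if dispatching at byte offset off inside a 64-byte buffer
   gives the same buffer contents as the byte-wise reference. */
int dispatch_matches_ref(unsigned off)
{
    uint8_t a[64], b[64], src[BLAKE2S_OUT_SIZE];
    for (int i = 0; i < 64; i++)
        a[i] = b[i] = (uint8_t)(i * 37 + 11);
    for (int i = 0; i < BLAKE2S_OUT_SIZE; i++)
        src[i] = (uint8_t)(i * 91 + 5);
    xor32_dispatch(a + off, src, off);  /* mirrors (B + bufidx, input, bufidx) */
    xor32_ref(b + off, src);
    return memcmp(a, b, sizeof a) == 0;
}

/* Checks every offset that still leaves 32 bytes inside the buffer. */
void check_all_offsets(void)
{
    for (unsigned off = 0; off + BLAKE2S_OUT_SIZE <= 64; off++)
        assert(dispatch_matches_ref(off));
}
```

Calling check_all_offsets() runs offsets 0 to 32, so all three branches are compared against the byte-wise reference. It only shows that each branch touches the same 32 bytes; it says nothing about which width is fastest on a given GPU.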


Later you said
Quote
Very interesting. I get about a 2% gain on a 7950, and I need to use (mod % 2) with the case statements adjusted accordingly.
My 280X gains almost 6% as is, but the difference between (mod % 2) and (mod % 4) is pretty small, like 1-2 KH/s.
When you used (mod % 2), which is the same as (mod & 1), what were the case statements inside XORBytesInPlace(void *restrict dst, const void *restrict src, uint mod)?