Post
Topic
Board Mining software (miners)
Re: further improved phatk OpenCL Kernel (> 2% increase) for Phoenix - 2011-07-11
by
Vince
on 12/07/2011, 02:27:20 UTC
So here are some more changes:

I introduced const uint W17_2, containing P1(19) + 0x11002000, thats 3 shifts, 2 xor, 1 add traded against one extra parameter, well worth it,

extended self.f:
self.f = np.zeros(5, np.uint32)
to
self.f = np.zeros(6, np.uint32)

just after W17 calculation in calculateF:
        #W17_2
        self.f[5] = np.uint32(0x11002000+(
                rot(self.f[2], 32-13) ^
                rot(self.f[2], 32-15) ^
                (self.f[2] >> 10)
                ))

added the parameter (right after W17) in call and function

=> Effectively 3 Op's saved.

next change:
You can cut out all W0 to W14! Most of them are zero anyway, just needed to hardcode the first ones.
Also W[73] to W[78] are not used anymore with some small changes, so no need to initialize them.

=> less memory use, but has the same speed for me

Next one:
Round 3

#ifdef VECTORS
        Vals[4] = (W_3 = ((base + get_global_id(0)) << 1) + (uint2)(0, 1)) + PreVal4;
#else
        Vals[4] = (W_3 = base + get_global_id(0)) + PreVal4;
#endif

--
        // Round 3
        Vals[0] = state0 + Vals[4];
        Vals[4] += T1;

--

W[64 - O] = state0 + Vals[0];

you can reorganize and shorten round 3 to:
        Vals[0] = T1 + Vals[4];

needed changes in precalculation:
Preval4 += T1
T1 = state0 - T1

=> another addition almost effortless


here the files with these changes:
http://www.filesonic.com/file/1423103594

still some more to come!