If you didn't build the whole thing on your own, that is. Check out the line that I believe has a "224u" in it, added to hashMemSize. That looks like padding for a 256-byte chunk, so it should be something that gets taken out well before 255. Just comment it out and add + 1 instead. How stable is it? I've only run it for an hour... but if it can gain ~2-5 hashes for me, imagine what it may do for multiple, not using the basis of bits.
const size_t perThread = hashMemSize + 1u; // was: + 224u. Seems to be a speed improvement over the padding; probably somewhat unstable, but more shares.
Try a good test with that on a faster video card, comparing results before the fix and after the fix. And of course, will it stay running for an hour, or for 300?
It's in the auto config source.
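
For anyone wondering what that per-thread number feeds into, here is a rough sketch of the kind of arithmetic an auto config might do with it. This is not the actual source: maxThreads, availableMem and extra are made-up names for illustration, and dividing free memory by perThread is my assumption about how the value gets used.

```cpp
#include <cstdint>

// Hypothetical helper, not taken from the miner's source.
// Assumption: the auto config divides the usable GPU memory by the
// per-thread footprint to decide how many threads to start.
std::uint64_t maxThreads(std::uint64_t availableMem,
                         std::uint64_t hashMemSize,
                         std::uint64_t extra)
{
    // extra is the per-thread addition: 224 in the original line, 1 after the change.
    const std::uint64_t perThread = hashMemSize + extra;
    return availableMem / perThread; // integer division rounds down
}
```

With an assumed 4 GiB of usable memory and a 2 MiB hashMemSize, both maxThreads(4294967296, 2097152, 224) and maxThreads(4294967296, 2097152, 1) come out to 2047. So under these assumptions the change would hardly move the thread count, and any speed difference would have to come from somewhere else, which is another reason to do the before and after test.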