TPruvot has it, does it work better? I've already looked at the code.
At first glance it's a completely new algo and can't benefit from any of the canned
optimizations. Optimizing it requires a detailed analysis of the code to look for opportunities to
vectorize, either serially or in parallel, if at all. I expect the scalar code to be near optimum already.
It's a huge task to do the whole algo at once. Not really interested at this time.
The hashrate displayed by the miner, both per thread and per share, is artificially
calculated from the number of iterations over time. The pool calculates it from the number and
difficulty of submitted valid shares. Perhaps there's a math error in the miner's calculations.
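To make the difference between the two numbers concrete, here is a rough sketch of both estimates. It is not taken from the miner or from any pool. The function names are made up for illustration, and the constant of 2**32 hashes per difficulty-1 share is an assumption borrowed from Bitcoin-style targets; other algos and pools may scale difficulty differently.

```python
# Hypothetical helpers for illustration only, not code from the miner or a pool.

# Assumption: a difficulty-1 share takes 2**32 hashes on average
# (Bitcoin-style target). Other algos/pools may use a different constant.
HASHES_PER_DIFF1 = 2**32


def miner_hashrate(hashes_done, elapsed_seconds):
    """Miner-side estimate: locally counted hash iterations per second."""
    return hashes_done / elapsed_seconds


def pool_hashrate(share_difficulties, elapsed_seconds,
                  hashes_per_diff1=HASHES_PER_DIFF1):
    """Pool-side estimate: work implied by accepted shares per second.

    Each accepted share counts as (difficulty * hashes_per_diff1) hashes,
    regardless of how many iterations the miner actually ran.
    """
    return sum(share_difficulties) * hashes_per_diff1 / elapsed_seconds
```

The miner's figure depends only on its own iteration counter, while the pool's figure depends on luck and on the difficulty scaling, so the two only converge over many shares. If they stay far apart over a long run, the iteration count or the difficulty constant on one side is a likely suspect.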
Tpruvot has the first version of the algo, but they released a tweaked one (RFv2).
There is also a pull request with RFv2, but it has the same issue.