You are talking about the code, i.e. Pollard's kangaroo, and you are right in your own way. I am talking about code running in the hardware structure space. The first kangaroo was by pinkachunka on BitCrack: his stats were 100 bits in 3 days, while your 90 bits are shown as taking 8 days. He uses the full power of the GPU, meaning the full space, by using the BitCrack switches; he uses 15.9 GB during the run, while your RAM usage is 5.5 GB. Maybe you need to expand the table size, or design around the hardware structure switches, to get the maximum result in less time.
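To see how large the speed gap in those quoted numbers actually is, here is a rough back-of-the-envelope sketch. It assumes both figures are expected solve times and uses the textbook estimate of about 2·√w group operations for an interval of width w. The helper names are hypothetical and nothing here is measured from either program.

```python
import math


def expected_kangaroo_ops(bits: int) -> float:
    """Rough expected group operations for an interval of width 2**bits.

    Uses the textbook ~2*sqrt(w) estimate (an assumption, not a measured figure).
    """
    return 2.0 * math.sqrt(2.0 ** bits)


def implied_speed_ratio(bits_a: int, days_a: float, bits_b: int, days_b: float) -> float:
    """How many times faster setup A must run than setup B for both timings to hold."""
    rate_a = expected_kangaroo_ops(bits_a) / days_a  # operations per day, setup A
    rate_b = expected_kangaroo_ops(bits_b) / days_b  # operations per day, setup B
    return rate_a / rate_b


# Hypothetical comparison: 100 bits in 3 days vs 90 bits in 8 days.
ratio = implied_speed_ratio(100, 3, 90, 8)
print(f"{ratio:.1f}x")  # 85.3x
```

Each extra 10 bits multiplies the expected work by 2^5 = 32, so under these assumptions the 100-bit figure implies a throughput roughly 85 times higher, not merely a few times higher.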
Honestly, I have not studied the BitCrack code. I do not like AMD. Perhaps he uses group inversion across several threads, with a predefined distance between them, or uses tables. There is a clear speed advantage, I agree. In this code, each thread starts from a random key, and the counter shows the total speed. In general, according to Pollard, the number of parallel threads does not particularly affect the speed of the solution, since this is a sequential algorithm. So maybe I'm wrong :)
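"Group inversion" most likely refers to Montgomery's batch-inversion trick: every affine point addition needs a modular inverse for the slope, and the inverses for many points can be obtained with a single real inversion plus about three multiplications per element. Below is a minimal sketch, assuming the secp256k1 field prime and nonzero inputs. It is not taken from BitCrack or from the kangaroo code discussed here, and the function name is hypothetical.

```python
# secp256k1 field prime (assumed field for this sketch).
SECP256K1_P = 2**256 - 2**32 - 977


def batch_inverse(values, p):
    """Invert every element of `values` modulo prime p with one modular inversion.

    All values must be nonzero modulo p.
    """
    n = len(values)
    if n == 0:
        return []
    # prefix[i] = values[0] * ... * values[i] (mod p)
    prefix = [0] * n
    acc = 1
    for i, v in enumerate(values):
        acc = acc * v % p
        prefix[i] = acc
    # The single expensive step: invert the full product (Fermat: a^(p-2)).
    inv_acc = pow(acc, p - 2, p)
    result = [0] * n
    # Walk backwards, peeling one factor off the inverted product each step.
    for i in range(n - 1, 0, -1):
        result[i] = inv_acc * prefix[i - 1] % p  # = values[i]^-1
        inv_acc = inv_acc * values[i] % p        # = (values[0..i-1])^-1
    result[0] = inv_acc
    return result


vals = [3, 5, 7, 123456789]
invs = batch_inverse(vals, SECP256K1_P)
print(all(v * x % SECP256K1_P == 1 for v, x in zip(vals, invs)))  # True
```

On a GPU each thread would typically batch the inversions for its own group of kangaroos this way, which is one plausible source of the speed difference mentioned above.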
Quote: "As first kangaroo was by pinkachunka at BitCrack"
Those were preliminary calculations; BitCrack has nothing to do with kangaroo.
https://bitcointalk.org/index.php?topic=1306983.msg51848002#msg51848002