I get 52 kH/s with 8 threads and a max of 65 kH/s with 16.

Yes, the algorithm is pretty heavy on CPU.
What CPU are you using anyway?
Can you run it on both Windows and Linux and compare?
I think the Linux version should give you more hashes because it's 64-bit.
Just for info, I'm running an 8-core Ryzen 7 1700X with 10 threads and getting about 50 kH/s on average on Windows, but I'm doing other stuff too and my GPU rig is connected. But hey, I'm getting a block every 30 to 40 minutes on average.