Take something like XopMC/CudaBrainSecp on the GPU, where we have to do a full point multiplication for every key. Do you know what the current best implementation is? Or do you have any ideas for making it faster? We can't fall back on point additions here, because the private keys are unrelated to each other.
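
To make the constraint concrete, here is a minimal CPU-side sketch in plain Python. It is not CudaBrainSecp's code, and the helper names `point_add` and `scalar_mult` are made up for illustration; only the secp256k1 constants are standard. It shows the shortcut that sequential keys allow (one addition per key) next to the full double-and-add multiplication that unrelated keys force.

```python
# Standard secp256k1 domain parameters.
P = 2**256 - 2**32 - 977  # field prime
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def point_add(a, b):
    """Add two affine points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    x1, y1 = a
    x2, y2 = b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a == -b
    if a == b:
        # Tangent slope for doubling (curve is y^2 = x^3 + 7, so a = 0).
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        # Chord slope for distinct points.
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point=G):
    """Plain double-and-add: one doubling per key bit, one add per set bit."""
    result = None
    addend = point
    k %= N
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


# Sequential keys: the next public key costs a single point addition.
k = 0x1F2E3D4C5B6A79880123456789ABCDEF
pub_k = scalar_mult(k)
pub_next = point_add(pub_k, G)  # equals scalar_mult(k + 1)

# Unrelated keys (hypothetical values): nothing is shared,
# so each one needs its own full scalar_mult call.
unrelated = [0xDEADBEEF, 0x123456789ABCDEF0123456789, N - 5]
pubs = [scalar_mult(x) for x in unrelated]
```

The sketch uses affine coordinates with a modular inverse in every addition purely for readability; it says nothing about how a GPU kernel should lay out the work.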