I still don't understand what you need to do to find the private key after finding the key subtraction result. If I understood it, maybe I could write a CUDA version for it.
The main goal is to find a way to divide an unknown key (a puzzle key) as the first target, use a known key as the second target, and keep dividing until a known key shows up in the subtraction results. If you find even one known key in the subtraction results, you can derive the puzzle's private key from it.
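To make the idea concrete, here is a minimal sketch in Python that works on plain scalars instead of public keys. It assumes n is the secp256k1 group order, and `recover_key` and `known_keys` are hypothetical names I made up for illustration. In the real setting the subtraction and division would be done on elliptic curve points, and the table lookup would compare public keys, so treat this only as a model of the arithmetic.

```python
# Order of the secp256k1 group (assumed to be the "n" meant in this post).
N = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141

def recover_key(unknown_key, divisor, known_keys):
    """Subtract every possible remainder r, divide by `divisor` mod N,
    and stop when the result is one of the known keys.

    `unknown_key` is a scalar here only to simulate the search; with real
    puzzle keys the same steps would be point operations on the public key.
    """
    inv = pow(divisor, -1, N)  # modular inverse (Python 3.8+)
    for r in range(divisor):
        candidate = ((unknown_key - r) * inv) % N
        if candidate in known_keys:
            # candidate = (key - r) / divisor, so key = candidate * divisor + r
            return (candidate * divisor + r) % N
    return None

# Hypothetical table of "known" keys: the scalars 1..100.
known_keys = set(range(1, 101))
found = recover_key(2678845, 52453, known_keys)
print(found)
```

The loop hits the table exactly when r equals the remainder of the key divided by the divisor, which is why the size of the stored table and the size of the divisor trade off against each other.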
Some examples to simplify the method used:
P= 2678845/52453 = 51.07134005681277
What do we need here? We need to divide n by some number to get 0.07134005681277. If we then subtract this 0.07134005681277 from the result of P (2678845/52453), we get 51. Bingo: since we can generate 1 up to 51 and store them on the device, whenever we hit 51 we immediately see that a match was found. In reality, though, we would need to store at least 1 TB of public keys to compare against for a match.
The problem is finding a way to divide n by a scalar to reach 0.07134005681277, or at least get close enough to it, depending on how many such results we can store.
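For what it's worth, the decimal example above can be mirrored in modular arithmetic. The sketch below assumes n is the secp256k1 order and again uses scalars as stand-ins for keys. The counterpart of 0.07134005681277 is the remainder divided by 52453 mod n, which turns out to be a very large number rather than a small fraction.

```python
# Order of the secp256k1 group (assumed to be the "n" meant in this post).
N = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141

k, d = 2678845, 52453
q, r = divmod(k, d)          # integer part (51) and remainder (3742)

inv_d = pow(d, -1, N)        # 1/d mod N (Python 3.8+)
p = (k * inv_d) % N          # "k / d" done mod N
frac = (r * inv_d) % N       # modular counterpart of 0.07134005681277
whole = (p - frac) % N       # subtracting the "fraction" leaves the integer part

print(q, r, whole)           # 51 3742 51
print(frac.bit_length())     # size of the "fraction" in bits
```

So the subtraction does land on 51, but only if the remainder 3742 is already known, which is the hard part described above.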
On another note, I'm interested in figuring out the following: when we divide n by, e.g., 45 (n/45) and then multiply the result by 450, we get something like 1/45*450 = 9.9999999. Finding the .9999999 part would help solve the key, but these are just ideas and need testing.
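A quick way to test this idea on scalars, assuming n is the secp256k1 order: the .9999999 shows up when 1/45 is truncated to a few decimals, while the modular inverse of 45 multiplied by 450 comes out as a whole number. This is only a sketch of the arithmetic, not a statement about what happens with actual public keys.

```python
# Order of the secp256k1 group (assumed to be the "n" meant in this post).
N = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141

inv45 = pow(45, -1, N)               # 1/45 mod N (Python 3.8+)
exact = (inv45 * 450) % N            # modular version of 1/45*450

truncated = round(1 / 45, 7) * 450   # decimal version with 1/45 cut to 7 places

print(exact)
print(truncated)
```

If the modular result is what matters, there may be no fractional residue left to search for, so this part of the idea probably needs a different formulation before testing on real keys.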