libsecp256k1 has the fastest code to compute a public key from a private key.
Avoid computing public keys from private keys at all costs. Precompute any public keys you know are constant (or any deltas you know are constant), and do pubA + pubB in affine coordinates instead. If you have a batch of (pubA, pubB) pairs, use a single field inversion for the whole batch (batch inversion, a.k.a. Montgomery's trick). If you also need pubA - pubB (for example, if the pubB keys sit at identical deltas on both sides of pubA), start from the middle of the batch and reuse the same inverse at each step for both pubA + pubB and pubA - pubB. This works because -pubB has the same x coordinate as pubB, so both operations share the denominator x_B - x_A.
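To make the shared-inverse idea concrete, here is a minimal Python sketch (3.8+ for `pow(x, -1, p)`). Plain integers stand in for field elements, and the names `batch_inverse`, `add_with_inverse` and `batch_add_sub` are made up for illustration; none of this is libsecp256k1 code. It assumes the two points in each pair have different x coordinates, so there is no doubling and no point at infinity.

```python
# Sketch only: Python ints stand in for secp256k1 field elements.
P = 2**256 - 2**32 - 977  # secp256k1 field prime


def batch_inverse(values):
    """Invert every value mod P with a single modular inversion."""
    prefix = []
    acc = 1
    for v in values:
        prefix.append(acc)          # product of all values before this one
        acc = acc * v % P
    inv = pow(acc, -1, P)           # the only inversion in the whole batch
    out = [0] * len(values)
    for i in range(len(values) - 1, -1, -1):
        out[i] = inv * prefix[i] % P    # inverse of values[i]
        inv = inv * values[i] % P       # strip values[i] from the running inverse
    return out


def add_with_inverse(a, b, inv_dx):
    """Affine a + b, given inv_dx = 1 / (b.x - a.x) mod P (requires a.x != b.x)."""
    (x1, y1), (x2, y2) = a, b
    lam = (y2 - y1) * inv_dx % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def batch_add_sub(pairs):
    """For each (a, b) return (a + b, a - b) using one inversion for the batch."""
    inverses = batch_inverse([(b[0] - a[0]) % P for a, b in pairs])
    results = []
    for (a, b), inv_dx in zip(pairs, inverses):
        neg_b = (b[0], (-b[1]) % P)     # -b has the same x, so inv_dx is reused
        results.append((add_with_inverse(a, b, inv_dx),
                        add_with_inverse(a, neg_b, inv_dx)))
    return results
```

For the "middle of the batch" case, the pairs would be `(center, delta_i)` with `delta_i = i * G` precomputed once, so each step yields both `center + i*G` and `center - i*G` from the same inverse.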
Dump JLP; it is slower than libsecp256k1. Use the fe_* and ge_* primitives to generate public keys, and implement affine point addition yourself (the lib does A + J, i.e. affine + Jacobian, because it specializes in k * G, not in P + Q). Affine + affine is faster than A + J if you batch it and if you actually need affine points rather than Jacobian ones.
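For reference, these are the affine + affine formulas for two points with different x coordinates on y^2 = x^3 + 7, all arithmetic mod p, with a rough operation count (I = inversion, M = multiplication, S = squaring). The A + J figure is the textbook count for mixed addition, not something measured against libsecp256k1, so treat the comparison as a ballpark.

```latex
\begin{aligned}
\lambda &= (y_2 - y_1)\,(x_2 - x_1)^{-1} \\
x_3 &= \lambda^2 - x_1 - x_2 \\
y_3 &= \lambda\,(x_1 - x_3) - y_1
\end{aligned}
\qquad \text{cost: } 1I + 2M + 1S

% Batch of n additions with one shared inversion:
\text{inversions: } 1I + 3(n-1)M
\;\Longrightarrow\;
\text{per addition} \approx 5M + 1S + \tfrac{1}{n} I

% Textbook mixed (affine + Jacobian) addition, result still Jacobian:
\text{A + J} \approx 8M + 3S
```

So once n is large enough that I/n is negligible, the batched affine addition needs fewer field operations and already gives you affine x and y, whereas the A + J result still needs an inversion to get back to affine.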
Hi kTimesG,
Are you referring to this repo?
https://github.com/bitcoin-core/secp256k1