That's it, basically.
In the end, we came up with a sort of Frankenstein code together, and this is it:
// Generate public keys in batches
for (; localBatchCount < HASH_BATCH_SIZE && currentKey <= threadRangeEnd; ++localBatchCount, ++currentKey) {
    if (!secp256k1_ec_pubkey_create(ctx, &pubkey, priv)) {
        std::cerr << "Failed to derive public key.\n";
        continue;
    }
    size_t len = 33;
    secp256k1_ec_pubkey_serialize(ctx, localPubKeys[localBatchCount], &len, &pubkey, SECP256K1_EC_COMPRESSED);
}
}
Fastest way to faster code: screw batch addition altogether and compute every public key from its private key.
But I forgive you - after all, this is what happens when two kinds of gibberish code collide.
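To put the jab about dropping batch addition in concrete terms, here is a rough sketch on a toy curve (y^2 = x^3 + 7 over F_17, not secp256k1 itself). It derives consecutive public keys two ways, a full scalar multiplication per key versus one point addition per key, and counts group operations. All names here (Point, add, mul, compareDerivations) are made up for illustration, and the counts only hint at the trend: the real secp256k1_ec_pubkey_create uses precomputed tables, and real batch addition also shares one modular inversion across the batch, which is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Toy parameters (assumption for illustration): y^2 = x^3 + 7 over F_17.
constexpr int64_t P = 17;
static long groupOps = 0;  // counts calls to add()

struct Point {
    int64_t x = 0, y = 0;
    bool inf = true;  // true means the point at infinity (group identity)
};

int64_t mod(int64_t a) { return ((a % P) + P) % P; }

// Modular inverse via Fermat's little theorem: a^(P-2) mod P.
int64_t inv(int64_t a) {
    int64_t result = 1, base = mod(a);
    for (int64_t e = P - 2; e > 0; e >>= 1) {
        if (e & 1) result = mod(result * base);
        base = mod(base * base);
    }
    return result;
}

// Affine group law, covering identity, inverse, doubling and generic addition.
Point add(const Point& a, const Point& b) {
    ++groupOps;
    if (a.inf) return b;
    if (b.inf) return a;
    if (a.x == b.x && mod(a.y + b.y) == 0) return Point{};  // a + (-a)
    int64_t s = (a.x == b.x)
        ? mod(3 * a.x * a.x * inv(2 * a.y))      // tangent slope (curve a = 0)
        : mod((b.y - a.y) * inv(b.x - a.x));     // chord slope
    Point r;
    r.x = mod(s * s - a.x - b.x);
    r.y = mod(s * (a.x - r.x) - a.y);
    r.inf = false;
    return r;
}

// Plain double-and-add: k * g.
Point mul(uint64_t k, const Point& g) {
    Point r, addend = g;
    for (; k > 0; k >>= 1) {
        if (k & 1) r = add(r, addend);
        addend = add(addend, addend);
    }
    return r;
}

Point generator() {
    Point g;
    g.x = 15; g.y = 13; g.inf = false;  // lies on the toy curve
    return g;
}

struct Comparison {
    bool match;           // both methods produced identical points
    long directOps;       // group ops when every key is derived from scratch
    long incrementalOps;  // group ops when each key is previous + G
};

// Derive the public keys for private keys 1..count both ways.
Comparison compareDerivations(uint64_t count) {
    const Point g = generator();
    Comparison c{true, 0, 0};

    groupOps = 0;
    Point running = mul(1, g);  // one full multiplication for the first key
    for (uint64_t k = 1; k <= count; ++k) {
        long saved = groupOps;
        Point direct = mul(k, g);  // reference value, not billed to incremental
        c.directOps += groupOps - saved;
        groupOps = saved;
        if (direct.inf != running.inf ||
            (!direct.inf && (direct.x != running.x || direct.y != running.y)))
            c.match = false;
        if (k < count) running = add(running, g);  // next key: one addition
    }
    c.incrementalOps = groupOps;
    return c;
}
```

Calling compareDerivations(18) walks the whole toy group; both methods agree on every point, and the incremental path needs far fewer group operations than deriving each key from scratch.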

To use internal APIs like secp256k1_ecmult_gen, you must install secp256k1 from source:
You don't need that function if you never need point multiplication, and you don't need a context at all either. Simply include the headers that carry the implementation of the group and field ops. That's how I do it. Zero issues.