libsecp256k1's primitives can be easily used to implement batched addition, or anything else.
Alright, can you hook this part up for me using libsecp256k1?
Hooking up libsecp256k1Disclaimer: some biffing may be included - also known as "cat walked over the keyboard effect". This code was definitely tested (as in - no Red Alerts in the IDE - definitely the best indicator of everything working perfectly).
1. Build the constant hook jumpers!
What we want:
happiness multiples of G up to whatever.
static void compute_const_points(
secp256k1_ge * out,
uint32_t numPoints
) {
// Note: this uses a naive method, and should only be called once.
// If numPoints is large, this method can be HEAVILY optimized.
secp256k1_gej tmp;
*out = secp256k1_ge_const_g;
secp256k1_gej_set_ge(&tmp, out++);
for (uint32_t i = 1; i < numPoints; i++) {
secp256k1_gej_add_ge_var(&tmp, &tmp, &secp256k1_ge_const_g, NULL);
secp256k1_ge_set_gej(out++, &tmp);
}
}
2. Now we're already deep inside the secp256k1 inner guts. Let's delve deeper, but strategically, in order not to become obsessed maniacs over the calling conventions.
#define FE_INV(r, x) secp256k1_fe_impl_inv_var(&(r), &(x))
#define FE_MUL(r, a, b) secp256k1_fe_mul_inner((r).n, (a).n, (b).n)
#define FE_SQR(r, x) secp256k1_fe_sqr_inner((r).n, (x).n)
#define FE_ADD(r, d) secp256k1_fe_impl_add(&(r), &(d))
#define FE_NEG(r, a, m) secp256k1_fe_impl_negate_unchecked(&(r), &(a), (m))
#define FE_IMUL(r, a) secp256k1_fe_impl_mul_int_unchecked(&(r), (a))
Now we have the fastest paths to play with field
mines elements.
3. Now we're already shit in into the ECC multiverse, but wait... what are we even trying to do? I hope nobody forgot - we're trying to do
batched addition of group elements. And do it faster than light, if possible (not proven).
Let's focus on doing it to a single point together with the constant points array we already cooked up earlier, just to demonstrate.
Let's build up a really cool batch addition manager that handles the Space aspect (also called the RAM we require). Because the n00bz need to know: batched addition requires some additional buffers to hold stuff.
Advanced professionals haters might say we're using too much memory here. And they're right. Except they're wrong, because more memory helps with doing some things faster, or even in parallel!
static
void batch_addition_wrapper(
secp256k1_ge * geMiddle, // a single point
const secp256k1_ge * const_points,
U32 num_const_points,
U32 num_repeats
) {
size_t tree_sz = (num_const_points * 2 - 1) * sizeof(secp256k1_fe);
// printf("Allocating %zu bytes for tree\n", tree_sz);
secp256k1_fe * xz_1 = malloc(tree_sz);
if (NULL == xz_1) return;
secp256k1_fe * xz_2 = malloc(tree_sz);
if (NULL == xz_2) return;
for (uint32_t loop = 0; loop < num_repeats; loop++) {
batch_addition(geMiddle, const_points, xz_1, xz_2, num_const_points);
}
free(xz_1);
free(xz_2);
}
Now we're talking code, apparently. So many stars! However the magic is missing.
For who made it so far: you're a hero! Quick reminder: we're trying to add a lot of points to a single point, left and right, if not even up and down. The 3D version coming soon.
4. THE MAGIC METHOD
Phew, so far so good, but where's the actual addition? Maybe smth like this may work, who knows? We didn't got here just to get dumped with a "write your own, I never share code" lame comment, right?
static
void batch_addition(
secp256k1_ge * ge, // a single point
const secp256k1_ge * jp,
secp256k1_fe * xz, // product tree leafs + parent nodes
secp256k1_fe * xzOut,
U32 batch_size
) {
secp256k1_fe t1, t2, t3;
S64 i;
for (i = 0; i < batch_size; i++) {
xz[i] = ge[0].x;
FE_NEG(t1, jp[i].x, 1); // T1 = -x2
FE_ADD(xz[i], t1); // XZ[i] = x1 - x2
}
// up-sweep inversion tree [SIMD friendly]
for (i = 0; i < batch_size - 1; i++) {
FE_MUL(xz[batch_size + i], xz[i * 2], xz[i * 2 + 1]);
}
FE_INV(xzOut[batch_size * 2 - 2], xz[2 * batch_size - 2]);
// down-sweep inversion tree
for (i = batch_size - 2; i >= 0; i--) {
FE_MUL(xzOut[i * 2], xz[i * 2 + 1], xzOut[batch_size + i]);
FE_MUL(xzOut[i * 2 + 1], xz[i * 2], xzOut[batch_size + i]);
}
// TODO - this should be returned one by one, this is for demo only
secp256k1_ge result;
secp256k1_ge * _a = &result;
const secp256k1_fe * _inv = xzOut;
for (i = 0; i < batch_size; i++) {
const secp256k1_ge * _b = &jp[i];
// 1. do P + Q
result = ge[0];
FE_NEG(t1, _b->y, 1); // T1 = -y2
FE_ADD(_a->y, t1); // Y1 = y1 - y2 m = max_y + 2(1)
FE_MUL(_a->y, _a->y, *_inv); // Y1 = m = (y1 - y2) / (x1 - x2) m = 1
FE_SQR(t2, _a->y); // T2 = m**2 m = 1
FE_NEG(t3, _b->x, 1); // T3 = -x2
FE_ADD(t2, t3); // T2 = m**2 - x2 m = 1 + 2(1) = 3(2)
FE_NEG(_a->x, _a->x, 1); // X1 = -x1 m = max_x + 1
FE_ADD(_a->x, t2); // X1 = x3 = m**2 - x1 - x2 max_x = 3 + max_x + 1
secp256k1_fe_normalize_weak(&_a->x);
FE_NEG(t2, _a->x, 1); // T2 = -x3 m = 1 + 1 = 2
FE_ADD(t2, _b->x); // T1 = x2 - x3 m = 2 + 1 = 3
FE_MUL(_a->y, _a->y, t2); // Y1 = m * (x2 - x3) m = 1
FE_ADD(_a->y, t1); // Y1 = y3 = m * (x2 - x3) - y2 m = 1 + 2 = 3
secp256k1_fe_normalize_weak(&_a->y);
// TODO - consume first result = P + Q
// 2. Do P - Q using the same inverse
result = ge[0];
FE_ADD(_a->y, _b->y); // Y1 = y1 + y2 m = max_y + 2(1)
FE_MUL(_a->y, _a->y, *_inv); // Y1 = m = (y1 + y2) / (x1 - x2) m = 1
FE_SQR(t2, _a->y); // T2 = m**2 m = 1
FE_NEG(t3, _b->x, 1); // T3 = -x2
FE_ADD(t2, t3); // T2 = m**2 - x2 m = 1 + 2(1) = 3(2)
FE_NEG(_a->x, _a->x, 1); // X1 = -x1 m = max_x + 1
FE_ADD(_a->x, t2); // X1 = x3 = m**2 - x1 - x2 max_x = 3 + max_x + 1
secp256k1_fe_normalize_weak(&_a->x);
FE_NEG(t2, _a->x, 1); // T2 = -x3 m = 1 + 1 = 2
FE_ADD(t2, _b->x); // T1 = x2 - x3 m = 2 + 1 = 3
FE_MUL(_a->y, _a->y, t2); // Y1 = m * (x2 - x3) m = 1
FE_ADD(_a->y, _b->y); // Y1 = y3 = m * (x2 - x3) + y2 m = 1 + 2 = 3
secp256k1_fe_normalize_weak(&_a->y);
// TODO - consume second result = P - Q
++_inv;
}
}
Damn, this one looks like Chinese. WTF is even happening here? Basically a dry run useless cycle of operations, apparently. And you are 100% correct. In fact - do not even attampt to consider this code as production ready.
5. But.... does it work?
A: God knows. Maybe it does, maybe not. Maybe it regresses itself into Curve25519 by accident. You may also want to fully normalize the resulting GE back to, IDK... 33 byte public keys? Here's one way to do it and I'm out.
secp256k1_fe_normalize_var(x); // final Normalization of X (or Y)
// Convert X to 32 bytes... just for fun, or when you're done with the arithmetical nonsense
secp256k1_fe_to_storage(...);
// Check Y parity (no need to convert it) to set the first byte to get the final Compressed Public Key
That's it, basically.