How?

Example
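#include <immintrin.h>   // AVX2 intrinsics (_mm256_*); compile with -mavx2
#include <cstdint>
#include <stdexcept>
#include "Int.h"         // big-integer class used by the script (VanitySearch-style Int, assumed)
// Xoshiro256plus is the script's own RNG class; it is assumed to expose uint64_t next().

Int generateRandomPrivateKey(Int minKey, Int maxKey, Xoshiro256plus &rng) {
    // Validate inputs
    if (minKey.IsGreater(&maxKey)) {
        throw std::invalid_argument("minKey must be <= maxKey");
    }

    // rangeSize = maxKey - minKey + 1 (inclusive range)
    Int rangeSize;
    rangeSize.Set(&maxKey);
    rangeSize.Sub(&minKey);
    Int one;
    one.SetInt32(1);
    rangeSize.Add(&one);

    // Since minKey <= maxKey, rangeSize can only be zero if the +1 wrapped around
    // the full width of Int. Guard against Mod() by zero. (minKey == maxKey gives
    // rangeSize == 1, which the normal path below already handles.)
    if (rangeSize.IsZero()) {
        return minKey;
    }

    // Draw four 64-bit values and pack them into one 256-bit AVX2 register.
    // This only stages the values; the RNG itself is still scalar here, so a real
    // speed-up requires stepping four independent generator states in parallel.
    uint64_t rand_vals[4] = { rng.next(), rng.next(), rng.next(), rng.next() };
    __m256i rand_vec = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(rand_vals));

    // Unpack all four lanes so every drawn value is used.
    alignas(32) uint64_t lanes[4];
    _mm256_store_si256(reinterpret_cast<__m256i *>(lanes), rand_vec);

    // Build a 256-bit random Int from the four lanes. Only 4 limbs are filled, so
    // the upper limb(s) stay zero and the value stays non-negative in Int's signed
    // representation (assumes NB64BLOCK > 4, as in the VanitySearch-style Int).
    Int randomPrivateKey;
    randomPrivateKey.SetInt32(0);
    for (int i = 0; i < 4; i++) {
        randomPrivateKey.ShiftL(64);
        randomPrivateKey.Add(lanes[i]);
    }

    // Map into [minKey, maxKey]: randomPrivateKey % rangeSize + minKey.
    // The modulo bias is negligible when rangeSize is much smaller than 2^256.
    randomPrivateKey.Mod(&rangeSize);
    randomPrivateKey.Add(&minKey);
    return randomPrivateKey;
}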
This test works with the existing script on GitHub. The script starts out generating random keys slowly and then ramps up to 45 Mkeys/s on your CPU (the exact figure depends on the CPU).
The work will not be evenly distributed across threads, so some threads sit idle while others are still busy (workload imbalance).
You will also see CPU warm-up effects (the CPU takes time to reach full turbo boost).
To counter both effects, implement an AVX2 random number generator and pre-warm the threads.
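The AVX2 RNG is not spelled out in the post. A common way to get one is to run four independent xoshiro256+ generators side by side, one per 64-bit lane of a 256-bit register. The sketch below is a portable C++ version under that assumption: `Xoshiro256plusX4` and `next4()` are hypothetical names, the state is stored lane-major so each step is the same add/xor/shift on four lanes, and whether the compiler actually emits AVX2 for it depends on flags (e.g. `-O3 -mavx2`) and should be checked in the generated assembly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 4-lane xoshiro256+: four independent generators in
// structure-of-arrays layout, so each step is the same op on 4 x 64-bit lanes.
struct Xoshiro256plusX4 {
    uint64_t s0[4], s1[4], s2[4], s3[4];

    static uint64_t rotl(uint64_t x, int k) { return (x << k) | (x >> (64 - k)); }

    // splitmix64, the generator the xoshiro authors suggest for seeding.
    static uint64_t splitmix64(uint64_t &x) {
        uint64_t z = (x += 0x9e3779b97f4a7c15ULL);
        z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
        z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
        return z ^ (z >> 31);
    }

    // Give every lane its own state so the four streams differ.
    explicit Xoshiro256plusX4(uint64_t seed) {
        for (int i = 0; i < 4; i++) {
            s0[i] = splitmix64(seed);
            s1[i] = splitmix64(seed);
            s2[i] = splitmix64(seed);
            s3[i] = splitmix64(seed);
        }
    }

    // One xoshiro256+ step on all four lanes; out[i] receives lane i's output.
    // The body is branch-free 64-bit add/xor/shift, which compilers can
    // typically map onto AVX2 (vpaddq, vpxor, vpsllq/vpsrlq) at -O3 -mavx2.
    void next4(uint64_t out[4]) {
        for (int i = 0; i < 4; i++) {
            const uint64_t result = s0[i] + s3[i];
            const uint64_t t = s1[i] << 17;
            s2[i] ^= s0[i];
            s3[i] ^= s1[i];
            s1[i] ^= s2[i];
            s0[i] ^= s3[i];
            s2[i] ^= t;
            s3[i] = rotl(s3[i], 45);
            out[i] = result;
        }
    }
};
```

Inside `generateRandomPrivateKey`, a single `next4()` call could supply the four values that fill `rand_vals`, replacing four scalar `rng.next()` calls. AVX2 has no 64-bit rotate instruction, so `rotl` is compiled as two shifts and an OR.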
AVX2 RNG + warm-up: 50 Mkeys/s (stable).
Dynamic scheduling: 55 Mkeys/s (no ramp-up).
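Thread pre-warming and dynamic scheduling can be sketched together. Assumptions: `processKey()` and `runDynamic()` are hypothetical names, `processKey()` is a placeholder for the real per-key work, and the warm-up length (100000 iterations) and chunk size are arbitrary values to tune. Each thread first runs the hot loop on throwaway input and then waits until every thread is ready, which gives the CPU time to raise its clocks before the real run (how quickly turbo kicks in is still up to the CPU and OS). After that, threads claim fixed-size chunks from a shared atomic counter instead of receiving a fixed slice of the range up front.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-key work (generate key, derive address, compare).
static uint64_t processKey(uint64_t k) { return k * 0x9e3779b97f4a7c15ULL; }

// Processes keys [0, totalKeys) on nThreads threads; returns keys handled per thread.
std::vector<uint64_t> runDynamic(uint64_t totalKeys, uint64_t chunk, unsigned nThreads) {
    std::atomic<uint64_t> nextKey{0};
    std::atomic<unsigned> ready{0};
    std::atomic<bool> go{false};
    std::vector<uint64_t> done(nThreads, 0);
    std::vector<std::thread> pool;

    for (unsigned t = 0; t < nThreads; t++) {
        pool.emplace_back([&, t] {
            // Warm-up: run the hot loop on throwaway inputs before real work.
            volatile uint64_t sink = 0;
            for (uint64_t i = 0; i < 100000; i++) sink = sink + processKey(i);
            ready.fetch_add(1);
            while (!go.load()) std::this_thread::yield();

            // Dynamic scheduling: claim the next chunk from the shared counter,
            // so faster threads simply take more chunks.
            uint64_t local = 0, acc = 0;
            for (;;) {
                uint64_t start = nextKey.fetch_add(chunk);
                if (start >= totalKeys) break;
                uint64_t end = std::min(start + chunk, totalKeys);
                for (uint64_t k = start; k < end; k++) acc += processKey(k);
                local += end - start;
            }
            (void)acc;
            done[t] = local;  // each thread writes only its own slot
        });
    }
    // Release all threads at once, only after every one has finished warming up.
    while (ready.load() < nThreads) std::this_thread::yield();
    go.store(true);
    for (auto &th : pool) th.join();
    return done;
}
```

With a static split, the thread that lands on a slower or busier core finishes last while the others sit idle; with the shared counter the tail of the work spreads across whichever threads are free. The chunk size trades contention on the counter against how evenly that tail is distributed.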
Simple, isn't it?
