For a couple of seconds it gave 1000 Mkeys/s, and then the speed dropped.

Test 1:
# ./mutagen -p 38 -f 3
=======================================
== Mutagen Puzzle Solver by Denevron ==
=======================================
Starting puzzle: 38 (38-bit)
Target HASH160: b190e2d40c...9e7ba39364
Base Key: 0x1757756A93
Flip count: 3 (override, default was 21)
Total Flips: 8436
Using: 12 threads
Progress: 100.000000%
Processed: 385330
Speed: 23.05 Mkeys/s
Elapsed Time: 00:00:00
No solution found. Checked 386575 combinations
Time: 0.14 seconds (00:00:00)
Speed: 22.98 Mkeys/s
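The "Total Flips" values match the binomial coefficient C(bits, flip count), which is the number of ways to choose which bits to flip. Test 1 gives C(38, 3) = 8436 and Test 2 gives C(34, 14) = 1391975640. Below is a minimal sketch that assumes each candidate is a bit mask with exactly flip_count bits set among bit_length positions. That assumption is inferred from the numbers, not read from Mutagen's source. binomial() computes the count, and count_masks() walks the masks with Gosper's hack. Both names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// C(n, k) via the multiplicative formula. After step i the value is exactly
// C(n - k + i, i), so nothing overflows as long as the final result fits in 64 bits.
uint64_t binomial(uint64_t n, uint64_t k) {
    if (k > n) return 0;
    if (k > n - k) k = n - k;                 // use the smaller side
    uint64_t r = 1;
    for (uint64_t i = 1; i <= k; ++i) {
        uint64_t num = n - k + i;
        uint64_t g = std::gcd(r, i);          // cancel common factors first
        r = (r / g) * (num / (i / g));        // (i / g) divides num exactly
    }
    return r;
}

// Hypothetical enumeration: count every n-bit mask with exactly k bits set,
// stepping to the next larger integer with the same popcount (Gosper's hack).
uint64_t count_masks(unsigned n, unsigned k) {
    assert(n < 64);                           // keeps 1 << n well-defined
    if (k > n) return 0;
    if (k == 0) return 1;                     // only the empty mask
    const uint64_t limit = uint64_t{1} << n;
    uint64_t x = (uint64_t{1} << k) - 1;      // smallest mask: k low bits set
    uint64_t count = 0;
    while (x < limit) {
        ++count;
        uint64_t c = x & (~x + 1);            // lowest set bit of x
        uint64_t r = x + c;                   // ripple the lowest block of ones
        x = (((r ^ x) >> 2) / c) | r;         // move the leftover ones to the bottom
    }
    return count;
}
```

For example, binomial(38, 3) returns 8436, and count_masks(38, 3) enumerates the same 8436 masks. Enumerating them one by one is only practical for small cases, because C(34, 14) is already about 1.4 billion masks.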
Test 2:
# ./mutagen -p 34 -f 14
=======================================
== Mutagen Puzzle Solver by Denevron ==
=======================================
Starting puzzle: 34 (34-bit)
Target HASH160: f6d67d7983...ab265f1bfa
Base Key: 0x1A96CA8D8
Flip count: 14 (override, default was 16)
Total Flips: 1391975640
Using: 12 threads
Progress: 3.592017%
Processed: 50000000
Speed: 34.18 Mkeys/s
Elapsed Time: 00:00:11
=======================================
=========== SOLUTION FOUND ============
=======================================
Private key: 0x34A65911D
Checked 55182106 combinations
Bit flips: 14
Time: 13.13 seconds (00:00:13)
Speed: 30.94 Mkeys/s
Solution saved to puzzle_34_solution.txt
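In Test 2 the Progress line is consistent with Processed / Total Flips: 50000000 / 1391975640 is about 3.592017%. In the 68-bit run further down, 30000000 out of 17876288714431443296 likewise rounds to 0.000000%. Below is a minimal sketch of that calculation. It assumes the percentage is derived this way, which is inferred from the logged values rather than taken from the source. progress_percent is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Share of the search space already covered, as a percentage.
// Assumption: processed and total are the "Processed" and "Total Flips" values.
double progress_percent(uint64_t processed, uint64_t total) {
    assert(total > 0);
    return 100.0 * static_cast<double>(processed) / static_cast<double>(total);
}
```

The conversion to double loses precision for very large totals, but that does not matter for a six-decimal progress display.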
https://github.com/NoMachine1/Mutagen/blob/main/mutagen.cpp
Only this was added:
void worker(Secp256K1* secp, int bit_length, int flip_count, int threadId, AVXCounter start, AVXCounter end) {
    const int fullBatchSize = 2 * POINTS_BATCH_SIZE;
    alignas(32) uint8_t localPubKeys[HASH_BATCH_SIZE][33];
    alignas(32) uint8_t localHashResults[HASH_BATCH_SIZE][20];
    alignas(32) int pointIndices[HASH_BATCH_SIZE];

    // Precompute target hash for comparison.
    // NOTE: this loads 32 bytes, but a HASH160 is only 20 bytes; unless
    // TARGET_HASH160_RAW is padded to at least 32 bytes, the load reads past its end.
    __m256i target16 = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(TARGET_HASH160_RAW.data()));

    // Precompute points with AVX2-aligned storage:
    // plusPoints[i] = i*G, minusPoints[i] = -(i*G) (same x, negated y).
    alignas(32) Point plusPoints[POINTS_BATCH_SIZE];
    alignas(32) Point minusPoints[POINTS_BATCH_SIZE];
    for (int i = 0; i < POINTS_BATCH_SIZE; i++) {
        Int tmp; tmp.SetInt32(i);
        plusPoints[i] = secp->ComputePublicKey(&tmp);
        minusPoints[i] = plusPoints[i];
        minusPoints[i].y.ModNeg();
    }

    // ... rest of worker() not shown in this excerpt ...
}
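The loop above fills plusPoints[i] with i*G and minusPoints[i] with its negation. It negates only the y-coordinate (ModNeg), because on a curve of the form y^2 = x^3 + 7 the negation of (x, y) is (x, -y). Since i*G and -(i*G) share an x-coordinate, batch-stepping code of this style can usually reuse one modular inverse for both start + i*G and start - i*G. That is presumably the motivation here, but the excerpt does not show it. Below is a toy sketch of the same precomputation on y^2 = x^3 + 7 over F_17. It has the same equation shape as secp256k1, but the prime is tiny and chosen only for illustration. ToyPoint, add, neg, precompute and BATCH are hypothetical names, not the repo's Secp256K1/Point/Int API.

```cpp
#include <cassert>
#include <cstdint>

// Toy curve y^2 = x^3 + 7 over F_17 (illustration only, not secp256k1's prime).
constexpr int64_t P = 17;
constexpr int BATCH = 8;                      // hypothetical stand-in for POINTS_BATCH_SIZE

struct ToyPoint {
    int64_t x, y;
    bool inf;                                 // point at infinity (group identity)
};

int64_t mod(int64_t a) { return ((a % P) + P) % P; }

int64_t inv(int64_t a) {                      // Fermat: a^(P-2) mod P, for a != 0
    int64_t r = 1, b = mod(a), e = P - 2;
    while (e > 0) {
        if (e & 1) r = r * b % P;
        b = b * b % P;
        e >>= 1;
    }
    return r;
}

bool on_curve(const ToyPoint& q) {
    return q.inf || mod(q.y * q.y) == mod(q.x * q.x * q.x + 7);
}

ToyPoint neg(const ToyPoint& q) {             // -(x, y) = (x, -y)
    return q.inf ? q : ToyPoint{q.x, mod(-q.y), false};
}

ToyPoint add(const ToyPoint& a, const ToyPoint& b) {
    if (a.inf) return b;
    if (b.inf) return a;
    if (a.x == b.x && mod(a.y + b.y) == 0) return {0, 0, true};  // Q + (-Q) = O
    int64_t lambda = (a.x == b.x)
        ? mod(3 * a.x * a.x * inv(2 * a.y))   // tangent slope (doubling)
        : mod((b.y - a.y) * inv(b.x - a.x));  // chord slope
    int64_t x3 = mod(lambda * lambda - a.x - b.x);
    int64_t y3 = mod(lambda * (a.x - x3) - a.y);
    return {x3, y3, false};
}

// Mirrors the excerpt: plus[i] = i*G, minus[i] = -(i*G) by negating y only.
void precompute(const ToyPoint& G, ToyPoint plus[BATCH], ToyPoint minus[BATCH]) {
    ToyPoint acc{0, 0, true};                 // 0*G is the point at infinity
    for (int i = 0; i < BATCH; ++i) {
        plus[i] = acc;
        minus[i] = neg(acc);
        acc = add(acc, G);
    }
}
```

With G = (1, 5), this gives 2G = (2, 10), 3G = (5, 9) and -2G = (2, 7). Each plus[i] + minus[i] is the point at infinity. Note that index 0 holds 0*G (infinity), just as the excerpt's loop starts at i = 0.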
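Aligned loads/stores (_mm256_load_si256) can be faster than unaligned ones (_mm256_loadu_si256), mostly when an unaligned access would cross a cache-line boundary. On recent x86 CPUs, _mm256_loadu_si256 on data that is already 32-byte aligned is generally just as fast. _mm256_load_si256 also requires a 32-byte-aligned address and faults otherwise.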
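One way to use the aligned load for the target comparison safely is to copy the 20-byte HASH160 into a zero-padded, alignas(32), 32-byte buffer. That way a single _mm256_load_si256 never reads past the end. Below is a minimal sketch that assumes GCC or Clang on x86 (for `__attribute__((target("avx2")))` and `__builtin_cpu_supports`), with a scalar fallback elsewhere. PaddedHash160, pad_hash160 and hash160_equal are hypothetical names, not part of Mutagen. Candidate hashes would need the same layout: in the excerpt, localHashResults uses 20-byte rows, so not every row starts on a 32-byte boundary.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

// Hypothetical type: a 20-byte HASH160 in a zero-padded, 32-byte-aligned buffer.
struct alignas(32) PaddedHash160 {
    uint8_t bytes[32];
};

PaddedHash160 pad_hash160(const std::array<uint8_t, 20>& h) {
    PaddedHash160 p{};                               // zero-initialises all 32 bytes
    std::memcpy(p.bytes, h.data(), h.size());        // bytes 20..31 stay zero
    assert(reinterpret_cast<std::uintptr_t>(p.bytes) % 32 == 0);
    return p;
}

// Portable fallback: compare only the 20 meaningful bytes.
bool hash160_equal_scalar(const PaddedHash160& a, const PaddedHash160& b) {
    return std::memcmp(a.bytes, b.bytes, 20) == 0;
}

#if defined(__x86_64__) || defined(__i386__)
// Built for AVX2 via the target attribute, so the rest of the program
// needs no -mavx2; call it only after a runtime CPU check.
__attribute__((target("avx2")))
bool hash160_equal_avx2(const PaddedHash160& a, const PaddedHash160& b) {
    // Aligned 32-byte loads are legal here thanks to alignas(32) and the padding.
    __m256i va = _mm256_load_si256(reinterpret_cast<const __m256i*>(a.bytes));
    __m256i vb = _mm256_load_si256(reinterpret_cast<const __m256i*>(b.bytes));
    __m256i eq = _mm256_cmpeq_epi8(va, vb);
    // One mask bit per byte; the zero padding always matches, so all 32 must be set.
    return static_cast<uint32_t>(_mm256_movemask_epi8(eq)) == 0xFFFFFFFFu;
}
#endif

bool hash160_equal(const PaddedHash160& a, const PaddedHash160& b) {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) return hash160_equal_avx2(a, b);
#endif
    return hash160_equal_scalar(a, b);
}
```

The padding costs 12 extra bytes per hash. In exchange, the hot comparison becomes one aligned load per operand with no out-of-bounds read.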
=======================================
== Mutagen Puzzle Solver by Denevron ==
=======================================
Starting puzzle: 68 (68-bit)
Target HASH160: e0b8a2baee...7451fc8cfc
Base Key: 0x730FC235C1942C1AE
Flip count: 30 (override, default was 34)
Total Flips: 17876288714431443296
Using: 32 threads
Progress: 0.000000%
Processed: 30000000
Speed: 12861.49 Mkeys/s
Elapsed Time: 00:00:00
But then the speed quickly drops.
