Re: Bitcoin puzzle transaction ~32 BTC prize to who solves it

Quote from: nomachine on Today at 10:08:46 PM

Quote from: kTimesG on May 21, 2024, 12:31:07 PM

Any sort of strategy is useless if you use either Python or ASM as long as any sort of higher-level op like SHA / RIPEMD is the actual bottleneck.

Nothing better (faster) and regularly updated is available than the following:

https://github.com/JayDDee/cpuminer-opt/tree/master/algo/ripemd (ripemd)
https://github.com/JayDDee/cpuminer-opt/tree/master/algo/sha (sha)

4-way, 8-way, avx2/avx512vl optimizations.

I don't see these implemented in the tools we use here; they are only used in the miner.

These existing ones have been deprecated.

Unfortunately, I don't have the time to address this myself.

Code:

while (1) {
    // One full SHA-256 (init/update/final) per 33-byte compressed public key
#if defined(USE_CUSTOM_SHA256)
    sha256_init(&s256ctx);
    sha256_update(&s256ctx, compressed_pubkey, 33);
    sha256_final(&s256ctx, sha256hash);
#else
#if defined(__APPLE__) && defined(USE_CC_SHA)
    CC_SHA256(compressed_pubkey, 33, sha256hash);
#else
    SHA256(compressed_pubkey, 33, sha256hash);
#endif
#endif

    // RIPEMD160(sha256hash, 32, rmd_hash);

    ++count;
    // Report the average speed every 2^26 hashes
    if (count % (1 << 26) == 0) {
        ticks = clock();
        speed = count * CLOCKS_PER_SEC / (ticks - start);
        printf("SHA hashes: %" PRIu64 " speed: %" PRIu64 " hashes/s\n", count, (uint64_t) speed);
    }
}

SHA hashes: 134217728 speed: 7485947 hashes/s

Code:

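// Context is initialized once and never finalized: one endless message
SHA256_Init(&shaCtx);
while (1) {
    //...
    SHA256_Update(&shaCtx, compressed_pubkey, 33);

    ++count;
    if (count % (1 << 26) == 0) {
        ticks = clock();
        speed = count * CLOCKS_PER_SEC / (ticks - start);
        // count is the number of 33-byte updates; speed * 33 is bytes/s
        printf("Hashed bytes: %" PRIu64 " speed: %" PRIu64 " MB/s\n", count, (uint64_t) (speed * 33) >> 20);
    }
}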

Hashed bytes: 469762048 speed: 1712 MB/s
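
To compare this MB/s figure with the hashes/s figure from the first run, the units have to be converted. A small sketch of that arithmetic (the helper name is made up for illustration; "MB" here means 2^20 bytes, matching the `>> 20` in the printf):

```c
#include <assert.h>
#include <stdint.h>

/* Convert a throughput in MiB/s into messages per second for a given
   message size in bytes. Hypothetical helper, not part of any library. */
uint64_t mib_per_s_to_msgs_per_s(uint64_t mib_per_s, uint64_t msg_bytes) {
    assert(msg_bytes != 0);
    return (mib_per_s << 20) / msg_bytes;
}
```

With 1712 MB/s and 33-byte updates this gives roughly 54 million updates per second, about 7 times the 7.5 million full hashes per second measured above.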

So 1.7 GB/s with your everyday SHA hasher is not bad. What's bad is that this is a single hash of one ever-growing message, not 50 million separate hashes of 33 bytes each.
In our case, the hash context needs to be reinitialized (and finalized) for every public key we hash, which is why the first benchmark only reaches about 7.5 million hashes/s. AVX512 and the like can maybe bring a 50% speed-up there, nothing to write home about.
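
To make the per-key cost concrete: a 33-byte key plus SHA-256 padding fits in one 64-byte block, so every key costs one compression starting from the fixed initial state. Below is a minimal, unoptimized sketch of that single-block case. The names `sha256_compress` and `sha256_one_block` are made up for illustration, and this is not what OpenSSL or CommonCrypto do internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

static uint32_t rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* One SHA-256 compression of a 64-byte block into state[8]. */
static void sha256_compress(uint32_t state[8], const uint8_t block[64]) {
    uint32_t w[64];
    for (int i = 0; i < 16; ++i)
        w[i] = (uint32_t)block[4 * i] << 24 | (uint32_t)block[4 * i + 1] << 16 |
               (uint32_t)block[4 * i + 2] << 8 | (uint32_t)block[4 * i + 3];
    for (int i = 16; i < 64; ++i) {
        uint32_t s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    uint32_t a = state[0], b = state[1], c = state[2], d = state[3];
    uint32_t e = state[4], f = state[5], g = state[6], h = state[7];
    for (int i = 0; i < 64; ++i) {
        uint32_t t1 = h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) +
                      ((e & f) ^ (~e & g)) + K[i] + w[i];
        uint32_t t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) +
                      ((a & b) ^ (a & c) ^ (b & c));
        h = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }
    state[0] += a; state[1] += b; state[2] += c; state[3] += d;
    state[4] += e; state[5] += f; state[6] += g; state[7] += h;
}

/* Hash a message of at most 55 bytes, which fits with padding in one block. */
void sha256_one_block(const uint8_t *msg, size_t len, uint8_t out[32]) {
    assert(len <= 55);
    uint8_t block[64] = {0};
    memcpy(block, msg, len);
    block[len] = 0x80;                     /* padding marker bit */
    uint64_t bits = (uint64_t)len * 8;     /* at most 440, fits in two bytes */
    block[62] = (uint8_t)(bits >> 8);
    block[63] = (uint8_t)bits;
    uint32_t st[8] = {                     /* fresh initial state for every message */
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
        0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
    };
    sha256_compress(st, block);
    for (int i = 0; i < 8; ++i) {
        out[4 * i] = (uint8_t)(st[i] >> 24);
        out[4 * i + 1] = (uint8_t)(st[i] >> 16);
        out[4 * i + 2] = (uint8_t)(st[i] >> 8);
        out[4 * i + 3] = (uint8_t)st[i];
    }
}
```

For a compressed public key the call would be `sha256_one_block(compressed_pubkey, 33, sha256hash)`. The block setup and state reset happen on every call, which is the overhead the streaming benchmark never pays.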

I benchmarked OpenSSL, Apple CommonCrypto, and a fast SHA implementation with SSE3.2 intrinsics (the last one was about 10% faster, probably because of inlining). I would bet that on CPUs with hardware support for SHA instructions, the SHA routines available from the system APIs already use them, so we wouldn't need to hack them in ourselves.

For AVX you'd actually need distributed scheduling, as in: https://github.com/minio/sha256-simd
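
By scheduling I mean something like this: independent keys get collected until every SIMD lane has work, and only then is the N-way hasher called. A rough sketch of the batching side only, under my own assumptions: `lane_batcher`, `batch_fn` and `record_batch` are made-up names, LANES = 8 is just an example width, and the real N-way hash is left out (the callback here only records how the work was grouped). This is not how sha256-simd is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 8      /* example: 8 x 32-bit lanes in one AVX2 register */
#define KEY_LEN 33   /* compressed public key size */

/* Hypothetical N-way hasher: receives n keys stored back to back. */
typedef void (*batch_fn)(const uint8_t *keys, size_t n, void *user);

typedef struct {
    uint8_t keys[LANES * KEY_LEN];  /* pending keys, one slot per lane */
    size_t filled;                  /* number of occupied lanes */
    batch_fn hash_batch;
    void *user;
} lane_batcher;

void batcher_init(lane_batcher *b, batch_fn fn, void *user) {
    b->filled = 0;
    b->hash_batch = fn;
    b->user = user;
}

/* Hand whatever is pending to the hasher, even if some lanes are empty. */
void batcher_flush(lane_batcher *b) {
    if (b->filled > 0) {
        b->hash_batch(b->keys, b->filled, b->user);
        b->filled = 0;
    }
}

/* Queue one key; the hasher only runs once all lanes are occupied. */
void batcher_submit(lane_batcher *b, const uint8_t key[KEY_LEN]) {
    memcpy(b->keys + b->filled * KEY_LEN, key, KEY_LEN);
    if (++b->filled == LANES)
        batcher_flush(b);
}

/* Stand-in for a real N-way hasher: only counts calls and keys. */
typedef struct { size_t calls; size_t keys; } batch_stats;

void record_batch(const uint8_t *keys, size_t n, void *user) {
    (void)keys;
    batch_stats *s = user;
    s->calls++;
    s->keys += n;
}
```

Submitting 10 keys with 8 lanes triggers one full batch of 8, and the remaining 2 only go out on an explicit `batcher_flush`, so a partially filled batch wastes lanes.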