while (1) {
#if defined(USE_CUSTOM_SHA256)
    sha256_init(&s256ctx);
    sha256_update(&s256ctx, compressed_pubkey, 33);
    sha256_final(&s256ctx, sha256hash);
#else
#if defined(__APPLE__) && defined(USE_CC_SHA)
    CC_SHA256(compressed_pubkey, 33, sha256hash);
#else
    SHA256(compressed_pubkey, 33, sha256hash);
#endif
#endif
    // RIPEMD160(sha256hash, 32, rmd_hash);
    ++count;
    if (count % (1 << 26) == 0) {
        ticks = clock();
        speed = count * CLOCKS_PER_SEC / (ticks - start);
        printf("SHA hashes: %" PRIu64 " speed: %" PRIu64 " hashes/s\n", count, (uint64_t) speed);
    }
}
SHA hashes: 134217728 speed: 7485947 hashes/s
SHA256_Init(&shaCtx);
while (1) {
    //...
    SHA256_Update(&shaCtx, compressed_pubkey, 33);
    ++count;  // count is the number of 33-byte updates
    if (count % (1 << 26) == 0) {
        ticks = clock();
        speed = count * CLOCKS_PER_SEC / (ticks - start);  // updates per second
        printf("Hashed bytes: %" PRIu64 " speed: %" PRIu64 " MB/s\n", count, (uint64_t) (speed * 33) >> 20);
    }
}
Hashed bytes: 469762048 speed: 1712 MB/s
So 1.7 GB/s with your everyday SHA hasher is not bad. What's bad is that this figure comes from a single hash of one 1.7 GB message, not from 50 million separate hashes of 33 bytes each.
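To put the two benchmark figures in the same unit, here is a small sketch of the conversion. The helper names are made up for illustration, and "MB" follows the same `>> 20` convention as the snippet above (2^20 bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rate of fixed-size hashes -> MB/s (2^20 bytes),
   same ">> 20" convention as the benchmark code above. */
uint64_t hashes_to_mb_per_s(uint64_t hashes_per_s, uint64_t msg_len)
{
    return (hashes_per_s * msg_len) >> 20;
}

/* Hypothetical inverse: streaming MB/s -> msg_len-byte updates per second. */
uint64_t mb_per_s_to_updates(uint64_t mb_per_s, uint64_t msg_len)
{
    assert(msg_len != 0);
    return (mb_per_s << 20) / msg_len;
}
```

With the numbers above, `hashes_to_mb_per_s(7485947, 33)` gives 235, so the one-shot loop moves about 235 MB/s of key data, while `mb_per_s_to_updates(1712, 33)` gives 54398851, which is where the "50 million" comes from.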
In our case, the hash context has to be reinitialized for every public key we hash, so AVX512 and the like can maybe bring a 50% speed-up, which is nothing to write home about.
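One way to see the cost of hashing each key separately: SHA-256 works on 64-byte blocks, and every finished message gets a 0x80 byte, zero padding and an 8-byte length appended. A rough sketch of the block count follows (the function name is my own, not from any library):

```c
#include <stddef.h>

/* Hypothetical helper: number of 64-byte SHA-256 blocks processed for a
   message of len bytes. Padding adds one 0x80 byte plus an 8-byte length
   field, then everything is rounded up to a multiple of 64 bytes. */
size_t sha256_block_count(size_t len)
{
    return (len + 1 + 8 + 63) / 64;
}
```

For a 33-byte compressed public key this returns 1, so each key costs one full 64-byte compression and nearly half of that block is padding. In the streaming benchmark the 33-byte updates are packed back to back, so no block is spent on padding until the final call.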
I benchmarked OpenSSL, Apple CommonCrypto and a fast SHA implementation with SSE3.2 intrinsics (the last one was about 10% faster, probably because of inlining). I would bet that on CPUs with hardware support for SHA instructions, the SHA routines available from the system APIs already use them, so we wouldn't need to hack that in ourselves.
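If you want to check whether a given x86 machine has the SHA instructions at all, something like the following should work. This is only a sketch: it assumes GCC or Clang (for `<cpuid.h>`), the function name is made up, and it says nothing about whether OpenSSL or CommonCrypto actually take that code path.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: returns 1 if the CPU reports the x86 SHA extensions,
   0 otherwise (including non-x86 builds, which this sketch does not probe). */
int cpu_has_sha_ext(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* CPUID leaf 7, subleaf 0: EBX bit 29 is the SHA feature flag. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int) ((ebx >> 29) & 1u);
#else
    return 0;
#endif
}
```

On ARM machines (Apple Silicon included) this always returns 0, because it only looks at the x86 feature flag.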
For AVX you'd actually need distributed scheduling, so that several independent messages are hashed in parallel lanes:
https://github.com/minio/sha256-simd