Re: Bitcoin puzzle transaction ~32 BTC prize to who solves it

Quote from: kTimesG on May 22, 2024, 11:49:17 PM

AVX512 and so on maybe can bring a 50% speed-up, nothing to write home about.

Benchmarked OpenSSL / Apple CommonCrypto and fast SHA with SSE3.2 intrinsics (last one was like 10% faster, probably because of inlining). I would bet that the CPUs that have hardware support for SHA instructions are actually used by the SHA routines available from the system APIs, and we wouldn't need to hack them ourselves.

For AVX you'd actually need a distributed scheduling: https://github.com/minio/sha256-simd

Compiling Keyhunt with Clang instead of GCC 12, 13, or 14 gave me a 20% performance increase on Zen 3.
The Clang I used is the one shipped with the AOCC compiler, located at /opt/AMD/aocc-compiler-4.2.0/bin/clang.

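However, it was essential to remove all the __builtin_ia32_* builtins from the code. These are GCC compiler builtins for x86 in general rather than Intel-only instructions, and they do run on AMD processors, but Clang does not accept all of them, so the code would not build as-is.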
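For illustration, here is a minimal sketch of what replacing a compiler-specific builtin with a portable intrinsic could look like. It is not Keyhunt's actual code: the helper names rotr32 and rotr32_x4 are hypothetical, and the example only shows the 32-bit right rotation that SHA-256 uses, done on four lanes with SSE2 intrinsics from <emmintrin.h>. Those intrinsics are accepted by GCC and Clang alike and run on both Intel and AMD CPUs; a scalar fallback is used when SSE2 is not available.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics, accepted by both GCC and Clang */
#endif

/* Scalar reference: rotate a 32-bit word right by n bits (0 < n < 32). */
static inline uint32_t rotr32(uint32_t x, int n)
{
    assert(n > 0 && n < 32);
    return (x >> n) | (x << (32 - n));
}

/* Hypothetical helper: rotate four 32-bit words right by n bits at once. */
static inline void rotr32_x4(const uint32_t in[4], uint32_t out[4], int n)
{
    assert(n > 0 && n < 32);
#if defined(__SSE2__)
    /* Shift counts are passed in the low 64 bits of a vector register. */
    __m128i right = _mm_cvtsi32_si128(n);
    __m128i left  = _mm_cvtsi32_si128(32 - n);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    /* (v >> n) | (v << (32 - n)) applied to each 32-bit lane. */
    __m128i r = _mm_or_si128(_mm_srl_epi32(v, right), _mm_sll_epi32(v, left));
    _mm_storeu_si128((__m128i *)out, r);
#else
    /* Portable fallback for targets without SSE2. */
    for (int i = 0; i < 4; i++)
        out[i] = rotr32(in[i], n);
#endif
}
```

Calling rotr32_x4(in, out, 7) should give the same four words as calling rotr32(in[i], 7) on each element, which makes it easy to check the vector path against the scalar one.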

In my case, I need to rewrite both the SHA and RIPEMD implementations for Zen3 to achieve a significant performance boost.

Imagine achieving a 70% performance increase!

Additionally, optimizing for Zen 4, for example by using its AVX-512 support, could lead to even greater efficiency gains.
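Before writing separate code paths per CPU generation, it can help to check at run time which instruction set extensions are actually present. The following is a rough sketch, assuming GCC or Clang on x86 and their <cpuid.h> helper __get_cpuid_count; the struct cpu_features and the functions feature_bit and detect_cpu_features are names made up for this example. It reads CPUID leaf 7 and looks at the EBX bits for AVX2 (bit 5), AVX-512F (bit 16) and the SHA extensions (bit 29). On non-x86 targets it simply reports nothing.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper header for the CPUID instruction */
#endif

/* Hypothetical container for the flags this example cares about. */
struct cpu_features {
    bool avx2;
    bool avx512f;
    bool sha;
};

/* Return true if the given bit (0..31) is set in a CPUID result register. */
static inline bool feature_bit(unsigned int reg, unsigned int bit)
{
    assert(bit < 32);
    return ((reg >> bit) & 1u) != 0;
}

static inline struct cpu_features detect_cpu_features(void)
{
    struct cpu_features f = { false, false, false };
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* Leaf 7, subleaf 0: structured extended feature flags are in EBX. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        f.avx2    = feature_bit(ebx, 5);
        f.avx512f = feature_bit(ebx, 16);
        f.sha     = feature_bit(ebx, 29);
    }
    /* Note: this only reports what the CPU advertises. Using AVX or
       AVX-512 also requires the OS to save the wider registers, which
       a complete check would confirm separately. */
#endif
    return f;
}
```

A program could call detect_cpu_features() once at startup and pick, say, a SHA-extension path or a plain SSE2 path based on the result.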