Hello everyone. I have published my optimized versions of VanitySearch (CUDA), with a speed boost, in case anyone is interested.

The "bitcrack" version is specific to the puzzle and allows searching for compressed addresses and prefixes within a given range. The speed is about 6900 MKey/s on a 4090 and 8800 MKey/s on a 5090.
The second version, on the other hand, performs a standard search for vanity addresses (not just compressed P2PKH), with the same optimizations in the math and CUDA code. It does random searches and uses endomorphisms.
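
For anyone wondering what "endomorphisms" buys here, below is a minimal Python sketch of the general secp256k1 trick, not the actual CUDA code from these repositories. It relies on the standard curve constants `BETA` and `LAMBDA`, for which λ·(x, y) = (β·x, y). The helper names (`point_add`, `scalar_mult`, `candidate_keys`) are made up for illustration. From one computed public key you get six candidate keys (k, λk, λ²k and their negations) for the cost of a few field multiplications.

```python
# Standard, public secp256k1 parameters.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
# Endomorphism constants: BETA**3 == 1 (mod P), LAMBDA**3 == 1 (mod N).
BETA = 0x7AE96A2B657C07106E64479EAC3434E99CF0497512F58995C1396C28719501EE
LAMBDA = 0x5363AD4CC05C30E0A5261C028812645A122E22EA20816678DF02967C1B23BD72


def point_add(a, b):
    """Add two affine points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    x1, y1 = a
    x2, y2 = b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Plain double-and-add; slow, only used here to check the result."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def candidate_keys(k, pub):
    """Return six (private key, public key) pairs derived from one point."""
    x, y = pub
    out = []
    for i in range(3):
        priv = k * pow(LAMBDA, i, N) % N
        px = x * pow(BETA, i, P) % P
        out.append((priv, (px, y)))              # lambda^i * k
        out.append((N - priv, (px, P - y)))      # its negation
    return out


if __name__ == "__main__":
    k = 123456789
    for priv, pub in candidate_keys(k, scalar_mult(k, G)):
        print(hex(priv), hex(pub[0]))
```

Note that the derived private keys λ·k mod n and n − k generally land far from k, which is why this kind of speedup fits a random search but does not carry over directly to scanning a narrow key range.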
https://github.com/FixedPaul/VanitySearch-Bitcrack
https://github.com/FixedPaul/VanitySearch

Thank you for your work, it's truly impressive! The first program achieves a speed higher than any other solution I've seen. Even with a 33% power limit on an RTX 4090, it reaches around 2.3G keys per second. The second program is faster still, at about 4G keys per second under the same power limit. Unfortunately, these impressive numbers are merely theoretical and not useful for solving puzzles, as the program does not support working with ranges.