Hello everyone. I have published my optimized versions of VanitySearch, with a speed boost, in case anyone is interested.

The "bitcrack" version is aimed specifically at the puzzle: it searches for compressed addresses and prefixes within a given key range. The speed is about 6900 MKey/s on a 4090 and about 8800 MKey/s on a 5090.
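To put those speeds in perspective, here is a rough back-of-the-envelope calculation of how long a full sequential scan of a range would take. The 70-bit range width is only an example value I picked, and the function name is made up for this post; the rates are the approximate figures quoted above, so treat the results as order-of-magnitude estimates.

```python
def seconds_to_scan(range_bits, mkeys_per_s):
    """Time to test every key in a range of 2**range_bits keys at the given rate."""
    return 2 ** range_bits / (mkeys_per_s * 1e6)

RANGE_BITS = 70  # hypothetical range width, chosen only for illustration

for gpu, rate in (("4090", 6900), ("5090", 8800)):  # approximate MKey/s from above
    days = seconds_to_scan(RANGE_BITS, rate) / 86400
    print(f"{gpu}: about {days:,.0f} days for a 2^{RANGE_BITS} key range")
```

Each extra bit of range doubles the time, so the width of the range matters far more than the difference between the two cards.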
The second version performs a standard vanity address search (not limited to compressed P2PKH) with the same math and CUDA optimizations. It searches randomly and uses endomorphisms.
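For anyone unfamiliar with the endomorphism trick, here is a minimal Python sketch of the general idea on secp256k1. It is not the CUDA code from the repositories, and the helper names (cube_root_of_unity, add, mul, variants) are made up for this post. One scalar multiplication gives a point (x, y); multiplying x by a cube root of unity beta and/or negating y gives five more valid public keys, whose private keys are k times powers of lambda, and their negations, modulo the group order.

```python
# secp256k1 domain parameters (standard public constants)
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def cube_root_of_unity(m):
    """Return a non-trivial cube root of 1 modulo a prime m with m % 3 == 1."""
    g = 2
    while True:
        r = pow(g, (m - 1) // 3, m)
        if r != 1:
            return r
        g += 1

def add(a, b):
    """Affine point addition; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        s = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        s = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (s * s - x1 - x2) % P
    return x3, (s * (x1 - x3) - y1) % P

def mul(k, pt):
    """Plain double-and-add scalar multiplication (slow, for illustration only)."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt = add(pt, pt)
        k >>= 1
    return acc

BETA = cube_root_of_unity(P)  # acts on x coordinates
LAM = cube_root_of_unity(N)   # acts on private keys
# Each modulus has two non-trivial roots; pick the beta that matches LAM.
if mul(LAM, G) != (BETA * G[0] % P, G[1]):
    BETA = BETA * BETA % P

def variants(k):
    """Six (private key, public point) pairs from a single scalar multiplication."""
    x, y = mul(k, G)
    out = []
    for i in range(3):
        s = k * pow(LAM, i, N) % N
        xi = x * pow(BETA, i, P) % P
        out.append((s, (xi, y)))          # endomorphism image
        out.append((N - s, (xi, P - y)))  # its negation
    return out
```

The extra five candidates cost only a couple of field multiplications and a subtraction each, which is far cheaper than five more point additions. That only helps a random search like this one, since the derived keys land all over the key space rather than inside a chosen range.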
https://github.com/FixedPaul/VanitySearch-Bitcrack
https://github.com/FixedPaul/VanitySearch