Hello everyone. I have published my optimized versions of VanitySearch (CUDA) with a speed boost, in case anyone is interested.

The "bitcrack" version is specific to the puzzle and allows searching for addresses and prefixes (compressed) within a given range. The speed is about 6900 MKey/s on a 4090 and 8800 MKey/s on a 5090.
The second version, on the other hand, performs a standard search for vanity addresses (not just P2PKH compressed) but with the same optimizations in terms of math and CUDA code. It searches randomly and uses endomorphisms.
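To illustrate why endomorphisms help a random search, here is a minimal Python sketch (not the CUDA code from the repositories). It assumes the standard secp256k1 parameters and the commonly used `BETA`/`LAMBDA` constants; the helper names `ec_add`, `ec_mul`, and `endomorphism_variants` are made up for this example. The idea is that one scalar multiplication yields six candidate keys, because multiplying x by `BETA` corresponds to multiplying the private key by `LAMBDA`, and negating y corresponds to negating the key.

```python
# Standard secp256k1 domain parameters.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
# Endomorphism constants: BETA is a cube root of 1 mod P, LAMBDA mod N.
BETA = 0x7AE96A2B657C07106E64479EAC3434E99CF0497512F58995C1396C28719501EE
LAMBDA = 0x5363AD4CC05C30E0A5261C028812645A122E22EA20816678DF02967C1B23BD72


def ec_add(a, b):
    """Add two affine points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    x1, y1 = a
    x2, y2 = b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        m = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        m = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (m * m - x1 - x2) % P
    y3 = (m * (x1 - x3) - y1) % P
    return (x3, y3)


def ec_mul(k, pt):
    """Plain double-and-add scalar multiplication (slow, for illustration)."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return result


def endomorphism_variants(k):
    """Return six (private key, public point) pairs from one multiplication."""
    x, y = ec_mul(k, G)
    out = []
    for i in range(3):
        ki = k * pow(LAMBDA, i, N) % N   # k, LAMBDA*k, LAMBDA^2*k
        xi = x * pow(BETA, i, P) % P     # x, BETA*x,   BETA^2*x
        out.append((ki, (xi, y)))            # same y
        out.append((N - ki, (xi, P - y)))    # negated key, negated y
    return out
```

The derived keys `LAMBDA*k mod N` land essentially anywhere in the key space, which fits a random search but not a search confined to a fixed range.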
https://github.com/FixedPaul/VanitySearch-Bitcrack
https://github.com/FixedPaul/VanitySearch

Thank you for your work, it's truly impressive! The first program achieves a speed higher than any other solution I've seen. Even with a 33% power limit on an RTX 4090, it reaches around 2.3G keys per second. The second program is even faster, at about 4G keys per second under the same power limit. Unfortunately, those numbers are merely theoretical and not useful for solving puzzles, as the second program does not support working with ranges.
I wonder if it is possible to implement Bitcoin address prefix searching not only by the starting characters but also by any other positions within the address. For example, searching for characters at the end, in the middle, or even a combined search where part of the characters are at the beginning, part in the middle, and part at the end, and so on.
Thanks! But why only 2.3 Gkey/s? A 4090 @350W should run at around 6 Gkey/s.
As for the second program, as soon as I find some time, I'll also implement search within a specific range there (without endomorphisms, of course), so that it can search within a range rather than randomly. That will cover both prefixes and wildcards, which is what you're asking for, if I understood correctly.
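For readers unsure what matching at the start, middle, and end of an address means in practice, here is a small Python sketch. It is not the program's matcher: it assumes shell-style wildcards (`*` for any run of characters, `?` for exactly one) and uses the standard library's `fnmatch.fnmatchcase`. The function name `matches` and the sample address are only for illustration.

```python
from fnmatch import fnmatchcase


def matches(address: str, pattern: str) -> bool:
    """Case-sensitive wildcard match of a whole address against a pattern."""
    return fnmatchcase(address, pattern)


# Sample string in Base58 address form, used here only as test input.
sample = "1BitcoinEaterAddressDontSendf59kuE"

start_and_end = matches(sample, "1Bitcoin*kuE")    # fixed start and fixed end
middle_only = matches(sample, "1*Eater*")          # fixed text in the middle
fixed_gap = matches(sample, "1Bit????Eater*")      # exactly four unknown chars
wrong_case = matches(sample, "1bitcoin*")          # Base58 is case-sensitive
```

A plain prefix search is the special case `prefix*`, so a matcher like this covers both request types in one pattern syntax.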