I really wonder whether anyone has been able to run this against compute_75, and what speed BitCrack would hit. I've been running a modified VanitySearch, doing 4.6GK/s on a single 3090. Sadly, because of the 86k threads it is trying to fill, it goes out of bounds now and then (GPU/GPUCompute.h:54). I just cannot wrap my head around that one yet. But aside from me trying to understand that and learning a lot along the way, CUDA should be able to reach something near that speed on BitCrack too.

Neat idea. I might give that a go and submit a pull request or fork BitCrack with that function. It should be possible.
Edit: my repo is at https://github.com/bitcoinforktech/BitCrack.git, which will have some updates in the next few days.
Yeah, CUDA on BitCrack has this interesting problem on the new drivers. I will try with line info later; I was just doing a quick run of your repo.
[2021-01-19.17:31:52] [Info] Error: misaligned address
========= Misaligned Shared or Local Address
========= at 0x0000e610 in keyFinderKernelWithDouble(int, int)
========= by thread (160,0,0) in block (0,0,0)
When you say modified VanitySearch, what do you mean? How is it modified? Is it still searching for vanity addresses/prefixes, or doing a sequential search like BitCrack? VanitySearch in general is much faster than BitCrack.