I really wonder whether anyone has been able to run this against compute_75, and what speed BitCrack would hit. I've been running a modified VanitySearch that does 4.6 GK/s on a single 3090. Sadly, because of the 86k threads it is trying to fill, it goes out of bounds now and then (GPU/GPUCompute.h:54). I just cannot wrap my head around that funny one yet. But besides me trying to understand that and learning a lot along the way, CUDA should be doing something near that speed on BitCrack too.
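For anyone chasing a similar out-of-bounds access: one common cause (not necessarily what happens at GPU/GPUCompute.h:54) is that the grid is rounded up to whole blocks, so more threads get launched than there are items in the buffer. Below is a minimal host-side C++ sketch that simulates this. The names `blocksFor` and `guardedLaunch` are hypothetical and are not taken from VanitySearch or BitCrack.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: number of blocks needed to cover n items (ceiling division).
std::size_t blocksFor(std::size_t n, std::size_t threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Simulates a kernel launch on the host. Each "thread" computes its global
// index and only touches the buffer when that index is in range.
// Returns how many threads were skipped by the bounds guard.
std::size_t guardedLaunch(std::vector<int>& data, std::size_t threadsPerBlock) {
    const std::size_t n = data.size();
    const std::size_t blocks = blocksFor(n, threadsPerBlock);
    std::size_t skipped = 0;
    for (std::size_t b = 0; b < blocks; ++b) {
        for (std::size_t t = 0; t < threadsPerBlock; ++t) {
            const std::size_t idx = b * threadsPerBlock + t;  // like blockIdx.x * blockDim.x + threadIdx.x
            if (idx >= n) {   // the guard: without it, these threads would write past the buffer
                ++skipped;
                continue;
            }
            data[idx] += 1;
        }
    }
    return skipped;
}
```

With 1000 items and 128 threads per block, 8 blocks (1024 threads) are launched, so 24 threads land past the end and need the guard.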

Neat idea. I might give that a go and submit a pull request or fork BitCrack with that function. It should be possible.
Edit: my repo is at
https://github.com/bitcoinforktech/BitCrack.git which will have some updates in the next few days.
Yeah, CUDA on BitCrack has this interesting problem with the new drivers. I'll try with line info later; this was just a quick run of your repo.
[2021-01-19.17:31:52] [Info] Error: misaligned address
========= Misaligned Shared or Local Address
========= at 0x0000e610 in keyFinderKernelWithDouble(int, int)
========= by thread (160,0,0) in block (0,0,0)
Edit:
The most fascinating thing about this issue is that the debug exe runs my full test keyspace (400M) [ofc slow af], while the release build crashes with the error above.
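For context on the "misaligned address" error above: on the GPU, a load or store has to use an address that is a multiple of the access size, so reading a 4-byte word from an address that is not a multiple of 4 faults. I don't know which access in keyFinderKernelWithDouble triggers it, so the following is only a host-side C++ sketch of how one might check an address and read a word safely from an arbitrary byte offset. `isAligned` and `loadU32` are hypothetical helpers, not BitCrack functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if p is a multiple of `alignment` bytes.
// `alignment` is assumed to be a power of two (4, 8, 16, ...).
bool isAligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical helper: reads a 32-bit word starting at any byte offset.
// Copying byte-wise avoids a typed 4-byte load from an unaligned address.
std::uint32_t loadU32(const unsigned char* bytes, std::size_t offset) {
    std::uint32_t value;
    std::memcpy(&value, bytes + offset, sizeof value);
    return value;
}
```

A check like `isAligned(ptr, 4)` before casting a byte pointer to a `uint32_t` pointer is a cheap way to narrow down where an offset goes wrong.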
I have just installed my 3070 and am giving it a go. I've compiled the CUDA version a few times, but only for older cards.
I hear that I have to roll back my driver to get it working on 3070, 3080 or 3090 cards, but I'm not sure to which one. I can't get it to start at all right now on the RTX 3070, using the driver that comes with CUDA development kit 11.2.
As an aside, I think I know where to fix this, if I can just get it to work on my card so I can give it a whirl :/