Hang on, I'm zeroing in on the error. It appears to be returned by a CUDA API function, so that's some good news.

This issue is also reproducible on RTX 20 cards.
I'm currently busy wrapping all the API calls with printfs that fire when an error check is triggered, so I can see which one it is.
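For anyone following along, here is a minimal sketch of this kind of per-call instrumentation, assuming a C++ codebase. A macro wraps each status-returning call and prints the file, line, and exact call text only when the call fails. The `status_t` type and the `fake_*` functions are hypothetical stand-ins so the sketch runs without a GPU. In real CUDA code the check would compare against `cudaSuccess` and print `cudaGetErrorString(err)`. Kernel launches themselves don't return a status, so you would call `cudaGetLastError()` right after the launch.

```cpp
#include <cstdio>
#include <string>

// Hypothetical status type standing in for cudaError_t; 0 means success,
// mirroring cudaSuccess == 0.
using status_t = int;

// Text of the most recent failing call, so callers can inspect it later.
static std::string g_last_failed_call;

// Wrap any status-returning call: on failure, print file, line and the
// exact call expression, then remember which call it was.
#define CHECK(call)                                                  \
    do {                                                             \
        status_t err_ = (call);                                      \
        if (err_ != 0) {                                             \
            std::fprintf(stderr, "%s:%d: %s failed with code %d\n",  \
                         __FILE__, __LINE__, #call, err_);           \
            g_last_failed_call = #call;                              \
        }                                                            \
    } while (0)

// Hypothetical API calls, for illustration only.
status_t fake_malloc(int bytes) { return bytes > 0 ? 0 : 2; }
status_t fake_memcpy(int bytes) { (void)bytes; return 0; }
status_t fake_launch(int blocks) { return blocks <= 1024 ? 0 : 9; }

// Runs a sequence of wrapped calls and returns the text of the last
// call that failed, or an empty string if every call succeeded.
std::string run_sequence(int blocks) {
    g_last_failed_call.clear();
    CHECK(fake_malloc(256));
    CHECK(fake_memcpy(256));
    CHECK(fake_launch(blocks));
    return g_last_failed_call;
}
```

Because the message is only printed on failure, the success path stays close to the original timing, which matters when extra output can make an intermittent bug disappear.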
Cool! Hopefully your printfs don't slow it down so much that the error stops happening. That was our biggest pain in the .........
I've been checking our Discord, but there's really nothing there that you don't already know or haven't already run into.