The annoying part is that the offending region of memory is only 32-bit (4-byte) aligned.
Vector instructions need 16-byte alignment.
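A minimal C sketch of the usual workaround, assuming the data is an array of 32-bit words: check whether a pointer sits on a 16-byte boundary and take the 128-bit path only when it does. Otherwise, fall back to four 32-bit loads, which need only the 4-byte alignment the buffer actually has. The names `vec4_u32`, `is_aligned16` and `load16` are hypothetical and do not come from BitCrack. In CUDA the 16-byte type would be something like `uint4`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte value of four 32-bit words (stand-in for CUDA's uint4). */
typedef struct { uint32_t w[4]; } vec4_u32;

/* Nonzero if p sits on a 16-byte boundary, i.e. safe for a 128-bit vector load. */
static int is_aligned16(const void *p) {
    return ((uintptr_t)p % 16u) == 0;
}

/* Load four 32-bit words starting at p.
   Aligned path: one 16-byte copy, which a compiler may turn into a single vector load.
   Unaligned path: four scalar 32-bit loads, valid for a 4-byte-aligned pointer. */
static vec4_u32 load16(const uint32_t *p) {
    vec4_u32 v;
    if (is_aligned16(p)) {
        memcpy(&v, p, sizeof v);
    } else {
        for (int i = 0; i < 4; i++) {
            v.w[i] = p[i];
        }
    }
    return v;
}
```

Both paths return the same value. The branch only decides whether the wide load is legal, so data that is merely 32-bit aligned ends up on the slower scalar path.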
In BitCrack sp-mod #4, ~66% of the time is spent multiplying numbers. Pretty stupid algorithm. Moving that work onto tensor cores might push the hashrate up a bit.
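The following C sketch shows why multiplication can eat that much of the runtime. It is a plain schoolbook 256-bit multiply that assumes eight 32-bit limbs stored least significant first. BitCrack's actual limb layout and its modular reduction are not shown, and `mul256` is a hypothetical name. Each call costs 8 × 8 = 64 word multiplies, and every elliptic-curve point addition needs several such field multiplications plus reductions.

```c
#include <stdint.h>

#define LIMBS 8 /* 256 bits as eight 32-bit limbs, least significant first (assumed layout) */

/* Schoolbook multiply: r (2*LIMBS limbs) = a * b, no modular reduction.
   Performs LIMBS*LIMBS 32x32->64-bit multiplies. */
static void mul256(const uint32_t a[LIMBS], const uint32_t b[LIMBS],
                   uint32_t r[2 * LIMBS]) {
    for (int i = 0; i < 2 * LIMBS; i++) {
        r[i] = 0;
    }
    for (int i = 0; i < LIMBS; i++) {
        uint64_t carry = 0;
        for (int j = 0; j < LIMBS; j++) {
            /* (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so this sum cannot overflow */
            uint64_t t = (uint64_t)a[i] * b[j] + r[i + j] + carry;
            r[i + j] = (uint32_t)t;
            carry = t >> 32;
        }
        r[i + LIMBS] = (uint32_t)carry; /* this limb is still zero at this point */
    }
}
```

The partial products `a[i] * b[j]` form an 8 × 8 outer product whose anti-diagonals are summed with carries. That matrix-shaped structure is what one would try to map onto tensor cores. However, tensor cores take narrower integer inputs (such as INT8), so the limbs would presumably have to be split first and the carries handled outside the matrix unit.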