Nothing. Earlier termination would cost you more than going through all the checks.
You cannot use preprocessor directives for that because v and g are not known in compile time, so forget about #ifdef's, #else's, #define's and so on. I've seen such confusion from people that have been coding in interpreted languages mostly and recently switched to C.
Anyway. If I were to search for improvements in the kernel (assuming I changed vector width to

, perhaps the final checks is not the right place. If you have a look at the kernel, you'd notice that a lot of code has been "reordered" so that higher ALUPacking is achieved. For example sometimes several w[X] values are calculated in a row, sometimes it is done with each SHA256 round step. Another thing is order of operations in the macros, it is not random, I bet whoever coded it has profiled ALUPacking and chosen the best case. However, switching to uint8 would definitely break that. I believe you can get at least 1-2% performance improvement from tighter alupacking which is much more than what you'd get from saving several ALU ops in the final checks
