I do not see any performance degradation caused by my code supporting other NR_ROWS_LOG values (I have tried an implementation that supports only 20), because all the non-20 cases are #ifdef'd out of the code. Plus, the OpenCL compiler is very good at removing loops such as the for-loop in equihash_round() that becomes useless when NR_ROWS_LOG is 20.
This surprises me. I'll do some testing to confirm, but I don't think the compiler will optimize away all the code that is useless with 2^20 bins.
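The two mechanisms being debated (code removed by the preprocessor versus code the compiler must prove dead) can be illustrated with a minimal C sketch; OpenCL C follows the same preprocessor and constant-folding rules. The helper round_work() and its loop are hypothetical stand-ins, not code from equihash_round(), and NR_ROWS_LOG is assumed to be a build-time define no greater than 20.

```c
/* Hypothetical stand-in: NR_ROWS_LOG is assumed to arrive as a build-time
 * define (e.g. a -D option); 20 is the assumed default here. */
#ifndef NR_ROWS_LOG
#define NR_ROWS_LOG 20
#endif

#if NR_ROWS_LOG > 20
#error "this sketch assumes NR_ROWS_LOG <= 20"
#endif

/* Hypothetical helper, not the real equihash_round(): tallies the work
 * done for one bucket holding `slots` entries. */
static unsigned round_work(unsigned slots)
{
    unsigned work = slots;
#if NR_ROWS_LOG != 20
    /* Mechanism 1: with NR_ROWS_LOG == 20 the preprocessor strips this
     * block, so the compiler never sees it at all. */
    work += slots / 2;
#endif
    /* Mechanism 2: the trip count is a compile-time constant. With
     * NR_ROWS_LOG == 20 the bound is 1u << 0 == 1 and i starts at 1, so
     * the body never runs; an optimizing compiler can delete the loop,
     * but only because it can prove this from constants. */
    for (unsigned i = 1; i < (1u << (20 - NR_ROWS_LOG)); i++)
        work += slots;
    return work;
}
```

Code guarded by #if is guaranteed to disappear, whereas code that is merely unreachable at 2^20 rows disappears only if the compiler can prove it from constants, which is exactly what the testing would need to confirm.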