I revisited the URandomHash2 code. The problem with the nonce scan still exists, and I found a new flaw:
the memory requirement is too small, which makes an FPGA implementation easier.
Suggestions:
1. Do not use the final round in the middle state; the miner must calculate the full round.
2. Increase M so that the final memory requirement reaches 2 MB or more.