Yep, that's entirely correct. I remember it wasn't much fun to figure this out; took me several hours to get right myself.
No kidding. It broke my brain for a while, until I realized it was just a delay chain, so you could add to cnt to get what cnt "looks like" at each stage in the chain.
Edit: Fix tested in ModelSim at all working LOOP_LOG2 values (0, 1, 2 and 3) and pushed to the partial-unroll-opt branch.
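
For anyone following along, the delay-chain observation above can be sketched in a few lines of Python. This is only a toy model, not the miner's actual Verilog: the names `simulate_delay_chain` and `predicted_stage_value`, the free-running wrap-around counter, and the one-register-per-stage chain are all assumptions made for illustration. It shows that, once the chain has filled, the value sitting in stage k is just cnt plus a fixed offset (modulo the counter period).

```python
def simulate_delay_chain(modulus, stages, cycles):
    """Clock a wrap-around counter through a chain of registers.

    Hypothetical model: `cnt` counts 0..modulus-1 and each stage
    holds whatever the previous stage held one clock earlier.
    Returns the final (cnt, chain) pair.
    """
    cnt = 0
    chain = [0] * stages
    for _ in range(cycles):
        # All registers update on the same clock edge: stage 0 takes
        # the old cnt, every other stage takes its neighbour's old value.
        chain = [cnt] + chain[:-1]
        cnt = (cnt + 1) % modulus
    return cnt, chain


def predicted_stage_value(cnt, stage, modulus):
    """Value expected in 1-based `stage`, computed by adding to cnt.

    Stage k lags cnt by k clocks; modulo `modulus`, subtracting k is
    the same as adding (modulus - k).
    """
    return (cnt + (modulus - stage % modulus)) % modulus


# Run long enough for the chain to fill, then compare.
cnt, chain = simulate_delay_chain(modulus=8, stages=4, cycles=20)
for k, value in enumerate(chain, start=1):
    assert value == predicted_stage_value(cnt, k, 8)
```

The comparison only holds after at least `stages` clocks, since before that the later registers still contain their reset values.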
Wonderful, thank you for checking!
Sorry again about that bug.
No worries. Your work is greatly appreciated, and I'm very excited to get my LX150 dev kit mining.

The guys over at the Modular FPGA hardware design thread will also be quite happy, since their design is based around the LX150.
I ran a LOOP_LOG2=0 compile overnight. Turns out, the compiles actually take very little time: under an hour. And yes, it completes just fine at 50 MHz.

However, I've made silly mistake after silly mistake in the code, resulting in countless re-compiles. I'm hoping the compile I have going right now is the last one, and I can finally get correct results from the live hardware. I will report back with success once I've got it.
Looks like device utilization is about 50%, which is good, and XPower estimates 2.2W of consumption (FF toggle at 200, BRAM at 100%). I measure 50C on the chip's surface, ~38C with a small fan.