Your idea about starting at a larger node is a good one; you would certainly want to debug on a cheap process.
There's nothing to debug at the transistor level that is process-independent. In fact, even the transistor model changes from BSIM3 to the BSIM4 family when you move from cheap to expensive processes.
The general topology of the models is already well known and open sourced:
http://bsim.berkeley.edu/models/
What is secret? The parameter values of those models. And even if you use MOSIS/Europractice or a similar program, you won't be able to publish those secret values. Without them you can't optimize in any sensible way beyond "sandbag the hell out of it and keep your fingers crossed". KnC did that already.
This is by far one of the better threads I have come across on Bitcointalk.
If it's not too much to ask, could you describe a little how KnC "sandbagged" the design and why they didn't use Europractice?