hmm... not really... it's a bit complicated. Most of the hash rate gains come from eliminating loops.
For example, the kernel has a #pragma unroll, which removes the loop and generates its body line by line. If a variable the loop depends on is unknown at compile time, the #pragma unroll is not applied. Also, if cgminer is not compiled with the -O3 option (most builds use -O2, I can assure you of this), the kernel will not be unrolled at all, so it's better to do the unrolling manually.
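To show what I mean by unrolling manually, here is a rough sketch in plain C. It is not the actual scrypt kernel code: the function names and the 4-word mixing step are made up for illustration, loosely in the style of the Salsa20 rounds scrypt uses. Both functions compute the same thing; the second one just has the loop written out by hand so nothing depends on the compiler honoring the pragma.

```c
#include <stdint.h>

/* Rotate left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Rolled version: whether this gets unrolled is up to the compiler.
 * Hypothetical mixing step, not taken from the real kernel. */
static void mix_rolled(uint32_t x[4])
{
    for (int i = 0; i < 4; i++)
        x[i] ^= rotl32(x[(i + 1) & 3] + x[(i + 3) & 3], 7);
}

/* Manually unrolled version: same operations in the same order,
 * with the loop counter and index arithmetic written out by hand. */
static void mix_unrolled(uint32_t x[4])
{
    x[0] ^= rotl32(x[1] + x[3], 7);
    x[1] ^= rotl32(x[2] + x[0], 7);
    x[2] ^= rotl32(x[3] + x[1], 7);
    x[3] ^= rotl32(x[0] + x[2], 7);
}
```

In the real kernel the loops are longer, but the idea is the same: replace the loop with its body repeated for each index, with constants where the counter used to be.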
EDIT: oh, and my settings are these:
"intensity" : "19",
"vectors" : "1",
"worksize" : "256",
"kernel" : "scrypt",
"lookup-gap" : "2",
"thread-concurrency" : "24000",
"gpu-engine" : "1140,1140,1020",
"gpu-memclock" : "1250"
The last card was a lemon; it doesn't want to go higher than 1020, and that's what you see in the stats above (about 633-ish). Still pretty decent for hash/watt ratio.