As far as getting a rig mining with a specific N-Factor goes, in my experience it's far easier to do this on the NVIDIA side than on the AMD side of things. Cudaminer/ccMiner is often not faster (though sometimes it is), but there's no mucking around with thread concurrency, intensity, etc. Yes, the launch config can help, but I've always found it a little odd that we get HW errors in the first place -- I guess I don't know what they really mean, but an error suggests that the hardware performed all the calculations that were asked of it and somehow got the wrong answer. And with SGminer or whatever, if you go past certain levels of intensity these become very real, and also a real pain to debug. If you only have one rig, or only one type of rig (e.g. all of them are running R9 280X GPUs or similar), it's not the worst thing in the world, but if you have a variety of hardware (which I do), it means debugging on every single rig. It's why I eventually quit mining UTC/THOR/MRC/etc. (never mind that THOR ended up being a complete bust regardless).
Some smart people are working to fix the way AMD cards have to be retuned for every N-Factor change and even for different algos. Hopefully, your point here will be moot in the near future.
If you're adventurous, there is a branch (chacha-flexible) of YACMiner, which I've never found the time to finalize, that includes:
hot changes of lookup gap (sometimes fails and crashes)
autotune - automatically selects settings for your card to avoid HW errors (fails when mining locally due to a timing issue I haven't looked into fixing)
It also still lets you use buffer-size to specify settings instead of having to deal with thread-concurrency and other fiddly bits. Anyone who wants to help fix the timing issues and the crashing on lookup-gap change would probably earn a bounty of YAC from Beave
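For anyone wondering why a buffer size is easier to reason about than thread-concurrency, here is a rough Python sketch of the arithmetic as I understand it. It is not taken from YACMiner's source. The assumptions: N = 2^(N-Factor + 1) as in the scrypt-jane style coins, a scratchpad of 128 * r * N bytes per hash with r = 1, and a lookup gap that divides the number of stored entries. The function names and the idea that the buffer size is given in MiB are made up for illustration.

```python
# Rough sketch only: function names and MiB units are assumptions,
# not YACMiner's actual API or option semantics.

def scrypt_n(nfactor: int) -> int:
    """N = 2^(nfactor + 1), the convention assumed here."""
    return 1 << (nfactor + 1)

def scratchpad_bytes(nfactor: int, lookup_gap: int = 1, r: int = 1) -> int:
    """Memory one hash needs: 128 * r bytes per stored entry,
    with only every lookup_gap-th entry kept (rounded up)."""
    n = scrypt_n(nfactor)
    entries = -(-n // lookup_gap)  # ceiling division
    return 128 * r * entries

def threads_for_buffer(buffer_mib: int, nfactor: int, lookup_gap: int = 2) -> int:
    """How many concurrent hashes fit in a buffer of buffer_mib MiB."""
    return (buffer_mib * 1024 * 1024) // scratchpad_bytes(nfactor, lookup_gap)

if __name__ == "__main__":
    # The same 512 MiB buffer at rising N-Factors, lookup gap 2.
    for nf in (9, 12, 14):
        print(nf, scrypt_n(nf), threads_for_buffer(512, nf, lookup_gap=2))
    # 9 1024 8192
    # 12 8192 1024
    # 14 32768 256
```

The point is that every N-Factor step doubles the memory per hash, so a thread-concurrency value that worked before is suddenly twice too large, while a fixed buffer size just means half as many threads fit.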
