290x / Win7 x64 / Catalyst 13.12
before (2% version): -w 256 -tc 20481 -g 1 -I 21 -----> 3.85 MH/s
after (free version): -w 256 -tc 20481 -g 2 -I 15 -----> 2.75 MH/s
Switching to -g 1, -w 64 or -w 128 doesn't help: it's always 2.6 MH/s or less.
PS: What are the optimal settings for the 290x/290? I still think the "-I" limit (the miner crashes when "-I" is above 15) is the main obstacle to reaching more MH/s.
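To put the two numbers above in perspective, here is a quick sketch of the relative loss between the two versions. It uses only the hashrates quoted in this post; the function name is made up for illustration.

```python
# Hashrates quoted in the post above, in MH/s.
before_mhs = 3.85  # 2% version: -g 1 -I 21
after_mhs = 2.75   # free version: -g 2 -I 15


def relative_drop(before: float, after: float) -> float:
    """Return the fractional loss going from `before` to `after`."""
    return (before - after) / before


drop = relative_drop(before_mhs, after_mhs)
print(f"Hashrate drop: {drop:.1%}")  # prints "Hashrate drop: 28.6%"
```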

Unfortunately, I don't have a 290x card.
Try raising the -g parameter.
Your worksize is too high. Here's what I use on a Sapphire 290x to get 3.475 MH/s:
kernel: x11mod
intensity: 15
worksize: 128
lookup-gap: 2
thread-concurrency: 8192 (I had better luck with lower TC, probably because we are using multiple threads)
gpu-threads: 6
gpu-engine: 1050 (1075 works for a slight improvement and 1100 crashes my driver)
gpu-memclock: 1250
gpu-powertune: 0
Let me know how that goes or if you find better settings.
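If it helps, the settings above can be turned into a config snippet or a command line with a few lines of Python. This is only a sketch: it assumes the miner reads a JSON config with string values and accepts long options named after the same keys, so check that against your miner's README. Pool settings are left out, and the helper names are made up for illustration.

```python
import json

# Settings copied from the list above; values kept as strings
# (assumption: the miner's JSON config expects string values).
settings = {
    "kernel": "x11mod",
    "intensity": "15",
    "worksize": "128",
    "lookup-gap": "2",
    "thread-concurrency": "8192",
    "gpu-threads": "6",
    "gpu-engine": "1050",
    "gpu-memclock": "1250",
    "gpu-powertune": "0",
}


def to_config(opts: dict) -> str:
    """Render the settings as a JSON config fragment."""
    return json.dumps(opts, indent=2)


def to_cli_args(opts: dict) -> list:
    """Render the settings as long command-line options (assumed naming)."""
    args = []
    for key, value in opts.items():
        args += [f"--{key}", value]
    return args


print(to_config(settings))
print(" ".join(to_cli_args(settings)))
```

Keeping everything in one dict makes it easy to change a single value (for example gpu-engine 1075) and regenerate both forms while testing.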