Try fiddling with worksize, i.e. reducing it.
Tried 32, 48, 64, 96, 128, 192. The best result was 2.74 MH/s at --worksize 64.
I have the same problem with my 290. It seems the improvement doesn't support 290 cards.
The best I can get is 2.9 MH/s.
Sounds weird and took me a while, but many GPU threads seem to work well on 290s with this kernel.
This is what I run now on 290 TRI Xs under PIMP; it gives me about 3.4 MH/s.
It crashes sometimes on startup, but once it runs it's rather stable.
There seems to be a limit of 10 threads in sgminer. Wouldn't it make sense to allow 11 threads then?
"intensity" : "15",
"worksize" : "256",
"lookup-gap" : "1",
"thread-concurrency" : "25601",
"expiry" : "1",
"auto-fan" : true,
"log" : "5",
"queue" : "0",
"scan-time" : "1",
"gpu-threads" : "10",
"vectors" : "1",
"temp-cutoff" : "95",
"temp-overheat" : "85",
"temp-target" : "75",
"temp-hysteresis" : "3",
"gpu-fan" : "40-100",
"gpu-engine" : "1055",
"gpu-memclock" : "1250",
"gpu-powertune" : "20",
"gpu-dyninterval" : "7",
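
Note that the fragment above has no surrounding braces and ends with a trailing comma, so it won't parse as JSON on its own. Here is a minimal sketch, assuming you copy the values into a Python dict and want a well-formed JSON object back. Pool entries are left out because the post doesn't give them. `render_config` and `MAX_GPU_THREADS` are made-up names for illustration, and the limit of 10 is only what was reported in this thread, not something checked against the sgminer source.

```python
import json

# Values copied exactly as posted above (kept as strings where the post uses strings).
settings = {
    "intensity": "15",
    "worksize": "256",
    "lookup-gap": "1",
    "thread-concurrency": "25601",
    "expiry": "1",
    "auto-fan": True,
    "log": "5",
    "queue": "0",
    "scan-time": "1",
    "gpu-threads": "10",
    "vectors": "1",
    "temp-cutoff": "95",
    "temp-overheat": "85",
    "temp-target": "75",
    "temp-hysteresis": "3",
    "gpu-fan": "40-100",
    "gpu-engine": "1055",
    "gpu-memclock": "1250",
    "gpu-powertune": "20",
    "gpu-dyninterval": "7",
}

# Assumption: the 10-thread limit mentioned in this thread.
MAX_GPU_THREADS = 10


def render_config(opts):
    """Return the options as a JSON object string.

    Raises ValueError if gpu-threads exceeds the assumed limit.
    """
    threads = int(opts["gpu-threads"])
    if threads > MAX_GPU_THREADS:
        raise ValueError(f"gpu-threads={threads} exceeds assumed limit of {MAX_GPU_THREADS}")
    return json.dumps(opts, indent=2)


config_text = render_config(settings)
print(config_text)
```

The printed object still needs your own pool section added before it is usable as a full config file.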