I got reports that on Opteron (old non-AES AMD) I also give +30% perf, so maybe Nehalem is a special case of a non-AES CPU that runs badly with my code.

Yes, I can confirm this too. With xmr-stak I get 160 H/s, and with JCE:
{
  "hashrate":
  {
    "thread_0": 14.55,
    "thread_1": 14.55,
    "thread_2": 14.56,
    "thread_3": 14.55,
    "thread_4": 14.55,
    "thread_5": 14.56,
    "thread_6": 14.64,
    "thread_7": 14.64,
    "thread_8": 14.64,
    "thread_9": 14.64,
    "thread_10": 14.64,
    "thread_11": 14.64,
    "thread_all": [14.55, 14.55, 14.56, 14.55, 14.55, 14.56, 14.64, 14.64, 14.64, 14.64, 14.64, 14.64],
    "total": 175.11,
    "max": 176.21
  },
  "result":
  {
    "wallet": "xxx",
    "pool": "xxx",
    "ssl": false,
    "reconnections": 0,
    "currency": "Monero (XMR/XMV)",
    "difficulty": 5160,
    "shares": 2673,
    "hashes": 13941780,
    "uptime": "22:27:24",
    "effective": 172.45
  },
  "miner":
  {
    "version": "jce/0.33k/cpu",
    "platform": "Dual Six-Core AMD Opteron(tm) Processor 2435",
    "system": "Linux 64-bits",
    "algorithm": "15"
  }
}
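
For anyone who wants to check the numbers, here is a rough Python sketch using the figures from the report above. It assumes the "uptime" string is plain HH:MM:SS and that "effective" is roughly "hashes" divided by uptime. That is my guess, not something the miner documents here. The helper names `uptime_seconds` and `gain_percent` are made up for this example.

```python
def uptime_seconds(uptime: str) -> int:
    """Convert an "HH:MM:SS" uptime string to seconds (assumed format)."""
    hours, minutes, seconds = (int(part) for part in uptime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def gain_percent(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Figures copied from the report above.
hashes = 13941780
uptime = "22:27:24"
jce_total = 175.11   # "total" from the JCE report, in H/s
xmr_stak = 160.0     # xmr-stak hashrate quoted in the post, in H/s

# Average hashrate over the whole run (assumption: this is what "effective" means).
average = hashes / uptime_seconds(uptime)

print(f"hashes / uptime: {average:.2f} H/s")
print(f"JCE total vs xmr-stak: {gain_percent(jce_total, xmr_stak):+.1f}%")
```

With these inputs the average works out to about 172.45 H/s, which matches the reported "effective" value, and the JCE total is roughly 9% above the xmr-stak figure on this machine.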