The only CPU so far where I saw a performance increase was a Core 2 Duo Conroe (2 cores, 2 MB cache).
Dual mining is perfect for the AMD Phenom II X4. It has 6 MB of L3 cache, so 3 cryptonight threads fit into it. But it has 4 cores, so with a normal miner one core is left without a job. Thanks to dual mining I can fill the L3 with 3 threads and add a "helper thread" on the 4th core. It looks like this:
"cpu_threads_conf" :
[
{ "cpu_architecture" : "deneb", "affine_to_cpu" : 0, "use_cache" : true },
{ "cpu_architecture" : "deneb", "affine_to_cpu" : 1, "use_cache" : true },
{ "cpu_architecture" : "deneb", "affine_to_cpu" : 2, "use_cache" : true },
{ "cpu_architecture" : "deneb", "affine_to_cpu" : 3, "use_cache" : true, "dual_mine_with": 2 },
]
Without dual mining, the Phenom II X4 905e hashes at up to 58 H/s. With dual mining on the last core, it reaches 64 H/s.
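
For anyone who wants to check whether their own CPU is a candidate, here is a rough Python sketch of the arithmetic above. It assumes the usual 2 MB cryptonight scratchpad per thread (consistent with 3 threads fitting into 6 MB); the names `plan_threads` and `gain_percent` are made up for illustration and are not part of the miner.

```python
# Rough sketch: how many cryptonight threads fit in cache, and how many
# cores are left over as candidates for a dual-mining "helper thread".
# Assumption: each cryptonight thread needs a 2 MB scratchpad.
SCRATCHPAD_MB = 2


def plan_threads(cache_mb, cores):
    """Return (threads that fit in cache, leftover cores)."""
    cached = min(cores, cache_mb // SCRATCHPAD_MB)
    helpers = cores - cached
    return cached, helpers


def gain_percent(before_hs, after_hs):
    """Relative hashrate gain in percent."""
    return (after_hs - before_hs) / before_hs * 100


# Phenom II X4: 6 MB L3, 4 cores
print(plan_threads(6, 4))                # (3, 1)
# Core 2 Duo Conroe: 2 MB cache, 2 cores
print(plan_threads(2, 2))                # (1, 1)
# 905e numbers from the post: 58 H/s -> 64 H/s
print(round(gain_percent(58, 64), 1))    # 10.3
```

Any CPU where the second number is above zero has idle cores that a helper thread could use. Whether it actually helps still has to be measured, as the numbers in this thread show.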

I set up my Phenom II 955 at stock clocks and got 94-98 H/s; the startup config is in the screenshot.