I meant that affinity makes a thread more energy efficient if it works alone on a 2-core Piledriver module. For example, two threads working on one module and sharing the 2MB cache make at most 40H/s each, while one thread working alone on a 2-core module gets 60H/s, and that's because it has all the 2MB L3 to itself.
Efficiency depends on the algo's demand for memory. CryptoNight works best with 2MB, and I really don't know about other algos.
To make sure, I need to plug my rig into a power meter.
I will try tomorrow to see if it's really more efficient and post the results.
Cryptonight is the only algo I'm aware of that performs faster with fewer threads; the optimum thread count is the L3 cache size in MB / 2 MB.
The default affinity should assign the threads the way you want; that just needs to be confirmed.
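
As a rough sketch of that rule of thumb, the snippet below works out the thread count from the L3 size and then picks one core per 2-core module. The function names are hypothetical, the 8 MB L3 figure is only an example value, and the layout where cores 2k and 2k+1 share a module is an assumption that should be checked against the actual machine.

```python
def optimal_threads(l3_cache_mb, per_thread_mb=2):
    """Rule of thumb from the post: L3 cache size in MB / 2 MB (at least 1)."""
    return max(1, l3_cache_mb // per_thread_mb)


def one_core_per_module(n_threads, cores_per_module=2):
    """Pick the first core of each module.

    Assumption: cores 2k and 2k+1 belong to the same module, so
    taking every second core leaves each thread alone on its module.
    """
    return [i * cores_per_module for i in range(n_threads)]


threads = optimal_threads(8)            # example value: 8 MB of L3
cores = one_core_per_module(threads)
print(threads, cores)
```

On Linux, a list like this could be handed to `os.sched_setaffinity` or `taskset` to check whether the miner's default affinity already matches it.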
Be careful with assumptions about power: lower power draw often means lower efficiency, because the CPU spends a lot
of time idle, waiting for data from memory.
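
One way to settle it once the power meter readings are in is to compare hashes per joule (H/s divided by watts) instead of raw power draw. This is only a sketch: the hashrates are the ones quoted earlier in the thread, while the wattages are made-up placeholders, not measurements, and must be replaced with real readings.

```python
def hashes_per_joule(hashrate_hs, watts):
    """Energy efficiency: hashes per second divided by joules per second."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return hashrate_hs / watts


# Hashrates per module as quoted in the thread.
two_threads_hs = 2 * 40   # two threads sharing one module
one_thread_hs = 60        # one thread alone on the module

# PLACEHOLDER wattages, not measured values: replace with meter readings.
two_threads_watts = 50.0
one_thread_watts = 35.0

eff_two = hashes_per_joule(two_threads_hs, two_threads_watts)
eff_one = hashes_per_joule(one_thread_hs, one_thread_watts)
print("two threads per module:", eff_two, "H/J")
print("one thread per module: ", eff_one, "H/J")
```

Whichever configuration ends up with the higher H/J figure is the more efficient one, regardless of which draws less power at the wall.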