CPU solving is highly memory and clock frequency bound and doesn't scale with cores well at all.
It scales reasonably well, although obviously sublinearly.
1x dev1 = 6.2 Sol/s
8x dev1 = 25.6 Sol/s
on a 4 GHz i7. So more than a 4x increase from using 8 instances.
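
For anyone who wants to plug in their own numbers, here is a rough sketch of the arithmetic behind that claim. The function name `scaling` is just made up for illustration, and "efficiency" here simply means speedup divided by instance count, which is my own way of framing it and not something the solver reports.

```python
def scaling(single_rate, multi_rate, instances):
    """Return (speedup, per-instance efficiency) for two measured Sol/s rates."""
    speedup = multi_rate / single_rate   # how many times faster than one instance
    efficiency = speedup / instances     # 1.0 would mean perfectly linear scaling
    return speedup, efficiency


# Figures from the post above: 1 instance vs 8 instances.
speedup, efficiency = scaling(6.2, 25.6, 8)
print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
# prints "speedup: 4.13x, efficiency: 52%"
```

So each of the 8 instances delivers roughly half of what a single instance manages on its own, which fits "reasonably well, but sublinear".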