Lastly, I also tried poclbm, and it can only use one core at a time: if I start one process on GPU2 and then another on GPU1, both end up on GPU2; reverse the order and both end up on GPU1. At one point I got it to run on both GPU cores, but with 100% load on the first and only a low single-digit percentage on the second.
Regarding this configuration, have you set the environment variable "DISPLAY=:0"? I'm running kernel 2.6.34, fglrx 8.762 (not sure which Catalyst version this corresponds to) and SDK 2.1.
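In case it helps, here is a minimal Python sketch of one way to make sure DISPLAY=:0 is set for a process you launch, without changing your shell's environment. The helper names env_with_display and launch are made up for illustration, and the command in the example is only a placeholder that prints the value the child process sees; substitute your own miner command line.

```python
import os
import subprocess
import sys


def env_with_display(display=":0"):
    """Return a copy of the current environment with DISPLAY set."""
    env = dict(os.environ)
    env["DISPLAY"] = display
    return env


def launch(cmd, display=":0"):
    """Start cmd (a list of arguments) with DISPLAY set for that process only."""
    return subprocess.Popen(cmd, env=env_with_display(display))


if __name__ == "__main__":
    # Placeholder command (an assumption, not a real miner invocation):
    # it only prints the DISPLAY value that the child process inherits.
    child = launch([sys.executable, "-c", "import os; print(os.environ['DISPLAY'])"])
    child.wait()
```

Calling launch once per miner process gives each one the same DISPLAY value; whether that alone fixes the device selection problem is something I can't confirm.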
I can't help with 11.1 since I haven't tried it yet.