Thanks, using aticonfig --odgc --adapter=all gives me the loads.
I've checked the DISPLAY variable using echo $DISPLAY. I've also tried setting the COMPUTE variable (which the SDK prefers over DISPLAY) to :0, but the behaviour is still the same.
Basically, if I start two instances of poclbm, the one I start first (say on the first card) will do about 250000 khash/s and the second one will do about 50000 khash/s (it varies, but those seem to be roughly the averages). The load reported by aticonfig reflects this: one GPU sits at around 80% and the other jumps around a bit but is much lower. If I then stop the first instance, the load shifts to the second GPU, which also shows up in aticonfig.
So it seems I am definitely starting the program correctly (i.e. each instance is using its own GPU), but it's like the work gets serialized somewhere and is limited to the performance of one card.
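
To put a rough number on that, here is a quick sanity check in Python. It is only a sketch: the function name scaling_report is made up for illustration, and it assumes both cards are identical and that one card on its own manages about what the first instance reaches (250000 khash/s), which I have not measured separately.

```python
def scaling_report(rates_khash, single_card_khash):
    """Compare the combined hash rate with ideal linear scaling.

    rates_khash: observed rate of each running instance, in khash/s.
    single_card_khash: assumed rate of one card running alone, in khash/s.
    """
    combined = sum(rates_khash)
    ideal = single_card_khash * len(rates_khash)
    return combined, ideal, combined / ideal


# Rough averages from the two poclbm instances, in khash/s.
rates = [250000, 50000]

# Assumption: identical cards, each capable of ~250000 khash/s alone.
combined, ideal, efficiency = scaling_report(rates, single_card_khash=250000)

print(f"combined: {combined} khash/s")
print(f"ideal:    {ideal} khash/s")
print(f"scaling:  {efficiency:.0%}")
```

Under those assumptions the two instances together reach 300000 khash/s against an ideal of 500000, so the second card adds only about a fifth of what it should, which fits the idea that the work is being serialized somewhere.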