CM 8.0 is getting 187 Sol/s on a 480 4GB (7 Gbps) and 176 Sol/s on a 470 4GB (7 Gbps), both at stock clocks, so it seems 173 is not the theoretical limit.
Correct. See my Nov 20 post about data written to half a cache line. That puts the theoretical limit up around 200 Sol/s for an OpenCL implementation, with the 1250 MHz core clock being the limiting factor instead (described in my Nov 22 post).