SHA-1 is based on a Merkle-Damgård construction, so it doesn't matter much whether you hash a lot of data once or a small amount of data many times: in both cases you are computing the same compression function over and over. Of course, there are some differences; for example, hashing a small amount of data can be more cache-friendly (on GPUs that could mean fewer `__global` reads). There are also quite a lot of optimizations available when the input is fixed and small enough, as is the case with Bitcoin. Anyway, I believe the ratios would be more or less the same, provided the graphs are correct.
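To make the "same compression function either way" point concrete, here is a minimal block-counting sketch (plain Python; the helper name `sha1_blocks` is mine). Total SHA-1 work is essentially the number of compression-function calls, which is the number of padded 64-byte blocks:

```python
def sha1_blocks(msg_len: int) -> int:
    """Padded block count: one 0x80 byte, zero padding, 8-byte length field."""
    return (msg_len + 1 + 8 + 63) // 64

total_bytes = 64 * 1024 * 1024  # 64 MiB

# One big 64 MiB message:
one_big = sha1_blocks(total_bytes)  # 1,048,577 compression calls

# The same bytes split into 1,048,576 separate 64-byte messages:
many_small = (total_bytes // 64) * sha1_blocks(64)  # 2,097,152 calls

print(one_big, many_small)
```

Note that per-message padding roughly doubles the work for single-block inputs, which is the kind of small-input difference mentioned above; for messages spanning many blocks the call counts converge, so the performance ratios should carry over.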
However, this does not take into account how well the code is optimized for a specific platform, the quality of the drivers, or the OpenCL stack (which does not perform as well as CUDA on NVIDIA hardware).