While my initial analysis focused on the external GDDR5 bandwidth limits, current ZEC GPU mining software seems to be limited by the bus between the memory controllers and the core. On AMD GCN, each memory controller can transfer 64 bytes (1 cache line) per clock. In SA5, the ht_store function, in addition to adding to the row counters, does 4 separate memory writes for most rounds (3 writes for the last couple of rounds). All of these writes are either 4 or 8 bytes, so far fewer than 64 bytes per clock are being transferred to the L2 cache. A single thread (1 SIMD element) can transfer at most 16 bytes (dwordX4) in a single instruction. This means a modified ht_store thread could update a row slot in 2 clocks. If the update operation is split between 2 (or 4 or more) threads, one slot can be updated in one clock, since 2 threads can simultaneously write to different parts of the same 64-byte block. Each row update operation could then be done in 2 GPU core clock cycles: one for the counter update, and one for updating the row slot.
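To put rough numbers on the write sizes above, here is a back-of-the-envelope sketch in Python. The 32-byte slot size is my assumption, inferred from the "2 clocks at 16 bytes per instruction" figure; the function names are made up for illustration and are not part of SA5.

```python
CACHE_LINE_BYTES = 64   # per memory controller per clock, as stated above
MAX_WRITE_BYTES = 16    # dwordX4: the largest write a single thread can issue
SLOT_BYTES = 32         # assumption: inferred from 2 clocks at 16 bytes each


def bus_utilization(write_bytes: int) -> float:
    """Fraction of a 64-byte cache line carried by one write."""
    return write_bytes / CACHE_LINE_BYTES


def clocks_per_slot(threads: int, slot_bytes: int = SLOT_BYTES) -> int:
    """Clocks to write one slot when `threads` threads each write 16 bytes
    to different parts of the same cache line in the same clock."""
    bytes_per_clock = min(CACHE_LINE_BYTES, threads * MAX_WRITE_BYTES)
    return -(-slot_bytes // bytes_per_clock)  # ceiling division


if __name__ == "__main__":
    for size in (4, 8):
        print(f"{size}-byte write fills {bus_utilization(size):.1%} of a cache line")
    for threads in (1, 2, 4):
        print(f"{threads} thread(s): {clocks_per_slot(threads)} clock(s) per slot")
```

This only models the bytes moved per clock; it ignores latency, bank conflicts, and the counter update itself.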
Even with those changes, my calculations indicate that a ZEC miner would be limited by the core clock, with a required core:memory clock ratio of approximately 5:6. In other words, when an RX 470 has a memory clock of 1750 MHz, the core would need to be clocked at 1750 * 5/6 = 1458 MHz in order to achieve maximum performance.
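The same arithmetic as a small Python helper, in case you want to plug in other memory clocks. The function name is just for illustration, and the 5:6 ratio is my estimate from above, not a measured value.

```python
from fractions import Fraction


def required_core_mhz(memory_mhz: int, core_to_memory: Fraction = Fraction(5, 6)) -> float:
    """Core clock needed to keep up with a given memory clock,
    assuming the estimated 5:6 core:memory ratio."""
    return float(memory_mhz * core_to_memory)


if __name__ == "__main__":
    for mem in (1750, 2000):
        print(f"memory {mem} MHz -> core {required_core_mhz(mem):.0f} MHz")
```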
If the row counters can be kept in LDS or GDS, the core:memory ratio required would be 1:2, thereby allowing full use of the external memory bandwidth. There is 64KB of LDS per CU, and the AMD GCN architecture docs indicate the LDS can be globally addressed; i.e. one CU can access the LDS of another CU. However, the OpenCL programming model does not permit the local memory of one work-group to be accessed by a different work-group. There is only 64KB of GDS shared by all CUs, and even if the row counters could be stored in such a small amount of memory, OpenCL does not have any concept of GDS.
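A quick Python check of how tight 64KB is for row counters, and of what the 1:2 ratio would mean for the core clock. The row count of 2^20 is an assumption for illustration only (one row per value of the 20 collision bits per round in Equihash 200,9); the real figure depends on how the miner lays out its hash table.

```python
from fractions import Fraction

GDS_BYTES = 64 * 1024    # GDS shared by all CUs, as stated above
ASSUMED_ROWS = 2 ** 20   # assumption: one row per 20-bit collision value


def bits_per_row(capacity_bytes: int, rows: int = ASSUMED_ROWS) -> Fraction:
    """Counter bits available per row if all of the memory holds counters."""
    return Fraction(capacity_bytes * 8, rows)


def core_mhz_at_ratio(memory_mhz: int, ratio: Fraction = Fraction(1, 2)) -> float:
    """Core clock needed at a given core:memory ratio (1:2 by default)."""
    return float(memory_mhz * ratio)


if __name__ == "__main__":
    print(f"GDS: {float(bits_per_row(GDS_BYTES))} bits per row counter")
    print(f"core clock at 1:2 with 1750 MHz memory: {core_mhz_at_ratio(1750):.0f} MHz")
```

Under that assumed row count there is less than one bit per row in GDS, so the counters would have to be partitioned or packed differently to fit.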
This likely means writing a top-performance ZEC miner for AMD is the domain of someone who codes in GCN assembler. Canis lupus?
Core speed has more of an effect on 480s, but they are still limited by memory bandwidth.