You are incorrect. Cache lines are 64 bytes long because AMD memory channels are 64 bits wide (i.e. 2 GDDR5 chips). The GCN memory controller is not 32 bits wide, with 2 consecutive bursts to fill a cache line.
"Each memory controller is 64-bits wide and composed of two independent 32-bit GDDR5 memory channels." - pg 10:
https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf

I read tons and tons of docs (including this whitepaper), but somehow missed that one line. OK, misconception clarified.

Marc did some testing following this post and determined that while a read pulls 32 bytes from each of the two GDDR5 memory channels to fill a 64-byte cache line, writes behave differently. When only half of a cache line (32 bytes) is written, the dirty byte mask tells the controller that only one of the 2 GDDR5 memory channels is affected, so it writes to just that one. However, this does not mean the write bandwidth is double what I originally calculated, as writing 2 32-byte chunks to the memory controller requires 2 core clocks. Actually doubling it would require the GPU core to be clocked the same as the memory, i.e. 2 GHz for an RX 480 with 8 Gbps RAM. This is due to the core:memory clock ratio limit I described here:
https://bitcointalk.org/index.php?topic=1682003.0
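
To make the arithmetic concrete, here is a rough per-controller sketch. It is only a model of the numbers in this post: the 8 Gbps pin rate, the two 32-bit channels per controller, and the one-32-byte-chunk-per-core-clock limit come from the text above, while the function names and the 1 GHz core clock in the usage note are made up for illustration.

```python
# Back-of-the-envelope model of half-cache-line write bandwidth for ONE
# 64-bit memory controller. Constants are the figures quoted in the post;
# the model itself is an assumption, not measured behaviour.

PIN_RATE_BPS = 8e9            # 8 Gbps per data pin (from the post)
CHANNEL_WIDTH_BITS = 32       # one GDDR5 channel (from the whitepaper quote)
CHANNELS_PER_CONTROLLER = 2   # two channels per 64-bit controller
CHUNK_BYTES = 32              # half of a 64-byte cache line


def memory_side_bandwidth() -> float:
    """Peak bytes/s that the two channels of one controller can move."""
    return PIN_RATE_BPS * CHANNEL_WIDTH_BITS * CHANNELS_PER_CONTROLLER / 8


def half_line_write_bandwidth(core_clock_hz: float) -> float:
    """Bytes/s for 32-byte writes, assuming the controller accepts one
    32-byte chunk per core clock (the limit described in the post)."""
    core_side = CHUNK_BYTES * core_clock_hz
    return min(memory_side_bandwidth(), core_side)


def core_clock_to_saturate() -> float:
    """Core clock (Hz) at which 32-byte writes would fill both channels."""
    return memory_side_bandwidth() / CHUNK_BYTES


if __name__ == "__main__":
    print(f"memory side: {memory_side_bandwidth() / 1e9:.0f} GB/s per controller")
    print(f"core clock needed: {core_clock_to_saturate() / 1e9:.1f} GHz")
    # 1 GHz is a hypothetical core clock, chosen only to show the effect.
    print(f"at 1 GHz core: {half_line_write_bandwidth(1e9) / 1e9:.0f} GB/s")
```

Under these assumptions the two channels can take 64 GB/s per controller, but 32-byte writes only get there with a 2 GHz core clock; at the hypothetical 1 GHz the core side caps them at 32 GB/s.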