Post
Topic
Board Mining (Altcoins)
Re: limits of ZEC mining
by
nerdralph
on 22/11/2016, 20:15:08 UTC
While my initial analysis was focused on the external GDDR5 bandwidth limits, current ZEC GPU mining software seems to be limited by the memory controller/core bus.  On AMD GCN, each memory controller can xfer 64 bytes (1 cache line) per clock.  In SA5, the ht_store function, in addition to adding to row counters, does 4 separate memory writes for most rounds (3 writes for the last couple rounds).  All of these writes are either 4 or 8 bytes, so much less than 64 bytes per clock are being transferred to the L2 cache.  A single thread (1 SIMD element) can transfer at most 16 bytes (dwordX4) in a single instruction.  This means a modified ht_store thread could update a row slot in 2 clocks.  If the update operation is split between 2 (or 4 or more) threads, one slot can be updated in one clock, since 2 threads can simultaneously write to different parts of the same 64-byte block.  This would mean each row update operation could be done in 2 GPU core clock cycles; one for the counter update, and one for updating the row slot.

Even with those changes, my calculations indicate that a ZEC miner would be limited by the core clock, according to a ratio of approximately 5:6.  In other words, when a Rx 470 has a memory clock of 1750Mhz, the core would need to be clocked at 1750 * 5/6 = 1458Mhz in order to achieve maximum performance.

If the row counters can be kept in LDS or GDS, the core:memory ratio required would be 1:2, thereby allowing full use of the external memory bandwidth.  There is 64KB of LDS per CU, and the AMD GCN architecture docs indicate the LDS can be globally addressed; i.e. one CU can access the LDS of another CU.  However the syntax of OpenCL does not permit the local memory of one work-group to be accessed by a different work-group.  There is only 64KB of GDS shared by all CUs, and even if the row counters could be stored in such a small amount of memory, OpenCL does not have any concept of GDS.

This likely means writing a top performance ZEC miner for AMD is the domain of someone who codes in GCN assembler.  Canis lupus?