Odd, I was mining Scrypt profitably with GPUs for a couple months into the Gridseed era - "private kernels" did NOT kill Scrypt mining.
Why yes, I DO base my "efficiency" numbers off current conditions - but I don't just look at ONE algorithm that's still new and not optimized for NVidia, I also look at others that ARE optimized for both and run under similar conditions.
Keep in mind that I SPECIFICALLY STATED "Genoil's miner" for ETH. Your comments about "that was Dagger" just show you didn't bother to read what I POSTED.
The RX 480 has faster (8000 MHz effective) but narrower (256 bit) memory than the R9 290 and R9 390, which gives it slightly better overall memory bandwidth than the R9 290 (5000 MHz effective at 384 bit) but slightly worse than the R9 390 (6000 MHz effective at 384 bit).
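Plugging the figures as quoted above into the usual back-of-the-envelope formula (GB/s = effective MHz × bus width in bits / 8 / 1000) - just a quick sketch using the numbers exactly as stated:

```python
# Effective memory bandwidth from the figures quoted above.
# GB/s = effective MHz * bus width (bits) / 8 bits-per-byte / 1000
cards = {
    "RX 480": (8000, 256),  # (effective MHz, bus width in bits)
    "R9 290": (5000, 384),
    "R9 390": (6000, 384),
}
for name, (eff_mhz, bus_bits) in cards.items():
    print(f"{name}: {eff_mhz * bus_bits / 8 / 1000:.0f} GB/s")
# RX 480: 256 GB/s, R9 290: 240 GB/s, R9 390: 288 GB/s
```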
The RX 480 has 12.5% MORE compute cores (2304 vs. 2048, exactly a 9:8 ratio) at a considerably HIGHER clock rate than the R9 390, and even more so than the R9 290.
The RX 480 and R9 390 are both PCI-E 3.0 cards, while the R9 290 is only PCI-E 2.0, but that has little or no measurable effect on most mining.
The RX 480 is NOT "close or a bit less than a R9 290" but is in fact a superior card across the board, except ONLY for memory bus width (which is more than made up for by its much faster memory). Yet its speed on ETH and ZEC is almost identical - definitely NOT 12.5% better, much less the 12.5% MORE CORES TIMES ITS HIGHER CLOCK SPEED you'd see on a compute-limited algorithm.
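To put a rough number on that expectation - a sketch using the core counts quoted above, with clock speeds that are my own assumed approximate reference clocks, not measured values:

```python
# If a hash algorithm were purely compute-bound, relative speed would
# track cores * clock. Core counts as quoted above; the clock figures
# below are assumed approximate reference clocks, not measurements.
rx480_cores, rx480_mhz = 2304, 1266   # assumed RX 480 boost clock
r9390_cores, r9390_mhz = 2048, 1000   # assumed R9 390 clock

expected = (rx480_cores * rx480_mhz) / (r9390_cores * r9390_mhz)
print(f"Expected compute-bound speedup: {expected:.2f}x")  # ~1.42x
# Observed ETH/ZEC hashrates are closer to 1.0x, which is what you'd
# expect if the algorithms were memory-bound, not compute-bound.
```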
On an actual compute-limited algorithm like SHA256 (which is still used by a few sites like GPUBoss for a benchmark), the RX 480 blows the R9 290 and R9 390 completely out of the water.
Might also want to pay attention to the R9 290X vs. the R9 290: they have the same memory system, but the 290X has the same 2304 cores the RX 480 does - yet it doesn't hash any faster than the R9 290 despite having 12.5% more cores.
Am I saying there isn't room for improvement on the NVidia side for ZEC mining? Definitely not!
Am I saying I doubt that NVidia will surpass AMD on ZEC? Given the obvious "heavy memory usage for ASIC resistance" design of ZEC and the very similar memory systems on both sides, definitely.
Yes, I'm fully aware that the Fury X and Nano have 4096 cores and fairly high core clock rates (higher than most if not all R9 390s as I recall, definitely higher than any R9 290, but not quite as high as the RX 480) - which just MAGNIFIES my point: they should be completely destroying anything else from AMD on both ETH and ZEC if the protocols were compute-bound, but in actual fact the RX 480 hashes ETH noticeably better and is close or better on ZEC in the benchmarks I've seen posted.
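Same sort of sketch for the Fury X vs. the RX 480 (again, the clock figures are my assumed reference clocks, roughly 1050 MHz for the Fury X):

```python
# Fury X vs RX 480 raw compute (cores * clock). Core counts as quoted
# above; clocks are assumed reference figures, not measurements.
furyx_compute = 4096 * 1050   # cores * MHz
rx480_compute = 2304 * 1266
print(f"Fury X raw compute advantage: {furyx_compute / rx480_compute:.2f}x")  # ~1.47x
# A compute-bound algorithm should therefore hash ~1.5x faster on the
# Fury X; posted ETH/ZEC benchmarks show it roughly even or behind.
```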
Apparently HBM1 has some latency issues that make it quite a bit slower than its "raw memory access speed" would indicate, which doesn't apply when comparing cards that all have GDDR5 to each other.
Scrypt GPU mining ended in the fall of '14 without private kernels. X11 started up shortly thereafter and became unprofitable at the beginning of winter. Gridseeds weren't ASICs either; the first ones weren't very profitable or good. You may have just remembered those little USB things coming out and thought 'well, those were ASICs' - they weren't. There were a lot of really bad ASICs, and Gridseeds were never a good deal.
Unless you were running private kernels yourself, it wasn't happening.
What other algo are you looking at that's mature? Dagger doesn't count - that's a very niche scenario, and it's bound almost exclusively by bus width. The GPUs never get a chance to even come close to being fully utilized.
The R9 290 has a 512-bit bus, as was already mentioned.
Who tests GPUs on SHA-256? How about trying something remotely relevant to the discussion, like say NeoS, Lyra2v2, or even X11? People haven't made optimized miners for SHA in years. As mentioned before, if you're talking about 'theoretical usage' scenarios, video games are a very good example of that, as GPUs are made to run them as fast as possible.
Memory usage doesn't need to be about bandwidth or bus width; it could just be the total memory footprint as well. Not only that, it doesn't need to be restricted JUST to throughput - an algorithm can make heavy use of memory and still do a lot of processing on the GPU. At this point, though, you're just making shit up and theorycrafting again.
You can blame latency all you want, but the Fury not only has a 4096-bit bus, it also has gobs of memory bandwidth - yet it's not eight times faster than the R9 290, or even twice as fast. It's not all just about memory speeds here, or even latency.
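The gap between bus width and actual bandwidth is easy to quantify - a quick sketch using standard spec-sheet figures (HBM1 on Fury runs at 1000 MT/s effective per pin; the R9 290's GDDR5 runs at 5000 MT/s effective on its 512-bit bus):

```python
# Bus width alone overstates Fury's advantage: HBM1 runs far slower
# per pin than GDDR5. Figures are standard spec-sheet values.
fury_bw = 1000 * 4096 / 8 / 1000    # ~512 GB/s
r9290_bw = 5000 * 512 / 8 / 1000    # ~320 GB/s
print(f"Fury: {fury_bw:.0f} GB/s, R9 290: {r9290_bw:.0f} GB/s")
print(f"Bandwidth ratio: {fury_bw / r9290_bw:.1f}x vs. an 8x bus-width ratio")
# Even pure bandwidth only predicts ~1.6x, nowhere near 8x - and the
# observed hashrate gap is smaller still.
```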