Now ... 0.3ms is too small IMO; doubling it so that one job is pending gives 0.6ms, which is also too small IMO (the BFL queue design says 20 work items).
So if your queue only allows one work item waiting in it, the code still has to hit a timing target that it is going to be late for (sometimes? often?) due to USB and OS constraints.
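For scale, a sketch with assumed numbers (the hashrate is back-derived so one work item lasts the ~0.3 ms mentioned above; full-speed USB really does schedule transfers on 1 ms frames):

```python
# Why a sub-millisecond work-item deadline is hard to hit over USB.
# The hashrate here is an assumption chosen to reproduce the ~0.3 ms figure.
NONCE_SPACE = 2 ** 32  # nonces per work item, fixed by the Bitcoin protocol

def item_deadline_ms(hashrate_hs, pending_items=1):
    """Time budget to deliver the next work item before the core idles."""
    return (NONCE_SPACE / hashrate_hs) * (1 + pending_items) * 1e3

# At ~14.3 TH/s aggregate, one item lasts ~0.3 ms; even with one item pending,
# the ~0.6 ms budget is below the 1 ms USB frame period, so misses are expected.
print(item_deadline_ms(14.3e12, pending_items=1))  # ~0.6
```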
Thanks for an informative post.
I'm wondering why even use Linux to control chips via SPI? What is the point? I haven't worked on ARM recently, but I did work on Xilinx with MicroBlaze. Going standalone (bare-metal) for SPI and I2C access was a
major win in terms of power usage. I didn't even bother to measure speed: it was much faster, though not critical for what I did. Only lwIP is somewhat harder to use than a network interface through sockets.
Has the Linux SPI driver had any major improvements recently?
I wonder what bitfury has to say about it, or will he just shut up and smile to avoid disclosing some other, much better, solution to the competition.
Very simple. But code speaks for itself better -
www.bitfury.org/chainminer.zip - you can all see the hardware definition there in the spictrl_hw subdirectory - there's a spictrl.vhd file. This chip (for the second Spartan deployment - not the RS-485 bus like in the racks) is an Altera CPLD installed on every board, with the boards connected in chains. You can see HOW SIMPLE IT IS... and also the benefits, like predetermined addressing - i.e. I can always tell from the frames which board is not working, etc., and easily track it down in the chain. Works like a charm.

BUT - you have to understand that Linux SPI can't do bitbanging well - so basically you're moving large frames in/out - 4 KB or 16 KB, I don't remember - better to look in the code. So if you're sending tiny messages to every chip, the addressing and dispatching mechanism should not depend on timing, and should work relative to the SPI bitstream that Linux sends. This is what I actually achieved there - the state machine of all the spictrl chips on the boards is driven exactly by the bitstream sent from Linux/SPI.

I see this as superior to RS-485 automated address assignment (or USB automated addressing) when something doesn't work and you cannot immediately tell where the trouble is. Manual address assignment is a big pain for bulk installations (i.e. with jumpers or whatever). I like it when physical position determines the address (as in this case).
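A toy sketch of the position-determines-address idea, with a made-up slot layout (the real frame format lives in spictrl.vhd and the chainminer sources, not here): each board's CPLD owns the slot at its chain position, so a dead board is identified just by which slot comes back bad.

```python
# Hypothetical illustration of position-determined addressing in a chained
# SPI frame. SLOT_SIZE and the layout are assumptions, not the real format.

SLOT_SIZE = 64  # bytes per board slot (assumed value)

def build_chain_frame(payloads):
    """One fixed-size slot per board; list index == physical chain position."""
    frame = bytearray()
    for payload in payloads:
        frame += payload.ljust(SLOT_SIZE, b"\x00")
    return bytes(frame)

def split_chain_frame(frame):
    """On the way back, slot i still belongs to board i, so a failure shows
    up at a known physical position without any address-assignment protocol."""
    return [frame[i:i + SLOT_SIZE] for i in range(0, len(frame), SLOT_SIZE)]

work = [b"job-for-board-0", b"job-for-board-1", b"job-for-board-2"]
replies = split_chain_frame(build_chain_frame(work))  # slot i <-> board i
```

Because the whole frame is clocked through the chain in one transfer, Linux only ever performs a few large SPI transactions instead of per-chip bitbanging.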
2 kano - I think different ASIC vendors will make different protocols, because they look at the problem differently... and from the FPGA era, I think nobody actually tried to solve the complexity of building rigs. Look at what's happening now with Avalon - there are difficulties building them :-) it doesn't go fast...

What I want (and why I mentioned a passive solution) is to be able to order, say, 10,000 boards in Taiwan, have them manufactured on an automated production line, and then assembled into devices with minimal effort and minimal labor. Of course, with added labor, heatsinks and such can be assembled - but that won't scale well to 500-kilowatt installations. So this is probably where DIY-oriented people will have an edge, because they can spend their time tuning equipment; at large scale such tuning would produce significant delays, as big farm maintainers are unlikely to have as much incentive as a DIYer would.

But the main problem I see is that, as the years pass, the paradigm should shift a bit: mining should produce about 20% ROI per year, not the current expectation of returning the money ASAP or within 6 months. That would produce a situation where an attack against a network protected by proof-of-work is prevented not by, say, mask-set/NRE cost (which is where we are now - a significant share of NRE investment and low demand for wafers), but by the overall chain cost - including wafers AND the time to produce (i.e. making it unfeasible even to start another miner project). But that is not today, of course - first we should scale and "land" on the 28 or 22nm tech node. It would also be very nice if such a device could be bought with guarantees in ordinary retail electronics shops, the way you can buy an AMD GPU, and not through some kind of "preorder" batch which is difficult to fulfill at this stage.
2 2112 - I don't understand why a 100 ms period is too short? Pipelining 20 items in the work queue is not good, I think - all you need the pipeline for is to compensate for communication lag, i.e. to make sure the hashing core is always busy. Tolerance to whole seconds of delay is overkill, I think. An ARM CPU is perfectly capable of handling 50-100 ms latencies, just not microsecond-scale latencies. I won't laugh, and I don't fear disclosing this to the competition. There are really tons of little details in this project that put it far ahead of anything I have seen so far :-) Anyway, these will become known when devices are delivered. This communication mechanism is just one nice finding, tested in real equipment, that I liked. I am not against others replicating this approach instead of doing more expensive and complicated solutions.
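A back-of-the-envelope version of that argument (the per-chip hashrate is an assumption for illustration; the 2^32 nonce space per work item is fixed by the protocol):

```python
# Queue depth needed so the core never starves during the communication lag.
import math

NONCE_SPACE = 2 ** 32  # nonces per work item

def min_queue_depth(hashrate_hs, lag_s):
    """Items that must be pending to cover `lag_s` of round-trip delay."""
    item_duration = NONCE_SPACE / hashrate_hs    # seconds per work item
    return math.ceil(lag_s / item_duration) + 1  # +1 for the item in flight

# E.g. a 2 GH/s chip: one item lasts ~2.15 s, so a 100 ms lag needs depth 2.
# A 20-deep queue at this rate buffers ~43 s of work - the "tolerance to
# seconds of delay" called overkill above.
print(min_queue_depth(2e9, 0.1))  # 2
```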
2 mrb - yes, it is me; the account is not hacked. I still control the web site, and have posted some source code. If it is interesting, I may disclose the bitstreams for the FPGAs - I don't consider them secret anymore. But I really don't have time to polish the sources. So if someone is interested in maintaining a github or opencores version-controlled repository, I would be glad to upload. At this point I don't consider that it will impact my full-custom design in any way - there are many more fascinating new inventions that go beyond these bitstreams, though the fundamentals of the rolled hasher design still have their roots there.
The tech node is scaled 65nm - drawn as 65nm but scaled down to 55nm _optically_. So to get a 3.8x3.8 mm die I draw a 4.2x4.2 mm sealring+padframe in the GDS; it is then scaled at the foundry. I suppose others doing 65nm do the same, as this is a trivial step and it is still labeled 65nm.

About the power estimates - tytus actually took my early estimates from when I had no core drawn for 65nm, just the core from the 150nm process I had worked with before, scaled down "optically". The main cause of the power increase was that I had used dynamic nodes (see http://en.wikipedia.org/wiki/Flip-flop_%28electronics%29 - the TSPC flip-flop; http://en.wikipedia.org/wiki/File:TSPC_FF_R.png is an example - a single clock, holding the bit as charge on a transistor gate), but in 65nm I later found I would have to raise the frequency to about 2 GHz to make that work, because of the high leakage. I could of course use transistors with a different threshold voltage (they are available), but that kills performance. So finally I dropped the dynamic nodes and switched to a classic master-slave latch flip-flop. That in turn increased power consumption roughly 1.5x and core size by about 20%, so efficiency dropped. But I think this sacrifice is OK, because it dramatically decreases the chances that the device won't work - I have really good margins everywhere (those margins also tell you that, if several tapeouts were affordable, I could maybe double the performance - but that's too much time; most likely that will be done in a further evolution). There also exist unusual logic styles (resonant logic) that, at the cost of area, could give dramatic power savings - but that is essentially analog design, so the chances of getting a working core with a single tapeout are low.
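Just to spell out the arithmetic behind "efficiency dropped", using only the two figures quoted above:

```python
# Relative efficiency after swapping dynamic (TSPC) nodes for master-slave
# flip-flops, from the ~1.5x power and ~1.2x area figures - nothing more.
power_factor = 1.5  # power at the same hashrate
area_factor = 1.2   # core size

hash_per_joule = 1 / power_factor  # ~0.67x -> ~33% worse energy efficiency
hash_per_mm2 = 1 / area_factor     # ~0.83x -> ~17% fewer hashes per wafer area
print(f"GH/J: {hash_per_joule:.2f}x, GH/mm^2: {hash_per_mm2:.2f}x")
```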
But anyway - that means only that I work with tytus, trust him, and know him personally. It means this project will likely get working chips. But I will say clearly that I haven't worked with Dave and cannot say how smoothly the installation will go on his side. So if you're going to invest a significant amount there, I think it is best to check with Dave personally and clarify this yourself. If tytus trusts Dave, that is his decision, not mine.

For me, this 100 Th/s mine is a bulk order that of course gets good equipment pricing. I also see it as a good example of how mining should be done - I expect an expansion of more professional mine installations in the future, in special places with good electricity prices, or in homes where the heat is used for heating... So it would again be distributed, but at larger scales... Don't assume that in 2014 a 100 Th/s mine will be something very rare and unique. It makes no sense to operate equipment where electricity is expensive and the environment is hot. Those who have cheaper electricity will have an edge over time, and a significant edge.

Please remember what Satoshi said - that over an indefinite interval, mining would be barely more profitable than the electricity spent on it. That is what all of you should remember: the higher revenues you get now are temporary in nature - maybe years... but not forever :-) This means that efforts to design fancy gamer-style end-user devices make sense only to pump money and for fun. When you imagine colocating massive BFL minirigs plus their maintenance, you'll understand why I am saying this - the product looks glossy, but costs more to maintain than a rack or a passive heater. So if you compare solutions today and look for the best opportunity, it is of course right to hunt for the best tech; but I think by the next block halving you'll be comparing electricity prices in different locations, and not only the performance of equipment suppliers but also their ability to scale, maintenance, etc. Maybe by that time I could even disclose the design, since such a chip and design methodology would no longer be cutting edge but well understood, and the tech itself would contain no magic for anyone. Progress is happening really quickly :-)
PS. What I am also thinking about for future versions of chips - I can implement hardware chip protection/encryption. The chip would require a remote activation code. This could let equipment owners colocate their hardware without problems, for example - if the activation code is not presented, the chips won't compute. Before delivery to your colocation site, the chips can be fused with your key, so they verify a challenge that comes from you. Bitcoin itself allows tracking of nTime, which can be used for timed activation (i.e. solve blocks only while nTime <= a limit supplied by you in a signed activation message to the chip). This way, fully transparent, zero-trust mining markets could also be built, where third parties that do not own the equipment maintain security in the datacenters. This is just an idea - it is neither easy nor fast to implement, so it won't be in the first chips; but if there is demand for it, I will likely implement it in the next generations. Basically, in a few words: the chip knows its owner.
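A minimal sketch of what such an activation handshake could look like, assuming an HMAC over a fused symmetric key and the nTime deadline (all names and the message format here are made up for illustration; nothing like this exists in the current chips):

```python
# Hypothetical activation check for a chip fused with its owner's key.
# The owner signs (challenge, ntime_limit); the chip mines only while the
# block header's nTime stays under the signed limit.
import hmac, hashlib, os, struct

FUSED_KEY = os.urandom(32)  # burned into the chip at provisioning (assumed)

def owner_sign(key, challenge, ntime_limit):
    """Owner's time-bound activation grant over the chip's fresh challenge."""
    msg = challenge + struct.pack("<I", ntime_limit)
    return hmac.new(key, msg, hashlib.sha256).digest()

def chip_may_hash(challenge, ntime_limit, tag, header_ntime):
    """Chip-side check: valid grant AND block header still inside the window."""
    expected = owner_sign(FUSED_KEY, challenge, ntime_limit)
    return hmac.compare_digest(tag, expected) and header_ntime <= ntime_limit

# Chip issues a fresh challenge; owner answers with a signed, time-bound grant.
challenge = os.urandom(16)
tag = owner_sign(FUSED_KEY, challenge, ntime_limit=1700000000)
print(chip_may_hash(challenge, 1700000000, tag, header_ntime=1699999999))  # True
print(chip_may_hash(challenge, 1700000000, tag, header_ntime=1700000001))  # False
```

In a real chip this would presumably use an asymmetric scheme (a fused public key verifying the owner's signature), so the colocation operator never holds anything capable of forging an activation.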