Hey,
Thanks for the video, really informative!
In the video I use an Nvidia H100 and it hovers around 3500 Mk/s.
Is that card really that slow? For 35k USD I expected much more.
Without the proper mathematics, all the compute power in the world is just wasted energy, so if you're gonna shell out the money to run clusters, make sure your math is on point =D
Also, one other thing not mentioned often: when you run large clusters, you need to make sure no mistakes can happen, so you either underclock/undervolt the chips/GPUs/FPGAs/whatever, or repeat chunks across machines, etc. Imagine running a whole floor of machines for a few months and one machine craps out on the chunk with the winning key ;(
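To make the "repeat chunks" idea concrete, here's a toy Python sketch of running every chunk on two independent workers and only trusting the result when both copies agree. Everything here (`scan_chunk`, `REPLICATION`, the digest) is invented for illustration, not from any real scanner:

```python
# Hypothetical sketch: double-issue every keyspace chunk and cross-check.
import random

REPLICATION = 2  # run every chunk on this many independent workers

def scan_chunk(start, end, flake_rate=0.0):
    """Stand-in for the real GPU scan of [start, end).
    Returns a digest of whatever the worker found; a flaky card
    occasionally corrupts it, which is exactly what we want to catch."""
    digest = hash((start, end))  # the "correct" answer in this toy model
    if random.random() < flake_rate:
        digest ^= 1  # silent bit-flip from an unstable overclock
    return digest

def scan_with_redundancy(start, end):
    """Issue the same chunk twice; accept only when both copies agree."""
    results = [scan_chunk(start, end, flake_rate=0.001)
               for _ in range(REPLICATION)]
    if len(set(results)) == 1:
        return results[0]          # all copies agree: trust the chunk
    raise RuntimeError(f"chunk {start:#x}..{end:#x} mismatch, re-queue it")
```

The trade-off is obvious: you burn 2x the electricity per chunk, which is why people lean toward underclocking for stability instead and only spot-check.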
You're welcome =D I hope it can help get more people to test the integrity of the blockchain & bring a bit of awareness to how it functions.
I think there are a number of huge optimizations that could be made to increase the Mk/s on the H100s, but it would require some changes to the .cu files & headers & whatnot, which is over my head haha. I'm definitely gonna play around anyway. All of that bandwidth & memory is not being leveraged properly, most dolphinately.
So many questions came to mind when you brought up failures while running clusters. Would you be able to use the save-work function to recover? I wonder what types of redundancies could be put in place, like the underclocking/undervolting you mentioned. From what I was just reading, when they trained the BLOOM 176B LLM they had some failure issues too. I was reading their paper on the model, so I'll copy-paste what they were saying:
"During training, we faced issues with hardware failures: on average, 1–2 GPU failures
occurred each week. As backup nodes were available and automatically used, and checkpoints were saved every three hours, this did not affect training throughput significantly."
They were using 384 NVIDIA A100 80GB GPUs (48 nodes) with 32 spare GPUs for about three and a half months! Trained on nuclear energy, which is awesome too.
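On the save-work question: my reading of the BLOOM numbers is that the three-hour checkpoint interval is what bounds the damage, with 1–2 failures a week, each one costs at most ~3 hours of progress. The same idea should apply to a key scan: periodically checkpoint the last fully finished position. A minimal sketch, assuming a made-up save file format (`savework.json` and these function names are not from any real tool):

```python
# Hypothetical save-work sketch: checkpoint the scan frontier so a crash
# only costs the chunks finished since the last save.
import json, os, tempfile

SAVE_FILE = "savework.json"  # invented name for this example

def save_progress(next_key, found):
    """Atomically write the next unscanned key plus any hits so far."""
    fd, tmp = tempfile.mkstemp(dir=".")
    with os.fdopen(fd, "w") as f:
        json.dump({"next_key": next_key, "found": found}, f)
    os.replace(tmp, SAVE_FILE)  # atomic rename: no half-written saves

def load_progress(start_key):
    """Resume from the checkpoint if one exists, else from the start."""
    if os.path.exists(SAVE_FILE):
        with open(SAVE_FILE) as f:
            state = json.load(f)
        return state["next_key"], state["found"]
    return start_key, []
```

The atomic rename matters: if the box dies mid-save, you still have the previous checkpoint instead of a corrupt file.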
Really crazy what they did with AI and those top-end video cards! Still, not much has changed in the last 15 years: the ATI 5970 had 3200 cores at 800 MHz, and today's top card has only 5-6 times that, with double the frequency, at 20x the cost. OK, it has a crazy amount of RAM and bandwidth, which I'm sure is important for harvesting people's private data; too bad it's not much use for us. I honestly believed, some 20 years ago when trying to brute-force a TEA key (only 8 bytes), that in the future we'd be able to flip 64 bits in a second with ease.
The biggest cluster I have personally run was only 20 video cards (10 machines); a friend ran something like 1500 cards, and that was a nightmare to maintain.
As for scaling up, the only option is to have some kind of server/arbiter/job manager, written in a higher-level language, that splits the keyspace into very small chunks, serves them to nodes for crunching, and then verifies the results rather than trusting them. Hardware tricks alone are simply not enough if you'll be drawing MW from the power grid and paying people to maintain your racks.
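For what it's worth, here's roughly what I mean by an arbiter, as a toy Python sketch (all the names, `KeyspaceManager`, `lease`, `submit`, are invented for illustration): split the range into small chunks, lease each chunk to a node with a deadline, and re-queue anything that times out or fails verification:

```python
# Toy job manager / arbiter: lease small keyspace chunks to worker nodes,
# reclaim anything that times out, and record only verified results.
import time
from collections import deque

class KeyspaceManager:
    def __init__(self, start, end, chunk_size, lease_seconds=300):
        self.pending = deque(
            (k, min(k + chunk_size, end))
            for k in range(start, end, chunk_size)
        )
        self.leased = {}            # chunk -> (node_id, deadline)
        self.done = set()
        self.lease_seconds = lease_seconds

    def lease(self, node_id):
        """Hand the next chunk to a node, reclaiming expired leases first."""
        now = time.time()
        for chunk, (_, deadline) in list(self.leased.items()):
            if now > deadline:      # node died or stalled: take the chunk back
                del self.leased[chunk]
                self.pending.appendleft(chunk)
        if not self.pending:
            return None
        chunk = self.pending.popleft()
        self.leased[chunk] = (node_id, now + self.lease_seconds)
        return chunk

    def submit(self, node_id, chunk, result, verify):
        """Accept a result only if it passes verification; else re-queue."""
        self.leased.pop(chunk, None)
        if verify(chunk, result):
            self.done.add(chunk)
        else:                       # never trust a node blindly
            self.pending.append(chunk)

# Usage idea:
#   mgr = KeyspaceManager(0x1, 0x1000000, chunk_size=0x10000)
#   chunk = mgr.lease("node-07")
#   ... node crunches chunk ...
#   mgr.submit("node-07", chunk, result, verify=my_check)
```

Small chunks are the point: a dead node costs you one tiny chunk instead of a week of work, and the lease deadline means the manager notices without any heartbeat protocol.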