I'll try tomorrow with an H100 (if one is free) just to see the performance.
At my job we only have GPUs dedicated to scientific computing, which may be less well suited to integer arithmetic than a 4090.
In any case, it will require a large number of boards and a considerable amount of time to get the BTC.

I tested an H100 SXM card and got 13,600 MKey/s with a different Kangaroo program. I am curious what kind of speeds you will achieve.
I never mention those because they are extremely expensive to buy and expensive to rent on Vast.ai.
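
For a rough sense of scale, here is a minimal back-of-the-envelope sketch in Python. It assumes the usual heuristic of about 2*sqrt(N) group operations for Pollard's kangaroo on an interval of width N, and it assumes that the reported MKey/s figure counts those operations. The 130-bit interval and the board counts are illustrative values only, not taken from the posts above, and the function name is made up for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def expected_kangaroo_seconds(range_bits, keys_per_sec, num_gpus=1):
    """Estimate the expected run time of a kangaroo search, in seconds.

    Assumption: about 2 * sqrt(2**range_bits) group operations are needed
    on average, and the total rate scales linearly with the number of GPUs.
    """
    expected_ops = 2.0 ** (range_bits / 2 + 1)  # 2 * sqrt(2**range_bits)
    return expected_ops / (keys_per_sec * num_gpus)


if __name__ == "__main__":
    h100_rate = 13_600e6  # 13,600 MKey/s, the figure quoted for one H100 SXM
    range_bits = 130      # hypothetical interval width, for illustration only
    for gpus in (1, 100, 10_000):
        seconds = expected_kangaroo_seconds(range_bits, h100_rate, gpus)
        print(f"{gpus:>6} GPU(s): about {seconds / SECONDS_PER_YEAR:.3g} years")
```

The estimate ignores distinguished-point overhead and any speedups specific to a given Kangaroo program, so treat the output as an order-of-magnitude guide rather than a prediction.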