I'll try tomorrow with an H100 (if one is free), just to see the performance.
At my job we only have GPUs dedicated to scientific computing, which may be less suited to integer arithmetic than a 4090.
In any case, it will require a large number of boards and a considerable amount of time to get the BTC.
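
To put rough numbers on "a large number of boards and a considerable amount of time", here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name `days_to_finish`, the total amount of work, and the per-GPU rate are placeholders, not measurements from an H100 or a 4090. Plug in the real rate once it has been benchmarked.

```python
SECONDS_PER_DAY = 86_400


def days_to_finish(total_work, rate_per_gpu, gpu_count):
    """Days needed to process `total_work` candidates once.

    total_work    -- number of candidates to test (hypothetical)
    rate_per_gpu  -- candidates tested per second by one GPU (hypothetical)
    gpu_count     -- number of boards running in parallel, assuming perfect scaling
    """
    total_rate = rate_per_gpu * gpu_count
    return total_work / total_rate / SECONDS_PER_DAY


# Placeholder values only: replace with the real search size and a measured rate.
example_work = 2 ** 70
example_rate = 1e9

for gpus in (1, 100, 10_000):
    days = days_to_finish(example_work, example_rate, gpus)
    print(f"{gpus:>6} GPUs: {days:,.0f} days")
```

The estimate scales linearly: doubling the number of boards halves the time, so the benchmark only needs to give the per-board rate to see how many boards would be required.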
