So are there any providers with Blackwell GPUs yet?
NB regarding RCKangaroo: it runs 1.5x slower than my kernel. It's obviously not optimized beyond tuning its use of the fast caches to minimize memory latency. Inversion is the performance killer there, since it uses the old SafeGCD. You'll get better speeds and faster solve times by running a normal 3-kang algorithm (with no cycle headaches) with low-level optimizations instead. Why? Because the higher speed of the simpler code compensates for its slightly higher algorithmic complexity constant (which is already around 1.4 - 1.5 in the secp256k1-specific case, not 1.7, and definitely not 2.08 as in the 2-kang case).
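To put rough numbers on that trade-off, here is a minimal sketch. It assumes the usual model where the expected work is K * sqrt(N) group operations for a range of size N, so expected time is that divided by the kernel's speed. The function names are hypothetical, and no K for the slower solver is filled in, since none is quoted above; plug in your own.

```python
import math


def expected_solve_seconds(k_const: float, range_bits: int, ops_per_sec: float) -> float:
    """Expected solve time under the assumed model: K * sqrt(2^range_bits) / speed."""
    return k_const * math.sqrt(2.0 ** range_bits) / ops_per_sec


def break_even_k(k_simple: float, speedup: float) -> float:
    """K that a solver running `speedup` times slower would need
    just to match a simpler solver whose constant is `k_simple`."""
    return k_simple / speedup


if __name__ == "__main__":
    speedup = 1.5  # "1.5x slower" figure from the post
    for k_simple in (1.4, 1.5):  # 3-kang range quoted in the post
        k_needed = break_even_k(k_simple, speedup)
        print(f"3-kang K={k_simple}: slower solver needs K < {k_needed:.3f} to win")
```

So under this model, with a 1.5x speed advantage and K around 1.4 - 1.5, the slower solver only comes out ahead if its own K is below roughly 1.0. It also ignores DP overhead and memory effects, so treat it as a back-of-envelope check only.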