This is a benchmark post with speeds for the Kangaroo GPU solver. All tests were run with the default DP and the default grid size (as calculated by the program). I expect that tuning DP and grid size could raise or lower the speed. If anybody knows the optimal values, please let us know.
Card Model          Grid size        DP    Tested speed
---------------------------------------------------------------
GTX 1050 Ti         Grid(12x256)     16    115 MKey/sec
GTX 1080 Ti         Grid(56x256)     15    500 MKey/sec
Tesla T4 16Gb       Grid(80x128)     14    565 MKey/sec
RTX 2080ti 11Gb     Grid(136x128)    13    1225 MKey/sec
Tesla V100 32Gb     Grid(160x128)    13    1420 MKey/sec
---------------------------------------------------------------
It looks like the Kangaroo solver gives good value on GeForce cards but not on the data-center Tesla cards (the V100 and T4 are Tesla, not Quadro/NVS). For example, the Tesla V100 is 4-5 times more expensive than the RTX 2080ti, but its speed is only about 16% higher (1420 vs 1225 MKey/sec).
The table also suggests that, of the cards tested, the RTX 2080ti 11Gb is the most efficient card for Kangaroo, with the best speed/cost ratio.
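For anyone who wants to redo the comparison with their own prices, here is a rough Python sketch. The speeds are copied from the table. The relative prices are only an assumption based on the "4-5 times more expensive" remark (RTX 2080ti = 1.0, Tesla V100 = 4.5 as a midpoint), not real price data, and the function names are made up for this example.

```python
# Speeds in MKey/sec, copied from the benchmark table in this post.
speeds = {
    "GTX 1050 Ti": 115,
    "GTX 1080 Ti": 500,
    "Tesla T4 16Gb": 565,
    "RTX 2080ti 11Gb": 1225,
    "Tesla V100 32Gb": 1420,
}

# ASSUMPTION: relative prices, not real quotes. The RTX 2080ti is the
# baseline (1.0) and the V100 is taken as 4.5x, the midpoint of the
# "4-5 times more expensive" estimate. Replace with actual prices.
relative_price = {
    "RTX 2080ti 11Gb": 1.0,
    "Tesla V100 32Gb": 4.5,
}


def speedup_percent(fast_card, slow_card):
    """How much faster fast_card is than slow_card, in percent."""
    return (speeds[fast_card] / speeds[slow_card] - 1.0) * 100.0


def speed_per_cost(card):
    """MKey/sec per unit of relative price (higher is better)."""
    return speeds[card] / relative_price[card]


if __name__ == "__main__":
    diff = speedup_percent("Tesla V100 32Gb", "RTX 2080ti 11Gb")
    print(f"V100 vs 2080ti speed difference: {diff:.1f}%")
    for card in relative_price:
        print(f"{card}: {speed_per_cost(card):.0f} MKey/sec per price unit")
```

To include the other cards in the speed/cost ranking, add their prices to relative_price in the same units.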