1060 6GB card, 25 seconds total time, start to finish.
Start-to-finish tests are not a good way to compare two programs on a simple problem. My program spends time setting up a good grid so that it can solve the harder problems.
If I remove the PTX and support for cards other than compute capability 6.1, I gain 10 seconds or more.
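
To show what I mean about setup time versus solve time, here is a rough sketch of timing the two phases separately instead of only start to finish. This is not code from my program: `PhaseTimes`, `time_phases` and `setup_fraction` are made-up names for illustration, and the `setup` and `solve` callables stand in for whatever each program does in its one-time preparation and in its actual search.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical helper type: wall-clock seconds spent in each phase.
struct PhaseTimes {
    double setup_s;  // one-time cost (grid setup, module load, ...)
    double solve_s;  // time spent on the actual problem
};

// Runs setup() once, then solve(state), and times each phase on its own.
// Both callables are placeholders for the real program's phases.
template <typename Setup, typename Solve>
PhaseTimes time_phases(Setup setup, Solve solve) {
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    auto state = setup();
    const auto t1 = clock::now();
    solve(state);
    const auto t2 = clock::now();
    return { std::chrono::duration<double>(t1 - t0).count(),
             std::chrono::duration<double>(t2 - t1).count() };
}

// Fraction of the total run spent in setup (0 if nothing was measured).
inline double setup_fraction(const PhaseTimes& t) {
    const double total = t.setup_s + t.solve_s;
    return total > 0.0 ? t.setup_s / total : 0.0;
}
```

On a simple problem the setup fraction is large, so the start-to-finish number mostly measures setup. On a harder problem the solve time dominates, and that is the number worth comparing.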