Yes, I'm using BitCrack. RX 580: 89 MKey/s.
>cuBitCrack-0.30 -d 0 -t 512 -p 1024 1CABDYTie48wXV93XJ4Bdk7MFSTyshTXxg
[2019-08-20.06:44:00] [Info] Compression: compressed
[2019-08-20.06:44:00] [Info] Generating 16,777,216 starting points (640.0MB)
GeForce GTX 980 2059 / 8192MB | 1 target 153.59 MKey/s (1,694,498,816 total) [00:00:09]
>clBitCrack-0.30 -d 0 -t 512 -p 1024 1CABDYTie48wXV93XJ4Bdk7MFSTyshTXxg
[2019-08-20.05:50:09] [Info] Compression: compressed
[2019-08-20.05:50:11] [Info] Generating 16,777,216 starting points (640.0MB)
GeForce GTX 980 1024 / 8192MB | 1 target 67.72 MKey/s (671,088,640 total) [00:00:07]
>oclvanitygen64.exe -v -D 0:0 -F compressed 12345678
Device: GeForce GTX 980
Grid size: 4096x4096
[59.23 Mkey/s][total 402653184][Prob 0.0%][50% in 2.9h]
Yeah, it's sad.
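To put the three runs above side by side, normalize the rates against the OpenCL build of BitCrack. The Python sketch below only does arithmetic on the numbers printed in the logs. Two things in it are assumptions, not something the logs state: that BitCrack's "MB" means MiB (2^20 bytes), and that the 16,777,216 starting points equal blocks × threads (`-t`) × points per thread (`-p`).

```python
# Rates (MKey/s) copied from the three runs above, all on the same GTX 980.
rates = {
    "cuBitCrack (CUDA)": 153.59,
    "clBitCrack (OpenCL)": 67.72,
    "oclvanitygen (OpenCL)": 59.23,
}

# Use the OpenCL build of BitCrack as the baseline for comparison.
baseline = rates["clBitCrack (OpenCL)"]
speedups = {name: rate / baseline for name, rate in rates.items()}
for name, ratio in speedups.items():
    print(f"{name:22s} {rates[name]:7.2f} MKey/s  x{ratio:.2f}")

# Starting-point buffer from the log: 640.0MB for 16,777,216 points.
# Assumption: "MB" here means MiB (2**20 bytes).
points = 16_777_216
buffer_bytes = 640 * 2**20
bytes_per_point = buffer_bytes // points

# Assumption: points = blocks * threads (-t 512) * points per thread (-p 1024).
threads, points_per_thread = 512, 1024
implied_blocks = points // (threads * points_per_thread)

print(f"{bytes_per_point} bytes per point, {implied_blocks} blocks implied")
```

On these numbers the CUDA build runs at roughly 2.27x the OpenCL build on the same card, and the MiB reading gives a whole 40 bytes per starting point, which is why it looks like the more plausible interpretation.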

On compiler optimization alone, CUDA has pulled far ahead.
See, for example:
https://arxiv.org/ftp/arxiv/papers/1005/1005.2581.pdf
On the other hand... there's the new John the Ripper 1.9 release:
https://www.openwall.com/lists/announce/2019/05/14/1
The new version, 1.9.0-jumbo-1, came out more than four years after the previous one, 1.8.0-jumbo-1.
In this version of John the Ripper, the developers dropped CUDA because interest in it had declined, and focused on the more portable OpenCL framework, which works great on NVIDIA graphics cards.
So is OpenCL trash, or does Brichard19 just need more practice? Unclear.