Is there maybe some CUDA environment setting I'm missing or haven't set?
Do you have the same issue with 1.17 and CUDA 10.0?
However, I see a significant speed loss when searching for a large number of prefixes.
When searching, are you using only compressed, only uncompressed, or both?