Doesn't work.
I am testing on a GTX 970.

Maybe it is because the GTX 970 has a different number of SMs, but I hardcoded the BLOCKS and THREADS numbers: the silentarmy algorithm's work size is NR_ROWS, so in CUDA blocks = NR_ROWS / THREAD_PER_BLOCK. I don't have a 970, but I have a 980, so I will take a look tomorrow.
UPD. It seems you are running with the wrong options, because it is launching cuda_tromp_STUB.
The start options are buggy; it only works with:
./nheqminer_cuda_sa -t 0 -cv 0 -cd 0 -cs -l SERVER:PORT -u USERNAME
UPD2.
tpruvot added my port to his repo. I think he fixed the launch bugs.
https://github.com/tpruvot/nheqminer/tree/cuda-silentarmy