Hi,
After all, the miner is working: I've succeeded in finding blocks on both testnet and mainnet, but it takes some time. The trick is that the throughput parameter mentioned above has to be lowered, or the number of threads (which was hard-coded to 4) must be reduced. Currently, with a single 750 Ti, I get about 16 MH/s, which worked out to roughly 1 block per hour over a 3-4 hour run.
In case someone else wants to give it a try, here is an updated binary in which the number of threads can be specified as a parameter after the port number.
Actually, that is a bug: the thread count shouldn't be a free parameter (at least not yet, and not like that). It should be "num_processors", i.e., in this context, the number of cards (I put it in the printf but forgot to put it in the routine). Passing a number smaller than the actual number of cards will run the miner on fewer cards.
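The fix described in the reply amounts to sizing the worker pool from the number of cards instead of a hard-coded 4. Below is a minimal Python sketch, assuming one host thread per GPU and a "<port> [threads]" command line; `parse_args`, `worker_devices` and the port value 8332 are hypothetical names for illustration, not the miner's actual code. Clamping a too-large count is also an assumption of this sketch.

```python
from typing import List, Optional, Tuple


def parse_args(argv: List[str]) -> Tuple[int, Optional[int]]:
    """Parse a hypothetical '<port> [threads]' command line (program name excluded)."""
    port = int(argv[0])
    # The optional thread count comes right after the port number.
    threads = int(argv[1]) if len(argv) > 1 else None
    return port, threads


def worker_devices(num_processors: int, requested: Optional[int] = None) -> List[int]:
    """Return the GPU index each host worker thread should drive.

    By default there is one worker per card (num_processors workers).
    A smaller requested count leaves the remaining cards idle; a larger one is
    clamped (assumption), because an extra worker would have no card of its own.
    """
    if requested is None:
        count = num_processors
    else:
        count = max(0, min(requested, num_processors))
    return list(range(count))


if __name__ == "__main__":
    # Example: a single 750 Ti installed, user passed "8332 4" on the command line.
    port, threads = parse_args(["8332", "4"])
    print(port, worker_devices(1, threads))  # prints: 8332 [0]
```

With one card, asking for 4 threads still yields a single worker on device 0, which matches the intent of tying the count to "num_processors".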