I then ran into another weird problem when compiling the kernels.
For the record, here's the error:
Write buffer vPrimes, 6302644 bytes. Status: 0
Compiling kernel... this could take up to 2 minutes.
ptxas error : Entry function 'CalculateMultipliers' uses too much shared data (0x5078 bytes + 0x10 bytes system, 0x4000 max)
What GPU? It seems it only has 16 kilobytes of local memory, whereas I've programmed the miner with the assumption of 32 kilobytes, which is what ~all AMD GPUs have.
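For anyone decoding the hex in that ptxas message, here is a quick sketch of the arithmetic. The three sizes are copied from the error above; the helper name `shared_memory_overrun` is just made up for illustration and is not part of the miner.

```python
# Sizes copied from the ptxas error message.
KERNEL_SHARED_BYTES = 0x5078  # shared data used by CalculateMultipliers
SYSTEM_SHARED_BYTES = 0x10    # system overhead reported by ptxas
DEVICE_LIMIT_BYTES = 0x4000   # maximum reported by ptxas for this GPU


def shared_memory_overrun(kernel_bytes, system_bytes, limit_bytes):
    """Return how many bytes the kernel exceeds the limit by (<= 0 means it fits)."""
    return kernel_bytes + system_bytes - limit_bytes


total = KERNEL_SHARED_BYTES + SYSTEM_SHARED_BYTES
overrun = shared_memory_overrun(KERNEL_SHARED_BYTES, SYSTEM_SHARED_BYTES, DEVICE_LIMIT_BYTES)
print(f"kernel needs {total} bytes, device limit is {DEVICE_LIMIT_BYTES} bytes")
print(f"over the 16 KB limit by {overrun} bytes")

# The same kernel against the 32 KB the miner was written for.
fits_32k = shared_memory_overrun(KERNEL_SHARED_BYTES, SYSTEM_SHARED_BYTES, 32 * 1024) <= 0
print("fits in 32 KB:", fits_32k)
```

So the kernel wants about 20 KB, which fits comfortably in 32 KB but not in the 16 KB this card offers.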
It's an NVIDIA Corporation GT215 [GeForce GT 240]. It's a few years old, so it might not be the best choice. It just happens to be the only one I can easily test on.
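It seems that Nvidia cards with a "compute capability version" < 2.0 have only 16 KB of local memory (what CUDA calls shared memory), while cards at 2.0 and above have 48 KB. The 512 KB figure in the table is CUDA's per-thread "local memory", which is a different thing. See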
http://en.wikipedia.org/wiki/CUDA#Supported_GPUs for a list which GPU has which compute capability version.