I then ran into another weird problem when compiling the kernels.
For the record, here's the error:
Write buffer vPrimes, 6302644 bytes. Status: 0
Compiling kernel... this could take up to 2 minutes.
ptxas error : Entry function 'CalculateMultipliers' uses too much shared data (0x5078 bytes + 0x10 bytes system, 0x4000 max)
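What GPU is this? It seems to have only 16 kilobytes of local memory (the 0x4000-byte max in the error), and the kernel asks for 0x5078 bytes, a bit over 20 kilobytes. I programmed the miner on the assumption of 32 kilobytes, which is what nearly all AMD GPUs have.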
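To make the numbers in that error concrete, here is a small Python sketch that decodes the hex values and compares them against both limits. The helper name `shared_memory_usage` and the regular expression are my own, written to match this one error line; other ptxas versions may word the message differently.

```python
import re

# Error line copied verbatim from the compiler output quoted above.
ERROR_LINE = (
    "ptxas error : Entry function 'CalculateMultipliers' uses too much "
    "shared data (0x5078 bytes + 0x10 bytes system, 0x4000 max)"
)

def shared_memory_usage(line):
    """Hypothetical helper: return (kernel, system, limit) in bytes."""
    match = re.search(
        r"\((0x[0-9a-fA-F]+) bytes \+ (0x[0-9a-fA-F]+) bytes system, "
        r"(0x[0-9a-fA-F]+) max\)",
        line,
    )
    if match is None:
        raise ValueError("line does not look like a ptxas shared-data error")
    # Each captured group is a hex literal such as "0x5078".
    return tuple(int(value, 16) for value in match.groups())

kernel_bytes, system_bytes, limit_bytes = shared_memory_usage(ERROR_LINE)
total_bytes = kernel_bytes + system_bytes
assumed_limit = 32 * 1024  # the 32 KB the miner was written for

print(f"kernel needs {total_bytes} bytes, device allows {limit_bytes}")
print(f"over the device limit by {total_bytes - limit_bytes} bytes")
print(f"fits in 32 KB: {total_bytes <= assumed_limit}")
```

So the kernel fits comfortably in 32 KB but not in 16 KB, which is consistent with the compile only failing on this card.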