I've been googling for ages and have read the "README" file at least five times, but nowhere could I find what the intensity setting K28x8 or similar actually means.
I'm using the argument -l 28x8, but I might as well be burning my GPU right now, because I have no idea what any of those numbers mean and it is explained nowhere.
Could someone please explain how those numbers work so that I can use them properly on my GTX 660?
Thank you!
It would be a bit like explaining to a passenger how to land a plane. Wouldn't it be easier if I just showed him how to push the auto-land button? (yes, newer models have that feature).
To understand the terminology of launch configurations like -l K28x8, you would have to understand the CUDA programming model: what a launch grid is, what a thread block is, and how a block consists of warps that are independently scheduled on your Kepler multiprocessor's four warp schedulers. You would also have to understand which parameters make sense on your particular GPU architecture to achieve high occupancy, and know the limits imposed by the shared memory and registers that a given kernel uses.
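For a rough idea of how the pieces of such a string fit together, here is a minimal Python sketch. It assumes the usual reading of the string as an optional kernel letter, then the number of thread blocks, then the number of warps per block; that reading is an assumption here, not something taken from the README. The helper name parse_launch_config is made up for illustration. The only hard fact used is that a warp on NVIDIA GPUs is 32 threads.

```python
import re

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def parse_launch_config(text):
    """Hypothetical helper: split a string such as 'K28x8' into its parts.

    Assumed layout: optional kernel letter, blocks, 'x', warps per block.
    """
    match = re.fullmatch(r"([A-Za-z]?)(\d+)x(\d+)", text)
    if match is None:
        raise ValueError(f"not a launch configuration: {text!r}")
    kernel = match.group(1)
    blocks = int(match.group(2))
    warps = int(match.group(3))
    threads_per_block = warps * WARP_SIZE
    return {
        "kernel": kernel,
        "blocks": blocks,
        "warps_per_block": warps,
        "threads_per_block": threads_per_block,
        "total_threads": blocks * threads_per_block,
    }


print(parse_launch_config("K28x8"))
```

Under that reading, K28x8 would launch 28 blocks of 8 warps each, which is 256 threads per block. Whether that is a good fit for a particular card is exactly what the occupancy, shared memory, and register limits decide.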
Try auto-tuning first. Pass either -l auto, or no -l argument at all.
If that doesn't find a satisfactory configuration, we can talk about blocks and warps and the memory requirements.
The treatise linked in my follow-up post also has a bit of information.
Christian
Here's what I found as a noob: my card likes multiples of 160. It used to be 80x2, then 10x16, and now 5x32 is the best. So find your "magic number" by running autotune several times and looking at the first four-digit hash number it gives you (mine was 5120), then divide by 32 to get your magic number. Then experiment with multiples. This has always worked out best for me, and I have no idea why.
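The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb, not an explanation of why it works. The function names are made up, and treating the divisor 32 as the warp size and the two numbers as blocks and warps per block are both assumptions.

```python
WARP_SIZE = 32  # assumed to be where the "divide by 32" comes from


def magic_number(autotune_value, divisor=WARP_SIZE):
    """Hypothetical helper: the number reported by autotune, divided by 32."""
    return autotune_value // divisor


def candidate_configs(magic, multiple=1):
    """List every AxB pair whose product equals magic * multiple."""
    target = magic * multiple
    return [(a, target // a) for a in range(1, target + 1) if target % a == 0]


magic = magic_number(5120)  # 5120 is the value quoted in the post
print(magic)
for first, second in candidate_configs(magic):
    print(f"{first}x{second}")
```

With 5120 as input the magic number is 160, and the list includes 80x2, 10x16, and 5x32, the three settings mentioned in the post. Not every pair in the list will be usable on a given card, so treat the output as a set of candidates to try.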