I think I found the best method to calculate intensity and threads for GPUs mining cryptonight_v8. It mostly works for me with 3 different GPUs.
First, find the total number of "compute units" (or "ROPS") of the GPU you want to use.
Example: 28 compute units and 2 GB of GPU RAM.
Then subtract 2 for the GPU's internal work and the OS's needs: 28 - 2 = 26.
If you use all the compute units, the GPU doesn't have enough memory to work.
That leaves 26 compute units. Multiply by 2: 26 * 2 = 52 is the intensity for a GPU with 2 GB of RAM and 28 compute units, using 1 thread. With this configuration the GPU works with the maximum RAM it can use, ~1700 MB.
If you want to use 2 threads, don't multiply by 2. Each thread then uses ~850 MB of memory: intensity 26 and threads 2.
If the GPU has 4 GB, we can use intensity 52 (~2 GB per thread) with 2 threads (4 GB of RAM in total), or intensity 26 * 4 = 104 with 1 thread.
The minimum worksize is 8, and multiples of 8 work fine: 8, 16, 32, ... The larger the worksize, the longer it takes to find a result.
A 1 GB GPU is easy: 28 - 2 = 26 intensity with 1 thread (~850 MB of memory used), or 26 / 2 = 13 intensity (~425 MB per thread) with 2 threads, so 425 MB * 2 = 850 MB of memory used.
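The arithmetic above can be written as a small Python sketch. It only reproduces the rule of thumb from my examples (subtract 2 compute units, scale by GB of RAM, divide by threads). The function names and the ~850 MB per GB figure are my own generalization from the 28 compute unit card, not something taken from any miner, so treat the memory estimate as a rough guess.

```python
def suggest_intensity(compute_units, vram_gb, threads=1, reserved_units=2):
    """Rule-of-thumb intensity per thread, generalized from the examples above."""
    if compute_units <= reserved_units:
        raise ValueError("compute_units must be greater than reserved_units")
    if vram_gb < 1 or threads < 1:
        raise ValueError("vram_gb and threads must be at least 1")
    usable_units = compute_units - reserved_units  # e.g. 28 - 2 = 26
    # 1 thread on N GB -> usable_units * N; more threads split that evenly.
    return usable_units * vram_gb // threads


def estimate_mb_per_thread(vram_gb, threads=1, mb_per_gb=850):
    """Rough memory per thread, assuming ~850 MB usable per GB of VRAM.

    The 850 MB figure is an assumption taken from the 28 compute unit examples.
    """
    return mb_per_gb * vram_gb / threads


if __name__ == "__main__":
    # The cases from the post, all for a GPU with 28 compute units.
    for vram_gb, threads in [(2, 1), (2, 2), (4, 1), (4, 2), (1, 1), (1, 2)]:
        intensity = suggest_intensity(28, vram_gb, threads)
        mem = estimate_mb_per_thread(vram_gb, threads)
        print(f"{vram_gb} GB, {threads} thread(s): intensity {intensity}, "
              f"~{mem:.0f} MB per thread")
```

The result is only a starting point. If the miner reports memory errors, lower the intensity and try again.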
I hope this is useful in some cases.
I set this variable in my batch file: setx GPU_FORCE_64BIT_PTR 0