I'm having the same issue with the grid size for the 3090, though. I just took the number of CUDA cores divided by 2 (10496 / 2 = 5248) and used that. Not sure if it's even close to optimal, but otherwise I get the illegal memory access error and an error about #'s missing.
GPU: GPU #0 NVIDIA GeForce RTX 3090 (82x0 cores) Grid(5248x128)
[3777.30 Mkey/s][GPU 3777.30 Mkey/s]
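
For anyone who wants to reproduce that grid number, here is a rough Python sketch of the "CUDA cores / 2" arithmetic. It assumes 128 CUDA cores per SM on the 3090 (the log only reports the 82 SMs), the helper name `heuristic_grid` is made up for illustration, and the second grid value is simply copied from the log above. This is only the guess described in the post, not a tuned or recommended setting.

```python
# Sketch of the "CUDA cores / 2" guess from the post; not a tuned value.
SM_COUNT = 82        # SM count as reported in the log line ("82x0 cores")
CORES_PER_SM = 128   # ASSUMPTION: CUDA cores per SM on the RTX 3090 (not in the log)
GRID_Y = 128         # second grid value, copied from the log


def heuristic_grid(sm_count, cores_per_sm, grid_y=GRID_Y):
    """Hypothetical helper: return (grid_x, grid_y) using total cores // 2."""
    cuda_cores = sm_count * cores_per_sm  # 82 * 128 = 10496 on the 3090
    return cuda_cores // 2, grid_y


grid_x, grid_y = heuristic_grid(SM_COUNT, CORES_PER_SM)
print(f"Grid({grid_x}x{grid_y})")
```

With those assumed numbers this gives the same 5248x128 grid shown in the log.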