3) i rent an RTX 4090 but no matter the options I use, it doesn't utilize the full RAM of the GPU...why is that? is it from my side or a bug in your updated version?
As the above post explained it in too many words, using the GPU memory is not beneficial at all. The highest performance occurs when 0 (zero) bytes of memory are being used. The highest throughput is a fine balance between the GPU clock (cycles/s), the kernel implementation, the launched grid size, and the size of the on-chip registers, L1 cache, and L2 cache. This balance greatly differs from one GPU to another, but in all cases, filling up the GPU memory with a gazillion kangaroos is a very bad idea, as it slows down the computations and also greatly increases the DP overhead, for no good reason.
In essence, the less kangaroos, the better.