Board: Development & Technical Discussion
Topic: Re: BSGS solver for cuda
by studyroom1 on 15/10/2021, 10:27:15 UTC
I think I found the problem: the information the program pulls from the device is wrong, or these are maximum values intentionally hardcoded in the program. Ethar, can you please make everything dynamic? I mean the device itself should report all its parameters.

Found 1 Cuda device.
Cuda device:GeForce RTX 3080(4095Mb)    wrong
Device have: MP:68 Cores+0                    wrong
Shared memory total:49152                      I guess this is system memory, but 128GB is available
Constant memory total:65536                    not sure how this one is calculated

I am not sure, but I think MP is a unit on AMD cards and CUDA cores are the unit on Nvidia. The 3080 has 8k+ CUDA cores, so I don't know what the 68 here means.
So much confusion.
The program uses the CUDA driver API (not the runtime API that is usually used), and the GPU code is written in PTX.
cuda.lib, which is used to call the CUDA driver API, always returns 32-bit values, even in the x64 version.
In that case you can't use/allocate more than 2**32 bytes of GPU memory.
Also, cuDeviceTotalMem() returns a 32-bit memory value; that is why you see 4095Mb.
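To make the 4095Mb figure concrete, here is a small arithmetic sketch. It assumes the card really has 10 GiB and that the 32-bit result is saturated at the largest 32-bit value rather than wrapped around; both are assumptions, and the helper names are made up for illustration.

```python
U32_MAX = 2**32 - 1   # largest value a 32-bit unsigned field can hold
MIB = 2**20           # bytes per megabyte as printed by the program

def clamp_u32(n):
    """Saturate a byte count at the 32-bit maximum (assumed behaviour)."""
    return min(n, U32_MAX)

def wrap_u32(n):
    """Keep only the low 32 bits, shown for comparison."""
    return n & U32_MAX

real_bytes = 10 * 2**30  # assumption: a 10 GiB card

print(clamp_u32(real_bytes) // MIB)  # 4095, matches the program output
print(wrap_u32(real_bytes) // MIB)   # 2048, what a plain wrap-around would show
print(real_bytes // MIB)             # 10240, what a 64-bit value would show
```

Since the program prints 4095 and not 2048, the number looks like a saturated 32-bit value rather than a wrapped one.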
I wrote to Nvidia about this issue a few times, but according to them there is no problem )
If you look into cuda.lib you will find unofficial commands like cuDeviceTotalMem_v2 and others.
All these commands have the suffix _v2, and they return correct 64-bit values.
But Nvidia says they don't have commands with the suffix _v2 ))
That is where the limitation of 2**32 bytes of GPU memory comes from.
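For anyone who wants to check the _v2 entry point without PureBasic, below is a minimal sketch that loads the driver library by name and calls cuDeviceTotalMem_v2 directly. The library file names and the function name total_mem_bytes are my assumptions, and it simply returns None when no driver or device is found. My understanding is that the C header cuda.h maps the plain name to the _v2 symbol with a #define, which would explain why the suffixed names only show up when you look inside the library, but treat that as unverified.

```python
import ctypes
import sys

def total_mem_bytes(ordinal=0):
    """Return total memory of a CUDA device in bytes, or None if unavailable."""
    win = sys.platform == "win32"
    name = "nvcuda.dll" if win else "libcuda.so.1"   # assumed library names
    try:
        loader = ctypes.WinDLL if win else ctypes.CDLL
        cuda = loader(name)
        cu_init = cuda.cuInit
        cu_device_get = cuda.cuDeviceGet
        cu_total_mem = cuda.cuDeviceTotalMem_v2      # 64-bit (size_t) variant
    except (OSError, AttributeError):
        return None  # driver library missing or symbol not exported

    cu_init.argtypes = [ctypes.c_uint]
    cu_device_get.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_int]
    cu_total_mem.argtypes = [ctypes.POINTER(ctypes.c_size_t), ctypes.c_int]

    if cu_init(0) != 0:                  # 0 means CUDA_SUCCESS
        return None
    dev = ctypes.c_int()
    if cu_device_get(ctypes.byref(dev), ordinal) != 0:
        return None
    size = ctypes.c_size_t()
    if cu_total_mem(ctypes.byref(size), dev) != 0:
        return None
    return size.value

result = total_mem_bytes()
print("no CUDA device found" if result is None else f"{result // 2**20} Mb")
```

On a 64-bit system size_t is 8 bytes wide, so a 10GB card should be reported in full here instead of being cut at 4095Mb.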
About "Device have: MP:68 Cores+0": the 0 is there because I didn't add Ampere to the program:
Code:
          Case 2 ; Fermi
            Debug "Fermi"
            If minor=1
              cores = mp * 48
            Else
              cores = mp * 32
            EndIf
            
          Case 3 ; Kepler
            Debug "Kepler"
            cores = mp * 192
            
          Case 5 ; Maxwell
            Debug "Maxwell"
            cores = mp * 128
            
          Case 6 ; Pascal
            Debug "Pascal"
            If minor=0
              cores = mp * 64   ; 6.0
            Else
              cores = mp * 128  ; 6.1 and 6.2
            EndIf
            
          Case 7 ; Volta / Turing (compute capability 7.x is not Pascal)
            Debug "Volta/Turing"
            cores = mp * 64
            
          Default
            Debug "Unknown device type"
        EndSelect
By the way, this is needed only for information and nothing more.
To get the correct number of cores, only this needs to be added:
Code:
          Case 8 ; Ampere
            Debug "Ampere RTX"
            If minor=0
              cores = mp * 64   ; 8.0 (A100)
            Else
              cores = mp * 128  ; 8.6 (RTX 30xx)
            EndIf
          Default
            Debug "Unknown device type"
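As a cross-check on the MP versus cores question, here is a rough sketch of the same lookup as a table. MP is the number of multiprocessors (SMs) the device reports, and the core count is just MP times a per-architecture constant. The per-MP figures are the ones I remember from Nvidia's published tables, and compute capability 8.6 for the RTX 3080 is my assumption; the names CORES_PER_MP and cuda_cores are made up for the example.

```python
# CUDA cores per multiprocessor, keyed by compute capability (major, minor).
# Values are recalled from Nvidia's published tables; verify before relying on them.
CORES_PER_MP = {
    (2, 0): 32,  (2, 1): 48,                 # Fermi
    (3, 0): 192, (3, 5): 192, (3, 7): 192,   # Kepler
    (5, 0): 128, (5, 2): 128, (5, 3): 128,   # Maxwell
    (6, 0): 64,  (6, 1): 128, (6, 2): 128,   # Pascal
    (7, 0): 64,  (7, 2): 64,  (7, 5): 64,    # Volta / Turing
    (8, 0): 64,  (8, 6): 128,                # Ampere
}

def cuda_cores(mp, major, minor):
    """Return mp * cores-per-MP, or None for an unknown device type."""
    per_mp = CORES_PER_MP.get((major, minor))
    return None if per_mp is None else mp * per_mp

# Assumption: RTX 3080 reports 68 MPs at compute capability 8.6.
print(cuda_cores(68, 8, 6))  # 8704
print(cuda_cores(68, 9, 9))  # None (unknown device type)
```

So 68 is the MP count, and 68 * 128 = 8704 is the "8k+" core count from the spec sheet.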


Thanks man for the information. Can you please fix the memory and Ampere issues? Is that possible? And could you recompile it, as I am unable to compile it with PureBasic; the free version has limitations.