Post
Topic
Board Development & Technical Discussion
Re: Checking brainwallet
by
CrunchyF
on 04/05/2022, 09:43:09 UTC

Interesting approach - it seems other solutions focus more on larger tables - does the speed advantage of having the tables in shared memory outweigh the disadvantage of doing 15 additions vs for example 11 additions with 22 bit tables?

Yes, it is my first CUDA project, and I think an optimisation is possible by using the different types of GPU memory (global and shared), because their access times are different.

To copy my tables to GPU memory I use the standard function:

cudaMemcpy(b, a, ..., cudaMemcpyHostToDevice);
so I'm not quite sure which type of GPU memory the tables end up in.

I use a 16-bit indexed table because it's easy to view a 256-bit integer array as 16 parts with a (uint16_t *) cast, so you don't have to code a specific function.
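To make the 16-bit split concrete, here is a minimal C++ sketch. It assumes the key is stored as eight 32-bit limbs with limb 0 least significant (the `Key256` type and the `window16` name are made up for illustration). It uses shifts so that the result does not depend on host byte order; on a little-endian machine a `(uint16_t *)` view of the same array should give the same 16 values.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical layout: 256-bit key as 8 x 32-bit limbs, limb 0 least significant.
using Key256 = std::array<uint32_t, 8>;

// Window i holds bits 16*i .. 16*i+15 of the key, i in 0..15.
inline uint16_t window16(const Key256& k, unsigned i) {
    assert(i < 16);
    // Two windows per limb: the low half for even i, the high half for odd i.
    return static_cast<uint16_t>(k[i / 2] >> (16 * (i % 2)));
}
```

Each of the 16 values then indexes its own sub-table, which gives the 15 point additions mentioned above.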

The cost of a function that splits 256 bits into 22-bit chunks is probably not negligible.
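For comparison, a generic split for window widths that do not divide 32 could look like the sketch below. It again assumes eight 32-bit limbs with limb 0 least significant; `Key256` and `window` are hypothetical names. The extra work per chunk is a second limb load when the chunk straddles a limb boundary, plus a shift and a mask. Whether that matters on the GPU would have to be measured.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical layout: 256-bit key as 8 x 32-bit limbs, limb 0 least significant.
using Key256 = std::array<uint32_t, 8>;

// Returns window i of width `bits` (bits i*bits .. i*bits+bits-1 of the key).
// The last window is padded with zero bits when 256 is not a multiple of `bits`.
inline uint32_t window(const Key256& k, unsigned i, unsigned bits) {
    assert(bits >= 1 && bits <= 32 && i * bits < 256);
    const unsigned pos  = i * bits;   // absolute bit offset of the window
    const unsigned limb = pos / 32;   // limb holding the lowest bit
    const unsigned off  = pos % 32;   // offset inside that limb
    uint64_t v = k[limb];
    if (limb + 1 < k.size())          // window may spill into the next limb
        v |= static_cast<uint64_t>(k[limb + 1]) << 32;
    const uint64_t mask = (uint64_t{1} << bits) - 1;
    return static_cast<uint32_t>((v >> off) & mask);
}
```

With `bits = 22` this yields 12 chunks (the last one only 14 bits wide), hence 11 additions.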
 
My 2 tables (x and y) have a size of 2*32 MB, so if you use 22-bit tables you will be at around (2^6)*2*32 MB = 4096 MB, too large for my RTX 3070.
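The size arithmetic can be checked with a few lines of C++. The sketch assumes one sub-table of 2^w entries per window and 32 bytes per stored coordinate, which reproduces the 32 MB per table quoted above; the function names are made up. Under those assumptions a 22-bit layout needs only 12 sub-tables instead of 16, so the total comes out nearer 3072 MB than 4096 MB, the same order of magnitude.

```cpp
#include <cstdint>

// Number of w-bit windows needed to cover a 256-bit scalar.
constexpr uint64_t windows(unsigned w) { return (256 + w - 1) / w; }

// Point additions per scalar multiplication: one fewer than the window count.
constexpr uint64_t additions(unsigned w) { return windows(w) - 1; }

// Bytes for ONE coordinate table (x or y), assuming one sub-table of
// 2^w entries per window and 32 bytes per coordinate.
constexpr uint64_t table_bytes(unsigned w) {
    return windows(w) * (uint64_t{1} << w) * 32;
}

constexpr uint64_t MiB = 1024 * 1024;

// Sanity check against the figure in the post: 32 MB per table at 16 bits.
static_assert(table_bytes(16) == 32 * MiB, "16-bit table should be 32 MiB");
```

Here `2 * table_bytes(22) / MiB` evaluates to 3072, and `additions(16)` and `additions(22)` give the 15 and 11 additions from the question.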

But anyway, as you say, the cost of looking up a value in a 4 GB table compared to a 32 MB one is probably (I don't really know) not the same.

I focused on small tables because I wanted to leave as much free space as possible for storing the biggest possible Bloom filter.
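To show why the free space matters, here is a small sketch of the textbook Bloom filter false-positive estimate p ≈ (1 - e^(-k*n/m))^k, with m bits, n stored items and k hash functions. The function name is made up, and the numbers in the note below are purely illustrative, not taken from the actual program.

```cpp
#include <cassert>
#include <cmath>

// Textbook estimate of the false-positive rate of a Bloom filter with
// m_bits bits, n_items inserted items and k_hashes hash functions.
inline double bloom_fp_rate(double m_bits, double n_items, int k_hashes) {
    assert(m_bits > 0 && n_items >= 0 && k_hashes > 0);
    const double fill = 1.0 - std::exp(-k_hashes * n_items / m_bits);
    return std::pow(fill, k_hashes);
}
```

For a fixed number of items and hashes, doubling the number of bits always lowers the estimate, for example `bloom_fp_rate(34359738368.0, 1e9, 8)` (4 GiB) against `bloom_fp_rate(17179869184.0, 1e9, 8)` (2 GiB).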

The big optimisation would be coding a well-synchronised batch modular inversion, because with JeanLuc Pons' code you have to wait until every thread of the batch has finished its multiplication.
But the algorithm doesn't seem easy.
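For reference, the usual way to batch inversions is Montgomery's trick: multiply everything together, invert once, then unwind. The sketch below is a sequential CPU version over a small toy prime (an assumption made to keep it short; the real field is the 256-bit secp256k1 prime). It is not JeanLuc Pons' code, it does not address the thread synchronisation problem, and all names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy prime modulus, small enough that a*b fits in 64 bits.
constexpr uint64_t P = 1000000007ULL;

inline uint64_t mulmod(uint64_t a, uint64_t b) { return (a * b) % P; }

// Modular exponentiation by squaring.
inline uint64_t powmod(uint64_t base, uint64_t exp) {
    uint64_t r = 1;
    base %= P;
    while (exp) {
        if (exp & 1) r = mulmod(r, base);
        base = mulmod(base, base);
        exp >>= 1;
    }
    return r;
}

// Single inversion via Fermat's little theorem: a^(P-2) mod P.
inline uint64_t invmod(uint64_t a) { return powmod(a, P - 2); }

// Replaces every v[i] (non-zero, < P) by its inverse using ONE invmod call
// and about 3*n multiplications.
inline void batch_invert(std::vector<uint64_t>& v) {
    const std::size_t n = v.size();
    if (n == 0) return;
    std::vector<uint64_t> prefix(n);   // prefix[i] = v[0] * ... * v[i-1]
    uint64_t acc = 1;
    for (std::size_t i = 0; i < n; ++i) {
        assert(v[i] != 0 && v[i] < P);
        prefix[i] = acc;
        acc = mulmod(acc, v[i]);
    }
    uint64_t inv_acc = invmod(acc);    // inverse of the full product
    for (std::size_t i = n; i-- > 0;) {
        const uint64_t x = v[i];
        v[i] = mulmod(inv_acc, prefix[i]);  // 1 / x
        inv_acc = mulmod(inv_acc, x);       // drop x from the running inverse
    }
}
```

The backward loop depends on the forward loop being complete for the whole batch, which is where the waiting comes from when the batch is spread over several threads.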

Quote
Here's a free tip that might gain some speed: Instead of doing pure addition, you can do addition/subtraction from a "middle" value like 0x8000800080008000800080008000.... - you save one bit of table size, and negating a point is quite easy.

Interesting ... but only for big tables, no?
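As I understand the tip, each 16-bit window w is rewritten as a signed digit d = w - 0x8000, so the key becomes the "middle" constant plus a sum of signed digits. The table then only needs the magnitudes (about 2^15 entries per window instead of 2^16), and a negative digit uses the negated point. The saving is one bit per window whatever the window width. The sketch below only checks the integer recoding on a toy 64-bit scalar with 4 windows (a real key would have 16); the names are made up and no curve arithmetic is included.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kWindows = 4;                           // toy: 64-bit scalar
constexpr uint64_t kMiddle = 0x8000800080008000ULL;   // 0x8000 in every window

// Signed digit for each 16-bit window: d[i] in [-32768, 32767].
inline std::array<int32_t, kWindows> recode(uint64_t k) {
    std::array<int32_t, kWindows> d{};
    for (int i = 0; i < kWindows; ++i) {
        const uint32_t w = static_cast<uint32_t>((k >> (16 * i)) & 0xFFFF);
        d[i] = static_cast<int32_t>(w) - 0x8000;
    }
    return d;
}

// Inverse of recode: k = kMiddle + sum(d[i] * 2^(16*i)), computed mod 2^64.
inline uint64_t rebuild(const std::array<int32_t, kWindows>& d) {
    uint64_t k = kMiddle;
    for (int i = 0; i < kWindows; ++i) {
        assert(d[i] >= -32768 && d[i] <= 32767);
        // Unsigned wraparound makes adding a negative digit a subtraction.
        k += static_cast<uint64_t>(static_cast<int64_t>(d[i])) << (16 * i);
    }
    return k;
}
```

In a real implementation the point for `kMiddle` would be precomputed once, and each digit would select the table entry for its absolute value, with the y coordinate negated when the digit is negative.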