Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo

That's why I don't want to support old cards: if I support them officially but don't optimize for them, you will blame me for their poor speed.
But feel free to modify/optimize the sources for your hardware.

I'll be honest: your kangaroo finds the key faster than mine or JLP's. Yes, the reported speed is lower, but in the end it finds the key much faster.
It works even on a 1660 Super (~600 MKeys/s).
Thanks for sharing.

You can improve it in many ways.
For example, since L2 is useless on old cards, disable the setting of the persistent part of L2, then set
#define PNT_GROUP_CNT 48
and change these lines in KernelB:

Code:

//calc original kang_ind
u32 tind = (THREAD_X + gr_ind2 * BLOCK_SIZE); //0..3071
u32 warp_ind = tind / (32 * PNT_GROUP_CNT / 2); // 0..7
u32 thr_ind = (tind / 4) % 32; //index in warp 0..31
u32 g8_ind = (tind % (32 * PNT_GROUP_CNT / 2)) / 128; // 0..2
u32 gr_ind = 2 * (tind % 4); // 0, 2, 4, 6
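
The range comments in the snippet depend on PNT_GROUP_CNT and BLOCK_SIZE, so it can help to replay the index arithmetic on the host before changing the define. The following is only a rough sketch: it assumes THREAD_X runs over 0..BLOCK_SIZE-1 and gr_ind2 over 0..PNT_GROUP_CNT/2-1, neither of which is stated in this thread, and the names `IndexMax` and `index_ranges` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Largest value each derived index reaches for one configuration.
struct IndexMax {
    uint32_t tind, warp_ind, thr_ind, g8_ind, gr_ind;
};

// Host-side replay of the index arithmetic from the KernelB snippet.
// ASSUMPTION: THREAD_X is 0..block_size-1 and gr_ind2 is 0..pnt_group_cnt/2-1.
IndexMax index_ranges(uint32_t pnt_group_cnt, uint32_t block_size) {
    assert(pnt_group_cnt >= 2 && block_size > 0);
    IndexMax m = {0, 0, 0, 0, 0};
    const uint32_t per_warp = 32 * pnt_group_cnt / 2;
    for (uint32_t gr_ind2 = 0; gr_ind2 < pnt_group_cnt / 2; ++gr_ind2) {
        for (uint32_t thread_x = 0; thread_x < block_size; ++thread_x) {
            const uint32_t tind = thread_x + gr_ind2 * block_size;
            const uint32_t warp_ind = tind / per_warp;
            const uint32_t thr_ind = (tind / 4) % 32;
            const uint32_t g8_ind = (tind % per_warp) / 128;
            const uint32_t gr_ind = 2 * (tind % 4);
            if (tind > m.tind) m.tind = tind;
            if (warp_ind > m.warp_ind) m.warp_ind = warp_ind;
            if (thr_ind > m.thr_ind) m.thr_ind = thr_ind;
            if (g8_ind > m.g8_ind) m.g8_ind = g8_ind;
            if (gr_ind > m.gr_ind) m.gr_ind = gr_ind;
        }
    }
    return m;
}
```

Under those assumptions, `index_ranges(24, 256)` reproduces the commented ranges (tind up to 3071, warp_ind up to 7, g8_ind up to 2), while `index_ranges(48, 256)` gives tind up to 6143 and g8_ind up to 5, so the comments would need updating along with the define.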

Quote from: ee1234ee on Today at 12:58:52 PM

May I ask why my 4060 Ti graphics card only reaches a speed of just over 2000 MKeys/s?
CUDA devices: 1, CUDA driver/runtime: 12.6/12.5
GPU 0: NVIDIA GeForce RTX 4060 Ti, 16.00 GB, 34 CUs, cap 8.9, PCI 1, L2 size: 32768 KB
Total GPUs for work: 1
Solving point: Range 76 bits, DP 16, start...
SOTA method, estimated ops: 2^38.202, RAM for DPs: 0.367 GB. DP and GPU overheads not included!
Estimated DPs per kangaroo: 23.090.
GPU 0: allocated 1187 MB, 208896 kangaroos.
GPUs started...
MAIN: Speed: 2332 MKeys/s, Err: 0, DPs: 345K/4823K, Time: 0d:00h:00m, Est: 0d:00h:02m
MAIN: Speed: 2320 MKeys/s, Err: 0, DPs: 704K/4823K, Time: 0d:00h:00m, Est: 0d:00h:02m
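
The numbers in this log can be cross-checked against each other. The sketch below assumes that a point is distinguished with probability 2^-DP and that one operation corresponds to one key at the reported MKeys/s; `RunEstimate` and `estimate_run` are illustrative names, not part of RCKangaroo.

```cpp
#include <cassert>
#include <cmath>

struct RunEstimate {
    double total_ops;         // expected operations, 2^log2_ops
    double expected_dps;      // distinguished points expected over the whole run
    double dps_per_kangaroo;  // expected DPs divided by the kangaroo count
    double seconds;           // run time at a constant speed
};

// ASSUMPTION: each point is distinguished with probability 2^-dp_bits,
// and the reported speed stays constant for the whole run.
RunEstimate estimate_run(double log2_ops, int dp_bits,
                         double kangaroos, double mkeys_per_s) {
    assert(dp_bits >= 0 && kangaroos > 0 && mkeys_per_s > 0);
    RunEstimate e;
    e.total_ops = std::exp2(log2_ops);
    e.expected_dps = e.total_ops / std::exp2(static_cast<double>(dp_bits));
    e.dps_per_kangaroo = e.expected_dps / kangaroos;
    e.seconds = e.total_ops / (mkeys_per_s * 1e6);
    return e;
}
```

Calling `estimate_run(38.202, 16, 208896, 2332)` gives roughly 4.82 million expected DPs, about 23.1 DPs per kangaroo, and a little over two minutes of run time. That is in line with the 4823K, 23.090 and 0d:00h:02m shown in the log (small differences come from the rounded exponent).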

Do you expect better speed? Why? The 4090 has 128 CUs; the 4060 Ti has only 34.
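
To put the CU comparison into numbers, here is a naive back-of-the-envelope sketch. It assumes that throughput scales linearly with the CU count at the same per-CU rate, which ignores clock and memory differences; the function names are hypothetical.

```cpp
#include <cassert>

// Throughput per compute unit, in MKeys/s.
double mkeys_per_cu(double mkeys_per_s, int cus) {
    assert(cus > 0);
    return mkeys_per_s / cus;
}

// ASSUMPTION: the per-CU rate is identical on both cards (linear scaling).
double project_speed(double mkeys_per_s, int cus_from, int cus_to) {
    assert(cus_to > 0);
    return mkeys_per_cu(mkeys_per_s, cus_from) * cus_to;
}
```

With the 2332 MKeys/s from the log, `mkeys_per_cu(2332, 34)` is about 68.6 MKeys/s per CU, and `project_speed(2332, 34, 128)` is about 8780 MKeys/s. So a 34-CU card running at roughly a quarter of a 128-CU card's speed is what simple scaling would predict.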

Hello, can you tell me which file contains the L2 setup and what has to be deactivated?
Thank you.

I found this in GpuKang.cpp.
Is that the right place?

Quote

//allocate gpu mem
   //L2
   int L2size = KangCnt * (3 * 32);
   total_mem += L2size;
   err = cudaMalloc((void**)&Kparams.L2, L2size);
   if (err != cudaSuccess)
   {
      printf("GPU %d, Allocate L2 memory failed: %s\n", CudaIndex, cudaGetErrorString(err));
      return false;
   }
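
For orientation, the quoted lines only allocate a buffer in GPU global memory with cudaMalloc; the size arithmetic can be checked on the host. This is a minimal sketch, assuming KangCnt equals the kangaroo count printed in the log above (208896); `l2_buffer_bytes` and `fits_in_l2` are illustrative helpers, not functions from the sources.

```cpp
#include <cassert>
#include <cstddef>

// Bytes requested by the quoted cudaMalloc: 3 * 32 = 96 bytes per kangaroo.
// Computed in size_t here (the quoted code uses int), purely as an illustration.
std::size_t l2_buffer_bytes(std::size_t kang_cnt) {
    return kang_cnt * (3 * 32);
}

// True when the whole buffer is no larger than a hardware L2 cache of the given size.
bool fits_in_l2(std::size_t kang_cnt, std::size_t l2_cache_kb) {
    assert(l2_cache_kb > 0);
    return l2_buffer_bytes(kang_cnt) <= l2_cache_kb * 1024;
}
```

For 208896 kangaroos this comes to 20,054,016 bytes (about 19.1 MB), which is smaller than the 32768 KB L2 reported for the 4060 Ti. Whether this allocation is the part that has to be disabled is not answered by the quote itself; it shows the buffer being created, not the persistent-L2 setting.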