Here are some tips to speed up keyhunt-cuda (rotor-cuda):
Apply this, and then you need a smaller grid size; for example, 4096x512 will be enough for an RTX 4090:
https://bitcointalk.org/index.php?topic=5244940.msg63526413#msg63526413
Also change this:
__device__ __noinline__ void CheckHashSEARCH_MODE_SA(uint64_t* px, uint64_t* py, int32_t incr, uint32_t* hash160, uint32_t* out)
{
    switch (mode) {
    case SEARCH_COMPRESSED:
        CheckHashCompSEARCH_MODE_SA(px, (uint8_t)(py[0] & 1), incr, hash160, out);
        break;
    case SEARCH_UNCOMPRESSED:
        CheckHashUnCompSEARCH_MODE_SA(px, py, incr, hash160, out);
        break;
    case SEARCH_BOTH:
        CheckHashCompSEARCH_MODE_SA(px, (uint8_t)(py[0] & 1), incr, hash160, out);
        CheckHashUnCompSEARCH_MODE_SA(px, py, incr, hash160, out);
        break;
    }
}
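to this, because doing a switch-case inside a kernel is a very bad idea. Note that this version only checks compressed keys, so SEARCH_UNCOMPRESSED and SEARCH_BOTH are no longer honored: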
__device__ __noinline__ void CheckHashSEARCH_MODE_SA(uint64_t* px, uint64_t* py, int32_t incr, uint32_t* hash160, uint32_t* out)
{
    // Always run the compressed-key check; the uncompressed path is dropped.
    CheckHashCompSEARCH_MODE_SA(px, (uint8_t)(py[0] & 1), incr, hash160, out);
}
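If you still want the uncompressed modes, one possible way to avoid a runtime switch in the kernel is to make the mode a template parameter and choose the kernel instantiation once on the host. The sketch below is a minimal, self-contained illustration under that assumption: `CheckComp`, `CheckUnComp`, `SearchKernel`, `LaunchSearch`, `CountChecks`, and the enum values are hypothetical stand-ins, not keyhunt-cuda's actual code, and the "checks" only count how often each path runs.

```cuda
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in values; keyhunt-cuda defines its own SEARCH_* constants.
enum SearchMode : int { SEARCH_COMPRESSED = 0, SEARCH_UNCOMPRESSED = 1, SEARCH_BOTH = 2 };

// Hypothetical checks: they only count calls instead of hashing anything.
__device__ void CheckComp(uint32_t* counts)   { atomicAdd(&counts[0], 1u); }
__device__ void CheckUnComp(uint32_t* counts) { atomicAdd(&counts[1], 1u); }

// Mode is a compile-time constant, so each condition below is constant
// and the compiler drops the untaken call from each instantiation.
template <int Mode>
__device__ void CheckHash(uint32_t* counts)
{
    if (Mode == SEARCH_COMPRESSED || Mode == SEARCH_BOTH) CheckComp(counts);
    if (Mode == SEARCH_UNCOMPRESSED || Mode == SEARCH_BOTH) CheckUnComp(counts);
}

template <int Mode>
__global__ void SearchKernel(uint32_t* counts)
{
    CheckHash<Mode>(counts);
}

// The only runtime switch happens once on the host, when picking the kernel.
void LaunchSearch(int mode, uint32_t* d_counts, dim3 grid, dim3 block)
{
    switch (mode) {
    case SEARCH_COMPRESSED:   SearchKernel<SEARCH_COMPRESSED><<<grid, block>>>(d_counts); break;
    case SEARCH_UNCOMPRESSED: SearchKernel<SEARCH_UNCOMPRESSED><<<grid, block>>>(d_counts); break;
    case SEARCH_BOTH:         SearchKernel<SEARCH_BOTH><<<grid, block>>>(d_counts); break;
    }
}

// Runs 4 blocks x 32 threads and copies back how many times each check ran.
void CountChecks(int mode, uint32_t counts[2])
{
    uint32_t* d_counts = nullptr;
    cudaMalloc(&d_counts, 2 * sizeof(uint32_t));
    cudaMemset(d_counts, 0, 2 * sizeof(uint32_t));
    LaunchSearch(mode, d_counts, dim3(4), dim3(32));
    cudaMemcpy(counts, d_counts, 2 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
    cudaFree(d_counts);
}

int main()
{
    uint32_t counts[2];
    CountChecks(SEARCH_COMPRESSED, counts);
    printf("compressed checks: %u, uncompressed checks: %u\n", counts[0], counts[1]);
    return 0;
}
```

On a working GPU this should print `compressed checks: 128, uncompressed checks: 0`. The trade-off is one compiled kernel per mode in exchange for no per-call branching inside the kernel.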
Also, maxFound can be removed completely when searching for a puzzle, because we only need one result anyway.
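Here is a minimal sketch of how a single-result output could look once maxFound is gone, assuming a two-word output buffer where `out[0]` is a "found" flag and `out[1]` holds the result. `ReportFound`, `ToySearch`, and `RunToySearch` are hypothetical names; in a real kernel the stored value would be whatever is needed to reconstruct the key, and the toy kernel simply treats every nonzero thread index divisible by 1000 as a hit.

```cuda
#include <cstdint>
#include <cstdio>

// Hypothetical single-result reporter: the first thread to flip out[0]
// from 0 to 1 stores its result; every later hit sees a nonzero old value
// and returns, so no maxFound bound or result array is needed.
__device__ void ReportFound(uint32_t* out, uint32_t index)
{
    if (atomicCAS(&out[0], 0u, 1u) == 0u) {
        out[1] = index;
    }
}

// Toy kernel: threads with a nonzero global index divisible by 1000 "find" a match.
__global__ void ToySearch(uint32_t* out)
{
    uint32_t idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx != 0u && idx % 1000u == 0u) ReportFound(out, idx);
}

// Launches 64 x 256 = 16384 threads (16 of them hit) and copies back {flag, index}.
void RunToySearch(uint32_t result[2])
{
    uint32_t* d_out = nullptr;
    cudaMalloc(&d_out, 2 * sizeof(uint32_t));
    cudaMemset(d_out, 0, 2 * sizeof(uint32_t));
    ToySearch<<<64, 256>>>(d_out);
    cudaMemcpy(result, d_out, 2 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}

int main()
{
    uint32_t result[2];
    RunToySearch(result);
    // Which hit wins the race is not deterministic; only one is ever stored.
    printf("found=%u index=%u\n", result[0], result[1]);
    return 0;
}
```

Which of the 16 hits is stored depends on scheduling, but exactly one is kept. A longer-running search loop could also read `out[0]` to stop early once a result exists.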
Thanks.