Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!

Quote from: Koooooj on July 17, 2013, 06:05:30 AM

I think I have a pretty good understanding of the algorithm from a mathematics standpoint. I don't know CUDA (or OpenCL, which I would suggest--it runs on both NVidia and AMD with reasonably comparable performance) and I don't know the getwork protocol, but I can give some insight into the mathematics, at least.

The first thing to understand is that if the block hash is H then H, 2H, 3H, 4H, etc are all valid origins, and that H-1, 2H-1, 4H-1, 8H-1, etc. makes a valid Cunningham chain (if all of those numbers are prime and the chain is long enough; this is for a Cunningham chain of the first kind. For the second kind replace the "-" with "+". Most of the math here will assume Cunningham chains of the First kind, but it is identicle for chains of the Second kind. If H ± 1, 2H ± 1, ... are all prime (i.e. a chain of the first and second) then the chain is a bi-twin chain). That means that when you are eliminating composite origins you are also eliminating numbers that could appear later in the chain. This will become useful later.

Another thing to notice is that H, 2H, 3H, 4H, etc. are very unlikely to be prime. If you look at the probability that a random number on the order of 2^256 is prime then it is about 1 in 177; that makes the probability that 8 randomly chosen numbers are all primes about 1 in 1 quintillion--impractically low. That probability is greatly improved by using a "primorial" number--the product of many small primes. If you take pRimorial= R = H * 2 * 3 * 5 * 7 * .... * 47 (the first 15 prime numbers; quick tests on my own code suggest that this is about ideal, although some tweaking and testing is in order) then you increase the size of the number you are testing, but you also make the numbers much more likely to be prime. Thus if you look at R-1, 2R -1, etc then you know that each of those numbers is not divisible by 2, 3, 5, ... , 47 because they are 1 less than (or greater than, if you are looking for Cunningham chains of the Second kind) a multiple of each of those numbers. This means that the smallest factor NR ± 1 can have is 53, where N is any integer.

From here you execute the Sieve. The standard Sieve of Eratosthenes is very straightforward, but it is designed to be executed starting at 1 and then continuing over the small, consecutive integers, such as is shown on the GIF on the Wikipedia page. We are not interested in consecutive numbers, and they are far from small, so a change must be adopted. This change requires knowledge of modular arithmetic. To execute the sieve we first take a small prime number (starting with 53 if our primorial ended with 47) and take R (mod 53). If (and only if) R ≡ 0 (mod 53) then R is divisible by 53, and all future R will be divisible by 53. In that case, no NR-1 will be divisible by 53 and none should be eliminated.

In every other case, R ≡ r (mod 53), where 0 < r < 53. If you have found that r = 1 then that shows that R-1 ≡ 0 (mod 53) and is therefore not prime (and cannot be a member of a Cunningham Chain of the first kind). If r = 52 then R+1 ≡ 0 (mod 53) and is therefore not prime. If r is any other number then you have to search. Since r is relatively prime to 53 (i.e. gcd(r,53) = 1) it can be shown that N * r (mod 53) will cycle through every number less than 53. Thanks to modular arithmetic, N * r ≡ N * R (mod 53), so you can search for the N for which N₁ * r ≡ 1 (mod 53) and for which N₂ * r ≡ 52 (mod 53). You can eliminate N₁R - 1 as a candidate for chains of the first kind, and you can eliminate N₂R + 1 as a candidate for chains of the second kind.

The power of finding this N is that you then know that if N * r ≡ 1 (mod 53) then (53 + N) * r ≡ 1 (mod 53) since N * r is cyclic with a period of 53. Thus, you can quickly go through, just like in the classic form of the sieve of Eratosthenes, and eliminate every 53rd N. You would then repeat this process for subsequent prime numbers (59, 61, 67, ...), until sieving against another prime number is not worth it--it takes too long and removes too few numbers.

The next thing you can do to eliminate candidates is to look for clusters (you can actually do this while sieving if you want, but I have separated it since it is a separate topic). Since the goal is to find chains of (currently) 8 primes it makes no sense to waste time performing a long primality test on a chain of numbers that have passed the sieve that is 7 or fewer long. Thus, you can look through the sieve for chains in the form N-1, 2N-1, 4N-1, ... 128N-1 where all of those numbers have no small factors (which would have been found in the sieve). When you find these, they are passed to the formal (probable) primality tests.

The main test used in this step is the Fermat Primality Test, based on Fermat's Little Theorem. It states that a^P-1 ≡ 1 (mod P) for all prime P. (It does not state that if that identity holds true then P is prime, but in practice prime numbers are far more likely than "Fermat liars"). For the reference client "a" is chosen to be 2, since 1 is trivially a Fermat Liar for every composite P and since a larger "a" would take more time for no benefit. To perform the operation it is best to use "fast modular exponentiation," which is shown in pseudocode in the Modular Exponentiation Wikipedia page.

The second test used is the Euler Lagrange Lifchitz test (a brief reading of the code suggests that it is not actually used, but it is implemented. You can skip this paragraph and not lose much). It states that if P is prime and P ≡ 1 (mod 4) then 2P + 1 is prime if and only if 2^P + 1 ≡ 0 (mod 2P + 1). There are other forms as well: if P ≡ 3 (mod 4) then 2P + 1 is prime if and only if 2^P - 1 ≡ 0 (mod 2P +1). There are forms for 2P-1 as well; if P ≡ 1 (mod 4) then you use 2^P-1 - 1 ≡ 0 (mod 2P - 1); otherwise it's 2^P-1 + 1 ≡ 0 (mod 2P -1). These are not much, if any, faster than just using the Fermat primality test--they all rely on a single modular exponentiation, and in all cases the base is 2, and both the exponent and the modulus are on the order of P. Lifchitz mentions a way to look for Cunningham chains of the second kind "in cluster" by taking 2^n_i-1-1-1 ≡ 0 (mod n * n₁ * n₂ * ... * n_i) where n_i = 2n_i-1-1. This test only looks for one kind of chain, though, and requires that n ≡ 1 (mod 4). Also, it is a much more expensive modular exponentiation since the modulus is the product of 8 or more large numbers.

When a prime chain is found that is at least as long as floor(difficulty) (e.g. at least 8 when the difficulty is 8.9) it is necessary to calculate the fractional difficulty. I haven't verified this in the source code, but in the design paper he takes r = (2^P-1 (mod P) ) as the Fermat remainder, then defines the fractional length as (P - r) / P. P, in all of this, is the number that would have been next in the chain. I'm not sure which half of a bi-twin chain has to be used (note that a bi-twin chain can have an odd length if the two Cunningham chains it is made up of differ in length by 1). If the integer chain length plus the fractional length is greater than the difficulty, it is a valid share.

I hope this has been easy enough to follow, while still being informative. If anyone has questions I'll try to answer them. Note that my understanding is based partially on looking at the code but also largely on my own insights; I've likely missed something major, but there is a slim chance that I've offered an improvement on what's already being done simply from coming at it from a different persepctive. If you feel this satisfies part of your bounty for money, I can receive donations at 1FbWVR3LpGoWp5rc4SJc3cHo42ALrYhcxC. Not really doing this for the bounty, though. I just want to be as much help as possible towards making GPU mining a reality, to help neutralize the botnet and massive VPS farm threats.

Awesome explanation, sending 0.25BTC over. I think I have a good enough explanation of the math to go forward, however (and this is a really dumb question), the block that has to be a factor of the origin of the prime, we just convert the hex of the previous block (so say we are trying to mine block 54320, we would convert the hex of block 54319 to decimal, correct?)

Oh, and another dumb question... What does '≡' mean?