Let me see if I understood correctly where I lost you.
You can parallelize by buying more computers.
Your ability to parallelize depends on the number of computers you have
and on the type of problem you want to solve.
If the problem you want to solve is finding the correct nonce for a bitcoin block to meet its target, you can parallelize very nicely. Generally speaking, every problem that consists of a brute-force search is nicely parallelizable. Multi-dimensional problems can also be parallelized very nicely, for example matrix multiplication. Let us assume we multiply a 10x10 matrix by another 10x10 matrix. When I have 10 computers instead of 1, I will be (nearly) 10 times as fast. Even having 100 computers may help (one for every element of the result matrix). What about 200 computers? Still good, since I could now split the calculation of each sum into two halves. What about 1 billion computers? Well, there is a limit to the degree to which matrix multiplication is parallelizable: the whole product needs only 1,000 scalar multiplications, so at some point there is no work left to hand out.
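To make the nonce case concrete, here is a minimal Python sketch of how such a search splits into independent slices. It is an illustration under assumptions, not Bitcoin's actual mining code: the header is an arbitrary byte string, the nonce is simply appended as 4 little-endian bytes, and the helper names find_nonce, split_ranges and search_in_slices are made up for this example. The slices are searched one after another here; each one stands in for a separate computer, since no slice needs anything from any other.

```python
import hashlib


def find_nonce(header: bytes, target: int, start: int, stop: int):
    """Return the first nonce in [start, stop) whose double SHA-256 is below target."""
    for nonce in range(start, stop):
        data = header + nonce.to_bytes(4, "little")  # simplified layout (assumption)
        digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
    return None  # no nonce in this slice meets the target


def split_ranges(total: int, workers: int):
    """Divide [0, total) into contiguous, non-overlapping slices, one per worker."""
    step = -(-total // workers)  # ceiling division
    return [(lo, min(lo + step, total)) for lo in range(0, total, step)]


def search_in_slices(header: bytes, target: int, total: int, workers: int):
    """Search every slice independently and return the smallest nonce found."""
    hits = []
    for lo, hi in split_ranges(total, workers):
        # Each call depends only on (header, target, lo, hi), so in a real
        # setup every slice could run on a different machine at the same time.
        nonce = find_nonce(header, target, lo, hi)
        if nonce is not None:
            hits.append(nonce)
    return min(hits) if hits else None
```

Splitting the range does not change the answer, only who does which part of the work: search_in_slices(header, target, total, 8) returns the same nonce as find_nonce(header, target, 0, total).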
This observation may motivate a search for problems that cannot be parallelized very well.
For example, assume you have a number x and want to calculate sha(x). Fine, we have an algorithm for that. But now let us calculate sha(sha(x)). How would we parallelize this? We FIRST have to calculate sha(x), and only THEN can we evaluate the hash function again, because the second call needs the output of the first. An additional computer does not help me. We know of no shortcut that would allow parallelization. (There are functions where we do know shortcuts: adding the same number many times can be expressed as a single multiplication. For sha, for example, no such shortcut is known.)
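The contrast between the two cases can be sketched in a few lines of Python. This is only an illustration: it assumes "sha" means SHA-256, and the function names iterated_sha256, repeated_add and repeated_add_shortcut are invented for this example.

```python
import hashlib


def iterated_sha256(x: bytes, rounds: int) -> bytes:
    """Apply SHA-256 `rounds` times; every round needs the previous round's output."""
    h = x
    for _ in range(rounds):
        h = hashlib.sha256(h).digest()  # cannot start before the previous h exists
    return h


def repeated_add(a: int, times: int) -> int:
    """Add `a` to zero `times` times, one step after another."""
    total = 0
    for _ in range(times):
        total += a
    return total


def repeated_add_shortcut(a: int, times: int) -> int:
    """The known shortcut for repeated addition: a single multiplication."""
    return a * times
```

Here iterated_sha256(x, 2) corresponds to sha(sha(x)). For repeated_add the loop can be skipped entirely, while for iterated_sha256 no comparable shortcut is known, so the only way to get the result of round n is to go through rounds 1 to n-1 first.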
So the idea was to replace the proof-of-work problem in Bitcoin (which currently is highly parallelizable) with a problem that is not parallelizable at all. (As outlined, this is only part of the concept, because a completely serialized proof-of-work would not lead to the probabilistic behaviour we want.)
Hope I got the point where I lost you.