Now suppose it is you and me and some 40 other guys with the same hash performance as you have in your example. Suppose I want to claim a 100 BTC bounty for every block instead of the standard 50 BTC. Chances are next to 100% that I will manage. Since, on average, I am faster than you (and all the other guys combined), I will dominate the longest chain in the long run.
Ok, you're definitely confused about the capabilities of someone with >50% of the hashing power. He cannot do things like put a 100 BTC generation transaction in every block. Such blocks are invalid and will be rejected by the network (particularly the nodes that actually accept bitcoins for goods and services). In other words, these will not be Bitcoin blocks - the rest of the network will happily continue to build the Bitcoin chain, while he enjoys his own isolated make-believe chain.
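To make the point concrete, here is a minimal sketch of the kind of check a validating node applies. The function names and the flat 50 BTC constant are assumptions for illustration only, not code from any actual client. It shows that an oversized generation transaction is rejected no matter how much hashing power produced the block.

```python
# Sketch only: these names are hypothetical, not taken from any Bitcoin client.
SUBSIDY_BTC = 50  # block reward at the time of this discussion


def generation_is_valid(claimed_btc, fees_btc=0):
    """A block may claim at most the subsidy plus the fees of its transactions."""
    return claimed_btc <= SUBSIDY_BTC + fees_btc


def blocks_accepted(claims):
    """Count the blocks a validating node accepts from a chain of claimed rewards.

    The node stops at the first invalid block. Nothing built on top of it
    counts, regardless of the hashing power behind it.
    """
    accepted = 0
    for claimed_btc in claims:
        if not generation_is_valid(claimed_btc):
            break
        accepted += 1
    return accepted


print(generation_is_valid(50))              # True
print(generation_is_valid(100))             # False
print(blocks_accepted([50, 50, 100, 100]))  # 2
```

The attacker's chain may be longer by block count, but honest nodes never get past its first invalid block, so length is irrelevant.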
Let's say that in this system a person with a computer finds one block per month. Then four people with a computer each should find a total of 4 blocks per month, right?
Why?
The perspective I am looking at is not the single block but the development of the block chain.
As soon as one of the four people has found a block, this person broadcasts it, and the puzzles the other three had been working on become obsolete (at least that's my understanding of what the reference implementation does). Only a cheater would be interested in continuing to work on "his" version of the block; however, having lost the block in question, chances are getting higher that he will not manage to push "his" version of the next block.
Four people with a computer would rather find a total of 4 blocks in FOUR months - and these blocks would be the four blocks chained next to each other, i.e. a block chain of length 4.
Does your system maintain the notion that each given block is found by some specific individual? If so, if 4 people find 4 blocks in 4 months, it means each person finds 1 block in 4 months, contrary to the premise that each person finds 1 block per month...
If it wasn't clear, in this example the intention was that the 4 people aren't all there is, there are 4000 more similar people each finding 1 block per month, for a total of 4000 blocks per month. So again, if 4 people find 1 block per month each, then between them they find 4 blocks per month.
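A quick simulation of this example may help. It is a rough sketch under simplifying assumptions of my own: the network finds exactly one block per miner per month, and every block is won by a uniformly random miner, since all have equal hashing power. The function name and parameters are made up for illustration.

```python
import random


def blocks_per_month(group_size, network_size, months, seed=0):
    """Average number of blocks per month won by a group of equal-power miners.

    Assumptions (for illustration): the whole network finds network_size
    blocks every month, and each block goes to a uniformly random miner.
    """
    rng = random.Random(seed)
    share = group_size / network_size      # the group's fraction of hashing power
    total_blocks = network_size * months
    won = sum(1 for _ in range(total_blocks) if rng.random() < share)
    return won / months


# One miner out of 4004 versus a group of four of them.
print(blocks_per_month(1, 4004, months=100))
print(blocks_per_month(4, 4004, months=100))
```

The group's average should come out close to four times the single miner's. Someone else finding a block does not slow the four down, because no progress is lost when the puzzle changes.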
And, once more - pools are not a security threat ...
How do you prevent a pool from pooling more than 50% of the hashing power and then imposing its own understanding of Bitcoin upon the remaining nodes?
Because the pool shouldn't be the one deciding what goes in a block. As was explained, a pool is essentially just an agreement to share rewards. Even in centralized pools (and like I said there are decentralized ones), all the operator needs is to verify that miners intend to share rewards, by checking that they find shares which credit the pool in the generation transaction. But everything else can be chosen by the miner.
This is a future fix, however - currently centralized pools do tell miners what to include in the block. But miners can still verify that they're building on the latest block, so they can detect pools attempting a double-spend attack (which is the main thing you can do with >50%).
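As a sketch of the check a miner could run, assuming he also runs his own node and therefore knows the current chain tip: the `PoolWork` type and its field are hypothetical and do not reflect any real pool protocol.

```python
from dataclasses import dataclass


@dataclass
class PoolWork:
    """Hypothetical unit of work handed out by a pool operator."""
    prev_block_hash: str  # the block the pool asks the miner to build on


def builds_on_latest(work, local_tip_hash):
    """True if the pool's work extends the tip the miner's own node sees."""
    return work.prev_block_hash == local_tip_hash


local_tip = "tip-hash-seen-by-my-own-node"
honest = PoolWork(prev_block_hash=local_tip)
suspicious = PoolWork(prev_block_hash="some-older-block")

print(builds_on_latest(honest, local_tip))      # True
print(builds_on_latest(suspicious, local_tip))  # False
```

A short mismatch can be innocent while a fresh block is still propagating, so a persistent mismatch is the signal to watch for, not a single one.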
Block finding follows a Poisson process, which means that the time to find a block follows an exponential distribution (where the variance is the square of the mean). The variance is high, but that's an inevitable consequence of the fair, linearly scaling process.
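The variance claim is easy to check numerically. The sketch below assumes a mean block time of 10 minutes (Bitcoin's target, chosen here just for illustration) and draws samples with the standard library's `random.expovariate`. The function name is mine.

```python
import random


def block_time_stats(mean_minutes=10.0, samples=200_000, seed=1):
    """Sample exponential block times and return (sample mean, sample variance)."""
    rng = random.Random(seed)
    # expovariate takes the rate, i.e. 1 / mean.
    times = [rng.expovariate(1.0 / mean_minutes) for _ in range(samples)]
    mean = sum(times) / samples
    variance = sum((t - mean) ** 2 for t in times) / samples
    return mean, variance


mean, variance = block_time_stats()
print("mean:", mean)
print("variance:", variance)
print("mean squared:", mean ** 2)
```

The last two printed numbers should be close to each other, which is just another way of saying the standard deviation of a single block time equals its mean.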
Again you are raising an important aspect. The task thus is to see that two goals can be balanced: Linear scaling and small variance.
Variance in block finding times is unwanted, but I think most will agree it pales in comparison to the other issues involved. Especially since there are basically two relevant timescales - "instant" (0 confirmations) and "not instant". The time for 10 confirmations follows an Erlang distribution with shape 10, which has less variance relative to its mean (the standard deviation is about 32% of the mean, versus 100% for a single block).
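To illustrate how waiting for more confirmations smooths things out, here is a small sketch that estimates the standard deviation divided by the mean for the time to k confirmations, modelled as a sum of k independent exponential block times. The function name and sample sizes are arbitrary choices of mine.

```python
import math
import random


def relative_spread(confirmations, samples=50_000, seed=2):
    """Estimate std dev / mean of the waiting time for the given confirmations.

    Each waiting time is a sum of independent exponential block times,
    i.e. an Erlang-distributed value with shape `confirmations`.
    """
    rng = random.Random(seed)
    waits = [
        sum(rng.expovariate(1.0) for _ in range(confirmations))
        for _ in range(samples)
    ]
    mean = sum(waits) / samples
    variance = sum((w - mean) ** 2 for w in waits) / samples
    return math.sqrt(variance) / mean


print(relative_spread(1))   # theory: 1
print(relative_spread(10))  # theory: 1 / sqrt(10), about 0.316
```

In absolute terms the variance of the 10-confirmation time is ten times that of one block. It is the spread relative to the mean that shrinks, by a factor of sqrt(10).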
I agree that the Poisson process is a very natural solution here and prominently unique due to a number of its characteristic features, such as independence, being memoryless and stateless etc. A non-parallelizable PoW will certainly lose the stateless property. If we drop this part, how will the linear scaling (effort to expected gain) and the variance change? We will not have all properties of Poisson, but we might keep most of the others. The question sounds quite interesting to me.
By all means you should pursue whatever research question interests you, but I expect you'll be disappointed both in finding a solution satisfying your requirements, and in its potential usefulness.