Vitalik Buterin describes the attacker's precise strategy for selfish mining at bitcoinmagazine.com; the relevant part is excerpted below:
Suppose the attacker's portion of the network hashpower is X, and when there are two competing public chains the portion of the network that picks up on the attacker's chain is Z.
So X is the attacker's share of the hashpower, and Z is the fraction of honest miners (equivalently, the probability that an honest miner) who decide to join the adversary's chain.
State 0: If the attacker's private chain is the same as the public chain, mine on the private chain. With probability X, the attacker discovers a block and advances to state 1 (private chain 1 block ahead). With probability 1-X, the public network discovers a block, and the attacker resets his private chain to the public chain.
This does not make sense to me. Why would the adversary have a private chain? There is no advantage for him.
In State 0 there should be only one chain, and everyone mines on top of it. The rest of your statement is correct. The adversary finds the next block with probability X, which moves us to State 1. He won't publish it but keeps it to himself, and he starts mining on top of his new 'private' chain. The rest of the network is still trying to find a block that the adversary has already found, and is therefore wasting hashing power. If the honest miners find the next block instead (probability 1-X), we remain in State 0 and the game starts from the beginning.
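To put a number on "the game starts from the beginning": the following is a small sketch, assuming each new block is the attacker's with probability X independently of earlier blocks. The function names are made up for illustration. It computes how many honest blocks go by, on average, before the attacker first gets his private lead.

```python
def prob_lead_after(k, x):
    """Probability that the honest miners find exactly k blocks in a row
    (each time staying in State 0) before the attacker finds his first
    private block and moves to State 1."""
    return (1 - x) ** k * x


def expected_honest_blocks_before_lead(x):
    """Mean of the geometric distribution above: (1 - x) / x."""
    return (1 - x) / x


if __name__ == "__main__":
    x = 0.25  # hypothetical attacker share of the hashpower
    print(expected_honest_blocks_before_lead(x))
    print([round(prob_lead_after(k, x), 4) for k in range(4)])
```

So an attacker with a quarter of the hashpower waits, on average, for three honest blocks before he reaches State 1.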
State 1: If the attacker's private chain is 1 longer than the public chain, mine on the private chain. With probability X, the attacker advances to state 2 (private chain 2 blocks ahead). With probability 1-X, the public network discovers a block, setting the system to state 0′.
The attacker's chain is 1 longer.
From state 0, with probability X, the attacker gets 1 block ahead and keeps it unexposed, so the public network keeps working on that block the whole time. For the move to State 2, I doubt that the probability is still X, because the public network has been working all along, while the attacker only starts on the next block once he/she is 1 block ahead, and he/she may need some time to collect transactions.
In State 0 the attacker and the honest miners are working on the same chain. The attacker is not 1 block ahead. That is the definition of State 0.
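When you are in State 2, the attacker already has 2 blocks and holds them back. The probability that the adversary finds the next block is still X, and the honest miners' probability of finding it is 1-X. Mining is memoryless: every hash attempt is an independent trial, so the time the public network has already spent on its block gives it no head start.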
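If the "still X" part feels suspicious, a quick simulation may help. This is only a sketch under the usual modelling assumption that block discovery is a Poisson process, so the waiting time for a miner with hashpower share p is exponential with rate p. The function name and parameters are hypothetical. It gives the honest network a head start (time already spent mining without success) and estimates how often the attacker still finds the next block first.

```python
import random


def attacker_win_rate(x, head_start, trials=100_000, seed=1):
    """Estimate P(attacker finds the next block first), given that the
    honest network has already mined for `head_start` time units
    without finding a block."""
    rng = random.Random(seed)
    wins = 0
    done = 0
    while done < trials:
        honest_total = rng.expovariate(1 - x)
        if honest_total <= head_start:
            # The honest network would already have found its block,
            # which is not the situation we are conditioning on.
            continue
        honest_remaining = honest_total - head_start
        attacker_time = rng.expovariate(x)  # attacker starts only now
        if attacker_time < honest_remaining:
            wins += 1
        done += 1
    return wins / trials


if __name__ == "__main__":
    x = 0.3  # hypothetical attacker share
    for head_start in (0.0, 1.0, 3.0):
        print(head_start, attacker_win_rate(x, head_start))
```

Under that assumption all three estimates should come out close to x, whatever the head start. Any time the attacker really needs to assemble transactions is not modelled here.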
If the adversary finds a new block, he will be 3 blocks ahead and hold them back (State 3).
If the honest miners find a block instead, they are only 1 block behind. The adversary then releases his 2 blocks. Since his chain is longer, it wins and he receives the rewards for his blocks.
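The states discussed above can be put together in a rough simulation. Treat it as a sketch, not as the article's code: States 0, 1 and 2 follow the description above, while the handling of state 0′ (a race between two public chains of equal length, split by Z) and of leads larger than 2 (publish one block each time the public network finds one) is an assumption taken from the usual description of selfish mining. The function name and default values are hypothetical.

```python
import random


def simulate_selfish_mining(x, z, rounds=200_000, seed=7):
    """Return the attacker's share of blocks on the final main chain."""
    rng = random.Random(seed)
    lead = 0       # private chain length minus public chain length
    race = False   # True while in state 0' (two competing public chains)
    attacker = 0   # attacker blocks that end up on the main chain
    honest = 0     # honest blocks that end up on the main chain
    for _ in range(rounds):
        attacker_found = rng.random() < x
        if race:
            if attacker_found:
                attacker += 2   # attacker extends his own branch
            elif rng.random() < z:
                attacker += 1   # an honest miner built on the attacker's block
                honest += 1
            else:
                honest += 2     # the honest branch wins the race
            race = False
        elif attacker_found:
            lead += 1           # State n -> State n+1, block kept private
        elif lead == 0:
            honest += 1         # State 0: attacker adopts the public chain
        elif lead == 1:
            race = True         # State 1 -> state 0': attacker publishes
            lead = 0
        elif lead == 2:
            attacker += 2       # State 2: release both blocks and win
            lead = 0
        else:
            attacker += 1       # assumed rule for lead > 2: publish one block
            lead -= 1
    return attacker / (attacker + honest)


if __name__ == "__main__":
    for x, z in ((0.1, 0.0), (0.4, 0.5)):
        print(x, z, simulate_selfish_mining(x, z))
```

Comparing the returned share with x itself shows whether withholding blocks pays off for a given X and Z. Under these assumptions a small attacker ends up with less than his fair share, while a large one ends up with more.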