CAP does not apply to distributed systems. Sure, state can be inconsistent, but that's not necessarily a problem. Consistency is usually over-rated by academics, and that's how one often ends up with misleading theorems. Any two computers communicating over a distance will be out of sync. The basis of blockchains is Lamport's work on how communication can happen in such a distributed system. In essence there needs to be consensus on a
partial order of events. Total consensus on one variable is impossible, since information can't travel faster than light, but it is not required. If node A knows that X = 1, he sends a message to B "X = 1". But it might be that before that message reaches B, that X = 2. This in itself is not a problem. Bitcoin's PoW indeed solves the double-spending problem. Its possible even at a distance to know what happens first, as long as Peers are honest (if A sends message X = 3, although X=1, the result will be false in any case independent of the order). Say in Bitcoin everyone knows that Alice owns 1 BTC. If she sends a message to all peers, that she wishes to transfer her wealth to Mallory, then this message will only be applied after a rather long interval - the block cycle. Blockchains doesn't mean that all nodes agree on a total order of events. It means consensus on partial order of events. The most important article for understanding blockchains very few seem to have read and understood is:
http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf . Lamport also came up later with the Byzantine Generals problem.