So difficulty is adjusted with each block according to the network hash power. Network hash power is measured by the speed with which a block is discovered, correct?
Half correct. Bitcoin's difficulty is adjusted every 2016 blocks to take an average. Bitcoin doesn't actually care about the hashpower. They only care about the time it takes between each block and tries to get it to as close as 10 minutes as possible.
So how is a 51% attack possible? As I understand it, the new block header must be hashed using the previous block header. Why then does controlling a certain amount of the network power enable double spends?
Bitcoin is designed in such a way such that the longest chain with the valid proof of work is always correct. With 51% attack, the attacker is able to generate blocks faster than anyone else on the network. The attacker can first spend a transaction [A] and get it confirmed. Next, he will start to mine blocks starting from the point before the transaction [A] is confirmed and include a transaction [C] which spends Bitcoin to a separate place. The attacker will broadcast his own chain once he outpaces the legitimate chain. Since everyone thinks that the longest valid chain (difficulty wise) is correct, they will follow the attackers chain.