The main problem I see coming is that even the slowest announced ASIC (BFL's Jalapeno) can consume more than one getwork per second, and most devices are a lot faster. Pulling work generation away from the pools and closer to the actual device looks like a decent way to deal with that. I'm not aware of any huge issues with timestamp rolling, as long as the devices can get enough getworks, especially right after a longpoll, when they need several new pieces of work at once to start on the new block (small surges of getworks every 10 seconds, whenever a block comes out, seem excessive).
On my desktop, P2Pool can supply ~130 getworks/second. Each getwork covers the full 32-bit nonce range (2^32 hashes, roughly 4 GH), so that is enough work for 520 GH/s (130/s * 4 GH) without any timestamp rolling. With timestamp rolling one minute backwards and forwards (120 usable timestamps per getwork), it can supply 62 TH/s (520 GH/s * 120) of work. Put another way, a single getwork per second is enough for 480 GH/s (1/s * 4 GH * 120).
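The arithmetic above can be reproduced with a short sketch. The function name and constants are made up for illustration; the 4 GH figure is the rounded value used in the post (the exact nonce range is 2^32, about 4.29 GH), and the rolling window is assumed to give one extra nonce range per second of timestamp.

```python
# Rounded hashes per getwork, as used in the post (exact value is 2**32).
HASHES_PER_GETWORK = 4_000_000_000


def supported_hashrate(getworks_per_second, rolled_timestamps=1):
    """Hashes per second that a given getwork rate can keep busy.

    rolled_timestamps is the number of distinct timestamp values a miner
    may use per getwork (1 means no rolling). Each value yields another
    full nonce range from the same getwork.
    """
    return getworks_per_second * HASHES_PER_GETWORK * rolled_timestamps


no_rolling = supported_hashrate(130)           # 520 GH/s
with_rolling = supported_hashrate(130, 120)    # 62.4 TH/s
one_per_second = supported_hashrate(1, 120)    # 480 GH/s

print(f"{no_rolling / 1e9:.0f} GH/s without rolling")
print(f"{with_rolling / 1e12:.1f} TH/s with +/- 1 minute of rolling")
print(f"{one_per_second / 1e9:.0f} GH/s from one getwork per second")
```

Plugging in 2**32 instead of the rounded constant raises each figure by about 7%.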
I don't think ASICs will need any special support. P2Pool can provide getwork results fast enough for hundreds of GH/s (from a normal computer) and could be optimized for more. In addition, any timestamp rolling multiplies that.
What about mining on a remote node? It seems like ASICs could kill P2P Mining.
The earlier argument (^) about remote miners not working was a bit of a farce, as shown by the above numbers.
The thing to focus on, to make sure this works, is timestamp rolling support in ASIC mining software. We need to ensure that it can roll the timestamp ahead of the current time (and potentially backwards, though that would require an extension to getwork).
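For anyone unfamiliar with what rolling the timestamp means at the byte level, here is a minimal sketch. It assumes a raw 80-byte block header in standard serialization (not the byte-swapped, padded `data` field that getwork actually returns), and the helper names are hypothetical rather than taken from any mining software.

```python
import struct

HEADER_SIZE = 80
# version (4 bytes) + previous block hash (32) + merkle root (32)
NTIME_OFFSET = 68


def roll_ntime(header: bytes, delta: int) -> bytes:
    """Return a copy of the header with its timestamp shifted by delta seconds."""
    if len(header) != HEADER_SIZE:
        raise ValueError("expected an 80-byte block header")
    (ntime,) = struct.unpack_from("<I", header, NTIME_OFFSET)
    rolled = ntime + delta
    if not 0 <= rolled <= 0xFFFFFFFF:
        raise ValueError("rolled timestamp does not fit in 32 bits")
    return (
        header[:NTIME_OFFSET]
        + struct.pack("<I", rolled)
        + header[NTIME_OFFSET + 4:]
    )


def rolled_headers(header: bytes, seconds_ahead: int):
    """Yield the original header, then one header per second rolled forward.

    Each yielded header gives the device a fresh 32-bit nonce range
    without asking the pool for another getwork.
    """
    for delta in range(seconds_ahead + 1):
        yield roll_ntime(header, delta)
```

A negative `delta` rolls backwards, which is the case that would need the getwork extension mentioned above so the server can say how far back is acceptable.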
I looked at GBT, and I don't see it improving anything. It requires sending the entire block template (potentially up to 1 MB) to miners every 10 seconds, which is definitely impractical for remote miners. An extension to GBT that allowed sending only the merkle root would avoid this, though.
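A rough back-of-the-envelope comparison of the two cases, as a sketch only: it takes the worst-case 1 MB template and the 10-second interval from the post, counts just the 32-byte merkle root for the hypothetical extension, and ignores protocol overhead and the other header fields.

```python
def bytes_per_second(payload_bytes, interval_seconds):
    """Average per-miner bandwidth for a payload resent every interval."""
    return payload_bytes / interval_seconds


FULL_TEMPLATE_BYTES = 1_000_000  # assumed worst case ("up to 1 MB")
MERKLE_ROOT_BYTES = 32           # one double-SHA-256 digest
INTERVAL_SECONDS = 10            # new work every 10 seconds

full = bytes_per_second(FULL_TEMPLATE_BYTES, INTERVAL_SECONDS)
root_only = bytes_per_second(MERKLE_ROOT_BYTES, INTERVAL_SECONDS)

print(f"full template:    {full / 1000:.0f} kB/s per miner")
print(f"merkle root only: {root_only:.1f} B/s per miner")
```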