The main problem I see coming is that even the slowest announced ASIC (BFL's Jalapeno) can consume more than one getwork per second, and most devices are a lot faster. Pulling work generation away from the pools and closer to the actual device is a decent-looking way to deal with that. I'm not aware of any huge issues with timestamp rolling as long as the devices can get enough getworks, especially after a longpoll, when they need fresh pieces of work to start on the new block (small surges of getworks every 10 seconds when a block comes out seem excessive).
On my desktop, P2Pool can supply ~130 getworks/second, which is enough work for 520 GH/s (130/s * 4 GH per getwork) without any timestamp rolling. With timestamps rolled one minute backwards and forwards (120 usable seconds), that becomes ~62 TH/s (520 GH/s * 120). Put another way, a single getwork per second is enough for 480 GH/s (1/s * 4 GH * 120).
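To make the arithmetic easy to redo with other numbers, here is a quick sketch. The function name and parameters are mine, not anything from P2Pool. It uses the same rounded 4 GH per getwork as the figures above; the exact size of a 32-bit nonce range is 2**32, about 4.29 GH, so the real numbers come out slightly higher.

```python
# Back-of-the-envelope check of the throughput figures quoted above.
# supported_hashrate() is a hypothetical helper, not part of P2Pool.

ROUNDED_HASHES_PER_GETWORK = 4_000_000_000  # the "4 GH" rounding used in the post
EXACT_HASHES_PER_GETWORK = 2**32            # full 32-bit nonce range


def supported_hashrate(getworks_per_sec, rolled_seconds=1,
                       hashes_per_getwork=ROUNDED_HASHES_PER_GETWORK):
    """Hashes/second that a given getwork supply can keep busy.

    rolled_seconds is how many distinct timestamps each getwork may be
    reused with (1 means no timestamp rolling).
    """
    return getworks_per_sec * hashes_per_getwork * rolled_seconds


no_rolling = supported_hashrate(130)        # 520 GH/s
with_rolling = supported_hashrate(130, 120) # 62.4 TH/s
one_per_sec = supported_hashrate(1, 120)    # 480 GH/s

print(no_rolling / 1e9, "GH/s without rolling")
print(with_rolling / 1e12, "TH/s with +/- 1 minute of rolling")
print(one_per_sec / 1e9, "GH/s from one getwork per second")
```

Passing `hashes_per_getwork=EXACT_HASHES_PER_GETWORK` gives the unrounded figures.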
I don't think ASICs will need any special support. P2Pool can provide getwork results fast enough for hundreds of GH/s (from a normal computer) and could be optimized for more. In addition, any timestamp rolling multiplies that.
What about mining on a remote node? It seems like ASICs could kill P2P Mining.
The earlier argument (^) about remote miners not working was a bit of a farce, as shown by the above numbers.
The thing to focus on to make sure this works is timestamp rolling support in ASIC mining software. We need to ensure that it can roll ahead of the current time (and potentially backwards, though that would require an extension to getwork).
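For anyone unsure what rolling the timestamp means mechanically, here is a minimal sketch. It assumes a raw 80-byte block header in standard serialization; the `data` field returned by getwork is encoded differently, so this is an illustration and not a drop-in for any miner. `roll_ntime` is a made-up helper name.

```python
import struct

HEADER_LEN = 80
# Header layout: version (4) + previous block hash (32) + merkle root (32),
# then the 4-byte little-endian timestamp, then bits (4) and nonce (4).
NTIME_OFFSET = 68


def roll_ntime(header: bytes, delta: int) -> bytes:
    """Return a copy of the header with its timestamp shifted by delta seconds.

    A positive delta rolls forwards, a negative one rolls backwards.
    Each distinct timestamp gives the device a fresh 32-bit nonce range.
    """
    if len(header) != HEADER_LEN:
        raise ValueError("expected an 80-byte block header")
    (ntime,) = struct.unpack_from("<I", header, NTIME_OFFSET)
    rolled = struct.pack("<I", ntime + delta)
    return header[:NTIME_OFFSET] + rolled + header[NTIME_OFFSET + 4:]


# Dummy header with timestamp 1000 and every other field zeroed.
header = bytes(68) + struct.pack("<I", 1000) + bytes(8)
forward = roll_ntime(header, 5)
backward = roll_ntime(header, -5)
```

How far the timestamp can be moved before a block gets rejected depends on the network's rules and is not checked here.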
...
Sorry - I don't get the point of this comment.
Unless I've completely missed your point, you don't need to roll the time with Stratum or GBT. You roll a value in the coinbase instead, named (IMO badly) the secondary nonce, rather than screwing with the block timestamp, which is an unnecessary hack.
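A rough sketch of what rolling a value in the coinbase looks like, under my own assumptions: the coinbase transaction is split into two fixed halves with the rolled value spliced between them, and the merkle root is rebuilt by hashing up a branch of sibling hashes. The function and parameter names are illustrative, the byte-order details of real Stratum or GBT implementations are glossed over, and the inputs below are dummy bytes.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root_for(coinbase_head: bytes, rolled_value: bytes,
                    coinbase_tail: bytes, branch: list) -> bytes:
    """Rebuild the merkle root after changing the rolled value in the coinbase.

    The coinbase is the leftmost leaf, so its hash is always combined
    on the left with each sibling hash in the branch.
    """
    node = dsha256(coinbase_head + rolled_value + coinbase_tail)
    for sibling in branch:
        node = dsha256(node + sibling)
    return node


# Dummy data: every rolled value yields a different merkle root, and so a
# different header with a fresh nonce range, without touching the timestamp.
head, tail = b"\x01" * 40, b"\x02" * 60
branch = [dsha256(b"tx1"), dsha256(b"tx2")]
roots = {merkle_root_for(head, n.to_bytes(4, "little"), tail, branch)
         for n in range(1000)}
```

The point is that the miner only needs the two coinbase halves and the branch once, and can then generate as much work as it wants locally.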