Once the RAM is full, every operation is a read, so the work will be perfectly parallel, with some negligible constant-time overhead.
Improvements against scrypt for the birthdayhash will help a bit, but as you said, any tricks generally translate fine into ASICs.
For your example, since the Mac Pro has 16 times more RAM and 16 times more hashing power, it will be exactly 16^2, or 256, times better. That is well worth the 50x cost. Since the scaling is quadratic, it will only get worse once you get into 128GB servers.
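To make the arithmetic concrete, here is a rough sketch of that scaling. It assumes a simple model in which, once the table is full, the expected collision rate is proportional to (hashes per second) x (entries held in RAM). The function name, the normalized units, and the size of the hash space are placeholders for illustration, not part of any actual implementation.

```python
def collision_rate(hashes_per_sec, stored_entries, space_size):
    """Expected collisions per second once the table is full.

    Assumed model: each new hash matches one of the stored entries
    with probability stored_entries / space_size.
    """
    return hashes_per_sec * stored_entries / space_size


SPACE = 2 ** 50  # hypothetical size of the birthday space

# Baseline machine, normalized to 1 unit of hashing power and 1 unit of RAM.
base = collision_rate(1.0, 1.0, SPACE)

# Machine with 16x the RAM and 16x the hashing power.
big = collision_rate(16.0, 16.0, SPACE)

speedup = big / base             # quadratic: 16 * 16
speedup_per_cost = speedup / 50  # the bigger machine costs 50x as much

print(speedup, speedup_per_cost)
```

Under this model the bigger machine finds collisions 256 times faster, which is still about 5 times better per dollar after the 50x price difference.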
I like the elegance of having the RAM be the main resource for hashing, but I think we have a ways to go before making it practical. I'll keep thinking about it.
Nathaniel