In NoodleDoodle's performance commit, he noted a benchmark of 2.5 ms/tx on an i7-2600. That's 400 tx/sec on a 2011 desktop. A reasonably priced current-gen server (say dual-Xeon 10-core CPUs) is probably several times faster, so close to 5K/sec, but I don't know the exact numbers. There is still more optimization available (we aren't using the most optimized elliptic curve asm library available from Bernstein, for example, just his sort-of-optimized C library).
With the move to RingCT, it will probably be different (though some of the differences will offset, such as having fewer outputs/tx), and we will have to reevaluate.
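For anyone who wants to check the arithmetic, here is a rough sketch of the numbers quoted above. The 2.5 ms/tx and 5K/sec figures come from the post; the speedup is only what those two figures imply, not a measured value, and the names are made up for illustration.

```python
# Figures quoted in the post; nothing here is a new measurement.
MS_PER_TX = 2.5      # benchmark on the i7-2600
TARGET_TPS = 5_000   # the "close to 5K/sec" server estimate


def tx_per_second(ms_per_tx: float) -> float:
    """Convert a per-transaction verification time into throughput."""
    return 1000.0 / ms_per_tx


desktop_tps = tx_per_second(MS_PER_TX)
# Speedup a server would need over the 2011 desktop to hit the estimate.
required_speedup = TARGET_TPS / desktop_tps

print(f"desktop: {desktop_tps:.0f} tx/sec")
print(f"speedup needed for {TARGET_TPS} tx/sec: {required_speedup:.1f}x")
```

So the 5K/sec estimate assumes the server is about 12.5 times faster than the desktop overall, which would have to come from more cores plus faster cores.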
Sweet!

When I tell people about this good news during my Monero evangelizing, how do I explain why our sig_ops are so much faster than Old Grandpa Bitcoin's?
Or should I even bother, since Bitcoin's verification is being fixed (and supercharged) with segwit?
I did some thumb-sucking math a number of pages ago: I "built" a single system server that exists today (8-way) that could process 40-something-k TPS IIRC.
Edit: found post:
https://bitcointalk.org/index.php?topic=753252.msg12768096#msg12768096
This would suggest to me that 120k TPS would cost ~$200,000 for just the system(s).
Assuming a transaction size of 2 KB (made up), we'd need a ~1.9 Gbps link.
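The bandwidth figure can be reproduced with a few lines. This is only a back-of-the-envelope sketch: the 2 KB transaction size is the made-up number from above, KB is assumed to mean 1000 bytes, and protocol overhead and relaying to multiple peers are ignored.

```python
TPS = 120_000          # target throughput from the estimate above
TX_SIZE_BYTES = 2_000  # 2 KB per transaction (made up; decimal KB assumed)


def link_gbps(tps: int, tx_bytes: int) -> float:
    """Raw link rate in gigabits per second for a given transaction load."""
    bits_per_second = tps * tx_bytes * 8
    return bits_per_second / 1e9


print(f"{link_gbps(TPS, TX_SIZE_BYTES):.2f} Gbps")
```

Using 2048-byte transactions instead gives slightly under 2 Gbps, so the rough figure holds either way.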
Super sweet!!!

Watch your back, Visa.
Soon...