I've only tested the g2.8xlarge, which gives a combined CPU + GPU figure of 1.8k or so.
I'm guessing the p2.8xlarge would be somewhere near the 3-3.5k mark and the p2.16xlarge twice that, so roughly 6-7k.
I'll check whether there's a preexisting AMI with CUDA set up so I can launch a spot instance and run a benchmark. (If there isn't, I can't be bothered to set it up from scratch, sorry.)