A simple compile of the unoptimized design(1) in Quartus 12.1, targeting an A7 device, gives the following result:
; Device ; 5SGXEA7K2F40C2 ;
; Logic utilization (in ALMs) ; 32,617 / 234,720 ( 14 % ) ;
For what it's worth, I used 5SGXEA7H1F35C1 and got 11% utilization (~8 cores?), and Fmax >200MHz.
The only change I made was a new PLL megafunction.
I have a serial interface and some other communication logic in there, which could explain the extra ALMs.
200MHz Fmax on the hash clock? That's more like what I would expect, if not faster, for such a device. Did you use derive_pll_clocks and derive_clock_uncertainty in the SDC file for your timing analysis?
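For reference, a minimal SDC sketch using those two commands might look like the following. The clock name, the port name clk_in, and the 100 MHz period are placeholders of mine, not taken from the design above, so adjust them to match the actual reference clock feeding the PLL.

```tcl
# Base clock on the PLL reference input.
# ASSUMPTION: the port is called clk_in and runs at 100 MHz (10 ns period).
create_clock -name clk_in -period 10.000 [get_ports clk_in]

# Create generated clocks for every PLL output, based on the PLL settings.
derive_pll_clocks

# Apply clock uncertainty (jitter, skew) to clock-to-clock transfers.
derive_clock_uncertainty
```

Without derive_pll_clocks the PLL output that drives the hash logic may not be constrained at all, in which case the reported Fmax may not reflect the hash clock.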