trying to build a depth:=3 version right now.
estimates after synthesis:
slice LUTs: 54% (53% used as logic)
slice registers: 26%
occupied slices: 66%
with a 50 MHz target clock, p&r takes forever and eventually fails with setup violations.
the problem is congestion/routing, not a lack of resources in terms of FFs or LUTs...
if you have the time, just give it a try with an xc6slx45-2csg324 at 50 MHz and depth:=3.
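if you want to reproduce the 50 MHz target, a minimal ISE UCF period constraint would look something like the fragment below. this is a sketch, not the constraint from my project: the clock net name "clk" and the timespec name "TS_clk" are hypothetical placeholders, so substitute your own top-level clock net.

```
# hypothetical clock net "clk": 50 MHz -> 20 ns period, 50% duty cycle
NET "clk" TNM_NET = "clk";
TIMESPEC "TS_clk" = PERIOD "clk" 20 ns HIGH 50%;
```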
increasing the frequency is not an option: with depth:=2, the "timing performance" design goal reports just 55 MHz after p&r.
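for scale, converting those two frequencies into clock periods (plain arithmetic, nothing measured beyond the numbers above):

$$
T_{50} = \frac{1}{50\,\text{MHz}} = 20\,\text{ns}, \qquad
T_{55} = \frac{1}{55\,\text{MHz}} \approx 18.2\,\text{ns}, \qquad
T_{50} - T_{55} \approx 1.8\,\text{ns} \;(\approx 9\%\text{ of } 20\,\text{ns})
$$

so even the depth:=2 build only has roughly 1.8 ns of headroom at 50 MHz, and any extra routing delay from the congestion in the depth:=3 build can easily use that up.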