Looks good! I tried to do the same thing on a V6 LX130T (use almost all the DSPs and pipeline the rest of the LUT adders), but there aren't enough registers in that device for the tx_w and tx_state delays. So many 512- and 256-bit registers...
BTW, what does XPower report for that design at 400 MHz?