That's exactly what I've been working on, for reasonable definitions of small and fast.
I think it's a promising avenue for research. It's the one I would choose too. Sinking big bucks into UltraScale+ FPGA boards and depending on one or two VHDL coders who know the subject matter seems risky, or at least adventurous.
Revisiting SHA-256 ASIC development history, I can think of at least two companies (CoinTerra, Spondoolies) that failed because they tried to design large-die-area, high-hash-rate chips and were late to market, and at least one company (Bitmain) that succeeded by designing smaller, simpler chips, using lots of them in each miner, and being first to market.