Shouldn't we be looking at FPGAs by now?
We should. But it would have to be one that can compete with the 20,000+ cores in a high-end GPU.
If it only has a small number of parallel units, each of them has to be so freaking fast that together they do more total jumps/s than the equivalent GPU (with [tens of] thousands of cores) does. Otherwise it would be slower overall.
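To make the break-even point concrete, here is a back-of-envelope calculator. It takes the RTX 4090 figures quoted in this post (about 90 billion 256-bit field muls/s, about six muls per jump) to get a GPU ceiling from multiplications alone, then works out what each FPGA unit must sustain. The FPGA unit count and clock below are purely hypothetical placeholders, and the function names are made up for illustration.

```python
# Back-of-envelope break-even estimate: FPGA vs. GPU for Kangaroo jumps.
# GPU figures are the ones quoted in the post; FPGA figures are hypothetical.

GPU_FIELD_MULS_PER_S = 90e9  # RTX 4090, raw 256-bit field mul/s (quoted figure)
MULS_PER_JUMP = 6            # average field muls per jump per kangaroo (quoted figure)


def gpu_jump_ceiling(field_muls_per_s: float, muls_per_jump: float = MULS_PER_JUMP) -> float:
    """Upper bound on GPU jumps/s from multiplications alone (ignores inversion etc.)."""
    return field_muls_per_s / muls_per_jump


def required_rate_per_unit(target_jumps_per_s: float, fpga_units: int) -> float:
    """Jumps/s each FPGA unit must sustain for the whole device to match the target."""
    return target_jumps_per_s / fpga_units


def cycle_budget_per_jump(target_jumps_per_s: float, fpga_units: int, clock_hz: float) -> float:
    """Throughput budget: clock cycles per completed jump, per unit."""
    return clock_hz / required_rate_per_unit(target_jumps_per_s, fpga_units)


if __name__ == "__main__":
    target = gpu_jump_ceiling(GPU_FIELD_MULS_PER_S)
    units, clock = 100, 500e6  # hypothetical FPGA: 100 jump units at 500 MHz
    print(f"GPU ceiling:       {target:.3g} jumps/s")
    print(f"needed per unit:   {required_rate_per_unit(target, units):.3g} jumps/s")
    print(f"cycle budget/jump: {cycle_budget_per_jump(target, units, clock):.3g} cycles")
```

With those made-up FPGA numbers, each unit would have to finish a jump (six 256-bit multiplications plus everything else) every ~3.3 clock cycles in throughput terms, which only a deeply pipelined design could hope to reach.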
To get a rough idea of whether it's worth it, I would start with field multiplication. I'm not sure whether Bernstein's 256-bit multiplier built from logic gates is the best one yet (or even whether it's public), but you can take it as a reference. On average there are six 256-bit multiplications per jump per kangaroo. Depending on the FPGA specs, you can then compare its raw performance against what a GPU can do: for example, an RTX 4090 can do around 90 billion 256-bit field multiplications/s at the very lowest level, before we even get to point addition and so on.
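For reference, this is what one 256-bit field multiplication involves in software: a full 512-bit product followed by a modular reduction. The sketch assumes the field is secp256k1's (p = 2^256 - 2^32 - 977), which the post does not state; for that prime the reduction is cheap because 2^256 ≡ 2^32 + 977 (mod p), so the high half can be folded back in with a small multiply. It is a plain reference model to check a hardware multiplier against, not Bernstein's gate-level design or any GPU kernel.

```python
import random

# Assumption: the field is secp256k1's, p = 2^256 - 2^32 - 977.
P = 2**256 - 2**32 - 977
FOLD = 2**32 + 977  # 2^256 mod P, since 2^256 = P + 2^32 + 977
MASK256 = (1 << 256) - 1


def reduce_512(x: int) -> int:
    """Reduce a product x < 2^512 modulo P using 2^256 = FOLD (mod P)."""
    # Each fold replaces hi * 2^256 with hi * FOLD; the first fold takes
    # x from up to 512 bits down to about 290 bits, later folds finish the job.
    while x >> 256:
        x = (x & MASK256) + (x >> 256) * FOLD
    # Now x < 2^256 < 2P, so at most one subtraction is needed.
    if x >= P:
        x -= P
    return x


def field_mul(a: int, b: int) -> int:
    """256-bit field multiplication: full 512-bit product, then fast reduction."""
    return reduce_512(a * b)


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(1000):
        a, b = rng.randrange(P), rng.randrange(P)
        assert field_mul(a, b) == a * b % P
    print("field_mul agrees with a * b % P")
```

In hardware terms, almost all the cost is in the 256x256-bit product itself; the folding step is a few narrow multiplies and additions.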
You can find very recent (2022) hardware designs for fast XGCD (extended GCD, used for modular inversion). Inversion is the bottleneck when running Kangaroo on a GPU: around 50% of the running time is spent on field inversion alone, even when doing just a single inversion per jump for a whole batch of thousands of kangaroos.
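The "single inversion for a batch" is typically done with Montgomery's batch-inversion trick: multiply all the values together, invert the product once, then recover each individual inverse with about three multiplications per element. In Kangaroo the values being inverted would be the x-differences in each affine point-addition slope. The sketch below assumes the secp256k1 field prime (not stated in the post) and uses the textbook extended Euclidean algorithm as the XGCD step; it is not one of the 2022 hardware designs, and the function names are illustrative.

```python
# Assumption: secp256k1 field prime; values must be nonzero mod P.
P = 2**256 - 2**32 - 977


def xgcd_inverse(a: int, p: int = P) -> int:
    """Textbook extended Euclid: returns a^-1 mod p (the single XGCD per batch)."""
    old_r, r = a % p, p
    old_s, s = 1, 0
    # Invariants: old_r = old_s * a (mod p) and r = s * a (mod p).
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ZeroDivisionError("value is not invertible mod p")
    return old_s % p


def batch_inverse(values: list[int], p: int = P) -> list[int]:
    """Montgomery's trick: n inverses for one inversion plus ~3n multiplications."""
    n = len(values)
    prefix = [0] * n
    acc = 1
    for i, v in enumerate(values):
        prefix[i] = acc          # product of values[0..i-1]
        acc = acc * v % p
    inv = xgcd_inverse(acc, p)   # 1 / (product of all values)
    out = [0] * n
    for i in range(n - 1, -1, -1):
        out[i] = inv * prefix[i] % p  # strip every factor except values[i]
        inv = inv * values[i] % p     # now 1 / (product of values[0..i-1])
    return out


if __name__ == "__main__":
    import random
    rng = random.Random(7)
    xs = [rng.randrange(1, P) for _ in range(4096)]  # e.g. one x-difference per kangaroo
    invs = batch_inverse(xs)
    assert all(x * y % P == 1 for x, y in zip(xs, invs))
    print("batch of", len(xs), "inverses computed with one XGCD")
```

Even with the inversion amortized this way, the single XGCD plus the ~3n extra multiplications per batch is still where a large share of the time goes, which is why a fast hardware XGCD is interesting.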
If the inversion is done in hardware, then an FPGA might end up faster overall than a GPU, or it might not, depending on the other factors.
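One way to see why "the other factors" matter is a quick Amdahl's-law estimate. Assuming the ~50% inversion share quoted above carried over to an FPGA design (an assumption, not a measurement), speeding up only the inversion by a factor k gives an overall gain of 1 / (0.5 + 0.5/k), which can never exceed 2x even for an infinitely fast inverter. The function name is illustrative.

```python
def amdahl_speedup(accel_fraction: float, accel_factor: float) -> float:
    """Overall speedup when accel_fraction of the runtime becomes accel_factor times faster."""
    return 1.0 / ((1.0 - accel_fraction) + accel_fraction / accel_factor)


INVERSION_SHARE = 0.5  # ~50% of GPU Kangaroo time in inversion (figure quoted in the post)

if __name__ == "__main__":
    for factor in (2, 10, 100, float("inf")):
        gain = amdahl_speedup(INVERSION_SHARE, factor)
        print(f"inversion {factor}x faster -> overall {gain:.2f}x")
```

So a hardware XGCD removes at most half of the work; whether the FPGA wins still comes down to how fast it does the remaining multiplications and point additions.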