I think getting something working with an FPGA would be the likely next step.
Do you think an FPGA would be suitable for this task?
Someone would need to write an implementation of
https://github.com/samr7/vanitygen/blob/master/calc_addrs.cl in Verilog. I don't know much about FPGAs, but you would also have to choose the right pin assignments. I like the idea though, since there should be a way to program the EC multiplications and additions into it.
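A Verilog port would need a software "golden model" to check its outputs against. Below is a minimal sketch of secp256k1 point addition and double-and-add scalar multiplication in affine coordinates. It is only a reference model under that assumption. It is not how calc_addrs.cl is actually structured. The function names (`point_add`, `scalar_mult`, `on_curve`) are illustrative only. It uses Python 3.8+ for `pow(x, -1, P)` modular inverses.

```python
# Hypothetical reference model of secp256k1 point arithmetic (affine coordinates),
# meant for checking hardware results, not for speed or side-channel safety.

P = 2**256 - 2**32 - 977  # field prime of secp256k1
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
G = (GX, GY)  # generator point


def on_curve(pt):
    """Check y^2 = x^3 + 7 (mod P)."""
    x, y = pt
    return (y * y - x * x * x - 7) % P == 0


def point_add(a, b):
    """Add two affine points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    x1, y1 = a
    x2, y2 = b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a == -b, so the sum is the point at infinity
    if a == b:
        # Doubling: tangent slope (curve coefficient a = 0 for secp256k1)
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        # Addition: chord slope through the two points
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point with right-to-left double-and-add."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result
```

One design point worth noting: once Q = k*G is known, the point for key k+1 is just `point_add(Q, G)`, so a key search can step through candidates with one addition each instead of a full multiplication. That repeated addition (and its modular inverse) is probably the part worth pipelining in hardware.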