If custom low-level firmware could be developed to give us direct access to the ASIC's SHA-256 hashing function (bypassing its built-in mining logic), this approach could work. We could offload the SHA-256 step to the ASIC while handling the rest on the GPU/CPU, creating a hybrid GPU + ASIC system for Bitcoin address generation.
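To make the proposed split concrete, here is a minimal sketch of how the SHA-256 step could be isolated behind a pluggable backend, assuming the GPU/CPU hands over serialized public keys in batches. Every name here (`Sha256Backend`, `CpuSha256Backend`, `HypotheticalAsicBackend`, `sha256_stage`) is hypothetical. The "ASIC" backend is only a placeholder, because no such firmware interface exists. The remaining address steps (RIPEMD-160, Base58Check) would stay on the GPU/CPU and are left out.

```python
import hashlib
from typing import Protocol, Sequence


class Sha256Backend(Protocol):
    """Hypothetical interface: anything that can SHA-256 a batch of messages."""

    def sha256_batch(self, messages: Sequence[bytes]) -> list[bytes]: ...


class CpuSha256Backend:
    """Reference backend: hashes on the host CPU using hashlib."""

    def sha256_batch(self, messages: Sequence[bytes]) -> list[bytes]:
        return [hashlib.sha256(m).digest() for m in messages]


class HypotheticalAsicBackend:
    """Placeholder for an ASIC exposed by custom firmware (assumption; not real)."""

    def sha256_batch(self, messages: Sequence[bytes]) -> list[bytes]:
        raise NotImplementedError("needs custom firmware exposing raw SHA-256")


def sha256_stage(pubkeys: Sequence[bytes], backend: Sha256Backend) -> list[bytes]:
    """SHA-256 over each serialized public key, delegated to the chosen backend.

    RIPEMD-160 and Base58Check would follow on the GPU/CPU (omitted here).
    """
    digests = backend.sha256_batch(pubkeys)
    if len(digests) != len(pubkeys):
        raise ValueError("backend returned the wrong number of digests")
    return digests


if __name__ == "__main__":
    # The CPU backend works today; swapping in the ASIC backend is the open question.
    print(sha256_stage([b"abc"], CpuSha256Backend())[0].hex())
```

The backend only has to agree on "bytes in, 32-byte digests out", so the rest of the pipeline does not care where the hashing happens. That separation is exactly what the custom firmware would need to provide.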
It's easier to simply use two GPUs to increase throughput than to try to solve the technical bottlenecks of a hybrid setup.
The main bottleneck is this: even before any hashing takes place, a GPU can produce far more public keys than it can transfer out to another device. The memory and transfer bandwidth can't keep up.
For the ASIC approach to work, the ASIC would have to do everything internally, on-chip.
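The transfer argument is back-of-the-envelope arithmetic: required bandwidth = keys per second × bytes per key. Below is a small sketch of that calculation. It assumes each key is shipped as a 33-byte compressed secp256k1 public key. The key rate and link bandwidth in the example are hypothetical placeholders, not measurements of any real GPU or interconnect.

```python
COMPRESSED_PUBKEY_BYTES = 33    # SEC1 compressed secp256k1 public key
UNCOMPRESSED_PUBKEY_BYTES = 65  # SEC1 uncompressed form


def required_gb_per_s(keys_per_second: float, bytes_per_key: int) -> float:
    """Bandwidth (in GB/s, 1 GB = 1e9 bytes) needed to move every key off the GPU."""
    return keys_per_second * bytes_per_key / 1e9


def max_keys_per_second(link_gb_per_s: float, bytes_per_key: int) -> float:
    """Highest key rate a link of the given bandwidth could carry."""
    return link_gb_per_s * 1e9 / bytes_per_key


def link_is_bottleneck(keys_per_second: float, bytes_per_key: int,
                       link_gb_per_s: float) -> bool:
    """True if generating keys outpaces what the link can transfer."""
    return required_gb_per_s(keys_per_second, bytes_per_key) > link_gb_per_s


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the calculation.
    hypothetical_key_rate = 2e9   # keys/s produced on the GPU (assumption)
    hypothetical_link = 25.0      # usable GB/s to the ASIC (assumption)
    need = required_gb_per_s(hypothetical_key_rate, COMPRESSED_PUBKEY_BYTES)
    print(f"need {need:.1f} GB/s, link carries {hypothetical_link} GB/s")
    print("bottleneck:", link_is_bottleneck(
        hypothetical_key_rate, COMPRESSED_PUBKEY_BYTES, hypothetical_link))
```

Plug in real figures for a given GPU and interconnect to check the claim. Whenever the required bandwidth exceeds what the link can carry, the keys cannot all leave the GPU, which is why generation and hashing would have to stay on the same chip.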