1) The miner's main routine sends the hashing routine 2 inputs (SSE), or 4 (AVX), to check at once, not one.
2) A hashing routine with 2 (or 4) inputs and 2 (or 4) outputs.
That is normally how SIMD code is written, yes. It is why a struct of arrays is more SIMD-friendly than an array of structs. (Good) GPU SIMD code, including miners, works just that way; I think even the first BTC miners did.
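A minimal sketch of the "N inputs, N outputs" idea with a struct-of-arrays state. It assumes a toy rotate/xor/add round, which is not SHA-256 and not taken from any real miner. The names `mix_one`, `mix_lanes`, `LaneState` and the choice `LANES = 4` (four 32-bit words fill one 128-bit register) are hypothetical. Each inner loop applies one operation to every lane, which is the shape an optimizing compiler can turn into single SIMD instructions.

```rust
/// Hypothetical lane count: 4 x 32-bit words fit one 128-bit SSE register.
const LANES: usize = 4;

/// Toy scalar round function (rotate/xor/add, SHA-256 flavoured).
/// NOT a real cryptographic hash; it only illustrates the data layout.
fn mix_one(header: u32, nonce: u32, rounds: u32) -> u32 {
    let (mut a, mut b) = (header, nonce);
    for r in 0..rounds {
        a = a.wrapping_add(b ^ r);
        b = b.rotate_right(7) ^ a;
        a = a.rotate_left(13).wrapping_add(b);
    }
    a ^ b
}

/// Struct of arrays: lane i of every field belongs to candidate nonce i.
struct LaneState {
    a: [u32; LANES],
    b: [u32; LANES],
}

/// Same toy round applied to LANES nonces at once: LANES inputs, LANES outputs.
fn mix_lanes(header: u32, nonces: [u32; LANES], rounds: u32) -> [u32; LANES] {
    let mut s = LaneState { a: [header; LANES], b: nonces };
    for r in 0..rounds {
        // One operation across all lanes per loop: a natural SIMD candidate.
        for i in 0..LANES {
            s.a[i] = s.a[i].wrapping_add(s.b[i] ^ r);
        }
        for i in 0..LANES {
            s.b[i] = s.b[i].rotate_right(7) ^ s.a[i];
        }
        for i in 0..LANES {
            s.a[i] = s.a[i].rotate_left(13).wrapping_add(s.b[i]);
        }
    }
    let mut out = [0u32; LANES];
    for i in 0..LANES {
        out[i] = s.a[i] ^ s.b[i];
    }
    out
}
```

The batched version computes exactly what calling `mix_one` once per nonce would, but with an array of structs (one `{a, b}` per nonce) the lanes would be interleaved in memory, and the compiler would have to shuffle data before it could use vector registers.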
If you think CPU miners can be optimized to do more SIMD processing (and you are probably right, at least in some cases), then go optimize them and make some money.