So does this mean that, say, when there are few transactions in the mempool, each time a new transaction enters the mempool the miners will stop attempting to hash the old block and start attempting to hash a new block that includes the new transaction?
Yes, they could do that, and they probably do, but that is not quite the same as “miners hash a block that is changing every second”. While they mine, they can change several fields besides the nonce, such as the timestamp or the set of transactions taken from the mempool (which changes the Merkle root).
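To make the Merkle root point concrete, here is a minimal sketch in Python. The transaction IDs are made-up placeholders (just hashes of strings such as "tx-0"), and `dsha256` and `merkle_root` are names chosen for this illustration, not part of any mining software. It only shows that adding one transaction to the candidate set yields a different root, so the header being hashed changes too.

```python
import hashlib


def dsha256(data):
    """Double SHA-256, the hash Bitcoin uses for txids and Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids):
    """Compute a Bitcoin-style Merkle root from a list of 32-byte txids."""
    if not txids:
        raise ValueError("a block needs at least one transaction (the coinbase)")
    level = list(txids)
    while len(level) > 1:
        # With an odd number of nodes, the last one is paired with itself.
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Placeholder txids for illustration only, not real transactions.
candidate = [dsha256(("tx-%d" % i).encode()) for i in range(3)]
new_tx = dsha256(b"tx-new")

root_before = merkle_root(candidate)
root_after = merkle_root(candidate + [new_tx])

print(root_before.hex())
print(root_after.hex())
print("root changed:", root_before != root_after)
```

Whether a miner actually rebuilds the candidate block on every incoming transaction is a policy choice of the mining software; the sketch only shows what changes when it does.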
Also, I have noticed a handful of really small blocks that get mined especially fast.
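Small like... empty? Empty blocks are not easier to solve, though: only the 80-byte block header is hashed on each attempt, and the transactions enter it solely through the 32-byte Merkle root, so the work per attempt is the same whatever the block size. A more likely explanation is that when a block is found very shortly after the previous one, the miner had not yet selected transactions for it and was working on a block containing little more than the coinbase.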
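For reference, proof-of-work hashes only the block header, which is always 80 bytes. A rough Python sketch of the header layout follows. The field values (previous hash, timestamp, nonce) and the two Merkle roots are arbitrary placeholders, and `build_header` is a name made up for this example. It shows that the data hashed per attempt has the same size whether the block holds one transaction or thousands.

```python
import hashlib
import struct


def dsha256(data):
    """Double SHA-256, as used for Bitcoin block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def build_header(version, prev_hash, merkle_root, timestamp, bits, nonce):
    """Serialize the six header fields, little-endian: 4+32+32+4+4+4 = 80 bytes."""
    return struct.pack("<I32s32sIII", version, prev_hash, merkle_root,
                       timestamp, bits, nonce)


# Placeholder values for illustration; not taken from any real block.
prev_hash = dsha256(b"previous block")
root_empty_block = dsha256(b"coinbase only")          # stands in for a 1-tx block
root_full_block = dsha256(b"root of 2000 transactions")  # stands in for a full block

header_empty = build_header(2, prev_hash, root_empty_block, 1500000000, 0x1d00ffff, 0)
header_full = build_header(2, prev_hash, root_full_block, 1500000000, 0x1d00ffff, 0)

print(len(header_empty), len(header_full))
print(dsha256(header_empty).hex())
print(dsha256(header_full).hex())
```

Both headers are the same length, so each hash attempt costs the same. What differs for a full block is the one-off work of picking transactions and computing the Merkle root before hashing starts.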