Ordinal data indeed isn't stored in the UTXO set. But:
1. There are so many new Ordinal inscriptions being created.
2. When you create a new inscription, Ordinals creates 2 TXs. The 2nd TX usually has 1 input and 2 outputs. The 1 input refers to the "monkey" (arbitrary data), and the 2 outputs represent the change address and a new UTXO which represents ownership of that Ordinal.
I thought Bitcoin transactions could have as many UTXO inputs as they wanted to. So why couldn't someone use 10 UTXOs to pay for their monkey, and then that would be reducing the UTXO set by 10-2=8 UTXOs? The only time the UTXO set count would increase is if someone used only 1 UTXO to pay for their entire monkey. But if I had to guess, I'd imagine the average Bitcoin transaction uses at least 2 UTXOs.
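
To make the arithmetic concrete, here is a minimal sketch of the counting being argued about. It assumes every output is spendable (so it ignores things like OP_RETURN outputs) and looks at a single transaction in isolation; the function name `net_utxo_change` is made up for illustration and is not part of any Bitcoin library.

```python
def net_utxo_change(num_inputs: int, num_outputs: int) -> int:
    """Net change in UTXO set size caused by one transaction.

    Each input removes one existing UTXO and each output adds one.
    Assumption: all outputs are spendable.
    """
    return num_outputs - num_inputs


# 1 input, 2 outputs (the case described in point 2): the set grows by 1.
print(net_utxo_change(1, 2))   # 1

# 10 inputs, 2 outputs (the case in the reply): the set shrinks by 8.
print(net_utxo_change(10, 2))  # -8
```

So whether inscriptions grow the UTXO set depends on how many inputs the funding transactions typically consolidate, which is an empirical question this sketch does not answer.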