Your clock signal looks great, with very sharp rising and falling edges. I am wondering how you achieved it.
I simply added a NOR gate after the RC delay (100 ohm, 220 pF) of your original circuit, but my signal is not as good as yours: the rise and fall times are much longer and the pulses are broader.
It's better than the original circuit, but I still get HWs.
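For a rough sense of scale, here is a small sketch of what the 100 ohm / 220 pF network does on its own. It assumes an ideal single-pole RC driven by a perfect step and a gate threshold at half the supply; the function names are mine, and gate input capacitance and driver output resistance are ignored, so treat the numbers as ballpark only.

```python
import math

def rc_time_constant(r_ohm, c_farad):
    """Time constant tau = R * C of a single-pole RC network, in seconds."""
    return r_ohm * c_farad

def rise_time_10_90(r_ohm, c_farad):
    """10%-90% rise time of an ideal RC step response: ln(9) * R * C."""
    return math.log(9) * r_ohm * c_farad

def delay_to_threshold(r_ohm, c_farad, fraction=0.5):
    """Time for the RC node to reach `fraction` of the step (assumed gate threshold)."""
    return -math.log(1.0 - fraction) * r_ohm * c_farad

tau = rc_time_constant(100, 220e-12)        # about 22 ns
t_rise = rise_time_10_90(100, 220e-12)      # about 48 ns
t_delay = delay_to_threshold(100, 220e-12)  # about 15 ns with a 50% threshold

print(f"tau = {tau * 1e9:.1f} ns, 10-90% rise = {t_rise * 1e9:.1f} ns, "
      f"delay to 50% = {t_delay * 1e9:.1f} ns")
```

Under those assumptions the node feeding the gate ramps over tens of nanoseconds, which may be part of why the pulses come out broader.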
The errors I get now are almost certainly on the input side rather than in result capture, apart from the infrequent overruns that I see being counted.
I use a trailing-edge delay circuit. The first NOR gate takes P and N as inputs, and its output goes to input A of the second NOR gate and also, through a 100R resistor, to input B of the second gate. B has a 30pF capacitor to GND. So the second NOR is ORing the clock with a delayed clock. The UART is set for rising-edge capture, and the data is inverted when read out of the FIFO (~RCREG). This will be updated in the schematic very soon.
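A logic-level sketch of the second gate, to show why only the trailing edge moves. It is a simplification under stated assumptions: the 100R/30pF network is modelled as a pure delay of a whole number of samples (not a real RC ramp), the input list `a` stands for the first gate's output, and all names are illustrative rather than taken from the schematic.

```python
def nor(x, y):
    """2-input NOR on 0/1 values."""
    return 0 if (x or y) else 1

def delay(samples, n, initial=0):
    """Shift a sampled waveform right by n samples (crude stand-in for the RC delay)."""
    return [initial] * n + samples[:len(samples) - n]

def trailing_edge_stretch(a, n):
    """Second NOR gate: input A is the clock, input B is the same clock delayed by n samples."""
    b = delay(a, n)
    return [nor(x, y) for x, y in zip(a, b)]

# Hypothetical first-gate output: one pulse, 3 samples wide.
a = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
out = trailing_edge_stretch(a, 2)
print(out)  # [1, 1, 0, 0, 0, 0, 0, 1, 1, 1]
```

The output is the inverted OR of the clock and its delayed copy: it goes low at the same sample where `a` goes high, but returns high two samples after `a` falls, so the leading edge is untouched and the trailing edge is pushed out by the delay.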