The hardware passed its checks, and fortunately I believe I've found the issue: it's related to the toolchain in the updated operating system on the new dedicated server. I will now be slowly migrating users to the new server. If there's an issue I'll put everyone back on the old one immediately. Downtime should be negligible.
Most users should have migrated by now. Some users whose persistent DNS caches ignore the short time-to-live (TTL) setting of solo.ckpool.org may take a while to switch over. Any lingering hashrate on the old pool can still solve a block, so no hashes are being wasted.
The long story is that the instability was caused by a stack overflow in the ckpool code when handling very long segwit addresses. It never manifested on older toolchains because they were more lax about the amount of stack RAM being allocated. A fix has been committed to the ckpool git.
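
For anyone curious what this class of bug looks like, here is a minimal sketch in C. It is not the actual ckpool code or the committed fix. The function name copy_address and the buffer sizes are hypothetical, chosen only for illustration. The idea is to check the length of an address before copying it into a fixed-size stack buffer, so an unusually long address gets rejected instead of written past the end of the buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, NOT taken from the ckpool source.
 * Copies addr into dst only if it fits, including the terminating NUL.
 * Returns 0 on success, -1 if the address is too long for the buffer.
 */
int copy_address(char *dst, size_t dstlen, const char *addr)
{
	size_t len = strlen(addr);

	/* An unchecked strcpy() here would write past dst for long input */
	if (len >= dstlen)
		return -1;
	memcpy(dst, addr, len + 1); /* len + 1 copies the NUL as well */
	return 0;
}
```

A caller would declare something like char buf[N] on the stack and treat a return of -1 as an invalid address. An unchecked overflow may appear harmless on one toolchain and crash on another, because what ends up next to the buffer on the stack depends on how the compiler lays it out.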