I can tell you that the ethermine pool has always underreported my estimated payment and hashrate, but the _actual_ payment matched almost exactly what I expected. It may be that something is off on their side ... but as long as it's off in the same way for all miners, your test is still relevant for deciding which miner is best.
If you want to compare payouts, leave it running longer than 24h (say, 3 days?), and after you stop the rigs, allow at least 1h more so that the last shares you mined get counted in the last 1h window on Flypool.
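For what it's worth, here is a rough sketch of how I'd put the two runs on the same footing once the payouts are in. It only assumes you note the total payout for each miner (read after the extra 1h wait) and the hours the rigs were actually mining. The function names and the numbers are my own placeholders, not anything the pool or the miners provide.

```python
def payout_per_day(total_payout: float, run_hours: float) -> float:
    """Average payout per 24h of mining over the whole run."""
    if run_hours <= 0:
        raise ValueError("run_hours must be positive")
    return total_payout * 24.0 / run_hours


def relative_gain(candidate: float, baseline: float) -> float:
    """Percent by which the candidate's daily payout differs from the baseline's."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


# Placeholder values for illustration only, NOT real measurements.
run_hours = 72.0                               # e.g. a 3-day run per miner
miner_a = payout_per_day(0.0300, run_hours)    # hypothetical payout, miner A
miner_b = payout_per_day(0.0306, run_hours)    # hypothetical payout, miner B

print(f"A: {miner_a:.5f}/day  B: {miner_b:.5f}/day  "
      f"B vs A: {relative_gain(miner_b, miner_a):+.2f}%")
```

Normalizing per day means the two runs don't have to be exactly the same length, though a difference of a percent or two over a few days is probably still within luck and difficulty changes.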
I secretly hope bminer comes out better in the end, simply because dstm is reducing the hashrate on my rigs by hogging too much CPU.