I pushed some more code to https://github.com/jtoomim/p2pool/tree/1mb_segwit that adds more CPU and (maybe) RAM performance improvements for p2pool. I think it should improve CPU usage and latency by about 30% or more. These improvements should reduce DOA and orphan rates on the network a bit. It looks like running p2pool on CPython with medium-slow CPUs should now be viable without huge DOA/orphan costs, although I still recommend using pypy whenever possible. The new code makes fewer memory allocations when serializing objects for network transmission, which might reduce total memory consumption, or it might not. We'll see in a few days.
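For anyone wondering what "fewer memory allocations when serializing" can look like in general, here is a rough sketch. It is not the actual p2pool code: the function names and the little-endian uint32 layout are made up for illustration. It only shows the usual idea of building the output in one go instead of growing a byte string piece by piece.

```python
import struct

def pack_concat(values):
    # Naive approach: every += may allocate a new byte string and
    # copy everything accumulated so far.
    out = b''
    for v in values:
        out += struct.pack('<I', v)
    return out

def pack_join(values):
    # Collect the small pieces first, then build the result once.
    parts = [struct.pack('<I', v) for v in values]
    return b''.join(parts)

def pack_single(values):
    # Pack all values in one call using a repeat count in the
    # format string, avoiding the per-item intermediate strings.
    return struct.pack('<%dI' % len(values), *values)
```

All three return identical bytes; they differ only in how many intermediate objects get created along the way. Whether that shows up as lower total memory use depends on the interpreter and the workload.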
I also added a performance profiling/benchmark mode. If p2pool is too slow for you, I would find it helpful if you ran python run_p2pool.py --bench and then sent me a snippet of the output, especially if you can capture the output from around the time a share is received over the network.
Okay, thanks. I will take a look in the next few days. It would be nice to get it running on Win10 with Python 2.7 (pypy does not run here ...)