Finally I can post here.

Nice work, ssvb, on making a miner for the PS3. I hope we can tweak it together to make it even better.

Sure, more improvements to the PS3 miner are definitely possible. I have pushed some of my old unfinished code to github; it implements parts of scrypt in SPU assembly (~5.4 khash/s -> 5.9 khash/s improvement per SPE core). This only optimizes the first big loop, and handling the second big loop in a similar way is expected to provide about the same speedup. The gain comes simply from better instruction scheduling and from keeping all data in registers to avoid unnecessary spills. Your improvements focus on a different aspect: a better data layout for less scattered writes, and more unrolling to completely eliminate any stray stalls waiting for DMA completion (up to ~6.1 khash/s per SPE core). Combining both optimizations should give quite good results, and switching to handling 10 hashes at once (5+5) might even be beneficial for the second loop.

The theoretical peak performance per SPU core, based on counting 128-bit vector ADD/ROL/XOR operations, is ~7.35 khash/s (an optimistic estimate that does not even take the negligible SHA256 part and other overhead into account). This means that we are already at >80% of the theoretical peak performance.
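For anyone who wants to check that kind of estimate, here is a rough back-of-the-envelope sketch of an operation count. It is only a reconstruction under assumptions that are not stated in this post: Litecoin-style scrypt parameters (N=1024, r=1, p=1), a 3.2 GHz SPU clock, and one 128-bit ADD/ROL/XOR issued per cycle with everything else (shuffles, loads, DMA) hidden in the other pipeline. The function names are made up for illustration. Under these assumptions it lands close to the ~7.35 khash/s figure, though the exact count behind that number may differ.

```python
# Back-of-the-envelope peak estimate for scrypt on one SPU core.
# Assumptions (NOT from the post): N=1024, r=1, p=1, 3.2 GHz clock,
# one 128-bit vector ADD/ROL/XOR per cycle, SHA256 and all overhead ignored.
N = 1024
CLOCK_HZ = 3.2e9


def salsa20_8_vector_ops():
    """Vector ops for one Salsa20/8 call on a 64-byte (4 vector) state."""
    xor_in = 4           # XOR the input block into the state
    rounds = 8 * 4 * 3   # 8 rounds x 4 quarter-round steps x (ADD, ROL, XOR)
    feed_forward = 4     # final ADD of the saved input state
    return xor_in + rounds + feed_forward


def vector_ops_per_hash():
    """Vector ops for both big loops of scrypt's ROMix with r=1."""
    blockmix = 2 * salsa20_8_vector_ops()   # r=1: two Salsa20/8 calls
    first_loop = N * blockmix               # fill the scratchpad V
    second_loop = N * (8 + blockmix)        # extra 8 XORs for X ^= V[j]
    return first_loop + second_loop


def peak_khash_per_s():
    return CLOCK_HZ / vector_ops_per_hash() / 1000.0


if __name__ == "__main__":
    peak = peak_khash_per_s()
    print(f"vector ops per hash: {vector_ops_per_hash()}")
    print(f"estimated peak: {peak:.2f} khash/s")
    for measured in (5.9, 6.1):  # per-SPE figures quoted in the post
        print(f"{measured} khash/s is {100 * measured / peak:.0f}% of that peak")
```

Since the model ignores everything except the arithmetic pipeline, it should be read as an upper bound rather than a target that real code can reach.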
It's just that GPU miners are a hotter topic at the moment. Last weekend I was busy installing a new graphics card and then doing some OpenCL coding with it. But I'm going to revisit the Cell/BE code after I get my GPU miner up and running.
