P.S. This design does not "take up too many IOs". Saying it does assumes you don't use some sort of serialization, which is rather absurd.
Well, maybe you could tell me what you would call producing an IP core with 288 IOs for a device that has only 167 (or so), soldered on a board with some SDRAM etc., ending up with 80 usable user IOs? Btw, you could enlighten me which microcontroller you plan to use that can write 256 bits at once. There is no bottleneck in using some sort of serialization at all. And even if there were, you could always reduce the bandwidth requirement by implementing roll-n-times in hardware.
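To make the serialization point concrete, here is a rough software model of a 256-bit register that gets filled one byte at a time, the way an 8-bit microcontroller would have to do it anyway. The names (`ShiftLoader`, `write_byte`, `load_word`) are made up for illustration and are not taken from any actual miner core; it only sketches the idea that a byte-wide port plus a strobe is enough to fill a wide register.

```python
class ShiftLoader:
    """Software model of a wide register that is filled one byte at a time.

    Hypothetical illustration only, not code from any real miner core.
    """

    def __init__(self, width_bits=256):
        assert width_bits % 8 == 0
        self.width_bits = width_bits
        self.mask = (1 << width_bits) - 1
        self.value = 0   # current register contents
        self.count = 0   # number of bytes shifted in so far

    def write_byte(self, b):
        """Shift the register left by 8 bits and append the new byte."""
        assert 0 <= b <= 0xFF
        self.value = ((self.value << 8) | b) & self.mask
        self.count += 1

    @property
    def full(self):
        """True once enough bytes have arrived to fill the register."""
        return self.count >= self.width_bits // 8


def load_word(word, width_bits=256):
    """Send a wide word through the byte-wide port, most significant byte first."""
    loader = ShiftLoader(width_bits)
    for b in word.to_bytes(width_bits // 8, "big"):
        loader.write_byte(b)
    return loader


# Example: 32 single-byte writes reconstruct the full 256-bit word.
word = int.from_bytes(bytes(range(32)), "big")
loader = load_word(word)
```

In hardware terms that would be roughly 8 data pins plus a write strobe instead of 256 parallel pins, at the cost of 32 writes per word.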
The "control" entity is obviously not meant to be the top of the design. You would accompany it with some kind of interface. Check the open-source miner project: there you have both an RS232 interface and one through Altera's "virtual wire".
As for bandwidth, you really don't need any. You just send a bunch of bytes (256-ish) to fire off a decent-sized job.
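A back-of-the-envelope check of that, with numbers that are purely my assumptions and not measurements: an RS232 link at 115200 baud with 8N1 framing (10 bits on the wire per byte), a core doing 100 MH/s, and a job that covers a full 32-bit nonce range.

```python
def transfer_time_s(n_bytes, baud, bits_per_byte=10):
    """Time to push n_bytes over a UART; 8N1 framing means 10 wire bits per byte."""
    return n_bytes * bits_per_byte / baud


def job_time_s(hashes_per_s, nonce_bits=32):
    """Time to exhaust one job, assuming it spans the whole nonce range."""
    return (1 << nonce_bits) / hashes_per_s


# Assumed figures, for illustration only.
t_link = transfer_time_s(256, 115200)   # about 0.022 s to send the job
t_job = job_time_s(100e6)               # about 43 s to work through it

print(f"link: {t_link:.4f} s, job: {t_job:.1f} s, ratio: {t_link / t_job:.5f}")
```

With those figures the link is busy for well under a tenth of a percent of the time, so even a plain serial port is nowhere near being the bottleneck.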
