OK, so it took me a day and a half, but I finally went through and read all 23 pages of this thread. I skimmed parts of it, so I'm not 100% sure what decisions have been made, but I have a much better feel for where you guys are at in terms of design and progress.
Some things I'd like to add to the discussion for now:
In terms of Xilinx licenses - as someone said above - Xilinx doesn't care too much where you get the software or license - as in the end you will require a Xilinx FPGA to program and that's really where they make their money. If anyone cares I can show you where to obtain a less than reputable ( read: pirate ) ISE license for ISE 13 with everything unlocked ( including the LX150/T design targets )
The unconnected I/O are unimportant for this design. You can tie them all to ground, you can tie them all to Vcc, you can leave them floating. It doesn't matter - personally I'd say just leave them floating. This design doesn't need quiet I/O lines, as we aren't going to be doing anything like a high-speed bus on the I/O. At my workplace we build industrial monitoring equipment, and on a lot of our boards we just leave unconnected I/O hanging and set ISE to either use a weak pull-down or leave the I/O floating. It's really not much of an issue unless you need very high-speed, accurate I/O ( think 100-200 MHz clock ranges ). These settings can be found by right-clicking Generate Programming File -> Process Properties -> Configuration Options -> Unused IOB pins ( Pull Down, Pull Up, Floating ).
Also, again, I'm a bit unclear on how the JTAG is going to be set up. But I would expect a JTAG header on every DIMM, or at least one JTAG header somewhere with all the FPGAs chained.
I would also expect a couple of LEDs. You will probably want to put one between VCC and GND ( with a current-limiting resistor ) to show you that the board has power. You will probably also want an LED or two as debug outputs. Also, typically there will be an LED or two somehow tied to the tx/rx lines of the USB bus so you can tell when communication is occurring.
In my mind, I would also love to have a couple of other test points, just unused I/O pins routed out to a TP - these are very useful for debugging. On the other hand, this design shouldn't require too much debugging as it's pretty dead simple - but in terms of future application/expansion, they may be helpful for debugging new features.
I would love to be able to route all the unused I/O to pin headers or test points, but I agree with what has been said above: the cost/complexity to do it is just not worth it. Sure, you'd be able to use the board as a Spartan6 dev board, but I don't think that's the goal of this project. So to keep PCB complexity down I agree with you guys - just route exactly what's needed, then possibly add a handful more I/Os for debugging/indicators, and then a few more that are brought out for future expansion - either to DIMM pins or test points ( think extra comm protocols etc... ).
If not all DIMM pins are going to be routed ( i.e. some are not taken for power or I/O ), I would also prefer to have a TP for each of these unused DIMM pins - that way we could deadbug new features or bug fixes. If we leave just enough room for error that we can hack something on for this prototype, it will save us a lot of time/effort later because it'll likely help us skip re-spinning.
I agree with what was said above that this is a prototype - it should have a slight excess of what's necessary to help us debug/fix any potential problems/errors we may make before the first spin.
I'm unsure what internal clock rates any given miner design will be able to reach inside the FPGA, but I would guess it's going to be around 25-50 MHz. I would probably advise against using the same 25 MHz crystal for the MCP and the FPGAs, because getting from 25 MHz to 100 MHz using a DCM in the FPGA will require using the CLKFX output to multiply the input clock, and this is generally a noisier clock solution. It's definitely doable, but maybe not the best implementation. I don't have any experience with Spartan6 devices either ( we use a lot of Spartan3s ), so I'm unsure if it's possible to get useful computation done in this chip at higher clock rates.
One thing I definitely do know is that you will want to route your clock input into one of the global clock pins - basically there are certain pins in each bank that route closer/more directly to the BUFGMUXs that control the quadrant clock lines. These clock lines are the best lines to use for distributing clocks throughout the FPGA, as they are built for this function and provide the least amount of clock skew. You can definitely still take a clock in on any I/O pad and route it to one of these BUFGMUXs - it's just that sometimes that trace path ( between non-clock I/O and BUFGMUX ) is not the most optimal path.
There are a bunch of different pins you can use, but I would steer away from the quadrant/side locked clock inputs and just use one of the global clock inputs ( there should be at least 4 ). The quadrant/side clocks are useful if you partition your FPGA device into regions based on clock domains - then you can free up global clock resources and only use clocks in one quadrant or one side if needed - but this isn't necessary for our design. I envision one main clock for the hashing engine, and potentially one other clock for communications.
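To put some numbers on the 25 MHz to 100 MHz multiplication mentioned above, here is a quick Python sketch ( not FPGA code ) that lists the multiply/divide pairs giving an exact output frequency, using f_out = f_in * M / D. The 2-32 multiplier range and 1-32 divider range are my assumptions for illustration, and the function name is made up - check ug382.pdf for the real limits on the part we end up using.

```python
from fractions import Fraction

# Assumed ranges for illustration only; verify against ug382 for the real part.
MULT_RANGE = range(2, 33)  # hypothetical allowed multiplier values (2..32)
DIV_RANGE = range(1, 33)   # hypothetical allowed divider values (1..32)


def clkfx_settings(f_in_hz, f_out_hz):
    """Return every (multiply, divide) pair where f_in * M / D equals f_out exactly."""
    target = Fraction(f_out_hz, f_in_hz)
    return [
        (m, d)
        for m in MULT_RANGE
        for d in DIV_RANGE
        if Fraction(m, d) == target
    ]


if __name__ == "__main__":
    f_in = 25_000_000  # the 25 MHz crystal discussed in the thread
    for f_out in (50_000_000, 100_000_000):
        pairs = clkfx_settings(f_in, f_out)
        print(f"{f_out / 1e6:.0f} MHz from {f_in / 1e6:.0f} MHz:")
        for m, d in pairs:
            print(f"  M={m:2d} D={d:2d}")
```

Several M/D pairs give the same ratio; this only shows the arithmetic and says nothing about which pair gives the least jitter.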
The hashing engine clock is the only one I'm worried about - the comm clock could be derived internally off a counter, as it isn't fast and doesn't touch a lot of resources. If you'd like to read more on it, ug382.pdf describes the clocking mechanisms for the Spartan6 family.
TL;DR: when you are looking to route the clock to the FPGA, make sure you connect it to a GCLK I/O pin.
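As a rough illustration of deriving the comm clock off a counter, here is a small Python sketch of the arithmetic: pick the nearest integer divide ratio from the main clock and see how far off the resulting rate is. The 115200 target is just an example rate I picked, not something decided in this thread, and the function name is made up.

```python
def counter_divider(f_clk_hz, f_target_hz):
    """Pick the nearest integer divide ratio and report the resulting rate and error.

    Returns (ratio, actual_rate_hz, error_percent).
    """
    ratio = max(1, round(f_clk_hz / f_target_hz))
    actual = f_clk_hz / ratio
    error_pct = (actual - f_target_hz) / f_target_hz * 100.0
    return ratio, actual, error_pct


if __name__ == "__main__":
    target = 115_200  # example comm rate only, an assumption for this sketch
    for f_clk in (25_000_000, 50_000_000, 100_000_000):
        ratio, actual, err = counter_divider(f_clk, target)
        print(f"{f_clk / 1e6:.0f} MHz / {ratio} = {actual:.1f} Hz ({err:+.4f}% off)")
```

With a 25 MHz main clock the counter would wrap at roughly 217 for that example rate, so the error from integer division is tiny compared to what a slow serial link tolerates.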