I had an idea for speeding up nonce and "attempt" generation.
Let me qualify this first: I am brand new to the nitty-gritty technical details of mining, and I am in no way an expert.
Now that that is out of the way, I am presuming that the nonce serves as a seed to a PRNG which builds the data to be hashed and checked against the difficulty. I will call this data the "attempt string" because I don't know what the proper term is.
Anyway, the fastest PRNG I have heard of so far is SFMT (SIMD-oriented Fast Mersenne Twister). It is open source and written in C, and it is optimized to use the CPU's SIMD instructions (such as SSE2) to generate random numbers as fast as possible. It may be worthwhile to check out, because it could speed up nonce generation and attempt string generation. If that step takes a meaningful share of the run time, this could improve the overall performance of your mining software.
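To make the idea concrete, here is a minimal sketch of the loop as I am picturing it. Everything in it is an assumption on my part, not how your software actually works: the function names are made up, Python's built-in Mersenne Twister (random.Random) stands in for SFMT, SHA-256 stands in for whatever hash is really used, and the "target" is just a number the hash must be below.

```python
import hashlib
import random


def make_attempt(nonce: int, length: int = 32) -> bytes:
    """Build the 'attempt string' by seeding a PRNG with the nonce.

    random.Random is the standard Mersenne Twister, used here only as a
    stand-in for SFMT (assumption: the real code would call a native PRNG).
    """
    rng = random.Random(nonce)
    return bytes(rng.getrandbits(8) for _ in range(length))


def meets_target(attempt: bytes, target: int) -> bool:
    """Hash the attempt and check it against a numeric difficulty target."""
    digest = hashlib.sha256(attempt).digest()
    return int.from_bytes(digest, "big") < target


def search(target: int, max_nonce: int = 1_000_000):
    """Try nonces in order; return the first one that meets the target."""
    for nonce in range(max_nonce):
        if meets_target(make_attempt(nonce), target):
            return nonce
    return None  # no nonce found in the range tried
```

In this picture a faster PRNG only helps with the make_attempt step. If the software instead just increments a nonce field inside the data being hashed, as Bitcoin-style miners do, there is no PRNG step to speed up and the suggestion would not apply.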
Now, on to a question. NXT has recently released the Monetary System (MS). Some coins released on the MS can be "minted" using SHA-256, Scrypt, or Keccak25. Minting is almost the same as mining, except that it works as a transaction on the NXT blockchain and therefore costs the minter's account a transaction fee.
There is talk about making GPU/ASIC minting software. The only catch is that the devs at NXT only know Java, and we all know that mining or "minting" software runs faster natively than on the JVM.
It would not be that hard to adapt your software to mint MSCoins. All it would take is native transaction signing and some code to submit the signed transaction to the NXT API server, plus, of course, removal of unnecessary code and features.
Being the first GPU minting software for MSCoins would certainly be a noteworthy accomplishment!