I had trouble uploading the code, so I just forked ccminer so I could put the release on it. I made 3 files, balloon.h, ballooncore.cu & baloon.cu, and added the algo to miner.cpp.
It's only using the GPUs' memory and a tiny bit of the GPU cores, so don't expect the world. I tested it on a rig of 3 x 1060s. v2 and v3 will be better once I've ported more code to the CUDA cores.
Enjoy