This is an extremely interesting idea. Could you elaborate on how the Zerocoin transaction stages map to the stages of CoinJoin transaction creation?
For non-decenteralized coincoin, you simply pass around a transaction and sign it. It's a single sequence and an atomic transaction, you'd make two loops through the users, one to discover the inputs and outputs, and another to sign them. There really aren't stages to it.
Making a decenteralized CoinJoin secure, private, and resistant to DOS attack (people refusing to sign in order to make it fail) is trickier... for the privacy and dos attack resistance you can use ZC:
Presume the participants for a transaction are sharing some multicast medium and can all communicate. They need to accomplish the task of offering up inputs (txid:vout) for inclusion in the transaction and then, in an unlinkable way, providing outputs to receive their coins.
Each participant connects and names bitcoin input(s), an address for change (if needed), and the result of performing a ZC mint transaction to add to the ZC accumulator. They sign all this with the keys for the corresponding inputs proving its theirs to spend.
Then all the parties connect again anonymously and provide ZC redeem transactions which specify where the resulting bitcoins should go.
On their original connections all parties can now see the redeem transactions and are convinced that they were provided by the correct parties, and that they themselves will be paid. So they all sign the transaction.
If a party fails to sign, everyone else is convinced that its because they are jamming the process (intentionally or maliciously) and then can all ban (ignore in the future) whatever costly identity they used to enter the mix, or if there is no other mechanism that particular txin which they used.
This isn't the only way to do this in a decentralized manner, the way to do it with blind signatures is fairly similar:
Each participant connects, names Bitcoin input(s), an address for change (if needed), a key for blind signing, and a blinded hash of the address they want paid. They sign all this with the keys for the corresponding inputs proving its theirs to spend.
Each participant then blind signs the blinded hashes of all participants (including themselves).
Each participant then reconnects anonymously and discloses their unblinded values and all the signatures. Because all the participants can see all the signatures, they know all are authentic. They sign, and if they refuse to sign everyone is convinced that the refusing signer is attempting to jam and bans them.
The most obvious difference between the two techniques is that the blind signing requires N^2 signatures, and potentially three communication phases (submit, blindsign, redeem) instead of two (mint, redeem) if you wanted this process to span more than a single event.
As I said above, I generally think the non-decenteralized versions of these transactions will be implemented and commonly used first, simply because they're so much less work to do.