Let's let the OP try to write up the logic formula for SHA256's first 32 bits of input and 1 bit of output.
Thanks for the links.
I rewrote the code so that it performs the double SHA256 first, constructing an in-memory tree of all the operations and variables (without actually evaluating them, arithmetically or symbolically.) So basically for each bit in the final hash, I have a tree representing all the operations that have to occur (using an in-order walk) to compute that bit.
What this allows me to do is simplify different branches of the trees independently, and in parallel on different threads. The other thing it gives me is the ability to see the current depth of the tree I'm trying to simplify, which will give me some "order of magnitude" guess of how large the final equation will be.
There's thousands of these branches to simplify and they do take a very long time, but watching them in the debugger, things are reducing and it's not ballooning out of control (yet.)
If everything looks good, the longer term plan is to assign branch simplification to different datacenter machines, so that I can have a few thousand of them simplifying in parallel instead of 8 on my desktop.