Your c1...c31 bits can be reduced by one instruction:

c1 == a1b1 + a1c0 + b1c0
== a1(b1 + c0) + b1c0
Yes, but at the expense of one extra level of depth in the circuit. The form on the site is the shortest-path form, which is ideal for implementation.
c1 == a1b1 + a1c0 + b1c0
== a1(b1 + c0) + b1c0
or
== a1b1 + c0(a1 + b1)
or
== b1(a1 + c0) + a1c0
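
A quick way to convince yourself that all four forms compute the same carry bit is to check them against every input combination. The sketch below does that in Python. The function names are made up for illustration, and `a`, `b`, `c` stand for a1, b1, c0, with `&` as AND and `|` as OR.

```python
from itertools import product

# a, b, c stand for a1, b1, c0; each is 0 or 1.
# Function names are hypothetical, chosen only for this check.

def carry_sop(a, b, c):
    # a1b1 + a1c0 + b1c0 (sum of products)
    return (a & b) | (a & c) | (b & c)

def carry_factor_a(a, b, c):
    # a1(b1 + c0) + b1c0
    return (a & (b | c)) | (b & c)

def carry_factor_c(a, b, c):
    # a1b1 + c0(a1 + b1)
    return (a & b) | (c & (a | b))

def carry_factor_b(a, b, c):
    # b1(a1 + c0) + a1c0
    return (b & (a | c)) | (a & c)

FORMS = [carry_sop, carry_factor_a, carry_factor_c, carry_factor_b]

def all_forms_agree():
    # Exhaustively compare the forms on all 8 input combinations.
    for a, b, c in product((0, 1), repeat=3):
        results = {form(a, b, c) for form in FORMS}
        if len(results) != 1:
            return False
    return True

if __name__ == "__main__":
    print(all_forms_agree())
```

Counting two-input operations, the sum-of-products form uses five (three ANDs, two ORs) and each factored form uses four, which is the one-instruction saving mentioned above.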