In case anyone still cares about this post, I should mention that I FINALLY got the thing 100% solved. After wondering about the extra "sixth field" that the miner was submitting after the usual five fields that are clearly explained in the documentation, I looked more carefully and realized that during the JSON exchange, early after authenticating, the miner negotiated something with the pool called "version rolling" and the pool gave the miner a "version rolling mask".
It turns out I was doing everything correctly EXCEPT I did not take this into account. It seems that the miner can negotiate with the pool to alter the "version number" in the block header, limited by the constraints specified in the "version rolling bitmask". There's a little Boolean equation that combines your suggested version bits, the original version, and the bitmask to produce a "new_version" that you use in place of the original. Put simply, you are allowed to alter only those bits of the original version where the bitmask has a 1. The sixth field contains your suggested version bits, not a change to the bitmask itself.
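For anyone who wants to see that Boolean equation spelled out, here is a minimal sketch in Python. As far as I can tell this matches the version-rolling extension described in BIP 310, but treat it as my reading rather than gospel. The function name `roll_version` and the sample values (a job version of 20000000, a mask of 1fffe000, and version bits of 00004000) are just illustrations I picked, not values from my actual session.

```python
def roll_version(job_version: int, version_bits: int, mask: int) -> int:
    """Combine the pool's job version with the miner's suggested bits.

    Only bits set in `mask` may come from `version_bits`; every other
    bit is taken from the original `job_version`.
    (Hypothetical helper name, for illustration only.)
    """
    # The suggested bits must not touch anything outside the mask.
    if version_bits & ~mask & 0xFFFFFFFF:
        raise ValueError("version_bits sets bits outside the negotiated mask")
    return ((job_version & ~mask) | (version_bits & mask)) & 0xFFFFFFFF


# Illustrative values, parsed from hex strings the way they appear in the JSON.
job_version = int("20000000", 16)   # version from the mining.notify job
mask = int("1fffe000", 16)          # mask handed out during negotiation
version_bits = int("00004000", 16)  # the "sixth field" in the submit

new_version = roll_version(job_version, version_bits, mask)
print(f"{new_version:08x}")  # 20004000
```

The check at the top is there because bits outside the mask are not yours to change, so I would rather fail loudly than submit a share the pool will reject.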
This acts like an "extra" extranonce that you can play with, but the advantage is that you don't need to recalculate the coinbase, its hash, and the merkle root. You still have to hash the new blockheader, but that's quite a bit less work!
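To show why this is cheaper, here is a rough sketch, again in Python. It assumes you already have the 80-byte block header assembled in its final serialized form, with the version as the first four bytes in little-endian order. The header contents below are placeholders (all-zero previous hash and merkle root), and the helper names are made up for the example. Real Stratum data has its own byte-order quirks when you build the header, which this does not cover.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """Bitcoin-style hash: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def header_with_version(header: bytes, new_version: int) -> bytes:
    """Swap only the 4-byte version field; the other 76 bytes are reused."""
    if len(header) != 80:
        raise ValueError("block header must be exactly 80 bytes")
    return struct.pack("<I", new_version) + header[4:]


# Placeholder header: version, prev hash, merkle root, time, bits, nonce.
base_header = (
    struct.pack("<I", 0x20000000)
    + bytes(32)                            # previous block hash (dummy)
    + bytes(32)                            # merkle root (dummy, already computed)
    + struct.pack("<III", 0, 0x1D00FFFF, 0)
)

# Rolling the version changes the header hash without touching the
# coinbase or the merkle root.
rolled = header_with_version(base_header, 0x20004000)
print(double_sha256(base_header)[::-1].hex())
print(double_sha256(rolled)[::-1].hex())
```

The two printed hashes differ even though the merkle root bytes are identical, which is the whole point: each new version value gives you a fresh nonce range for the cost of one header hash.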
Anyway, I made that small adjustment and suddenly I calculated a share with a nice long string of leading zeros.