One important difference between the schemes you are considering and Yang's is that in your schemes, the people that do the mixing seem to know the link between the input and output addresses of the people that they are mixing with, while this is not the case in Yang's proposal. Another important difference is that you accept inputs of different sizes, while he doesn't. I understand the appeal of using inputs of different sizes but as Yang suggests, we can just transform these into standardized sizes using a greedy algorithm and mix them separately. This means that you don't have to fiddle the sizes as you have been doing.
As for M-to-N mixing with M>N as justusranvier suggested, I believe it is not really an improvement over N-to-N mixing. My reasoning is the following:
The idea of M-to-N mixing seems to be to reduce the number of change addresses each user has. This cannot be done before the mixing as this would link previous transactions. However, this can be done without a problem after the mixing since all that you can learn from the blockchain is that some of the addresses in the mixing belonged to the same person, and you would have learned that from any M-to-N mixing with M>N anyways. Of course, that means that the number of addresses in the mixing transaction is larger, but this has the upside of increasing the anonymity gained.