The point of all this is that a single transaction connecting two addresses is not necessarily enough, absent additional evidence, to link two businesses.
The paper addresses this issue:
Although someone would use the CoinJoin method [9] to combine UTXOs from multiple senders into a single transaction to make it more challenging to determine the relationship between input and output addresses, we detect this method has not been adopted by the exchange so far.
I don't think this is a valid assumption. A CoinJoin transaction can consist of just two inputs, each from a different entity. In 2013, many exchanges were not as professional as they are today, and were dealing with much less customer money.
The OP appears to be interested in weeding out exchanges with fake volume. An exchange with fake volume could pay a whale to conduct a small number of CoinJoin transactions to evade detection of its fake volume.
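To make the concern concrete, here is a minimal sketch of how one two-input CoinJoin could merge two otherwise separate clusters. It assumes the address linking relies on something like the common-input-ownership heuristic (all inputs of a transaction belong to one entity); the paper's actual method is not shown in this thread. The function name and all addresses are hypothetical.

```python
from collections import defaultdict

def cluster_by_common_inputs(transactions):
    """Group addresses, assuming all inputs of one transaction share an owner.

    `transactions` is a list of input-address lists (hypothetical data).
    """
    parent = {}

    def find(addr):
        # Union-find lookup with path halving.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        find(first)  # register single-input transactions too
        for addr in inputs[1:]:
            union(first, addr)

    clusters = defaultdict(set)
    for addr in list(parent):
        clusters[find(addr)].add(addr)
    return list(clusters.values())

# Two hypothetical exchanges, each spending from its own addresses.
ordinary = [["exchA_1", "exchA_2"], ["exchB_1", "exchB_2"]]
# One CoinJoin with an input from each exchange.
with_coinjoin = ordinary + [["exchA_2", "exchB_1"]]

separate = cluster_by_common_inputs(ordinary)
merged = cluster_by_common_inputs(with_coinjoin)
print(len(separate), len(merged))
```

Under this assumption, the single joint transaction is enough to collapse both exchanges into one cluster, which is why one linking transaction alone is weak evidence.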
It is not that simple. Imagine that this model is approved and standardized, and that many watchdogs adopt the basic idea. An exchange confident enough about its volume might decide to let analyzers do their job and provide the information that puts it on the top list. A shady exchange cannot change anything by using CoinJoin, because of what CoinJoin does: it hides assets. The incentive goes the opposite way.
Also, a classification model that is accurate 96% of the time (it is unclear how you are measuring accuracy) has very high accuracy. My first reaction to a claimed accuracy that high is that you might have data leakage. I can't point to the source without looking at the specific steps you took to train your model, which understandably may not be something you want to share.
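As a side note on why the way accuracy is measured matters, here is a small sketch with made-up labels (not the paper's data, and the class split is purely an assumption). A classifier that always predicts the majority class reaches 96% accuracy while never identifying a single exchange address.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of true `positive` items that were predicted as `positive`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in relevant) / len(relevant)

# Hypothetical, deliberately imbalanced labels: 96 "other", 4 "exchange".
y_true = ["other"] * 96 + ["exchange"] * 4

# Trivial baseline: always predict the most common label.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

baseline_accuracy = accuracy(y_true, y_pred)
exchange_recall = recall(y_true, y_pred, "exchange")
print(baseline_accuracy, exchange_recall)
```

This does not show that the reported figure is wrong; it only illustrates that a headline accuracy needs the class balance and per-class metrics alongside it to be interpretable.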
I suppose they are presenting a model more than a piece of software. So far, the model seems to me to be as solid as a good heuristic-based data mining model can be. The implementation is not open, which is not good news, so the results presented are highly suspicious.
For example, suppose a conspiracy theory were true: a shady exchange (such as Bittrex) with very low liquidity, faking high trade volumes as part of its scam and with a strong incentive to put itself in the top 10 list, hires a team of technical writers, who publish an acceptable analysis model and fake privately generated results in favor of the exchange.
I would recommend that you learn about machine learning.
Thank you for the recommendation and the Wikipedia page you linked.
That is not how it works in technical discussions, though. You have deep knowledge of ML? Good for you! But for now, the only serious objection you've made to the article is the possibility of the model being CoinJoin-attacked by exchanges, which would void one of the basic heuristic assumptions of the proposed model. Well, I'm not convinced, and nobody would be, because there is no sign of that and no incentive for it.