It is great because it gives us a solid labeled dataset regarding what people like about posts.
You think? I'm not so sure, because people give merits for a wide variety of reasons, and even from one merit source to the next there are different standards for what constitutes a merit-worthy post. The only consistent criterion I could see would be length, with a well-answered question (of any length) coming in at a close second.
Any labeled data is better than unlabeled! To be fair, right after that I did mention the potential issue that people give merit for different reasons, and that it might be neat to model and mine out why users give merit. We do have the data on who gives merit to whom, plus the content of those posts...
But just as a general mining project, the model should still be able to tell multiple stories. There was a paper, and then an old data mining challenge on Kaggle, about the 'random acts of pizza' subreddit that modeled altruism, that is, what it was about certain posts that convinced people to buy or not buy someone a pizza. We found various factors in the resulting model, such as politeness, gratitude, and so on.
We might just end up with a decision tree that basically has a branch for each reason someone gives merit, including 'fake merit/merit farming', where people give a shill account merit.
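To make the decision tree idea a bit more concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: the three boolean features (`long_post`, `answers_question`, `poster_is_shill`) are hypothetical, the six rows are made-up toy data rather than real merit records, and the tree builder is a bare-bones greedy Gini split, not what a real library would do. The point is only that each path ending in a "merited" leaf reads like one reason for giving merit.

```python
from collections import Counter

# Hypothetical toy rows (NOT real forum data): boolean features plus a label.
ROWS = [
    {"long_post": True,  "answers_question": False, "poster_is_shill": False, "merited": True},
    {"long_post": True,  "answers_question": True,  "poster_is_shill": False, "merited": True},
    {"long_post": False, "answers_question": True,  "poster_is_shill": False, "merited": True},
    {"long_post": False, "answers_question": False, "poster_is_shill": False, "merited": False},
    {"long_post": False, "answers_question": False, "poster_is_shill": True,  "merited": True},
    {"long_post": False, "answers_question": False, "poster_is_shill": False, "merited": False},
]
FEATURES = ["long_post", "answers_question", "poster_is_shill"]

def gini(labels):
    """Gini impurity of a list of labels (0.0 means the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(rows, feature):
    """Size-weighted impurity of the two groups produced by splitting on feature."""
    yes = [r["merited"] for r in rows if r[feature]]
    no = [r["merited"] for r in rows if not r[feature]]
    return (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)

def build_tree(rows, features):
    """Return a leaf (True/False) or a tuple (feature, yes_subtree, no_subtree)."""
    labels = [r["merited"] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = min(features, key=lambda f: split_impurity(rows, f))
    yes = [r for r in rows if r[best]]
    no = [r for r in rows if not r[best]]
    if not yes or not no:  # the chosen feature does not separate anything
        return majority
    rest = [f for f in features if f != best]
    return (best, build_tree(yes, rest), build_tree(no, rest))

def predict(tree, row):
    """Walk the tree until a leaf is reached."""
    while isinstance(tree, tuple):
        feature, yes, no = tree
        tree = yes if row[feature] else no
    return tree

def merit_paths(tree, path=()):
    """Collect the feature conditions along every path that ends in a merited leaf."""
    if not isinstance(tree, tuple):
        return [path] if tree else []
    feature, yes, no = tree
    return (merit_paths(yes, path + ((feature, True),))
            + merit_paths(no, path + ((feature, False),)))

tree = build_tree(ROWS, FEATURES)
for path in merit_paths(tree):
    print(" and ".join(f"{name}={value}" for name, value in path))
```

On real data you would need far more features than this (and the merit-farming branch would probably come from who-gives-to-whom patterns rather than a ready-made flag), so treat the printed paths as a picture of the idea, not a result.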