Mining calculators work on statistical modeling of expected earnings. There is no guarantee of actually making the expected earnings. For example, had a user been mining on my pool since it began, they would have made approximately 25% more than the expected value. Looking at Kano's pool, you can see lifetime earnings of over 106% of expectations.
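To make the "expected vs. actual" idea concrete, here is a rough Python sketch of the kind of calculation a mining calculator does. It assumes the usual model where each hash finds a block with probability 1 / (difficulty * 2^32), and it ignores pool fees, transaction fees, and difficulty changes. The function names and the numbers in the usage lines are made up for illustration and are not real pool data.

```python
def expected_btc(hashrate_hs, difficulty, block_reward_btc, seconds):
    """Expected mining reward in BTC over a period of constant difficulty.

    Assumption: each hash solves a block with probability
    1 / (difficulty * 2**32). Fees and difficulty changes are ignored.
    """
    expected_blocks = hashrate_hs * seconds / (difficulty * 2**32)
    return expected_blocks * block_reward_btc


def luck_percent(actual_btc, expected):
    """Actual earnings as a percentage of the statistical expectation."""
    return 100.0 * actual_btc / expected


# Illustrative numbers only (hypothetical miner, not real data).
expected = expected_btc(hashrate_hs=14e12, difficulty=2.0e11,
                        block_reward_btc=12.5, seconds=30 * 24 * 3600)
print(f"expected over 30 days: {expected:.6f} BTC")
print(f"luck if you actually earned 0.6 BTC: {luck_percent(0.6, expected):.1f}%")
```

A luck figure above 100% means the miner or pool did better than the model predicted, which is what the 106% and 25% figures above describe. Over a short window the figure can swing a long way in either direction.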
Yes, I was thinking that, which is why I think I need some real raw data to work from.
Could you not just use a mining pool's data? Or look at the blockchain? Results will differ from pool to pool unless you're talking about PPS. So I would suggest picking a pool and using their data.
One thing worth noting: the previous 6 months of data will look different than the next 6 months. With the halving it's hard to say much about what will happen, and pricing is also hard to predict. So I think you need to decide whether the previous 6 months of data will work; if so, it is very easy. Otherwise you need 6 months of doing it personally. I guess this depends on what the thesis is for.