Board: Trading Discussion
Topic: Re: Discussion for MtGox trade data downloader
Post by nitrous on 18/12/2013, 17:39:49 UTC
...

Hmm, some really strange stuff is happening... Firstly, the file showing up as 1.07 GB: I just realised that the tool will create a duplicate index, which I should have thought about. It's not really a problem, apart from taking a few minutes the first time you load it up and making the file larger than necessary; otherwise it shouldn't affect your usage.

Yeah, sorry about that. The format I was asked to support only goes down to second resolution, so the microsecond resolution isn't present. In fact, microsecond resolution isn't even available for the first 218868 ticks. Unfortunately there are so many possible formats I could export to that I picked a few and stuck with them (although obviously the dump contains all the raw data, unfiltered). If there's one you really want, I could release a new version of the tool that supports it, but if you want to convert the data into other formats, or do more than you can in Excel, consider playing around with Python and seeing what you can do with it :)
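To give an idea of the kind of thing Python makes easy, here's a minimal sketch that prints a timestamp at either second or microsecond resolution. It assumes the raw value is an integer count of microseconds since the Unix epoch; that layout and the name `format_tick_time` are assumptions for illustration, not something the tool itself provides.

```python
from datetime import datetime, timedelta, timezone

# Reference point for Unix timestamps, in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_tick_time(micros, with_micros=True):
    """Render microseconds since the Unix epoch as UTC text.

    `micros` being an integer microsecond count is an assumption
    about the raw data, not a documented format.
    """
    # Integer timedelta arithmetic avoids float rounding of the microseconds.
    moment = EPOCH + timedelta(microseconds=micros)
    pattern = "%Y-%m-%d %H:%M:%S.%f" if with_micros else "%Y-%m-%d %H:%M:%S"
    return moment.strftime(pattern)


if __name__ == "__main__":
    print(format_tick_time(1387388389123456))         # full resolution
    print(format_tick_time(1387388389123456, False))  # second resolution
```

Passing `with_micros=False` gives the same second-resolution text the existing export uses, so you can compare the two side by side.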

Excel is being very weird. If you look closely, it's taking the minutes and seconds and converting them into a new millisecond value for some very strange reason, e.g. 17:48:56 -> 17:48:56:4856. There's no reason for it to do this.

Yes, the first data source is the Google BigQuery database, the second is the MtGox HTTP API, and the third is the MtGox Socket API. The socket API is basically just used to collect the last few trades in real time. If you're going to cut them out, you can drop the last few minutes of the data (half an hour to be safe) and just the one day on May 23rd, although there really shouldn't be any discrepancy; the data should be exact.
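If you do want to trim those parts, something along these lines should work once the ticks are loaded into Python. This is only a sketch: the `(datetime, price, amount)` tuple layout and the function name `trim_source_boundaries` are made up for the example, and the day to drop is passed in as a `datetime.date` so you supply the year yourself.

```python
from datetime import timedelta


def trim_source_boundaries(ticks, boundary_day=None, tail=timedelta(minutes=30)):
    """Drop the most recent `tail` of data and, optionally, one whole day.

    ticks: list of (datetime, price, amount) tuples sorted by time
           (a hypothetical layout, not the tool's export format).
    boundary_day: a datetime.date to exclude entirely, or None.
    tail: how much of the newest data to discard (half an hour by default).
    """
    if not ticks:
        return []
    # Everything newer than this is treated as the real-time tail.
    cutoff = ticks[-1][0] - tail
    return [
        tick for tick in ticks
        if tick[0] <= cutoff
        and (boundary_day is None or tick[0].date() != boundary_day)
    ]
```

The day comparison uses whatever timezone the datetimes carry, so keep everything in UTC if you want the cut to line up with the timestamps in the dump.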

Yes, the large jump is because MtGox changed its recording format. Money_Trade__, otherwise known as the TID or Trade ID, used to be a (mostly) sequential integer and then became a microsecond timestamp (coinciding with that closure, I believe). It doesn't make a difference to the data, all ticks should still be present; it's just a curiosity. Of course, this discontinuity will probably have some effect on the prices around that time, so you might want to exclude a few days either side of that point, as it might mess up your backtesting.
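If you'd rather locate the jump programmatically than by eye, a rough sketch is below. It just looks for the largest increase between consecutive TIDs and then drops ticks within a margin of a chosen moment. The helper names, the `(datetime, price, amount)` tuple layout, and the three-day default margin are all assumptions for illustration.

```python
from datetime import timedelta


def find_tid_jump(tids):
    """Return (index, gap) for the largest increase between consecutive TIDs.

    `index` is the position of the first TID after the jump.
    """
    if len(tids) < 2:
        raise ValueError("need at least two TIDs")
    gaps = [(tids[i] - tids[i - 1], i) for i in range(1, len(tids))]
    gap, index = max(gaps)
    return index, gap


def exclude_around(ticks, centre, margin=timedelta(days=3)):
    """Keep only ticks more than `margin` away from `centre`.

    ticks: list of (datetime, price, amount) tuples (hypothetical layout).
    """
    return [tick for tick in ticks if abs(tick[0] - centre) > margin]
```

You'd call `find_tid_jump` on the TID column, take the timestamp of the tick at the returned index as `centre`, and pass that to `exclude_around` with whatever margin suits your backtest.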