Glad you've found it useful

Yes, that's right. Here's an example of one script that's already been created -
http://cahier2.ww7.be/bitcoinmirror/phantomcircuit/. This is a different format though, and doesn't store all the data (I did, however, port this to an export format). You'll need to determine which field in the API maps to which field in the BQ database, and fill in some fields that aren't included in the API (Bid_User_Rest_App__, for example).
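For the columns the API doesn't return, you can just pad each inserted row with a placeholder, something like the fragment below (illustrative only; the mapping shown is just a partial example with made-up column capitalisation, and whether NULL is an acceptable filler for Bid_User_Rest_App__ depends on how you use that column):
# Illustrative fragment: map one API trade onto DB columns, padding the
# columns that the API never returns. Column names here are examples, so
# match them to your own schema.
API_TO_DB = {
    'tid': 'Money_Trade__',
    'price_currency': 'Currency__',
    'price_int': 'Price',
}
MISSING_FROM_API = ['Bid_User_Rest_App__']
def api_trade_to_row(trade):
    row = dict((db_col, trade[api_field]) for api_field, db_col in API_TO_DB.items())
    for col in MISSING_FROM_API:
        row[col] = None   # or whatever placeholder your schema expects
    return row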
There are some discrepancies with the API data: the missing fields, data being split into individual currencies, currencies occasionally being silently added, gaps in the API data, etc. Whilst this shouldn't be too much of a problem for your actual data processing, it could cause problems if you ever want to update from BQ again.
I suggest that you instead insert the data from the API into a new table (say, dump_live). You can then create a view from a union between dump and dump_live, delete the obsolete index on dump, and index the view (Money_Trade__ ASC, [Primary] DESC, Currency__ ASC). This will still let you access all the data fairly fast, but without corrupting the BQ dump. If you don't mind an extra 200 MB, you can leave the index on dump; my tool will recreate this index if it doesn't exist, so it might be easier and more efficient to just leave it.
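Roughly, that setup would look like the sketch below (illustrative only; the dump_all view name and the connection string are placeholders, and depending on your database a plain view may not be indexable, in Postgres you'd use a materialized view or index the underlying tables instead):
# Sketch of the dump_live / union view setup; adjust the DDL to your actual
# schema and column names.
import psycopg2
conn = psycopg2.connect('dbname=mtgox')   # placeholder connection string
cur = conn.cursor()
# new table for API-sourced trades, with the same columns as the BQ dump table
cur.execute('CREATE TABLE dump_live (LIKE dump INCLUDING DEFAULTS)')
# view exposing the BQ dump plus the live API data as one relation
cur.execute('CREATE VIEW dump_all AS '
            'SELECT * FROM dump UNION ALL SELECT * FROM dump_live')
# plain Postgres views cannot be indexed, so put the suggested index on the
# underlying table instead ("primary" is a reserved word, hence the quotes)
cur.execute('CREATE INDEX dump_live_trade_idx ON dump_live '
            '(money_trade__ ASC, "primary" DESC, currency__ ASC)')
conn.commit()
cur.close()
conn.close()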
I've actually given some thought to putting this in the official tool, hence the insights above (I'm not sure when I'll have enough time to do that though, as it would mean changing quite a lot of the tool). So if you want any help with your script, let me know. I've also written documentation on the MtGox API v2, so I can help you map the API data to the BQ schema and get around the data jumps.
I ended up putting my trade analysis work on hold for a while, but I wanted to stop by and mention that I was able to get the remainder of the data using the API and add it to my db just fine. Thanks again for all your help.
I was curious about the gaps in the API data that you mentioned: what kind of gaps are you talking about, exactly? I did some very basic verification by recreating a few candles for a few random days, and I got the same results as bitcoincharts.com (after properly filtering to USD trades).
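For reference, the candle check was nothing fancy; it was roughly the sketch below, which reads the CSV produced by the script further down (the 1e-5 scaling of the integer USD price and the column names are assumptions based on my mapping, so adjust them if your data differs):
# Rough sketch of the daily candle check, not the exact code I ran.
import csv
from collections import OrderedDict
def daily_candles(csv_path, currency='USD', price_scale=1e-5):
    candles = OrderedDict()   # day -> [open, high, low, close]
    with open(csv_path, 'rb') as f:
        for row in csv.DictReader(f):
            if row['currency__'] != currency:
                continue
            day = row['date'][:10]                 # ISO date string, e.g. '2013-06-01'
            price = int(row['price']) * price_scale
            if day not in candles:
                candles[day] = [price, price, price, price]
            else:
                c = candles[day]
                c[1] = max(c[1], price)            # high
                c[2] = min(c[2], price)            # low
                c[3] = price                       # close (trades arrive in time order)
    return candles
# e.g. compare one day's [open, high, low, close] against the bitcoincharts.com candle:
# print daily_candles('mtgox_recent_trades.csv')['2013-06-01']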
In case anyone is interested, here is the Python code I used to query the remainder of the mtgox data. You have to fetch it in 1000-trade blocks, so it is essentially just a thin wrapper around a long series of API calls:
import urllib2
import json
import csv
import time
import datetime
API_FETCH_URL = 'https://data.mtgox.com/api/2/BTCUSD/money/trades/fetch?since='
# mtgox API varname: postgres DB varname (from BigQuery dump)
VAR_MAP = {
    'price_currency': 'currency__',
    'trade_type': 'type',
    'price_int': 'price',
    'item': 'item',
    'primary': 'single_market',
    'tid': 'money_trade__',
    'amount_int': 'amount',
    'date': 'date',
    'properties': 'properties',
}
VARS = list(VAR_MAP.keys())
def process_date(timestamp):
    # convert a unix timestamp into an ISO string (UTC) to match the date column in the postgres DB
    return datetime.datetime.utcfromtimestamp(timestamp).isoformat(' ')

def fetch_trades(last_id, outfile, limit=False, noise=False):
    counter = 0
    with open(outfile, 'wb') as f:
        # create CSV writer object
        writer = csv.writer(f)
        # add header of varnames at top of file
        writer.writerow([VAR_MAP[var] for var in VARS])
        while True:
            # pause for 2 seconds between API calls to prevent getting banned by anti-DDOS
            time.sleep(2)
            # fetch trades after the most recent trade id, using the mtgox API
            page_data = urllib2.urlopen(API_FETCH_URL + last_id)
            # read the GET response and decode the JSON data into a python dictionary
            response = json.loads(page_data.read())
            if response['result'] == 'success' and len(response['data']) > 0:
                trades = response['data']
                if noise:
                    print 'Batch %04d -- ?since=%s, num trades: %d' % (counter + 1, last_id, len(trades))
                # write each trade as a separate line, using only the trade values for the vars in VARS
                # (dates are converted from timestamps into ISO str format (UTC) to match the postgres DB)
                writer.writerows([[trade[var] if var != 'date' else process_date(trade[var]) for var in VARS]
                                  for trade in trades])
                # set last_id to the tid of the last trade, so we can fetch the next batch of trades
                last_id = str(trades[-1]['tid'])
                counter += 1
                # if the limit parameter is in use, then only do as many batches as specified
                if limit and counter >= limit:
                    break
            else:
                print '\n**********'
                if response['result'] == 'success':
                    print 'SCRIPT END -- last trade reached'
                else:
                    print 'SCRIPT END -- API call failed'
                print '**********'
                break

if __name__ == '__main__':
    username = 'name'
    filename = 'mtgox_recent_trades.csv'
    # id of the last trade in the BigQuery dump
    last_dump_trade_id = '1369319572499520'
    outfile = 'c:\\users\\%s\\documents\\data\\mtgox\\%s' % (username, filename)
    fetch_trades(last_id=last_dump_trade_id, outfile=outfile, noise=True)