I have not thought this through yet and would like some feedback!

From my background, the intuitive thing is to use the F1 measure as a score, or the area under the ROC curve. There are good Wikipedia articles about these.
These seem to be concerned with binary outcomes, though. I want a metric that is continuous. I am currently developing one.

Eureka! It is this simple:
- Every predictor gives two prices in log scale (base 10), e.g. "On 2014-5-16 the price is between 2.7 and 2.85 (roughly 500 and 700)"
- When the actual price is known, you take
min [ abs(actual - upper_limit); abs(actual - lower_limit) ]
- Whoever has the lowest average error after a reasonable number of predictions (predictions can be renewed as often as you wish regardless of their maturity) is the best!
- Proof omitted
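
Here is a rough Python sketch of the scoring described above, in case it helps the discussion. The function names (`interval_error`, `average_error`) and the tuple layout are just placeholders I made up, and I am assuming base-10 logs, which is what the 2.7 / 2.85 example suggests.

```python
import math


def interval_error(actual_price, lower_price, upper_price):
    """Error for one prediction: distance (in log10 units) from the
    actual price to the nearer of the two predicted limits."""
    actual = math.log10(actual_price)
    lower_limit = math.log10(lower_price)
    upper_limit = math.log10(upper_price)
    return min(abs(actual - upper_limit), abs(actual - lower_limit))


def average_error(predictions):
    """Mean error over a list of (actual_price, lower_price, upper_price)
    tuples. The predictor with the lowest value ranks best."""
    errors = [interval_error(a, lo, hi) for a, lo, hi in predictions]
    return sum(errors) / len(errors)
```

You would call `average_error` once per predictor, on all of that predictor's matured predictions, and compare the results. Note that with the formula exactly as written, an actual price strictly inside the range still gets a nonzero error unless it lands on one of the limits.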
