If I did something wrong, I apologise in advance, but I'm just not seeing that performance.
Tick offset can make a big difference. For a 2-hour strategy, it seems we should try at least 180 different tick-offset settings... maybe one day butter will do this for us and find the best setting?
Yes, the tick offset makes a lot of difference in backtesting, but much less in the bot's actual performance. Remember, you are trying to learn from past market conditions in order to form an effective strategy for the future. This should in no way be confused with "predicting the future".
The different results you get with different offsets are largely luck. Although you can "see" the luck factor in hindsight, you cannot use the tool to predict future "luck".
Other factors that may be gleaned from offset testing have been discussed previously in this forum. I recommend that anyone playing with this feature review that discussion.
What COULD be implemented, though, is a backtesting function that calculates the average result across all possible offsets.
That might be a more objective way to compare strategies.
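To illustrate the idea, here is a minimal sketch in Python of an "average over all offsets" backtest. Everything in it is hypothetical: `candles_from_ticks`, `toy_momentum_backtest` and `offset_sweep` are made-up names, and the toy strategy (hold for one candle after an up-close) is just a stand-in for whatever the bot actually does. The sweep runs the same backtest once per possible offset (0 up to `ticks_per_candle - 1`), then reports the mean, which is the number you would compare strategies on, and the spread, which gives a rough feel for how much of any single-offset result is "luck".

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple


def candles_from_ticks(prices: Sequence[float], ticks_per_candle: int, offset: int) -> List[float]:
    """Group raw tick prices into candle closes, with the first candle starting at `offset`.

    Incomplete candles at either end are dropped.
    """
    closes = []
    for start in range(offset, len(prices) - ticks_per_candle + 1, ticks_per_candle):
        closes.append(prices[start + ticks_per_candle - 1])
    return closes


def toy_momentum_backtest(closes: Sequence[float]) -> float:
    """Hypothetical placeholder strategy: after an up-close, hold for one candle.

    Returns summed profit/loss in price units. Not the bot's real logic.
    """
    pnl = 0.0
    for prev, cur, nxt in zip(closes, closes[1:], closes[2:]):
        if cur > prev:
            pnl += nxt - cur
    return pnl


def offset_sweep(
    prices: Sequence[float],
    ticks_per_candle: int,
    backtest: Callable[[Sequence[float]], float],
) -> Tuple[float, float, List[float]]:
    """Run `backtest` once for every possible tick offset.

    Returns (mean result, population std dev of results, per-offset results).
    """
    results = [
        backtest(candles_from_ticks(prices, ticks_per_candle, offset))
        for offset in range(ticks_per_candle)
    ]
    return mean(results), pstdev(results), results


if __name__ == "__main__":
    # Synthetic, steadily rising price series, purely for demonstration.
    prices = [float(p) for p in range(100)]
    avg, spread, per_offset = offset_sweep(prices, ticks_per_candle=10, backtest=toy_momentum_backtest)
    print(f"mean={avg:.2f} spread={spread:.2f} per_offset={per_offset}")
```

For a 2-hour strategy, `ticks_per_candle` would be however many ticks make up one 2-hour candle (the thread suggests 180 settings). Averaging over every offset means no single lucky or unlucky alignment dominates the comparison between two strategies.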