accurate historical data

 
I had been speaking to FXCM regarding historical data. I had already downloaded free historical data from MetaQuotes, and the support person sent me a link where I could purchase 10 years of historical data from FXCM. The price was $500, which did not make sense because I had free data from MetaQuotes. It did make me wonder why they would even have that offer. Either they were hoping to get money from people who had not researched what is available for free, or the MetaQuotes data is inaccurate and you have to pay to get accurate historical data. Is the free MetaQuotes data accurate enough to analyze, or do you have to pay to get accurate data?
 
jshumaker:
I had been speaking to FXCM regarding historical data. I had already downloaded free historical data from MetaQuotes, and the support person sent me a link where I could purchase 10 years of historical data from FXCM. The price was $500, which did not make sense because I had free data from MetaQuotes. It did make me wonder why they would even have that offer. Either they were hoping to get money from people who had not researched what is available for free, or the MetaQuotes data is inaccurate and you have to pay to get accurate historical data. Is the free MetaQuotes data accurate enough to analyze, or do you have to pay to get accurate data?
There is no central market for Forex; each Broker's prices are unique to that Broker, so how do you determine what is accurate and what is not?
 
RaptorUK:
There is no central market for Forex; each Broker's prices are unique to that Broker, so how do you determine what is accurate and what is not?


Good point. Let me rephrase my question then.

How much of a difference in price would you expect to see between the free MetaQuotes data and the paid data from FXCM? I would hope that the majority of brokers are only a few pips apart, except during high-volatility news releases such as the non-farm payroll report. If so, anybody's data would be sufficient for backtesting.
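One way to put a number on "a few pips apart" is to line up two sources' bars by timestamp and measure the close-price differences in pips. This is only a sketch under assumptions: each source is already loaded as a dict mapping a timestamp to a close price, both sources use the same server time zone (brokers often don't, so timestamps may need shifting first), and the pip size is 0.0001 as for EURUSD. The function name `pip_differences` and the sample prices are hypothetical.

```python
from statistics import mean

def pip_differences(source_a, source_b, pip_size=0.0001):
    """Absolute close-price differences, in pips, for every timestamp both sources contain.

    source_a, source_b: dicts mapping a timestamp to a close price (hypothetical layout).
    pip_size: 0.0001 for most pairs; JPY pairs would use 0.01.
    """
    common = sorted(set(source_a) & set(source_b))  # ignore bars only one source has
    return [abs(source_a[t] - source_b[t]) / pip_size for t in common]

# Hypothetical sample closes from two feeds.
feed_a = {"2012-01-02 00:00": 1.29400, "2012-01-02 00:01": 1.29410, "2012-01-02 00:02": 1.29385}
feed_b = {"2012-01-02 00:00": 1.29405, "2012-01-02 00:01": 1.29410, "2012-01-02 00:02": 1.29415}

diffs = pip_differences(feed_a, feed_b)
print(f"bars compared: {len(diffs)}, mean: {mean(diffs):.2f} pips, max: {max(diffs):.2f} pips")
```

Looking at the maximum as well as the mean matters here: a small average can hide the occasional 2 or 3 pip disagreement around news releases.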

 
jshumaker:


Good point. Let me rephrase my question then.

How much of a difference in price would you expect to see between the free MetaQuotes data and the paid data from FXCM? I would hope that the majority of brokers are only a few pips apart. If so, anybody's data would be sufficient for backtesting.

I wouldn't expect to see much more than a pip of difference. That doesn't mean you won't find differences of 2, 3 or more pips if you look hard enough; you probably will. IMO the quality aspect of data is more about missing data, Fridays that end at different times and Sundays that start at different times, H1 data not matching M1 data, etc.
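Missing data and differing Friday close / Sunday open times both show up as gaps between consecutive bar timestamps, so they can be found with a simple scan. A minimal sketch, assuming time-sorted M1 timestamps in broker server time; `find_gaps` and the sample timestamps are hypothetical, and the Friday-to-Sunday test is only a crude heuristic for telling the weekend apart from genuinely missing bars.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, bar=timedelta(minutes=1)):
    """Yield (previous, next, missing_bars) wherever consecutive timestamps are more than one bar apart.

    timestamps: time-sorted datetimes of M1 bars.
    """
    for prev, nxt in zip(timestamps, timestamps[1:]):
        if nxt - prev > bar:
            yield prev, nxt, (nxt - prev) // bar - 1

fmt = "%Y-%m-%d %H:%M"
bars = [datetime.strptime(s, fmt) for s in [
    "2012-01-06 21:56", "2012-01-06 21:57", "2012-01-06 21:59",  # Friday; 21:58 is missing
    "2012-01-08 22:00", "2012-01-08 22:01",                      # Sunday open
]]

for prev, nxt, missing in find_gaps(bars):
    # Crude heuristic: a Friday -> Sunday jump is treated as the weekend close.
    kind = "weekend" if prev.weekday() == 4 and nxt.weekday() == 6 else "missing data"
    print(f"{prev:%a %H:%M} -> {nxt:%a %H:%M}: {missing} bars ({kind})")
```

Running the same scan over two brokers' data makes the different Friday end and Sunday start times easy to compare.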
 
Although it was not my intention, one of the functions I was working on forced me to match up highs and lows across different timeframes. The data seemed to line up perfectly. Do you have any experience seeing those kinds of errors in the MetaQuotes data?
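A cross-timeframe check like this can be automated: each H1 bar's high and low should equal the highest high and lowest low of the M1 bars inside that hour. A minimal sketch, assuming bars are (timestamp, open, high, low, close) tuples and H1 timestamps fall on the hour; `check_h1_against_m1` and the sample bars are hypothetical.

```python
from datetime import datetime

def check_h1_against_m1(h1_bars, m1_bars):
    """Compare each H1 bar's high/low with the extremes of the M1 bars inside that hour.

    Bars are (timestamp, open, high, low, close) tuples; H1 timestamps are on the hour.
    Returns a list of (hour, field, h1_value, m1_value) mismatches.
    """
    highs, lows = {}, {}
    for ts, _open, high, low, _close in m1_bars:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        highs[hour] = max(high, highs.get(hour, high))
        lows[hour] = min(low, lows.get(hour, low))

    mismatches = []
    for ts, _open, high, low, _close in h1_bars:
        if ts not in highs:
            mismatches.append((ts, "missing M1", None, None))
            continue
        # Exact comparison assumes both files were parsed from the same price strings;
        # data from different sources may need a small tolerance instead.
        if high != highs[ts]:
            mismatches.append((ts, "high", high, highs[ts]))
        if low != lows[ts]:
            mismatches.append((ts, "low", low, lows[ts]))
    return mismatches

m1 = [
    (datetime(2012, 1, 5, 10, 0), 1.2940, 1.2945, 1.2938, 1.2942),
    (datetime(2012, 1, 5, 10, 1), 1.2942, 1.2950, 1.2941, 1.2949),
    (datetime(2012, 1, 5, 10, 2), 1.2949, 1.2949, 1.2935, 1.2936),
]
h1 = [(datetime(2012, 1, 5, 10), 1.2940, 1.2950, 1.2938, 1.2936)]  # low disagrees with the M1 data
print(check_h1_against_m1(h1, m1))
```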
 
jshumaker:
Although it was not my intention, one of the functions I was working on forced me to match up highs and lows across different timeframes. The data seemed to line up perfectly. Do you have any experience seeing those kinds of errors in the MetaQuotes data?
Nope, I haven't used the MQ data. My data comes from tick data, so it has a common source for all the timeframes. You should use M1 and build the other timeframes from it.
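Building the other timeframes from M1 means aggregating consecutive M1 bars: the open of the first bar, the highest high, the lowest low and the close of the last bar. A minimal sketch with the standard library, assuming time-sorted (timestamp, open, high, low, close) M1 tuples and periods aligned to midnight; `resample` is a hypothetical name, and the sketch ignores broker-specific session alignment such as where a daily bar starts.

```python
from datetime import datetime, timedelta

def resample(m1_bars, minutes):
    """Aggregate time-sorted M1 (timestamp, open, high, low, close) bars into `minutes`-long bars.

    Periods are aligned to midnight, so `minutes` should divide 1440 (e.g. 5, 15, 60, 240, 1440).
    """
    period = timedelta(minutes=minutes)
    out = []
    for ts, op, hi, lo, cl in m1_bars:
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        start = midnight + ((ts - midnight) // period) * period  # floor to period start
        if out and out[-1][0] == start:
            # Same period: keep the first open, extend high/low, take the latest close.
            _, first_open, high, low, _ = out[-1]
            out[-1] = (start, first_open, max(high, hi), min(low, lo), cl)
        else:
            out.append((start, op, hi, lo, cl))
    return out

m1 = [
    (datetime(2012, 1, 5, 10, 58), 1.2930, 1.2933, 1.2929, 1.2932),
    (datetime(2012, 1, 5, 10, 59), 1.2932, 1.2940, 1.2931, 1.2938),
    (datetime(2012, 1, 5, 11, 0), 1.2938, 1.2939, 1.2925, 1.2927),
]
for bar in resample(m1, 60):
    print(bar)
```

Because every higher timeframe comes from the same M1 bars, H1 and M1 highs and lows agree by construction; an hour with no M1 data simply produces no bar.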
 
RaptorUK:
Nope, I haven't used the MQ data. My data comes from tick data, so it has a common source for all the timeframes. You should use M1 and build the other timeframes from it.


Where do you get your tick data from?
 
4evermaat:

Where do you get your tick data from?
I have downloaded from Dukascopy and Pepperstone.
 

There are also multiple free tick data sets (not M1 but actual ticks) available from different brokers, mostly in CSV format.

Overall, the quality is not too bad; however, you will occasionally find some errors:

1- Ticks with a wrong time stamp.

2- Ticks with invalid Bid or Ask price.

These errors are rare (typically a dozen times or so in a calendar year), but you must correct or account for them.
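Both kinds of error can be flagged mechanically before the ticks are used. A minimal sketch, assuming a CSV with the header `timestamp,bid,ask` and ISO-style timestamps; real broker exports each use their own layout, so the parsing would need adapting. `validate_ticks` and the 50-pip `max_spread` threshold are hypothetical choices, not taken from any broker's format.

```python
import csv
import io
from datetime import datetime

def validate_ticks(lines, max_spread=0.0050):
    """Return (row_number, reason) for each suspicious tick in a timestamp,bid,ask CSV."""
    problems = []
    previous = None
    for row_number, row in enumerate(csv.DictReader(lines), start=2):  # row 1 is the header
        ts = datetime.fromisoformat(row["timestamp"])
        bid, ask = float(row["bid"]), float(row["ask"])
        # 1- Wrong time stamp: a tick earlier than the last good one.
        if previous is not None and ts < previous:
            problems.append((row_number, "timestamp goes backwards"))
        else:
            previous = ts
        # 2- Invalid Bid or Ask: non-positive, crossed (ask below bid) or an implausibly wide spread.
        if bid <= 0 or ask <= 0:
            problems.append((row_number, "non-positive price"))
        elif ask < bid:
            problems.append((row_number, "ask below bid"))
        elif ask - bid > max_spread:
            problems.append((row_number, "spread too wide"))
    return problems

sample = io.StringIO(
    "timestamp,bid,ask\n"
    "2012-01-05 10:00:00.100,1.29400,1.29410\n"
    "2012-01-05 10:00:00.350,1.29405,1.29415\n"
    "2012-01-05 09:59:59.900,1.29402,1.29412\n"
    "2012-01-05 10:00:00.600,1.29410,1.29390\n"
    "2012-01-05 10:00:00.800,0,1.29420\n"
)
for row_number, reason in validate_ticks(sample):
    print(row_number, reason)
```

Whether a flagged tick is then dropped, corrected or kept is a separate decision; the sketch only reports it.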

 

I had been developing an overnight scalping EA using historical data from before the market crash (2001-2006), and it looked like I had gotten the EA to perform really well. In all honesty, I was kind of shocked by the results I was seeing. Then I ran the EA on more recent data (2010 to the present). I know the common disclaimer that past performance is not indicative of future results, but the results are so drastically different that I have a really hard time believing they could come from the same data set, especially since this is a very simple algorithm. I have also completed college courses in data mining and predictive modeling, so I am fully aware of how you can over-optimize an algorithm, and I know I didn't over-optimize this one. I also queried the data for each year and checked the average volatility for each hour of the day, and I still cannot understand how these two charts came from the same EA. My belief is that I am not getting quality historical data.

[Charts: EA results, 2001 - 2002 and 2012 - current]
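The per-year, per-hour volatility query described above can be sketched as grouping bars by (year, hour of day) and averaging the high-low range in pips. This is a minimal sketch, assuming (timestamp, open, high, low, close) bars of a single timeframe and a pip size of 0.0001; `average_range_by_hour` and the sample bars are hypothetical. Comparing the resulting tables for 2001-2006 and 2010 onwards is one way to see whether the overnight hours the EA trades behave differently in the two periods.

```python
from collections import defaultdict
from datetime import datetime

def average_range_by_hour(bars, pip_size=0.0001):
    """Return {(year, hour): mean high-low range in pips} for (timestamp, o, h, l, c) bars."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ts, _open, high, low, _close in bars:
        key = (ts.year, ts.hour)
        totals[key] += (high - low) / pip_size
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in sorted(totals)}

# Hypothetical H1 bars at 02:00 in two different years.
bars = [
    (datetime(2002, 3, 5, 2, 0), 0.8750, 0.8760, 0.8745, 0.8755),
    (datetime(2002, 3, 6, 2, 0), 0.8755, 0.8765, 0.8755, 0.8760),
    (datetime(2012, 3, 6, 2, 0), 1.3200, 1.3206, 1.3198, 1.3203),
]
for (year, hour), pips in average_range_by_hour(bars).items():
    print(f"{year} {hour:02d}:00  {pips:.1f} pips")
```

Volatility alone won't settle the question, since the spread charged during those hours can eat a scalper's edge even when the range looks similar.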

 
jshumaker: I had been developing an overnight scalping EA using historical data from before the market crash (2001-2006), and it looked like I had gotten the EA to perform really well. In all honesty, I was kind of shocked by the results I was seeing. Then I ran the EA on more recent data (2010 to the present). I know the common disclaimer that past performance is not indicative of future results, but the results are so drastically different that I have a really hard time believing they could come from the same data set, especially since this is a very simple algorithm. I have also completed college courses in data mining and predictive modeling, so I am fully aware of how you can over-optimize an algorithm, and I know I didn't over-optimize this one. I also queried the data for each year and checked the average volatility for each hour of the day, and I still cannot understand how these two charts came from the same EA. My belief is that I am not getting quality historical data.

Been there Link#1. Better believe it :)

It isn't the data; it's because the market is pseudo-random but charges you spreads. In_my_mind this is no different from playing roulette with 0 && 00, or playing poker against opponents whose skill levels are unknown while you all have to pay the house for hosting the game ... aka spreads/commission.

If someone gave you the results of 10,000 roulette_spins, I'm pretty sure you could formulate a strategy to beat that historical data. But taking this information into the future is a mistake.
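The roulette argument can be demonstrated with a small simulation: generate 10,000 spins, "optimize" a betting rule on them, then run that rule on 10,000 fresh spins. A minimal sketch, assuming a 38-pocket wheel (0 and 00, with 37 standing in for 00) and a hypothetical rule family, "bet red or black depending on the colors of the previous K spins", fitted as a lookup table. The rule family, `K = 8` and the seeds are illustrative assumptions, not a model of any real EA. The fitted table should look profitable on the spins it was fitted to, while on fresh spins every bet still wins only 18 times in 38, an expected loss of 2/38 of a unit per spin.

```python
import random

K = 8  # how many previous spins the hypothetical rule looks at
RED = {1, 3, 5, 7, 9, 12, 14, 16, 18, 19, 21, 23, 25, 27, 30, 32, 34, 36}

def color(pocket):
    """Map a pocket (0-36, with 37 standing in for 00) to 'R', 'B' or 'G'."""
    if pocket in (0, 37):
        return "G"
    return "R" if pocket in RED else "B"

def spins(n, seed):
    """Simulate n spins of a 38-pocket wheel."""
    rng = random.Random(seed)
    return [color(rng.randrange(38)) for _ in range(n)]

def fit_rule(history, k=K):
    """'Optimize' on history: for each pattern of the previous k colors, bet whichever of red/black followed it most."""
    tallies = {}
    for i in range(k, len(history)):
        tally = tallies.setdefault(tuple(history[i - k:i]), {"R": 0, "B": 0})
        if history[i] != "G":
            tally[history[i]] += 1
    return {pattern: ("R" if t["R"] >= t["B"] else "B") for pattern, t in tallies.items()}

def profit(rule, history, k=K):
    """Net units from a 1-unit even-money bet on every spin; green (0/00) loses. Unseen patterns bet red."""
    total = 0
    for i in range(k, len(history)):
        bet = rule.get(tuple(history[i - k:i]), "R")
        total += 1 if history[i] == bet else -1
    return total

past = spins(10_000, seed=1)     # the "historical data"
future = spins(10_000, seed=2)   # what happens next
rule = fit_rule(past)
print("in-sample profit:    ", profit(rule, past))
print("out-of-sample profit:", profit(rule, future))
```

The lookup table memorizes noise in the first 10,000 spins; nothing about those patterns carries over, so the second run is left with the house edge.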
