Clean Data

Why Clean Data is a Mandatory Pre-Analysis Step

Futures Magazine’s September ’99 issue featured a cover story about the accuracy of the top end-of-day data providers for investors. We are pleased to report that CSI was the undeniable champion in terms of data accuracy and in other ways that might surprise you. The full text of the Futures story, written by Sheldon Knight of K-Data Inc., is available as a mailed reprint directly from CSI by completing the on-line visitor information request below. Please see the comparative rankings of US data firms shown below, which were compiled from information supplied in the Sheldon Knight study.

Although we are admittedly biased, we found the results of Mr. Knight’s analysis very interesting, even compelling: The largest market data firms in the nation just didn’t stack up next to CSI’s stellar performance. Not only did CSI dramatically outdistance all of the competition with the fewest errors overall, but we did so with zero omissions. According to Sheldon Knight, “The data management functions of [CSI’s] Unfair Advantage are by far the most flexible tested, and the database is one of the most comprehensive.” Great data and great software; what more can we say? We would like to disclose a little more about the differences that make CSI the best data source in the industry. It’s all in the details.

On Data Accuracy

In the study, a total of 1,203 errors and omissions were noted among the ten firms tested. CSI accounted for 27 of them in the 1,506-day test. Dividing the remaining 1,176 errors among the nine competing firms gives an average of 131 errors each over the same period, so the average CSI competitor’s error count was 385% higher than CSI’s.
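The arithmetic behind these figures can be checked with a short sketch. The input numbers come from the paragraph above; the variable names are illustrative, and the average is rounded to a whole number before the percentage is taken, which is the assumption that reproduces the quoted 385%.

```python
# Figures quoted from the study, as reported in the text above.
total_errors = 1203   # errors and omissions across all ten firms
csi_errors = 27       # CSI's share of that total
competitors = 9       # the other firms tested

# Average error count per competing firm, rounded to a whole number
# (assumption: the text rounds 130.67 up to 131 before comparing).
remaining_errors = total_errors - csi_errors
avg_competitor = round(remaining_errors / competitors)

# How much higher the average competitor's count is, relative to CSI's.
pct_higher = (avg_competitor - csi_errors) / csi_errors * 100

print(f"Remaining errors: {remaining_errors}")
print(f"Average per competitor: {avg_competitor}")
print(f"Percent higher than CSI: {pct_higher:.0f}%")
```

Put another way, the average competitor logged a little under five times as many errors as CSI.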

CSI’s 18 errors in the soybean futures test were the fewest of all vendors. The average error size was less than half that of the second-place firm, and an insignificant fraction of that of most of the other firms. In the S&P 500 analysis, CSI’s error count of 9 tied for the lowest with one other vendor. Data sources were varied and sometimes overlapping, but CSI’s record of minimal errors probably has much more to do with procedure, pride, commitment, diligence, and customer participation than source. It is very rare for an error to get past the many data scrubbers on the CSI staff.

On Data Presentation

This was briefly noted in the Sheldon Knight study, but it deserves additional comment. Data presentation refers to the handling of after-the-close settlements that can result in exchanges quoting settlement prices that are outside the day’s trading range (above the high or below the low). It is common (but not necessarily correct) for summary day-end data vending firms to expand the high-low range to accommodate the assigned settlement price, even though settlement prices do not necessarily represent prices where actual trading took place. CSI delivers actual trading statistics to customers and gives the option of presenting data 1) in actual form, based on exchange statistics, 2) with highs and lows expanded to include the settlement, or 3) with the settlement price modified so that it lies within the actual highs and lows. According to the article, only CSI has recorded the historical statistics on all markets so that they can be presented in any one of these ways. It is clear that CSI’s competitors have forever lost the ability to present an unaltered historical record.
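The three presentation options can be illustrated with a minimal sketch. This is not CSI’s implementation or the Unfair Advantage interface; the `DailyBar` type, the `present` function, and the mode names are hypothetical, chosen only to show why the unaltered record must be stored if all three views are to remain available.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DailyBar:
    """One day's exchange statistics (hypothetical structure)."""
    high: float
    low: float
    settle: float


def present(bar: DailyBar, mode: str) -> DailyBar:
    """Return the bar in one of the three presentation forms described above."""
    if mode == "actual":
        # 1) Exchange statistics exactly as recorded.
        return bar
    if mode == "expand_range":
        # 2) Widen the high-low range so that it includes the settlement.
        return replace(bar,
                       high=max(bar.high, bar.settle),
                       low=min(bar.low, bar.settle))
    if mode == "clamp_settle":
        # 3) Move the settlement so that it lies within the traded range.
        return replace(bar, settle=min(max(bar.settle, bar.low), bar.high))
    raise ValueError(f"unknown presentation mode: {mode}")


# Made-up example: the settlement was assigned above the day's traded high.
raw = DailyBar(high=102.0, low=99.0, settle=102.5)
for mode in ("actual", "expand_range", "clamp_settle"):
    print(mode, present(raw, mode))
```

Both altered forms are derived from the actual record, but neither can be turned back into it: once a high has been expanded to 102.5, nothing in the stored bar says the traded high was 102.0.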

On Analytical Validity

The Futures study clearly demonstrates that technical analysis requires accurate data. In the study, S&P 500 data from CSI, Omega Research, and Bridge were used on the same simple breakout system with strikingly different results. The profit scenario varied from 20% to much more than 100% over the full period of study. This should offer substantial proof that a flawed database can lead to a useless result and wasted effort: parameter settings determined from flawed data cannot be expected to work with the same efficiency in the market to which they will be applied. Unfair Advantage’s software and database are designed so that every user is equipped with exactly the same data set at all times, forcing any common analytical tool that is derived from past information to produce equivalent results on different machines.
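To see how a single bad data point can change a breakout system’s behavior, consider the following sketch. It is not the system used in the Futures study, whose rules are not given here; the rule (close above the highest high of the prior few days), the function name, and all prices are made-up assumptions for illustration.

```python
def breakout_days(highs, closes, lookback):
    """Return the indices of days whose close exceeds the highest high
    of the preceding `lookback` days (a hypothetical breakout rule)."""
    signals = []
    for i in range(lookback, len(closes)):
        if closes[i] > max(highs[i - lookback:i]):
            signals.append(i)
    return signals


# Made-up price series for six trading days.
clean_highs = [101, 102, 103, 102, 104, 105]
closes = [100, 101, 102, 101, 103.5, 104.5]

# The same series with one hypothetical keying error: 103 recorded as 113.
flawed_highs = list(clean_highs)
flawed_highs[2] = 113

print("clean data signals: ", breakout_days(clean_highs, closes, lookback=3))
print("flawed data signals:", breakout_days(flawed_highs, closes, lookback=3))
```

With the clean highs, the rule signals on the last two days. With the one flawed high, that value stays inside the lookback window for the rest of the series and no signal is produced at all, so any parameters tuned on the flawed series would be fitted to trades that differ from those the real market offered.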

Building a trading model based upon flawed past data is certain to degrade system effectiveness in the future. This truth, learned decades ago by CSI’s founder Bob Pelletier, is the driving force behind CSI’s policies. Before CSI was incorporated in 1970, studies done by Pelletier, a General Electric mathematician at that time, were inevitably tripped up by some obscure error that dominated parameter settings and falsely influenced the outcome of simulation exercises by forcing undeserved profits from the flawed data. It may seem that a small error here or there would not be important, but that is not what the work showed. Experiences like this made it abundantly clear that errors must be eliminated if any benefit was to be derived from hindsight testing.

Several of the data vendors included in the study are either allied with or directly tied to very expensive analysis programs, but they are not necessarily the required data sources. Although CSI is explicitly excluded from the data download screens and menus of most of those programs, discriminating users of the industry’s most powerful software tools still come to CSI for data. They know that it is pure folly to accept the suggestion that an average data firm can deliver the accuracy needed to create an exceptional trading system. Now that the importance of data accuracy has been revealed, perhaps even more traders will come directly to CSI, whether or not their software producer steers them in that direction. Software companies with whom CSI data products are compatible include: Equis Int’l (MetaStock®), Omega Research, Windows on Wall Street, ProfiTaker and many others.

Putting it All Together

It should be mentioned that the errors measured in the Sheldon Knight study were discovered in hindsight, based upon each company’s one-time historical submission of its global data reserves. An even more telling result might have emerged had the study been conducted on an ongoing basis by observing each contributor’s performance one day at a time over an extended period. With an ongoing study, the reader could have a better understanding of each company’s performance when it means the most: immediately after each day’s database update is posted. This way, a firm’s timing of delivery on all stock and world futures markets, diligence in avoiding omissions, and ability to stay on top of information gathering in spite of unpredictable obstacles could be studied.

Many factors contributed to CSI’s impressive performance reported in the Futures article, and most of them might be dismissed as insignificant details. Back-up electrical power, multiple information sources, a large experienced staff competent in applying checks and balances, and rewards for diligent customers reporting questionable data items are a few of the details CSI attends to each day. They seem to make the difference.
