Title: Programmer's Guide to Unfair Advantage®


Author: Steven Davis 11/10/99
Last Modified 9/27/2000

9/3/2001 - Corrected mmanal.txt and location
1/3/2000 - Changed Stock symbol to 8 characters.
9/21/2000 - Changed Stock Scanning record size to 48 bytes to add futures series.
9/27/2000 - Added Stock Sectors and Industry Groups
11/7/2000 - Adding new fields to mmanalyzer.csv and mmanalyzer.bin
9/7/2001 - Updated Stock Factsheet record for CUSIP addition. 
2/8/2002 - Added some Stock Fundamental fields. 
 

1. Overview


Unfair Advantage (UA) is a CSI product.  UA maintains a client-side database of end-of-day stock, futures, mutual fund, and index prices (or quotes).  UA further supports third-party software by fulfilling ad-hoc requests, maintaining certain published-format files, and optionally maintaining secondary databases in industry-standard formats.  Additionally, UA provides some research capability for debugging data transfer problems.
 

2. API Access Model


UA fulfills ad-hoc requests through an OLE automation interface designated API2.  This allows a client application to search through the set of markets and the series available for each market.  It also allows any series to be retrieved in binary or Ascii format.  API2 is fully documented in the file c:\ua\uapi\funcspec.doc, titled “UNFAIR ADVANTAGE API (API2) FUNCTIONAL SPECIFICATION”, hereafter referred to as the API2 document.
 

3. File Access Model

UA also allows for secondary databases to be created and maintained.  This is done through the portfolio manager.

3.1 Portfolio Manager

The Portfolio Manager is fully documented in the UA user’s guide titled UNFAIR ADVANTAGE® REFERENCE MANUAL.  This manual ships with UA.

3.2 UA File Maintenance Model

Since industry-standard database formats are not all-inclusive, each secondary database UA creates contains only the data the user specified when it was created.  For example, the user may create a CSI format secondary database with a back-adjusted Corn series, a Gann Contract of Live Cattle, and the nearest-to-expire 10 Eurodollar contracts.  This specification is not trivial and is the source of much customer consternation.

When the user specifies, through the Portfolio Manager, which secondary databases should be created and maintained, the user is asked whether they should be created immediately.  If not, creation is delayed until the next daily update.  Notice that in several supported formats, the structure of the secondary database depends on the manner in which it was built.  If the customer were to modify the secondary databases by creation, deletion, or sorting, third-party software may become confused.  It is thus best to follow the access guidelines supplied with the industry-standard format.

The customer must request that UA contact the CSI servers whenever account information needs to be verified or when the customer wishes to update the database.  It should be pointed out that the database does not update itself and that the customer must be diligent in this matter.  I do not point this out to be snide; a customer once tried to sue CSI on this point.

The UA database has an intrinsic “current date.”  The “current date” is the last date for which UA has information.  When UA updates, the updates available for the “current date” are reacquired to allow for phased data availability and quick corrections.  UA then acquires any and all updates that have since become available.  UA then applies the updates:

1. UA first updates its customer account records.
2. UA updates its own database.
3. UA tries to update any already created secondary databases.
4. Optionally UA displays certain reports.
5. UA displays any broadcast messages.
6. UA updates any published format data files.
7. UA updates any vendor specific data files requested.
8. UA creates/recreates any secondary databases which have not yet been created or could not be updated.
9. UA executes the optional after-download-execute setting.

An example of a non-updateable secondary database is one containing a second-day detrended series.  Every day the detrending basis changes, so every price in the file could be different from the day before.  As such, it is easier and more accurate to recreate that database daily.

The actual format of the daily download is compressed, encrypted, and proprietary.  We do not release the format to anyone.  However, an uncompressed, unencrypted file may be created and left for the client if the account is flagged as a UA commercial account.  Before you call asking for this, be forewarned that the service costs upwards of $3000/month.
 

3.3 CSI File Access


UA can maintain a CSI format database.  The format can be found at http://www.csidata.com/formats.html.  There are two significant limitations to this format: (1) every bit of the record is in use, yet some numbers in the database are too large to fit, and (2) negative numbers are not allowed.  Other than that, this is an excellent, quick, low-fragmentation format.
 

3.4 CSIM File Access

UA can maintain a CSIM format database.  The format can be found at http://www.csidata.com/formats.html.  This format can be read by Metastock® as if it were in Metastock format.  CSI has no relationship with Metastock.  There are two significant limitations to this format: (1) it is hard to know if you have found the right series, and (2) Metastock format reads will only recognize one volume and one open interest field.  Other than that, this is an excellent, quick format.  There is no special anti-fragmentation technology, though.
 

3.5 Ascii File Access

This is the easiest and most frequently used database format UA can maintain.  A great deal of flexibility is possible if you make full use of the UA Ascii file settings as documented in the UA reference manual.  The major disadvantages are that it is easy to get the wrong series, that the conversion factor (number encoding scheme identifier) is not written anywhere, and that the updating process is much slower.  The advantage is that it can accurately handle everything currently available.  There is no special anti-fragmentation technology.
 

4.  Exposed Files

4.1 c:\ua\archives\cdbfacts.adm

This file can be read-in, but should not be held as locked.  UA sometimes writes and/or rereads it.

The cdbfacts.adm file is a comma-separated Ascii file.  This file lists the properties of each commodity market included with UA:

1. Csinum
2. Symbol
3. Exchange
4. Name
5. Cvf
6. Units
7. Contract Size
8. Point Value
9. Active Months
10. Unused
11. Minimum Tick
12. Branch Number
13. Branch Date
14. Second Symbol
15. Unused
16. Unused
17. Currency
18. Start Date

Additional columns may be present and/or unused columns may be used.  For the definitions of these fields, I recommend the API2 document.
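
As a minimal sketch of reading this file, the function below splits one line into its comma-separated fields.  The name cdb_split_line and the CDB_ constants are mine and are not part of UA.  I am assuming that no field contains a quoted comma, which this guide does not confirm.

```c
#include <stddef.h>
#include <string.h>

/* Zero-based positions of a few of the columns listed above. */
enum {
    CDB_CSINUM = 0, CDB_SYMBOL = 1, CDB_EXCHANGE = 2, CDB_NAME = 3,
    CDB_CVF = 4, CDB_CURRENCY = 16, CDB_START_DATE = 17,
    CDB_MAX_FIELDS = 32          /* hypothetical upper bound */
};

/* Split one line in place on commas and return the number of fields stored.
   Assumption: no field contains a quoted comma.  Columns beyond max_fields
   are ignored, and missing columns are simply not reported. */
size_t cdb_split_line(char *line, char *fields[], size_t max_fields)
{
    size_t n = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';     /* drop the line ending */

    while (n < max_fields) {
        fields[n++] = p;
        p = strchr(p, ',');
        if (p == NULL)
            break;
        *p++ = '\0';                        /* terminate this field */
    }
    return n;
}
```

Because additional columns may appear, the caller should check the returned count before indexing a field rather than assume all 18 columns are present.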

4.2 c:\ua\archives\sdbfacts.adm

This file can only be opened in shared-read-only mode.  UA maintains a constant shared-read-allowed status on the file.  This is a fixed record file starting with stock 1000.  The records are as follows:

    typedef struct {
      char FreeFlag; // Free or Inactive or Active
      char unused1[1];
      char Symbol[8];
      char Name[40];
      char unused3[12];
      char CusipType[4];
      char Cusip[20];
      char SectorCode[2];
      char IndustryCode[2];
      char Exchange[6];
      char CvfSwitchDate[8];
      char Cvf[2];
      char PreSwitchCvf[2];
      char StartDate[8];
      char StopDate[8];
      char OptionCvf[2];
      char OptionStrikeDivide[1];
      char unused6[7];
      float EarningsPerShare; // ** four byte float
      long OutstandingShares; // **
      long NumberOfInstitutionalShareHolders; // **
      float PercentageOfShareHodlersWhoAreInstitutional; // **
      float StockBeta; // ** four byte float
      float RevenueGrowth5Year; // ** four byte float
      float EPSGrowth5Year; // ** four byte float
      float DividendGrowth5Year; // ** four byte float
      char Reserved[180-166];
    } StockSymbolRecord;

The “FreeFlag” is F for free records (no stock ever assigned), I for inactive records (stock assigned, but no longer traded), or A for active records.  Since there are sometimes Active records with no StartDate, I usually check the StartDate field before I include a record as an active stock.  Unused fields allow for easy change of field sizes.  The Reserved field is for future fields.  Do not make any assumptions about Unused or Reserved fields.  The Sector and Industry codes must be cross-referenced with the Sectors.adm and Indgroups.adm files.

** - These fields are four space characters if unfilled.
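
The following is a minimal sketch of working with one raw record, not UA's own code.  The function names and the SDB_ offsets are mine.  I am assuming the record is packed with no padding (180 bytes, as Reserved[180-166] implies), that record 0 of the file belongs to stock 1000, that a usable StartDate begins with a digit, and that the four-byte fields are in Intel byte order.  Reading by byte offset avoids depending on the compiler's struct padding or on the size of long.

```c
#include <stddef.h>
#include <string.h>

enum {
    SDB_RECORD_SIZE   = 180,   /* size implied by Reserved[180-166] */
    SDB_FIRST_STOCK   = 1000,  /* assumption: record 0 is stock 1000 */
    SDB_OFF_FREEFLAG  = 0,
    SDB_OFF_SYMBOL    = 2,
    SDB_OFF_STARTDATE = 108,
    SDB_OFF_EPS       = 134    /* EarningsPerShare, first "**" field */
};

/* Byte offset in the file of the record for a given stock number. */
long sdb_record_offset(long stock_number)
{
    return (stock_number - SDB_FIRST_STOCK) * (long)SDB_RECORD_SIZE;
}

/* A "**" field is unfilled when it holds four space characters. */
int sdb_field_unfilled(const unsigned char *rec, size_t offset)
{
    return memcmp(rec + offset, "    ", 4) == 0;
}

/* Copy a filled four-byte float field out of the record.
   Returns 1 on success and 0 if the field is unfilled. */
int sdb_get_float(const unsigned char *rec, size_t offset, float *out)
{
    if (sdb_field_unfilled(rec, offset))
        return 0;
    memcpy(out, rec + offset, sizeof *out);
    return 1;
}

/* Active flag plus a StartDate that begins with a digit (assumption). */
int sdb_is_active(const unsigned char *rec)
{
    unsigned char c = rec[SDB_OFF_STARTDATE];
    return rec[SDB_OFF_FREEFLAG] == 'A' && c >= '0' && c <= '9';
}
```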

Be warned that operating on the factsheet is the most common source of aggravating details for the customer.  UA caches the factsheet.

4.3 c:\ua\mmanalyzer.csv

Multi-Market Analyzer (MMA) produces this file on a per-run basis.  MMA is a CSI product often bundled with UA.  The file is in CSV format.  The format is subject to change and the file is not automatically updated, but rather is produced as a byproduct of an MMA run.  It is included here for convenience.

Suppose that you ran MMA with n series of data.

The first row of mmanalyzer.csv has column headings.

mmanalyzer.csv consists of the following columns.
Column 1 is the date.
Columns 2 to (n+1) are the closing prices lined up by date.
Columns (n+2) to (2*n+1) are the opening prices lined up by date.
Columns (2*n+2) to (3*n+1) are the Davis Unstretched Index values.
Columns (3*n+2) to (4*n+1) are zero.  In some older versions they contain the difference between the two components of the Davis Elasticity Index.
Columns (4*n+2) to (5*n+1) are the Davis Market Leadership Indices.
The (5*n+2) Column is the Davis Stretch Index.
Columns (5*n+3) to (6*n+2) are target positions for the 524 System.  (+1 for long, -1 for short)
Columns (6*n+3) to (7*n+2) are system orders for the 524 System.  (0 is nothing, 1 is buy, 2 is sell, 3 is reverse position by buying, 4 is reverse position by selling, 5 is exit position.)  All orders are simulated as "at the open".
Columns (7*n+3) to (8*n+2) are the trailing stop values for the 524 System.
Columns (8*n+3) to (9*n+2) are the fixed stop values for the 524 System.
Columns (9*n+3) to (10*n+2) are the trade profits/losses for the 524 System in decimal.
Columns (10*n+3) to (11*n+2) are the cumulative profits/losses for the 524 System in decimal.
Columns (11*n+3) to (12*n+2) are the trade profits/losses for the 524 System in US Dollars per contract.
Columns (12*n+3) to (13*n+2) are the cumulative profits/losses for the 524 System in US Dollars per contract.
Columns (13*n+3) to (14*n+2) are the trade profits/losses for the 524 System scaled by historic volatility.
Columns (14*n+3) to (15*n+2) are the cumulative profits/losses for the 524 System scaled by historic volatility.
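
The column arithmetic above can be sketched as a small helper.  The enum and function names are mine.  I am assuming that every per-series block is exactly n columns wide, including the last two volatility-scaled blocks, so that the only irregularity is the single Davis Stretch Index column.

```c
/* Per-series blocks in the order they appear in the file. */
enum mma_block {
    MMA_CLOSE = 0, MMA_OPEN, MMA_UNSTRETCHED, MMA_ZERO, MMA_LEADERSHIP,
    MMA_TARGET_POSITION, MMA_ORDER, MMA_TRAILING_STOP, MMA_FIXED_STOP,
    MMA_TRADE_PL, MMA_CUM_PL, MMA_TRADE_PL_USD, MMA_CUM_PL_USD,
    MMA_TRADE_PL_VOL, MMA_CUM_PL_VOL
};

/* 1-based column of series j (1..n) within the given block. */
long mma_column(enum mma_block block, long n, long j)
{
    /* Column 1 is the date.  Blocks that follow the single Davis Stretch
       Index column are shifted right by one more column. */
    long shift = (block >= MMA_TARGET_POSITION) ? 2 : 1;
    return (long)block * n + shift + j;
}

/* 1-based column of the Davis Stretch Index. */
long mma_stretch_column(long n)
{
    return 5 * n + 2;
}
```

For example, with n = 3 the closing prices occupy columns 2 to 4, the Davis Stretch Index is column 17, and the first target position is column 18.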
 
 

4.4 c:\ua\mmanal1.txt, mmanal2.txt, mmanal3.txt

These files are also produced when MMA runs.

mmanal1.txt is redundant.

mmanal2.txt contains the correlation table (n by n).

mmanal3.txt contains the factor analysis results, with factors on columns.  The first row of the file is the variance for each factor.  The second row is meaningless.  The remaining n rows give the weights for each commodity in each of the factors.

If you have a large n, the number of columns may exceed the limits of many spreadsheets and editors.

4.5 c:\ua\mmanalyzer.bin

As with c:\ua\archives\mmanalyzer.csv, a binary image is provided, which is easier for low-level languages to read and handle.  The binary file is also created on demand, and the format may change.

Suppose again that n is the number of futures used.
The first four 4-byte integers are as follows: the Number of Dates (NumDates), the Number of Markets (NumMarkets), the Davis Unstretched Index lead window (UnstretchedWindowSize), and the Davis Elasticity Index window size (LeadIndicatorWindow).
The next four 4-byte integers are reserved.
The next (NumDates) 4-byte integers are the dates of the prices used.
The next (NumMarkets) 4-byte integers are the csinum’s of the markets included.
The next (NumMarkets) 1-byte integers are flags denoting whether the csinum refers to a stock.
The next (NumDates*NumMarkets) 4-byte reals are the prices used in the calculation.
All prices for a given date are consecutive.
The next ((NumDates-UnstretchedWindowSize)*NumMarkets) 4-byte reals are the Davis Unstretched Index values.
The next ((NumDates-UnstretchedWindowSize)*NumMarkets) 4-byte reals were the differences between the two components of the Davis Elasticity Index, but are now zero.
The next (NumDates*(NumMarkets-UnstretchedWindowSize-LeadIndicatorWindow+1)) 4-byte reals are the Davis Market Leadership Index values.
The next (NumDates) 4-byte reals are the Davis Stretch Index values.
The next (NumDates*(NumMarkets-UnstretchedWindowSize-LeadIndicatorWindow+1)) 1-byte integers are the target positions for the 524 System.
The next (NumDates*(NumMarkets-UnstretchedWindowSize-LeadIndicatorWindow+1)) 1-byte integers are the system orders for the 524 System.
The next (NumMarkets) 4-byte reals are the number of US Dollars per full point for each of the markets.
The next (NumDates*(NumMarkets-UnstretchedWindowSize-LeadIndicatorWindow+1)) 4-byte reals are the trailing stops for the 524 System.
The next (NumDates*(NumMarkets-UnstretchedWindowSize-LeadIndicatorWindow+1)) 4-byte reals are the fixed stops for the 524 System.
The next (NumDates*(NumMarkets-UnstretchedWindowSize-LeadIndicatorWindow+1)) 4-byte reals are the trade profits/losses for the 524 System.
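
As a rough sketch of locating the first few sections, the code below decodes the header and computes byte offsets up to the price table.  The type and function names are mine.  I am assuming that integers are in Intel byte order, that the sections follow one another with no padding, and that date and market indices are counted from zero.  Later sections are left out because their sizes depend on the window values.

```c
#include <stdint.h>

typedef struct {
    int32_t num_dates;
    int32_t num_markets;
    int32_t unstretched_window;
    int32_t lead_indicator_window;
} MmaHeader;

enum { MMA_HEADER_BYTES = 32 };  /* four header integers + four reserved */

/* Assemble a 4-byte Intel-order integer from raw bytes. */
static int32_t mma_le32(const unsigned char *b)
{
    uint32_t u = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
                 ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return (int32_t)u;
}

/* Decode the header from the first 32 bytes of the file. */
void mma_parse_header(const unsigned char *raw, MmaHeader *h)
{
    h->num_dates             = mma_le32(raw + 0);
    h->num_markets           = mma_le32(raw + 4);
    h->unstretched_window    = mma_le32(raw + 8);
    h->lead_indicator_window = mma_le32(raw + 12);
}

/* Byte offsets of the sections that follow the header. */
long mma_dates_offset(void)
{
    return MMA_HEADER_BYTES;
}

long mma_csinums_offset(const MmaHeader *h)
{
    return mma_dates_offset() + 4L * h->num_dates;
}

long mma_flags_offset(const MmaHeader *h)
{
    return mma_csinums_offset(h) + 4L * h->num_markets;
}

long mma_prices_offset(const MmaHeader *h)
{
    return mma_flags_offset(h) + 1L * h->num_markets;
}

/* Offset of the price for date index d and market index m.  All prices
   for a given date are consecutive, so the market index varies fastest. */
long mma_price_offset(const MmaHeader *h, long d, long m)
{
    return mma_prices_offset(h) + 4L * (d * h->num_markets + m);
}
```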
 

4.6 c:\ua\archives\stkscn.bin

This is a fixed record file containing all recent prices for a selection of stocks.  See the UA User Manual for instructions on how to set up UA to create this file.  The record is:

    typedef struct {
      long csinum, date;
      char cvf, IsStock, unused1[2];
      float open, high, low, close;
      long vol, oi, unused2[3];
    } StockScanFileRecord;

The first record contains all zeros except for two fields: the csinum, which is 102 in this version, and the date, which holds the record size in bytes (48 in this version).  The ordering is by csinum and then by date.  The conversion factor, cvf, is the conversion factor as documented for API2.  IsStock is 1 for stocks and mutual funds.  It is 0 for futures and cash series.
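
A minimal sketch of checking the first record before trusting the rest of the file follows.  The names StockScanRecord32 and stkscn_header_record_size are mine.  I use int32_t in place of long because long is not 4 bytes on every modern compiler, and I am assuming the code runs on an Intel-order machine so the fields can be copied directly.

```c
#include <stdint.h>
#include <string.h>

#pragma pack(push, 1)
typedef struct {
    int32_t csinum, date;
    char    cvf, IsStock, unused1[2];
    float   open, high, low, close;
    int32_t vol, oi, unused2[3];
} StockScanRecord32;
#pragma pack(pop)

/* Fail the build if the layout is not the 48 bytes described above. */
_Static_assert(sizeof(StockScanRecord32) == 48, "record must be 48 bytes");

/* Returns the record size announced by the first record, or 0 if the
   first record does not match the version described in this guide. */
int32_t stkscn_header_record_size(const unsigned char *first_record)
{
    StockScanRecord32 h;

    memcpy(&h, first_record, sizeof h);
    if (h.csinum != 102 || h.date != 48)
        return 0;
    return h.date;
}
```

Rejecting an unexpected first record is a cautious design choice: since the format may change between versions, it seems safer to stop than to misread every record that follows.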

4.7 c:\ua\archives\Sectors.adm

This file is a CSV file with Sector Code , "Description".

4.8 c:\ua\archives\Indgroups.adm

This file is a CSV file with Sector Code , Industry Code, "Description".
Industries are subsets of sectors.
 

5.  Studies/Indicators

5.1 How to add an indicator.

UA scans the UA directory for "study*.ini" files.  Each .ini file has a corresponding DLL.  When UA runs a study, it creates an indata file.  UA calls the study function in the studyXXX.dll.  UA reads the outdata file, and plots the results.

5.2 Field Specifier

A field specifier is a single character.  r1 is the immediate series.  If there is an overlaid graph, then a second series r2 may be present:

            case 'o': v = r1->open ; break;
            case 'h': v = r1->high ; break;
            case 'l': v = r1->low  ; break;
            case 'c': v = r1->close; break;
            case 'v': v = r1->vol  ; break;
            case 'V': v = r1->tvol ; break;
            case 'i': v = r1->oi   ; break;
            case 'I': v = r1->toi  ; break;
            case '$': v = r1->cash ; break;
            case 'O': v = r2->open; break;
            case 'H': v = r2->high; break;
            case 'L': v = r2->low  ; break;
            case 'C': v = r2->close; break;
            case 'w': v = r2->vol  ; break;
            case 'W': v = r2->tvol ; break;
            case 'p': v = r2->oi   ; break;
            case 'P': v = r2->toi  ; break;
            case 'S': v = r2->cash ; break;

5.3 The StudyXXX.Ini file

UA looks for the following fields in the Ini file.

"study","DESCRIPTION"  The selection description
"study","CHARTLABEL"  The charting abbreviation
"study","Inputs" A string formed from Field Specifiers
"study","Outputs" A string formed from Field Specifiers
"study","MIN" The minimum value attainable (Adjusted if data violates)
"study","MAX" The maximum value attainable (Adjusted if data violates)
"study","SharePriceScale" Use a value of 1 if the return is in price units or zero if not.
"study","MinimumNumDays" The user gets a warning if the number of data days is insufficient.
"study","NUMPARMS" How many parameters the study has (currently limited to 4)
For each parameter, UA looks for a section "parmX" where X is the parameter index (1-4).
parmX,"LABEL" Parameter Label
parmX,"CONTROL" Parameter Control type (EDIT or CHECK)
parmX,"DEFAULT" Default Value (TRUE=1, FALSE=0)
parmX,"TYPE" Parameter Type (INT or FLOAT)

5.4 indata file

A file named "indata" in the UA directory is created to provide the data for the study.  This file is in a space-separated format.  Suppose that the study is listed to have m inputs.  The first line of the indata file contains the conversion factors for each of these inputs.  Each remaining line begins with the date, followed by each of the inputs for that date.
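
As a minimal sketch, one data line of this file could be parsed as follows.  The function name is mine, and I am assuming the date is written as a plain integer (for example 20001231), which this section does not state.

```c
#include <stdlib.h>

/* Parse one data line of the form "date v1 v2 ... vm".
   Returns the number of values stored (at most max_values),
   or -1 if the line does not start with a number. */
int indata_parse_line(const char *line, long *date,
                      double *values, int max_values)
{
    char *end;
    int n = 0;

    *date = strtol(line, &end, 10);
    if (end == line)
        return -1;

    while (n < max_values) {
        const char *start = end;
        double v = strtod(start, &end);
        if (end == start)
            break;              /* no more numbers on this line */
        values[n++] = v;
    }
    return n;
}
```

A caller would typically compare the returned count with m, the number of inputs named in the ini file, and skip any line that falls short.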

5.5 The StudyXXX.dll file

Currently all DLLs must be 16-bit DLLs.  The studyXXX.dll must export

    int FAR PASCAL Study( long *argc, long *Iparm, float *Fparm);

The arguments are defined as

long argc[5];
float Fparm[256];
long Iparm[256];

Where argc is
      argc[0]  = InDataCount;  (Number of data records in indata)
      argc[1]  = Inputs.GetLength(); (Number of inputs based on the ini file)
      argc[2]  = Outputs.GetLength(); (Number of outputs based on the ini file)
      argc[3]  = Number of Integer parameters
      argc[4]  = Number of Float parameters
Fparm and Iparm contain the parameter values.
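
To make the calling convention concrete, here is a hypothetical study that writes a simple moving average of the first input.  It is only a sketch under several assumptions of mine: that Iparm[0] holds the averaging period, that the date is written as an integer, that "indata" and "outdata" can be opened relative to the current directory, and that a return value of 0 means success.  None of these are confirmed by this guide.  FAR and PASCAL are defined away when the 16-bit Windows headers are absent.

```c
#include <stdio.h>

/* FAR and PASCAL come from the 16-bit Windows headers.  Defining them away
   lets the sketch compile with other compilers too. */
#ifndef FAR
#define FAR
#endif
#ifndef PASCAL
#define PASCAL
#endif

#define SMA_MAX_PERIOD 256

typedef struct {
    double window[SMA_MAX_PERIOD];   /* last `period` values, circular */
    double sum;                      /* sum of the values in the window */
    long   count;                    /* values seen so far */
    long   period;                   /* 1..SMA_MAX_PERIOD */
} Sma;

/* Add one value and return the average of the last `period` values
   (or of all values seen so far while the window is still filling). */
static double sma_push(Sma *s, double v)
{
    long slot = s->count % s->period;

    if (s->count >= s->period)
        s->sum -= s->window[slot];   /* drop the value being overwritten */
    s->window[slot] = v;
    s->sum += v;
    s->count++;
    return s->sum / (double)(s->count < s->period ? s->count : s->period);
}

int FAR PASCAL Study(long *argc, long *Iparm, float *Fparm)
{
    static Sma sma;                  /* static keeps the buffer off the stack */
    char line[512], cvf[32];
    long date;
    double v;
    FILE *in, *out;

    (void)Fparm;                     /* this sketch has no float parameters */
    sma.sum = 0.0;
    sma.count = 0;
    sma.period = (argc[3] > 0) ? Iparm[0] : 5;
    if (sma.period < 1 || sma.period > SMA_MAX_PERIOD)
        return 1;

    in = fopen("indata", "r");
    if (in == NULL)
        return 1;
    out = fopen("outdata", "w");
    if (out == NULL) {
        fclose(in);
        return 1;
    }

    /* First line: conversion factors.  Pass the first one through. */
    if (fgets(line, sizeof line, in) != NULL && sscanf(line, "%31s", cvf) == 1)
        fprintf(out, "%s\n", cvf);

    /* Remaining lines: the date followed by the inputs for that date. */
    while (fgets(line, sizeof line, in) != NULL) {
        if (sscanf(line, "%ld %lf", &date, &v) == 2)
            fprintf(out, "%ld %.6f\n", date, sma_push(&sma, v));
    }

    fclose(in);
    fclose(out);
    return 0;
}
```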

5.6 The outdata file

The study should write its results to the file "outdata" in the UA directory.  This file has the same format as the indata file.

5.7 To refresh study

As you are developing your study, you may notice that every time you go to choose a study, UA has reread the directory.  Every time it runs a study, it reopens the DLL.
 

6. CSI data format partial specification


The following is taken from the CSI(R) Format document available at http://www.csidata.com/.  I have included the information about the QMASTER2 and DT2 files.  This is sufficient for Unfair Advantage users who do not need for their code to read data from other sources nor write data for a third party.

OVERVIEW

CSI® Format data is stored in a number of coordinated files, all in the same directory.  Another directory may also contain CSI Format data, but there is no coordination of files between different directories.  Every CSI Format data directory must have a master list file called QMASTER.  Starting with version 2.4.0 of Unfair Advantage, CSI Format data directories also contain a QMASTER2 file.  The QMASTER2 file contains and expands upon the information in the QMASTER file.  If the QMASTER2 file is present, it should be read rather than the QMASTER.

The master list identifies what data series are stored in the CSI Format data directory.  The Nth series described in the master list may have a deleted flag other than ‘0’.  If the deleted flag for the Nth series is ‘0’, then the directory may contain files FXXX.dta or FXXXXXXX.dta and FXXX.dt2 or FXXXXXXX.dt2.  If N is less than 1000, then FXXX.dta and FXXX.dt2 should be looked for.  If N is 1000 or greater, then FXXXXXXX.dta and FXXXXXXX.dt2 should be looked for.  In either case the XXX represents a zero-filled, right-justified text representation of N.  For example, if N is 56, then you would look for F056.dta and F056.dt2.  If N were 5612, then you would look for F0005612.dta and F0005612.dt2.  Some software sorts the CSI directory by rearranging the order of the entries in the QMASTER and QMASTER2 files and renaming the corresponding data files.  It is important, therefore, to always search the master list file to find the item of interest, and not to rely on the placement of the data within the master list file remaining constant.
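
The naming rule can be sketched as a small helper.  The function name and its error convention are mine.

```c
#include <stddef.h>
#include <stdio.h>

/* Build the data file name for the Nth master-list entry.
   ext is "dta" or "dt2".  Returns 0 on success, or -1 if n is out of
   range or the buffer is too small. */
int csi_data_filename(char *buf, size_t size, long n, const char *ext)
{
    int len;

    if (n < 1 || n > 9999999L)
        return -1;
    if (n < 1000)
        len = snprintf(buf, size, "F%03ld.%s", n, ext);   /* FXXX */
    else
        len = snprintf(buf, size, "F%07ld.%s", n, ext);   /* FXXXXXXX */
    return (len < 0 || (size_t)len >= size) ? -1 : 0;
}
```

With n = 56 and ext = "dta" this produces F056.dta, and with n = 5612 and ext = "dt2" it produces F0005612.dt2, matching the examples in the text.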

The FXXX.dt2/FXXXXXXX.dt2 file contains and expands upon the information contained in the corresponding FXXX.dta/FXXXXXXX.dta file.  If both the FXXX.dta and FXXX.dt2 files are present, the .dt2 file should be preferred.

FILE LAYOUT KEY

The QMASTER file consists of 120 or more QMASTER_REC records.  The QMASTER2 file consists of 120 or more QMASTER2_REC records.  A FXXX.dta/FXXXXXXX.dta file consists of one DTA_HEADER_REC record followed by one or more DTA_DATA_REC.  I should warn you that there may be more DTA_DATA_REC than data, but I will expand upon this later.  A FXXX.dt2/FXXXXXXX.dt2 file consists of one DT2_HEADER_REC record followed by one or more DT2_DATA_REC.  Again, there may be more records than data.

 QMASTER2_REC RECORD LAYOUT

QMASTER2_REC is a 128-byte record.  All binary fields are in Intel byte ordering.

************ Start of QMASTER2_REC description ***********

struct QMASTER2_REC {
     unsigned long csinum;
     unsigned long dydm;
     long strike; // + is call, - is put, 0 is non-option
     unsigned short cvf, ocvf;
     char period[1];   //File Type (D,W,M,Q,A)
     char comstock[1]; //Commodity/Stock Flag (C/S)
     char deleted[1];  //DeletedFlag
     char Unused1[1];
     char name[40];
     char Unused2[15];
     char unit[5];     //Pricing Unit
     char symbol[8];   //Stock or Commodity Symbol
     char Unused3[40];
};

The csinum identifies the exchange-traded market.  For example, 2 refers to Live Cattle traded on the Chicago Mercantile Exchange.  49622 refers to the common stock of New Millenium Media Intl traded on the NASDAQ Bulletin Board.  The csinum is only unique when qualified by the comstock flag, however.  For example, Unfair Advantage uses 2000 for both a stock and a commodity even though the stock and the commodity have nothing in common.

The delivery month field, dydm, refers to when the underlying security is deliverable.  Things which are immediately deliverable, such as the Grain Elevator price of Lean Hogs, or which are never delivered, such as stocks, use the delivery month 54.  Otherwise the format is YYYYMM where YYYY is the year, including century, and MM is the month.  So the March 2002 contract would have the number 200203 in the dydm field.

The strike field, strike, refers to which option, if any, is being described.  A strike value of zero, means that the series is the stock or future itself, and not an option.  A positive value of the strike means that it is a CALL, the option right to buy something at the strike price.  A negative value means that it is a PUT, the option to sell something at the strike price (-strike).

The period field refers to what periodicity the data is summarized into.  The CSI format can handle no periodicity shorter than daily, but Daily, Weekly, Monthly, Quarterly, and Annually are supported.  Be prepared to ignore series with other values.

The combination of comstock, csinum, dydm, strike, and period is unique within a CSI Format data directory.  If the user were to wish to have a short history of Microsoft and a long history, this would not be allowed by Unfair Advantage.  If it were allowed, then since the date is not part of the QMASTER2_REC, you could have two completely duplicate records in the QMASTER2 file and not know which one the user wants you to process!

The deleted flag currently may contain a ‘0’, ‘1’, or ‘9’, though additional values may be assigned in the future.  A ‘0’ means that the record is active.  A ‘1’ means that the user has removed this data, but that it may be re-added and updated in the future.  A value of ‘9’ means that the record has never been assigned any data.  Once you see your first ‘9’ record, you are free to stop looking further.
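
Putting the uniqueness rule and the deleted flag together, a search of the master list might look like the sketch below.  The type and function names are mine.  I am assuming that the records have already been read into memory on an Intel-order machine, and that the first record corresponds to N = 1.  Fixed-width types stand in for long and short so the 128-byte size holds on modern compilers.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint32_t csinum;
    uint32_t dydm;
    int32_t  strike;       /* + is call, - is put, 0 is non-option */
    uint16_t cvf, ocvf;
    char     period[1];    /* D, W, M, Q, A */
    char     comstock[1];  /* C or S */
    char     deleted[1];   /* '0' active, '1' removed, '9' never assigned */
    char     Unused1[1];
    char     name[40];
    char     Unused2[15];
    char     unit[5];
    char     symbol[8];
    char     Unused3[40];
} Qmaster2Rec;
#pragma pack(pop)

_Static_assert(sizeof(Qmaster2Rec) == 128, "QMASTER2_REC must be 128 bytes");

/* Returns the position N (first record is 1, an assumption) of the active
   record matching the full key, or 0 if there is none. */
size_t qmaster2_find(const Qmaster2Rec *recs, size_t count,
                     char comstock, uint32_t csinum, uint32_t dydm,
                     int32_t strike, char period)
{
    size_t i;

    for (i = 0; i < count; i++) {
        const Qmaster2Rec *r = &recs[i];

        if (r->deleted[0] == '9')
            break;          /* never assigned: nothing further to find */
        if (r->deleted[0] != '0')
            continue;       /* removed, or a flag value not yet defined */
        if (r->comstock[0] == comstock && r->csinum == csinum &&
            r->dydm == dydm && r->strike == strike &&
            r->period[0] == period)
            return i + 1;
    }
    return 0;
}
```

Records with flag values other than ‘0’ and ‘9’ are skipped rather than treated as errors, since additional values may be assigned in the future.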
 

 Description of DT2 file

Unfair Advantage users will benefit from having a .DT2 file which extends the DTA file.  Do not assume this file always exists.  Instead check for its existence and read the DTA file if it doesn’t exist.

************ Start of DT2_HEADER_REC description ***********

DT2_HEADER_REC RECORD LAYOUT

For the DT2 file, we have a header followed by records and then by holiday records as with the DTA file.  The new record length is 68 bytes with no padding bytes.

Structure
  FileEndRecord unsigned 4 byte integer
  MaximumDatePointer unsigned 4 byte integer
  HighestHigh signed 4 byte integer
  LowestLow signed 4 byte integer
  FirstDate unsigned 4 byte integer (20001231 for example)
  LastDate unsigned 4 byte integer (20001231 for example)
  HighNumberFlag unsigned 1 byte integer (value 2 or greater for DT2)
  Remaining 43 bytes are reserved

All integers are in Intel byte ordering.  The record indices work the same as the DTA file.  The HighestHigh and LowestLow are in points.  The dates are in the form 20001231.  The HighNumberFlag is 2 or greater if there is a DT2 file present.
 

 ************ Start of DT2_DATA_REC description ***********

DT2_DATA_REC RECORD LAYOUT

  date is an unsigned 4 byte integer
  dydm is an unsigned 4 byte integer
  open is a signed 4 byte integer
  high is a signed 4 byte integer
  low is a signed 4 byte integer
  close is a signed 4 byte integer
  cash is a signed 4 byte integer
  ClosingBid is a signed 4 byte integer
  ClosingAsk is a signed 4 byte integer
  vol is a signed 4 byte integer
  oi is a signed 4 byte integer
  tvol is a signed 4 byte integer
  toi is a signed 4 byte integer
  dow is a signed 1 byte integer
  The remaining 15 bytes are reserved

The date is of the format 20001231.  The dydm field is of the format 200101 for the 2001 January contract.  For non-computed contracts this information is redundant with the QMASTER file, but for computed series, this field tells the users which contract was current on that day.  The prices are all in points stored as Intel integers.  The volume and open interest (and their totals) are also integers.  The dow is the same as for the DTA file.

If you are having trouble getting this to read correctly, make sure that you have used a #pragma pack(1) or equivalent to avoid padding bytes.  Also be sure that you are reading 4 byte integers in Intel byte order: the bytes 4, 3, 2, 1, taken in file order, are the number 0x01020304, which is 16909060.  Reading the same bytes in Motorola order would give the value 67305985 instead.
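
One way to sidestep both the padding and the byte-order problems is to decode the record from raw bytes.  The sketch below does that for a DT2_DATA_REC.  The names are mine, and the offsets simply follow the field order listed above.

```c
#include <stdint.h>

enum { DT2_RECORD_SIZE = 68 };

/* Assemble a 4-byte Intel-order integer from raw bytes.  This gives the
   same answer whatever the byte order of the machine running the code. */
uint32_t dt2_read_u32(const unsigned char *b)
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

int32_t dt2_read_i32(const unsigned char *b)
{
    return (int32_t)dt2_read_u32(b);
}

typedef struct {
    uint32_t date, dydm;
    int32_t  open, high, low, close, cash;
    int32_t  closing_bid, closing_ask;
    int32_t  vol, oi, tvol, toi;
    int      dow;
} Dt2Data;

/* Decode one 68-byte DT2_DATA_REC.  The last 15 bytes are reserved. */
void dt2_decode_data(const unsigned char *raw, Dt2Data *d)
{
    d->date        = dt2_read_u32(raw + 0);
    d->dydm        = dt2_read_u32(raw + 4);
    d->open        = dt2_read_i32(raw + 8);
    d->high        = dt2_read_i32(raw + 12);
    d->low         = dt2_read_i32(raw + 16);
    d->close       = dt2_read_i32(raw + 20);
    d->cash        = dt2_read_i32(raw + 24);
    d->closing_bid = dt2_read_i32(raw + 28);
    d->closing_ask = dt2_read_i32(raw + 32);
    d->vol         = dt2_read_i32(raw + 36);
    d->oi          = dt2_read_i32(raw + 40);
    d->tvol        = dt2_read_i32(raw + 44);
    d->toi         = dt2_read_i32(raw + 48);
    d->dow         = (signed char)raw[52];
}
```

Calling dt2_read_u32 on the bytes 4, 3, 2, 1 returns 16909060, which is the check described in the paragraph above.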
 
 

7.  Conclusion


Hopefully this has clarified your options as a developer to access the vast resources available under Unfair Advantage.