Monthly Article
CSIM™ Format
PAGE 1a

This Issue
March 1999
 



Notice:
  The views and information expressed in this document reflect the opinions and experience of the author, Robert C. Pelletier.  Neither CSI nor the author undertakes or intends to provide tax advice or trading advice in any market or to endorse any outside individual or firm.  All recommendations are provided for their informational value only.  Readers should consult competent financial advisors or outside counsel before making any software purchase or investment decision.  CSI does not stand behind or endorse the products of any outside firms.


Copyright (c) 1998 Commodity Systems Inc. (CSI).  All rights are reserved.



 

 
CSI Millennium™ (CSIM™) format with CSI® Y2K
11/17/1998

   This document describes an adaptation of the abandoned CompuTrac® data format, which until recently was actively used by Equis' MetaStock® charting software.  CSI has decided to rename the format because the Y2K extensions made it unique to CSI's proprietary use.  CSI will continue updating to this format, with backward-compatible extensions to allow for updating past January 1, 2000 and through the end of the 21st century.

   The chat forums on the Internet present disturbing and conflicting stories about how high-profile software developers will handle the Y2K problem as it pertains to data formats. One source said that the expensive upgrade of a popular analysis program will exclude independent firms like CSI as data sources. If these rumors become reality, you may be asked to pay for something that will hurt, rather than help, your trading efforts.

   We received an e-mail from Omega Research (makers of SuperCharts® and TradeStation®) saying that they will continue to support the CSI format into the year 2000. They are also considering support of the CSIMillennium format, but haven't made a decision yet. We have had unconfirmed reports that Equis (MetaStock®) plans to extend their older MetaStock format for 20 or 30 years into the next century, but no details have been supplied on how they might accomplish this or whether it will be publicly disclosed. There are many other smaller software developers who must become Y2K compliant.  Should developers elect to keep their formats secret and exclude outside data firms, you may have to decide between abandoning either the CSI data service or the analysis software you have already purchased.

   CSI has upgraded the CSI format to be Y2K compliant, and we have extended the former CompuTrac format to operate through the 21st century.  Anyone contemplating a purchase of new software should insist that the program reads and writes to either the CSI format or the CSIMillennium format, the details of which are enclosed. We ask all users to pass this on to their favorite software producers and urge them to adopt the CSI formats in their new upgrades. We believe the best bet for all concerned is to use the CSIMillennium format, which is declared an open format and which carries little danger of failing to do the job immediately. All of these issues are subject to clarification and solidification as the year 2000 approaches.

   If you want to ensure that your database and analysis software remain compatible, let your software developer know that you wish to use CSI as your data source. This simply requires that analysis programs maintain compatibility with the CSIM and/or the CSI QuickTrieve format in addition to the other formats they support.

To voice your concern contact -

Omega Research:
OfficeOfThePresident@OmegaResearch.com

MetaStock
Sales@Equis.com
 
 

CSI Millennium Format Specification

Important Notice

   The CSIMillennium™ format and the CSI™ basic format are trademarked properties of CSI. The descriptive material and the actual format structures are copyrighted © properties of CSI. All rights reserved. Both the formats and their specifications are restricted to registered users who are granted a license for their use.

   CSI claims a copyright on these formats to assure that no other firm will claim them and/or demand payment for their use in any way. We will not unreasonably withhold permission from those wishing to use these formats, even if the user is a competitor. No user of these formats will be paid any sum of money for developing CSI-compatible programs. An email with the password necessary for download will be immediately forwarded to your email address.

Please consult CSI's website for the most current details before working with this open-to-the-public format.
 

Current Specification

   To accurately access the data files within a given directory, the programmer must read that same directory's master file list, which uniquely identifies the specific market data files (time series) stored in that directory.  This master file list is named MASTER and consists of up to 256 records, each 53 bytes in length.  The fields are formatted as follows:
 

MASTER FILE RECORD LAYOUT (MASTER)

Record 1:

DESCRIPTION          Position  Length  Format
Number of Entries    1-2       2       CVI
Last Entry Used      3-4       2       CVI
UNUSED               5-53      49

   The "Last Entry Used" field is accessed in order to assign the next file number to a .DAT/.DOP file combination.  At file creation, this field is initialized to zero, which indicates the first file to create will be F1.DAT.   This field can be ignored for programs that only need to read the data files.

   Special NOTE: Even though the "Number of Entries" field is two bytes in length, the stored file number is only one byte; therefore the maximum file number cannot exceed 255.  If the last entry used has the value 255, and the number of entries is less than 255, then you must scan the master file list for an unused number.  The pseudocode is shown below:

     FileNumbers()    - Array of integers holding the file numbers of the master file list
     NumberOfEntries  - Number of master file list entries
     LastEntryUsed    - Value of the "Last Entry Used" field of record 1

     If NumberOfEntries=255 then FileNumber=0 else FileNumber=LastEntryUsed+1
     If NumberOfEntries<255 and LastEntryUsed>254 then
        FileNumber=0
        For x=1 to 255
           Found=False
           For y=1 to NumberOfEntries
              If FileNumbers(y)=x then
                 Found=True
                 Exit for
              End if
           Next y
           If NOT Found then
              FileNumber=x
              Exit for
           End if
        Next x
     End if
     If FileNumber=0 then there is no space to create a file in this directory
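
The same scan can be sketched in Python. This is only an illustration of the pseudocode above, not CSI's own code; the function name next_file_number and the use of None to signal "no space" are assumptions made for the example.

```python
from typing import List, Optional

MAX_FILE_NUMBER = 255  # the file number is stored in a single byte


def next_file_number(last_entry_used: int, file_numbers: List[int]) -> Optional[int]:
    """Return the number for a new F<n>.DAT/.DOP pair, or None if the directory is full.

    last_entry_used -- value of the "Last Entry Used" field of record 1
    file_numbers    -- file numbers already present in the master file list
    """
    if len(file_numbers) >= MAX_FILE_NUMBER:
        return None  # every number from 1 to 255 is taken
    if last_entry_used < MAX_FILE_NUMBER:
        return last_entry_used + 1
    # "Last Entry Used" has reached 255: reuse the lowest number not in the list.
    in_use = set(file_numbers)
    for candidate in range(1, MAX_FILE_NUMBER + 1):
        if candidate not in in_use:
            return candidate
    return None
```

For a freshly created MASTER file, next_file_number(0, []) gives 1, which corresponds to F1.DAT.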
 
 

Records 2 through Number of entries+1:

DESCRIPTION                        Position  Length  Format
File Number (1)                    1         1       Byte
Reserved                           2-3       2
Record Length                      4         1       Byte (record length, in bytes, of the data file)
Number of Fields (2)               5         1       Byte
Reserved                           6         1       Byte
Century Indicator (3)              7         1       Byte
Item Name                          8-18      11      Character
Delivery Month                     19-20     2       Character
Slash                              21        1       Character
Delivery Year (last two digits)    22-23     2       Character
Reserved                           24-25     2       ??
Minimum Date                       26-29     4       MBF
Maximum Date                       30-33     4       MBF
File Type (D,W,M)                  34        1       Char
Reserved                           35-36     2       Integer
Symbol Area                        37-53     17      Character

The 17 byte symbol area is further divided as follows for usage by QuickTrieve (all ASCII characters):

Description                                  Position  Length  Format
Type Flag Indicator (4)                      1         1       Character
For Non Options:
  Symbol (5)                                 2-7       6       Character
  Conversion Factor Code (6)                 8         1       Character
  Third character of symbol if a commodity   9         1       Character
  Commodity Number if a commodity (7)        10-12     3       Character
For Options:
  Symbol                                     2-7       6       Character
  Conversion Factor Code (6)                 8         1       Character
  Delivery Month Code (8)                    9         1       Character
  Delivery Year (last two digits)            10-11     2       Character
  Strike Price (modulo 1000)                 12-14     3       Character

   Unlike QuickTrieve, which uses commodity numbers for identification, CSI's Unfair Advantage system exclusively uses the first eight characters of the 17-character symbol area to uniquely identify stocks, futures and options.
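
As a rough illustration of the two record layouts above, the following Python sketch decodes a MASTER file that has already been read into memory. It is not CSI's code. The names parse_master and _text are hypothetical, and it assumes that the CVI integers are 16-bit little-endian values and that character fields are padded with spaces or NUL bytes. The two date fields are returned as raw MBF bytes.

```python
import struct
from typing import Dict, List

RECORD_SIZE = 53  # every MASTER record is 53 bytes long


def _text(rec: bytes, first: int, last: int) -> str:
    """Return the ASCII text at 1-based positions first..last, padding removed."""
    return rec[first - 1:last].decode("ascii", "replace").strip(" \x00")


def parse_master(raw: bytes) -> List[Dict[str, object]]:
    """Decode records 2 through Number of Entries+1 of a MASTER file."""
    # Record 1: Number of Entries (1-2), Last Entry Used (3-4); little-endian assumed.
    number_of_entries, _last_entry_used = struct.unpack_from("<HH", raw, 0)
    entries = []
    for index in range(1, number_of_entries + 1):
        rec = raw[index * RECORD_SIZE:(index + 1) * RECORD_SIZE]
        if len(rec) < RECORD_SIZE:
            break  # file is shorter than its header claims
        entries.append({
            "file_number": rec[0],        # position 1 -> F<n>.DAT / F<n>.DOP
            "record_length": rec[3],      # position 4
            "number_of_fields": rec[4],   # position 5
            "century": rec[6],            # position 7
            "item_name": _text(rec, 8, 18),
            "delivery_month": _text(rec, 19, 20),
            "delivery_year": _text(rec, 22, 23),
            "min_date_mbf": rec[25:29],   # positions 26-29, still MBF-encoded
            "max_date_mbf": rec[29:33],   # positions 30-33, still MBF-encoded
            "file_type": _text(rec, 34, 34),
            "symbol_area": rec[36:53].decode("ascii", "replace"),  # positions 37-53
        })
    return entries
```

A caller would typically read the whole MASTER file in binary mode and pass the bytes to parse_master, then open F<n>.DAT for each returned file number.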

NOTES:

   1.) The File Number represents the physical file number on disk for the corresponding data file.  For example, if the byte is a 5, then this record corresponds to data file F5.DAT and its companion file F5.DOP.  See the section entitled DESCRIPTOR FILE LAYOUT for a discussion on how the DOP files relate to the DAT files.

   2.) Number of data fields in the data file.  This will always be the record length divided by 4, since all data fields are 4-byte single precision floating point numbers.

   3.) The century indicator byte is used to signify the century of the delivery year for Commodities and stock options.  The following values may be found in this byte:

18: Delivery century is 1800's
19: Delivery century is 1900's
20: Delivery century is 2000's
21: Delivery century is 2100's

   Any other value is considered invalid, and the delivery year is then assumed to fall within the 1921-2020 period. If the delivery year is greater than 20, the century is assumed to be 1900's; if the delivery year is less than or equal to 20, the century is assumed to be 2000's.  Examples:  delivery year of 15=2015, delivery year of 21=1921.

   4.) Type Flag:  @=Non-option stock or commodity, 1=Commodity Option, 2=Stock Option.  If there is no number at position 10-12 (or if the number is zero), the item is a stock, otherwise it is a commodity.

   5.) For stocks, the symbol field is the CSI symbol.  For commodities, the symbol field is the first two characters of the CSI symbol (the third character of the CSI commodity symbol is stored at position 9), followed by the two digit delivery month, followed by the last two characters of the delivery year. The Delivery Month/Delivery Year combination must be stored at two different places within the master file record, including here in the symbol field. They are placed here as well as at position 19-23 because MetaStock requires a unique symbol for each data file, and because MetaStock would not otherwise display these important contract identifiers in selection screens.

   6.) Conversion Factor codes:  -4=Q   -3=P   -2=O   -1=N    0=K   1=J   2=I  3=H  4=G  5=F

   7.) Should the CSI commodity inventory ever exceed 999, please consult the CSI website for updated information.

   8.) Delivery Month Code for Options:  A-L = Delivery month 1-12 for CALLS   M-X=Delivery Month 1-12 for PUTS.

   9.) Users of this format should regularly consult the CSI website and this document for changes and announcements concerning the CSIM format.
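
Notes 3 and 8 each describe a small decoding rule, which can be sketched in Python as follows. This is a minimal illustration and not part of the specification; the function names delivery_year and option_month are hypothetical.

```python
from typing import Tuple


def delivery_year(century_byte: int, two_digit_year: int) -> int:
    """Combine the century indicator (note 3) with a two-digit delivery year."""
    if century_byte in (18, 19, 20, 21):
        return century_byte * 100 + two_digit_year
    # Invalid indicator: fall back to the 1921-2020 window.
    if two_digit_year > 20:
        return 1900 + two_digit_year
    return 2000 + two_digit_year


def option_month(code: str) -> Tuple[int, str]:
    """Decode the option delivery month code (note 8): A-L are calls, M-X are puts."""
    if len(code) != 1:
        raise ValueError("expected a single character")
    offset = ord(code.upper()) - ord("A")
    if not 0 <= offset <= 23:
        raise ValueError("delivery month code must be A through X")
    month = offset % 12 + 1
    return month, ("CALL" if offset < 12 else "PUT")
```

With an invalid century byte, delivery_year(0, 15) yields 2015 and delivery_year(0, 21) yields 1921, matching the examples in note 3.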
 

DATA FILE RECORD LAYOUT

   Data is formatted on disk in a variable length record with all information in binary format.  The filename is determined by the File Number field of the master file entry, e.g. if the file number field contains a binary five, the physical data file name on disk is F5.DAT and the descriptor file is F5.DOP.  The record length is set by the Record Length field of the master file entry. 

NOTE FOR METASTOCK COMPATIBILITY:  MetaStock versions prior to version 6.5 restrict the flexibility inherent in the format by forcing a special case data file of length 28 bytes (7 fields).

The structure provides for one header record and many data records as follows:

Header Record (record 1 of the data file):
Description          Position  Length  Format
Reserved             1-2       2       Integer (always set to binary 0)
Last Posted Record   3-4       2       Integer

Data Records (Records 2-Last posted record)
The data records are variable according to the descriptor file described below.  The only two constants are 1) the first field is date, and 2) each field is a 4-byte single precision float in Microsoft Binary Format. 
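
The record arithmetic can be sketched in Python as shown here. The sketch rests on assumptions not spelled out in the text: that the header occupies one full record slot of Record Length bytes, that "Last Posted Record" is the record number of the final data record, and that the integer is little-endian. The name iter_raw_records is hypothetical, and the 4-byte fields are returned undecoded (still in MBF).

```python
import struct
from typing import Iterator, List


def iter_raw_records(raw: bytes, record_length: int) -> Iterator[List[bytes]]:
    """Yield each data record of an F<n>.DAT image as a list of 4-byte MBF fields."""
    if record_length <= 0 or record_length % 4:
        raise ValueError("record length must be a positive multiple of 4")
    # Header record: positions 3-4 hold the Last Posted Record number.
    (last_posted,) = struct.unpack_from("<H", raw, 2)
    for record_number in range(2, last_posted + 1):
        start = (record_number - 1) * record_length  # record 1 is the header
        rec = raw[start:start + record_length]
        if len(rec) < record_length:
            break  # file ends before the header says it should
        yield [rec[i:i + 4] for i in range(0, record_length, 4)]
```

The record_length argument would come from the Record Length field of the matching master file entry, for example 28 for a seven-field file.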
 

DESCRIPTOR FILE LAYOUT

   The descriptor (.DOP) is a sequential (carriage return/linefeed delimited) file holding the names of all data fields present for a particular data file.  The number of records in this file is determined by the Number of Fields entry of the master file record.  Each record of the sequential file is of the format:

"FieldName",InputConversionFactor,DisplayConversionFactor

An example descriptor file is shown below:

"DATE",0,0
"OPEN",-3,-3
"HIGH",-3,-3
"LOW",-3,-3
"CLOSE",-3,-3
"VOL",0,0
"OI",0,0 

   The above example is typical of most data files.  The DATE, VOL and OI fields always have a conversion factor of 0, while the OPEN, HIGH, LOW and CLOSE price fields have the conversion factor of the commodity represented.
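
A descriptor file in this shape can be read with Python's standard csv module, as in the sketch below. It is an illustration only; the function name parse_descriptor is hypothetical, and the text of the .DOP file is assumed to have been read into a string already.

```python
import csv
from typing import List, Tuple


def parse_descriptor(text: str) -> List[Tuple[str, int, int]]:
    """Return (field name, input factor, display factor) for each .DOP line."""
    fields = []
    # splitlines() copes with the carriage return/linefeed delimiters.
    for row in csv.reader(text.splitlines()):
        if not row:
            continue  # ignore blank lines
        name, input_cf, display_cf = row[0], int(row[1]), int(row[2])
        fields.append((name, input_cf, display_cf))
    return fields
```

The order of the returned tuples is the order of the 4-byte fields within each data record, so the first tuple should describe the date field.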
 

IMPORTANT NOTES ABOUT CONVERSION FACTORS:

   1) When reading CSIM files you generally do not have to worry about the input conversion factor.  This is because the stored numbers are all in adjusted decimal format and ready for internal calculation.  This is different from the CSI QuickTrieve format, which stores all values as whole numbers that must be converted to decimal before arithmetic calculations are performed.  The display conversion factor is used to display the scale on the chart for viewing by the end user.

   2) The original CompuTrac system interprets negative conversion factors for raw market information differently from CSI's system of conversion factors used in QuickTrieve and Unfair Advantage applications.  Specifically, a conversion factor of -1 for the CompuTrac format means halves, and a conversion factor of -2 means quarters, for which the QuickTrieve format has no equivalent.  The CompuTrac conversion factor of -3 means eighths, which is equivalent to a QuickTrieve conversion factor of -1.  To summarize:  -1=halves, -2=quarters, -3=eighths, -4=sixteenths, -5=thirty-seconds, -6=sixty-fourths.
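
The CompuTrac-style negative factors summarized in note 2 follow a power-of-two pattern, which the Python sketch below uses to turn a decimal price into a fractional display string. This is an illustrative assumption about how a display routine might apply the factor, not CSI's implementation; the names fraction_denominator and format_fraction are hypothetical, and non-negative factors are left unformatted because this section does not define them.

```python
from typing import Optional


def fraction_denominator(conversion_factor: int) -> Optional[int]:
    """Map -1..-6 to 2, 4, 8, 16, 32, 64; return None for any other factor."""
    if -6 <= conversion_factor <= -1:
        return 2 ** (-conversion_factor)
    return None


def format_fraction(price: float, conversion_factor: int) -> str:
    """Render a decimal price as 'whole numerator/denominator' for display."""
    denominator = fraction_denominator(conversion_factor)
    if denominator is None:
        return str(price)  # factor not covered by note 2
    ticks = round(abs(price) * denominator)  # price expressed in whole fractions
    whole, numerator = divmod(ticks, denominator)
    sign = "-" if price < 0 and ticks else ""
    return f"{sign}{whole} {numerator}/{denominator}"
```

For example, a stored value of 101.625 with a display conversion factor of -3 would be shown in eighths.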
 

Also Please Note:

   1.) MBF stands for Microsoft Binary Format. It is a method of storing binary numbers that has subsequently been replaced by the IEEE standard format for most computer languages.  Most compilers have some type of conversion function that will convert from MBF to IEEE and back.  If not, ask your CSI marketing representative for our functions available for C, Delphi and Turbo Pascal applications that will perform this numeric conversion.

   2.) The first physical date and last physical date fields stored in the master file (Minimum Date and Maximum Date), as well as the date field in each data record, are stored in the following manner:  Dates in the 1900's are stored as they always have been, without the leading century.

   3.) Dates after December 31, 1999 are stored with a leading one to make a seven-digit number.  Examples: January 1, 2000=1000101, February 20, 2004=1040220.

   To get the true date with century included, take the number in the date field and add 19000000.  Be sure to use a 4-byte integer or a double precision real to store this result.  Single precision reals do not accurately store numbers this large.
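
For languages without a built-in MBF conversion, the decoding and the date rule can be sketched in Python as follows. This is a minimal sketch, not the CSI conversion functions mentioned above. It assumes the usual 4-byte MBF single layout as stored on disk (three mantissa bytes, low byte first, with the sign in the top bit of the third byte, followed by an exponent byte biased by 128). The names mbf_to_float and decode_date are hypothetical.

```python
import datetime


def mbf_to_float(raw: bytes) -> float:
    """Convert a 4-byte Microsoft Binary Format single to a Python float."""
    if len(raw) != 4:
        raise ValueError("an MBF single is exactly 4 bytes")
    exponent = raw[3]
    if exponent == 0:
        return 0.0  # a zero exponent byte means the value is zero
    sign = -1.0 if raw[2] & 0x80 else 1.0
    mantissa = ((raw[2] & 0x7F) << 16) | (raw[1] << 8) | raw[0]
    # The mantissa has an implied leading bit; the binary point sits before it.
    return sign * (1.0 + mantissa / 2 ** 23) * 2.0 ** (exponent - 129)


def decode_date(stored: float) -> datetime.date:
    """Add 19000000 to a stored date number and split it into year, month, day."""
    full = int(round(stored)) + 19000000  # Python integers cannot overflow here
    year, month_day = divmod(full, 10000)
    month, day = divmod(month_day, 100)
    return datetime.date(year, month, day)
```

A date field would be decoded by chaining the two, as in decode_date(mbf_to_float(field_bytes)). Python floats are double precision, so the sum is held exactly.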
 
 
