RECORD LAYOUT OF CSI(R) FORMAT DATA
5/16/2000

OVERVIEW

CSI(R) Format data is stored in a number of coordinated files, all in the same directory. Another directory may also contain CSI Format data, but there is no coordination of files between different directories.

Every CSI Format data directory must have a master list file called QMASTER. Starting with version 2.4.0 of Unfair Advantage, CSI Format data directories also contain a QMASTER2 file. The QMASTER2 file contains and expands upon the information in the QMASTER file. If the QMASTER2 file is present, it should be read rather than the QMASTER.

The master list identifies what data series are stored in the CSI Format data directory. The Nth series described in the master list may have a deleted flag other than '0'. If the deleted flag for the Nth series is '0', then the directory may contain files FXXX.dta or FXXXXXXX.dta and FXXX.dt2 or FXXXXXXX.dt2. If N is less than 1000, then FXXX.dta and FXXX.dt2 should be looked for. If N is 1000 or greater, then FXXXXXXX.dta and FXXXXXXX.dt2 should be looked for. In either case the XXX (or XXXXXXX) represents a zero-filled, right-justified text representation of N. For example, if N is 56, then you would look for F056.dta and F056.dt2. If N were 5612, then you would look for F0005612.dta and F0005612.dt2.

Some software sorts the CSI directory by rearranging the order of the entries in the QMASTER and QMASTER2 files and renaming the corresponding data files. It is important, therefore, always to search the master list file to find the item of interest, and not to rely on the position of the data within the master list file remaining constant.

The FXXX.dt2/FXXXXXXX.dt2 file contains and expands upon the information contained in the corresponding FXXX.dta/FXXXXXXX.dta file. If both the FXXX.dta and FXXX.dt2 files are present, the .dt2 file should be preferred.

FILE LAYOUT KEY

The QMASTER file consists of 120 or more QMASTER_REC records.
The QMASTER2 file consists of 120 or more QMASTER2_REC records.

A FXXX.dta/FXXXXXXX.dta file consists of one DTA_HEADER_REC record followed by one or more DTA_DATA_REC records. Be warned that there may be more DTA_DATA_REC records than data. In general a data file contains holiday records at least until the end of the current month. Near the end of the month, however, there could be none of these. Since some software uses these records for "What-If" analysis, Unfair Advantage allows the user to specify that when the number of trailing holiday records goes below a minimum threshold, Unfair Advantage will expand the next month early. In the future these extensions may have nothing to do with the current month, so simply code assuming that there may be some trailing holiday records to ignore.

A FXXX.dt2/FXXXXXXX.dt2 file consists of one DT2_HEADER_REC record followed by one or more DT2_DATA_REC records. Again, there may be more records than data.

QMASTER2_REC RECORD LAYOUT

QMASTER2_REC is a 128-byte record. All binary fields are in Intel byte ordering.

************ Start of QMASTER2_REC description ***********

struct QMASTER2_REC
{
    unsigned long  csinum;       // each long is 4 bytes
    unsigned long  dydm;
    long           strike;       // + is call, - is put, 0 is non-option
    unsigned short cvf, ocvf;    // each short is 2 bytes
    char           period[1];    // File Type (D,W,M,Q,A)
    char           comstock[1];  // Commodity/Stock Flag (C/S)
    char           deleted[1];   // DeletedFlag
    char           Unused1[1];
    char           name[40];
    char           Unused2[15];
    char           unit[5];      // Pricing Unit
    char           symbol[8];    // Stock or Commodity Symbol
    char           Unused3[40];
};

The csinum identifies the exchange market. For example, 2 refers to Live Cattle traded on the Chicago Mercantile Exchange. 49622 refers to the common stock of New Millenium Media Intl traded on the NASDAQ Bulletin Board. The csinum is only unique when qualified by the comstock flag, however. For example, Unfair Advantage uses 2000 for both a stock and a commodity even though the stock and the commodity have nothing in common.
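The file naming rule from the overview and the 128-byte QMASTER2_REC layout above can be sketched in C as follows. This is only an illustration under stated assumptions, not code supplied by CSI: the names csi_data_file_name, qmaster2_entry and qmaster2_decode are hypothetical, N is assumed to count master list entries from 1, and the record is decoded byte by byte so that the result does not depend on struct packing or on unsigned long being 4 bytes wide on the compiler in use.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the data file name for the Nth master list
   entry. ext is "dta" or "dt2". N below 1000 is zero-filled to 3 digits,
   N of 1000 or more to 7 digits. */
int csi_data_file_name(unsigned n, const char *ext, char *out, size_t outlen)
{
    return snprintf(out, outlen, "F%0*u.%s", n < 1000 ? 3 : 7, n, ext);
}

/* Read little-endian (Intel order) integers from a byte buffer. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical in-memory form of one QMASTER2_REC (unused bytes dropped). */
struct qmaster2_entry {
    uint32_t csinum;
    uint32_t dydm;
    int32_t  strike;          /* + is call, - is put, 0 is non-option */
    uint16_t cvf, ocvf;
    char     period;          /* D, W, M, Q or A */
    char     comstock;        /* C or S */
    char     deleted;         /* '0', '1' or '9' */
    char     name[41];        /* copied as stored, plus a terminating NUL */
    char     unit[6];
    char     symbol[9];
};

/* Decode one raw 128-byte record; offsets follow the struct field order. */
void qmaster2_decode(const unsigned char rec[128], struct qmaster2_entry *e)
{
    e->csinum   = le32(rec + 0);
    e->dydm     = le32(rec + 4);
    e->strike   = (int32_t)le32(rec + 8);
    e->cvf      = le16(rec + 12);
    e->ocvf     = le16(rec + 14);
    e->period   = (char)rec[16];
    e->comstock = (char)rec[17];
    e->deleted  = (char)rec[18];
    /* rec[19] is Unused1, rec[60..74] is Unused2, rec[88..127] is Unused3 */
    memcpy(e->name, rec + 20, 40);  e->name[40]  = '\0';
    memcpy(e->unit, rec + 75, 5);   e->unit[5]   = '\0';
    memcpy(e->symbol, rec + 80, 8); e->symbol[8] = '\0';
}
```

A reader would typically load the file in 128-byte chunks, call qmaster2_decode on each, and pass the 1-based position of the matching entry to csi_data_file_name. Whether the text fields are padded with spaces or NUL bytes is not stated here, so the sketch copies them unchanged.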
The delivery month field, dydm, refers to when the underlying security is deliverable. Things which are immediately deliverable, such as the Grain Elevator price of Lean Hogs, or which are never delivered, such as stocks, use the delivery month 54. Otherwise the format is YYYYMM where YYYY is the year, including century, and MM is the month. So the March 2002 contract would have the number 200203 in the dydm field.

The strike field, strike, refers to which option, if any, is being described. A strike value of zero means that the series is the stock or future itself, and not an option. A positive value of the strike means that it is a CALL, the option (right) to buy something at the strike price. A negative value means that it is a PUT, the option to sell something at the strike price (-strike).

The period field refers to what periodicity the data is summarized into. The CSI format can handle no periodicity shorter than daily, but Daily, Weekly, Monthly, Quarterly, and Annually are supported. Be prepared to ignore series with other values.

The combination of comstock, csinum, dydm, strike, and period is unique within a CSI Format data directory. If the user wished to have both a short history and a long history of Microsoft, this would not be allowed by Unfair Advantage. If it were allowed, then since the date is not part of the QMASTER2_REC, you could have two completely duplicate records in the QMASTER2 file and not know which one the user wants you to process!

The deleted flag currently may contain a '0', '1', or '9', though additional values may be assigned in the future. A '0' means that the record is active. A '1' means that the user has removed this data, but that it may be re-added and updated in the future. A value of '9' means that the record has never been assigned any data. Once you see your first '9' record, you are free to stop looking further.

DESCRIPTION OF DT2 FILE

Unfair Advantage users will benefit from having a .DT2 file which extends the DTA file.
Do not assume this file always exists. Instead check for its existence and read the DTA file if it doesn't exist.

************ Start of DT2_HEADER_REC description ***********

DT2_HEADER_REC RECORD LAYOUT

For the DT2 file, we have a header followed by records and then by holiday records, as with the DTA file. The new record length is 68 bytes with no padding bytes.

Structure
    FileEndRecord        unsigned 4 byte integer
    MaximumDatePointer   unsigned 4 byte integer
    HighestHigh          signed 4 byte integer
    LowestLow            signed 4 byte integer
    FirstDate            unsigned 4 byte integer (20001231 for example)
    LastDate             unsigned 4 byte integer (20001231 for example)
    HighNumberFlag       unsigned 1 byte integer (value 2 or greater for DT2)
    Remaining 43 bytes are reserved

All integers are in Intel byte ordering. The record indices work the same as in the DTA file. The HighestHigh and LowestLow are in points. The dates are in the form 20001231. The HighNumberFlag is 2 or greater and doesn't tell you anything currently.

************ Start of DT2_DATA_REC description ***********

DT2_DATA_REC RECORD LAYOUT

    date         is an unsigned 4 byte integer
    dydm         is an unsigned 4 byte integer
    open         is a signed 4 byte integer
    high         is a signed 4 byte integer
    low          is a signed 4 byte integer
    close        is a signed 4 byte integer
    cash         is a signed 4 byte integer
    ClosingBid   is a signed 4 byte integer
    ClosingAsk   is a signed 4 byte integer
    vol          is a signed 4 byte integer
    oi           is a signed 4 byte integer
    tvol         is a signed 4 byte integer
    toi          is a signed 4 byte integer
    dow          is a signed 1 byte integer
    The remaining 15 bytes are reserved

The date is of the format 20001231. The dydm field is of the format 200101 for the 2001 January contract. For non-computed contracts this information is redundant with the QMASTER file, but for computed series, this field tells the user which contract was current on that day. The prices are all in points stored as Intel integers. The volume and open interest (and their totals) are also integers.
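The DT2_DATA_REC layout listed above can be decoded along the following lines. This is a minimal sketch, not CSI's own code: the names dt2_bar and dt2_decode are hypothetical, and the record is assumed to have been read from the file as a raw 68-byte buffer. Each integer is assembled explicitly from its bytes in Intel (least significant byte first) order, so the sketch does not rely on struct packing or on the byte order of the machine running it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one 68-byte DT2_DATA_REC. */
struct dt2_bar {
    uint32_t date;                 /* YYYYMMDD, for example 20001231 */
    uint32_t dydm;                 /* YYYYMM, for example 200101 */
    int32_t  open, high, low, close, cash;
    int32_t  closing_bid, closing_ask;
    int32_t  vol, oi, tvol, toi;
    int8_t   dow;
};

/* Assemble a 4 byte integer stored least significant byte first. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one raw record. Offsets follow the field order of the layout:
   date at 0, dydm at 4, eleven signed fields from 8 to 51, dow at 52,
   and 15 reserved bytes from 53 to 67. */
void dt2_decode(const unsigned char rec[68], struct dt2_bar *b)
{
    int32_t *signed_fields[11];
    size_t i;

    signed_fields[0]  = &b->open;
    signed_fields[1]  = &b->high;
    signed_fields[2]  = &b->low;
    signed_fields[3]  = &b->close;
    signed_fields[4]  = &b->cash;
    signed_fields[5]  = &b->closing_bid;
    signed_fields[6]  = &b->closing_ask;
    signed_fields[7]  = &b->vol;
    signed_fields[8]  = &b->oi;
    signed_fields[9]  = &b->tvol;
    signed_fields[10] = &b->toi;

    b->date = le32(rec + 0);
    b->dydm = le32(rec + 4);
    for (i = 0; i < 11; i++)
        *signed_fields[i] = (int32_t)le32(rec + 8 + 4 * i);
    b->dow = (int8_t)rec[52];
}
```

The year, month and day can then be split out of the date with integer division, for example date / 10000 for the year and date % 100 for the day.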
The dow is the same as for the DTA file.

If you are having trouble getting this to read correctly, make sure that you have used a #pragma pack(1), packed records, or equivalent to avoid padding bytes. Also be sure that you are reading 4 byte integers in Intel order: the bytes 04, 03, 02, 01 as stored on disk must be read as the number 0x01020304, which is 16909060, and not in the Motorola ordering, which would give the value 67305985.

QMASTER_REC RECORD LAYOUT

QMASTER_REC is a 64-byte record. All fields in this file are in ASCII format.

************ Start of QMASTER_REC description ***********

MASTER FILE RECORD LAYOUT (QMASTER)

DESCRIPTION                                       POSITION  LENGTH

CSI I.D. Number                                   1-4       4
Commodity Name                                    5-24      20
File Type (D,W,M)                                 25        1
Delivery Month                                    26-27     2
Delivery Year (last two digits)                   28-29     2
Conversion Factor                                 30-31     2
Pricing Unit                                      32-36     5
Commodity Symbol                                  37-38     2
Commodity/Stock Flag (C/S)                        39        1
Option Flag (P=Put, C=Call, N=Normal)             40        1
Striking Price                                    41-45     5
Stock or Commodity Symbol                         46-51     6
    Beginning with QuickTrieve 4.04, the commodity symbol is placed here
    as well as at position 37. The commodity symbol can now be up to
    three characters, but the other commodity symbol field is only two
    characters. The full three character field can be found here.
Deleted/Not deleted (0=not deleted, 1=deleted)    52        1
    If an entry is marked as deleted (ASCII "1") then there is no
    corresponding data file for this record.
Delivery Year (first two digits)                  53-54     2
    This field is new for QT 4.08. It contains the century indicator of
    the delivery year. Since older data files will have invalid
    information here, QuickTrieve uses the following algorithm:

        Century = 53-54
        Year    = 28-29
        If Century < 19 or Century > 21 then
            if Year < 20 then century = 20 else century = 19
        End If

Reserved                                          55-61     7
Optional display conversion factor                62-63     2
    If the number is in the range -4 to +5 (the - or + is required for
    this field), then this number is used for DISPLAY purposes in
    QUICKPLOT/QUICKSTUDY.
    For example, if the numbers for this commodity or stock are sent
    from CSI in 64ths but you wanted to have this item displayed using
    eighths, you would set the conversion factor at position 30 to -4,
    and set the display conversion factor to -1.
Stock Number Extension Byte                       64        1
    If this position is an ASCII 1, 2 or 3, then it represents the
    leftmost digit of the CSI stock number. For example, if the CSI I.D.
    number field has 3509 and this field has a 1, then the real CSI
    stock number is 13509.

SPECIAL NOTE: It is important for developers to be aware that the physical file name assigned to a particular time series may change. For example, if the user sorts the master file (using QuickTrieve(R), Unfair Advantage(R), or other third party software), the new file name for a given time series will be dependent on that series' new position within the master file after the sort, i.e. the physical file name will change. If a program maintains a file list independent from the QMASTER file, steps should be taken to ensure that the data file being read is in fact the desired time series.

************ End of QMASTER_REC description ***********

DATA FILE RECORD LAYOUT

For Unfair Advantage users, a second data file ending in .DT2 is provided. Please see the DT2 file description for details.

SPECIAL NOTE ON NUMERIC RANGE EXTENSION FOR DATA FIELDS

Look for the section entitled EXTENSION BITS for a discussion of how QuickTrieve represents numbers over 64K. To summarize, original versions of QuickTrieve only allowed numbers 0-64K. Later versions introduced the extension bits, with two bits allocated for each data field, allowing values from 0-256K. As certain time series approached this upper limit, it became necessary to allocate an additional two bits for each field, extending the range to 1024K.

Data is formatted on disk in a fixed length, 32 byte record with all information in binary format.
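The range arithmetic summarized in the special note above (16 bits, then 18, then 20) can be illustrated with a short C sketch. The function name csi_extend is hypothetical, and the caller is assumed to have already extracted the two-bit primary and secondary extension values for the field from the data record as described in the EXTENSION BITS section.

```c
#include <stdint.h>

/* Combine the 16-bit stored field with its two extension values.
   primary   supplies bits 16-17 (range grows from 64K to 256K),
   secondary supplies bits 18-19 (range grows from 256K to 1024K).
   Only the low two bits of each extension argument are used. */
uint32_t csi_extend(uint16_t base, unsigned primary, unsigned secondary)
{
    return (uint32_t)base |
           ((uint32_t)(primary & 3u) << 16) |
           ((uint32_t)(secondary & 3u) << 18);
}
```

With both extensions zero the result is the original 16-bit value, which is why older data is read exactly as before.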
The filename assigned to the file is determined by the associated position within the master file (QMASTER). For example, the data file for record 5 of the QMASTER file is F005.DTA.

The structure provides for one header record and many data records as follows:

Header Record (record 1 of the data file):

Description                      Position  Length  Format

FILE END RECORD POINTER          1-4       4       MBF
    Record number of the last record in this file assigned to hold data.
    There are usually more records physically in the file past this
    point, but they should be ignored, as they do not hold price data.
MAXIMUM DATE POINTER             5-8       4       MBF
    Number of the last record to actually hold price information. In the
    course of daily data collection, each successive day collected in
    chronological sequence normally increases this number by 1. At file
    creation, this field is set to 0. Your programs normally look at
    this field and not the FILE END RECORD POINTER when determining
    where to stop looking for data in the file.
HIGHEST HIGH                     9-12      4       MBF
    The highest high entered into this data file.
LOWEST LOW                       13-16     4       MBF
    The lowest low entered into this data file.
FIRST PHYSICAL DATE ON FILE      17-20     4       MBF
    This is the date of the first record on file, at record 2.
LAST PHYSICAL DATE ON FILE       21-24     4       MBF
    This is the date of the last record on file (the record pointed to
    by the FILE END RECORD POINTER).
HIGH NUMBERS ALLOWED FLAG        25-25     1       ASCII
    If this position is anything except an ASCII "0", then large numbers
    (numbers over 65535 for the OHLC, Noon and Cash fields) are allowed
    in this file. If this position is an ASCII "0" (zero), the OHLC,
    Noon and Cash fields are all assumed to be less than 65536. This
    field should almost never be set to "0". The only reason it was
    included was to provide backward compatibility with data files
    created with other software that may not have properly cleared out
    the expansion fields (described next) that hold the information for
    the larger numbers.
RESERVED                         26-29     4
RESERVED                         30-30     1
RESERVED                         31-32     2

NOTE 1: MBF stands for Microsoft Binary Format. It is a method of storing binary numbers that has subsequently been replaced by the IEEE standard format for most computer languages. Most compilers have some type of conversion function that will convert from MBF to IEEE and back. If not, we have functions available for C, Delphi and Turbo Pascal that will perform this numeric conversion.

NOTE 2: The first physical date and last physical date fields described above, and the date field in each data record, described below, are stored in the following manner: Dates in the 1900's are stored as they always have been, without the leading century. Dates after December 31, 1999 are stored with a leading one to make a seven digit number. Examples: January 1, 2000 = 1000101, February 20, 2004 = 1040220. To get the true date with century included, take the number in the date field and add 19000000. Be sure to use a 4 byte integer or a double precision real to store this result. Single precision reals do not accurately store numbers this large.

DATA RECORDS:

     Description                         Position  Length

 1)  Date                                1         4
 2)  Day of week                         5         1
 3)  Open                                6         2
 4)  High                                8         2
 5)  Low                                 10        2
 6)  Close                               12        2
 7)  Noon                                14        2
 8)  Cash                                16        2
 9)  Total Volume                        18        3
10)  Total Open Interest                 21        3
11)  Contract Volume                     24        3
12)  Contract Open Interest              27        3
13)  Extended bits for OHLC,Noon,Cash    30        3

The Date is stored as a 4 byte single precision MBF (Microsoft Binary Format) floating point number. See the notes in the header record description regarding MBF numbers and how to interpret dates past 1999.

The day of week is stored as a binary value in the range 1-9. The values 1-5 correspond to Monday-Friday. A value of 9 means a holiday, or simply that the record has no data yet and is to be ignored.

Fields 3-8 are 2 byte unsigned integers.

Fields 9-12 are 3 byte fields that can be computed as follows:

    1) The first two bytes are evaluated as an unsigned integer.
    2) The third byte is a binary number in the range 0-255 that can be evaluated using the ASC function. This value is multiplied by 65536, and the result is added to the value from step one to compute the value of the field.

EXTENSION BITS FOR DATA FIELDS:

Field 13 is formatted as follows:

    Byte 30: Bits 7,6 = Primary extension to OPEN
                  5,4 = Primary extension to HIGH
                  3,2 = Primary extension to LOW
                  1,0 = Primary extension to CLOSE
    Byte 31: Bits 7,6 = Primary extension to Noon Quote field
                  5,4 = Primary extension to Cash Price field
                  3,2 = Secondary extension to Noon Quote field
                  1,0 = Secondary extension to Cash Price field
    Byte 32: Bits 7,6 = Secondary extension to OPEN
                  5,4 = Secondary extension to HIGH
                  3,2 = Secondary extension to LOW
                  1,0 = Secondary extension to CLOSE

Primary extension - bits 16-17 of the number. This extends the range to 256K. This is the extension defined in previous definitions of the CSI format, and older data values are interpreted exactly as before.

Secondary extension - bits 18-19 of the number. This extends the range to 1024K.

The extension bits represent the most significant bits of the field, and extend the range of the fields from 0-64K to 0-1024K. A complete demonstration program is available from CSI that contains the routines to perform the numeric conversion, but a short example follows:

Function TwoBits%(A$,P%) Static
' Extract two bits from a byte. Support routine for Function AdjustQTNumber
    M%=ASC(A$)
    SELECT CASE P%
        CASE 6:M%=M%\64               ' ISOLATE BITS 6 AND 7
        CASE 4:M%=(M% AND &H30)\16    ' ISOLATE BITS 4 AND 5
        CASE 2:M%=(M% AND &HC)\4      ' ISOLATE BITS 2 AND 3
        CASE 0:M%=(M% AND &H3)        ' ISOLATE BITS 0 AND 1
    END SELECT
    TwoBits%=M%
End Function

Function AdjustQTNumber!(Temp!,HighByte$,LowByte$,PosHigh%,PosLow%, ConversionFactor%) Static
' Adjust a QuickTrieve number Open-High-Low-Close-Noon Quote-Cash
' Parameters:
' Temp!     - First 16 (0-15) bits of the Field, converted from Integer to floating point
' LowByte$  - byte holding bits 16-17 of the number.
' HighByte$ - byte holding bits 18-19 of the number.
' PosLow%   - position within LowByte$ that holds
'             the desired two bits
' PosHigh%  - position within HighByte$ that holds
'             the desired two bits
'
' Table of bytes and positions used for fields (LowByte$ and HighByte$ are
' the byte positions within the 32 byte data record)
' Field    LowByte$   HighByte$   PosLow%   PosHigh%
' Open     30         32          6         6
' High     30         32          4         4
' Low      30         32          2         2
' Close    30         32          0         0
' Noon     31         31          6         2
' Cash     31         31          4         0
'
' ConversionFactor% - Number in the range of -4 to +5 that represents the CSI conversion factor
' Function returns the adjusted floating point number
'
    If Temp!<0 THEN Temp!=Temp!+65536
    m%=TwoBits%(HighByte$,PosHigh%)*4+TwoBits%(LowByte$,PosLow%)
    IF m%<>0 Then Temp!=m%*65536!+Temp!
    CALL P2D(Temp!,TempDecimal!,ConversionFactor%) ' Convert to Decimal Form
    AdjustQTNumber!=TempDecimal!
End Function

************ End of CSI(R) Data file description ***************

The files COMCONS and STKCONS2 are only created and maintained by QuickTrieve. They are not created or maintained for Unfair Advantage users. Unfair Advantage users should consult the Developer's Document which comes with Unfair Advantage for information on the corresponding files.

COMMODITY CONSTANTS FILE RECORD LAYOUT

The commodity constants file name is COMCONS. The commodity constants file is used in file creation to transfer information such as commodity name and symbol to the master file (QMASTER). It is normally NOT necessary to access this file when accessing QUICKTRIEVE data files. It is necessary to access this file only if you are reading a raw history file (HISTDATA format) or raw daily dump file (DAILY) and you need to look up a conversion factor or other pertinent information.

The commodity constants file is a random access file with a logical record length of 32.
This file contains 1800 records, ordered as follows:

    Records      Contents
    1-300        first half of information for commodities 1-300
    301-600      second half of information for commodities 1-300
    601-900      first half of information for commodities 301-600
    901-1200     second half of information for commodities 301-600
    1201-1500    first half of information for commodities 601-900
    1501-1800    second half of information for commodities 601-900

The following pseudocode shows how to get the information for a particular commodity:

    offset = 0
    recnum = commodity number
    if recnum > 300 and recnum <= 600 then offset = 300
    if recnum > 600 then offset = 600
    retrieve record at recnum+offset, put in first half of 64 byte buffer
    retrieve record at recnum+offset+300, put in second half of 64 byte buffer

Assuming the 64 byte buffer area has been filled with both parts of the information for a given commodity, the area is defined as follows:

Description                   Position  Length

Commodity Name                1-20      20
Conversion Factor             21-22     2
Pricing Unit                  23-27     5
Commodity Symbol              28-30     3
Reserved                      31-32     2
Secondary pricing unit        33-42     10
CSI Dollars per point         43-52     10
Option conversion factor      53-54     2
Reserved                      55-64     10

************ End of Commodity constants file description ***************

STOCK CONSTANTS FILE RECORD LAYOUT

The stock constants file name is STKCONS2. There are 28999 records in the file, corresponding to CSI stock numbers 1001-29999. It is normally NOT NECESSARY to access this file when accessing QUICKTRIEVE data files. It is necessary to access this file only if you are reading a raw history file (HISTDATA format) or a raw daily dump file (DAILY) and you need to look up a conversion factor or other pertinent information about a stock.

The logical record length is 49, divided into the following fields:

DESCRIPTION                   POSITION  LENGTH

Stock Name                    1-20      20
Conversion Factor             21-22     2
Stock Symbol                  23-28     6
Stock Industry group code     29-30     2
    This is a 2 character group code assigned by CSI. There are
    approximately 200 different group codes.
Stock Exchange                31-31     1
    Specifies which exchange this issue is traded on. This field is
    optional. Current codes are: A=Amex, O=OTC, N=NASDAQ, M=Mutual Fund,
    I=Institutional
Reserved                      32-36     5
Currency                      37-38     2
    Currency the stock is traded in. Optional field. US=US Dollars,
    BP=British Pound, CD=Canadian Dollar, SF=Swiss Franc, JY=Japanese Yen
First date on file at CSI     39-44     6
    This is the start date of CSI's database for this issue. If this
    date is unknown, this field will be 999999. The primary reason for
    this field is to allow estimates of the cost of historical orders
    before actually placing them with CSI.
Display Conversion Factor     45-46     2
    This field is the conversion factor QUICKPLOT/QUICKSTUDY will use
    when DISPLAYING this stock to the screen. Usually the same as the
    primary conversion factor. DO NOT USE THIS FIELD FOR INTERPRETATION
    OF RAW DATA. The conversion factor at position 21-22 is always used
    to convert the raw data into decimal form for analysis.
Reserved                      47-49     3

************ End of stock constants file description ***************

QUICKTRIEVE, QUICKMANAGER, QUICKPLOT and QUICKSTUDY are registered trademarks of Commodity Systems, Inc. Boca Raton, FL USA
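As a closing illustration of NOTE 1, NOTE 2 and the 3 byte fields of the DTA data record, the following C sketch converts an MBF date to a full date with century and evaluates a 3 byte volume or open interest field. It is a sketch under stated assumptions, not the conversion code CSI offers: the function names are hypothetical, the MBF single precision layout used (three mantissa bytes with the sign in the top bit of the third byte, followed by an exponent byte that is 2 greater than the IEEE exponent) is the commonly documented one and is not spelled out in this document, the host float type is assumed to be IEEE 754 single precision, and the two byte part of a 3 byte field is assumed to be in Intel byte order.

```c
#include <stdint.h>
#include <string.h>

/* Convert a 4 byte MBF single to a double (hypothetical helper).
   An exponent byte of 0 means the value is zero; exponents of 1 and 2
   are too small for a normal IEEE single and are returned as 0 here. */
double mbf_to_double(const unsigned char m[4])
{
    uint32_t bits;
    float f;

    if (m[3] <= 2)
        return 0.0;
    bits = ((uint32_t)(m[2] & 0x80) << 24) |   /* sign bit */
           ((uint32_t)(m[3] - 2) << 23) |      /* rebased exponent */
           ((uint32_t)(m[2] & 0x7F) << 16) |   /* mantissa, high 7 bits */
           ((uint32_t)m[1] << 8) |
           (uint32_t)m[0];
    memcpy(&f, &bits, sizeof f);               /* reinterpret as IEEE single */
    return f;
}

/* Date field of a DTA record as YYYYMMDD: stored value plus 19000000.
   The sum is kept in a long because a single precision real cannot
   hold it exactly. */
long csi_dta_date(const unsigned char m[4])
{
    return (long)mbf_to_double(m) + 19000000L;
}

/* 3 byte field: unsigned 16-bit value plus the third byte times 65536. */
uint32_t csi_u24(const unsigned char p[3])
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}
```

For example, the MBF bytes 50 2A 74 94 (hexadecimal) hold the value 1000101, which csi_dta_date turns into 20000101, that is January 1, 2000.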