Description of Schema Based Parser

The below CSV file snippet shows the table structure to be parser by the CSV Parser in ATSD:

year,CTY_CODE,CTYNAME,IJAN,...,IDEC,IYR,EJAN,...,EDEC,EYR
1985,0001,OPEC,1733,...,2426,22801,1033,...,1186,12478
1986,0001,OPEC,2631,...,1327,19751,947,...,813,10844
1987,0001,OPEC,1344,...,1883,23952,739,...,1146,11057

Below is the table structure that is represented. The first line contains column headers:

year CTY_CODE CTYNAME IJAN ... IDEC IYR EJAN ... EDEC EYR
1985 0001 OPEC 1733 ... 2426 22801 1033 ... 1186 12478
1986 0001 OPEC 2631 ... 1327 19751 947 ... 813 10844
2016 0001 OPEC 1344 ... 1883 23952 739 ... 1146 11057

Here is the CSV parser schema:

select('#row=2-*')
    .select('#col=4-28')
    .filter(!cell(1,col).endsWith('YR'))
    .filter(cell(row,1) != 2016 || cell(1,col).substring(1) != 'DEC')
    .addSeries()
    .entity('usa')
    .metric('us-trade-' + (cell(1,col).startsWith('E')?'export':'import') )
    .tag('ctyname', cell(row, 3))
    .tag('cty_code', cell(row,2))
    .timestamp(cell(row,1)+'-'+cell(1,col).substring(1));

Select Rows

select('#row=2-*')

This method selects elements from the table based on the above expression. In this case, '#row=2-*' means select all rows starting with row Index 2, ignoring the first row.

Index 1 2 3 4 5-14 15 16 17 18-27 28 29
2 1985 0001 OPEC 1733 ... 2426 22801 1033 ... 1186 12478
3 1986 0001 OPEC 2631 ... 1327 19751 947 ... 813 10844
4 1987 0001 OPEC 1344 ... 1883 23952 739 ... 1146 11057

Select Columns

select('#col=4-28')

This expression '#col=4-28'' selects all columns with indexes from 4 to 28. After selection, the following cells are created:

Index 4 5-14 15 16 17 18-27 28
2 1,733 ... 2,426 22,801 1,033 ... 1,186
3 2,631 ... 1,327 19,751 947 ... 813
4 1,344 ... 1,883 23,952 739 ... 1,146

Filter Cells

Do not select columns that are contained in a column which describes summary values for the year. The columns end with the YR suffix.

filter(!cell(1,col).endsWith('YR'))

For filtering cells, use the filter command that takes a boolean condition as a parameter. The method checks every cell against this condition.

Current Indexes: row and col

While iterating through cells, retrieve index values for columns and rows by using the row and col variables accordingly.

cell Method

The cell method returns the value of the cell with listed row and column indexes.

For example, for the first iteration, retrieve the value of the cell that is contained in the first row and the fourth column. It has the value IJAN.

Standard JavaScript Methods

The cell method returns a value as a string. Use standard JavaScript methods and properties from the String class. Use the endsWith method to determine whether a string ends with the YR string or not.

Filtering Cells

Discard the column with the header IYR from cells and work with following cells:

Index 1 2 3 4 5-14 15 17 18-27 28
2 1985 0001 OPEC 1733 ... 2426 1033 ... 1186
3 1986 0001 OPEC 2631 ... 1327 947 ... 813
4 1987 0001 OPEC 1344 ... 1883 739 ... 1146

Filter Last Missing Data

Discard data for December 2016. Use this filter:

filter(cell(row,1) != 2016 || cell(1,col).substring(1) != 'DEC')

For example, in the first iteration check when this condition resolves to true:

filter((1985 != 2016) || ('JAN' != 'DEC'))

Add Series

Use the addSeries() method to add series for each of the cells.

addSeries()

After filtering the correct cells, iterate through the cells and use the cell value as the series value. But first, specify the necessary series fields.

Series Fields

1. Entity

Specify the entity with the entity method that takes the entity name as the string parameter.

entity('usa')

2. Metric

To specify the metric, use the metric command.

metric('us-trade-' + (cell(1,col).startsWith('E')?'export':'import') )

Next, decide which metric to use for the series. The metric depends on the first letter in the column header. If the header starts with 'E', use 'us-trade-export'. Otherwise, the name of metric is 'us-trade-import'.

3. Tags

Specify tags by using the tag method which takes key and value of a tag as the parameter.

tag('ctyname', cell(row, 3))
.tag('cty_code', cell(row,2))

Using this method, tags are defined as ctyname and cty_code, respectively.

4. Timestamp

Choose the appropriate string that describes the time of the series sample, which is parsed by a timestamp pattern. Accomplish this by using the timestamp method:

timestamp(cell(row,1)+'-'+cell(1,col).substring(1))

Result

In this case, pass a string that is the result of concatenated values of the first column in the row and a substring of the current column header.

For example, the first iteration returns a series with these fields:

| Date                 | Metric          | Entity | Tags                             | Value |
|----------------------|-----------------|--------|----------------------------------|-------|
| 1985-01-01T00:00:00Z | us-trade-import | usa    | cty_code = 0001, ctyname = OPEC  | 1,733 |