Description of Schema Based Parser
The below CSV file snippet shows the table structure to be parser by the CSV Parser in ATSD:
year,CTY_CODE,CTYNAME,IJAN,...,IDEC,IYR,EJAN,...,EDEC,EYR
1985,0001,OPEC,1733,...,2426,22801,1033,...,1186,12478
1986,0001,OPEC,2631,...,1327,19751,947,...,813,10844
1987,0001,OPEC,1344,...,1883,23952,739,...,1146,11057
Below is the table structure that is represented. The first line contains column headers:
year | CTY_CODE | CTYNAME | IJAN | ... | IDEC | IYR | EJAN | ... | EDEC | EYR |
---|---|---|---|---|---|---|---|---|---|---|
1985 | 0001 | OPEC | 1733 | ... | 2426 | 22801 | 1033 | ... | 1186 | 12478 |
1986 | 0001 | OPEC | 2631 | ... | 1327 | 19751 | 947 | ... | 813 | 10844 |
2016 | 0001 | OPEC | 1344 | ... | 1883 | 23952 | 739 | ... | 1146 | 11057 |
Here is the CSV parser schema:
select('#row=2-*')
.select('#col=4-28')
.filter(!cell(1,col).endsWith('YR'))
.filter(cell(row,1) != 2016 || cell(1,col).substring(1) != 'DEC')
.addSeries()
.entity('usa')
.metric('us-trade-' + (cell(1,col).startsWith('E')?'export':'import') )
.tag('ctyname', cell(row, 3))
.tag('cty_code', cell(row,2))
.timestamp(cell(row,1)+'-'+cell(1,col).substring(1));
Select Rows
select('#row=2-*')
This method selects elements from the table based on the above expression.
In this case, '#row=2-*'
means select all rows starting with row Index 2, ignoring the first row.
Index | 1 | 2 | 3 | 4 | 5-14 | 15 | 16 | 17 | 18-27 | 28 | 29 |
---|---|---|---|---|---|---|---|---|---|---|---|
2 | 1985 | 0001 | OPEC | 1733 | ... | 2426 | 22801 | 1033 | ... | 1186 | 12478 |
3 | 1986 | 0001 | OPEC | 2631 | ... | 1327 | 19751 | 947 | ... | 813 | 10844 |
4 | 1987 | 0001 | OPEC | 1344 | ... | 1883 | 23952 | 739 | ... | 1146 | 11057 |
Select Columns
select('#col=4-28')
This expression '#col=4-28''
selects all columns with indexes from 4 to 28. After selection, the following cells are created:
Index | 4 | 5-14 | 15 | 16 | 17 | 18-27 | 28 |
---|---|---|---|---|---|---|---|
2 | 1,733 | ... | 2,426 | 22,801 | 1,033 | ... | 1,186 |
3 | 2,631 | ... | 1,327 | 19,751 | 947 | ... | 813 |
4 | 1,344 | ... | 1,883 | 23,952 | 739 | ... | 1,146 |
Filter Cells
Do not select columns that are contained in a column which describes summary values for the year. The columns end with the YR
suffix.
filter(!cell(1,col).endsWith('YR'))
For filtering cells, use the filter
command that takes a boolean condition as a parameter. The method checks every cell against this condition.
row
and col
Current Indexes: While iterating through cells, retrieve index values for columns and rows by using the row
and col
variables accordingly.
cell
Method
The cell
method returns the value of the cell with listed row and column indexes.
For example, for the first iteration, retrieve the value of the cell that is contained in the first row and the fourth column. It has the value IJAN
.
Standard JavaScript Methods
The cell method returns a value as a string. Use standard JavaScript methods and properties from the String class.
Use the endsWith
method to determine whether a string ends with the YR
string or not.
Filtering Cells
Discard the column with the header IYR
from cells and work with following cells:
Index | 1 | 2 | 3 | 4 | 5-14 | 15 | 17 | 18-27 | 28 |
---|---|---|---|---|---|---|---|---|---|
2 | 1985 | 0001 | OPEC | 1733 | ... | 2426 | 1033 | ... | 1186 |
3 | 1986 | 0001 | OPEC | 2631 | ... | 1327 | 947 | ... | 813 |
4 | 1987 | 0001 | OPEC | 1344 | ... | 1883 | 739 | ... | 1146 |
Filter Last Missing Data
Discard data for December 2016. Use this filter:
filter(cell(row,1) != 2016 || cell(1,col).substring(1) != 'DEC')
For example, in the first iteration check when this condition resolves to true
:
filter((1985 != 2016) || ('JAN' != 'DEC'))
Add Series
Use the addSeries()
method to add series for each of the cells.
addSeries()
After filtering the correct cells, iterate through the cells and use the cell value as the series
value. But first, specify the necessary series fields.
Series Fields
1. Entity
Specify the entity with the entity
method that takes the entity name as the string parameter.
entity('usa')
2. Metric
To specify the metric, use the metric
command.
metric('us-trade-' + (cell(1,col).startsWith('E')?'export':'import') )
Next, decide which metric to use for the series. The metric depends on the first letter in the column header. If the header starts with 'E'
, use 'us-trade-export'
.
Otherwise, the name of metric is 'us-trade-import'
.
3. Tags
Specify tags by using the tag
method which takes key
and value
of a tag as the parameter.
tag('ctyname', cell(row, 3))
.tag('cty_code', cell(row,2))
Using this method, tags are defined as ctyname
and cty_code
, respectively.
4. Timestamp
Choose the appropriate string that describes the time of the series sample,
which is parsed by a timestamp pattern. Accomplish this by using the timestamp
method:
timestamp(cell(row,1)+'-'+cell(1,col).substring(1))
Result
In this case, pass a string that is the result of concatenated values of the first column in the row and a substring of the current column header.
For example, the first iteration returns a series with these fields:
| Date | Metric | Entity | Tags | Value |
|----------------------|-----------------|--------|----------------------------------|-------|
| 1985-01-01T00:00:00Z | us-trade-import | usa | cty_code = 0001, ctyname = OPEC | 1,733 |