Uploading CSV Files

Overview

Upload CSV files via REST API or manually through the web interface.

To process a CSV file, create a CSV parser. A parser splits the file into lines to create rows and then into tokens to create columns. Finally, the parser converts the newly created array of cells into series, property, and message commands.

Configuration Settings

Setting Description
Enabled Parser status. If parser is disabled, uploaded CSV files referencing this parser are discarded.
Name Unique parser name used as identifier when uploading files.
Command Type Type of data contained in the file: time series, properties, messages.
Write Property Enable writing data both as series and as properties.
Entity Column Name of column in CSV file containing the entities. For example: host or node.
Multiple columns can be specified in the Entity Column field to concatenate their values into a composite entity name using a dash symbol as a token.
For example:
Source CSV file:
Year,Source,Destination,Travelers
1995,Moscow,Berlin,2000000
Entity Columns:
Source,Destination
Resulting Entity:
Moscow-Berlin
Entity Prefix Prefix added to entity names.
Default Entity All data written to specific entity.
Replace Entities Replace entity names in the input file with their aliases from the selected Replacement Table
Process Events Process incoming data in the Rule Engine in addition to storing it in the database.
Metric Prefix Prefix added to metric names.
Metric Name Column Column containing metric names.
Metric Value Column Column containing metric values.
Message Column Column containing message text.
Timestamp Columns Columns containing the Timestamp for each data sample. In some cases, depending on the CSV file, the Timestamp can be split into multiple columns. For example: Date, Time.
If there are two columns containing the Timestamp, then they are concatenated with a dash symbol (-) in the Timestamp Pattern field.
For example:
Source CSV File:
Date,Time,Sensor,Power
2015.05.15,09:15:00,sensor01,15
Timestamp Columns:
Date,Time
Result:
Date-Time
2015.05.15-09:15:00
Timestamp Pattern Setting:
yyyy.MM.dd-HH:mm:ss
Timestamp Type Pattern, Seconds (Unix time), Milliseconds (Unix time).
Predefined Pattern Predefined Timestamp formats.
Timestamp Pattern Custom timestamp format, specified manually. For example: dd-MMM-yy HH:mm:ss
If there are two columns containing the Timestamp, then in they are divided with a dash (-) in the pattern.
Timezone Diff Column Column containing the time difference calculated from UTC.
Time Zone Time zone for interpreting Timestamps.
Filter Expression applied to row. If expression returns false, the row is discarded.
Filter syntax:
Fields:
timestamp – timestamp in milliseconds. Computed by parsing date from Time Column with specified Time Format and converted into milliseconds.
row[columnName] – text value of cell in the specific column.
Functions:
number(columnName) – returns numeric value of cell, or NaN (Not a Number) if the cell contains text which is not numeric.
isNumber('columnName') – returns true if cell is a valid number.
isBlank('columnName') – returns true is cell is empty string.
upper(columnName) – converts cell text to uppercase.
lower(columnName) – converts cell text to lowercase.
date(endtime expression) – returns time in milliseconds.
Filter examples:
number(columnName) > 0
isNumber('columnName')
row[columnName] LIKE 'abc*'
upper(columnName) != 'INVALID'
timestamp > date(current_day)
timestamp > date(2018-08-15T00:00:00Z)
timestamp > date(now – 2 * year)
Tag Columns Columns converted to series tags.
Default Tags Predefined series tags, specified as name=value on multiple lines.
Ignored Columns List of columns ignored in METRIC and MESSAGE commands.
These columns are retained in PROPERTY commands.
Renamed Columns List of column names to substitute input column headers, one mapping per line.
Usage: inputname=storedname
Header Header to be used if the file contains no header or to replace existing header.
Schema Schema defines how to process cells based on their position.

Columns contained in the CSV file that are not specified in any field in the parser are imported as metrics.

Parse Settings

Setting Description
Delimiter Separator dividing values: comma, semi-colon, or tab.
Line Delimiter End-of-line symbol: EOL (\n, \r\n) or semi-colon ;
Text Qualifier Escape character to differentiate separator as literal value.
Comment Symbol Lines starting with comment symbol such as hash # are ignored.
Padding Symbol Symbol appended to text values until all cells in the given column have the same width.
Applies to fixed-length formats such as .dat format.
Decimal Separator Symbol used to mark the border between the integral and the fractional parts of a decimal numeral.
Default: comma.
Allowed values: period, comma.
Grouping Separator Symbol used to group thousands within the number.
Default: none.
Allowed values: none, period, comma, space.
Fields Lengths Width of columns in symbols. Padding symbols added to the end of the field to obey the fields lengths.
For files in .dat format.
Discard NaN NaN (Not a Number) values are discarded
Ignore Line Errors If enabled, any errors while parsing the given line are ignored, including date parse errors, number parse errors, split errors, mismatch of rows, and header column counts.
Ignore Header Lines Ignore top-N lines from the file header

Column-based Examples

Schema-based Parsing

In addition to column-based parsing described above, the database supports schema-based parsing using RFC 7111 selections:

select("#row=2-*").select("#col=3-*").
addSeries().
metric(cell(1, col)).
entity(cell(row, 2)).
tag(cell(1, 3),cell(row, 3)).
timestamp(cell(row, 1));

Produced series commands:

series e:nurswgvml007 d:2018-11-15T00:00:00Z m:space_used_%=72.2 t:disk=/dev/sda
series e:nurswgvml007 d:2018-11-15T00:00:00Z m:space_used_%=14.5 t:disk=/dev/sdb
series e:nurswgvml001 d:2018-11-15T00:00:00Z m:space_used_%=14.4 t:disk=/dev/sda1

Schema-based Examples