Parsing Irregular Timestamps

Introduction

While most time series data is stored with regular timestamps in one column, this is not always the case. When your data storage or visualization tool does not have flexible settings to handle irregular data schema, the information cannot be read and understood correctly thus destroying its value.

This tutorial explains how to use CSV parsing features in ATSD to correctly interpret irregularly stored timestamp data.

Dataset

The dataset parsed in this tutorial is stored in the Axibase Dataset Catalog. The data tracks City of New York Department of Probation Youth Rearrest Rates. Data samples from the downloadable table are shown below:

Region Month Year Rate
Citywide March 2016 4.5
Citywide April 2016 4.7
Citywide May 2016 4.7
Citywide June 2016 4.8

Because the timestamp is split across two columns, working with the data from within the data.gov portal is impossible because defining either part of the timestamp as the x axis erroneously groups the data, considering either only the month or year of the data sample.

Access the plotly portal. Selecting either column creates an invalid visualization. The visualization sorted by Month is shown here:

The dataset tracks data from March 2016 to April 2018, thus most months contain two data points, April contains three, and August and September contain one.

CSV Parser

Download the dataset in CSV format.

Region,Month,Year,Rate
Citywide,March,2016,4.5
Citywide,April,2016,4.7
Citywide,May,2016,4.7
Citywide,June,2016,4.8
...

Using Excel, Numbers, or LibreOffice to concatenate the Month and Year columns is one solution, but adds an unneeded step when using ATSD.

Open the Data > CSV Parsers page and create a new parser for this file. Alternatively, download the parser configuration and follow the upload instructions to import the configuration into ATSD.

If you manually create the parser, complete the Timestamp Columns field to define both columns which contain a piece of the fragmented timestamp separated by semi-colon, comma, or line break.

Define the Timestamp Type as Pattern and specify the pattern in the Timestamp Pattern field.

See ATSD Documentation for more information about timestamp patterns.

During import, ATSD concatenates the specified columns to create the Timestamp Pattern. Each component of the timestamp may be defined by its own column and still be read by the database during import using the appropriate configuration.

Using ChartLab, the dataset is parsed to create a visualization that shows the time series data as it is meant to be portrayed.

The declarative Charts Syntax defines the entity and metric created by the parser upon import:

[series]
  entity = data.cityofnewyork.us
  metric = juvenile_rearrest_rate