Compression Tests
Overview
The following tests estimate the amount of disk space required to store the 1-minute OHLC dataset containing 10+ million time:value
samples in different databases.
Results
Universal Table Schema
Database | Version | Compressed | Bytes per Sample | Test Link |
---|---|---|---|---|
ATSD | 17340 | 1.9 | View | |
Microsoft SQL Server | 14.0.1000.169 | 42.9 | View | |
Microsoft SQL Server | 14.0.1000.169 | 89.5 | View | |
MySQL | 5.7 | 34.5 | View | |
MySQL | 5.7 | 70.7 | View | |
Oracle | v12.2.0.1 | 39.3 | View | |
Oracle | v12.2.0.1 | 52.5 | View | |
PostgreSQL | 9.6 | 83.7 | View | |
Vertica | 7.1.1-0 | 5.6 | View |
Specific Table Schema
Database | Version | Compressed | Bytes per Sample | Test Link |
---|---|---|---|---|
Microsoft SQL Server | 14.0.1000.169 | 9.4 | View | |
Microsoft SQL Server | 14.0.1000.169 | 19.3 | View | |
MySQL | 5.7 | 8.2 | View | |
MySQL | 5.7 | 15.6 | View | |
Oracle | v12.2.0.1 | 9.4 | View | |
Oracle | v12.2.0.1 | 13.4 | View | |
PostgreSQL | 9.6 | 21.6 | View | |
Vertica | 7.1.1-0 | 2.4 | View |
Dataset
The dataset represents 20+ years of historical stock trade data with 1-minute OHLC (open-high-low-close) bars available from the Kibot company.
Minutely trade statistics are available for IBM stock traded on the New York Stock Exchange. Recording begins on February 1st, 1998 and continues until the most recent trading day.
The data is provided in the commonly used OHLCV format.
Date,Time,Open,High,Low,Close,Volume
01/02/1998,09:30,104.5,104.5,104.5,104.5,67000
...
09/08/2017,17:38,142.45,142.45,142.45,142.45,3556
The file contains over 2 million lines. The OHLC metrics contain values with up to four decimal places. The volume metric is an integer. The dates are recorded in US/Eastern
time.
Download the records from
http://api.kibot.com/?action=history&symbol=IBM&interval=1&unadjusted=0&bp=1&user=guest
.
Each row consists of five metrics per 1-minute interval:
09/08/2017,15:42,142.53,142.5399,142.49,142.49,10031
...
time = 09/08/2017 15:42
open = 142.53
high = 142.5399
low = 142.49
close = 142.49
volume = 10031
Schema Alternatives
The disk space usage tests are performed using two schema options:
Specific Table
The Specific Table schema creates a named column for each metric.
With this option, the table is designed to store a fixed set of columns (metrics) for the given entity type. A separate table is required for each type of entity.
Example: IBM Tivoli Data Warehouse
Universal Table
The Universal Table schema uses a Metric ID column to reference the metric defined in a separate lookup table.
This option supports the addition of new metrics without altering the data table itself.
Example: Microsoft System Center Operations Manager Data Warehouse