Compression Tests

Overview

The following tests estimate the amount of disk space required to store the 1-minute OHLC dataset containing 10+ million time:value samples in different databases.

Results

Universal Table Schema

Database Version Compressed Bytes per Sample Test Link
ATSD 17340 1.9 View
Microsoft SQL Server 14.0.1000.169 42.9 View
Microsoft SQL Server 14.0.1000.169 89.5 View
MySQL 5.7 34.5 View
MySQL 5.7 70.7 View
Oracle v12.2.0.1 39.3 View
Oracle v12.2.0.1 52.5 View
PostgreSQL 9.6 83.7 View
Vertica 7.1.1-0 5.6 View

Specific Table Schema

Database Version Compressed Bytes per Sample Test Link
Microsoft SQL Server 14.0.1000.169 9.4 View
Microsoft SQL Server 14.0.1000.169 19.3 View
MySQL 5.7 8.2 View
MySQL 5.7 15.6 View
Oracle v12.2.0.1 9.4 View
Oracle v12.2.0.1 13.4 View
PostgreSQL 9.6 21.6 View
Vertica 7.1.1-0 2.4 View

Dataset

The dataset represents 20+ years of historical stock trade data with 1-minute OHLC (open-high-low-close) bars available from the Kibot company.

Minutely trade statistics are available for IBM stock traded on the New York Stock Exchange. Recording begins on February 1st, 1998 and continues until the most recent trading day.

The data is provided in the commonly used OHLCV format.

Date,Time,Open,High,Low,Close,Volume
01/02/1998,09:30,104.5,104.5,104.5,104.5,67000
...
09/08/2017,17:38,142.45,142.45,142.45,142.45,3556

The file contains over 2 million lines. The OHLC metrics contain values with up to four decimal places. The volume metric is an integer. The dates are recorded in US/Eastern time.

Download the records from http://api.kibot.com/?action=history&symbol=IBM&interval=1&unadjusted=0&bp=1&user=guest.

Each row consists of five metrics per 1-minute interval:

09/08/2017,15:42,142.53,142.5399,142.49,142.49,10031
...
time   = 09/08/2017 15:42
open   = 142.53
high   = 142.5399
low    = 142.49
close  = 142.49
volume = 10031

Schema Alternatives

The disk space usage tests are performed using two schema options:

Specific Table

The Specific Table schema creates a named column for each metric.

With this option, the table is designed to store a fixed set of columns (metrics) for the given entity type. A separate table is required for each type of entity.

Example: IBM Tivoli Data Warehouse

tdw_schema

Universal Table

The Universal Table schema uses a Metric ID column to reference the metric defined in a separate lookup table.

This option supports the addition of new metrics without altering the data table itself.

Example: Microsoft System Center Operations Manager Data Warehouse

scom_schema