Data Forecasting

Overview

ATSD includes a range of univariate forecasting algorithms that predict future series values based on historical data.

Supported algorithms:

Holt-Winters
ARIMA
Singular Spectrum Analysis

The accuracy of predictions depends on the frequency of data collection, the selection interval, and the algorithm parameters.
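To make the role of the algorithm parameters concrete, here is a minimal sketch of additive Holt-Winters smoothing. It is a textbook formulation, not ATSD's implementation; the function name, the initialization scheme, and the argument names are assumptions made for illustration.

```python
def holt_winters_additive(values, period, alpha, beta, gamma, horizon):
    """Forecast `horizon` future points with additive Holt-Winters smoothing.

    `values` is a regular (equally spaced) series, `period` is the number of
    points in one season, and alpha/beta/gamma are the smoothing factors for
    level, trend, and seasonality.
    """
    if len(values) < 2 * period:
        raise ValueError("need at least two full seasons of data")

    # Initial level: mean of the first season.
    level = sum(values[:period]) / period
    # Initial trend: average per-step change between the first two seasons.
    second_mean = sum(values[period:2 * period]) / period
    trend = (second_mean - level) / period
    # Initial seasonal offsets: deviation of each first-season point from the level.
    seasonal = [v - level for v in values[:period]]

    # Update level, trend, and seasonal offsets for every later observation.
    for i in range(period, len(values)):
        v = values[i]
        s = seasonal[i % period]
        prev_level = level
        level = alpha * (v - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[i % period] = gamma * (v - level) + (1 - gamma) * s

    # Project the last level and trend forward and add the matching seasonal offset.
    n = len(values)
    return [level + (h + 1) * trend + seasonal[(n + h) % period]
            for h in range(horizon)]
```

For a perfectly repeating input such as `[10, 20, 30, 20] * 4` with `period=4`, the sketch reproduces the next season. Real data needs enough history and a suitable period for the forecast to be useful.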

Figure: forecasting example with abnormal deviation.

Editor Settings

Enable data forecasting on the Data > Forecasts page.

General Settings

Enabled forecasts are prepared by background jobs on schedule according to cron expressions. Forecasting jobs are typically executed during off-peak hours.

Setting	Description
Retention Interval	Specifies how long a forecast is stored in the database. Forecasts older than End Time (or the current time, if End Time is not specified) minus Retention Interval are deleted.
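The retention rule can be read as a simple cutoff calculation. The sketch below assumes that reading; `retention_cutoff` and its arguments are hypothetical names, not part of ATSD.

```python
from datetime import datetime, timedelta, timezone


def retention_cutoff(retention, end_time=None, now=None):
    """Return the timestamp before which stored forecasts would be deleted.

    Uses End Time when it is set, otherwise the current time (assumed reading
    of the Retention Interval description).
    """
    reference = end_time or now or datetime.now(timezone.utc)
    return reference - retention


# Example: a 7-day retention interval with an explicit End Time.
end = datetime(2018, 5, 15, tzinfo=timezone.utc)
cutoff = retention_cutoff(timedelta(days=7), end_time=end)
```

With these inputs, forecasts stored before 2018-05-08 would be removed.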

Data Selection Settings

Setting	Description
Metric	Metric name for which forecasts are calculated.
Entity	If selected, forecasts are calculated for the specified entity. Supersedes the Entity Group drop-down list. If neither an entity nor an entity group is specified, forecasts are prepared for all entities.
Entity Group	If selected, forecasts are calculated for entities contained in the specified entity group.
Tags	Prepare forecasts only for series containing the specified series tags.
End Time	End time of the Data Selection Interval and Series Selection Interval. This field supports calendar expressions, for example `current_day`. If not defined, the field is set to the time the job is run.
Data Selection Interval	Time frame for selecting detailed data used as forecast input. Specify the end of the interval in the End Time field, otherwise the end of the selection interval is set to current time.
Series Selection Interval	Ignore any series whose `Last Insert Time` precedes End Time by more than the specified interval. Use this option to skip series that have not been updated for a long time.
Calendar	Ignore detailed values within the time intervals listed in the calendar.
Empty Period Threshold	Ignore a series with a percentage of empty periods greater than the specified threshold. Calculated as `100 * (number of empty periods before interpolation)/(total number of aggregation periods in Data Selection Interval)`.

For data exclusion options, see Calendar Exception Settings.
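The Empty Period Threshold formula can be illustrated with a short calculation. This is a sketch under stated assumptions: timestamps are Unix milliseconds, periods are aligned to the start of the Data Selection Interval, and `empty_period_percentage` is a hypothetical helper name.

```python
def empty_period_percentage(sample_times_ms, start_ms, end_ms, period_ms):
    """Percentage of aggregation periods in [start_ms, end_ms) with no samples."""
    total = (end_ms - start_ms) // period_ms
    if total <= 0:
        raise ValueError("selection interval must cover at least one period")

    # Collect the indexes of periods that contain at least one sample.
    filled = {
        (t - start_ms) // period_ms
        for t in sample_times_ms
        if start_ms <= t < start_ms + total * period_ms
    }
    return 100 * (total - len(filled)) / total


# Ten 10-minute periods; samples fall into periods 0, 1, and 5 only.
period = 600_000
times = [0, 650_000, 700_000, 5 * period + 1]
pct = empty_period_percentage(times, 0, 10 * period, period)
```

Here 7 of 10 periods are empty, so the series would be ignored if the threshold were set below 70.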

Aggregation Settings

Setting	Description
Group By	Grouping key for merging multiple series into one. Detailed data for multiple series sharing the same grouping key are merged into one array prior to computing aggregate statistics.
Auto Aggregate	The server automatically calculates an aggregation period that produces the most accurate forecast, defined as having the lowest variance from observed historical data.
Aggregation Period	Period of time over which the detailed samples are aggregated.
Aggregate Statistic	Aggregation function applied to raw data to regularize the series. Aggregate values for empty periods without detailed data are interpolated from the aggregate values of previous periods.
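The regularization step can be sketched as follows. The example assumes the average statistic and assumes that an empty period takes the aggregate value of the nearest preceding period, which is one reading of the description above; `aggregate_avg` is a hypothetical name.

```python
def aggregate_avg(samples, start_ms, end_ms, period_ms):
    """Average (time_ms, value) samples per period.

    Empty periods reuse the previous period's aggregate (assumed behavior).
    Leading empty periods have no previous value and yield None.
    """
    count = (end_ms - start_ms) // period_ms
    buckets = [[] for _ in range(count)]
    for t, v in samples:
        idx = (t - start_ms) // period_ms
        if 0 <= idx < count:
            buckets[idx].append(v)

    result = []
    previous = None
    for i, bucket in enumerate(buckets):
        if bucket:
            previous = sum(bucket) / len(bucket)
        result.append((start_ms + i * period_ms, previous))
    return result


# Four periods of 600 ms; the second and fourth contain no samples.
regular = aggregate_avg([(0, 10), (100, 20), (1300, 30)], 0, 2400, 600)
```

The output is a regular series with one value per period, which is the shape the forecasting algorithms expect.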

Algorithm Parameters

Setting	Description
Algorithm	Holt-Winters or ARIMA forecasting algorithms.
Score Interval	Part of Data Selection Interval used to compute the variance between observed values and forecast values. Candidate forecasts are ranked by this variance. The shorter the Score Interval, the more weight is assigned to recently observed values.
Auto Period	The server automatically calculates seasonality of the underlying series that produces the most accurate forecast, defined as having the lowest variance from observed historical data.
Period	Specify seasonality of the underlying series.
Auto Parameters	The server automatically calculates algorithm parameters that produce the most accurate forecast, defined as having the lowest variance from observed historical data.
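The automatic options above all pick the candidate with the lowest deviation from observed data over the Score Interval. The sketch below illustrates that selection, assuming mean squared deviation as the measure; the exact statistic used by the server is not specified here, and `score` and `pick_best` are hypothetical names.

```python
def score(observed, predicted):
    """Mean squared deviation between observed values and a forecast."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def pick_best(observed, candidates, score_points):
    """Return the name of the candidate with the lowest score.

    Only the last `score_points` values (the Score Interval) are compared.
    """
    tail = observed[-score_points:]
    return min(
        candidates,
        key=lambda name: score(tail, candidates[name][-score_points:]),
    )


observed = [1, 2, 3, 4]
candidates = {"a": [1, 2, 3, 6], "b": [3, 3, 3, 4]}
best_long = pick_best(observed, candidates, 4)   # scores: a = 1.0, b = 1.25
best_short = pick_best(observed, candidates, 2)  # scores: a = 2.0, b = 0.0
```

The two calls select different winners, which shows how a shorter Score Interval shifts weight toward recent observations.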

Persistence Settings

Setting	Description
Forecast Name	Optional name used to differentiate between several forecasts for one underlying series. Use cases: `forecastName` field in Data API; `forecast(name)` Rule Engine function; `forecast-name` Chart setting.
Default Forecast	Use these settings instead of the default settings when calculating an on-demand forecast. An on-demand forecast is calculated at request time if a pre-stored forecast is not available.
Forecast Range	Minimum and Maximum constraints applied to the stored forecast values to ensure that such values are within the specified range. Constraints are applied to the winning forecast after scoring stage.
Forecast Interval	The length of time into the future for which forecasts are prepared and stored in the database. The interval can be rounded up to the nearest forecast period.
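Two of the persistence settings are easy to show numerically. The sketch assumes that Forecast Range clamps out-of-range values to the nearest bound and that Forecast Interval is rounded up to a whole number of periods; both helper names are hypothetical.

```python
def apply_forecast_range(values, minimum=None, maximum=None):
    """Clamp forecast values to [minimum, maximum]; a bound of None is ignored."""
    clamped = []
    for v in values:
        if minimum is not None and v < minimum:
            v = minimum
        if maximum is not None and v > maximum:
            v = maximum
        clamped.append(v)
    return clamped


def round_up_interval(interval_ms, period_ms):
    """Round the forecast interval up to a whole number of periods."""
    return -(-interval_ms // period_ms) * period_ms


# A percentage metric constrained to 0..100.
stored = apply_forecast_range([-5, 50, 120], minimum=0, maximum=100)
# A 25-minute interval with a 10-minute period becomes 30 minutes.
interval = round_up_interval(25 * 60_000, 10 * 60_000)
```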

Editor Tools

Editor Tools are located at the bottom of the Forecast Configuration page.

Tool	Description
Calculate Parameters	Click Calculate Parameters to calculate algorithm parameters.
Run	Execute the forecast job. Use this tool to test a forecast. The response contains the number of calculated forecasts.
Export	Export forecast data in CSV format.
Show Meta	Display parameters used to calculate the forecast. Metadata is stored with the forecast. The collection interval is the time range of actual data extracted to build the forecast.

Use the split-button on the Data > Forecasts page to add Exceptions and perform Testing.

Using Forecasts

Rule Engine

Access pre-computed forecasts using forecast functions.
Use forecast values as thresholds to trigger response actions if observed values deviate from forecast values by some amount.
Compare forecast values to simple, weighted, and exponential moving averages.

abs(avg() - forecast().interpolated) > 25

This expression compares the moving average of some metric to the forecast and alerts if the absolute difference exceeds 25.
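Outside the Rule Engine, the same check can be sketched in a few lines. The example assumes that the interpolated forecast value is obtained by linear interpolation between neighboring forecast samples with strictly increasing timestamps; `interpolate` and `deviates` are hypothetical names, not Rule Engine functions.

```python
def interpolate(forecast, t):
    """Linearly interpolate the forecast at time t.

    `forecast` is a list of (time, value) pairs sorted by strictly
    increasing time. Returns None when t is outside the forecast range.
    """
    for (t0, v0), (t1, v1) in zip(forecast, forecast[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return None


def deviates(window, forecast, t, limit=25):
    """True if the window average differs from the forecast by more than limit."""
    expected = interpolate(forecast, t)
    if expected is None or not window:
        return False
    return abs(sum(window) / len(window) - expected) > limit


forecast = [(0, 10.0), (10, 20.0)]
alert = deviates([50, 40], forecast, t=5)   # average 45 vs forecast 15
quiet = deviates([20, 30], forecast, t=5)   # average 25 vs forecast 15
```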

thresholdTime(null, 90, '1 DAY') != null

The `thresholdTime` function returns the Unix time in milliseconds at which the forecast is expected to exceed 90 for the first time.
The condition is true if the violation is expected to occur within the next day.
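The idea behind this condition can be sketched as a scan over forecast samples. This is a simplified illustration that handles only the upper threshold; `threshold_time` and its arguments are hypothetical and do not mirror the Rule Engine signature.

```python
DAY_MS = 24 * 60 * 60 * 1000


def threshold_time(forecast, max_value, horizon_ms, now_ms):
    """Return the time (ms) of the first forecast sample above max_value.

    Only samples between now_ms and now_ms + horizon_ms are considered.
    `forecast` is a list of (time_ms, value) pairs sorted by time.
    Returns None if no violation is expected within the horizon.
    """
    for t, v in forecast:
        if t > now_ms + horizon_ms:
            break
        if t >= now_ms and v > max_value:
            return t
    return None


forecast = [(1000, 80), (2000, 95), (3000, 99)]
first = threshold_time(forecast, 90, DAY_MS, now_ms=0)
```

A rule would raise an alert when the result is not `None`, which matches the `!= null` comparison in the expression above.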

Ad hoc Export

Set the Data Type setting to Forecast and optionally specify a forecast name.

Data API

Query and insert forecast values with Data API. The insert capability can be used to populate the database with custom forecast values calculated externally.

A sample forecast JSON query:

[
    {
        "entity": "nurswgvml007",
        "metric": "cpu_busy",
        "type": "FORECAST",
        "endDate": "now + 2 * hour",
        "startDate": "now"
    }
]

Sample response:

[
  {
    "entity": "nurswgvml007",
    "metric": "cpu_busy",
    "tags": {},
    "type": "FORECAST",
    "aggregate": {
      "type": "DETAIL"
    },
    "meta": {
      "timestamp": "2018-05-15T08:20:00.000Z",
      "averagingInterval": 600000,
      "alpha": 0,
      "beta": 0.4,
      "gamma": 0.4,
      "period": {
        "count": 1,
        "unit": "DAY"
      },
      "stdDev": 7.224603272075089
    },
    "data": [
      {"d":"2018-05-15T08:20:00.000Z","v":11.604692968987015},
      {"d":"2018-05-15T08:30:00.000Z","v":14.052095586152106},
      {"d":"2018-05-15T08:40:00.000Z","v":15.715682104344845},
      {"d":"2018-05-15T08:50:00.000Z","v":11.604018743609409},
      {"d":"2018-05-15T09:00:00.000Z","v":12.507966355503251},
      {"d":"2018-05-15T09:10:00.000Z","v":12.59619153186056},
      {"d":"2018-05-15T09:20:00.000Z","v":11.092825413101579},
      {"d":"2018-05-15T09:30:00.000Z","v":11.747112803805937},
      {"d":"2018-05-15T09:40:00.000Z","v":11.137962830355074},
      {"d":"2018-05-15T09:50:00.000Z","v":11.40358025413789},
      {"d":"2018-05-15T10:00:00.000Z","v":16.728103701429056},
      {"d":"2018-05-15T10:10:00.000Z","v":12.75646043607565}
    ]
  }
]

Insert a forecast into ATSD using the POST method:

POST /api/v1/series/insert

Payload:

[
    {
        "entity": "nurswgvml007",
        "metric": "mpstat.cpu_busy",
        "type": "FORECAST",
        "data": [
            {
                "t": 1462427358127,
                "v": 52
            }
        ]
    }
]
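For forecasts calculated externally, the payload above can be assembled programmatically. The sketch below only builds the request and does not send it; the base URL is a placeholder, authentication is omitted, and `build_forecast_insert` is a hypothetical helper. The optional `forecastName` field is the one mentioned under Persistence Settings.

```python
import json
from urllib import request


def build_forecast_insert(base_url, entity, metric, points, forecast_name=None):
    """Build a POST request that inserts forecast values.

    `points` is a list of (unix_time_ms, value) pairs.
    """
    command = {
        "entity": entity,
        "metric": metric,
        "type": "FORECAST",
        "data": [{"t": t, "v": v} for t, v in points],
    }
    if forecast_name:
        command["forecastName"] = forecast_name

    body = json.dumps([command]).encode("utf-8")
    return request.Request(
        base_url + "/api/v1/series/insert",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host; replace with the actual server address and add credentials.
req = build_forecast_insert(
    "https://atsd.example.org:8443",
    "nurswgvml007",
    "mpstat.cpu_busy",
    [(1462427358127, 52)],
)
```

Passing `req` to `urllib.request.urlopen` would send the command once the server address and credentials are configured.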


Charts

Load forecasts into charts by setting data-type = forecast in the [series] section.

[series]
    entity = nurswgvml007
    metric = cpu_busy
    data-type = forecast

List of widget and series settings applicable to forecast data:

Name	Description	Example
`data-type`	Data type for the current series. Possible values: `history`, `forecast`, `forecast_deviation`, `lower_confidence`, `upper_confidence`.	`data-type = forecast`
`forecast-name`	Unique forecast identifier. Useful when creating multiple forecasts for the same series. Usually used to view imported forecasts generated with external tools. If no forecast name is set, the default forecast is loaded.	`forecast-name = hw5`
`forecast-style`	CSS styles applied to forecasts in `column` and `column-stack` modes.	`forecast-style = stroke: magenta;`
`style`	Render forecast as a solid line instead of dashed line.	`style = stroke-dasharray: none;`
