Group Processor
The transformation groups multiple input series and applies a statistical function to grouped values, producing new series.
The transformation is implemented as follows:

1. Load detailed data for each series separately (input series).
2. Combine input series into groups based on the `groupByEntityAndTags` setting.
3. Split each group into subgroups based on the `place` setting.
4. Create a multi-value series for each subgroup. A multi-value series is a list of pairs `(timestamp, samples of several series)` ordered by timestamp. The way the series is constructed depends on the `period` setting.

   a. The `period` is specified explicitly in the query. Split each series `time:value` array into periods. Discard periods with start time earlier than `startDate` or greater than `endDate`. For each period create the pair `(timestamp, collection of samples)` and add it to the multi-value series. The `timestamp` equals the period start time. The `collection of samples` consists of samples of multiple series within the period: a series sample belongs to the `collection of samples` if its timestamp belongs to the period.

   b. The `period` is not specified. In this case the multi-value series consists of pairs `(timestamp, multiple series samples with the given timestamp)`, one for each `timestamp` present in the input series within the subgroup.

5. Interpolate each multi-value series according to the `interpolate` field.
6. Truncate each multi-value series if the `truncate` field is `true`.
7. Calculate an aggregated series for each specified statistical function and each multi-value series. Statistical functions are set in the `type` or `types` setting. To build an aggregated series, the statistical function is applied to the samples of each pair in the multi-value series. Thus the aggregated series contains one sample `(timestamp, aggregated value)` per pair `(timestamp, samples)` of the multi-value series. The `aggregated value` is the value of the statistical function applied to the `samples`.
Fields
Name | Type | Description |
---|---|---|
groupByEntityAndTags | array | Array of tag names which determines how series are grouped. Default: null . |
type | string | [type or types is Required] Statistical function applied to values with the same timestamp or within the same period, if the period is specified. The type can be set to DETAIL in which case no grouping is performed and the underlying series is returned unchanged. |
types | array | [type or types is Required] Array of statistical functions. Each function in the array produces a separate grouped series. If one of the functions is set to DETAIL , its result contains the underlying series. |
period | object | Period. Splits the merged series into periods and applies the statistical function to values in each period separately. |
interpolate | object | Interpolation function to fill gaps in input series (no period) or in grouped series (if period is specified). |
truncate | boolean | Discards samples at the beginning of the interval until values for all input series are established. Default: false . |
place | object | Allocate grouped series to subgroups based on constraints and the objective function specified in the place object. Default: null . |
Series Grouping
If the groupByEntityAndTags
setting is not set in the request, all input series are placed into a single group.
If the array is empty, the input series are placed in the same group if they have the same entity.
"groupByEntityAndTags": []
If the array of tag names is not empty, any two series belong to the same group if and only if they have the same entity, and for each specified tag name either both series have the same value of the tag, or both do not have the tag.
"groupByEntityAndTags": ["tag-name-1", "tag-name-2"]
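The rule above can be sketched as a key function: two series fall into the same group exactly when their keys are equal (an illustrative Python sketch; the dictionary representation of a series is an assumption, not part of the API):

```python
def group_key(series, group_by_entity_and_tags):
    """Build a hashable grouping key: two series belong to the same
    group iff they have the same entity and, for each listed tag,
    the same tag value or both missing (None)."""
    if group_by_entity_and_tags is None:
        return ()  # setting absent: a single group for all series
    tag_values = tuple(series["tags"].get(name)
                       for name in group_by_entity_and_tags)
    return (series["entity"],) + tag_values

s1 = {"entity": "e-1", "tags": {}}
s2 = {"entity": "e-1", "tags": {"tag-name-2": "tag-value-1"}}
# empty array: grouped by entity only
assert group_key(s1, []) == group_key(s2, [])
# both series lack tag-name-1 and share the entity -> same group
assert group_key(s1, ["tag-name-1"]) == group_key(s2, ["tag-name-1"])
```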
Grouping Example
Consider a set of series with the following entities and tags:
| index | entity | tag-name-1 | tag-name-2 |
|-------|--------|-------------|-------------|
| 1 | e-1 | | |
| 2 | e-1 | tag-value-1 | |
| 3 | e-1 | | tag-value-1 |
| 4 | e-1 | tag-value-1 | tag-value-2 |
| 5 | e-1 | tag-value-2 | |
| 6 | e-2 | | |
| 7 | e-2 | | tag-value-1 |
The request groups series by entity and the tag named `tag-name-1`:

"groupByEntityAndTags": ["tag-name-1"]

The result contains 4 groups:

- `{1, 3}`: same entity `e-1` and no `tag-name-1` tag
- `{2, 4}`: same entity `e-1` and the same `tag-name-1` value
- `{5}`: the only series with a different `tag-name-1` value
- `{6, 7}`: same entity `e-2` and no `tag-name-1` tag
View the full example for more details.
Grouping Functions
COUNT
MIN
MAX
AVG
SUM
PERCENTILE(n)
MEDIAN
STANDARD_DEVIATION
MEDIAN_ABS_DEV
FIRST
LAST
MIN_VALUE_TIME
MAX_VALUE_TIME
Interpolation
Interpolation Fields
Name | Type | Description |
---|---|---|
type | string | [Required] Interpolation function. |
value | number | [Required by VALUE function] Constant number used to set value for the missing periods. |
extend | boolean | Add missing periods at the beginning and the end of the selection interval. Default: false . |
Values added by the `extend` setting are determined as follows:

- If the `VALUE {n}` interpolation function is specified, the `extend` option sets empty leading/trailing period values to `{n}`.
- Without the `VALUE {n}` function, the `extend` option adds missing periods at the beginning and end of the selection interval using the `NEXT` and `PREVIOUS` interpolation functions, respectively.
Interpolation Functions
Type | Description |
---|---|
NONE | No interpolation. Periods without any detailed values are excluded from results. |
PREVIOUS | Set value for the period based on the previous period value. |
NEXT | Set value for the period based on the next period value. |
LINEAR | Calculate period value using linear interpolation between previous and next period values. |
VALUE | Set value for the period to a specific number. |
Examples
Data
Detailed Data by Series
| entity | datetime | value |
|--------|----------------------|-------|
| e-1 | 2016-06-25T08:00:00Z | 1 |
| e-2 | 2016-06-25T08:00:00Z | 11 |
| e-1 | 2016-06-25T08:00:05Z | 3 | e-1 only
| e-1 | 2016-06-25T08:00:10Z | 5 | e-1 only
| e-1 | 2016-06-25T08:00:15Z | 8 |
| e-2 | 2016-06-25T08:00:15Z | 8 |
| e-1 | 2016-06-25T08:00:30Z | 3 |
| e-2 | 2016-06-25T08:00:30Z | 13 |
| e-1 | 2016-06-25T08:00:45Z | 5 |
| e-2 | 2016-06-25T08:00:45Z | 15 |
| e-2 | 2016-06-25T08:00:59Z | 19 | e-2 only
Detailed Data Grouped by Timestamp
| datetime | e1.value | e2.value |
|----------------------|----------|----------|
| 2016-06-25T08:00:00Z | 1 | 11 |
| 2016-06-25T08:00:05Z | 3 | - |
| 2016-06-25T08:00:10Z | 5 | - |
| 2016-06-25T08:00:15Z | 8 | 8 |
| 2016-06-25T08:00:30Z | 3 | 13 |
| 2016-06-25T08:00:45Z | 5 | 15 |
| 2016-06-25T08:00:59Z | - | 19 |
No Aggregation
When aggregation is disabled, the `group` function is applied to values for all unique timestamps in the merged series.

In the example below, the `SUM` function returns `12 = (1+11)` at `2016-06-25T08:00:00Z` as the total of the `e-1` and `e-2` series values, both of which have samples at this timestamp.

On the other hand, `SUM` returns `3 = (3 + null->0)` at `2016-06-25T08:00:05Z` because only the `e-1` series has a value at that timestamp.
[
{
"startDate": "2016-06-25T08:00:00Z",
"endDate": "2016-06-25T08:01:00Z",
"entities": ["e-1", "e-2"],
"metric": "m-1",
"group": {
"type": "SUM"
}
}
]
[{"entity":"*","metric":"m-1","tags":{},"entities":["e-1","e-2"],"type":"HISTORY",
"aggregate":{"type":"DETAIL"},
"group":{"type":"SUM"},
"data":[
{"d":"2016-06-25T08:00:00Z","v":12.0},
{"d":"2016-06-25T08:00:05Z","v":3.0},
{"d":"2016-06-25T08:00:10Z","v":5.0},
{"d":"2016-06-25T08:00:15Z","v":16.0},
{"d":"2016-06-25T08:00:30Z","v":16.0},
{"d":"2016-06-25T08:00:45Z","v":20.0},
{"d":"2016-06-25T08:00:59Z","v":19.0}
]}]
| datetime | e1.value | e2.value | SUM |
|----------------------|----------|----------|-----|
| 2016-06-25T08:00:00Z | 1 | 11 | 12 |
| 2016-06-25T08:00:05Z | 3 | - | 3 |
| 2016-06-25T08:00:10Z | 5 | - | 5 |
| 2016-06-25T08:00:15Z | 8 | 8 | 16 |
| 2016-06-25T08:00:30Z | 3 | 13 | 16 |
| 2016-06-25T08:00:45Z | 5 | 15 | 20 |
| 2016-06-25T08:00:59Z | - | 19 | 19 |
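The table above can be reproduced by merging the two series and summing at each unique timestamp (a minimal Python sketch, assuming series are represented as timestamp-to-value maps; a missing sample contributes nothing, matching the `null->0` behavior):

```python
def group_sum(series_list):
    """Apply SUM at every unique timestamp of the merged series.
    A series without a sample at a timestamp is simply skipped."""
    timestamps = {t for series in series_list for t in series}
    return {t: sum(s[t] for s in series_list if t in s)
            for t in sorted(timestamps)}

e1 = {"08:00:00": 1, "08:00:05": 3, "08:00:10": 5,
      "08:00:15": 8, "08:00:30": 3, "08:00:45": 5}
e2 = {"08:00:00": 11, "08:00:15": 8, "08:00:30": 13,
      "08:00:45": 15, "08:00:59": 19}
result = group_sum([e1, e2])
assert result["08:00:00"] == 12   # 1 + 11, both series have samples
assert result["08:00:05"] == 3    # only e-1 has a sample
```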
Truncation
Truncation discards timestamps at the beginning and end of the interval where not all of the merged series have values.

The example below uses a `startDate` of `2016-06-25T08:00:01Z`.

The first retained time is `MAX(MIN(series_sample_time))`, the last retained time is `MIN(MAX(series_sample_time))`:

- `MAX(MIN(series_sample_time))` = `2016-06-25T08:00:15Z`
- `MIN(MAX(series_sample_time))` = `2016-06-25T08:00:45Z`
| datetime | e1.value | e2.value | SUM |
|----------------------|----------|----------|-----|
| 2016-06-25T08:00:05Z | 3 | - | 3 | discarded because time < MAX(MIN(series_sample_time))
| 2016-06-25T08:00:10Z | 5 | - | 5 | discarded because time < MAX(MIN(series_sample_time))
| 2016-06-25T08:00:15Z | 8 | 8 | 16 |
| 2016-06-25T08:00:30Z | 3 | 13 | 16 |
| 2016-06-25T08:00:45Z | 5 | 15 | 20 |
| 2016-06-25T08:00:59Z | - | 19 | 19 | discarded because time > MIN(MAX(series_sample_time))
Samples for series `e-1` at `2016-06-25T08:00:05Z` and `2016-06-25T08:00:10Z` are discarded because there is no value for series `e-2` until `2016-06-25T08:00:15Z`.

The sample for series `e-2` at `2016-06-25T08:00:59Z` is discarded because there is no value for series `e-1` after `2016-06-25T08:00:45Z`.
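The truncation rule amounts to clipping the merged series to the range `[MAX(MIN(series_sample_time)), MIN(MAX(series_sample_time))]` (a hypothetical Python sketch over timestamp-to-value maps, not the server implementation):

```python
def truncate(series_list):
    """Keep only timestamps where every series has already started
    and no series has yet ended."""
    start = max(min(s) for s in series_list)  # MAX(MIN(series_sample_time))
    end = min(max(s) for s in series_list)    # MIN(MAX(series_sample_time))
    return [{t: v for t, v in s.items() if start <= t <= end}
            for s in series_list]

# samples within the 08:00:01 - 08:01:00 interval
e1 = {"08:00:05": 3, "08:00:10": 5, "08:00:15": 8,
      "08:00:30": 3, "08:00:45": 5}
e2 = {"08:00:15": 8, "08:00:30": 13, "08:00:45": 15, "08:00:59": 19}
t1, t2 = truncate([e1, e2])
assert sorted(t1) == ["08:00:15", "08:00:30", "08:00:45"]
assert sorted(t2) == ["08:00:15", "08:00:30", "08:00:45"]
```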
[
{
"startDate": "2016-06-25T08:00:01Z",
"endDate": "2016-06-25T08:01:00Z",
"entities": ["e-1", "e-2"],
"metric": "m-1",
"group": {
"type": "SUM",
"truncate": true
}
}
]
[{"entity":"*","metric":"m-1","tags":{},"entities":["e-1","e-2"],"type":"HISTORY",
"aggregate":{"type":"DETAIL"},
"group":{"type":"SUM","truncate":true},
"data":[
{"d":"2016-06-25T08:00:15Z","v":16.0},
{"d":"2016-06-25T08:00:30Z","v":16.0},
{"d":"2016-06-25T08:00:45Z","v":20.0}
]}]
| datetime | e1.value | e2.value | SUM |
|----------------------|----------|----------|-----|
| 2016-06-25T08:00:15Z | 8 | 8 | 16 |
| 2016-06-25T08:00:30Z | 3 | 13 | 16 |
| 2016-06-25T08:00:45Z | 5 | 15 | 20 |
Extend
An opposite operation to truncation, extend adds missing values at the beginning and end of the interval to ensure that all merged series have values when the group
function is applied.
| datetime | e1.value | e2.value | SUM |
|----------------------|----------|----------|-----|
| 2016-06-25T08:00:05Z | 3 | 8 + | 11 | e2.value extended to start at the beginning of the interval
| 2016-06-25T08:00:10Z | 5 | 8 + | 13 | e2.value extended to start at the beginning of the interval
| 2016-06-25T08:00:15Z | 8 | 8 | 16 |
| 2016-06-25T08:00:30Z | 3 | 13 | 16 |
| 2016-06-25T08:00:45Z | 5 | 15 | 20 |
| 2016-06-25T08:00:59Z | 5 + | 19 | 24 | e1.value extended until the end of the interval
[
{
"startDate": "2016-06-25T08:00:01Z",
"endDate": "2016-06-25T08:01:00Z",
"entities": ["e-1", "e-2"],
"metric": "m-1",
"group": {
"type": "SUM",
"interpolate": {
"type": "NONE",
"extend": true
}
}
}
]
[{"entity":"*","metric":"m-1","tags":{},"entities":["e-1","e-2"],"type":"HISTORY",
"aggregate":{"type":"DETAIL"},
"group":{"type":"SUM","interpolate":{"type":"NONE","value":0.0,"extend":true}},
"data":[
{"d":"2016-06-25T08:00:05Z","v":11.0},
{"d":"2016-06-25T08:00:10Z","v":13.0},
{"d":"2016-06-25T08:00:15Z","v":16.0},
{"d":"2016-06-25T08:00:30Z","v":16.0},
{"d":"2016-06-25T08:00:45Z","v":20.0},
{"d":"2016-06-25T08:00:59Z","v":24.0}
]}]
Extend is similar to interpolation where missing values at the beginning of the interval are interpolated with the `NEXT` type, and missing values at the end of the interval are interpolated with the `PREVIOUS` type.
| datetime | e1.value | e2.value | SUM |
|----------------------|----------|----------|-----|
| 2016-06-25T08:00:05Z | 3 | 8 +(NEXT)| 11 |
| 2016-06-25T08:00:10Z | 5 | 8 +(NEXT)| 13 |
| 2016-06-25T08:00:15Z | 8 | 8 | 16 |
| 2016-06-25T08:00:30Z | 3 | 13 | 16 |
| 2016-06-25T08:00:45Z | 5 | 15 | 20 |
| 2016-06-25T08:00:59Z | 5 +(PREV)| 19 | 24 |
Since `extend` is performed prior to truncation, the `truncate` setting has no effect on extended results.
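The extend behavior can be sketched as padding each series across the union of timestamps, filling leading gaps with the first value (`NEXT`-style) and trailing gaps with the last value (`PREVIOUS`-style); an illustrative Python sketch, not the server implementation:

```python
def extend(series_list):
    """Pad each series across the union of all timestamps:
    before its first sample use the first value (NEXT-style),
    after its last sample use the last value (PREVIOUS-style).
    Interior gaps are left for the interpolate function."""
    all_ts = sorted({t for s in series_list for t in s})
    extended = []
    for s in series_list:
        first, last = min(s), max(s)
        padded = dict(s)
        for t in all_ts:
            if t < first:
                padded[t] = s[first]
            elif t > last:
                padded[t] = s[last]
        extended.append(padded)
    return extended

e1 = {"08:00:05": 3, "08:00:10": 5, "08:00:15": 8,
      "08:00:30": 3, "08:00:45": 5}
e2 = {"08:00:15": 8, "08:00:30": 13, "08:00:45": 15, "08:00:59": 19}
x1, x2 = extend([e1, e2])
assert x2["08:00:05"] == 8   # e2 extended back with its first value
assert x1["08:00:59"] == 5   # e1 extended forward with its last value
```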
Interpolation
Interpolation fills the gaps in the detailed series. Its behavior depends on the period
parameter specified in the group processor.
`period` parameter is not specified
The interpolate
function is applied to two consecutive samples of the same series to calculate an interim value for a known timestamp.
Query:
[{
"startDate": "2016-06-25T08:00:00Z",
"endDate": "2016-06-25T08:01:00Z",
"entities": ["e-1", "e-2"],
"metric": "m-1",
"group": {
"type": "SUM",
"interpolate": { "type": "PREVIOUS" }
}
}]
Response:
[{"entity":"*","metric":"m-1","tags":{},"entities":["e-1","e-2"],"type":"HISTORY",
"aggregate":{"type":"DETAIL"},
"group":{"type":"SUM","interpolate":{"type":"PREVIOUS","value":0.0,"extend":false}},
"data":[
{"d":"2016-06-25T08:00:00Z","v":12.0},
{"d":"2016-06-25T08:00:05Z","v":14.0},
{"d":"2016-06-25T08:00:10Z","v":16.0},
{"d":"2016-06-25T08:00:15Z","v":16.0},
{"d":"2016-06-25T08:00:30Z","v":16.0},
{"d":"2016-06-25T08:00:45Z","v":20.0},
{"d":"2016-06-25T08:00:59Z","v":19.0}
]}]
Two interpolated values are added to the second series:
| datetime | e1.value | e2.value | SUM |
|----------------------|----------|----------|-----|
| 2016-06-25T08:00:00Z | 1 | 11 | 12 |
| 2016-06-25T08:00:05Z | 3 | 11 (PREV)| 14 |
| 2016-06-25T08:00:10Z | 5 | 11 (PREV)| 16 |
| 2016-06-25T08:00:15Z | 8 | 8 | 16 |
| 2016-06-25T08:00:30Z | 3 | 13 | 16 |
| 2016-06-25T08:00:45Z | 5 | 15 | 20 |
| 2016-06-25T08:00:59Z | - | 19 | 19 |
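The `PREVIOUS` interpolation applied here can be sketched as carrying the last known value forward, but only within the span of a series' own samples (a minimal Python sketch under the same timestamp-to-value representation):

```python
def interpolate_previous(series, timestamps):
    """Fill gaps at known timestamps with the last observed value.
    Timestamps before the first or after the last sample stay empty,
    so no values are invented outside the series' own span."""
    filled, last_value, last_ts = {}, None, max(series)
    for t in sorted(timestamps):
        if t in series:
            last_value = series[t]
            filled[t] = last_value
        elif last_value is not None and t <= last_ts:
            filled[t] = last_value  # PREVIOUS
    return filled

e2 = {"08:00:00": 11, "08:00:15": 8, "08:00:30": 13,
      "08:00:45": 15, "08:00:59": 19}
all_ts = ["08:00:00", "08:00:05", "08:00:10", "08:00:15",
          "08:00:30", "08:00:45", "08:00:59"]
filled = interpolate_previous(e2, all_ts)
assert filled["08:00:05"] == 11 and filled["08:00:10"] == 11
```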
`period` parameter is specified
Assume that `t1`, `t2`, `t3` are timestamps of consecutive periods, and the series has no samples in the `t2` period. Then the interpolated value of the `t2` period is calculated based on two samples: `(t1, v1)` and `(t3, v3)`, where `v1` is the last series value within the `t1` period and `v3` is the first series value within the `t3` period.
Query:
[ {
"startDate": "2016-06-25T08:00:00Z",
"endDate": "2016-06-25T08:01:00Z",
"entities": ["e-1", "e-2"],
"metric": "m-1",
"group": {
"type": "SUM",
"period": {"count": 10, "unit": "SECOND"},
"interpolate": {"type": "PREVIOUS"}
}
}]
Response
[{
"entity": "*", ...,
"data": [
{"d": "2016-06-25T08:00:00Z", "v": 15},
{"d": "2016-06-25T08:00:10Z", "v": 21},
{"d": "2016-06-25T08:00:20Z", "v": 16},
{"d": "2016-06-25T08:00:30Z", "v": 16},
{"d": "2016-06-25T08:00:40Z", "v": 20},
{"d": "2016-06-25T08:00:50Z", "v": 19}
]
}]
Interpolated values added to each of the grouped series:
| datetime | e1.value | e2.value | group timestamp | e1 grouped / interpolated | e2 grouped / interpolated | SUM |
|----------------------|----------|----------|----------------------|--------------|--------------|-----|
| 2016-06-25T08:00:00Z | 1 | 11 | 2016-06-25T08:00:00Z | 1, 3 | 11 | 15 |
| 2016-06-25T08:00:05Z | 3 | - | | | | |
| 2016-06-25T08:00:10Z | 5 | - | 2016-06-25T08:00:10Z | 5, 8 | 8 | 21 |
| 2016-06-25T08:00:15Z | 8 | 8 | | | | |
| 2016-06-25T08:00:20Z | - | - | 2016-06-25T08:00:20Z | 8 (PREV) | 8 (PREV) | 16 |
| 2016-06-25T08:00:30Z | 3 | 13 | 2016-06-25T08:00:30Z | 3 | 13 | 16 |
| 2016-06-25T08:00:40Z | - | - | 2016-06-25T08:00:40Z | 5 | 15 | 20 |
| 2016-06-25T08:00:45Z | 5 | 15 | | | | |
| 2016-06-25T08:00:50Z | - | - | 2016-06-25T08:00:50Z | - | 19 | 19 |
| 2016-06-25T08:00:59Z | - | 19 | | | | |
Group Aggregation
By default, the group
function is applied to all unique sample times from the merged series.
To split values into periods, specify a period.
[
{
"startDate": "2016-06-25T08:00:00Z",
"endDate": "2016-06-25T08:01:00Z",
"entities": ["e-1", "e-2"],
"metric": "m-1",
"group": {
"type": "SUM",
"period": {"count": 10, "unit": "SECOND"}
}
}
]
[{"entity":"*","metric":"m-1","tags":{},"entities":["e-1","e-2"],"type":"HISTORY",
"aggregate":{"type":"DETAIL"},
"group":{"type":"SUM","period":{"count":10,"unit":"SECOND","align":"CALENDAR"}},
"data":[
{"d":"2016-06-25T08:00:00Z","v":15.0},
{"d":"2016-06-25T08:00:10Z","v":21.0},
{"d":"2016-06-25T08:00:30Z","v":16.0},
{"d":"2016-06-25T08:00:40Z","v":20.0},
{"d":"2016-06-25T08:00:50Z","v":19.0}
]}]
This is equivalent to `Group <-> Aggregation` processing (in either order) in the case of `SUM` + `SUM` functions.
[
{
"startDate": "2016-06-25T08:00:00Z",
"endDate": "2016-06-25T08:01:00Z",
"entities": ["e-1", "e-2"],
"metric": "m-1",
"aggregate": {
"type": "SUM",
"period": {"count": 10, "unit": "SECOND"}
},
"group": {
"type": "SUM",
"period": {"count": 10, "unit": "SECOND"}
}
}
]
Aggregation -> Group
The Aggregation -> Group
order creates aggregate series for each of the merged series and then performs grouping of the aggregated series.
The timestamps used for grouping combine period start times from the underlying aggregated series.
| 10-sec period start | e1.COUNT | e2.COUNT | SUM |
|----------------------|----------|----------|-----|
| 2016-06-25T08:00:00Z | 2 | 1 | 3 |
| 2016-06-25T08:00:10Z | 2 | 1 | 3 |
| 2016-06-25T08:00:20Z | - | - | - | Period not created because there are no detailed values in the [00:20-00:30) period for any series.
| 2016-06-25T08:00:30Z | 1 | 1 | 2 |
| 2016-06-25T08:00:40Z | 1 | 1 | 2 |
| 2016-06-25T08:00:50Z | 0 | 1 | 1 |
[
{
"startDate": "2016-06-25T08:00:00Z",
"endDate": "2016-06-25T08:01:00Z",
"entities": ["e-1", "e-2"],
"metric": "m-1",
"aggregate": {
"type": "COUNT",
"period": {"count": 10, "unit": "SECOND"}
},
"group": {
"type": "SUM"
}
}
]
[{"entity":"*","metric":"m-1","tags":{},"entities":["e-1","e-2"],"type":"HISTORY",
"aggregate":{"type":"COUNT","period":{"count":10,"unit":"SECOND","align":"CALENDAR"}},
"group":{"type":"SUM"},
"data":[
{"d":"2016-06-25T08:00:00Z","v":3.0},
{"d":"2016-06-25T08:00:10Z","v":3.0},
{"d":"2016-06-25T08:00:30Z","v":2.0},
{"d":"2016-06-25T08:00:40Z","v":2.0},
{"d":"2016-06-25T08:00:50Z","v":1.0}
]}]
Group -> Aggregation
The `Group -> Aggregation` order merges the series first, and then splits the merged series into periods.
At the first stage, grouping produces the following SUM
series:
[
{
"startDate": "2016-06-25T08:00:00Z",
"endDate": "2016-06-25T08:01:00Z",
"entities": ["e-1", "e-2"],
"metric": "m-1",
"group": {
"type": "SUM"
}
}
]
| datetime | e1.value | e2.value | SUM |
|----------------------|----------|----------|-----|
| 2016-06-25T08:00:00Z | 1 | 11 | 12 |
| 2016-06-25T08:00:05Z | 3 | - | 3 |
| 2016-06-25T08:00:10Z | 5 | - | 5 |
| 2016-06-25T08:00:15Z | 8 | 8 | 16 |
| 2016-06-25T08:00:30Z | 3 | 13 | 16 |
| 2016-06-25T08:00:45Z | 5 | 15 | 20 |
| 2016-06-25T08:00:59Z | - | 19 | 19 |
The grouped SUM
series is then aggregated into periods.
Note that if a period is not specified in `group`, the `group` function automatically applies aggregation for the same period as the `aggregate` function. To avoid this, specify `"period": {"count": 1, "unit": "MILLISECOND"}` in `group`.
[
{
"startDate": "2016-06-25T08:00:00Z",
"endDate": "2016-06-25T08:01:00Z",
"entities": ["e-1", "e-2"],
"metric": "m-1",
"aggregate": {
"type": "COUNT",
"period": {"count": 10, "unit": "SECOND"}
},
"group": {
"type": "SUM",
"period": {"count": 1, "unit": "MILLISECOND"}
}
}
]
| datetime | COUNT(SUM(value)) |
|----------------------|-------------------|
| 2016-06-25T08:00:00Z | 2 |
| 2016-06-25T08:00:10Z | 2 |
| 2016-06-25T08:00:30Z | 1 |
| 2016-06-25T08:00:40Z | 1 |
| 2016-06-25T08:00:50Z | 1 |
[{"entity":"*","metric":"m-1","tags":{},"entities":["e-1","e-2"],"type":"HISTORY",
"aggregate":{"type":"COUNT","period":{"count":10,"unit":"SECOND","align":"CALENDAR"}},
"group":{"type":"SUM","period":{"count":1,"unit":"MILLISECOND","align":"CALENDAR"}},
"data":[
{"d":"2016-06-25T08:00:00Z","v":2.0},
{"d":"2016-06-25T08:00:10Z","v":2.0},
{"d":"2016-06-25T08:00:30Z","v":1.0},
{"d":"2016-06-25T08:00:40Z","v":1.0},
{"d":"2016-06-25T08:00:50Z","v":1.0}
]}]
Place
The place
setting specifies how to place grouped series into subgroups based on constraints and the objective function.
The operation supports two placement types: partitioning and packing.
Place Fields
Name | Type | Description |
---|---|---|
count | integer | Maximum number of subgroups. The response can contain fewer subgroups than the maximum. |
constraint | string | Boolean expression that series in each subgroup must satisfy, for example max() < 10 . |
minimize | string | Objective function calculated for each subgroup. Partitioning into subgroups is performed to minimize the sum of function values. |
method | string | The algorithm used to solve the packing problem. Allowed values: greedy_simple (default), or greedy_correlations . |
entities | array | Entities whose tags store the numeric characteristics (parameters) of each subgroup. |
entityGroup | string | Alternative to entities setting using an entity group name. |
entityExpression | string | Alternative to entities setting using an expression. |
Functions Available in the Place Context
stdev()
median_abs_dev()
max()
min()
sum()
count()
median()
avg()
percentile(p)
Partitioning
The algorithm finds the best partitioning of a given group of time series into no more than the specified number of subgroups.
The best combination is one in which each subgroup satisfies the `constraint` condition and the value of the objective function is minimal. The objective function is the sum of the values of the `minimize` formula over all subgroups.

The maximum number of subgroups is limited by the `count` setting.
To find the exact solution, the algorithm iterates over all possible combinations and calculates the objective function for each combination as follows:

1. Create an aggregated series for each subgroup, using the specified `type`, `period`, `interpolate`, and `truncate` settings.
2. Check the boolean `constraint` expression for each aggregated series. Discard partitions for which the `constraint` is `false` for any of the subgroups. The `constraint` expression can include statistical functions applied to the aggregated series.
3. For each subgroup, calculate the value of the `minimize` expression, which can reference the same statistical functions.
4. Calculate the value of the objective function as the sum of the values calculated at the previous step.
The combination with the lowest objective function value and the smallest number of subgroups is selected as the best partitioning.
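The exhaustive search above can be sketched in a few lines (a simplified Python illustration: series are equal-length value lists, the subgroup aggregation is `SUM`, and the `constraint` and `minimize` expressions are plain functions; names are illustrative):

```python
from statistics import pstdev

def set_partitions(items, max_groups):
    """Yield every split of items into at most max_groups
    non-empty subgroups."""
    if not items:
        yield []
        return
    head, rest = items[0], items[1:]
    for part in set_partitions(rest, max_groups):
        for i in range(len(part)):            # join an existing subgroup
            yield part[:i] + [part[i] + [head]] + part[i + 1:]
        if len(part) < max_groups:            # or open a new subgroup
            yield part + [[head]]

def best_partition(series, max_groups, constraint, minimize):
    """Return the partition with the lowest sum of the minimize
    function over subgroups, discarding partitions where any
    subgroup violates the constraint."""
    best, best_score = None, float("inf")
    for part in set_partitions(series, max_groups):
        # aggregated (SUM) series per subgroup
        aggregated = [[sum(col) for col in zip(*group)] for group in part]
        if not all(constraint(agg) for agg in aggregated):
            continue
        score = sum(minimize(agg) for agg in aggregated)
        if score < best_score:
            best, best_score = part, score
    return best, best_score

series = [[1, 9], [9, 1], [5, 5]]
_, score = best_partition(series, 3,
                          constraint=lambda agg: max(agg) < 20,
                          minimize=pstdev)
assert score == 0  # e.g. {[1,9],[9,1]} sums to [10,10] with zero stdev
```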
Partitioning Algorithm Complexity
The algorithm checks all combinations of the `N` input series subject to the `count` limit. The total number of combinations equals `S(N, 1) + S(N, 2) + ... + S(N, count)`, where `S` is the Stirling number of the second kind.

This sum grows extremely fast: for example, there are over `16` trillion combinations to check if the number of input series is `20` and they are allocated to no more than `7` subgroups.

The implementation expects `N` not to exceed `64`, and throws an error if the number of partitions exceeds `1,000,000`.
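The combination count quoted above can be verified with Stirling numbers of the second kind (a short Python check):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind S(n, k): the number of
    ways to partition n items into exactly k non-empty subgroups."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # item n either starts its own subgroup or joins one of the k
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

total = sum(stirling2(20, k) for k in range(1, 8))
assert stirling2(4, 2) == 7
assert total > 16_000_000_000_000  # over 16 trillion for N=20, count=7
```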
Query Example
[{
"startDate": "2019-02-01T00:00:00Z",
"endDate": "2019-08-08T00:00:00Z",
"entity": "*",
"metric": "cpu_usage",
"group": {
"type": "SUM",
"period": {"count": 1, "unit": "HOUR"},
"interpolate": {"extend": false, "type": "NONE"},
"place": {
"count": 3,
"constraint": "max() < 10",
"minimize": "stdev()"
}
},
"limit": 2
}]
Response
The response contains information about each subgroup: the grouped series keys, the `minimize` expression value for the subgroup as `groupScore`, the objective function value as `totalScore`, and the aggregated series for the subgroup as `data`.
[
{
"entity": "*",
"metric": "lp_usage",
"exactMatch": false,
"type": "HISTORY",
"meta": {
"groupScore": 0.0021842975376114887,
"series": [
{"entity": "lp_1", "tags": {}, "label": "lp_1"},
{"entity": "lp_2", "tags": {}, "label": "lp_2"},
{"entity": "lp_3", "tags": {}, "label": "lp_3"}
],
"name": "lp_1 lp_2 lp_3",
"totalScore": 0.0021842975376114887
},
"transformationOrder": ["GROUP"],
"group": {
"series": [
{"entity": "lp_1", "tags": {}, "label": "lp_1"},
{"entity": "lp_2", "tags": {}, "label": "lp_2"},
{"entity": "lp_3", "tags": {}, "label": "lp_3"}
],
"groupScore": 0.0021842975376114887,
"totalScore": 0.0021842975376114887,
"type": "SUM",
"period": {"count": 1, "unit": "HOUR", "align": "CALENDAR"},
"interpolate": {"type": "NONE", "extend": false},
"truncate": false,
"place": {"count": 3, "constraint": "max() < 10", "minimize": "stdev()"}
},
"data": [
{"d": "2019-02-07T22:00:00.000Z", "v": 4.998999189242195},
{"d": "2019-02-07T23:00:00.000Z", "v": 5.000519124198814}
]
},
{
"entity": "*",
"metric": "lp_usage",
"exactMatch": false,
"type": "HISTORY",
"meta": {
"groupScore": 0,
"series": [
{"entity": "lp_6", "tags": {}, "label": "lp_6"},
{"entity": "lp_7", "tags": {}, "label": "lp_7"}
],
"name": "lp_6 lp_7",
"totalScore": 0.0021842975376114887
},
"transformationOrder": ["GROUP"],
"group": {
"series": [
{"entity": "lp_6", "tags": {}, "label": "lp_6"},
{"entity": "lp_7", "tags": {}, "label": "lp_7"}
],
"groupScore": 0,
"totalScore": 0.0021842975376114887,
"type": "SUM",
"period": {"count": 1, "unit": "HOUR", "align": "CALENDAR"},
"interpolate": {"type": "NONE", "extend": false},
"truncate": false,
"place": {"count": 3, "constraint": "max() < 10", "minimize": "stdev()"}
},
"data": [
{"d": "2019-02-07T22:00:00.000Z", "v": 5},
{"d": "2019-02-07T23:00:00.000Z", "v": 5}
]
},
{
"entity": "*",
"metric": "lp_usage",
"exactMatch": false,
"type": "HISTORY",
"meta": {
"groupScore": 0,
"series": [
{"entity": "lp_4", "tags": {},"label": "lp_4"},
{"entity": "lp_5", "tags": {}, "label": "lp_5"}
],
"name": "lp_4 lp_5",
"totalScore": 0.0021842975376114887
},
"transformationOrder": ["GROUP"],
"group": {
"series": [
{"entity": "lp_4", "tags": {},"label": "lp_4"},
{"entity": "lp_5", "tags": {}, "label": "lp_5"}
],
"groupScore": 0,
"totalScore": 0.0021842975376114887,
"type": "SUM",
"period": {"count": 1, "unit": "HOUR", "align": "CALENDAR"},
"interpolate": {"type": "NONE", "extend": false},
"truncate": false,
"place": {"count": 3, "constraint": "max() < 10", "minimize": "stdev()"}
},
"data": [
{"d": "2019-02-07T22:00:00.000Z", "v": 5},
{"d": "2019-02-07T23:00:00.000Z", "v": 5}
]
}
]
Packing
The packing algorithm divides series into the smallest number of subgroups, where each subgroup satisfies the `constraint`.

The objective function is ignored by the packing algorithm, but the `constraint` expression can use numeric parameters specific to each subgroup as well as numeric parameters specific to each series.

Each entity specified in the `entities`, `entityGroup`, or `entityExpression` setting holds the numeric parameters of a single subgroup as values of its tags. These values are available in the `constraint` expression by tag name.
Subgroup Parameters Example
Assume the following entities with tags cpu_limit
and memory_limit
.
| entity name/tags | cpu_limit | memory_limit |
|------------------|-----------|--------------|
| lp_group_1 | 10 | 20 |
| lp_group_2 | 8 | 40 |
| lp_group_3 | 12 | 10 |
| lp_group_4 | 14 | 20 |
| lp_group_5 | 8 | 10 |
The constraint
expression can reference the cpu_limit
variable which resolves to the cpu_limit
entity tag for the given entity.
"constraint": "max() < cpu_limit"
When the above expression is evaluated for the subgroup corresponding to the entity lp_group_3
, the value of cpu_limit
is 12
.
Each series can also store numeric parameters in series entity tags. To reference these parameters in the `constraint` expression, prefix the tag name with `series.`.
Series Parameters Example
The following series entities have the tag `memory_allocated`:
| entity name/tags | memory_allocated |
|------------------|------------------|
| lp_1 | 4 |
| lp_2 | 4 |
| lp_3 | 4 |
| lp_4 | 8 |
| lp_5 | 4 |
| lp_6 | 2 |
| lp_7 | 4 |
The following `constraint` checks that the sum of the `memory_allocated` parameters of all series in a subgroup is less than 14:
"constraint": "sum(series.memory_allocated) < 14"
Subgroup and Series Parameters Example
Series with entities `lp_2, lp_3, lp_6` are placed in a group associated with the entity `lp_group_1`. Consider the constraint:

"constraint": "max() < cpu_limit && sum(series.memory_allocated) < memory_limit"

It checks that the maximum value of the aggregated series computed for `lp_2, lp_3, lp_6` is less than `cpu_limit`, which equals `10`, and that the sum of memory allocated to the series, `sum(series.memory_allocated) = 4 + 4 + 2`, is less than `memory_limit`, which equals `20`.
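Evaluating the combined constraint for one subgroup can be sketched as a plain function (an illustrative Python re-implementation of the example above, not the server's expression engine; the aggregated sample values are assumed):

```python
def check_constraint(aggregated_values, member_tags, subgroup_tags):
    """Evaluate
        max() < cpu_limit && sum(series.memory_allocated) < memory_limit
    for a single subgroup."""
    allocated = sum(tags["memory_allocated"] for tags in member_tags)
    return (max(aggregated_values) < subgroup_tags["cpu_limit"]
            and allocated < subgroup_tags["memory_limit"])

# subgroup {lp_2, lp_3, lp_6} associated with entity lp_group_1
ok = check_constraint(
    aggregated_values=[9.0, 8.5],             # aggregated SUM samples (assumed)
    member_tags=[{"memory_allocated": 4},     # lp_2
                 {"memory_allocated": 4},     # lp_3
                 {"memory_allocated": 2}],    # lp_6
    subgroup_tags={"cpu_limit": 10, "memory_limit": 20},
)
assert ok  # 9.0 < 10 and 4 + 4 + 2 = 10 < 20
```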
Packing Algorithm
The exact solution of the packing problem requires checking a number of combinations that grows combinatorially with `N`, the number of series, and `K`, the maximum number of subgroups. This is computationally infeasible if `N` and `K` are sufficiently large.

As a workaround, the database applies the following heuristics:

- Sort entities by name.
- Check each entity, and build a subgroup for the given entity.
- If all entities are checked and there are still unassigned series, place them into a special subgroup.

How each subgroup is formed depends on the `method` setting.
greedy_simple Method

The method sorts series lexicographically by key, which consists of the entity and tags. Unassigned series are iterated and inserted into the current subgroup if the `constraint` is not violated.
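The greedy_simple heuristic is essentially a next-fit pass over the sorted series (a hypothetical Python sketch with `SUM` as the subgroup aggregation; names and data are illustrative):

```python
def greedy_simple(series, constraint):
    """Walk series in sorted key order; add each one to the current
    subgroup unless the constraint would be violated, in which case
    start a new subgroup."""
    groups = []  # list of (keys, member value-lists)
    for key, values in sorted(series.items()):
        if groups and constraint(groups[-1][1] + [values]):
            keys, members = groups[-1]
            groups[-1] = (keys + [key], members + [values])
        else:
            groups.append(([key], [values]))
    return groups

# constraint on the subgroup's aggregated SUM series: max() < 10
constraint = lambda members: max(sum(col) for col in zip(*members)) < 10
series = {"lp_1": [4, 4], "lp_2": [4, 4], "lp_3": [4, 4]}
groups = greedy_simple(series, constraint)
# lp_3 would push the SUM series to [12, 12], so it starts a new subgroup
assert [g[0] for g in groups] == [["lp_1", "lp_2"], ["lp_3"]]
```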
greedy_correlations Method

1. Select the first series in the subgroup: iterate over unassigned series sorted by series name and select the first series that satisfies the `constraint`.
2. For series already assigned to subgroups, calculate the aggregation function specified in the `type` parameter.
3. Calculate correlations between the assigned series and the unassigned series (both aggregated).
4. Sort unassigned series by correlation in ascending order.
5. Iterate over the unassigned series, starting with the series with the lowest correlation, and add each series to the subgroup. After a series is added, repeat the calculations: recalculate the aggregated series and the correlation coefficients, then add the next series. The step is complete when no more series can be added to the subgroup.
Approximate Query Example
[{
"startDate": "2019-02-01T00:00:00Z",
"endDate": "2019-02-08T00:00:00Z",
"entity": "*",
"metric": "lp_usage",
"group": {
"type": "SUM",
"period": {"count": 1, "unit": "HOUR"},
"interpolate": {"extend": true, "type": "LINEAR"},
"place": {
"entities": ["lp_group_1", "lp_group_2", "lp_group_3", "lp_group_4", "lp_group_5"],
"constraint": "max() < cpu_limit && sum(series.memory_allocated) < memory_limit",
"method": "greedy_correlations"
}
}
}]
Approximate Query Response
[
{
"metric": "lp_usage",
"entity": "*",
"exactMatch": false,
"type": "HISTORY",
"meta": {
"groupScore": 0,
"series": [
{"entity": "lp_1", "tags": {}, "label": "lp_1"},
{"entity": "lp_2", "tags": {}, "label": "lp_2"},
{"entity": "lp_3", "tags": {}, "label": "lp_3"},
{"entity": "lp_5", "tags": {}, "label": "lp_5"}
],
"name": "lp_1 lp_2 lp_3 lp_5",
"group-entity": "lp_group_1",
"grouping-tags": {
"cpu_limit": "10",
"memory_limit": "20"
},
"totalScore": 0
},
"transformationOrder": ["GROUP"],
"group": {
"series": [
{"entity": "lp_1", "tags": {}, "label": "lp_1"},
{"entity": "lp_2", "tags": {}, "label": "lp_2"},
{"entity": "lp_3", "tags": {}, "label": "lp_3"},
{"entity": "lp_5", "tags": {}, "label": "lp_5"}
],
"entity": "lp_group_1",
"tags": {"cpu_limit": "10", "memory_limit": "20"},
"type": "SUM",
"period": {"count": 1, "unit": "HOUR", "align": "CALENDAR"},
"interpolate": {"type": "LINEAR", "extend": true},
"truncate": false,
"place": {
"count": 0,
"constraint": "max() < cpu_limit && sum(series.memory_allocated) < memory_limit"
}
},
"data": [
{
"d": "2019-02-07T23:00:00.000Z",
"v": 9.000519124198814
},
{
"d": "2019-02-08T00:00:00.000Z",
"v": 9.000519124198814
}
]
},
...
]