Weekly Change Log: December 19-25, 2016
|3737||sql||Bug||Fixed issue with long scan, followed by a timeout for an entity that does not collect the specified metric.|
|3735||sql||Bug||Math functions do not accept arithmetic expressions in the |
|3731||api-rest||Bug||Fixed issue with property queries (|
|3729||api-rest||Bug||Updated error URL and message text for requests to non-existent URLs.|
|3727||api-network||Feature||Optimized TCP handler for faster processing of |
|3725||sql||Bug||Optimized queries with |
|3719||sql||Feature||Optimized windowing queries by narrowing the requested timespan based on minimum last insert date.|
|3718||UI||Bug||Changed metric form to prevent users from saving metric names without metric name validation.|
|3715||UI||Feature||Updated styles on the account create page displayed post-installation.|
|3714||UI||Bug||Fixed errors with the Decimal Precision input in the SQL Console.|
|3713||sql||Bug||Fixed number format error raised in queries with the |
|3703||sql||Feature||Added capability to display first/last sample time in windowing queries.|
|3697||sql||Feature||Modified processing sequence so that the |
|3696||sql||Bug||Fixed issue with |
|3694||sql||Bug||Optimized execution of |
|3693||UI||Bug||Fixed navigation issue between Query console and Query Plan.|
|3689||sql||Feature||Added support for the |
|3687||UI||Bug||Corrected User Group link on the Admin menu.|
|3672||sql||Feature||Added new details to query plan: start and end dates for each HBase scan and scans to atsd_li table.|
|3421||sql||Feature||Implemented the |
|3732||core||Feature||Modified Collector start-up routine to wait until ATSD is ready when running jobs from the |
|3724||core||Feature||Created a |
|3723||data-source||Bug||Added missing Avatica package dependencies to the ATSD JDBC driver.|
|3686||core||Support||Added a list of pre-configured jobs and their xml files here.|
|3571||admin||Bug||Modified Dockerfile to speed up Collector application startup at the expense of a slight larger image size.|
SELECT entity, tags.*, value, datetime FROM disk_used WHERE datetime > now - 1 * day AND entity = 'nurswgvml501' WITH ROW_NUMBER(entity, tags ORDER BY time DESC) <= 1
SELECT entity, avg(value), ABS((last(value) / avg(value) - 1)*100) FROM cpu_busy WHERE datetime > previous_minute AND ABS(value) > 0 GROUP BY entity HAVING ABS((last(value) / avg(value) - 1)*100) > 0
The TCP handler, running on the default port 8081, was optimized for faster processing of
series commands streamed by a single TCP client. The new implementation provides a pool of
threads instead of a single one to offload parsing and processing from the TCP handler. The size of the pool is controlled with the
series.processing.pool.size parameter on the
Settings > Server Properties page. The default value is 2 and is recommended to be set to the number of cores on the server.
As a result, the TCP processing and parsing throughput (measured in commands per second) has increased by 40% on average.
Previously, execution of queries with the
LIMIT clause involved copying selected rows into a temporary table, even if only a small subset of the rows, restricted with
LIMIT, was required.
DESC ordered results were optimized by reducing the number of rows copied into a temporary table. The following queries experienced a 90% speedup in execution time.
SELECT * FROM mpstat.cpu_busy tot ORDER BY datetime DESC LIMIT 10
SELECT * FROM mpstat.cpu_busy tot ORDER BY datetime LIMIT 10
We added an optimization to narrow the start date in windowing queries which is now determined as the minimum (last insert date) for all series. Prior to this change, the start date was set to 0 (not applied) if it was not specified explicitly in the query.
SELECT date_format(time, 'yyyy-MM-dd') as 'date', tags.city, tags.state, sum(value) FROM "dmv.incidents" GROUP BY entity, tags, datetime WITH time = last_time ORDER BY sum(value) desc
The above query scans data with a start date determined as the earliest of all the matching series. For example, if you have 3 series with the following last insert dates:
- A - 2016-12-21
- B - 2016-10-05
- C - 2016-12-20
The SQL optimizer adds a condition
AND datetime >= '2016-20-05T00:00:00Z', even if the interval condition is not included in the query.
SELECT tot.datetime, tot.tags.city as 'city', tot.tags.state as 'state', tot.value - t1.value - t24.value - t44.value - t64.value - t64o.value as 'other_deaths', t1.value as 'infant_deaths', t24.value as '1-24_deaths', t44.value as '25-44_deaths', t64.value as '45-64_deaths', t64o.value as '64+_deaths', tot.value as 'all_deaths' FROM cdc.all_deaths tot JOIN cdc._1_year t1 JOIN cdc._1_24_years t24 JOIN cdc._25_44_years t44 JOIN cdc._54_64_years t64 JOIN cdc._65_years t64o WHERE tot.entity = 'mr8w-325u' AND tot.tags.city = 'New York' AND tot.datetime > '2016-08-27T00:00:00Z' AND (tot.value - t1.value - t24.value - t44.value - t64.value - t64o.value) != 0 OPTION (ROW_MEMORY_THRESHOLD 500000)
Now aggregate functions such as
DELTA can be applied to the
time column, which returns the sampling time in Unix milliseconds.
One of the use cases is to display the most recent time in windowing queries where the last_time function can be utilized to select data for a sliding interval, such as the most recent 4 weeks for each series in the example below.
SELECT tags.city, tags.state, sum(value), date_format(max(time)) as Last_Date FROM dmv.incidents GROUP BY entity, tags WITH time > last_time - 2*week ORDER BY max(time)
| tags.city | tags.state | sum(value) | Last_Date | |--------------|------------|------------|--------------------------| | Fort Worth | TX | 411 | 2009-01-31T00:00:00.000Z | | Philadelphia | PA | 53882 | 2012-11-24T00:00:00.000Z | | Pittsburgh | PA | 38926 | 2015-06-27T00:00:00.000Z | | New Haven | CT | 13311 | 2016-07-16T00:00:00.000Z | | Washington | DC | 41937 | 2016-08-06T00:00:00.000Z |
SELECT date_format(period(1 MONTH)), count(value) FROM dmv.incidents WHERE tags.city = 'Boston' AND datetime >= '2016-09-01T00:00:00Z' AND datetime < '2016-12-01T00:00:00Z' GROUP BY period(1 MONTH, VALUE 0) HAVING count(value) > 0 ORDER BY datetime
Assuming there were no detailed records for this series in October 2016, the current result looks as follows:
| date_format(period(1 MONTH)) | sum(value) | count(value) | |------------------------------|------------|--------------| | 2016-09-01T00:00:00.000Z | 537.0 | 4.0 | | 2016-11-01T00:00:00.000Z | 234.0 | 4.0 |
| date_format(period(1 MONTH)) | sum(value) | count(value) | |------------------------------|------------|--------------| | 2016-09-01T00:00:00.000Z | 537.0 | 4.0 | | 2016-10-01T00:00:00.000Z | 0.0 | 0.0 | <- this period was added by interpolation set in period(1 MONTH, VALUE 0), after HAVING. | 2016-11-01T00:00:00.000Z | 234.0 | 4.0 |
SELECT date_format(period(1 MONTH)), sum(value), count(value) FROM cdc.all_deaths tot WHERE tags.city = 'Boston' AND datetime >= '2016-01-01T00:00:00Z' GROUP BY period(1 MONTH) HAVING count(value) >= 4 ORDER BY 1
The query optimizer was modified to apply tag filter specified in
JOIN queries on one of the tables to the remaining tables, since
JOINs in ATSD perform merging of rows on time, entity, and series tags anyway. Prior to this change, the tag filter
was applied only to those tables where the filter was set explicitly.
Implemented the special
SELECT 1 query, which is typically used to test connectivity and validate open
connections in the shared connection pool in active state.
SQL Query Plan is used for diagnosing slow query response times. The plan was extended to:
- Display start and end dates for each HBase scan.
- Display scans to the atsd_li (last insert) table, which are used to add additional filters and to determine optimal query plan.
Implemented the LOOKUP function, which translates the key into a value using the specified replacement table.
The primary purpose of a replacement table is to act as a dictionary for decoding series tags/values.
SELECT datetime, entity, value, LOOKUP('tcp-status-codes', value) FROM 'docker.tcp-connect-status' WHERE datetime > now - 5 * MINUTE AND LOOKUP('tcp-status-codes', value) NOT LIKE '*success*' LIMIT 10
| datetime | entity | value | LOOKUP('tcp-status-codes',value) | |--------------------------|----------|-------|----------------------------------| | 2016-12-28T13:06:11.085Z | 1f4faa42 | 1 | Connection Error | | 2016-12-28T13:06:11.085Z | 131b6339 | 1 | Connection Error | | 2016-12-28T13:06:11.085Z | 37dc00da | 2 | No Route To Host |
searched case variant of the CASE expression.
CASE expression evaluates a sequence of boolean expressions and returns a matching result expression.
CASE WHEN search_expression THEN result_expression [ WHEN search_expression THEN result_expression ] [ ELSE result_expression ] END
Refer to examples for additional information.
SELECT entity, avg(value), CASE WHEN avg(value) < 20 THEN 'under-utilized' WHEN avg(value) > 80 THEN 'over-utilized' ELSE 'right-sized' END AS 'Utilization' FROM cpu_busy WHERE datetime > current_hour GROUP BY entity
| entity | avg(value) | Utilization | |--------------|------------|----------------| | nurswgvml006 | 6.2 | under-utilized | | nurswgvml007 | 80.8 | over-utilized | | nurswgvml010 | 3.8 | under-utilized |