SQL Query Performance
Monitoring query execution is an important administrative task to optimize database performance, to identify and prevent expensive and long-running queries, and to provide feedback to end-users and application developers.
The database keeps track of query executions including detailed statistics in an in-memory structure. The list of running and completed queries is available on the SQL Query Statistics page.
The list can be filtered by user, source, status, query part, and elapsed time. Additional information about the query is displayed on the query detail page.
Users with an
ADMIN role are authorized to view and cancel all queries whereas non-administrative users are restricted to viewing and cancelling only their own queries.
Query Detail Fields:
| ||New, Running, Completed, Error, Cancelled.|
| ||api, console, scheduled.|
| ||Name of the user who initiated the query.|
For API clients, username specified in login credentials.
| ||Unique query identifier.|
| ||Query statement text.|
| ||Query start time.|
| ||Time elapsed between start time and completion (or current) time.|
| ||Number of rows returned to the client.|
| ||Number of time:value pairs.|
| ||Number of HBase rows.|
| ||Number of bytes in Result objects from HBase region servers.|
| ||Number of bytes in Result objects from remote region servers.|
| ||Total number of milliseconds spent between sequential scan.next() calls.|
| ||Number of RPC calls.|
| ||Number of remote RPC calls.|
| ||Number of RPC retries.|
| ||Number of remote RPC retries.|
| ||Number of regions scanned.|
| ||Number of |
A running query can be cancelled at any time, for example if its is execution time is longer than expected.
When a query is cancelled results are not returned to the client and the query is terminated with an error.
A query submitted via the
/api/sql endpoint can be cancelled by submitting a request to
/api/sql/cancel?queryId=myid url and referencing the user-defined handle with the
Queries executed by the database are recorded in the main application log
atsd.log at the INFO level.
Each query is assigned a unique identifier for correlating starting and closing events.
2017-08-15 18:44:01,183;INFO;qtp1878912978-182;com.axibase.tsd.service.sql.SqlQueryServiceImpl;Starting sql query execution. [uid=218], user: user003, source: scheduled, sql: SELECT entity, AVG(value) AS "Average", median(value), MAX(value), count(*), percentile(50, value), percentile(75, value), percentile(90, value), percentile(99, value) FROM "mpstat.cpu_busy" WHERE time BETWEEN PREVIOUS_DAY and CURRENT_DAY GROUP BY entity ORDER BY AVG(value) DESC 2017-08-15 18:44:02,369;INFO;qtp1878912978-182;com.axibase.tsd.service.sql.SqlQueryServiceImpl;Sql query execution took 1.19 s, rows returned 7. [uid=218], user: user003, sql: SELECT entity, AVG(value) AS "Average", median(value), MAX(value), count(*), percentile(50, value), percentile(75, value), percentile(90, value), percentile(99, value) FROM "mpstat.cpu_busy" WHERE time BETWEEN PREVIOUS_DAY and CURRENT_DAY GROUP BY entity ORDER BY AVG(value) DESC
Query Control Messages
Execution events are also stored as messages with type=
sql and source=
api|console|scheduled for monitoring query performance using the built-in Rule Engine.
The following message tags are available for filtering and grouping:
| ||Unique query id which is reset on application restart.|
| ||Result set format: csv, json, html.|
| ||User initiating the query.|
| ||Query text.|
Messages for scheduled queries include additional tags
The most efficient query path is metric+entity+date+tags.
Query execution speed can be improved by adopting the following guidelines for the
- Specify start time and end time whenever possible to limit the time range.
- Specify entity name whenever possible to avoid a scan of all rows in the virtual table.
Consider the following recommendations when developing queries:
- Pre-test queries on a smaller dataset in an ATSD-development instance.
- Avoid queries without any conditions. Apply
LIMITto reduce the number of rows returned.
- Add the
WHEREclause. Include as many conditions to the
WHEREclause as possible, in particular add entity and interval conditions.
WHEREconditions narrow and specific, for example, specify a small time interval.
- Avoid the
ORDER BYclause since it may cause a full scan and a copy of data to a temporary table.
- Add the
LIMIT 1clause to reduce the number of rows returned. Note that
LIMITdoes not prevent expensive queries with
GROUP BYclauses because
LIMITis applied to final results and not to the number of rows read from the database.
- Develop a simple query first. Adjust conditions gradually as you inspect the results. Add grouping, partitioning, and ordering to finalize the query.
Optimizing Interval Queries
EXTRACT functions in the
WHERE condition and the
GROUP BY clause is not efficient as it causes the database to perform a full scan while comparing literal strings or numbers.
Instead, filter dates using the indexed
datetime column and apply the
PERIOD function to aggregate records by interval.
- Slow full scan with string or number comparison.
WHERE date_format(time, 'yyyy') > '2018' WHERE YEAR(time) > 2018
- Fast date range scan using an indexed column.
WHERE datetime >= '2018' WHERE datetime >= '2018-01-01T00:00:00Z'
GROUP BY date_format(time, 'yyyy') GROUP BY YEAR(time)
GROUP BY PERIOD(1 YEAR)
To assist in inspecting query results by selecting a subset of rows from large series, the following queries have been locally optimized to provide improved performance with
LIMIT 1 clause:
SELECT * FROM "mpstat.cpu_busy" LIMIT 1
SELECT * FROM "mpstat.cpu_busy" ORDER BY datetime LIMIT 1
SELECT * FROM "mpstat.cpu_busy" WHERE entity = 'nurswgvml007' ORDER BY datetime LIMIT 1
SELECT * FROM "mpstat.cpu_busy" ORDER BY time DESC LIMIT 1
SELECT * FROM "mpstat.cpu_busy" ORDER BY datetime DESC LIMIT 1
SELECT * FROM "mpstat.cpu_busy" WHERE datetime >= CURRENT_DAY ORDER BY time DESC LIMIT 1
SELECT * FROM "mpstat.cpu_busy" WHERE entity = 'nurswgvml007' ORDER BY datetime DESC LIMIT 1