Diurnal Queries
The date_format function can be used in the WHERE
, GROUP BY
, and HAVING
clauses to filter and group dates by month name, day name, or hour number.
MMM
: short, 3-letter, month name, for example, JanMMMM
: full month name, for example Januaryeee
: short, 3-letter, weekday name, for example Sateeeee
: full weekday name, for example Saturdayu
: weekday number, starting with 1 for MondayHH
: hour of the day, 2 digit, 00 to 23.
For additional patterns, refer to time patterns.
Example: Filter by calendar
In this example we calculate the totals for the month of January over a 7 year period to determine a range of pneumonia deaths recorded in East-North-Central US. To retain only samples recorded in January, we use the date_format(time, 'MMM') = 'Jan'
condition.
SELECT date_format(time, 'yyyy MMM') AS "date",
LOOKUP('us-region', tags.region) AS "region",
sum(value) AS "pneumonia_influenza_deaths"
FROM "cdc.pneumonia_and_influenza_deaths"
WHERE entity = 'mr8w-325u' AND tags.region = '3'
AND datetime >= '2010-01-01T00:00:00Z'
AND date_format(time, 'MMM') = 'Jan'
GROUP BY tags.region, period(1 MONTH)
ORDER BY 3 DESC
Results
| date | region | pneumonia_influenza_deaths |
|----------|--------------------|----------------------------|
| 2015 Jan | East-North-Central | 1203 |
| 2013 Jan | East-North-Central | 898 |
| 2016 Jan | East-North-Central | 732 |
| 2011 Jan | East-North-Central | 730 |
| 2014 Jan | East-North-Central | 722 |
| 2010 Jan | East-North-Central | 691 |
| 2012 Jan | East-North-Central | 641 |
Example: Daily Averages
To calculate averages or totals by day of the week, use date_format(time, 'eee')
or date_format(time, 'u')
functions.
The eee
pattern returns short day name for each sample: Mon
, Tue
, Wed
, Thu
, Fri
, Sat
, Sun
, whereas u
pattern returns day number starting with 1
for Monday.
SELECT date_format(time, 'u') AS day_of_week, avg(value) AS average
FROM "mpstat.cpu_busy"
WHERE datetime >= previous_week
GROUP BY date_format(time, 'u')
ORDER BY date_format(time, 'u')
SELECT substr(date_format(time, 'u-eee'), 3) AS day_of_week, avg(value) AS average
FROM "mpstat.cpu_busy"
WHERE datetime >= previous_week
GROUP BY date_format(time, 'u-eee')
ORDER BY date_format(time, 'u-eee')
| day_of_week | average |
|-------------|---------|
| Mon | 5.73 |
| Tue | 7.05 |
| Wed | 7.36 |
| Thu | 7.50 |
| Fri | 8.48 |
| Sat | 6.08 |
| Sun | 5.97 |
Example: Diurnal Seasonality
By grouping samples by hour of the day (regardless which day it is) you can create diurnal charts which show changes in activity throughout the day.
SELECT date_format(time, 'HH') AS "hour_in_day",
avg(value)
FROM "mpstat.cpu_busy"
WHERE datetime >= current_month
GROUP BY date_format(time, 'HH')
ORDER BY 1
Results
| hour_in_day | avg(value) |
|-------------|------------|
| 0 | 5.0 |
| 1 | 22.2 |
| 2 | 6.3 |
| 3 | 9.3 |
| 4 | 5.3 |
| 5 | 4.7 |
| 6 | 5.4 |
| 7 | 6.2 |
| 8 | 6.7 |
| 9 | 7.1 |
| 10 | 6.9 |
| 11 | 6.8 |
| 12 | 6.0 |
| 13 | 5.9 |
| 14 | 6.1 |
| 15 | 6.8 |
| 16 | 6.5 |
| 17 | 5.8 |
| 18 | 5.3 |
| 19 | 5.9 |
| 20 | 6.2 |
| 21 | 4.5 |
| 22 | 8.0 |
| 23 | 5.6 |
Example: Weekly Diurnal Seasonality
The weekly diurnal charts take day of week into account and can be used, for example, to calculate both weekly seasonality, as well as weekly highs and lows using different columns in the ORDER clause.
SELECT concat(date_format(time, 'eeeee, HH'), ':00') AS "day, hour",
avg(value)
FROM "mpstat.cpu_busy"
WHERE datetime >= current_week
AND date_format(time, 'HH') >= '09' AND date_format(time, 'HH') < '18'
GROUP BY date_format(time, 'eeeee HH')
ORDER BY 2 DESC
Results
| day, hour | avg(value) |
|------------------|------------|
| Wednesday, 14:00 | 18.70 |
| Tuesday, 14:00 | 12.99 |
| Wednesday, 09:00 | 12.53 |
| Wednesday, 16:00 | 12.29 |
| Tuesday, 13:00 | 11.50 |
| Wednesday, 15:00 | 10.71 |
| Monday, 16:00 | 10.12 |
| Thursday, 09:00 | 9.63 |
Example: Numeric Comparison
The above example relies on the lexicographical comparison of 2-digit hours strings.
The following alternative utilizes the CAST
function to convert the date_format
output to numbers to filter parts of the day.
SELECT date_format(time, 'eee HH') AS "hour_in_day",
percentile(75, value)
FROM "mpstat.cpu_busy"
WHERE datetime >= previous_week
AND CAST(date_format(time, 'H') AS number) >= 9 AND CAST(date_format(time, 'H') AS number) < 18
GROUP BY date_format(time, 'eee HH')
ORDER BY 2 DESC
| hour_in_day | percentile(75,value) |
|-------------|----------------------|
| Tue 09 | 41.8 |
| Tue 10 | 41.2 |
| Mon 12 | 40.7 |
| Mon 16 | 40.5 |
| Mon 13 | 40.4 |