Cell References in SQL and Excel
Overview
This article contains examples of how analytic functions in SQL can be used to emulate step-by-step calculations in Microsoft Excel.
Staged Calculations in Excel
To assist users in organizing complex analysis into step-by-step calculations, Excel provides a convenient A1
addressing notation, or reference style, to pass calculation results between cells by value:
Columns are assigned letter names, starting with
A
for the left-most column.Rows are assigned ordinal numbers, starting with
1
for the top row.=B2
returns the value of cell located in columnB
and row2
.=B3 - B2
returns the difference between values of cellsB3
andB2
.
Users can calculate the year-on-year change by entering =B3 - B2
formula as the value of the current cell.
Such references are relative, and the row index is updated behind the scenes when you copy the cell to fill the Y-o-Y Change, $M
column.
To fix the row in the address, use $
prefix, for example =B3 - B$2
. This makes it possible to calculate change in sales since 2010, for example.
Analytic Functions in SQL
To achieve the same results in SQL we need to first cover a special class of functions called analytic functions.
The analytic functions in SQL allow access to grouped statistics without reducing the number of rows returned by the query.
SELECT value,
-- SUM(value) below is calculated without GROUP BY
value/SUM(value) OVER() AS weight
FROM table-name ORDER BY weight DESC
Often called windowing functions due to their ability to operate on ordered and partitioned groups of records, analytic functions are different from aggregate functions which reduce multiple rows in the same group into a single result row.
Many aggregate functions such as SUM
, AVG
, and COUNT
can be invoked as analytic functions using the OVER
clause. However the class of analytic functions also includes reference functions which operate on an ordered set of records. Such reference functions include:
LAG
. Provides access to a previous row at a specified offset from the current position.LEAD
. Provides access to a following row at a specified offset from the current position.FIRST_VALUE
. Provides access to the first row.LAST_VALUE
. Provides access to the last row.
Documentation Links
- Excel
cell referencing
- ATSD
LAG
function - Oracle
LAG
function
Referencing in SQL
The purpose of the LAG
and LEAD
functions in SQL is similar - to access a column value in a preceding or following row.
Assuming the same dataset is loaded in the database, it can be queried with a SELECT
statement as follows:
SELECT date_format(time, 'yyyy') AS "Year",
value AS "Sales Volume, $M"
FROM "win-sales"
ORDER BY time
| Year | Sales Volume, $M |
|------|------------------|
| 2010 | 50 |
| 2011 | 52 |
| 2012 | 55 |
| 2013 | 55 |
| 2014 | 70 |
| 2015 | 72 |
| 2016 | 70 |
| 2017 | 75 |
| 2018 | 80 |
Relative References
Similar to Excel, the reference in LAG
function consists of the column name and the row offset, which starts with 1
by default. Excel assigns column names automatically, starting with A
(up to 16,384 are allowed). In SQL, the function must to refer to the name of an existing column or alias such as LAG(value)
.
SELECT date_format(time, 'yyyy') AS "Year",
value AS "Sales Volume, $M",
LAG(value) AS "Previous Sales Volume, $M",
value - LAG(value) AS "Y-o-Y Change, $M"
FROM "win-sales"
| Year | Sales Volume, $M | Previous Sales Volume, $M | Y-o-Y Change, $M |
|------|------------------|---------------------------|------------------|
| 2010 | 50 | | |
| 2011 | 52 | 50 | 2 |
| 2012 | 55 | 52 | 3 |
| 2013 | 55 | 55 | 0 |
| 2014 | 70 | 55 | 15 |
| 2015 | 72 | 70 | 2 |
| 2016 | 70 | 72 | -2 |
| 2017 | 75 | 70 | 5 |
| 2018 | 80 | 75 | 5 |
The value - LAG(value)
expression returns the same results as =B3 - B2
in Excel. In this case B
and value
are equivalent column names.
To accomplish the same result in Oracle Database, add OVER
clause after each analytical function.
- Oracle SQL version:
SELECT time AS "Year",
value AS "Sales Volume, $M",
LAG(value) OVER (ORDER BY time) AS "Previous Sales Volume, $M",
value - LAG(value) OVER (ORDER BY time) AS "Y-o-Y Change, $M"
FROM win_sales
Absolute References
FIRST
and LAST
functions can be used to access first and last rows in the partition respectively. This is equivalent to =B3 - B$2
in Excel which calculates the change in sales since 2010.
SELECT date_format(time, 'yyyy') AS "Year",
value AS "Sales Volume, $M",
LAG(value) AS "Previous Sales Volume, $M",
value - LAG(value) AS "Y-o-Y Change, $M",
value - FIRST_VALUE(value) AS "Change since 2010, $M"
FROM "win-sales"
WITH ROW_NUMBER(entity, tags ORDER BY time) > 0
| Year | Sales Volume, $M | Previous Sales Volume, $M | Y-o-Y Change, $M | Change since 2010, $M |
|------|------------------|---------------------------|------------------|-----------------------|
| 2010 | 50 | 0 | 0 | 0 |
| 2011 | 52 | 50 | 2 | 2 |
| 2012 | 55 | 52 | 3 | 5 |
| 2013 | 55 | 55 | 0 | 5 |
| 2014 | 70 | 55 | 15 | 20 |
| 2015 | 72 | 70 | 2 | 22 |
| 2016 | 70 | 72 | -2 | 20 |
| 2017 | 75 | 70 | 5 | 25 |
| 2018 | 80 | 75 | 5 | 30 |
- Oracle SQL version:
SELECT time AS "Year",
value AS "Sales Volume, $M",
LAG(value) OVER (ORDER BY time) AS "Previous Sales Volume, $M",
value - LAG(value) OVER (ORDER BY time) AS "Y-o-Y Change, $M",
value - FIRST_VALUE(value) OVER (ORDER BY time) AS "Change since 2010, $M"
FROM win_sales
Aggregate Functions
Sliding Total
To calculate a running (sliding) total in Excel, one can pass a relative range of cell values, for example: =SUM(B2:B4)
.
To return the same result in SQL one can add LAG
results with increasing offsets.
SELECT date_format(time, 'yyyy') AS "Year",
value AS "Sales Volume, $M",
value + LAG(value) + LAG(value,2) AS "3-Year Sales Volume, $M"
FROM "win-sales"
| Year | Sales Volume, $M | 3-Year Sales Volume, $M |
|------|------------------|-------------------------|
| 2010 | 50 | |
| 2011 | 52 | |
| 2012 | 55 | 157 |
| 2013 | 55 | 162 |
| 2014 | 70 | 180 |
| 2015 | 72 | 197 |
| 2016 | 70 | 212 |
| 2017 | 75 | 217 |
| 2018 | 80 | 225 |
- Oracle SQL version:
SELECT time AS "Year",
value AS "Sales Volume, $M",
value + LAG(value) OVER (ORDER BY time) + LAG(value,2) OVER (ORDER BY time) AS "3-Year Sales Volume, $M"
FROM win_sales
However, if the number of rows is large enough, applying an aggregate function to all values within the fixed-length partition is more efficient.
SELECT date_format(time, 'yyyy') AS "Year",
value AS "Sales Volume, $M",
SUM(value) AS "3-Year Sales Volume, $M"
FROM "win-sales"
WITH ROW_NUMBER(entity, tags ORDER BY time) <= 3
The above query produces the same result using the SUM
function with the size of the sliding window controlled in the ROW_NUMBER
clause.
- Oracle version:
SELECT time AS "Year",
value AS "Sales Volume, $M",
SUM(value) OVER(ORDER BY time ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS "3-Year Sales Volume, $M"
FROM win_sales
Growing Total
To calculate a growing total in Excel, set the starting row to an absolute value =SUM(B$2:B4)
.
To calculate a growing total for all values in the column in SQL, create a single partition ordered by time and disable row filtering with ROW_NUMBER > 0
.
SELECT date_format(time, 'yyyy') AS "Year",
value AS "Sales Volume, $M",
SUM(value) AS "Total Volume since 2010, $M"
FROM "win-sales"
WITH ROW_NUMBER(entity, tags ORDER BY time) > 0
| Year | Sales Volume, $M | Total Volume since 2010, $M |
|------|------------------|-----------------------------|
| 2010 | 50 | 50 |
| 2011 | 52 | 102 |
| 2012 | 55 | 157 |
| 2013 | 55 | 212 |
| 2014 | 70 | 282 |
| 2015 | 72 | 354 |
| 2016 | 70 | 424 |
| 2017 | 75 | 499 |
| 2018 | 80 | 579 |
- Oracle SQL version:
SELECT time AS "Year",
value AS "Sales Volume, $M",
SUM(value) OVER (ORDER BY time) AS "Total Volume since 2010, $M"
FROM win_sales
Appendix: Sample Dataset
- ATSD data commands
series e:win-test d:2010-01-01T00:00:00Z m:win-sales=50
series e:win-test d:2011-01-01T00:00:00Z m:win-sales=52
series e:win-test d:2012-01-01T00:00:00Z m:win-sales=55
series e:win-test d:2013-01-01T00:00:00Z m:win-sales=55
series e:win-test d:2014-01-01T00:00:00Z m:win-sales=70
series e:win-test d:2015-01-01T00:00:00Z m:win-sales=72
series e:win-test d:2016-01-01T00:00:00Z m:win-sales=70
series e:win-test d:2017-01-01T00:00:00Z m:win-sales=75
series e:win-test d:2018-01-01T00:00:00Z m:win-sales=80
- Oracle SQL:
CREATE TABLE win_sales (
time int,
value NUMBER
)
INSERT INTO win_sales (time, value) VALUES (2010, 50);
INSERT INTO win_sales (time, value) VALUES (2011, 52);
INSERT INTO win_sales (time, value) VALUES (2012, 55);
INSERT INTO win_sales (time, value) VALUES (2013, 55);
INSERT INTO win_sales (time, value) VALUES (2014, 70);
INSERT INTO win_sales (time, value) VALUES (2015, 72);
INSERT INTO win_sales (time, value) VALUES (2016, 70);
INSERT INTO win_sales (time, value) VALUES (2017, 75);
INSERT INTO win_sales (time, value) VALUES (2018, 80);