Cell References in SQL and Excel

Overview

This article contains examples of how analytic functions in SQL can be used to emulate step-by-step calculations in Microsoft Excel.

Analytic functions in SQL allow developers to access grouped statistics without reducing the number of rows returned by the query.

SELECT value,
  -- SUM calculated without GROUP BY
  value/SUM(value) OVER() AS weight
  FROM table-name
ORDER BY weight DESC

Often called windowing functions due to their ability to operate on ordered and partitioned groups of records, analytic functions are different from aggregate functions which reduce several rows into a single result row.

Many aggregate functions such as SUM, AVG, and COUNT can be invoked as analytic functions using the OVER clause. However the class of analytic functions also includes reference functions which operate on an ordered set of records. Such reference functions include:

  • LAG. Provides access to a previous row at a specified offset from the current position.
  • LEAD. Provides access to a following row at a specified offset from the current position.
  • FIRST_VALUE. Provides access to the first row.
  • LAST_VALUE. Provides access to the last row.

Staged Calculations in Excel

To assist users in organizing complex analysis into step-by-step calculations, Excel provides a convenient A1 addressing notation, or reference style, to pass calculation results between cells by value:

  • Columns are assigned letter names, starting with A for the left-most column.

  • Rows are assigned ordinal numbers, starting with 1 for the top row.

  • =B2 returns the value of cell located in column B and row 2.

  • =B3 - B2 returns the difference between values of cells B3 and B2.

Users can easily calculate the year-over-year change by entering =B3 - B2 formula as the value of the current cell.

Since the reference in relative, the row and column indexes are automatically updated when we copy the cell value to the remaining cells in the Y-o-Y Change, $M column.

The above references are relative to the current cell position but they can also refer to an absolute address using $ prefix, for example =B3 - B$2 to calculate change in sales since 2010.

Referencing in SQL

The purpose of the LAG and LEAD functions in SQL is similar - to access a column value in a preceding or following row.

Assuming the same dataset is loaded in the database, it can be queried with a SELECT statement as follows:

SELECT date_format(time, 'yyyy') AS "Year",
  value AS "Sales Volume, $M"
FROM "win-sales"
  ORDER BY time
| Year | Sales Volume, $M |
|------|------------------|
| 2010 |               50 |
| 2011 |               52 |
| 2012 |               55 |
| 2013 |               55 |
| 2014 |               70 |
| 2015 |               72 |
| 2016 |               70 |
| 2017 |               75 |
| 2018 |               80 |

Relative References

Similar to Excel, the reference in LAG function consists of the column name and the row offset, which starts with 1 by default. Excel assigns column names automatically, starting with A (up to 16,384 are allowed). In SQL, the function must to refer to the name of an existing column or alias such as LAG(value).

SELECT date_format(time, 'yyyy') AS "Year",
  value AS "Sales Volume, $M",
  LAG(value) AS "Previous Sales Volume, $M",
  value - LAG(value) AS "Y-o-Y Change, $M"
FROM "win-sales"
| Year | Sales Volume, $M | Previous Sales Volume, $M | Y-o-Y Change, $M |
|------|------------------|---------------------------|------------------|
| 2010 |               50 |                           |                  |
| 2011 |               52 |                        50 |                2 |
| 2012 |               55 |                        52 |                3 |
| 2013 |               55 |                        55 |                0 |
| 2014 |               70 |                        55 |               15 |
| 2015 |               72 |                        70 |                2 |
| 2016 |               70 |                        72 |               -2 |
| 2017 |               75 |                        70 |                5 |
| 2018 |               80 |                        75 |                5 |

The value - LAG(value) expression returns the same results as =B3 - B2 in Excel. In this case B and value are equivalent column names.

To accomplish the same result in Oracle Database, add OVER clause after each analytical function.

  • Oracle SQL version:
SELECT time AS "Year",
  value AS "Sales Volume, $M",
  LAG(value) OVER (ORDER BY time) AS "Previous Sales Volume, $M",
  value - LAG(value) OVER (ORDER BY time) AS "Y-o-Y Change, $M"
FROM win_sales

Absolute References

FIRST and LAST functions can be used to access first and last rows in the partition respectively. This is equivalent to =B3 - B$2 in Excel which calculates the change in sales since 2010.

SELECT date_format(time, 'yyyy') AS "Year",
  value AS "Sales Volume, $M",
  LAG(value) AS "Previous Sales Volume, $M",
  value - LAG(value) AS "Y-o-Y Change, $M",
  value - FIRST_VALUE(value) AS "Change since 2010, $M"
FROM "win-sales"
  WITH ROW_NUMBER(entity, tags ORDER BY time) > 0
| Year | Sales Volume, $M | Previous Sales Volume, $M | Y-o-Y Change, $M | Change since 2010, $M |
|------|------------------|---------------------------|------------------|-----------------------|
| 2010 |               50 |                         0 |                0 |                     0 |
| 2011 |               52 |                        50 |                2 |                     2 |
| 2012 |               55 |                        52 |                3 |                     5 |
| 2013 |               55 |                        55 |                0 |                     5 |
| 2014 |               70 |                        55 |               15 |                    20 |
| 2015 |               72 |                        70 |                2 |                    22 |
| 2016 |               70 |                        72 |               -2 |                    20 |
| 2017 |               75 |                        70 |                5 |                    25 |
| 2018 |               80 |                        75 |                5 |                    30 |
  • Oracle SQL version:
SELECT time AS "Year",
  value AS "Sales Volume, $M",
  LAG(value) OVER (ORDER BY time) AS "Previous Sales Volume, $M",
  value - LAG(value) OVER (ORDER BY time) AS "Y-o-Y Change, $M",
  value - FIRST_VALUE(value) OVER (ORDER BY time) AS "Change since 2010, $M"
FROM win_sales

Aggregate Functions

Sliding Total

To calculate a running (sliding) total in Excel, one can pass a relative range of cell values, for example: =SUM(B2:B4).

To return the same result in SQL one can add LAG results with increasing offsets.

SELECT date_format(time, 'yyyy') AS "Year",
  value AS "Sales Volume, $M",
  value + LAG(value) + LAG(value,2) AS "3-Year Sales Volume, $M"
FROM "win-sales"
| Year | Sales Volume, $M | 3-Year Sales Volume, $M |
|------|------------------|-------------------------|
| 2010 |               50 |                         |
| 2011 |               52 |                         |
| 2012 |               55 |                     157 |
| 2013 |               55 |                     162 |
| 2014 |               70 |                     180 |
| 2015 |               72 |                     197 |
| 2016 |               70 |                     212 |
| 2017 |               75 |                     217 |
| 2018 |               80 |                     225 |
  • Oracle SQL version:
SELECT time AS "Year",
  value AS "Sales Volume, $M",
  value + LAG(value) OVER (ORDER BY time) + LAG(value,2) OVER (ORDER BY time) AS "3-Year Sales Volume, $M"
FROM win_sales

However, if the number of rows is large enough, applying an aggregate function to all values within the fixed-length partition is more efficient.

SELECT date_format(time, 'yyyy') AS "Year",
  value AS "Sales Volume, $M",
  SUM(value) AS "3-Year Sales Volume, $M"
FROM "win-sales"
  WITH ROW_NUMBER(entity, tags ORDER BY time) <= 3

The above query produces the same result using the SUM function with the size of the sliding window controlled in the ROW_NUMBER clause.

  • Oracle version:
SELECT time AS "Year",
  value AS "Sales Volume, $M",
  SUM(value) OVER(ORDER BY time ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS "3-Year Sales Volume, $M"
FROM win_sales

Growing Total

To calculate a growing total in Excel, set the starting row to an absolute value =SUM(B$2:B4).

To calculate a growing total for all values in the column in SQL, create a single partition ordered by time and disable row filtering with ROW_NUMBER > 0.

SELECT date_format(time, 'yyyy') AS "Year",
  value AS "Sales Volume, $M",
  SUM(value) AS "Total Volume since 2010, $M"
FROM "win-sales"
  WITH ROW_NUMBER(entity, tags ORDER BY time) > 0
| Year | Sales Volume, $M | Total Volume since 2010, $M |
|------|------------------|-----------------------------|
| 2010 |               50 |                          50 |
| 2011 |               52 |                         102 |
| 2012 |               55 |                         157 |
| 2013 |               55 |                         212 |
| 2014 |               70 |                         282 |
| 2015 |               72 |                         354 |
| 2016 |               70 |                         424 |
| 2017 |               75 |                         499 |
| 2018 |               80 |                         579 |
  • Oracle SQL version:
SELECT time AS "Year",
  value AS "Sales Volume, $M",
  SUM(value) OVER (ORDER BY time) AS "Total Volume since 2010, $M"
FROM win_sales

Appendix: Sample Dataset

  • ATSD data commands
series e:win-test d:2010-01-01T00:00:00Z m:win-sales=50
series e:win-test d:2011-01-01T00:00:00Z m:win-sales=52
series e:win-test d:2012-01-01T00:00:00Z m:win-sales=55
series e:win-test d:2013-01-01T00:00:00Z m:win-sales=55
series e:win-test d:2014-01-01T00:00:00Z m:win-sales=70
series e:win-test d:2015-01-01T00:00:00Z m:win-sales=72
series e:win-test d:2016-01-01T00:00:00Z m:win-sales=70
series e:win-test d:2017-01-01T00:00:00Z m:win-sales=75
series e:win-test d:2018-01-01T00:00:00Z m:win-sales=80
  • Oracle SQL:
CREATE TABLE win_sales (
 time int,
 value NUMBER
)
INSERT INTO win_sales (time, value) VALUES (2010, 50);
INSERT INTO win_sales (time, value) VALUES (2011, 52);
INSERT INTO win_sales (time, value) VALUES (2012, 55);
INSERT INTO win_sales (time, value) VALUES (2013, 55);
INSERT INTO win_sales (time, value) VALUES (2014, 70);
INSERT INTO win_sales (time, value) VALUES (2015, 72);
INSERT INTO win_sales (time, value) VALUES (2016, 70);
INSERT INTO win_sales (time, value) VALUES (2017, 75);
INSERT INTO win_sales (time, value) VALUES (2018, 80);