T-SQL Windowing in SQL Server
T-SQL
Windowing in SQL Server
The WINDOW Clause
The WINDOW clause is part
of the ISO/IEC SQL standard. It allows you to name parts of a window
specification—or an entire one—and then use the window name in the OVER clause
of your query’s window functions. This clause allows you to shorten your code
by avoiding the repetition of identical parts of your window specifications.
This clause is now available in Azure SQL Database and SQL Server 2022,
provided you use database compatibility level 160 or higher.
The WINDOW clause is
located between the query’s HAVING and ORDER BY clauses:
SELECT
FROM
WHERE
GROUP BY
HAVING
WINDOW
ORDER BY
The WINDOW clause has the
following syntax:
WINDOW window_name AS ( [
reference_window_name ]
[ <window partition
clause> ]
[ <window order clause> ]
[ <window frame
clause> ] )
As an example where the
WINDOW clause can be handy in shortening your code, consider the following
query:
USE TSQLV6;
SELECT orderid, custid,
orderdate, qty, val,
SUM(qty) OVER( PARTITION BY custid
ORDER BY orderdate, orderid
ROWS UNBOUNDED PRECEDING ) AS
runsumqty,
SUM(val) OVER( PARTITION BY custid
ORDER BY orderdate, orderid
ROWS UNBOUNDED PRECEDING ) AS
runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
ORDER BY custid,
orderdate, orderid;
This query generates the
following output:
orderid custid
orderdate qty val
runsumqty runsumval
----------- -----------
---------- ----------- ------- ----------- ----------
10643 1 2021-08-25 38 814.50 38
814.50
10692 1 2021-10-03 20 878.00 58
1692.50
10702 1
2021-10-13 21 330.00 79
2022.50
10835 1 2022-01-15 17 845.80 96
2868.30
10952 1 2022-03-16 18 471.20 114
3339.50
11011 1 2022-04-09 60 933.50 174
4273.00
10308 2 2020-09-18 6 88.80 6
88.80
10625 2 2021-08-08 18 479.75 24
568.55
10759 2 2021-11-28 10 320.00 34
888.55
10926 2
2022-03-04 29 514.40 63
1402.95
In this query you can see
two window functions using identical window specifications, including window
partitioning, ordering and framing clauses. To shorten the query, you can use
the WINDOW clause to name a window specification with all three elements, say
as W, and then specify OVER W in both window functions, like so:
SELECT orderid, custid,
orderdate, qty, val,
SUM(qty) OVER W AS runsumqty,
SUM(val) OVER W AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW W AS ( PARTITION
BY custid
ORDER BY orderdate, orderid
ROWS UNBOUNDED PRECEDING )
ORDER BY custid,
orderdate, orderid;
As you can see, when the
window name represents the whole window specification that you need and not
just part of it, you specify the window name right after the OVER clause
without parentheses.
You may have noticed in
the WINDOW clause’s syntax one window name specification can have a reference
to another window name. This is especially useful when your query has different
window functions with different window specifications and one window specification
is the same as part of another. Consider the following query as an example:
SELECT orderid, custid,
orderdate, qty, val,
ROW_NUMBER() OVER( PARTITION BY custid
ORDER BY orderdate,
orderid ) AS ordernum,
MAX(orderdate) OVER( PARTITION BY custid ) AS
maxorderdate,
SUM(qty) OVER( PARTITION BY custid
ORDER BY orderdate, orderid
ROWS UNBOUNDED PRECEDING ) AS
runsumqty,
SUM(val) OVER( PARTITION BY custid
ORDER BY orderdate,
orderid
ROWS UNBOUNDED PRECEDING ) AS
runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
ORDER BY custid,
orderdate, orderid;
This query generates the
following output:
orderid custid
orderdate qty val
ordernum maxorderdate
runsumqty runsumval
-------- -------
---------- ---- ------- --------- ------------ ----------- -----------
10643 1
2021-08-25 38 814.50 1
2022-04-09 38 814.50
10692 1
2021-10-03 20 878.00 2 2022-04-09
58 1692.50
10702 1
2021-10-13 21 330.00 3
2022-04-09 79 2022.50
10835 1
2022-01-15 17 845.80 4
2022-04-09 96 2868.30
10952 1
2022-03-16 18 471.20 5
2022-04-09 114 3339.50
11011 1
2022-04-09 60 933.50 6
2022-04-09 174 4273.00
10308 2
2020-09-18 6 88.80 1
2022-03-04 6 88.80
10625 2
2021-08-08 18 479.75 2
2022-03-04 24 568.55
10759 2
2021-11-28 10 320.00 3
2022-03-04 34 888.55
10926 2
2022-03-04 29 514.40 4
2022-03-04 63 1402.95
The MAX function’s window
specification has only a window partition clause. The ROW_NUMBER function’s
window specification has a window partition clause that is the same as the MAX
function’s, plus a window order clause. Both SUM functions have the same window
partition and order clauses as the ROW_NUMBER function’s, plus a window frame
clause.
The recursive capability
of the WINDOW clause’s syntax allows you to shorten the query’s code, like so:
SELECT orderid, custid,
orderdate, qty, val,
ROW_NUMBER() OVER PO AS ordernum,
MAX(orderdate) OVER P AS maxorderdate,
SUM(qty) OVER POF AS runsumqty,
SUM(val) OVER POF AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW P AS ( PARTITION
BY custid ),
PO AS ( P ORDER BY orderdate, orderid ),
POF AS ( PO ROWS UNBOUNDED PRECEDING )
ORDER BY custid,
orderdate, orderid;
The order of the window
name definitions in the WINDOW clause is insignificant. For example, the
following code is valid and has the same meaning as the above query:
SELECT orderid, custid,
orderdate, qty, val,
ROW_NUMBER() OVER PO AS ordernum,
MAX(orderdate) OVER P AS maxorderdate,
SUM(qty) OVER POF AS runsumqty,
SUM(val) OVER POF AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW POF AS ( PO ROWS
UNBOUNDED PRECEDING ),
PO AS ( P ORDER BY orderdate, orderid ),
P AS ( PARTITION BY custid )
ORDER BY custid,
orderdate, orderid;
Note, though, you can't
use multiple window name references in one window name specification. You're
limited to only one window name reference, plus any relevant additional window
specification elements. For example, the following code isn’t valid for this
reason:
SELECT orderid, custid,
orderdate, qty, val,
SUM(qty) OVER ( P O F ) AS runsumqty,
SUM(val) OVER ( P O F ) AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW P AS ( PARTITION
BY custid ),
O AS ( ORDER BY orderdate, orderid ),
F AS ( ROWS UNBOUNDED PRECEDING )
ORDER BY custid,
orderdate, orderid;
This code generates the
following error:
Msg 102, Level 15, State
1, Line 106
Incorrect syntax near
'O'.
You're allowed to mix one
window name and additional windowing elements in a window specification, like
so:
SELECT orderid, custid,
orderdate, qty, val,
ROW_NUMBER() OVER ( P ORDER BY orderdate,
orderid ) AS ordernum,
MAX(orderdate) OVER P AS maxorderdate
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW P AS ( PARTITION
BY custid )
ORDER BY custid,
orderdate, orderid;
This query generates the
following output:
orderid custid
orderdate qty val
ordernum maxorderdate
----------- -----------
---------- ----------- ------- -------------------- ------------
10643 1 2021-08-25 38 814.50 1 2022-04-09
10692 1 2021-10-03 20 878.00 2 2022-04-09
10702 1 2021-10-13 21 330.00 3 2022-04-09
10835 1 2022-01-15 17 845.80 4 2022-04-09
10952 1 2022-03-16 18 471.20 5 2022-04-09
11011 1 2022-04-09 60 933.50 6 2022-04-09
10308 2 2020-09-18 6 88.80 1 2022-03-04
10625 2 2021-08-08 18 479.75 2 2022-03-04
10759 2 2021-11-28 10 320.00 3 2022-03-04
10926 2 2022-03-04 29 514.40 4 2022-03-04
As I mentioned before,
when a window name represents the whole window specification, like with the MAX
function in this query, you specify the window name right after the OVER clause
without parentheses. When the window name is only part of the window
specification, like with the ROW_NUMBER function in this query, you specify the
window name followed by the rest of the windowing elements within parentheses.
By now, you know you're
allowed to recursively define one window name based on another. However, in
case it wasn’t obvious, cyclic references aren’t allowed. For example, the
following query is valid since the window name definitions aren’t cyclic:
SELECT 'This is valid'
WINDOW W1 AS (), W2 AS
(W1), W3 AS (W2);
This query generates the
following output:
-------------
This is valid
However, the following
query is invalid since the window name definitions are cyclic:
SELECT 'This is invalid'
WINDOW W1 AS (W2), W2 AS
(W3), W3 AS (W1);
This code generates the
following error:
Msg 5365, Level 15, State
1, Line 108
Cyclic window references
are not permitted.
Lastly, the scope of the
defined window names is the immediate query/table expression, and can't cross
table expression boundaries. For instance, if you define a window name in the
inner query of a CTE, derived table, view or inline table valued function, the
outer query won’t recognize the inner window name. As an example, the following
query is invalid for this reason:
WITH C AS
(
SELECT orderid, custid, orderdate, qty, val,
SUM(qty) OVER W AS runsumqtyall
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW W AS ( PARTITION BY custid
ORDER BY orderdate, orderid
ROWS UNBOUNDED PRECEDING )
)
SELECT *,
SUM(qty) OVER W AS runsumqty22
FROM C
WHERE orderdate >=
'20220101';
This code generates the following
error:
Msg 5362, Level 15, State
3, Line 172
Window 'W' is undefined.
You have to define a
window name you want to use in each of the scopes where you want to use it,
like so:
WITH C AS
(
SELECT orderid, custid, orderdate, qty, val,
SUM(qty) OVER W AS runsumqtyall
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW W AS ( PARTITION BY custid
ORDER BY orderdate, orderid
ROWS UNBOUNDED PRECEDING )
)
SELECT *,
SUM(qty) OVER W AS runsumqty22
FROM C
WHERE orderdate >=
'20220101'
WINDOW W AS ( PARTITION
BY custid
ORDER BY orderdate, orderid
ROWS UNBOUNDED PRECEDING );
This query generates the
following output:
orderid custid
orderdate qty
val runsumqtyall runsumqty22
----------- -----------
---------- ----------- ------- ------------ -----------
10835 1 2022-01-15 17 845.80 96
17
10952 1 2022-03-16 18 471.20 114
35
11011 1 2022-04-09 60 933.50 174
95
10926 2 2022-03-04 29 514.40 63
29
Each of the scopes
defines its own window name W, and they don’t have to be based on the same
specification (though they are in this example).
The Windowing NULL
Treatment Clause
The NULL treatment clause
is part of the ISO/IEC SQL standard and is available to the offset window
functions FIRST_VALUE, LAST_VALUE, LAG and LEAD. This clause has the following
syntax:
<function>(<scalar_expression>[,
<other args>]) [IGNORE NULLS | RESPECT NULLS] OVER( <specification>
)
The RESPECT NULLS option
is the default, in case you don’t indicate this clause. It means you want the
function to return the value of <scalar_expression> in the requested
position (first, last, previous, next), whether it's NULL or non-NULL. The
IGNORE NULLS option introduces a new capability that people have been eagerly
waiting to have in T-SQL for a long time. It means you want the function to
return the value of <scalar_expression> in the requested position if it's
non-NULL. However, if it is NULL, you want the function to keep going in the
relevant direction (backward for LAST_VALUE and LAG, forward for FIRST_VALUE
and LEAD) until a non-NULL value is found. If no non-NULL value is found, then
it will return a NULL.
To illustrate the utility
of this clause, I’ll use a table called T1 in my examples. Use the following
code to create and populate T1:
DROP TABLE IF EXISTS
dbo.T1;
CREATE TABLE dbo.T1
(
id INT NOT NULL CONSTRAINT PK_T1 PRIMARY KEY,
col1 INT NULL,
col2 INT NULL
);
GO
INSERT INTO dbo.T1(id,
col1, col2) VALUES
( 2, NULL,
200),
( 3,
10, NULL),
( 5,
-1, NULL),
( 7, NULL,
202),
(11, NULL,
150),
(13,
-12, 50),
(17, NULL,
180),
(19, NULL,
170),
(23, 1759, NULL);
Suppose the column id
represents the chronological order of the events recorded in T1. Each row
represents an event where one or more attribute values have changed. A NULL
means the attribute retains whatever last non-NULL value it had up to that
point.
Suppose you need to
return the last known (non-NULL) col1 value per event. Without access to the
NULL treatment clause, you'd need to use a fairly complex technique such as the
following:
WITH C AS
(
SELECT id, col1,
MAX(CASE WHEN col1 IS NOT NULL THEN id END)
OVER(ORDER BY id
ROWS UNBOUNDED PRECEDING) AS grp
FROM dbo.T1
)
SELECT id, col1,
MAX(col1) OVER(PARTITION BY grp
ORDER BY id
ROWS UNBOUNDED PRECEDING) AS
lastknowncol1
FROM C;
If you aren't already
familiar with this technique, it can take a bit to figure out the logic here.
This code generates the
following output:
id col1 lastknowncol1
----------- -----------
-------------
2 NULL NULL
3 10 10
5 -1 -1
7 NULL -1
11 NULL -1
13 -12 -12
17 NULL -12
19 NULL -12
23 1759 1759
Having access to the NULL
treatment clause, you can easily achieve the same using the LAST_VALUE function
with the IGNORE NULLS option, like so:
SELECT id, col1,
LAST_VALUE(col1) IGNORE NULLS OVER( ORDER BY
id ROWS UNBOUNDED PRECEDING ) AS lastknowncol
FROM dbo.T1;
The difference is of
course more dramatic if you need to apply this logic to multiple attributes.
Without access to the
NULL treatment clause, you'd use the following code to return the last known
col1 and col2 values:
WITH C AS
(
SELECT id, col1, col2,
MAX(CASE WHEN col1 IS NOT NULL THEN id END)
OVER(ORDER BY id
ROWS UNBOUNDED PRECEDING) AS grp1,
MAX(CASE WHEN col2 IS NOT NULL THEN id END)
OVER(ORDER BY id
ROWS UNBOUNDED PRECEDING) AS grp2
FROM dbo.T1
)
SELECT id,
col1,
MAX(col1) OVER(PARTITION BY grp1
ORDER BY id
ROWS UNBOUNDED PRECEDING) AS
lastknowncol1,
col2,
MAX(col2) OVER(PARTITION BY grp2
ORDER BY id
ROWS UNBOUNDED PRECEDING) AS
lastknowncol2
FROM C;
This code generates the
following output:
id col1 lastknowncol1 col2 lastknowncol2
----------- -----------
------------- ----------- -------------
2 NULL NULL 200
200
3 10 10 NULL 200
5 -1 -1 NULL 200
7 NULL -1 202 202
11 NULL -1 150 150
13 -12 -12 50 50
17 NULL -12 180 180
19 NULL
-12 170 170
23 1759 1759 NULL 170
I should also note even though the table T1 has a supporting covering index with id as the key, each of the last known attribute calculations in the query above results in an explicit sort operator in the plan.
This fact makes this solution quite expensive.
Here’s the alternative using the NULL treatment
clause:
SELECT id,
col1,
LAST_VALUE(col1) IGNORE NULLS OVER W AS lastknowncol1,
col2,
LAST_VALUE(col2) IGNORE NULLS OVER W AS lastknowncol2
FROM dbo.T1
WINDOW W AS ( ORDER BY id ROWS UNBOUNDED PRECEDING );
This solution is so much shorter and more elegant, and
the optimization of the functions with this option can rely on an ordered scan
of a supporting index, and thus avoid explicit sorting, as shown in the plan
for this query.
As mentioned, the NULL treatment clause is available
to all offset window functions (FIRST_VALUE, LAST_VALUE, LAG, and LEAD). Here’s
an example using LAG to return the previous known value:
SELECT id, col1,
LAG(col1)
IGNORE NULLS OVER ( ORDER BY id ) AS prevknowncol1
FROM dbo.T1;
This code generates the following output:
id col1
prevknowncol1
----------- ----------- -------------
2
NULL NULL
3
10 NULL
5
-1 10
7
NULL -1
11
NULL -1
13
-12 -1
17
NULL -12
19
NULL -12
23
1759 -12
Want to try to achieve the same without the NULL
treatment clause? I bet you don’t!
Conclusion
Reuse parts of—or entire—window definitions with the
WINDOW clause
Control NULL treatment in offset window functions with
the NULL treatment clause
SQL Server 2022 has additional T-SQL improvements,
covered by Aaron Bertrand in this article:
- GREATEST / LEAST
- STRING_SPLIT
- DATE_BUCKET
- GENERATE_SERIES
Comments
Post a Comment