Summary: in this tutorial, you will learn how to use the SQL Server CUME_DIST()
function to calculate a cumulative distribution of a value within a group of values.
Introduction to SQL Server CUME_DIST() function
Sometimes, you want to make a report that contains the top or bottom x% values from a data set e.g., top 5% sales staffs by net sales. One way to achieve this with SQL Server is to use the CUME_DIST()
function.
The CUME_DIST()
function calculates the cumulative distribution of a value within a group of values. Simply put, it calculates the relative position of a value in a group of values.
The following shows the syntax of the CUME_DIST()
function:
CUME_DIST() OVER (
[PARTITION BY partition_expression, ... ]
ORDER BY sort_expression [ASC | DESC], ...
)
Code language: SQL (Structured Query Language) (sql)
Let’s examine this syntax in detail.
PARTITION BY clause
The PARTITION BY
clause distributes rows into multiple partitions to which the CUME_DIST()
function is applied.
The PARTITION BY
clause is optional. The CUME_DIST()
function will treat the whole result set as a single partition if you omit the PARTITION BY
clause.
ORDER BY clause
The ORDER BY
clause specifies the logical order of rows in each partition to which the CUME_DIST()
function is applied. The ORDER BY
clause considers NULL values as the lowest possible values.
Return value
The result of CUME_DIST()
is greater than 0 and less than or equal to 1.
0 < CUME_DIST() <= 1
Code language: SQL (Structured Query Language) (sql)
The function returns the same cumulative distribution values for the same tie values.
SQL Server CUME_DIST() examples
Let’s take some examples of using the CUME_DIST()
function.
Using SQL Server CUME_DIST() function over a result set example
The following statement calculates the sales percentile for each sales staff in 2017:
SELECT
CONCAT_WS(' ',first_name,last_name) full_name,
net_sales,
CUME_DIST() OVER (
ORDER BY net_sales DESC
) cume_dist
FROM
sales.vw_staff_sales t
INNER JOIN sales.staffs m on m.staff_id = t.staff_id
WHERE
year = 2017;
Code language: SQL (Structured Query Language) (sql)
Here is the result:
As shown in the output, 50% of the sales staff have net sales greater than 285K.
Using SQL Server CUME_DIST() function over a partition example
This example uses the CUME_DIST()
function to calculate the sales percentile for each sales staff in 2016 and 2017.
SELECT
CONCAT_WS(' ',first_name,last_name) full_name,
net_sales,
year,
CUME_DIST() OVER (
PARTITION BY year
ORDER BY net_sales DESC
) cume_dist
FROM
sales.vw_staff_sales t
INNER JOIN sales.staffs m on m.staff_id = t.staff_id
WHERE
year IN (2016,2017);
Code language: SQL (Structured Query Language) (sql)
Here is the output:
In this example:
- The
PARTITION BY
clause distributed the rows into two partitions by year, 2016 and 2017. - The
ORDER BY
clause sorted rows in each partition by net sales from high to low to which theCUME_DIST()
function is applied.
To get the top 20% of sales staff by net sales in 2016 and 2017, you use the following query:
WITH cte_sales AS (
SELECT
CONCAT_WS(' ',first_name,last_name) full_name,
net_sales,
year,
CUME_DIST() OVER (
PARTITION BY year
ORDER BY net_sales DESC
) cume_dist
FROM
sales.vw_staff_sales t
INNER JOIN sales.staffs m
ON m.staff_id = t.staff_id
WHERE
year IN (2016,2017)
)
SELECT
*
FROM
cte_sales
WHERE
cume_dist <= 0.20;
Code language: SQL (Structured Query Language) (sql)
The following picture shows the output:
In this tutorial, you have learned how to use the SQL Server CUME_DIST()
function to calculate the cumulative distribution of a value in a group of values.