Summary: in this tutorial, you will learn how to use the SQL Server VAR()
function to calculate the sample variances of values.
Introduction to the sample variance
A sample variance measures how far a set of numbers is spread out from their average value (mean). A sample variance allows you to get insights into the variability within a sample dataset.
To calculate a sample variance of a set of n numbers, you follow these steps:
- First, calculate the average (or mean) by dividing the sum of all numbers by the number of values in the set.
- Second, calculate the total of square differences.
- Third, divide the total of square differences by the number of values minus one.
For example, given a sample set {1, 2, 3 }, you can calculate the sample variance as follows:
First, calculate the mean:
(1 + 2 + 3) / 3 = 2
Second, calculate the total square difference between each number and the mean:
(1−2)2+(2−2)2+(3−2)2 = 2
Third, calculate the sample variance:
2 / (3 – 1) = 1
The sample variance is 1.
The SQL Server VAR() aggregate function
In SQL Server, you can use the VAR()
aggregate function to calculate the sample variance of a set of numbers.
Here’s the syntax of the VAR()
function:
VAR( [ALL | DISTINCT] expression)
Code language: SQL (Structured Query Language) (sql)
In this syntax:
ALL
: This option instructs the function to use all values including duplicate ones for calculating the sample variance.DISTINCT
: This option instructs the function to use unique values for calculating the sample variance.
The VAR()
function uses the ALL
option by default. Please note that the VAR()
function ignores NULL
.
The VAR()
function returns the sample variance as a float number. It returns NULL
if there is one or no row in the sample.
SQL Server VAR() function examples
Let’s explore some examples that use the SQL Server VAR()
function.
1) Basic SQL Server VAR() function examples
First, create a sample table called t with one column id that has some numbers 1, 2, 2, 3, and NULL
:
CREATE TABLE t (id INT);
INSERT INTO
t (id)
VALUES
(1),
(2),
(2),
(3),
(NULL);
SELECT * FROM t;
Code language: SQL (Structured Query Language) (sql)
Output:
id
-----------
1
2
2
3
NULL
Code language: SQL (Structured Query Language) (sql)
Second, calculate the sample variance of unique values in the id column of the t table using the VAR()
function with DISTINCT
option:
SELECT
VAR(DISTINCT id) variance
FROM
t;
Code language: SQL (Structured Query Language) (sql)
Output:
variance
----------------------
1
Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row affected)
Code language: SQL (Structured Query Language) (sql)
The sample variance is 1.
Second, calculate the sample variance of all values including duplicates in the id column of the t table using the VAR()
function:
variance
----------------------
0.666666666666667
Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row affected)
Code language: SQL (Structured Query Language) (sql)
The sample variance is 0.666666666666667
.
2) Practical VAR() function examples
We’ll use the tables from the sample database for the demonstration:
The following example uses the VAR()
function to calculate the sample variance of list prices of the brand id 1 and category id 5 in the production.products
table:
SELECT
VAR(list_price) var_list_price
FROM
production.products
WHERE
brand_id = 1
AND category_id = 5;
Code language: SQL (Structured Query Language) (sql)
Output:
var_list_price
----------------------
40000
Code language: SQL (Structured Query Language) (sql)
To interpret the sample variance, we can include the average list price in the result set:
SELECT
VAR(list_price) var_list_price,
AVG(list_price) avg_list_price
FROM
production.products
WHERE
brand_id = 1
AND category_id = 5;
Code language: SQL (Structured Query Language) (sql)
Output:
var_list_price | avg_list_price
---------------+---------------
40000.0 | 2799.990000
Code language: SQL (Structured Query Language) (sql)
A sample variance of 40,000.0
means that the list prices of products within the category id 5 and brand 1 have a relatively high degree of variability around the mean list price of 2,799.99
.
The following example uses the VAR()
function to calculate the sample variances of list prices of products in category id 1 for each brand:
SELECT
b.brand_name,
AVG(list_price) avg_list_price,
VAR(list_price) var_list_price
FROM
production.products p
INNER JOIN production.brands b ON b.brand_id = p.brand_id
WHERE
category_id = 1
GROUP BY
b.brand_name;
Code language: SQL (Structured Query Language) (sql)
Output:
brand_name | avg_list_price | var_list_price
-------------+----------------+--------------------
Electra | 330.347142 | 3248.015873015962
Haro | 249.990000 | 3199.9999999999905
Strider | 209.990000 | 11200.0
Sun Bicycles | 109.990000 | NULL
Trek | 260.424782 | 8040.7114624506585
(5 rows)
Code language: SQL (Structured Query Language) (sql)
The output indicates that:
- The Sun Bicycles brand has a sample variance of the list price
NULL
because it has one product with a list price of109.99
as indicated in theavg_list_price
column. - The Strider brand has the highest degree of variability around the mean list price of
209.99
, compared to other brands. - Electra, Haro, and Trek brands have less spread or dispersion of list prices around their respective means compared with other brands.
Summary
- Use the
VAR()
function to calculate the sample variances of values.