SQL Server VAR Function

Summary: in this tutorial, you will learn how to use the SQL Server VAR() function to calculate the sample variances of values.

Introduction to the sample variance

A sample variance measures how far a set of numbers is spread out from their average value (mean). A sample variance allows you to get insights into the variability within a sample dataset.

To calculate a sample variance of a set of n numbers, you follow these steps:

First, calculate the average (or mean) by dividing the sum of all numbers by the number of values in the set.
Second, calculate the total of square differences.
Third, divide the total of square differences by the number of values minus one.

For example, given a sample set {1, 2, 3 }, you can calculate the sample variance as follows:

First, calculate the mean:

(1 + 2 + 3) / 3 = 2

Second, calculate the total square difference between each number and the mean:

(1−2)²+(2−2)²+(3−2)² = 2

Third, calculate the sample variance:

2 / (3 – 1) = 1

The sample variance is 1.

The SQL Server VAR() aggregate function

In SQL Server, you can use the VAR() aggregate function to calculate the sample variance of a set of numbers.

Here’s the syntax of the VAR() function:

VAR( [ALL | DISTINCT] expression)Code language: SQL (Structured Query Language) (sql)

In this syntax:

ALL: This option instructs the function to use all values including duplicate ones for calculating the sample variance.
DISTINCT: This option instructs the function to use unique values for calculating the sample variance.

The VAR() function uses the ALL option by default. Please note that the VAR() function ignores NULL.

The VAR() function returns the sample variance as a float number. It returns NULL if there is one or no row in the sample.

SQL Server VAR() function examples

Let’s explore some examples that use the SQL Server VAR() function.

1) Basic SQL Server VAR() function examples

First, create a sample table called t with one column id that has some numbers 1, 2, 2, 3, and NULL:

CREATE TABLE t (id INT);

INSERT INTO
  t (id)
VALUES
  (1),
  (2),
  (2),
  (3),
  (NULL);

SELECT * FROM t;Code language: SQL (Structured Query Language) (sql)

Output:

id
-----------
1
2
2
3
NULLCode language: SQL (Structured Query Language) (sql)

Second, calculate the sample variance of unique values in the id column of the t table using the VAR() function with DISTINCT option:

SELECT
  VAR(DISTINCT id) variance
FROM
  t;Code language: SQL (Structured Query Language) (sql)

Output:

variance
----------------------
1
Warning: Null value is eliminated by an aggregate or other SET operation.

(1 row affected)Code language: SQL (Structured Query Language) (sql)

The sample variance is 1.

Second, calculate the sample variance of all values including duplicates in the id column of the t table using the VAR() function:

variance
----------------------
0.666666666666667
Warning: Null value is eliminated by an aggregate or other SET operation.

(1 row affected)Code language: SQL (Structured Query Language) (sql)

The sample variance is 0.666666666666667.

2) Practical VAR() function examples

We’ll use the tables from the sample database for the demonstration:

The following example uses the VAR() function to calculate the sample variance of list prices of the brand id 1 and category id 5 in the production.products table:

SELECT
  VAR(list_price) var_list_price
FROM
  production.products
WHERE
  brand_id = 1
  AND category_id = 5;Code language: SQL (Structured Query Language) (sql)

Output:

var_list_price
----------------------
40000Code language: SQL (Structured Query Language) (sql)

To interpret the sample variance, we can include the average list price in the result set:

SELECT
  VAR(list_price) var_list_price,
  AVG(list_price) avg_list_price
FROM
  production.products
WHERE
  brand_id = 1
  AND category_id = 5;Code language: SQL (Structured Query Language) (sql)

Output:

var_list_price | avg_list_price
---------------+---------------
40000.0        | 2799.990000Code language: SQL (Structured Query Language) (sql)

A sample variance of 40,000.0 means that the list prices of products within the category id 5 and brand 1 have a relatively high degree of variability around the mean list price of 2,799.99.

The following example uses the VAR() function to calculate the sample variances of list prices of products in category id 1 for each brand:

SELECT
  b.brand_name,
  AVG(list_price) avg_list_price,
  VAR(list_price) var_list_price
FROM
  production.products p
INNER JOIN production.brands b ON b.brand_id = p.brand_id
WHERE
  category_id = 1 
GROUP BY
  b.brand_name;Code language: SQL (Structured Query Language) (sql)

Output:

brand_name   | avg_list_price | var_list_price
-------------+----------------+--------------------
Electra      | 330.347142     | 3248.015873015962
Haro         | 249.990000     | 3199.9999999999905
Strider      | 209.990000     | 11200.0
Sun Bicycles | 109.990000     | NULL
Trek         | 260.424782     | 8040.7114624506585
(5 rows)Code language: SQL (Structured Query Language) (sql)

The output indicates that:

The Sun Bicycles brand has a sample variance of the list price NULL because it has one product with a list price of 109.99 as indicated in the avg_list_price column.
The Strider brand has the highest degree of variability around the mean list price of 209.99, compared to other brands.
Electra, Haro, and Trek brands have less spread or dispersion of list prices around their respective means compared with other brands.

Summary

Use the VAR() function to calculate the sample variances of values.

Was this tutorial helpful?