Variance

Variance is the average of the squared differences of a random variable from its mean. It is a statistical measurement of variability that indicates how far a set of numbers varies from the mean. A high variance tells us that the collected data has higher variability, and the data is generally further from the mean. A low variance tells us the opposite, that the collected data is generally similar, and does not deviate much from the mean.

Variance is used throughout statistics in areas such as descriptive statistics, inferential statistics, hypothesis testing, and more.

Variance formulas

The formula for variance depends on whether the variance is being calculated for a population or a sample. A statistical population is any complete group of observations or objects from which a sample is taken, while a sample is a subset of a population. Because it can be impractical or even impossible to collect data for an entire population, samples are often gathered and then used to make generalizations or inferences about the population as a whole.

Population variance

Variance is commonly denoted $\sigma^2$ for a population and $s^2$ for a sample. The population variance is calculated as:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where $\sigma^2$ is the variance of the population, $x_i$ is the $i$-th element in the set, $\mu$ is the population mean, and $N$ is the population size.

Another form of the population variance formula that can be computationally simpler (when calculating variance by hand) is:

$$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2$$

Refer to the variance formula page to see the algebra involved in re-arranging the formula.
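
As a rough illustration, the two forms of the population variance can be compared numerically. The sketch below is a minimal example, not a reference implementation: the function names and the `scores` data are hypothetical, chosen only for illustration, and the result is cross-checked against `statistics.pvariance` from the Python standard library.

```python
import math
import statistics


def population_variance(values):
    """Definition form: average squared deviation from the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_variance_shortcut(values):
    """Shortcut form: mean of the squares minus the square of the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum(x * x for x in values) / n - mu ** 2


# Hypothetical data, treated here as a complete population.
scores = [4, 8, 15, 16, 23, 42]

var_definition = population_variance(scores)
var_shortcut = population_variance_shortcut(scores)

print(var_definition, var_shortcut, statistics.pvariance(scores))
```

The shortcut form is convenient by hand, but in floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the definition form (or a library routine) is usually the safer choice in code.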

Sample variance

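The formula for sample variance is similar to that for a population, with two adjustments: the sample mean replaces the population mean, and the sum is divided by $n - 1$ rather than $n$. Dividing by $n - 1$ (Bessel's correction) compensates for the fact that deviations measured from the sample mean tend to be smaller than deviations from the true population mean:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $s^2$ is the variance of the sample, $x_i$ is the $i$-th element in the set, $\bar{x}$ is the sample mean, and $n$ is the sample size.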

Another form of the sample variance formula that can be computationally simpler (when calculating variance by hand) is:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$$

Refer to the variance formula page to see the algebra involved in re-arranging the formula.
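
The sample variance divides the sum of squared deviations by n − 1 rather than n. One way to see why is a small simulation. The sketch below is only an illustration under stated assumptions: the population (normal, mean 10, standard deviation 2, so variance 4), the sample size, the number of trials, and the seed are all arbitrary choices, not values from this page.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_VARIANCE = 4.0      # assumed population: normal, mean 10, sd 2
SAMPLE_SIZE = 5
TRIALS = 20_000

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(TRIALS):
    sample = [rng.gauss(10.0, 2.0) for _ in range(SAMPLE_SIZE)]
    x_bar = sum(sample) / SAMPLE_SIZE
    ss = sum((x - x_bar) ** 2 for x in sample)   # sum of squared deviations
    divide_by_n.append(ss / SAMPLE_SIZE)
    divide_by_n_minus_1.append(ss / (SAMPLE_SIZE - 1))

# Average each estimator over all trials and compare with the true variance.
avg_n = statistics.fmean(divide_by_n)
avg_n_minus_1 = statistics.fmean(divide_by_n_minus_1)
print(f"true variance:                  {TRUE_VARIANCE:.3f}")
print(f"average when dividing by n:     {avg_n:.3f}")
print(f"average when dividing by n - 1: {avg_n_minus_1:.3f}")
```

Averaged over many samples, dividing by n tends to underestimate the population variance (here by a factor of about 4/5), while dividing by n − 1 lands close to the true value.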

Example

Find the variance of the following weights (lbs) obtained from a sample of students: 127, 134, 155, 171, and 202.

1. Calculate the sample mean:

$$\bar{x} = \frac{127 + 134 + 155 + 171 + 202}{5} = \frac{789}{5} = 157.8$$

2. Calculate the sum of squares (SS):

$$\begin{aligned}
SS &= \sum_{i=1}^{n} (x_i - \bar{x})^2 \\
&= (127 - 157.8)^2 + (134 - 157.8)^2 + (155 - 157.8)^2 + (171 - 157.8)^2 + (202 - 157.8)^2 \\
&= 3650.8
\end{aligned}$$

3. Calculate the sample variance:

$$s^2 = \frac{SS}{n - 1} = \frac{3650.8}{5 - 1} = 912.7$$

If we used the simplified version of the sample variance formula instead, the summation that we need to compute is simpler:

$$\sum_{i=1}^{n} x_i^2 = 127^2 + 134^2 + 155^2 + 171^2 + 202^2 = 128155$$

Then, plugging the mean and the result of the summation into the simplified formula yields:

$$s^2 = \frac{128155 - 5(157.8)^2}{5 - 1} = \frac{128155 - 124504.2}{4} = \frac{3650.8}{4} = 912.7$$

Thus, both formulas give a variance of 912.7 lbs², as expected, since the two forms are algebraically equivalent.
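
The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch that follows the steps above; the variable names are illustrative, and `statistics.variance` from the standard library (which computes the sample variance) is used only as a cross-check.

```python
import math
import statistics

weights = [127, 134, 155, 171, 202]   # lbs, the sample from the example
n = len(weights)

# Steps 1 to 3: mean, sum of squared deviations, sample variance.
x_bar = sum(weights) / n
ss = sum((x - x_bar) ** 2 for x in weights)
s2_definition = ss / (n - 1)

# Simplified form: sum of the raw squares, then subtract n times the squared mean.
sum_of_raw_squares = sum(x * x for x in weights)
s2_shortcut = (sum_of_raw_squares - n * x_bar ** 2) / (n - 1)

print(f"mean = {x_bar:.1f}, SS = {ss:.1f}")
print(f"sample variance (definition) = {s2_definition:.1f}")
print(f"sample variance (shortcut)   = {s2_shortcut:.1f}")
print(f"statistics.variance          = {statistics.variance(weights):.1f}")
```

All three variance values should agree up to floating-point rounding.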

Variance and standard deviation

Variance is commonly used to calculate the standard deviation, another measure of variability. Standard deviation is a rough measure of how much a set of numbers varies on either side of their mean, and is calculated as the square root of variance (so if the variance is known, it is fairly simple to determine the standard deviation).

One drawback of variance is that it is expressed in squared units (units²), which can be difficult to interpret. Because standard deviation is the square root of variance, it is expressed in the same units as the random variable itself, which usually makes it the easier measure to interpret.

For instance, given a set of heights measured in meters, the variance would be in units of squared meters. In contrast, the standard deviation would be measured in meters.
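
The relationship between the two measures can be sketched in Python. The example below reuses the student weights from the earlier example rather than heights, since those are the only raw data on this page. It assumes the data are a sample, so `statistics.variance` (the n − 1 form) is used.

```python
import math
import statistics

weights = [127, 134, 155, 171, 202]   # lbs, sample from the earlier example

variance = statistics.variance(weights)   # sample variance, in lbs squared
std_dev = math.sqrt(variance)             # square root brings the units back to lbs

print(f"variance:           {variance:.1f} lbs^2")
print(f"standard deviation: {std_dev:.2f} lbs")
```

A standard deviation of roughly 30 lbs can be compared directly with the weights themselves, whereas a variance of about 913 lbs² cannot.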

Example

Given that the mean of the measured heights is 1.7 m with a standard deviation of 0.1 m, find the range of heights that fall within one, two, and three standard deviations.

  • 1 standard deviation: 1.7 ± 0.1, or 1.6 to 1.8 m
  • 2 standard deviations: 1.7 ± 2(0.1), or 1.5 to 1.9 m
  • 3 standard deviations: 1.7 ± 3(0.1), or 1.4 to 2.0 m

In any normal distribution, approximately 68% of the values lie within 1 standard deviation of the mean, 95% lie within 2 standard deviations, and 99.7% lie within 3 standard deviations. Assuming the heights above are normally distributed, this means that about 68% of heights fall between 1.6 and 1.8 m, 95% between 1.5 and 1.9 m, and 99.7% between 1.4 and 2.0 m.
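
These intervals and percentages can be reproduced with `statistics.NormalDist` from the Python standard library. The sketch below assumes, as the paragraph above does, that the heights follow a normal distribution with mean 1.7 m and standard deviation 0.1 m. The variable names are illustrative.

```python
from statistics import NormalDist

mean_height = 1.7   # meters, from the example
std_dev = 0.1       # meters, from the example

# Assumption: heights are normally distributed with these parameters.
heights = NormalDist(mu=mean_height, sigma=std_dev)

coverage = {}
for k in (1, 2, 3):
    low = mean_height - k * std_dev
    high = mean_height + k * std_dev
    # Probability mass between low and high under the assumed distribution.
    coverage[k] = heights.cdf(high) - heights.cdf(low)
    print(f"within {k} SD: {low:.1f} to {high:.1f} m, about {coverage[k]:.1%}")
```

The computed proportions are slightly more precise than the rounded 68%, 95%, and 99.7% figures quoted above (for example, the two-standard-deviation interval covers a little over 95%).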