# Measures of Variability

## Key Questions

In the formula for a population standard deviation, you divide by the population size $N$, whereas in the formula for the sample standard deviation, you divide by $n - 1$ (the sample size minus one).

#### Explanation:

If $\mu$ is the mean of the population, the formula for the population standard deviation of the population data ${x}_{1}, {x}_{2}, {x}_{3}, \ldots, {x}_{N}$ is

$\sigma = \sqrt{\frac{{\sum}_{k = 1}^{N} {\left({x}_{k} - \mu\right)}^{2}}{N}}$.

If $\overline{x}$ is the mean of a sample, the formula for the sample standard deviation of the sample data ${x}_{1}, {x}_{2}, {x}_{3}, \ldots, {x}_{n}$ is

$s = \sqrt{\frac{{\sum}_{k = 1}^{n} {\left({x}_{k} - \overline{x}\right)}^{2}}{n - 1}}$.
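To make the difference between the two divisors concrete, here is a minimal Python sketch of both formulas. The function names `population_sd` and `sample_sd` and the data values are made up for illustration; the results are compared against `statistics.pstdev` and `statistics.stdev` from the standard library.

```python
import math
import statistics


def population_sd(values):
    """Population standard deviation: divide the squared deviations by N."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_sd(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


# Illustrative data only (not from the original answer).
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_sd(data))  # 2.0, since the squared deviations sum to 32 and 32 / 8 = 4
print(sample_sd(data))      # sqrt(32 / 7), slightly larger than 2.0

# The standard library implements the same two formulas.
print(statistics.pstdev(data), statistics.stdev(data))
```

For the same data, the sample formula always gives the larger value, and the gap shrinks as $n$ grows.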

The reason for dividing by $n - 1$ is somewhat technical. Doing so makes the sample variance ${s}^{2}$ a so-called unbiased estimator of the population variance ${\sigma}^{2}$. In effect, if the population is very large and you draw many, many random samples of the same size $n$ from it, the average of the resulting values of ${s}^{2}$ will be very close to ${\sigma}^{2}$ (and, from a theoretical perspective, the mean of ${s}^{2}$ as a "random variable" is exactly ${\sigma}^{2}$).

The proof of this fact involves a lot of algebra with summations and is usually not worth the time for beginning students.
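Without doing the algebra, the unbiasedness claim can be checked numerically. The sketch below is one way to do it, under stated assumptions: it uses a tiny hypothetical population (the faces of a die) and enumerates every possible sample of size 3 drawn with replacement, which mimics independent draws from a very large population. It then averages the variance over all samples, once with the $n - 1$ divisor and once with the $n$ divisor.

```python
import itertools
import statistics

# Hypothetical population, chosen only for illustration.
population = [1, 2, 3, 4, 5, 6]
n = 3  # sample size

# Population variance (divisor N); for the values 1..6 this is 35/12.
sigma_squared = statistics.pvariance(population)

# Every equally likely sample of size n, drawn with replacement.
samples = list(itertools.product(population, repeat=n))

# Average of the sample variance s^2 (divisor n - 1) over all samples.
mean_s2 = statistics.fmean(statistics.variance(s) for s in samples)

# Average of the "divide by n" variance over the same samples.
mean_biased = statistics.fmean(statistics.pvariance(s) for s in samples)

print(sigma_squared)  # population variance
print(mean_s2)        # matches the population variance up to rounding
print(mean_biased)    # smaller by the factor (n - 1) / n
```

Dividing by $n$ underestimates ${\sigma}^{2}$ on average by the factor $\frac{n - 1}{n}$, and dividing by $n - 1$ removes exactly that shortfall.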

• The standard deviation is the most widely used measure.

The range simply gives the difference between the lowest and highest values, so a few extreme values can alter it drastically.

The standard deviation $\sigma$ tells you where most of the values will be: in a normal distribution, about 68% of all values lie within one standard deviation of the mean $\mu$, and about 95% lie within two standard deviations of the mean.

Example:
You have a filling machine that fills kilogram bags of sugar. It will not fill exactly $1000 g$ every time; suppose the standard deviation is $10 g$.
Then, if the fills are roughly normally distributed around $1000 g$, you know that about 68% of the bags are between $990 g$ and $1010 g$, and about 95% are between $980 g$ and $1020 g$, a total span of $20 g$ or $40 g$ respectively.
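These percentages can be checked with the standard library's `statistics.NormalDist` (Python 3.8+). This is a small sketch that assumes the fills follow a normal distribution with mean $1000 g$ and standard deviation $10 g$; the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed model of the filling machine: normal, mean 1000 g, SD 10 g.
fill = NormalDist(mu=1000, sigma=10)

# Probability that a bag falls within one and two standard deviations.
within_one_sd = fill.cdf(1010) - fill.cdf(990)
within_two_sd = fill.cdf(1020) - fill.cdf(980)

print(f"{within_one_sd:.4f}")  # 0.6827
print(f"{within_two_sd:.4f}")  # 0.9545
```

The familiar 68% and 95% figures are rounded versions of these two values.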

Every now and again a bag will be far over-filled (say $1100 g$) and sometimes a bag will end up empty ($0 g$), so the range will be a total of $1100 g$.

You may decide which of the two gives a better idea of the spread in this distribution.
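To help with that comparison, here is a rough sketch using made-up fill weights: 10,000 ordinary bags plus one empty bag and one far over-filled bag. The weights and the helper `value_range` are hypothetical, chosen only to mirror the example above.

```python
import statistics

# Made-up fill weights in grams, repeated to represent 10,000 ordinary bags.
typical_bags = [985, 990, 995, 1000, 1000, 1000, 1005, 1010, 1015, 1000] * 1000

# The same bags plus one empty bag and one far over-filled bag.
with_extremes = typical_bags + [0, 1100]


def value_range(values):
    """Difference between the highest and lowest value."""
    return max(values) - min(values)


print(value_range(typical_bags))   # 30
print(value_range(with_extremes))  # 1100

# The standard deviation also grows, but far less dramatically.
print(statistics.pstdev(typical_bags))
print(statistics.pstdev(with_extremes))
```

Two unusual bags out of more than ten thousand multiply the range many times over, while the standard deviation changes far less.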

• SD: it gives you a numerical value for the variation of the data.
Range: it tells you the span between the minimum and maximum values of the data.

Mean: a single point value that represents the average of the data. It is not representative of asymmetrical distributions, and it is influenced by outliers.