Measures of Variability
Key Questions
In the formula for the population standard deviation, you divide by the population size $N$, whereas in the formula for the sample standard deviation, you divide by $n-1$ (the sample size minus one).
Explanation:
If $\mu$ is the mean of the population, the formula for the population standard deviation of the population data $x_{1},x_{2},x_{3},\ldots,x_{N}$ is
$$\sigma=\sqrt{\frac{\sum_{k=1}^{N}(x_{k}-\mu)^{2}}{N}}.$$
If $\bar{x}$ is the mean of a sample, the formula for the sample standard deviation of the sample data $x_{1},x_{2},x_{3},\ldots,x_{n}$ is
$$s=\sqrt{\frac{\sum_{k=1}^{n}(x_{k}-\bar{x})^{2}}{n-1}}.$$
The reason for dividing by $n-1$ is somewhat technical. Doing so makes the sample variance $s^{2}$ a so-called unbiased estimator of the population variance $\sigma^{2}$. In effect, if the population is very large and you draw many, many random samples of the same size $n$ from it, the average of the resulting values of $s^{2}$ will be very close to $\sigma^{2}$ (and, from a theoretical perspective, the mean of $s^{2}$ as a "random variable" is exactly $\sigma^{2}$).
The technicalities of why this is true involve a lot of algebra with summations and are usually not worth the time for beginning students.
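
As a rough illustration of the two formulas and of the unbiasedness claim, the Python sketch below computes both standard deviations and then averages many variance estimates. The population (the integers 1 to 100), the sample size of 5, the number of trials, and all function names are assumptions chosen for illustration. Samples are drawn with replacement, which stands in for sampling from a very large population.

```python
import random
import statistics


def sum_squared_deviations(data):
    """Sum of squared deviations of the data from its own mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)


def population_sd(data):
    """Population standard deviation: divide by N."""
    return (sum_squared_deviations(data) / len(data)) ** 0.5


def sample_sd(data):
    """Sample standard deviation: divide by n - 1."""
    return (sum_squared_deviations(data) / (len(data) - 1)) ** 0.5


def average_variance_estimates(population, n, trials, seed=0):
    """Average the (n - 1)-divisor and n-divisor variance estimates
    over many random samples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_n_minus_1 = 0.0
    total_n = 0.0
    for _ in range(trials):
        sample = rng.choices(population, k=n)
        ss = sum_squared_deviations(sample)
        total_n_minus_1 += ss / (n - 1)
        total_n += ss / n
    return total_n_minus_1 / trials, total_n / trials


# Hypothetical population, used only for this illustration.
population = list(range(1, 101))
sigma_squared = statistics.pvariance(population)

mean_s2, mean_biased = average_variance_estimates(population, n=5, trials=20000)

print("population variance:            ", sigma_squared)
print("average estimate, divide by n-1:", round(mean_s2, 1))
print("average estimate, divide by n:  ", round(mean_biased, 1))
```

Under these assumptions, the average of the estimates that divide by $n-1$ should land close to the population variance, while the average of the estimates that divide by $n$ should come out noticeably too small (by a factor of about $(n-1)/n$).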
Standard deviation is the most widely used measure.
Range simply gives the difference between the lowest and highest values, so a few extreme values will alter the range excessively.
The standard deviation $\sigma$ tells you where most of the values will be: in a normal distribution, about 68% of all values lie within one standard deviation of the mean $\mu$, and about 95% lie within two standard deviations of the mean.
Example:
You have a filling machine that fills kilogram bags of sugar. It will not fill exactly $1000$ g every time; the standard deviation is $10$ g.
Then you know that about 68% of the bags weigh between $990$ g and $1010$ g, and about 95% weigh between $980$ g and $1020$ g, a total span of $20$ g and $40$ g respectively.
Every now and again a bag will be far overfilled (say $1100$ g), and sometimes a bag will end up empty ($0$ g), so the range will be a total of $1100$ g.
You may decide which of the two gives a better idea of the spread in this distribution.
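
The figures in this example can be checked with a short Python sketch. It assumes the fill weights follow a normal distribution with mean 1000 g and standard deviation 10 g, and the individual bag weights listed are hypothetical values made up for illustration, as are the helper names.

```python
from statistics import NormalDist

# Assumed model of the filling machine: normal, mean 1000 g, SD 10 g.
filling = NormalDist(mu=1000, sigma=10)


def share_within(dist, k):
    """Share of values within k standard deviations of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)


def value_range(data):
    """Range: highest value minus lowest value."""
    return max(data) - min(data)


within_one = share_within(filling, 1)   # between 990 g and 1010 g
within_two = share_within(filling, 2)   # between 980 g and 1020 g

# Hypothetical bag weights in grams, without and with the two extreme bags.
typical_bags = [992, 1005, 998, 1011, 1003, 989, 1000, 1007]
all_bags = typical_bags + [1100, 0]

print(f"within one SD:  {within_one:.1%}")
print(f"within two SDs: {within_two:.1%}")
print("range of typical bags:     ", value_range(typical_bags), "g")
print("range with the two extremes:", value_range(all_bags), "g")
```

In this sketch, two unusual bags are enough to stretch the range from a few tens of grams to 1100 g, while the one and two standard deviation intervals still describe where the bulk of the bags fall.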
SD: it gives you a numerical value for the variation of the data.
Range: it gives you the span between the minimal and maximal values of the data.
Mean: a point value that represents the average of the data. It is not representative in asymmetrical distributions, and it is influenced by outliers.
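
To see how strongly a single outlier can move the mean and the range, here is a small Python sketch. The data values are hypothetical, and the median is included only as a point of comparison that is not part of the original answer.

```python
import statistics

# Hypothetical data, chosen only for illustration.
values = [10, 12, 11, 13, 12]
with_outlier = values + [100]   # same data plus one extreme value

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

range_before = max(values) - min(values)
range_after = max(with_outlier) - min(with_outlier)

print("mean without outlier: ", mean_before)
print("mean with outlier:    ", round(mean_after, 2))
print("median with outlier:  ", median_after)
print("range without outlier:", range_before)
print("range with outlier:   ", range_after)
```

With the outlier included, the mean sits above every value except the outlier itself, which is the sense in which it stops being representative of the data.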