Measures of Variability
Key Questions
-
Answer:
In the formula for the population standard deviation, you divide by the population size $N$, whereas in the formula for the sample standard deviation, you divide by $n-1$ (the sample size minus one).
Explanation:
If $\mu$ is the mean of the population, the formula for the population standard deviation of the population data $x_{1}, x_{2}, x_{3}, \ldots, x_{N}$ is
$\sigma=\sqrt{\frac{\sum_{k=1}^{N}(x_{k}-\mu)^{2}}{N}}$.
If $\bar{x}$ is the mean of a sample, the formula for the sample standard deviation of the sample data $x_{1}, x_{2}, x_{3}, \ldots, x_{n}$ is
$s=\sqrt{\frac{\sum_{k=1}^{n}(x_{k}-\bar{x})^{2}}{n-1}}$.
The reason this is done is somewhat technical. Dividing by $n-1$ makes the sample variance $s^{2}$ a so-called unbiased estimator of the population variance $\sigma^{2}$. In effect, if the population is very large and you draw many, many random samples of the same size $n$ from it, the average of the resulting values of $s^{2}$ will be very close to $\sigma^{2}$ (and, from a theoretical perspective, the mean of $s^{2}$ as a "random variable" is exactly $\sigma^{2}$).
The technicalities of why this is true involve a lot of algebra with summations and are usually not worth the time for beginning students.
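The "many, many random samples" idea can be tried out numerically. The following is only a minimal sketch: the population (10,000 simulated values with mean 1000 and standard deviation 10), the sample size of 5, the number of trials, and the function name `average_variance_estimates` are all assumptions chosen for illustration. It draws repeated samples with replacement and averages the variance computed with divisor $n$ and with divisor $n-1$.

```python
import random
import statistics


def average_variance_estimates(population, n, trials, seed=0):
    """Average the two variance formulas over many random samples of size n."""
    rng = random.Random(seed)
    total_div_n = 0.0          # running sum of variances that divide by n
    total_div_n_minus_1 = 0.0  # running sum of variances that divide by n - 1
    for _ in range(trials):
        # Draw a random sample of size n (with replacement) from the population.
        sample = [rng.choice(population) for _ in range(n)]
        total_div_n += statistics.pvariance(sample)         # divisor n
        total_div_n_minus_1 += statistics.variance(sample)  # divisor n - 1
    return total_div_n / trials, total_div_n_minus_1 / trials


# Hypothetical population, simulated only for this illustration.
pop_rng = random.Random(42)
population = [pop_rng.gauss(1000, 10) for _ in range(10_000)]

sigma_squared = statistics.pvariance(population)  # true population variance
avg_div_n, avg_div_n_minus_1 = average_variance_estimates(
    population, n=5, trials=10_000
)

print("population variance:      ", sigma_squared)
print("average with divisor n:   ", avg_div_n)
print("average with divisor n-1: ", avg_div_n_minus_1)
```

With a small sample size such as 5, the average that divides by $n$ should come out noticeably below the population variance (by a factor of roughly $(n-1)/n$), while the average that divides by $n-1$ should land close to it.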
-
Standard deviation is the most widely used measure.
Range simply gives the difference between the lowest and highest values, so a few extreme values will alter it excessively.
The standard deviation $\sigma$ tells you where most of the values will be: in a normal distribution, about 68% of all values lie within one standard deviation of the mean $\mu$, and about 95% lie within two standard deviations of the mean.
Example:
You have a filling machine that fills kilogram bags of sugar. It will not fill exactly 1000 g every time; the standard deviation is 10 g.
Then you know that about 68% of the bags are between 990 g and 1010 g, and about 95% are between 980 g and 1020 g, a total span of 20 g or 40 g respectively.
Every now and again a bag will be far over-filled (say 1100 g), and sometimes a bag will end up empty (0 g), so the range will be a total of 1100 g.
You may decide which of the two gives a better idea of the spread in this distribution.
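To make the comparison concrete, here is a rough sketch of the sugar-bag example. The 1,000 bag weights are simulated from a normal distribution with mean 1000 g and standard deviation 10 g (an assumption for illustration, not real machine data), and the helper name `spread` is hypothetical. The two faulty bags from the example (1100 g and 0 g) are then added to see how each measure reacts.

```python
import random
import statistics

rng = random.Random(1)

# Simulated fills in grams: an assumption standing in for real machine data.
weights = [rng.gauss(1000, 10) for _ in range(1000)]


def spread(values):
    """Return (range, standard deviation) of the values."""
    return max(values) - min(values), statistics.pstdev(values)


range_clean, sd_clean = spread(weights)

# Add the over-filled bag and the empty bag from the example.
with_faults = weights + [1100, 0]
range_faulty, sd_faulty = spread(with_faults)

# Share of the ordinary bags within one standard deviation of their mean.
mean_clean = statistics.fmean(weights)
within_one_sd = sum(abs(w - mean_clean) <= sd_clean for w in weights) / len(weights)

print("range without faulty bags:", range_clean)
print("range with faulty bags:   ", range_faulty)
print("SD without faulty bags:   ", sd_clean)
print("SD with faulty bags:      ", sd_faulty)
print("share within one SD:      ", within_one_sd)
```

Two bags out of more than a thousand are enough to push the range to 1100 g. The standard deviation also grows when they are included, since it is not immune to extreme values either.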
-
SD: it gives you a numerical value for the variation of the data.
Range: it gives you the maximal and minimal values of all the data.
Mean: a single value that represents the average of the data. It is not representative in asymmetrical distributions, and it is influenced by outliers.
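The effect of an outlier on the mean can be seen with a very small example. The numbers below are made up purely for illustration.

```python
import statistics

typical = [10, 11, 12, 13, 14]  # hypothetical, symmetric data
with_outlier = typical + [100]  # the same data plus one extreme value

mean_typical = statistics.mean(typical)
mean_with_outlier = statistics.mean(with_outlier)

print("mean without outlier:", mean_typical)
print("mean with outlier:   ", mean_with_outlier)
```

After the single extreme value is added, the mean is larger than every one of the five original values, so it no longer describes where most of the data lie.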