Measures of Variability

Key Questions

  • Answer:

    In the formula for a population standard deviation, you divide by the population size #N#, whereas in the formula for the sample standard deviation, you divide by #n-1# (the sample size minus one).

    Explanation:

    If #mu# is the mean of the population, the formula for the population standard deviation of the population data #x_{1},x_{2},x_{3},\ldots, x_{N}# is

    #sigma=sqrt{\frac{sum_{k=1}^{N}(x_{k}-mu)^{2}}{N}}#.

    If #bar{x}# is the mean of a sample, the formula for the sample standard deviation of the sample data #x_{1},x_{2},x_{3},\ldots, x_{n}# is

    #s=sqrt{\frac{sum_{k=1}^{n}(x_{k}-bar{x})^{2}}{n-1}}#.

    The reason for dividing by #n-1# is somewhat technical. Doing so makes the sample variance #s^{2}# a so-called unbiased estimator of the population variance #sigma^{2}#. In effect, if the population is very large and you draw many random samples of the same size #n# from it, the average of the resulting values of #s^{2}# will be very close to #sigma^{2}# (and, from a theoretical perspective, the mean of #s^{2}# as a "random variable" is exactly #sigma^{2}#).

    The proof of this involves a lot of algebra with summations and is usually not worth the time for beginning students.
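
The unbiasedness claim can be checked numerically. The following is a minimal Python sketch, not part of the original answer: the population (10,000 values drawn from a normal distribution with mean 1000 and standard deviation 10), the sample size #n=5#, the number of trials, and the helper name `mean_of_sample_variances` are all assumptions chosen for illustration. Samples are drawn with replacement, which mimics sampling from a very large population.

```python
import random
import statistics

def mean_of_sample_variances(population, n, trials, seed=0):
    """Average the (n - 1) and n versions of the sample variance over many samples."""
    rng = random.Random(seed)
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(trials):
        # Draw a random sample of size n, with replacement.
        sample = [rng.choice(population) for _ in range(n)]
        total_unbiased += statistics.variance(sample)   # divides by n - 1
        total_biased += statistics.pvariance(sample)    # divides by n
    return total_unbiased / trials, total_biased / trials

# Hypothetical population, assumed for this illustration only.
pop_rng = random.Random(42)
population = [pop_rng.gauss(1000, 10) for _ in range(10_000)]

sigma2 = statistics.pvariance(population)  # population variance (divides by N)
unbiased, biased = mean_of_sample_variances(population, n=5, trials=10_000)

print(f"population variance:        {sigma2:.1f}")
print(f"average s^2 (divide by n-1): {unbiased:.1f}")
print(f"average with divide by n:    {biased:.1f}")
```

The average of the #n-1# version should land close to #sigma^{2}#, while dividing by #n# gives an average near #(n-1)/n# times #sigma^{2}# (0.8 times for #n=5#). That shortfall is the bias that dividing by #n-1# removes.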

  • The standard deviation is the most widely used measure.

    The range simply gives the difference between the lowest and highest values, so a few extreme values will alter it excessively.

    The standard deviation #sigma# tells you where most of the values will be: in a normal distribution, about 68% of all values lie within one standard deviation of the mean #mu#, and about 95% lie within two standard deviations of the mean.

    Example:
    You have a filling machine that fills kilogram bags of sugar. It will not fill exactly #1000g# every time; suppose the standard deviation is #10g#.
    Then you know (assuming the fill weights are normally distributed) that about #68%# of the bags are between #990g# and #1010g#, and about #95%# are between #980g# and #1020g#, a total span of #20g# or #40g# respectively.

    Every now and again a bag will be far over-filled (say #1100g#) and sometimes a bag will end up empty (#0g#), so the range will be a total of #1100g#.

    You may decide which of the two gives a better idea of the spread in this distribution.
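
The sugar-bag example can be sketched in Python. This is an illustration under stated assumptions, not the answerer's own calculation: the fill weights are simulated from a normal distribution with mean #1000g# and standard deviation #10g#, the number of bags (10,000) is arbitrary, and the helper names `spread` and `share_within` are made up for this sketch.

```python
import random
import statistics

# Simulated fill weights in grams (assumed normal, mean 1000, SD 10).
rng = random.Random(1)
bags = [rng.gauss(1000, 10) for _ in range(10_000)]

def spread(values):
    """Return (range, population standard deviation) of the values."""
    return max(values) - min(values), statistics.pstdev(values)

def share_within(values, center, half_width):
    """Fraction of values no further than half_width from center."""
    return sum(abs(v - center) <= half_width for v in values) / len(values)

within_1sd = share_within(bags, 1000, 10)   # expected to be roughly 0.68
within_2sd = share_within(bags, 1000, 20)   # expected to be roughly 0.95

range_clean, sd_clean = spread(bags)
# Add one far over-filled bag and one empty bag, as in the example.
range_faulty, sd_faulty = spread(bags + [1100.0, 0.0])

print(f"within 990-1010 g: {within_1sd:.1%}, within 980-1020 g: {within_2sd:.1%}")
print(f"without faulty bags: range = {range_clean:.0f} g, SD = {sd_clean:.1f} g")
print(f"with faulty bags:    range = {range_faulty:.0f} g, SD = {sd_faulty:.1f} g")
```

In this sketch the two faulty bags stretch the range to #1100g#, many times its previous value, while the standard deviation grows by a far smaller factor.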

  • SD: it gives you a numerical value for the variation of the data.
    Range: it gives you the distance between the minimum and maximum values of the data.

    Mean: a single (point) value that represents the average of the data. It is not representative of asymmetric distributions, and it is influenced by outliers.
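
A small numeric illustration of how one outlier affects these three quantities. The data values and the helper name `summarize` are hypothetical, chosen only for this sketch.

```python
import statistics

data = [2, 3, 3, 4, 5]        # hypothetical, roughly symmetric values
with_outlier = data + [40]    # the same values plus one extreme value

def summarize(values):
    """Return (mean, range, population standard deviation)."""
    return (
        statistics.mean(values),
        max(values) - min(values),
        statistics.pstdev(values),
    )

mean_a, range_a, sd_a = summarize(data)
mean_b, range_b, sd_b = summarize(with_outlier)

print(f"without outlier: mean = {mean_a:.2f}, range = {range_a}, SD = {sd_a:.2f}")
print(f"with outlier:    mean = {mean_b:.2f}, range = {range_b}, SD = {sd_b:.2f}")
```

Here the mean moves from 3.4 to 9.5, which is larger than five of the six observations, so it no longer describes a typical value.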

Questions