When you find the upper and lower quartiles, what happens if the median is not in the data set?

For example:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

The median is 6.5

To find the upper and lower quartiles, you need to find the median of the data to the left of the median and the median of the data to the right of it. But what happens when the median is not in the data set? How do you find the upper and lower quartiles?

1 Answer
May 22, 2018

Use 0.25(n+1) and 0.75(n+1) to calculate the positions that the quartiles would be in. Compute the appropriate weighted average of the elements on either side of these positions.

Q_1 = 3.25.
Q_3 = 9.75.

Explanation:

The median is often introduced as "the middle term of an ordered set". Of course, there is a catch: there's only a 1-in-2 chance a set will have a middle term, because sets with an even number of elements have no middle term.

At this point, we use a formula to estimate what the middle term would be, if it existed. In words, the formula is "the average of the two middle terms."

For a set of size n, the median is the term that would be in position 0.5(n+1). Think of this as "halfway through a set of size n+1". For example, if n=11, we get 0.5(11+1) = 6, and if n=12, we get 0.5(12+1) = 6.5. The element that would be at position 6.5 is halfway between the elements in positions 6 and 7. This agrees with what we know about medians. So far, so good.
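The position rule for the median can be sketched in Python. This is only an illustration of the 0.5(n+1) idea described above; the function name `median_by_position` is made up for this example, and it assumes the data can be sorted and is not empty.

```python
def median_by_position(data):
    """Median as the value at 1-based position 0.5(n+1) of the sorted data."""
    xs = sorted(data)
    n = len(xs)
    pos = 0.5 * (n + 1)      # 1-based position, e.g. 6 or 6.5
    w = int(pos)             # whole part of the position
    frac = pos - w           # fractional part: 0 or 0.5
    if frac == 0:
        return xs[w - 1]     # the position lands on an actual element
    # Otherwise go halfway from element w to element w+1.
    return xs[w - 1] + frac * (xs[w] - xs[w - 1])


print(median_by_position(range(1, 12)))  # n = 11, position 6   -> 6
print(median_by_position(range(1, 13)))  # n = 12, position 6.5 -> 6.5
```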


The lower and upper quartiles (Q_1 and Q_3) may be similarly introduced as "the medians of the subsets of data to the left/right of the median". What we really mean is, they are the elements that are 1/4 and 3/4 of the way through the set. But when the original median is not an element of the set, we need to resort to a formula that gives us what the 1/4 and 3/4 terms would be, if they existed.

To find the quartiles, we modify the formula for the median's position to give us the quartiles' positions. For the lower quartile, its position is 0.25(n+1). Likewise, the upper quartile's position is 0.75(n+1).

These position numbers could be integers (i.e. when n+1 is a multiple of 4). For example, if n=11, then we get 0.25(11+1) = 3 and 0.75(11+1) = 9, giving us quartiles that are elements of the set.

If n=12, then we get 0.25(12+1) = 3.25 and 0.75(12+1) = 9.75. In this case, Q_1 is 25% of the way between elements 3 and 4, while Q_3 is 75% of the way between elements 9 and 10. How do we calculate these "elements"?

We use Q_1 = x_3 + 0.25(x_4-x_3) (i.e. the 3rd element, plus 25% of the distance to the 4th element). Likewise, Q_3 = x_9 + 0.75(x_10-x_9).

For this question, Q_1 = 3 + 0.25(4-3) = 3 + 0.25(1) = 3.25, and Q_3 = 9 + 0.75(10-9) = 9 + 0.75(1) = 9.75. The fact that these quartiles equal their positions is a coincidence: the given data are the consecutive integers 1 to 12, so every element equals its own position. Usually, the positions and the "elements" will not match.
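As a rough sketch, the quartile calculation above could be written in Python as follows. The function name `quartiles`, the helper `value_at`, and the second data set `sample` are invented for this illustration; the check for at least 3 values is an assumption, added because the positions fall outside the set for smaller n.

```python
import math


def quartiles(data):
    """Return (Q_1, Q_3) using positions 0.25(n+1) and 0.75(n+1)."""
    xs = sorted(data)
    n = len(xs)
    if n < 3:
        # Assumption: with fewer than 3 values the positions leave the set.
        raise ValueError("need at least 3 values")

    def value_at(pos):
        w = math.floor(pos)   # whole part of the 1-based position
        frac = pos - w        # how far to move toward the next element
        if frac == 0:
            return xs[w - 1]
        return xs[w - 1] + frac * (xs[w] - xs[w - 1])

    return value_at(0.25 * (n + 1)), value_at(0.75 * (n + 1))


print(quartiles(range(1, 13)))   # (3.25, 9.75)

# Hypothetical data where values and positions differ.
sample = [2, 4, 4, 5, 7, 8, 10, 11, 13, 14, 18, 20]
print(quartiles(sample))         # (4.25, 13.75)
```

In the second data set the positions are still 3.25 and 9.75, but the quartiles are 4 + 0.25(5-4) = 4.25 and 13 + 0.75(14-13) = 13.75.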

Summary:

The p-th percentile is the element in position (p/100)(n+1). Writing this position as w.dd (for "whole"."decimal"), the percentile's value is x_w + 0.dd(x_(w+1) - x_w).
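A minimal Python sketch of this summary rule is given below. The function name `percentile` is made up for the example, and raising an error when the position falls outside the data is an assumption, since the answer does not say what to do in that case. As a cross-check, Python's standard `statistics.quantiles` with `method="exclusive"` also places cut points using n+1, so it should agree on the quartiles.

```python
import math
import statistics


def percentile(data, p):
    """Value at 1-based position (p/100)(n+1) of the sorted data."""
    xs = sorted(data)
    n = len(xs)
    pos = (p / 100) * (n + 1)
    w = math.floor(pos)       # the "whole" part of w.dd
    frac = pos - w            # the "decimal" part, as a fraction
    if w < 1 or w > n or (w == n and frac > 0):
        # Assumption: positions outside 1..n are treated as an error.
        raise ValueError("position falls outside the data")
    if frac == 0:
        return xs[w - 1]
    return xs[w - 1] + frac * (xs[w] - xs[w - 1])


data = list(range(1, 13))
print([percentile(data, p) for p in (25, 50, 75)])            # [3.25, 6.5, 9.75]
print(statistics.quantiles(data, n=4, method="exclusive"))    # [3.25, 6.5, 9.75]
```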