The Empirical Rule
Let’s talk about one of the coolest applications of standard deviation – the Empirical Rule. The Empirical Rule tells us that, if we have normally distributed data, then the following will be true:
- about 68% of all scores will fall within 1 standard deviation of the mean (that is, within 1 standard deviation above or below the mean)
- about 95% of all scores will fall within 2 standard deviations of the mean
- and about 99.7%, or nearly all, of the scores will fall within 3 standard deviations of the mean.
The Empirical Rule is also sometimes called the 68-95-99.7 rule, because it provides those percentages noted above. You will want to remember those values because we’ll see just how useful they are.
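If you'd like to see where those three percentages come from, here is a minimal sketch in Python. It assumes a perfectly normal distribution and uses the standard library's error function. The function name `within_k_sd` is made up for this illustration.

```python
import math


def within_k_sd(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    # For a normal distribution, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within_k_sd(k):.1%}")
```

Running this should print values that round to 68.3%, 95.4%, and 99.7%, which is where the rule's name comes from.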
Let’s visualize what this means.
Here, we’ve got our normal distribution like we’ve seen before, but now we’ve added six equal sections marked off using vertical lines. Each of those sections represents a distance of one population standard deviation (little sigma, σ) away from the population mean, mu (μ). In this illustration, we’ve got a more detailed breakdown of the same percentages we have listed up above, so let’s talk through how they match up.
The Empirical Rule tells us we should see that about 68% of all possible scores will fall within 1 standard deviation of the mean (either above it or below it); this matches up with the 34.1% plus 34.1% we see from -σ to σ in the two middle sections on the graph.
Next, moving out to two standard deviations away (all the space between -2σ and 2σ on the graph), the Empirical Rule says that should total up to about 95% of all scores. And if we sum up the four sections in that range of -2σ to 2σ, 13.6% + 34.1% + 34.1% + 13.6%, we indeed get a sum of 95.4%.
Last, if we add up everything between -3σ and 3σ, we have a total of 2.1% + 13.6% + 34.1% + 34.1% + 13.6% + 2.1% = 99.6% of all possible scores. This matches up with the “about 99.7%” we have listed in the Empirical Rule percentages above; the small difference comes from rounding each section to one decimal place.
Notice that there are some scores indicated in the graph that fall outside of the range of -3σ to 3σ. These are very small percentages (0.1%), though, which tells us that scores this far away from the mean are very rare in a normal distribution. It’s certainly possible, but the chances are very small.
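To double-check the section-by-section percentages, here is a rough sketch that computes the area of each section directly. It assumes a standard normal distribution (mean 0, standard deviation 1) and uses `statistics.NormalDist` from Python 3.8+. The helper name `band` is just for this example.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1


def band(a: float, b: float) -> float:
    """Share of scores falling between a and b standard deviations from the mean."""
    return z.cdf(b) - z.cdf(a)


# One side of the curve; the other side is a mirror image.
sections = {
    "mean to 1σ": band(0, 1),
    "1σ to 2σ": band(1, 2),
    "2σ to 3σ": band(2, 3),
    "beyond 3σ": 1 - z.cdf(3),
}
for label, share in sections.items():
    print(f"{label}: {share:.2%}")

# Double the three inner sections to cover both sides of the mean.
within_3 = 2 * (band(0, 1) + band(1, 2) + band(2, 3))
print(f"within 3σ: {within_3:.2%}")
```

With the unrounded section areas, the total within 3σ comes out to about 99.73%, so the 99.6% we get by adding the rounded labels on the graph is just a rounding artifact.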
Let’s think through some examples to make this more concrete and start seeing how cool and useful this rule is.
Suppose we are thinking about visiting the famous Old Faithful geyser in Yellowstone National Park, and we want to be sure that we catch the geyser erupting, which it does every 90 minutes, on average. Let’s assume a standard deviation of 15 minutes, and a normal distribution of times between eruptions.
What are the chances we’ll have to wait only an hour to see it? What about two hours? Is there any chance we have to wait three hours? Let’s think through this using the Empirical Rule.
First, let’s fill in the distribution of the number of minutes between eruptions, according to the Empirical Rule. We’ll put the mean of 90 minutes in the middle as μ, then fill in each σ above and below that by adding or subtracting the corresponding number of standard deviations to μ.
For example, for one standard deviation below the mean (μ – σ), we’ll calculate 90 – 15 to get 75 minutes. We’ll repeat this for (μ – 2σ) and (μ – 3σ), and then on the right side by adding, for (μ + σ), (μ + 2σ), and (μ + 3σ).
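The same fill-in-the-marks step can be sketched in a few lines of Python. The variable names here are only for illustration; the mean of 90 and standard deviation of 15 are the values we assumed above.

```python
mean = 90  # average minutes between eruptions (assumed above)
sd = 15    # standard deviation in minutes (assumed above)

# Marks from 3 standard deviations below the mean to 3 above it.
marks = {k: mean + k * sd for k in range(-3, 4)}

for k, minutes in marks.items():
    print(f"{k:+d} standard deviations: {minutes} minutes")
```

This gives marks at 45, 60, 75, 90, 105, 120, and 135 minutes, so 60 minutes sits at μ – 2σ and 120 minutes sits at μ + 2σ.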
Now we can answer those questions we had above:
1. What are the chances we’ll have to wait only an hour to see it? For this one, we want to know the chances that we’ll get a “score” of 60 minutes (1 hour) or less.
For this distribution, 60 minutes is two standard deviations below the mean. And, according to the Empirical Rule, only about 2% of scores fall below that point (the 2.1% and 0.1% sections at the far left of the graph).
So, we have only about a 2% chance of seeing the geyser erupt in an hour or less. Good thing the park has a restaurant and gift shop for us to check out! 🙃
2. What about two hours? Two hours = 120 minutes, and 120 minutes is 2 standard deviations above the mean.
And according to the Empirical Rule, only about 2% of scores fall above that point, so the chances of being below that point (in other words, waiting 2 hours or less) equal about 98%.
Chances are very high (98%) that we won’t have to wait more than two hours to see the geyser erupt. This is helpful to know for planning our day around the rest of the park!
3. Is there any chance we have to wait three hours? Three hours = 180 minutes.
That would be “off the charts” here, well above the highest point we have mapped out at 135 minutes. In fact, 180 minutes is 6 standard deviations above the mean, since (180 – 90) / 15 = 6. So there’s technically a chance that we’d have to wait that long, but it’s very unlikely (well under 0.1%).
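If you want to check these three answers against exact values rather than the rule of thumb, here is a small sketch. It assumes the same normal distribution of wait times (mean 90, standard deviation 15) and uses `statistics.NormalDist` from Python 3.8+. The variable names are invented for this example.

```python
from statistics import NormalDist

# Assumed model from above: wait times are normal with mean 90 and SD 15 minutes.
wait = NormalDist(mu=90, sigma=15)

p_hour_or_less = wait.cdf(60)          # question 1: 60 minutes or less
p_two_hours_or_less = wait.cdf(120)    # question 2: 120 minutes or less
z_three_hours = (180 - 90) / 15        # question 3: how many SDs above the mean?
p_over_three_hours = 1 - wait.cdf(180)

print(f"1 hour or less:     {p_hour_or_less:.1%}")
print(f"2 hours or less:    {p_two_hours_or_less:.1%}")
print(f"3 hours is z = {z_three_hours:.0f}, chance of waiting longer: {p_over_three_hours:.1e}")
```

The exact values come out to roughly 2.3% and 97.7%, which agree with the “about 2%” and “about 98%” we got from the Empirical Rule. The chance of waiting more than three hours is a tiny fraction of a percent under this model.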
Some More Practice with the Empirical Rule – and a Participation Grade
OK, let’s get in some more practice with this rule, and this will also earn you a participation grade. To get participation credit here, please submit your answers to the following three questions (you only need to submit your final answers) using this form, also linked on our Moodle site. We’ll go over the solutions at the start of class Wednesday.
Based on reports from a large health organization, let’s suppose the average female shoe size in the US is an 8.5 (with σ = 1), and for men, the average is 10.5 (also with σ = 1). Shoe sizes for both groups are normally distributed.
Note that we’re actually dealing with two distributions here – sketch out one distribution for females’ shoe sizes, and a separate one for males’ shoe sizes.
1. What is the approximate chance that a randomly selected male will wear a shoe size of 10.5 or larger?
2. What is the approximate chance that a randomly selected female will wear a shoe size of 10.5 or larger?
3. Approximately what percentage of the population of females would we expect to wear a shoe size between 8 and 9?
For participation credit: Submit your final answers to these three questions using this form (which is the same one linked on our Moodle site). We’ll go over the solutions at the start of class Wednesday.
The Empirical Rule (AKA the 68-95-99.7 Rule) is a really helpful rule of thumb for figuring out the likelihood of observing various scores in a normal distribution. We’ll practice with some other useful ways of applying this rule in our next class.