Normal distributions

The normal distribution models the probability that a measurement of some continuous quantity, such as time, length, or mass, falls within some interval.

When to Use
Definition
Mean = Median = Mode
Empirical rule and $z$ -score
Examples
Tips

When to Use

If a question exhibits the following symptoms, the normal distribution may be for you:

a single measurement or occurrence
no reference to number of tries or measurements
measurement is continuous (time, length, mass etc)

Questions usually state explicitly that the quantity is normally distributed. However, a question often starts out normal, only to shift into a binomial distribution asking about $k$ successes out of $n$ tries.

Definition

A normally distributed random variable $X$ is notated as

X \sim \text{N}(\mu, \sigma^2)

with expected value (or mean) $\mu$ and standard deviation $\sigma > 0$ or variance $\sigma^2$ .

The normal distribution is a continuous distribution, meaning probability can only be calculated over an interval. The probability of any particular value is $0$ , so including or excluding the endpoints of an interval makes no difference.

\text{P}(X = a) = 0

\text{P}(a \leq X \leq b) = \text{P}(a < X < b)

These probabilities are computed on the calculator. When the interval extends to $-\infty$ or $\infty$ , enter $-1000$ or $1000$ instead, or any value more than $4$ standard deviations away from the mean in the correct direction.
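As a rough illustration of the same calculation off the calculator, the sketch below uses Python's standard-library `statistics.NormalDist`. The distribution $\text{N}(50, 4^2)$ , the endpoints, and the helper name `prob_between` are assumptions chosen only for this example.

```python
from statistics import NormalDist

# Hypothetical distribution for illustration: mean 50, standard deviation 4.
X = NormalDist(mu=50, sigma=4)

def prob_between(dist, a, b):
    """P(a < X < b) as a difference of cumulative probabilities."""
    return dist.cdf(b) - dist.cdf(a)

p_interval = prob_between(X, 46, 54)      # within one standard deviation
p_left_tail = prob_between(X, -1000, 46)  # -1000 stands in for -infinity
p_exact_tail = X.cdf(46)                  # left tail computed directly

print(round(p_interval, 4))
print(abs(p_left_tail - p_exact_tail) < 1e-12)
```

The stand-in endpoint is far enough from the mean that the two left-tail values agree to within floating-point precision.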

To find the endpoint $x$ such that $\text{P}(X < x) = 0.4$ , take the inverse normal using $p=0.4$ and the given mean and standard deviation. Note that calculators usually take the standard deviation $\sigma$ , while the notation $\text{N}(\mu, \sigma^2)$ gives the variance.
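A minimal sketch of the inverse normal step, assuming a hypothetical $X \sim \text{N}(50, 16)$ . Like most calculators, `NormalDist` takes the standard deviation, so the variance has to be square-rooted first.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical distribution: X ~ N(50, 16), where 16 is the variance.
variance = 16
X = NormalDist(mu=50, sigma=sqrt(variance))  # sigma = 4, not 16

# Inverse normal: the endpoint x with P(X < x) = 0.4.
x = X.inv_cdf(0.4)

print(round(x, 2))
print(round(X.cdf(x), 4))  # feeding x back in recovers the probability
```

Because $0.4 < 0.5$ , the endpoint lands slightly below the mean.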

For values within distance $d$ of $k$ , you may see

\text{P}(\lvert X - k \rvert < d) = \text{P}(k - d < X < k + d)

For values outside of distance $d$ of $k$ , you may see

\begin{align*} &\,\text{P}(\lvert X - k \rvert > d) \\ =&\, \text{P}(X < k - d) + \text{P}(X > k + d) \end{align*}

Mean = Median = Mode

For a normal distribution, $\mu$ is the mean, the median, and the mode. This means

\text{P}(X < \mu) = \text{P}(X > \mu) = 0.5

The normal distribution is also symmetric about the mean.

Other properties based on the symmetry include, for any $0 < j < k$

\begin{align*} \text{P}(\mu - k < X < \mu) &= \text{P}(\mu < X < \mu + k) \\ &= 0.5 - \text{P}(X < \mu - k) \end{align*}

\begin{align*} &\,\text{P}(\mu - k < X < \mu - j) \\ =&\, \text{P}(\mu + j < X < \mu + k) \end{align*}
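A quick numerical check of the symmetry identities, with $\mu$ , $\sigma$ , $j$ and $k$ chosen arbitrarily for the example:

```python
from statistics import NormalDist

# Hypothetical values with 0 < j < k.
mu, sigma = 50, 4
j, k = 2, 5
X = NormalDist(mu, sigma)

left_half = X.cdf(mu) - X.cdf(mu - k)       # P(mu - k < X < mu)
right_half = X.cdf(mu + k) - X.cdf(mu)      # P(mu < X < mu + k)
via_tail = 0.5 - X.cdf(mu - k)              # 0.5 - P(X < mu - k)

left_band = X.cdf(mu - j) - X.cdf(mu - k)   # P(mu - k < X < mu - j)
right_band = X.cdf(mu + k) - X.cdf(mu + j)  # P(mu + j < X < mu + k)

print(abs(left_half - right_half) < 1e-12)
print(abs(left_half - via_tail) < 1e-12)
print(abs(left_band - right_band) < 1e-12)
```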

Empirical rule and $z$ -score

The empirical rule, or 68-95-99.7 rule, states that, for a normal distribution, regardless of the values of $\mu$ or $\sigma$ ,

\text{P}(\mu - \sigma < X < \mu + \sigma) \approx 0.6827

\text{P}(\mu - 2\sigma < X < \mu + 2\sigma) \approx 0.9545

\text{P}(\mu - 3\sigma < X < \mu + 3\sigma) \approx 0.9973

For a single measurement, there is roughly a $68\%$ chance that it lies within $1$ standard deviation of the mean, a $95\%$ chance that it lies within $2\sigma$ , and a $99.7\%$ chance that it lies within $3\sigma$ .

These percentages generally do not hold for other types of distributions.
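The empirical rule can be reproduced numerically. The sketch below compares the standard normal with a second, arbitrarily chosen distribution to show that the percentages do not depend on $\mu$ or $\sigma$ . The helper name `within` is an assumption of this example.

```python
from statistics import NormalDist

def within(dist, n):
    """P(mu - n*sigma < X < mu + n*sigma) for a NormalDist."""
    lo = dist.mean - n * dist.stdev
    hi = dist.mean + n * dist.stdev
    return dist.cdf(hi) - dist.cdf(lo)

# The second distribution is a hypothetical example.
for dist in (NormalDist(0, 1), NormalDist(50, 4)):
    print([round(within(dist, n), 4) for n in (1, 2, 3)])
# each line prints [0.6827, 0.9545, 0.9973]
```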

This prompts the definition of the $z$ -score

z = \frac{x - \mu}{\sigma}

The $z$ -score is the number of standard deviations that a value $x$ lies above the mean. A negative $z$ -score means the value is that many standard deviations below the mean.

Then all calculations can simply refer to the standardized normal distribution $\text{N}(0, 1)$ .
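To illustrate standardization, the sketch below converts a value to its $z$ -score and confirms that the probability under the original distribution matches the probability under $\text{N}(0, 1)$ . The numbers are hypothetical.

```python
from statistics import NormalDist

# Hypothetical distribution and value.
mu, sigma = 50, 4
x = 44

z = (x - mu) / sigma                       # -1.5: x is 1.5 sigma below the mean
p_original = NormalDist(mu, sigma).cdf(x)  # P(X < 44)
p_standard = NormalDist().cdf(z)           # P(Z < -1.5) under N(0, 1)

print(z)
print(abs(p_original - p_standard) < 1e-12)
```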

$z$ -scores allow for solving for $\mu$ or $\sigma$ when given the probability.

A sketch of the derivation is provided in the discussion on definite integrals.

Examples

On TI-84 Plus and TI-Nspire, the inverse normal function invNorm returns $x$ , such that $\text{P}(X < x) = p$ for some known distribution and probability $p$ . On Casio and other brands, there is often an option to solve $\text{P}(X > x) = p$ as well. We shall assume we can only solve the first case on the calculator.

Example: Let $X \sim \text{N}(50, 16)$ , find $x$ such that $\text{P}(40 < X < x) = 0.200$ .

Since the variance is $16$ , the normal distribution has mean $50$ and standard deviation $\sqrt{16} = 4$ . We have

\begin{align*} &\,\text{P}(X < 40) + \text{P}(40 < X < x) \\ =&\, \text{P}(X < x) \end{align*}

Then,

\begin{align*} \text{P}(40 < X < x) &= 0.200 \\ \text{P}(X < x) - \text{P}(X < 40) &= 0.200 \\ \text{P}(X < x) - 0.00621\dots &= 0.200 \\ \text{P}(X < x) &= 0.20621\dots \\ x &\approx 46.7 \qed \end{align*}
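The same steps can be cross-checked in Python. This is a sketch using `statistics.NormalDist` in place of the calculator's normal cdf and invNorm functions, not the calculator syntax itself.

```python
from math import sqrt
from statistics import NormalDist

# X ~ N(50, 16): the second parameter is the variance, so sigma = 4.
X = NormalDist(mu=50, sigma=sqrt(16))

p_below_40 = X.cdf(40)       # P(X < 40)
target = p_below_40 + 0.200  # P(X < x) = P(X < 40) + P(40 < X < x)
x = X.inv_cdf(target)        # inverse normal with the left-tail probability

print(round(p_below_40, 5))
print(round(x, 1))
```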

Example: The percentage by volume of potato chips in a bag is normally distributed. The company wants $97\%$ of bags to contain at least $65\%$ chips, and $1\%$ of bags to contain at least $70\%$ chips. To three significant figures, find the mean and standard deviation to pack the least amount of chips.

Let $Y$ be the percentage of chips in the bag.

At the minimum amount of chips packed,

\text{P}(Y > 65) = 0.97 \text{ and } {\text{P}(Y > 70) = 0.01}

The endpoints of the standardized normal distribution with left-tail probabilities $0.03$ and $0.99$ are $-1.88079$ and $2.32635$ , respectively.

Using $z$ -scores,

\begin{align*} \frac{65 - \mu}{\sigma} &= -1.88079\dots \\ \frac{70 - \mu}{\sigma} &= 2.32635\dots \end{align*}

Most calculators require the equations in the form

\begin{align*} -1.88079\sigma + \mu &= 65 \\ 2.32635\sigma + \mu &= 70 \end{align*}

before finding $\mu = 67.2, \sigma = 1.19 \qed$

The bags should be filled with mean $67.2\%$ and standard deviation $1.19\%$ .
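As a cross-check, the two simultaneous equations can be solved directly. The sketch below gets the $z$ -scores from the inverse normal of $\text{N}(0, 1)$ and eliminates $\mu$ by subtracting one equation from the other.

```python
from statistics import NormalDist

Z = NormalDist()  # standardized normal distribution N(0, 1)

# Left-tail probabilities: P(Y < 65) = 1 - 0.97 and P(Y < 70) = 1 - 0.01.
z_low = Z.inv_cdf(0.03)
z_high = Z.inv_cdf(0.99)

# z_low * sigma + mu = 65 and z_high * sigma + mu = 70.
# Subtracting the first equation from the second eliminates mu.
sigma = (70 - 65) / (z_high - z_low)
mu = 65 - z_low * sigma

print(round(z_low, 5), round(z_high, 5))
print(round(mu, 1), round(sigma, 2))
```

Substituting the results back, `NormalDist(mu, sigma).cdf(65)` should be close to $0.03$ and `NormalDist(mu, sigma).cdf(70)` close to $0.99$ .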

Tips

Draw a bell curve. Label the mean and the desired interval(s).
Which way is the inequality sign?
Does it want probability of being inside or outside some interval?
Does it want a minimum or a maximum?
Math notation uses variance but calculators typically use standard deviation.
Write down the calculator command you used, so you can easily check your work later.
The intersection of two events is their area(s) of overlap. Conditional probability is a ratio (fraction) of areas. See also probability formulas.
