Normal distributions

The normal distribution models the probability that a measurement of some continuous quantity, such as time, length, or mass, falls within some interval.

When to Use

If a question exhibits the following symptoms, the normal distribution may be for you:

  • a single measurement or occurrence
  • no reference to number of tries or measurements
  • measurement is continuous (time, length, mass etc)

Normal distribution questions often state explicitly that the quantity is normally distributed. However, a question often starts out normal, only to shift into a binomial distribution regarding $k$ successes out of $n$ tries.
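
The shift from normal to binomial can be sketched in Python. This is only an illustration: the distribution $\text{N}(10, 2^2)$, the threshold $12$, and the counts $n = 5$, $k = 3$ are made-up values, and `statistics.NormalDist` stands in for the calculator.

```python
from math import comb
from statistics import NormalDist

# Hypothetical setup: X ~ N(10, 2^2); a "success" is a measurement above 12.
X = NormalDist(mu=10, sigma=2)

# Normal step: probability that a single measurement is a success.
p = 1 - X.cdf(12)

# Binomial step: exactly k successes out of n independent measurements.
n, k = 5, 3
prob = comb(n, k) * p**k * (1 - p)**(n - k)

print(round(p, 4), round(prob, 4))
```

The normal distribution supplies the success probability of one try; the number of tries only enters in the binomial step.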

Definition

A normally distributed random variable $X$ is notated as

$$X \sim \text{N}(\mu, \sigma^2)$$

with expected value (or mean) $\mu$ and standard deviation $\sigma > 0$ or variance $\sigma^2$.

The normal distribution is continuous, meaning the probability can only be calculated over an interval. The probability of any particular value is $0$.

$$\text{P}(X = a) = 0$$
$$\text{P}(a \leq X \leq b) = \text{P}(a < X < b)$$

These probabilities are found on the calculator. For a probability where the interval involves $-\infty$ or $\infty$, use $-1000$ or $1000$, or any arbitrary values more than $4$ standard deviations away from the mean in the correct direction.
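
As a rough check of the calculator workflow, here is a minimal Python sketch using the standard library's `statistics.NormalDist`. The distribution $\text{N}(50, 16)$ and the endpoints $45$ and $55$ are assumed for illustration only.

```python
from statistics import NormalDist

# Hypothetical distribution: X ~ N(50, 16), so the standard deviation is 4.
X = NormalDist(mu=50, sigma=4)

# P(45 < X < 55) as a difference of two left-tail areas.
p_interval = X.cdf(55) - X.cdf(45)

# P(X < 45): the exact left tail versus a calculator-style lower bound of -1000.
p_exact = X.cdf(45)
p_bounded = X.cdf(45) - X.cdf(-1000)

print(round(p_interval, 4), p_exact, p_bounded)
```

Here $-1000$ is hundreds of standard deviations below the mean, so the bounded and exact tails agree to machine precision.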

To find the endpoint $x$ such that $\text{P}(X < x) = 0.4$, take the inverse normal using $p = 0.4$ and the given mean and standard deviation. Calculators often use standard deviation, while the mathematical definition uses variance.
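
The inverse normal can be sketched the same way in Python, where `inv_cdf` plays the role of the calculator's inverse normal. The distribution $\text{N}(50, 16)$ is an assumed example; note that `NormalDist`, like most calculators, expects the standard deviation rather than the variance.

```python
from statistics import NormalDist

# Hypothetical distribution written as N(50, 16): the 16 is the variance.
variance = 16
X = NormalDist(mu=50, sigma=variance ** 0.5)  # pass sigma = 4, not 16

# Endpoint x with P(X < x) = 0.4.
x = X.inv_cdf(0.4)

print(round(x, 2))
```

Since $0.4 < 0.5$, the endpoint lies below the mean.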

For values within distance $d$ of $k$, you may see

$$\text{P}(\lvert X - k \rvert < d) = \text{P}(k - d < X < k + d)$$

For values outside of distance $d$ of $k$, you may see

$$\text{P}(\lvert X - k \rvert > d) = \text{P}(X < k - d) + \text{P}(X > k + d)$$
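
A short Python sketch of the two absolute-value cases, with assumed values $X \sim \text{N}(50, 16)$, $k = 48$, and $d = 5$ chosen only for illustration:

```python
from statistics import NormalDist

# Hypothetical values: X ~ N(50, 16), centre k = 48, distance d = 5.
X = NormalDist(mu=50, sigma=4)
k, d = 48, 5

# P(|X - k| < d): one interval between k - d and k + d.
inside = X.cdf(k + d) - X.cdf(k - d)

# P(|X - k| > d): a left tail plus a right tail.
outside = X.cdf(k - d) + (1 - X.cdf(k + d))

print(round(inside, 4), round(outside, 4))
```

The two cases are complementary, so the probabilities add up to $1$.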

Mean = Median = Mode

For a normal distribution, $\mu$ is the mean, the median, and the mode. This means

$$\text{P}(X < \mu) = \text{P}(X > \mu) = 0.5$$

This also means that the normal distribution is symmetric about the mean.

Other properties based on the symmetry include, for some $0 < j < k$

$$\text{P}(\mu - k < X < \mu) = \text{P}(\mu < X < \mu + k) = 0.5 - \text{P}(X < \mu - k)$$
$$\text{P}(\mu - k < X < \mu - j) = \text{P}(\mu + j < X < \mu + k)$$
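
The symmetry properties can be checked numerically. The following Python sketch assumes $X \sim \text{N}(50, 16)$ with $j = 2$ and $k = 6$; any other values with $0 < j < k$ should behave the same way.

```python
from statistics import NormalDist

# Hypothetical values: X ~ N(50, 16) with 0 < j < k.
mu, sigma = 50, 4
j, k = 2, 6
X = NormalDist(mu, sigma)

# Areas of width k on either side of the mean.
left = X.cdf(mu) - X.cdf(mu - k)
right = X.cdf(mu + k) - X.cdf(mu)
from_half = 0.5 - X.cdf(mu - k)

# Mirrored bands between distance j and distance k from the mean.
left_band = X.cdf(mu - j) - X.cdf(mu - k)
right_band = X.cdf(mu + k) - X.cdf(mu + j)

print(left, right, from_half)
print(left_band, right_band)
```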

Empirical rule and $z$-score

The empirical rule, or 68-95-99.7 rule, states that, for a normal distribution, regardless of the values of $\mu$ or $\sigma$

$$\text{P}(\mu - \sigma < X < \mu + \sigma) \approx 0.6827$$
$$\text{P}(\mu - 2\sigma < X < \mu + 2\sigma) \approx 0.9545$$
$$\text{P}(\mu - 3\sigma < X < \mu + 3\sigma) \approx 0.9973$$

For a single measurement, there is about a $68\%$ chance that it is within $1$ standard deviation of the mean, a $95\%$ chance that it is within $2\sigma$, and a $99.7\%$ chance that it is within $3\sigma$.

The above are generally not true for other types of distributions.
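
To see that the rule does not depend on $\mu$ or $\sigma$, here is a small Python sketch. The three parameter pairs are arbitrary choices, and `within` is a helper name introduced here for illustration.

```python
from statistics import NormalDist

def within(mu, sigma, n):
    """Probability that X ~ N(mu, sigma^2) lies within n standard deviations of mu."""
    X = NormalDist(mu, sigma)
    return X.cdf(mu + n * sigma) - X.cdf(mu - n * sigma)

# Arbitrary (mu, sigma) pairs; each row should match the empirical rule.
for mu, sigma in [(0, 1), (50, 4), (-3, 0.5)]:
    print([round(within(mu, sigma, n), 4) for n in (1, 2, 3)])
```

Every row should agree with the values $0.6827$, $0.9545$, and $0.9973$ quoted above.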

This prompts the definition of the $z$-score

$$z = \frac{x - \mu}{\sigma}$$

The $z$-score is the number of standard deviations by which a value $x$ exceeds the mean. A negative $z$-score means the value is that many standard deviations below the mean.

Then all calculations can simply refer to the standardized normal distribution $\text{N}(0, 1)$.

$z$-scores allow for solving for $\mu$ or $\sigma$ when given the probability.
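
As a minimal sketch of solving for $\sigma$ with a $z$-score, suppose (hypothetically) that $\mu = 50$ is known and $\text{P}(X < 40) = 0.1$. The inverse normal is taken on $\text{N}(0, 1)$, then the $z$-score formula is rearranged.

```python
from statistics import NormalDist

Z = NormalDist()  # standardized normal distribution N(0, 1)

# Hypothetical problem: mu = 50 is known and P(X < 40) = 0.1; find sigma.
mu, x, p = 50, 40, 0.1

z = Z.inv_cdf(p)      # z-score whose left-tail probability is p
sigma = (x - mu) / z  # rearranged from z = (x - mu) / sigma

print(round(z, 4), round(sigma, 2))
```

The $z$-score is negative because $40$ is below the mean, which keeps $\sigma$ positive.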

A sketch of the derivation is provided in the discussion on definite integrals.

Examples

On TI-84 Plus and TI-Nspire, the inverse normal function invNorm returns $x$, such that $\text{P}(X < x) = p$ for some known distribution and probability $p$. On Casio and other brands, there is often an option to solve $\text{P}(X > x) = p$ as well. We shall assume we can only solve the first case on the calculator.

Example: Let $X \sim \text{N}(50, 16)$, find $x$ such that $\text{P}(40 < X < x) = 0.200$.

The normal distribution has mean $50$ and standard deviation $4$. We have

$$\text{P}(X < 40) + \text{P}(40 < X < x) = \text{P}(X < x)$$

Then,

\begin{align*} \text{P}(40 < X < x) &= 0.200 \\ \text{P}(X < x) - \text{P}(X < 40) &= 0.200 \\ \text{P}(X < x) - 0.00621\dots &= 0.200 \\ \text{P}(X < x) &= 0.20621\dots \\ x &\approx 46.7 \qed \end{align*}
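
The same steps can be checked in Python, with `statistics.NormalDist` standing in for the calculator (an assumption of this sketch; the worked solution itself uses invNorm).

```python
from statistics import NormalDist

# X ~ N(50, 16): the variance is 16, so the standard deviation is 4.
X = NormalDist(mu=50, sigma=4)

left = X.cdf(40)             # P(X < 40)
x = X.inv_cdf(left + 0.200)  # solve P(X < x) = P(X < 40) + 0.200

print(round(left, 5), round(x, 1))
```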

Example: The percentage by volume of potato chips in a bag is normally distributed. The company wants $97\%$ of bags to contain at least $65\%$ chips, and $1\%$ of bags to contain at least $70\%$ chips. To three significant figures, find the mean and standard deviation to pack the least amount of chips.


Let $Y$ be the percentage of chips in the bag.

At the minimum amount of chips packed,

$$\text{P}(Y > 65) = 0.97 \text{ and } \text{P}(Y > 70) = 0.01$$

Converting to left tails, $\text{P}(Y < 65) = 0.03$ and $\text{P}(Y < 70) = 0.99$. The endpoints of the standardized normal distribution with left-tail probabilities $0.03$ and $0.99$ are $-1.88079$ and $2.32635$, respectively.

Using $z$-scores,

\begin{align*} \frac{65 - \mu}{\sigma} &= -1.88079\dots \\ \frac{70 - \mu}{\sigma} &= 2.32635\dots \end{align*}

Most calculators require the equations in the form

\begin{align*} -1.88079\sigma + \mu &= 65 \\ 2.32635\sigma + \mu &= 70 \end{align*}

before finding $\mu = 67.2, \sigma = 1.19 \qed$

The bags should be filled with mean $67.2\%$ and standard deviation $1.19\%$.
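
For a quick check of the simultaneous equations, here is a Python sketch that solves them by elimination rather than with a calculator's equation solver. The variable names `z_low` and `z_high` are introduced here for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standardized normal distribution N(0, 1)

z_low = Z.inv_cdf(0.03)   # P(Y < 65) = 1 - 0.97
z_high = Z.inv_cdf(0.99)  # P(Y < 70) = 1 - 0.01

# Subtracting the two z-score equations eliminates mu:
#   70 - 65 = (z_high - z_low) * sigma
sigma = (70 - 65) / (z_high - z_low)
mu = 65 - z_low * sigma

print(round(mu, 1), round(sigma, 2))
```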

Tips

  • Draw a bell curve. Label the mean and the desired interval(s).
  • Which way is the inequality sign?
  • Does it want probability of being inside or outside some interval?
  • Does it want a minimum or a maximum?
  • Math notation uses variance but calculators typically use standard deviation.
  • Write down the calculator command you used, so you can easily check your work later.
  • The intersection of two events is their area(s) of overlap. Conditional probability is a ratio (fraction) of areas. See also probability formulas.
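
The last tip can be sketched in Python. The values below are hypothetical: $X \sim \text{N}(50, 16)$ and the conditional probability $\text{P}(X > 55 \mid X > 52)$. Since every value above $55$ is also above $52$, the overlap of the two events is just the area where $X > 55$.

```python
from statistics import NormalDist

# Hypothetical values: X ~ N(50, 16); find P(X > 55 | X > 52).
X = NormalDist(mu=50, sigma=4)

p_overlap = 1 - X.cdf(55)  # area where both X > 55 and X > 52 hold
p_given = 1 - X.cdf(52)    # area of the condition X > 52

p_cond = p_overlap / p_given  # conditional probability as a ratio of areas

print(round(p_cond, 4))
```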