Data lists and frequency distributions

This page looks at data presented in frequency tables with exact values (not intervals). We treat enumerated (listed-out) data as the special case where every frequency is 1.

Major re-write: 2026-03-24.

Frequency table data

Example: An IBDP Year 1 math class has 17 students. Here is their grade distribution.

Grade, $x_i$        2   3   4   5   6   7
Frequency, $f_i$    3   4   6   3   0   1
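A frequency table and a listed-out data set are two views of the same data. As a minimal Python sketch (not something the course requires; the helper name `expand` is our own), each value $x_i$ can be repeated $f_i$ times to recover the listed-out data:

```python
# Frequency table from the example: grade x_i and its frequency f_i
values = [2, 3, 4, 5, 6, 7]
freqs = [3, 4, 6, 3, 0, 1]


def expand(values, freqs):
    """Turn a frequency table into the enumerated (listed-out) data."""
    data = []
    for x, f in zip(values, freqs):
        data.extend([x] * f)  # repeat each value f times
    return data


data = expand(values, freqs)
n = len(data)  # total number of students
print(data)
print(n)
```

With every frequency equal to 1, `expand` simply returns the original list, which is the special case described at the top of the page.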

Formulas

The formulas are not necessarily there for you to compute values by hand in Paper 1, but they may allow you to set up an equation to solve for an unknown value with the help of a calculator in Paper 2. The examples are there to illustrate the formulas.

Mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i$$

where $k$ is the number of distinct values $x_i$ and $n = \sum_{i=1}^{k} f_i$ is the total frequency.

For our example

$$
\begin{aligned}
\bar{x} &= \frac{1}{17}(3\cdot 2 + 4\cdot 3 + 6\cdot 4 + 3\cdot 5 + 0\cdot 6 + 1\cdot 7) \\
&= \frac{64}{17} = 3.7647\dots \qed
\end{aligned}
$$
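The same arithmetic can be checked in Python. This is a sketch assuming the table is stored as two parallel lists, `values` and `freqs` (our own names); `Fraction` keeps the answer exact as $\frac{64}{17}$.

```python
from fractions import Fraction

values = [2, 3, 4, 5, 6, 7]
freqs = [3, 4, 6, 3, 0, 1]

n = sum(freqs)                                       # total frequency
sum_fx = sum(f * x for x, f in zip(values, freqs))   # sum of f_i * x_i
mean = Fraction(sum_fx, n)                           # exact mean as a fraction

print(n, sum_fx, mean, float(mean))
```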

Standard deviation and variance

SL students should be able to compute the standard deviation and variance with a GDC.

HL students need to know the following formulas.

Formula for variance, with the sums taken over the $k$ distinct values $x_i$ and $n = \sum_{i=1}^{k} f_i$:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2$$

but usually we use

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{k} f_i x_i^2 - \left(\bar{x}\right)^2$$

Note that the $-\left(\bar{x}\right)^2$ term is outside the summation.

A proof for the special case $f_i = 1$ is provided in Sigma notation, sequences, series, finance.

For our example

$$
\begin{aligned}
\sigma^2 &= \frac{1}{17}\left(3\cdot 2^2 + 4\cdot 3^2 + 6\cdot 4^2 + 3\cdot 5^2 + 0\cdot 6^2 + 1\cdot 7^2\right) - \left(\frac{64}{17}\right)^2 \\
&= \frac{268}{17} - \left(\frac{64}{17}\right)^2 \\
&= \frac{460}{289} \qed
\end{aligned}
$$

The standard deviation $\sigma$ in our case is

$$\sigma = \sqrt{\frac{460}{289}} = \frac{\sqrt{460}}{17} = 1.261624\dots \qed$$
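Both variance formulas should give the same result. A short Python sketch, storing the table as two lists `values` and `freqs` (our own names) and using exact fractions so the comparison is not affected by rounding:

```python
from fractions import Fraction
from math import sqrt

values = [2, 3, 4, 5, 6, 7]
freqs = [3, 4, 6, 3, 0, 1]

n = sum(freqs)
mean = Fraction(sum(f * x for x, f in zip(values, freqs)), n)

# Definition: mean of the squared deviations from the mean
var_def = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n

# Shortcut: mean of the squares minus the square of the mean
sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
var_short = Fraction(sum_fx2, n) - mean ** 2

sigma = sqrt(float(var_short))  # population standard deviation
print(var_def, var_short, sigma)
```

The shortcut needs only the two running sums $\sum f_i x_i$ and $\sum f_i x_i^2$, which is why calculators report Σx and Σx².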

Technology

In general, we use lists and 1-variable statistics on the GDC to compute statistics: enter a list of the values $x$ followed by a list of the frequencies $f$.

TI-84 Plus CE

In stat 1:Edit, enter the values in L1 and the frequencies in L2.

[Figure: TI-84 Plus CE data entry, values in L1 and frequencies in L2]

In stat CALC, choose 1:1-Var Stats. L1 can be retrieved from 2nd 1.

[Figure: 1-Var Stats settings, List: L1, FreqList: L2, Calculate]
[Figure: first part of the 1-Var Stats results: mean, sum of x, sum of x squared, Sx, σx, n, minX, Q1]
[Figure: second part of the 1-Var Stats results: Q1, median, Q3, maxX]

To retrieve statistics such as σx, use vars 5:Statistics.

[Figure: vars 5:Statistics menu with the 1-var and 2-var statistics variables]
[Figure: σx on the main screen, 1.261624152]

Do not use 2nd stat MATH 7:stdDev(, as that gives the sample standard deviation Sx, not σx.
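The difference is whether the squared deviations are divided by $n$ (population, σx) or by $n - 1$ (sample, Sx). Python's standard `statistics` module has one function for each, `pstdev` and `stdev`; a sketch on the listed-out data from the example:

```python
import statistics

values = [2, 3, 4, 5, 6, 7]
freqs = [3, 4, 6, 3, 0, 1]
# List the data out: each value x repeated f times (17 values in total)
data = [x for x, f in zip(values, freqs) for _ in range(f)]

sigma_x = statistics.pstdev(data)  # population SD, divides by n: the σx used here
s_x = statistics.stdev(data)       # sample SD, divides by n - 1: the calculator's Sx

print(sigma_x, s_x)
```

For this data σx ≈ 1.2616 while Sx ≈ 1.3005, so choosing the wrong one changes the answer.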

TI-Nspire

In the Lists & Spreadsheet app, store the $x$ list in column A and the $f$ list in column B.

In menu 4:Statistics > 1:Stat Calculations > 1:One-Variable Statistics, set Num of lists to 1.

[Figure: values stored in columns a and b, Num of Lists set to 1]

Set the X1 List, Frequency List and 1st Result Column. Use ctrl ( for square brackets.

[Figure: One-Variable Statistics dialog with X1 List: a[], Frequency List: b[], Category List and Include Categories blank, 1st Result Column: c[]]
[Figure: results showing mean of x, sum of x, sum of x squared, sx; the results are shown and saved in variables starting with stat.]
[Figure: results showing sx, σx, n, MinX, Q1X; the standard deviation is 1.2616]
[Figure: rest of the results: Q1X, MedianX, Q3X, MaxX, SSX]

To use any of these values, e.g. the standard deviation, press var and choose stat.σx.

[Figure: retrieving stat.σx]

Tip: You can also just use ctrl C and ctrl V. I like to store the copied value in a variable (ctrl var, sto→) before using it.

Casio

In the Statistics (STAT) app, enter the $x$ values in List 1 and the $f$ values in List 2.

[Figure: values in List 1 and frequencies in List 2]

In CALC, open SET and choose the lists. To enter List 1, press LIST and enter 1.

[Figure: SET screen with 1Var XList: List1 and 1Var Freq: List2]

Exit and use 1-VAR to see the statistics.

[Figure: 1-variable statistics, first part: mean of x, sum of x, sum of x squared, σx, sx, n]
[Figure: rest of the results: n, minX, Q1, median, Q3, maxX]

To use one of the values, press VARS, then STAT, then X, and choose, for instance, σx.

[Figure: retrieving σx, the standard deviation]

HP Prime

In the Statistics 1Var app, enter the $x$ values in D1 and the $f$ values in D2.

[Figure: Statistics 1Var Numeric View with values in D1 and frequencies in D2]

In Symb view, enter D1 and D2 together in row H1. Both are available from the Column button on screen.

[Figure: Statistics 1Var Symbolic View with H1 checked and H1: D1 D2]

Go back to Num view and press Stats to view the statistics.

[Figure: Statistics 1Var Numeric View showing n, quartiles, sum of x, sum of x squared, mean of x, sx]
[Figure: last part of the Numeric View: quartiles, sum of x, sum of x squared, mean of x, sx, σx, serrX, ssX]

Press OK to go back. To use a particular statistic, press Vars and go to App > Statistics 1Var > Results, then choose, for example, σx.

[Figure: Statistics 1Var menus showing how to retrieve σx for use in a calculation]
[Figure: σx used on the main screen, 1.2616...]

Tip: You can also just copy with Shift View and paste with Shift Menu. I like to store the copied value in a variable before using it.

NumWorks

In the Statistics app, enter the values in V1 and the frequencies in N1.

[Figure: NumWorks Data tab with values in column V1 and frequencies in N1]

Use the arrow keys to move to the Stats tab and see the statistics.

[Figure: Stats tab, first half: n, quartiles, range, IQR]
[Figure: Stats tab, second half: mean, population standard deviation and other statistics]

To use a statistic, copy-paste it using shift var, shift . I like to store the copied value in a variable before using it.


In addition to the mean $\bar{x}$, the standard deviation σx and the quartiles, notice that the calculator's Σx = 64 and Σx² = 268 are the sums $\sum f_i x_i$ and $\sum f_i x_i^2$ used in the calculations above.

Median

Sort the list.

For an odd number $n$ of data values, the median is the value in position $\frac{n+1}{2}$. In our example, $n = 17$, so the median is the 9th value, 4.

For an even number $n$ of data values, the median is the mean of the values in positions $\frac{n}{2}$ and $\frac{n}{2}+1$.

Calculating medians by hand for frequency tables is required for SL and HL.
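By hand, we do not need to write out all 17 values: a running (cumulative) total of the frequencies tells us which value sits in a given position. A Python sketch of that idea, with hypothetical helper names `value_at` and `median_from_table`:

```python
def value_at(position, values, freqs):
    """Value in a given 1-based position of the sorted data, found from
    cumulative frequencies instead of listing every value."""
    cumulative = 0
    for x, f in sorted(zip(values, freqs)):
        cumulative += f  # number of data values that are <= x
        if position <= cumulative:
            return x
    raise ValueError("position is larger than n")


def median_from_table(values, freqs):
    n = sum(freqs)
    if n % 2 == 1:
        # odd n: the value in position (n + 1) / 2
        return value_at((n + 1) // 2, values, freqs)
    # even n: mean of the values in positions n/2 and n/2 + 1
    lower = value_at(n // 2, values, freqs)
    upper = value_at(n // 2 + 1, values, freqs)
    return (lower + upper) / 2


values = [2, 3, 4, 5, 6, 7]
freqs = [3, 4, 6, 3, 0, 1]
print(median_from_table(values, freqs))  # n = 17, so the 9th value: 4
```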

Quartiles

Q1, or the 25th percentile, is the value below which about 25% of the data lies.

Median, or Q2, or the 50th percentile, is the value below which about 50% of the data lies.

Q3, or the 75th percentile, is the value below which about 75% of the data lies.

Q1 and Q3 should be found using 1-variable statistics on the calculator, as shown above.
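Calculators do not all define quartiles the same way. The sketch below assumes the common "median of each half" method, which leaves out the middle value when $n$ is odd; with it, Python reproduces Q1 = 3 and Q3 = 4.5 for our example. The function name `quartiles` is our own, and other calculators or software may use a different method.

```python
from statistics import median


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data.
    When n is odd, the middle value belongs to neither half."""
    data = sorted(data)
    half = len(data) // 2
    lower = data[:half]               # values before the median position
    upper = data[len(data) - half:]   # values after the median position
    return median(lower), median(upper)


values = [2, 3, 4, 5, 6, 7]
freqs = [3, 4, 6, 3, 0, 1]
data = [x for x, f in zip(values, freqs) for _ in range(f)]
q1, q3 = quartiles(data)
print(q1, q3, q3 - q1)
```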

Interquartile range and outliers

$$\text{IQR} = Q3 - Q1$$

An outlier is a value that lies more than $1.5 \times$ the interquartile range away from the nearest quartile, that is, below $Q1 - 1.5 \times \text{IQR}$ or above $Q3 + 1.5 \times \text{IQR}$.

Outliers don't necessarily need to be thrown away. In fact, they can be central to many scientific studies. An outlier could indicate a mistake, but it could also be a value that has a very small chance of occurring, or an effect that only appears under extreme conditions.

Presence of an outlier does not require recomputing any statistical value or measure, unless otherwise specified in the question.

For our example, $Q1 = 3$ and $Q3 = 4.5$, so $\text{IQR} = Q3 - Q1 = 1.5$. We want to check whether 7 is an outlier. It is closest to $Q3 = 4.5$.

$$4.5 + 1.5(1.5) = 6.75 < 7$$

Therefore, 7 is an outlier.
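Both ends can be checked at once by computing the two fences $Q1 - 1.5 \times \text{IQR}$ and $Q3 + 1.5 \times \text{IQR}$. A Python sketch (the function name `outliers` is hypothetical), taking Q1 and Q3 as already known:

```python
def outliers(data, q1, q3):
    """Values lying more than 1.5 * IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


values = [2, 3, 4, 5, 6, 7]
freqs = [3, 4, 6, 3, 0, 1]
data = [x for x, f in zip(values, freqs) for _ in range(f)]
print(outliers(data, q1=3, q3=4.5))  # fences 0.75 and 6.75: [7]
```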

Box and whisker diagram

The boxes are drawn from Q1 to Q2 and from Q2 to Q3. Lines (whiskers) extend from the minimum to Q1 and from Q3 to the maximum, where the minimum and maximum are taken over the values that are not outliers. Outliers are shown as crosses.

[Figure: box and whisker diagram with minimum 2, Q1 = 3, median = 4, Q3 = 4.5, a non-outlier maximum at 5, and an outlier at 7]

In our example, the boxes run from $Q1 = 3$ to $Q2 = 4$ and from $Q2 = 4$ to $Q3 = 4.5$. The whiskers run from the minimum 2 to Q1 and from Q3 to 5, the largest value that is not an outlier. The outlier at 7 is marked with a cross.

Outliers only affect the min and max values drawn, and not the boxes.
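So the whiskers end at the most extreme values that are not outliers. A Python sketch (hypothetical function name `box_plot_summary`) that gathers the numbers needed to draw the diagram, given the quartiles:

```python
def box_plot_summary(data, q1, median, q3):
    """Return the numbers needed to draw a box and whisker diagram.
    Whiskers stop at the most extreme values that are not outliers."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


values = [2, 3, 4, 5, 6, 7]
freqs = [3, 4, 6, 3, 0, 1]
data = [x for x, f in zip(values, freqs) for _ in range(f)]
summary = box_plot_summary(data, q1=3, median=4, q3=4.5)
print(summary)
```

The boxes use only Q1, the median and Q3, so removing the outlier at 7 would move the upper whisker's end but leave the boxes unchanged.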

Mode and mean cannot be determined from a box and whisker plot; instead they require access to the original, individual values.