Grouped data

Here data are grouped into “classes”, which are often equal-width intervals. Instead of seeing the original values, you are given the frequency of each class, that is, the number of values that fall in it. Because the original values are unknown, statistics such as the mean and quartiles are estimates, not exact values.

Overview

| quantity | what to use | how to estimate |
|---|---|---|
| mean | mid-interval values | by hand, by GDC |
| stdev, variance | mid-interval values | by hand (HL only), by GDC (SL and HL) |
| modal class | interval | by hand, by GDC (if given cumulative frequency table) |
| median | mid-interval values | by hand, by GDC |
| quartiles, IQR | upper/right endpoints | by GDC |
| percentiles | upper/right endpoints | from cumulative frequency diagrams |

Classes (intervals)

Classes on tests will be equally spaced. For calculations, each class is represented by its mid-interval value (its midpoint).

For continuous data, the mid-interval value is the mean of the start of this class and the start of the next class.

For discrete data, the mid-interval value is the mean of the start of this class and the end of this class.

Consider the intervals

$$20 < x \leq 30,\;\; 30 < x \leq 40,\;\; 40 < x \leq 50$$

For continuous data, the mid-interval values are $25$, $35$, and $45$.

For discrete data, say only integers, the mid-interval values are $\frac{21 + 30}{2} = 25.5$, $35.5$, and $45.5$.


For another set of interval classes,

$$15 \leq x < 25,\;\; 25 \leq x < 35,\;\; 35 \leq x < 45$$

For continuous data, the mid-interval values are $20$, $30$, and $40$.

For whole number data, the mid-interval values are $\frac{15 + 24}{2} = 19.5$, $29.5$, and $39.5$.
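The two cases can be checked with a short Python sketch. The helper name `mid_interval` is made up for illustration. It takes the smallest and largest values a class can contain, which is the only thing that changes between the continuous and discrete cases.

```python
def mid_interval(lowest, highest):
    """Midpoint of a class, given the smallest and largest values it can contain."""
    return (lowest + highest) / 2

# Classes 20 < x <= 30, 30 < x <= 40, 40 < x <= 50 (class width 10).
starts = (20, 30, 40)

# Continuous data: each class runs from its start to the start of the next class.
continuous = [mid_interval(a, a + 10) for a in starts]

# Integer data: the class 20 < x <= 30 actually holds 21, 22, ..., 30.
discrete = [mid_interval(a + 1, a + 10) for a in starts]

print(continuous)  # [25.0, 35.0, 45.0]
print(discrete)    # [25.5, 35.5, 45.5]

# Whole-number data in 15 <= x < 25 holds 15, 16, ..., 24.
print(mid_interval(15, 24))  # 19.5
```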

Frequency tables and histograms

The mean is estimated using the mid-interval values and frequencies. This is usually done on the calculator.

The modal class is the one with the highest frequency.
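As a small illustration, the modal class can be picked out of a frequency table in Python. The frequencies below are made up for the example and do not come from any real data set.

```python
# Hypothetical frequency table: class labels and their frequencies.
classes = ["20 < x <= 30", "30 < x <= 40", "40 < x <= 50"]
freqs = [4, 10, 6]

# The modal class is the class whose frequency is largest.
modal_class = classes[freqs.index(max(freqs))]
print(modal_class)  # 30 < x <= 40
```

Note that `index` returns the first match, so if two classes tie for the highest frequency this sketch reports only the first one.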

The formulas below give estimates, because each class is represented by a single mid-interval value.

mean ($\mu$)

$$\bar x = \frac 1n \sum_{x} f_x x$$

where $f_x$ is the frequency for the corresponding mid-interval value $x$, and

$$n = \sum_x f_x$$
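The mean formula can be sketched in Python as follows. The function name `grouped_mean` and the frequencies are assumptions made for illustration. The mid-interval values are those of the continuous classes $20 < x \leq 30$, $30 < x \leq 40$, $40 < x \leq 50$.

```python
def grouped_mean(mids, freqs):
    """Estimate the mean from mid-interval values and class frequencies."""
    n = sum(freqs)                                      # n = sum of f_x
    return sum(f * x for x, f in zip(mids, freqs)) / n  # (1/n) * sum of f_x * x

mids = [25, 35, 45]   # mid-interval values
freqs = [4, 10, 6]    # hypothetical frequencies

print(grouped_mean(mids, freqs))  # 36.0
```

By hand this is $(4 \cdot 25 + 10 \cdot 35 + 6 \cdot 45)/20 = 720/20 = 36$.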

HL: variance ($\sigma^2$) and standard deviation ($\sigma$) by hand

The formula for variance, with frequencies $f$, is

$$s^2 = \frac 1n \sum_{x} f_x(x - \bar x)^2 = \frac 1n \sum_{x} f_x x^2 - {\bar x}^2$$

Note that the $-{\bar x}^2$ term is outside the summation. Taking the positive square root of both sides gives the formula for standard deviation.
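A minimal Python sketch, using made-up frequencies, shows that the two forms of the variance formula agree. The function name `grouped_variance` is hypothetical.

```python
import math

def grouped_variance(mids, freqs):
    """Return the variance computed two ways: by definition and by the shortcut."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(mids, freqs)) / n
    # Definition: (1/n) * sum of f_x * (x - mean)^2
    by_definition = sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs)) / n
    # Shortcut: (1/n) * sum of f_x * x^2, then subtract mean^2 outside the sum
    by_shortcut = sum(f * x ** 2 for x, f in zip(mids, freqs)) / n - mean ** 2
    return by_definition, by_shortcut

mids = [25, 35, 45]   # mid-interval values
freqs = [4, 10, 6]    # hypothetical frequencies

by_definition, by_shortcut = grouped_variance(mids, freqs)
print(by_definition, by_shortcut)  # 49.0 49.0
print(math.sqrt(by_definition))    # 7.0
```

Here the mean is $36$, so the definition gives $(4 \cdot 121 + 10 \cdot 1 + 6 \cdot 81)/20 = 980/20 = 49$, and the shortcut gives $26900/20 - 36^2 = 1345 - 1296 = 49$.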

Cumulative frequency diagrams

In cumulative frequency diagrams, the $y$-value is the sum of frequencies up to each $x$-value.

When constructing a cumulative frequency diagram from a frequency table or histogram, plot a point at the upper bound of each class, where that class's frequency is added to the running total (cumulative) frequency. Connect the points using straight lines.
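The construction can be sketched in Python with made-up frequencies. Starting the diagram at the lower bound of the first class with a cumulative frequency of 0 is an assumption of this sketch, although it is the usual convention.

```python
from itertools import accumulate

lower_bound = 20              # start of the first class
upper_bounds = [30, 40, 50]   # upper bound of each class
freqs = [4, 10, 6]            # hypothetical class frequencies

# Running total of the frequencies, one value per class.
cumulative = list(accumulate(freqs))

# Points to plot: (upper bound, cumulative frequency), starting from zero.
points = [(lower_bound, 0)] + list(zip(upper_bounds, cumulative))

print(cumulative)  # [4, 14, 20]
print(points)      # [(20, 0), (30, 4), (40, 14), (50, 20)]
```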

For quartile (and median) calculations, first find the total frequency (the number of data points). This may or may not be given in the question; if it is not, it is the highest cumulative frequency on the diagram.

At $y$-values of $25\%$, $50\%$, and $75\%$ of the total, read off the corresponding $x$-values to get $Q1$, $Q2$ (median), and $Q3$.
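Reading a value off a diagram drawn with straight line segments amounts to linear interpolation. The sketch below assumes a hypothetical diagram through the points $(20, 0)$, $(30, 4)$, $(40, 14)$, $(50, 20)$. The function name `read_x_at` is made up for illustration.

```python
def read_x_at(points, target):
    """Read the x-value at a given cumulative frequency by linear interpolation."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 > y0 and y0 <= target <= y1:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target is outside the range of the diagram")

# Hypothetical cumulative frequency diagram: (x, cumulative frequency).
points = [(20, 0), (30, 4), (40, 14), (50, 20)]
total = points[-1][1]  # total frequency, here 20

q1 = read_x_at(points, 0.25 * total)
q2 = read_x_at(points, 0.50 * total)
q3 = read_x_at(points, 0.75 * total)

print(q1, q2)        # 31.0 36.0
print(round(q3, 2))  # 41.67
print(round(q3 - q1, 2))  # IQR: 10.67
```

On a test the values are read from the graph by eye, so answers close to these would normally be expected rather than exact matches.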

To estimate the mean, subdivide the diagram into equal classes, find the frequency of each class as the increase in cumulative frequency across it, and enter the class mid-interval values and frequencies into the calculator. The question should suggest a class size.
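As a sketch of this procedure in Python, assume cumulative frequencies have been read off a hypothetical diagram at the class boundaries 20, 30, 40, and 50. The frequency of each class is the difference between consecutive readings.

```python
bounds = [20, 30, 40, 50]     # class boundaries (class size 10)
cumulative = [0, 4, 14, 20]   # hypothetical readings at each boundary

# Frequency of each class = increase in cumulative frequency across it.
freqs = [b - a for a, b in zip(cumulative, cumulative[1:])]

# Mid-interval value of each class (continuous data).
mids = [(a + b) / 2 for a, b in zip(bounds, bounds[1:])]

mean = sum(f * x for x, f in zip(mids, freqs)) / sum(freqs)

print(freqs)  # [4, 10, 6]
print(mids)   # [25.0, 35.0, 45.0]
print(mean)   # 36.0
```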

The x-values at 25%, 50%, and 75% of the maximum cumulative frequency correspond to Q1, Q2, and Q3 of a box-and-whisker plot.
A box-and-whisker plot is essentially a cumulative frequency diagram projected down to one dimension.

Calculator

Grouped data are still one-variable statistics. The calculator should provide an option to enter frequencies.

Be able to calculate all the statistics as you would for a data list. See "Interpreting statistics on TI-84 Plus" (an example using discrete random variables), where frequencies are analogous to probabilities.