Grouped data

Here data are grouped into “classes”, which are often equal-sized intervals. Instead of the original values, you are given the frequency of each class, that is, the number of values that fall in it.

Classes (intervals)

Classes on tests will be equally spaced. For calculations, each class is represented by its mid-interval value.

For continuous data, the mid-interval value is the mean of the start of this class and the start of the next class.

For discrete data, the mid-interval value is the mean of the start of this class and the end of this class.

Consider the intervals

$$20 < x \leq 30,\;\; 30 < x \leq 40,\;\; 40 < x \leq 50$$

For continuous data, the mid-interval values are $25$, $35$, and $45$.

For discrete data, say only integers, the mid-interval values are $\frac{21 + 30}{2} = 25.5$, $35.5$, and $45.5$.


For another set of interval classes,

$$15 \leq x < 25,\;\; 25 \leq x < 35,\;\; 35 \leq x < 45$$

For continuous data, the mid-interval values are $20$, $30$, and $40$.

For whole number data, the mid-interval values are $\frac{15 + 24}{2} = 19.5$, $29.5$, and $39.5$.
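The two rules above can be checked with a short Python sketch. The function name `mid_interval` and its parameters are hypothetical, chosen for illustration only, and the discrete case assumes integer data with exactly one endpoint included in each class.

```python
def mid_interval(lower, upper, discrete=False, lower_inclusive=False, step=1):
    """Return the mid-interval value of a class with the given bounds.

    Continuous data: mean of the two class boundaries.
    Discrete data (assumed spacing `step`): mean of the smallest and
    largest values that actually belong to the class.
    """
    if not discrete:
        return (lower + upper) / 2
    if lower_inclusive:
        # Class of the form lower <= x < upper
        first, last = lower, upper - step
    else:
        # Class of the form lower < x <= upper
        first, last = lower + step, upper
    return (first + last) / 2


# Classes 20 < x <= 30, 30 < x <= 40, 40 < x <= 50
classes_a = [(20, 30), (30, 40), (40, 50)]
continuous_a = [mid_interval(lo, hi) for lo, hi in classes_a]
discrete_a = [mid_interval(lo, hi, discrete=True) for lo, hi in classes_a]

# Classes 15 <= x < 25, 25 <= x < 35, 35 <= x < 45
classes_b = [(15, 25), (25, 35), (35, 45)]
continuous_b = [mid_interval(lo, hi) for lo, hi in classes_b]
discrete_b = [
    mid_interval(lo, hi, discrete=True, lower_inclusive=True)
    for lo, hi in classes_b
]

print(continuous_a, discrete_a)  # [25.0, 35.0, 45.0] [25.5, 35.5, 45.5]
print(continuous_b, discrete_b)  # [20.0, 30.0, 40.0] [19.5, 29.5, 39.5]
```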

Frequency tables and histograms

The mean is estimated using the mid-interval values and frequencies. This is usually done on the calculator.

The modal class is the one with the highest frequency.

The formulas below give estimates, because each original value is replaced by the mid-interval value of its class.

mean ($\mu$)

$$\bar x = \frac 1n \sum_{x} f_x x$$

where $f_x$ is the frequency for the corresponding mid-interval value $x$, and

$$n = \sum_x f_x$$
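As a minimal sketch of the mean estimate, the Python below applies the formula to the continuous classes $20 < x \leq 30$, $30 < x \leq 40$, $40 < x \leq 50$. The frequencies are hypothetical, made up purely for illustration.

```python
# Mid-interval values for the continuous classes 20-30, 30-40, 40-50
mid_values = [25, 35, 45]
# Hypothetical frequencies, one per class (not from any real data set)
frequencies = [4, 10, 6]

# n is the total frequency, i.e. the number of data points
n = sum(frequencies)

# Estimated mean: (1/n) * sum of f_x * x over all classes
mean_estimate = sum(f * x for f, x in zip(frequencies, mid_values)) / n

# Modal class: the class with the highest frequency
modal_mid_value = mid_values[frequencies.index(max(frequencies))]

print(n)                # 20
print(mean_estimate)    # 36.0
print(modal_mid_value)  # 35, so the modal class is 30 < x <= 40
```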

HL: variance ($\sigma^2$) and standard deviation ($\sigma$) by hand

The formula for variance, with frequencies $f$, is

$$s^2 = \frac 1n \sum_{x} f_x(x - \bar x)^2 = \frac 1n \sum_{x} f_x x^2 - {\bar x}^2$$

Note that the $-{\bar x}^2$ term is outside the summation. Taking the positive square root of both sides gives the formula for standard deviation.
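The sketch below evaluates both forms of the variance formula in Python to show that they agree. The mid-interval values and frequencies are hypothetical illustration data, not taken from any question.

```python
import math

# Hypothetical grouped data: mid-interval values and their frequencies
mid_values = [25, 35, 45]
frequencies = [4, 10, 6]

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, mid_values)) / n

# First form: (1/n) * sum of f_x * (x - mean)^2
variance_definition = (
    sum(f * (x - mean) ** 2 for f, x in zip(frequencies, mid_values)) / n
)

# Second form: (1/n) * sum of f_x * x^2, then subtract mean^2
# (the subtraction happens once, outside the summation)
variance_shortcut = (
    sum(f * x ** 2 for f, x in zip(frequencies, mid_values)) / n - mean ** 2
)

# Standard deviation is the positive square root of the variance
std_dev = math.sqrt(variance_definition)

print(variance_definition, variance_shortcut)  # 49.0 49.0
print(std_dev)                                 # 7.0
```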

Cumulative frequency diagrams

In cumulative frequency diagrams, the $y$-value is the sum of frequencies up to each $x$-value.

When constructing a cumulative frequency diagram from a frequency table or histogram, plot a point at the upper bound of each class, where the class frequency is added to the running (cumulative) total. Connect the points using straight lines.

For quartile (and median) calculations, first find the total frequency (the number of data points). This may or may not be given in the question.

From the $y$-values at $25\%$, $50\%$, and $75\%$ of the total, read across to the curve and then down to the $x$-axis to obtain $Q_1$, $Q_2$ (the median), and $Q_3$.
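Reading quartiles off a diagram drawn with straight line segments is the same as linear interpolation, which the Python sketch below imitates. The class boundaries and frequencies are hypothetical, and `read_off` is an illustrative helper name rather than a standard function.

```python
from itertools import accumulate

# Hypothetical classes 20 < x <= 30, 30 < x <= 40, 40 < x <= 50
boundaries = [20, 30, 40, 50]
frequencies = [4, 10, 6]

# Cumulative frequency at each boundary, starting from 0 at the lowest bound
cumulative = [0] + list(accumulate(frequencies))  # [0, 4, 14, 20]
total = cumulative[-1]


def read_off(target):
    """Return the x-value where the straight-line diagram reaches `target`."""
    for i in range(len(boundaries) - 1):
        c0, c1 = cumulative[i], cumulative[i + 1]
        if c0 <= target <= c1 and c1 > c0:
            x0, x1 = boundaries[i], boundaries[i + 1]
            # Linear interpolation along the segment from (x0, c0) to (x1, c1)
            return x0 + (target - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("target is outside the range of the diagram")


q1 = read_off(0.25 * total)  # cumulative frequency 5
q2 = read_off(0.50 * total)  # cumulative frequency 10 (median)
q3 = read_off(0.75 * total)  # cumulative frequency 15

print(q1, q2)        # 31.0 36.0
print(round(q3, 2))  # 41.67
```

On an exam the values are read from the graph by eye, so answers close to these would normally be expected rather than exact matches.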

For the mean, subdivide the diagram into equal classes, then enter the class mid-interval values and frequencies into the calculator. The question should suggest a class size.
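A rough Python sketch of this procedure follows, assuming a class width of 10 and cumulative frequencies read from a hypothetical diagram at each class boundary. Each class frequency is the difference between consecutive cumulative readings.

```python
# Class boundaries, assuming the question suggests a class width of 10
boundaries = [20, 30, 40, 50]
# Hypothetical cumulative frequencies read off the diagram at each boundary
cumulative = [0, 4, 14, 20]

# Frequency of each class = increase in cumulative frequency across it
frequencies = [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]

# Mid-interval value of each class (continuous data)
mid_values = [(a + b) / 2 for a, b in zip(boundaries, boundaries[1:])]

# Estimated mean from mid-interval values and recovered frequencies
n = cumulative[-1]
mean_estimate = sum(f * x for f, x in zip(frequencies, mid_values)) / n

print(frequencies)    # [4, 10, 6]
print(mid_values)     # [25.0, 35.0, 45.0]
print(mean_estimate)  # 36.0
```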

Calculator

Grouped data are still one-variable statistics. The calculator should provide an option to enter frequencies.

You should be able to calculate all the statistics as you would for a data list. See “Interpreting statistics on TI-84 Plus” (an example using discrete random variables), where frequencies are analogous to probabilities.