Correlation and regression

Loosely speaking, regression fits a model to the data, and correlation measures how strong the relationship described by that model is.

Linear regression

In $y$ on $x$ regression ($y = ax + b$), we treat $x$ as the known, reliable value and use it to predict $y$.

In $x$ on $y$ regression ($x = ay + b$), we assume the opposite: $y$ is the value that is given or trusted, and we use it to predict $x$.

The student should judge from the context which variable is the known value and which is the unknown to be predicted.

Some calculators support only $y$ on $x$ regression. In that case, for $x$ on $y$ regression, swap $x$ and $y$ when entering the data into the calculator, then write the final answer as $x = ay + b$.

Both the $y$ on $x$ and the $x$ on $y$ regression lines pass through the point $(\bar{x}, \bar{y})$, i.e. the mean of the $x$ values and the mean of the $y$ values.
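The two points above (swapping the inputs to get $x$ on $y$ regression, and both lines passing through $(\bar{x}, \bar{y})$) can be checked with a short sketch. It assumes the usual least-squares fit, which is what calculators typically compute; the function name `fit_line` and the data values are made up for illustration.

```python
def fit_line(inputs, outputs):
    """Least-squares line outputs = a * inputs + b; returns (a, b)."""
    n = len(inputs)
    mean_in = sum(inputs) / n
    mean_out = sum(outputs) / n
    # Sum of products of deviations, and sum of squared deviations of the inputs
    s_xy = sum((u - mean_in) * (v - mean_out) for u, v in zip(inputs, outputs))
    s_xx = sum((u - mean_in) ** 2 for u in inputs)
    a = s_xy / s_xx
    b = mean_out - a * mean_in
    return a, b

# Illustrative (made-up) data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

a1, b1 = fit_line(xs, ys)  # y on x: y = a1 * x + b1
a2, b2 = fit_line(ys, xs)  # x on y: same routine with the inputs swapped, x = a2 * y + b2

# Both lines pass through the mean point (x_bar, y_bar), up to rounding error
print(abs(a1 * x_bar + b1 - y_bar) < 1e-9)
print(abs(a2 * y_bar + b2 - x_bar) < 1e-9)
```

Note that the two lines are generally different: the $x$ on $y$ line is not obtained by rearranging the $y$ on $x$ equation.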

See an example on the TI-84 Plus.

Pearson’s correlation coefficient, $r$

This coefficient measures only the strength of a linear relationship, so it is only meaningful for a linear regression or linear model.

The coefficient takes values between $-1$ and $1$. A value near $0$ means there is no linear correlation. A value near $1$ is a strong indication that $y$ increases linearly as $x$ increases, while a value near $-1$ means $y$ decreases linearly as $x$ increases.

If the absolute value of the coefficient is above a critical value, the correlation is statistically significant. The critical value depends on the number of data points, and will be given on exams; otherwise it would require a $t$ -test, which is beyond the syllabus.

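The $r$ for $(x, y)$ is the same as that for $(ax + b, cy + d)$ whenever $a$ and $c$ have the same sign, meaning that it is unaffected by a change of scale or origin in either variable. If $a$ and $c$ have opposite signs, $r$ changes sign but keeps its magnitude.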

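The regression line itself, however, does change under such transformations, and the effect can be investigated using function transformations. For example, if every $y$ value is doubled, the $y$ on $x$ line $y = ax + b$ becomes $y = 2ax + 2b$, a vertical stretch by a factor of $2$.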
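The behaviour of $r$ under linear rescaling of $x$ and $y$ can be checked numerically. The sketch below computes $r$ from the standard definition $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$; the function name `pearson_r`, the data and the scale factors are all made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient from sums of deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Illustrative (made-up) data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)

# Rescale both variables with positive factors: r should be unchanged
r_same = pearson_r([3 * x + 7 for x in xs], [0.5 * y - 2 for y in ys])

# Use a negative factor on x only: r should keep its size but flip sign
r_flipped = pearson_r([-3 * x + 7 for x in xs], [0.5 * y - 2 for y in ys])

print(math.isclose(r, r_same))
print(math.isclose(r, -r_flipped))
```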
