Correlation and regression

Loosely speaking, regression fits a model to the data, while correlation measures how strongly the data follow that model.


Linear regression

In $y$ on $x$ regression ($y = ax + b$), we assume $x$ is known with certainty, and we want to predict $y$.
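A minimal sketch of $y$ on $x$ regression, assuming the usual least-squares fit (the kind a calculator's linear regression mode performs). The slope is $S_{xy}/S_{xx}$, where $S_{xy}$ and $S_{xx}$ are sums of products of deviations from the means. The data values and the helper name `fit_y_on_x` are made up for illustration.

```python
def fit_y_on_x(xs, ys):
    """Least-squares line y = a*x + b, treating x as known and predicting y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # S_xy and S_xx: sums of products of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx          # slope
    b = y_bar - a * x_bar    # intercept chosen so the line passes through the means
    return a, b


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_y_on_x(xs, ys)
print(f"y = {a:.2f}x + {b:.2f}")  # y = 0.60x + 2.20
```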

In $x$ on $y$ regression ($x = ay + b$), we assume the opposite and try to predict $x$ from a value of $y$ that is given or trusted.

The student should judge which variable is the known value and which is the unknown one to be predicted.

Some calculators support only $y$ on $x$ regression. To get $x$ on $y$ regression on such a calculator, enter the $y$ data as the $x$ inputs and the $x$ data as the $y$ inputs, then write the final answer as $x = ay + b$ using the coefficients the calculator reports.
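The swapping trick can be checked with a short sketch. Here `fit_y_on_x` is a hypothetical stand-in for a calculator that only does $y$ on $x$ least-squares regression; calling it with the two lists swapped gives the $x$ on $y$ line. For comparison, the sketch also rearranges the $y$ on $x$ line to make $x$ the subject, which in general is not the same line. The data are invented.

```python
def fit_y_on_x(xs, ys):
    """Least-squares line y = a*x + b (stand-in for a calculator's y-on-x mode)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx
    return a, y_bar - a * x_bar


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# x on y: enter the y data as the calculator's "x" list and vice versa,
# then read the result as x = c*y + d
c, d = fit_y_on_x(ys, xs)
print(f"x on y: x = {c:.2f}y + ({d:.2f})")  # x on y: x = 1.00y + (-1.00)

# Rearranging the y on x line for x gives a different line
a, b = fit_y_on_x(xs, ys)
print(f"rearranged y on x: x = {1 / a:.2f}y + ({-b / a:.2f})")
# rearranged y on x: x = 1.67y + (-3.67)
```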

Both the $y$ on $x$ and the $x$ on $y$ regression lines pass through the point $(\bar{x}, \bar{y})$, i.e. the mean of the $x$ values and the mean of the $y$ values.
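A quick numerical check of this property, using the same least-squares sketch on invented data (the helper `fit_y_on_x` is hypothetical). It holds because each intercept is chosen as "mean of the predicted variable minus slope times mean of the known variable".

```python
import math


def fit_y_on_x(xs, ys):
    """Least-squares line y = a*x + b; the intercept forces the line through the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx
    return a, y_bar - a * x_bar


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

a, b = fit_y_on_x(xs, ys)  # y on x: y = a*x + b
c, d = fit_y_on_x(ys, xs)  # x on y (inputs swapped): x = c*y + d

# Each line, evaluated at the mean of its known variable, returns the other mean
print(math.isclose(a * x_bar + b, y_bar))  # True
print(math.isclose(c * y_bar + d, x_bar))  # True
```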

See an example on the TI-84 Plus.

Pearson’s correlation coefficient, $r$

This coefficient only measures how well the data fit a linear model; it can be close to $0$ even when $y$ depends strongly, but non-linearly, on $x$.

The coefficient takes values between $-1$ and $1$ inclusive. A value near $0$ means there is no linear correlation. A value near $1$ is a strong indication that $y$ increases linearly as $x$ increases, while a value near $-1$ indicates that $y$ decreases linearly as $x$ increases.
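A minimal sketch of how $r$ can be computed with the standard formula $r = S_{xy}/\sqrt{S_{xx} S_{yy}}$. The data sets are invented; the last one lies exactly on $y = x^2$ yet gives $r = 0$, which shows that $r$ only detects linear relationships.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # 0.775: fairly strong, positive
print(pearson_r([1, 2, 3], [2, 4, 6]))                        # 1.0: exact increasing line
print(pearson_r([1, 2, 3], [6, 4, 2]))                        # -1.0: exact decreasing line
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))          # 0.0: y = x**2, not linear
```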

If the absolute value of $r$ exceeds a critical value, the correlation is statistically significant. The critical value depends on the number of data points (and on the significance level) and will be given in exams; finding it otherwise requires a $t$-test, which is beyond the syllabus.
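A minimal sketch of the decision rule, assuming the critical value is read from a table supplied with the question. The value 0.878 below is used only for illustration (a commonly tabulated 5% two-tailed value for 5 data points); always use the one given. The helper `t_statistic` is included purely for reference: it computes $t = r\sqrt{n-2}/\sqrt{1-r^2}$, the statistic behind the $t$-test, with $n - 2$ degrees of freedom. Both helper names are hypothetical.

```python
from math import sqrt


def is_significant(r, critical_r):
    """Significant when |r| exceeds the tabulated critical value."""
    return abs(r) > critical_r


def t_statistic(r, n):
    """t statistic for testing zero correlation (n - 2 degrees of freedom)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


r = 0.7746          # hypothetical r from n = 5 data points
critical_r = 0.878  # illustrative table value; use the one given in the question
print(is_significant(r, critical_r))  # False: not significant at this level
print(round(t_statistic(r, 5), 2))    # 2.12
```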

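The $r$ for $(x, y)$ has the same magnitude as the $r$ for $(ax + b, cy + d)$, where $a, c \neq 0$: it is unchanged when $a$ and $c$ have the same sign, and only its sign flips when they have opposite signs. In other words, linear transformations of the variables do not affect the strength of the correlation.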
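A numerical check of this invariance, reusing a minimal Pearson's $r$ sketch on invented data; the constants $a = 3$, $b = 7$, $c = \pm 0.5$, $d = -2$ are arbitrary choices.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

u = [3 * x + 7 for x in xs]     # a = 3, b = 7
v = [0.5 * y - 2 for y in ys]   # c = 0.5, d = -2 (same sign as a)
w = [-0.5 * y - 2 for y in ys]  # c = -0.5 (opposite sign to a)

print(round(pearson_r(xs, ys), 6))  # 0.774597
print(round(pearson_r(u, v), 6))    # 0.774597: unchanged
print(round(pearson_r(u, w), 6))    # -0.774597: same size, sign flipped
```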

The regression line itself, however, does change under such transformations, and how it changes can be worked out using function transformations.
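As a sketch for the $y$ on $x$ line: if $y = \alpha x + \beta$ is the least-squares line and the data are transformed to $X = ax + b$, $Y = cy + d$ (with $a, c \neq 0$), substituting $x = (X - b)/a$ into $Y = c(\alpha x + \beta) + d$ gives $Y = \frac{c\alpha}{a}X + c\beta + d - \frac{c\alpha b}{a}$. The code below checks that refitting the transformed data gives this same line. The helper `fit_y_on_x` and the data are invented for illustration.

```python
def fit_y_on_x(xs, ys):
    """Least-squares line y = slope*x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
alpha, beta = fit_y_on_x(xs, ys)  # original line: y = 0.6x + 2.2

a, b, c, d = 3, 7, 0.5, -2        # X = 3x + 7, Y = 0.5y - 2
X = [a * x + b for x in xs]
Y = [c * y + d for y in ys]

# Refit on the transformed data ...
slope, intercept = fit_y_on_x(X, Y)
# ... and compare with the line obtained by substitution
pred_slope = c * alpha / a
pred_intercept = c * beta + d - c * alpha * b / a

print(round(slope, 6), round(pred_slope, 6))          # 0.1 0.1
print(round(intercept, 6), round(pred_intercept, 6))  # -1.6 -1.6
```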