Statistics and probability definitions

This is a list of statistics and probability terms.

Data that we collect can be discrete, meaning it takes integer values or falls into specific categories, or continuous, meaning it can take any value on a number line.

sample size, $n$ : number of measurements taken

mean, $\mu$ : sum of all measurements divided by the sample size.

mode: the most frequent discrete value(s) measured. A value has to appear more than once to count as a mode.

median, aka $Q2$ : the middle value when the measurements are ranked, or the mean of the two middle values when the sample size is even.
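
The definitions of sample size, mean, mode, and median above can be sketched in a few lines of Python. The data set and the helper name `modes_of` are made up for illustration; the "must appear more than once" rule for the mode is applied explicitly.

```python
import statistics
from collections import Counter

data = [2, 3, 3, 5, 7, 8]  # hypothetical measurements

n = len(data)                      # sample size
mean = sum(data) / n               # sum of measurements divided by n
median = statistics.median(data)   # mean of the two middle values when n is even


def modes_of(values):
    """Return the most frequent value(s), or [] if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    # A value only counts as a mode if it appears more than once.
    return [v for v, c in counts.items() if c == top and c > 1]


modes = modes_of(data)
print(n, mean, median, modes)
```

With no repeated values, as in `modes_of([1, 2, 3])`, the helper returns an empty list, which matches the definition above.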

quartiles: $Q1$ , $Q2$ , $Q3$ , the $25\%$ , $50\%$ , and $75\%$ marks when ordering values from least to greatest

inter-quartile range, IQR: $Q3 - Q1$

outlier: a value that is more than $1.5$ IQR below $Q1$ or more than $1.5$ IQR above $Q3$ , that is, further than $1.5$ IQR away from the nearest quartile. Only extreme values are potential outliers. The presence of an outlier does not retroactively change other metrics such as the median or inter-quartile range.
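
As a rough sketch of the quartile, IQR, and outlier definitions, the code below uses one common textbook convention: $Q1$ and $Q3$ are the medians of the lower and upper halves of the ranked data, leaving out the overall median when the sample size is odd. Other conventions exist and can give slightly different quartiles. The function names and the data set are hypothetical.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle value itself when n is odd.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )


def outliers(values):
    """Return values more than 1.5 IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


data = [1, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical measurements
q1, q2, q3 = quartiles(data)
print(q1, q2, q3, q3 - q1, outliers(data))
```

Here the quartiles are computed from all the data, including the extreme value, so flagging `30` as an outlier does not change the median or the IQR.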

percentile: one of the cut points that split the ranked data into 100 equal-sized divisions, where the $1$st percentile is higher than $1\%$ of the data, and the $99$th percentile is higher than $99\%$ of the data.
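
To make the percentile definition concrete, here is a minimal sketch that computes the percentage of the data lying strictly below a given value. The function name `percentile_rank` and the data are hypothetical, and textbooks differ on whether ties count as "below".

```python
def percentile_rank(values, x):
    """Percentage of measurements strictly lower than x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


data = list(range(1, 101))  # hypothetical data: the integers 1 to 100

print(percentile_rank(data, 2))    # 2 is higher than 1% of the data
print(percentile_rank(data, 100))  # 100 is higher than 99% of the data
```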

skew: left skew means the lower-end values are sparse and lie further away from the bulk of the data. Graphically, left skew looks like a tail stretching to the left. Left skew also means some of the lower values may be outliers. The opposite is true for right skew.
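
One way to put a number on skew, not defined in this list, is the moment coefficient of skewness: the mean of the cubed distances to the mean, divided by the cube of the standard deviation. It is negative for left skew and positive for right skew. The sketch below assumes that measure and uses made-up data.

```python
import math


def skewness(values):
    """Moment coefficient of skewness (population form)."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return sum((v - mu) ** 3 for v in values) / (n * sigma ** 3)


left_skewed = [1, 8, 9, 9, 10, 10, 10]        # sparse, far-away low value
right_skewed = [11 - v for v in left_skewed]  # mirror image of the data above

print(skewness(left_skewed), skewness(right_skewed))
```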

standard deviation, $\sigma$ : “average” distance from the measurements to the mean. Here “average” refers to the root mean square, or quadratic mean, of those distances.

variance, $\sigma^2$ : square of standard deviation

Technically speaking, $\mu$ , $\sigma$ , and $\sigma^2$ are population mean, standard deviation, and variance. The sample counterparts are $\bar x$ , $s$ , and $s^2$ .
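
The population and sample versions differ in the divisor: the population variance divides the sum of squared distances by $n$, while the usual sample variance divides by $n - 1$. The sketch below computes $\sigma$ by hand as a root mean square and compares it with Python's `statistics` module, where `pstdev` and `pvariance` are the population forms and `stdev` and `variance` are the sample forms. The data set is made up.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

n = len(data)
mu = sum(data) / n

# Root mean square of the distances to the mean (population standard deviation).
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)

pop_var = statistics.pvariance(data)    # sigma squared, divides by n
pop_std = statistics.pstdev(data)       # sigma
sample_var = statistics.variance(data)  # s squared, divides by n - 1
sample_std = statistics.stdev(data)     # s

print(sigma, pop_std, pop_var, sample_std, sample_var)
```

For any data set with some spread, $s$ comes out slightly larger than $\sigma$ because of the smaller divisor.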
