Statistics and probability definitions

This is a list of statistics and probability terms.

Data that we collect can be discrete, meaning it takes integer values or falls into specific categories, or continuous, meaning it can take any value on a number line.

sample size, $n$: number of measurements taken

mean, $\mu$: sum of all measurements divided by the sample size.

mode: the most frequent discrete value(s) measured; a value has to appear more than once to count as a mode.

median, aka $Q2$: the middle value when the measurements are ranked, or the mean of the middle values if there is more than one (as happens when the sample size is even).
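As a rough illustration of the definitions so far, the sketch below computes the sample size, mean, mode, and median for a small made-up data set. The data values and the helper name `modes` are assumptions for illustration only; the helper follows the rule above that a value must appear more than once to count as a mode.

```python
from collections import Counter
import statistics

def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top < 2:
        # Nothing appears more than once, so there is no mode.
        return []
    return [v for v, c in counts.items() if c == top]

# Hypothetical discrete measurements (not from the original text).
data = [2, 3, 3, 5, 7, 8]

n = len(data)                      # sample size
mean = sum(data) / n               # sum of all measurements divided by n
median = statistics.median(data)   # mean of the two middle values when n is even

print("n =", n)
print("mean =", mean)
print("mode(s) =", modes(data))
print("median =", median)
```

Here the two middle values are 3 and 5, so the median is their mean.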

quartiles: $Q1$, $Q2$, $Q3$, the $25\%$, $50\%$, and $75\%$ marks when ordering values from least to greatest

inter-quartile range, IQR: $Q3 - Q1$

outlier: a value that is more than $1.5$ IQR away from the nearest quartile, that is, below $Q1 - 1.5$ IQR or above $Q3 + 1.5$ IQR. Only extreme values are potential outliers. The presence of an outlier does not retroactively change other metrics such as median or inter-quartile range.
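The quartile, IQR, and outlier definitions can be sketched as follows. The data set is made up, and the quartiles come from Python's `statistics.quantiles` with its default "exclusive" method. That choice is an assumption: several quartile conventions exist and they can give slightly different values on the same data.

```python
import statistics

# Hypothetical measurements with one extreme value (not from the original text).
data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 40]

# quantiles() sorts the data itself; n=4 returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The quartiles are computed from the full data set, outliers included.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1, Q2, Q3 =", q1, q2, q3)
print("IQR =", iqr)
print("outliers =", outliers)
```

With these 11 values the cut points land exactly on the 3rd, 6th, and 9th ranked values, so no interpolation is involved.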

percentile: one of 100 equal-sized divisions of the ranked data, where the $1$st percentile is higher than $1\%$ of the data, and the $99$th percentile is higher than $99\%$ of the data.
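One way to read this definition is as the percentage of the data that a value is higher than. The sketch below takes that reading; the function name `percentile_rank` and the data are hypothetical, and other percentile conventions (for example, counting ties differently) exist.

```python
def percentile_rank(values, x):
    """Percentage of the values that are strictly lower than x."""
    if not values:
        raise ValueError("values must not be empty")
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical data: the integers 1 through 100.
data = list(range(1, 101))

print(percentile_rank(data, 2))    # 2 is higher than 1 of the 100 values
print(percentile_rank(data, 100))  # 100 is higher than 99 of the 100 values
```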

skew: left skew means the lower-end values are sparse and lie further from the bulk of the data. Graphically, left skew looks like a tail extending to the left. Left skew also means some of the lower values may be outliers. The opposite is true for right skew.

standard deviation, $\sigma$: “average” distance to the mean. Here “average” refers to the root mean square or quadratic mean.

variance, $\sigma^2$: square of standard deviation

Technically speaking, $\mu$, $\sigma$, and $\sigma^2$ are the population mean, standard deviation, and variance. The sample counterparts are $\bar x$, $s$, and $s^2$.
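The population and sample versions can be compared in a minimal sketch on made-up data. One detail the text above does not state, added here as an assumption about the usual convention: the sample variance divides the summed squared deviations by $n - 1$ rather than $n$ (Bessel's correction). The results are checked against the standard library's `statistics` functions.

```python
import math
import statistics

# Hypothetical measurements (not from the original text).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Sum of squared distances to the mean.
squared_deviations = sum((x - mean) ** 2 for x in data)

# Population versions: divide by n (root mean square of the deviations).
population_variance = squared_deviations / n
population_sd = math.sqrt(population_variance)

# Sample versions: divide by n - 1 (assumed usual convention).
sample_variance = squared_deviations / (n - 1)
sample_sd = math.sqrt(sample_variance)

print("population variance, sd:", population_variance, population_sd)
print("sample variance, sd:", sample_variance, sample_sd)

# Cross-check against the standard library.
print(math.isclose(population_sd, statistics.pstdev(data)))
print(math.isclose(sample_sd, statistics.stdev(data)))
```

The sample values come out slightly larger than the population values because of the smaller divisor.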