# Linear models

Linear models are crucial to modern data analysis. All the following pages of this tutorial rely on some variation of a linear model.

Back in 1886, Francis Galton found a relationship between the heights of parents and their children and introduced the term “regression” (Galton, 1886, Regression Towards Mediocrity in Hereditary Stature. The Journal of the Anthropological Institute of Great Britain and Ireland, Vol. 15, 246-263); we’ll look at his analysis on the next page. Ronald Fisher developed analysis of variance (ANOVA) and promoted it in his 1925 textbook (Fisher, R.A., 1925, Statistical methods for research workers, Oliver and Boyd, Edinburgh); ANOVA compares values of a continuous variable across different categories.

Regression and ANOVA are both types of linear model: in regression, the predictors are continuous variables (e.g., parent’s height), while in ANOVA they are categorical (e.g., variety of oats). But the machinery of analysis is the same, and it is easy to combine continuous and categorical predictors in the same model.
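One common way to put a categorical predictor into the same machinery is to turn it into 0/1 indicator ("dummy") columns, one per category other than a reference category. The sketch below assumes that coding scheme; the function name `dummy_code`, the variety labels and the heights are hypothetical, chosen only for illustration.

```python
def dummy_code(labels, reference):
    """Turn category labels into 0/1 indicator columns.

    The reference category gets no column of its own; its effect is
    absorbed by the intercept. (Assumed coding scheme, for illustration.)
    """
    levels = sorted(set(labels) - {reference})
    rows = [[1.0 if lab == lev else 0.0 for lev in levels] for lab in labels]
    return levels, rows

# Hypothetical data: one categorical and one continuous covariate
variety = ["A", "B", "C", "B"]
height = [1.2, 1.5, 1.1, 1.4]

levels, indicators = dummy_code(variety, reference="A")

# Each row of the model holds: intercept, continuous predictor, indicators
design = [[1.0, h] + ind for h, ind in zip(height, indicators)]
for row in design:
    print(row)
```

Each row then has one number per coefficient, so continuous and categorical predictors enter the model in exactly the same way.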

The key feature of a linear model is that the response depends on the sum of an intercept and one or more terms, each consisting of a predictor multiplied by a coefficient. $${\rm response} \sim \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$$
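As a minimal sketch of that sum, the function below computes the intercept plus coefficient-times-predictor terms for a single observation. The name `linear_predictor` and all the numbers are hypothetical, not taken from any real analysis.

```python
def linear_predictor(intercept, coefs, predictors):
    """Return beta_0 + beta_1*x_1 + beta_2*x_2 + ... for one observation."""
    if len(coefs) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefs, predictors))

# Hypothetical coefficients and predictor values
beta_0 = 2.0
betas = [0.5, -1.0, 3.0]
x = [4.0, 1.0, 2.0]

expected = linear_predictor(beta_0, betas, x)
print(expected)  # 2.0 + 0.5*4.0 - 1.0*1.0 + 3.0*2.0 = 9.0
```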

Notice the term “predictor” here, rather than “covariate”. A covariate is a “raw” variable that appears in the data, while a predictor has usually been processed in some way. Predictors are commonly formed by centering the covariate, that is, subtracting its mean. Predictors can also be formed by squaring the covariate; then we have a relationship between the response and the covariate which is not a straight line. Thus a “linear model” can model a non-linear relationship: the model is linear in the coefficients, not necessarily in the covariates. An interaction among covariates can be modeled by using a predictor formed by multiplying the covariates.
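The three ways of forming predictors mentioned above (centering, squaring, multiplying) can be sketched in a few lines. The covariate names and values here are hypothetical; forming the interaction from the centered covariates is one common choice, assumed here rather than prescribed by the text.

```python
from statistics import mean

def center(values):
    """Subtract the mean so the predictor has mean zero."""
    m = mean(values)
    return [v - m for v in values]

# Hypothetical raw covariates
temperature = [10.0, 12.0, 14.0, 16.0]
rainfall = [1.0, 3.0, 2.0, 6.0]

temp_c = center(temperature)            # centered predictor
temp_sq = [t ** 2 for t in temp_c]      # squared predictor (curved relationship)
rain_c = center(rainfall)
interaction = [t * r for t, r in zip(temp_c, rain_c)]  # product of covariates

# One row per observation: intercept, then the four predictors
design = [
    [1.0, t, t2, r, tr]
    for t, t2, r, tr in zip(temp_c, temp_sq, rain_c, interaction)
]
for row in design:
    print(row)
```

However the columns were produced, the model still just multiplies each one by a coefficient and adds up the results.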

“Generalised linear models” (GLMs) allow us to apply the same linear model logic to responses which are not continuous measures, in particular binary or count data, which are so common in ecology.
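To make the idea concrete, here is a small sketch in which the same linear sum is passed through a transformation so that it yields a probability (for binary data) or a positive expected count. It assumes the usual logit and log link functions, which this page does not name; the coefficient values are hypothetical.

```python
import math

def linear_predictor(beta_0, beta_1, x):
    """Intercept plus one coefficient-times-predictor term."""
    return beta_0 + beta_1 * x

def inverse_logit(eta):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and predictor value
eta = linear_predictor(-1.0, 0.5, 2.0)   # -1.0 + 0.5*2.0 = 0.0

prob = inverse_logit(eta)        # binary response: probability of a "success"
expected_count = math.exp(eta)   # count response: expected count, always > 0

print(prob, expected_count)      # 0.5 1.0
```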

While linear models in their various forms are widely used in ecology, they are not the only kind, and for some applications, not the best kind. For example, the best model for wolf kill rates on Isle Royale (Vucetich, J.A., Peterson, R.O., & Schaefer, C.L., 2002, The effect of prey and predator densities on wolf predation. Ecology, 83, 3003-3013) was $$\frac{aN}{(P + ahN)}$$ where $P$ is the number of wolves, $N$ the number of moose, and $a$ and $h$ are parameters to be estimated from the data.
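The kill-rate formula can be written directly as a function, which makes it easy to see that it is not a sum of coefficient-times-predictor terms. The parameter and population values below are hypothetical, chosen only to illustrate the shape of the function; they are not estimates from the paper.

```python
def kill_rate(N, P, a, h):
    """Kill rate a*N / (P + a*h*N) for N moose and P wolves."""
    return a * N / (P + a * h * N)

# Hypothetical population sizes and parameter values
moose, wolves = 1000.0, 25.0

rate = kill_rate(moose, wolves, a=0.5, h=0.25)
rate_double_a = kill_rate(moose, wolves, a=1.0, h=0.25)

# Doubling the parameter a does not double the kill rate,
# so the model is not linear in its parameters.
print(rate, rate_double_a)

# With very many moose the rate levels off near 1/h.
print(kill_rate(1e9, wolves, a=0.5, h=0.25))
```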

In the next two pages we will look at a simple regression example and then logistic regression, a type of GLM for binary data.