Thursday, August 26, 2021

GAM with mgcv in R

A GAM (generalized additive model) is a regression model that can capture non-linear relationships. Alongside ordinary linear terms it uses smooth terms, and these are what provide the non-linearity. A smooth term is a smooth function of a predictor variable whose shape is estimated from the data. The nice thing is that the gam function can find a good smooth without assuming an a priori relationship between the predictor and the response variable.

It is called "additive" because the smooth terms and the linear terms are summed to give the prediction.
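To make the "summed up" part concrete, here is a small sketch on simulated data (the data frame dat and its columns are made up for illustration, not taken from a real dataset). It asks predict() for the contribution of each term and checks that these contributions plus the intercept reproduce the model's linear predictor.

```r
library(mgcv)

# Simulated data: two non-linear effects (x1, x2) and one linear effect (l1)
set.seed(1)
n <- 300
dat <- data.frame(x1 = runif(n), x2 = runif(n), l1 = rnorm(n))
dat$response <- sin(2 * pi * dat$x1) + (dat$x2 - 0.5)^2 +
  0.5 * dat$l1 + rnorm(n, sd = 0.2)

mod <- gam(response ~ s(x1) + s(x2) + l1, method = "REML", data = dat)

# One column per model term; the intercept is not included
terms_mat <- predict(mod, type = "terms")
colnames(terms_mat)

# Summing the term contributions and adding the intercept
# should reproduce the linear predictor
manual <- unname(rowSums(terms_mat)) + coef(mod)[["(Intercept)"]]
fitted_lp <- as.numeric(predict(mod, type = "link"))
all.equal(manual, fitted_lp)
```

With the default Gaussian family and identity link, the linear predictor is also the fitted value on the response scale.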

By the way, bam is the better choice for fitting a GAM to a large dataset, for example one with several tens of thousands of observations or more.
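A minimal sketch of bam on simulated data (the data frame big and the sample size are assumptions chosen for illustration). It takes the same kind of formula as gam; "fREML" is its default smoothing parameter method, and discrete = TRUE is an optional setting that can speed things up further.

```r
library(mgcv)

# Simulated "large" dataset with 50,000 rows
set.seed(2)
n <- 50000
big <- data.frame(x1 = runif(n), x2 = runif(n))
big$response <- sin(2 * pi * big$x1) + big$x2^2 + rnorm(n, sd = 0.3)

# Same formula style as gam(); discrete = TRUE requires method = "fREML"
mod_big <- bam(response ~ s(x1) + s(x2), data = big,
               method = "fREML", discrete = TRUE)

class(mod_big)
```

The result also inherits from the gam class, so the usual tools such as summary() and predict() should work on it in the same way.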

Basic code:
> library(mgcv)
> mod <- gam(response ~ s(x1) + s(x2) + ... + l1 + l2, method = "REML", data = data)

Or, to model the log-transformed response variable:
> mod <- gam(log(response) ~ s(x1) + s(x2) + ... + l1 + l2, method = "REML", data = data)

If l1, l2, ... are factors (categorical variables converted with factor()), they are treated as categorical terms instead of linear terms.
You can also drop the data=data argument and refer to the columns directly as data$response, data$x1, etc.
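Here is a small sketch of the factor case, again on simulated data (the levels "a", "b", "c" and the shifts assigned to them are invented for the example). With R's default treatment contrasts, the first level acts as the baseline and each other level gets its own coefficient.

```r
library(mgcv)

# Simulated data: one smooth effect plus a three-level categorical variable
set.seed(3)
n <- 300
dat <- data.frame(
  x1 = runif(n),
  l1 = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
group_shift <- c(a = 0, b = 1, c = -1)  # true offset of each level
dat$response <- sin(2 * pi * dat$x1) +
  unname(group_shift[as.character(dat$l1)]) + rnorm(n, sd = 0.2)

mod <- gam(response ~ s(x1) + l1, method = "REML", data = dat)

# Level "a" is the baseline; "b" and "c" each get a coefficient
# measuring their difference from "a"
coef(mod)[c("l1b", "l1c")]
```

If l1 were stored as a number instead (say 1, 2, 3), the same formula would fit a single slope for it, so it is worth checking class(dat$l1) before fitting.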

s() takes two main optional arguments:
- k: the number of basis functions to use, which limits how wiggly the smooth can be. It must be less than or equal to the number of unique values in data$x1.
- bs: the type of spline basis. The default is "tp" (thin plate regression spline); "ts" is the thin plate variant with shrinkage.
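The two arguments can be tried out as follows. This is only a sketch on simulated data; the choice of k = 30 and bs = "cr" (cubic regression spline) is arbitrary and just meant to show where the arguments go.

```r
library(mgcv)

# Simulated data with a fairly wiggly relationship
set.seed(4)
n <- 400
dat <- data.frame(x1 = runif(n))
dat$response <- sin(6 * pi * dat$x1) + rnorm(n, sd = 0.3)

# Defaults: k = 10 and bs = "tp" for a one-variable smooth
mod_default <- gam(response ~ s(x1), method = "REML", data = dat)

# Larger basis and a cubic regression spline
mod_custom <- gam(response ~ s(x1, k = 30, bs = "cr"),
                  method = "REML", data = dat)

# Basis dimension stored for each smooth
mod_default$smooth[[1]]$bs.dim
mod_custom$smooth[[1]]$bs.dim

# Total effective degrees of freedom actually used by each fit
sum(mod_default$edf)
sum(mod_custom$edf)
```

k is only an upper limit: the penalty chosen by REML decides how much of that flexibility is actually used, which is what the effective degrees of freedom report.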

To check the output:
summary(mod): shows the formula, the coefficients of the linear terms, the effective degrees of freedom of the smooth terms, and their significance.
mod$aic: AIC value of the model.
mod$coefficients: coefficients of the fitted model (coef(mod) returns the same values).
gam.check(mod): reports whether the k values are adequate and draws diagnostic plots for checking model quality.
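Put together, the checks above might look like this on a small simulated example (the data and model are assumptions made up for the sketch). k.check() is included as well because it returns the k diagnostics as a table without drawing the plots.

```r
library(mgcv)

# Simulated data and a simple model to inspect
set.seed(5)
n <- 300
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$response <- sin(2 * pi * dat$x1) + 2 * dat$x2 + rnorm(n, sd = 0.2)

mod <- gam(response ~ s(x1) + s(x2), method = "REML", data = dat)

sm <- summary(mod)
sm$s.table        # edf and approximate significance of each smooth term
sm$r.sq           # adjusted R-squared

mod$aic           # AIC stored in the fitted object
coef(mod)         # same values as mod$coefficients

kc <- k.check(mod)  # one row per smooth: k', edf, k-index, p-value
kc

gam.check(mod)    # prints the same k diagnostics and draws residual plots
```

As a rough reading guide, a low p-value in the k check together with an edf close to k' is a hint that k may be too small for that smooth.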