News

New paper in The American Naturalist!

Monday, September 6, 2021

Linear regression (OLS) and p-value in Python (and F-test in multiple regression)

-----------------------------------------
import numpy as np
import pandas as pd
import statsmodels.api as sm

X = data['x1']  # single predictor; for multiple regression use e.g.:
# X = np.column_stack((data['x1'], data['x2'], data['x3']))
Y = data['y']

X2 = sm.add_constant(X)   # add the intercept column
est = sm.OLS(Y, X2)
est2 = est.fit()
print(est2.summary())
-----------------------------------------
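The snippet above assumes a pandas DataFrame called data already exists. A minimal self-contained sketch (the column names x1, x2, x3, y and the coefficients are made up for illustration) would be:

-----------------------------------------
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
data = pd.DataFrame({'x1': rng.normal(size=n),
                     'x2': rng.normal(size=n),
                     'x3': rng.normal(size=n)})
# y depends on x1 and x2; x3 is pure noise
data['y'] = 2.0 * data['x1'] - 1.0 * data['x2'] + rng.normal(size=n)

X = np.column_stack((data['x1'], data['x2'], data['x3']))
X2 = sm.add_constant(X)
est2 = sm.OLS(data['y'], X2).fit()
print(est2.summary())
-----------------------------------------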

which gives the regression summary table. Two parts of it to note:

t-test: one for each coefficient (the t and P>|t| columns)

F-statistic (the F-test of the overall significance)
"In multiple regression, since we are fitting many predictors, we need to consider a case where there are a lot of features. With a very large amount of predictors, there will always be about 5% of them that will have, by chance, a very small p-value even though they are not statistically significant. Therefore, we use the F-statistic to avoid considering unimportant predictors as significant predictors. " link

What does this F-test do? link
"The F-test of the overall significance is a specific form of the F-test. It compares a model with no predictors (intercept-only model) to the model that you specify." 
That is,
"Null hypothesis: The fit of the intercept-only model and your model are equal.
Alternative hypothesis: The fit of the intercept-only model is significantly reduced compared to your model."
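In other words, the overall F-statistic compares the residual sum of squares of the intercept-only model (RSS_0) with that of the full model (RSS): F = ((RSS_0 - RSS) / p) / (RSS / (n - p - 1)), with p predictors and n observations. This can be checked by hand against statsmodels (a sketch with made-up data):

-----------------------------------------
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 3))
y = 1.5 * X[:, 0] + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(X)).fit()
null = sm.OLS(y, np.ones(n)).fit()    # intercept-only model

p = X.shape[1]
F = ((null.ssr - full.ssr) / p) / (full.ssr / (n - p - 1))
print(F, full.fvalue)                 # the hand-computed F matches statsmodels
-----------------------------------------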