Wednesday, December 16, 2020

How to calculate F-ratio in one-way ANOVA



[wikipedia]
Null hypothesis: samples in all groups are drawn from populations with the same mean values.
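
In symbols, the null hypothesis and the F-ratio computed in this post can be written as below. This is a sketch in notation chosen here, not taken from the post: $x_{ij}$ is observation $i$ in group $j$, $n_j$ is the size of group $j$, $\bar{x}_j$ is the mean of group $j$, and $\bar{x}$ is the overall mean. SSB, SSW, SST, $N$ and $k$ match the variable names used in the code.

```latex
H_0:\ \mu_1 = \mu_2 = \dots = \mu_k

SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2, \qquad
SSB = \sum_{j=1}^{k} n_j\,(\bar{x}_j - \bar{x})^2, \qquad
SSW = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 = SST - SSB

F = \frac{MSB}{MSW} = \frac{SSB / (k - 1)}{SSW / (N - k)}
```

When the null hypothesis and the ANOVA assumptions hold, $F$ follows an F distribution with $k-1$ and $N-k$ degrees of freedom.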

Assumptions that need to be met in ANOVA are:
Response variable residuals are normally distributed (or approximately normally distributed).
Variances of populations are equal.
Responses for a given group are independent and identically distributed normal random variables (not a simple random sample (SRS)).
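
The first two assumptions can be screened roughly before running the ANOVA. The sketch below is one common choice, not something the post prescribes: a Shapiro-Wilk test on the pooled residuals and Levene's test for equal variances, both from SciPy. The helper name `check_anova_assumptions` and the toy numbers are made up for illustration.

```python
import numpy as np
from scipy import stats


def check_anova_assumptions(groups):
    """Hypothetical helper: groups is a list of 1-D sequences, one per group."""
    arrays = [np.asarray(g, dtype=float) for g in groups]
    # Residuals: each observation minus the mean of its own group.
    residuals = np.concatenate([a - a.mean() for a in arrays])
    # Shapiro-Wilk: a small p-value suggests the residuals are not normal.
    _, shapiro_p = stats.shapiro(residuals)
    # Levene: a small p-value suggests the group variances differ.
    _, levene_p = stats.levene(*arrays)
    return {"shapiro_p": float(shapiro_p), "levene_p": float(levene_p)}


# Made-up toy data: three groups of five observations each.
groups = [
    [4.1, 5.0, 5.9, 5.2, 4.8],
    [6.8, 7.9, 7.0, 8.1, 7.2],
    [5.1, 6.2, 5.8, 6.0, 5.4],
]
result = check_anova_assumptions(groups)
print(result)
```

Independence of the observations cannot be tested this way; it has to follow from how the data were collected.
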
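import pandas as pd

data = pd.DataFrame(...)  # placeholder: replace with your own DataFrame, one row per observation
X = ['feature1', 'feature2', ...]  # names of the numeric response columns to test
# Take the mean of the response columns only; averaging the whole frame fails
# in recent pandas versions when the grouping column is non-numeric.
GM = data[X].mean()  # overall (grand) mean of each response column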

# total variation: squared deviations of every observation from the overall mean
SST = ((data[X] - GM) ** 2).sum()
print('total variation\n', SST)

# between-group variation: group size times the squared deviation
# of each group mean from the overall mean
SSB = 0
for group in data.GroupingColumnName.unique():
    d = data.loc[data.GroupingColumnName == group]
    n = len(d)
    SSB += n * (d[X].mean() - GM) ** 2
print('between-group variation\n', SSB)

# within-group variation
SSW = SST - SSB
print('within-group variation\n',SSW)

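N = len(data)                            # total number of observations
k = len(set(data.GroupingColumnName))    # number of groups

MSB = SSB / (k - 1)  # mean square between groups
MSW = SSW / (N - k)  # mean square within groups
print('F-value\n', MSB / MSW)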

# Degrees of freedom for the p-value of this ANOVA: numerator (k-1), denominator (N-k).
# GroupingColumnName: the name of the DataFrame column that categorizes the data into the groups of interest.
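
To go from the F-value to a p-value, the F-ratio can be compared against an F distribution with (k-1, N-k) degrees of freedom. The sketch below assumes SciPy is available and uses made-up toy data with hypothetical column names `feature1` and `group`. It computes the F-ratio with the same formulas as above, obtains the p-value from `scipy.stats.f.sf`, and cross-checks both numbers against `scipy.stats.f_oneway`.

```python
import pandas as pd
from scipy import stats

# Made-up toy data: one response column and one grouping column.
data = pd.DataFrame({
    "feature1": [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 5.0, 6.0, 7.0],
    "group":    ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
})

grouped = data.groupby("group")["feature1"]
N, k = len(data), grouped.ngroups
GM = data["feature1"].mean()  # overall mean

# Between-group and within-group sums of squares.
SSB = (grouped.count() * (grouped.mean() - GM) ** 2).sum()
SSW = ((data["feature1"] - grouped.transform("mean")) ** 2).sum()

F = (SSB / (k - 1)) / (SSW / (N - k))
# Survival function = upper-tail probability of the F distribution.
p_value = stats.f.sf(F, k - 1, N - k)
print("F =", F, "p =", p_value)  # F = 7.0, p ≈ 0.027 for this toy data

# Cross-check against SciPy's built-in one-way ANOVA.
samples = [g.to_numpy() for _, g in grouped]
F_ref, p_ref = stats.f_oneway(*samples)
print("scipy F =", F_ref, "scipy p =", p_ref)
```

Here SSW is computed directly from the deviations within each group rather than as SST - SSB; the two routes give the same value.
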