
Tuesday, October 6, 2020

Non-Linear Regression





#!/usr/bin/env python

"""
Codes from "COGNITIVE CLASS.ai - Non Linear Regression Analysis", author: Saeed Aghabozorgi
If the data shows a curvy trend, then linear regression will not produce very accurate results when compared to a non-linear regression because, linear regression presumes that the data is linear.
"""

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

##### Data #####
df = pd.read_csv("data.csv")
x_data = df.X # explanatory variable
xmin = min(x_data)
xmax = max(x_data)
y_data = df.Y # target variable

##### Step 1: Plot the data #####
# Look at the plotted data to choose a non-linear model.
# Here, suppose the data appears to follow a logistic (sigmoidal) curve.
plt.plot(x_data, y_data, 'ro')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()


##### Step 2: Choosing a model #####
# Sigmoid function as a model
# The task here is to find the best parameters (beta_1, beta_2) for the model.
def sigmoid(x, Beta_1, Beta_2):
	y = 1 / (1 + np.exp(-Beta_1*(x-Beta_2)))
	return y
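# As an illustration (the beta values below are chosen for demonstration
# only, not taken from the data): beta_1 controls the steepness of the
# curve, and beta_2 shifts it along the x-axis.
X_demo = np.arange(-5.0, 5.0, 0.1)
Y_demo = sigmoid(X_demo, 1.0, 0.0)  # sigmoid(0) = 0.5 with these values
# plt.plot(X_demo, Y_demo) would show the characteristic S-shape.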

##### Step 3: Normalize our data #####
# Scaling both variables to the same range helps curve_fit converge.
xdata = x_data/max(x_data)
ydata = y_data/max(y_data)


##### Step 4: Fitting #####
from scipy.optimize import curve_fit
popt, pcov = curve_fit(sigmoid, xdata, ydata) # popt are our optimized parameters
print("beta_1 = %f, beta_2 = %f" % (popt[0], popt[1]))


##### Step 5: Plot resulting regression model #####
x = np.linspace(xmin,xmax, 100)
x = x/max(x) # Normalization
y = sigmoid(x, *popt)
plt.plot(xdata, ydata, 'ro', label='data')
plt.plot(x,y, linewidth=3, label='fit')
plt.legend(loc='best')
plt.show()


##### Step 6: Calculate the accuracy of the model #####
# split the data into train/test sets (roughly 80/20)
msk = np.random.rand(len(df)) < 0.8
train_x = xdata[msk]
test_x = xdata[~msk]
train_y = ydata[msk]
test_y = ydata[~msk]

# build the model using train set
popt, pcov = curve_fit(sigmoid, train_x, train_y)

# predict using test set
y_hat = sigmoid(test_x, *popt)

# evaluation
from sklearn.metrics import r2_score
print("Mean absolute error: %.2f" % np.mean(np.absolute(y_hat - test_y)))
print("Residual sum of squares (MSE): %.2f" % np.mean((y_hat - test_y) ** 2))
print("R2-score: %.2f" % r2_score(y_hat, test_y))