Problem statement:
Consider the given datasets, in which the Y attribute depends linearly on the X attribute. Write Python code (without using library functions) for univariate linear regression to determine the relationship between the independent variable X and the dependent variable Y in each of the following cases. Plot the results for all cases.
a) In dataset 'test_a.csv', some of the Y values are missing. Replace them with (i) the mean and (ii) the median, perform regression analysis, and comment.
b) In dataset 'test_b.csv', some of the Y values are negative. Perform regression analysis and comment.
c) Perform regression analysis on dataset 'test_c.csv' and plot the results. Then detect and remove the outliers, repeat the regression analysis, and comment.
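Because the task rules out regression library functions, the least-squares line can be computed directly from the closed-form estimates b1 = S_xy / S_xx and b0 = mean(y) - b1 * mean(x). The sketch below is one possible approach, not the intended solution: fit_line and r_squared are hypothetical helper names, the data are made up (no values from test_b.csv), plotting is left out to keep it dependency-free, and for case b) it assumes negative Y values are invalid readings. Whether that holds depends on what Y measures, so it fits the line both with and without them and reports R^2 for comparison.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1*x using only built-ins."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b1 = S_xy / S_xx; the 1/n factors cancel, so plain sums are enough
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x  # the fitted line passes through (mean_x, mean_y)
    return b0, b1


def r_squared(xs, ys, b0, b1):
    """Share of the variance in y explained by the line (assumes y is not constant)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data that follow y = 2x except for one negative reading at x = 3
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, -6, 8, 10, 12]
b0_all, b1_all = fit_line(xs, ys)

# Assumption: negative Y values are invalid readings, so drop them and refit
xs_clean = [x for x, y in zip(xs, ys) if y >= 0]
ys_clean = [y for y in ys if y >= 0]
b0_clean, b1_clean = fit_line(xs_clean, ys_clean)

for label, px, py, b0, b1 in [
    ("all points", xs, ys, b0_all, b1_all),
    ("negatives dropped", xs_clean, ys_clean, b0_clean, b1_clean),
]:
    print(f"{label}: y = {b0:.3f} + {b1:.3f}x, R^2 = {r_squared(px, py, b0, b1):.3f}")
```

In this toy example the single negative point pulls the intercept down, pushes the slope up and lowers R^2, which is the kind of effect a comment on case b) could discuss.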
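For case a), here is a sketch of mean and median imputation without pandas or NumPy. It assumes test_a.csv has columns named X and Y and marks a missing Y value with an empty field, NA or NaN. The inline SAMPLE string and the helpers load_xy, mean, median and impute are hypothetical stand-ins, not part of the original task.

```python
# Hypothetical stand-in for test_a.csv: assumes columns named X and Y,
# with missing Y values left empty (or written as NA / NaN)
SAMPLE = """X,Y
1,3
2,
3,7
4,9
5,NaN
6,25
"""


def load_xy(text):
    """Parse 'X,Y' CSV text by hand; missing Y values become None."""
    lines = text.strip().splitlines()
    header = [h.strip() for h in lines[0].split(",")]
    ix, iy = header.index("X"), header.index("Y")
    xs, ys = [], []
    for line in lines[1:]:
        fields = line.split(",")
        xs.append(float(fields[ix]))
        raw = fields[iy].strip() if iy < len(fields) else ""
        ys.append(None if raw.lower() in ("", "na", "nan") else float(raw))
    return xs, ys


def mean(values):
    return sum(values) / len(values)


def median(values):
    s = sorted(values)
    mid = len(s) // 2
    # Odd count: middle element; even count: average of the two middle elements
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def impute(ys, fill):
    """Replace each None with fill() computed over the observed Y values only."""
    value = fill([y for y in ys if y is not None])
    return [value if y is None else y for y in ys]


xs, ys = load_xy(SAMPLE)
ys_mean = impute(ys, mean)      # missing Y -> mean of observed Y
ys_median = impute(ys, median)  # missing Y -> median of observed Y
print(ys_mean)
print(ys_median)
```

For the real file, read its text with open('test_a.csv') and pass that to load_xy, then fit a line to each imputed Y list. With a skewed column such as the sample's 25.0, the mean fill (11.0) is pulled toward the extreme value while the median fill (8.0) is not.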
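For case c), the task does not name an outlier method. One common choice, assumed here, is to fit the line to all points, compute residuals, and flag points whose residual falls outside Tukey's fences, Q1 - 1.5*IQR and Q3 + 1.5*IQR. The data are made up, and fit_line, quantile and iqr_outliers are hypothetical helpers. Quartiles use linear interpolation between sorted values, which is only one of several conventions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1*x (assumes x values are not all equal)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return mean_y - b1 * mean_x, b1


def quantile(sorted_vals, p):
    """Linear-interpolation quantile of an already sorted list (0 <= p <= 1)."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def iqr_outliers(xs, ys, k=1.5):
    """Indices whose residual from the all-points fit lies outside Tukey's fences."""
    b0, b1 = fit_line(xs, ys)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s = sorted(residuals)
    q1, q3 = quantile(s, 0.25), quantile(s, 0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, r in enumerate(residuals) if r < lower or r > upper]


# Made-up data near y = 2x + 1 with one gross outlier at x = 6
xs = list(range(1, 11))
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 40.0, 15.2, 16.9, 19.1, 20.8]

outliers = iqr_outliers(xs, ys)
keep = [i for i in range(len(xs)) if i not in outliers]
xs_clean = [xs[i] for i in keep]
ys_clean = [ys[i] for i in keep]

b0_raw, b1_raw = fit_line(xs, ys)
b0_clean, b1_clean = fit_line(xs_clean, ys_clean)
print("outlier indices:", outliers)
print(f"with outliers:    y = {b0_raw:.3f} + {b1_raw:.3f}x")
print(f"outliers removed: y = {b0_clean:.3f} + {b1_clean:.3f}x")
```

Quartile-based fences were chosen over a z-score rule because on small samples a single large outlier inflates the standard deviation and can mask itself. On this sample only index 5 (Y = 40.0) is flagged, and removing it brings the fitted line back close to y = 2x + 1.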
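import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('mtcars.csv')
df1 = pd.read_csv('churn.csv')

# 'am' is the transmission (0 = automatic, 1 = manual); in the standard mtcars
# data the carburetor count column is 'carb', not 'carbs'.
# Use a scatter plot: a line plot would join rows in file order, which is meaningless here.
plt.scatter(df['am'], df['carb'])
plt.title('Transmission vs Number of carburetors')
plt.xlabel('Transmission (0 = automatic, 1 = manual)')
plt.ylabel('Number of carburetors')
plt.show()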
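# Parenthesize each comparison: `&` binds more tightly than `>` and `==`.
# String matches are case-sensitive, so use the exact values stored in churn.csv.
#1 Female customers with tenure above 50
df1[(df1['Tenure'] > 50) & (df1['gender'] == 'female')]
#2 Male customers who are not senior citizens
df1[(df1['SeniorCitizen'] == 0) & (df1['gender'] == 'male')]
#3 Customers with tech support who did not churn
df1[(df1['TechSupport'] == 'Yes') & (df1['Churn'] == 'No')]
#4 Month-to-month customers who churned
df1[(df1['Contract type'] == 'Month-to-month') & (df1['Churn'] == 'Yes')]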
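A small self-contained check of the operator-precedence issue in these filters. It uses a made-up four-row DataFrame that borrows only the Tenure and gender column names; the values are illustrative, not taken from churn.csv. Written without parentheses, a filter such as df1['Tenure'] > 50 & df1['gender'] == 'female' is parsed as df1['Tenure'] > (50 & df1['gender']) == 'female', and the bitwise & between an integer and a column of strings raises an error.

```python
import pandas as pd

# Made-up rows; only the column names come from the churn filters
df1 = pd.DataFrame({
    "Tenure": [60, 10, 72, 55],
    "gender": ["female", "female", "male", "female"],
})

# Unparenthesized: `&` is evaluated first, so pandas tries 50 & <string column>
try:
    df1[df1['Tenure'] > 50 & df1['gender'] == 'female']
    unparenthesized_ok = True
except (TypeError, ValueError) as exc:
    unparenthesized_ok = False
    print("unparenthesized filter failed:", type(exc).__name__)

# Parenthesized: each comparison yields a boolean Series, then & combines them
mask = (df1['Tenure'] > 50) & (df1['gender'] == 'female')
print(df1[mask])
```

The same rule applies to | (or) and ~ (not) when combining pandas conditions.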