Analysis of the Boston data set: sklearn.linear_model.LinearRegression (fit, predict, score) and StatsModels
(1) Find the best model with <= 3 predictors, in terms of RMSE, to predict medv in the whole Boston data set. STEPS-
- Loading the Boston dataset
- Checking the Boston housing summary statistics
- Splitting the data into training and test sets
- Model prediction and RMSE verification
- Finding the correlations between the features and the price (medv)
- Eliminating the features with high p-values
- Finalizing the 3 most influential predictors: RM, PTRATIO, LSTAT
- Creating a new model with these predictors
- Splitting the data into training and test sets
- Normalizing and scaling the data
- Fitting the model
- Predicting the test set results
- Verifying the R2 and RMSE
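A minimal sketch of step (1), assuming pandas and scikit-learn are installed. Recent scikit-learn releases no longer ship `load_boston`, so the sketch builds a hypothetical synthetic stand-in with a few Boston-style columns (`CRIM`, `NOX`, `RM`, `PTRATIO`, `LSTAT`) and a target `medv`; these are made-up values, not the real data. Instead of the p-value elimination listed above, it uses a plainly different technique: an exhaustive search over every subset of 1 to 3 predictors, scored by test-set RMSE. Scaling happens inside a `make_pipeline(StandardScaler(), LinearRegression())` pipeline, so the scaler is fitted on the training split only, and R2 comes from the pipeline's `score` method.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic stand-in for the Boston data (NOT the real values).
rng = np.random.default_rng(0)
n = 506
df = pd.DataFrame({
    "CRIM": rng.exponential(3.0, n),
    "NOX": rng.uniform(0.38, 0.87, n),
    "RM": rng.normal(6.3, 0.7, n),
    "PTRATIO": rng.uniform(12.6, 22.0, n),
    "LSTAT": rng.uniform(1.7, 38.0, n),
})
# Assumed linear relationship, used only to make the example self-contained.
df["medv"] = (5.0 * df["RM"] - 0.9 * df["PTRATIO"] - 0.5 * df["LSTAT"]
              + rng.normal(0, 2.0, n))

X, y = df.drop(columns="medv"), df["medv"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)


def evaluate(cols):
    """Fit scaler + OLS on the training split; return (test RMSE, test R2)."""
    cols = list(cols)
    model = make_pipeline(StandardScaler(), LinearRegression())
    model.fit(X_train[cols], y_train)
    pred = model.predict(X_test[cols])
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    r2 = model.score(X_test[cols], y_test)  # R2 on the test split
    return rmse, r2


# Exhaustive search over every subset of 1 to 3 predictors.
results = {
    cols: evaluate(cols)
    for k in range(1, 4)
    for cols in combinations(X.columns, k)
}
best_cols = min(results, key=lambda c: results[c][0])
best_rmse, best_r2 = results[best_cols]
print(best_cols, round(best_rmse, 3), round(best_r2, 3))
```

With the real data loaded into `df`, `results` holds (RMSE, R2) for every candidate subset, so the search result can be checked directly against the RM, PTRATIO, LSTAT set chosen through correlations and p-values.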
(2) Find the best model with <= 3 predictors, including log, square, and cubic transformations, in the whole data set. STEPS-
- Loading the Boston dataset
- Splitting the data
- Model prediction and RMSE, Rsq verification
- RMSE and Rsq verification with the 3 most correlated predictors
- Building an interaction model and verifying RMSE
- Building a model with a square (polynomial) transformation
- Building a model with a log transformation
- Building a model with square root transformations
- Comparing the models in terms of Rsq and RMSE.
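A minimal sketch of step (2) under the same assumptions: a hypothetical synthetic stand-in for the Boston data, this time generated with an assumed log-shaped dependence on `LSTAT` so the transformations have a curve to pick up. It selects the 3 predictors most correlated (in absolute value) with `medv`, builds linear, interaction, square, cubic, log and square-root variants on those three, and ranks them by test RMSE and Rsq. The log and square-root variants need strictly positive predictors, which holds for the synthetic columns here.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic stand-in for the Boston data (NOT the real values).
rng = np.random.default_rng(1)
n = 506
df = pd.DataFrame({
    "CRIM": rng.exponential(3.0, n),
    "NOX": rng.uniform(0.38, 0.87, n),
    "RM": rng.normal(6.3, 0.7, n),
    "PTRATIO": rng.uniform(12.6, 22.0, n),
    "LSTAT": rng.uniform(1.7, 38.0, n),
})
# Assumed curved dependence on LSTAT, for illustration only.
df["medv"] = (5.0 * df["RM"] - 0.9 * df["PTRATIO"] - 8.0 * np.log(df["LSTAT"])
              + rng.normal(0, 2.0, n))

# The 3 predictors most correlated (in absolute value) with medv.
corr = df.corr()["medv"].drop("medv").abs()
top3 = list(corr.nlargest(3).index)

train, test = train_test_split(df, test_size=0.2, random_state=42)


def design(data, kind):
    """Build the feature matrix for one candidate model from the top 3 predictors."""
    X = data[top3].copy()
    if kind == "interaction":
        for a, b in combinations(top3, 2):
            X[f"{a}:{b}"] = X[a] * X[b]
    elif kind == "square":
        for c in top3:
            X[f"{c}^2"] = X[c] ** 2
    elif kind == "cubic":
        for c in top3:
            X[f"{c}^2"] = X[c] ** 2
            X[f"{c}^3"] = X[c] ** 3
    elif kind == "log":
        X = np.log(X)   # requires strictly positive predictors
    elif kind == "sqrt":
        X = np.sqrt(X)  # requires non-negative predictors
    return X


rows = []
for kind in ["linear", "interaction", "square", "cubic", "log", "sqrt"]:
    model = LinearRegression().fit(design(train, kind), train["medv"])
    pred = model.predict(design(test, kind))
    rows.append({
        "model": kind,
        "RMSE": np.sqrt(mean_squared_error(test["medv"], pred)),
        "Rsq": r2_score(test["medv"], pred),
    })

# Lower RMSE is better; Rsq is reported alongside for comparison.
comparison = pd.DataFrame(rows).sort_values("RMSE").reset_index(drop=True)
print(comparison)
```

The transformed models add derived columns (for example `RM^2` or `RM:LSTAT`), but each one is still built from the same three original predictors, which keeps the comparison within the <= 3 predictor limit.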