I used the ‘Small_Car_Data’ dataset. The data contains 11 columns comprising both categorical and numerical variables. I removed the header of the attached Small_Car_Data.csv file and then imported it into Spark. I randomly selected 10% of the data for testing and used the remaining data for training.

I looked initially at horsepower and displacement, treating displacement as the feature and horsepower as the target variable. I used MLlib linear regression to identify the model for the relationship and used the test data to illustrate how accurately the model predicts it. I used the VectorAssembler function, which is available in the PySpark ML Feature library, to put the feature ‘Displacement’ and the target ‘Horsepower’ into the vector form that MLlib needs to create a model. The scatter plot of Horsepower against Displacement showed that the actual values are close to the predicted values, so the regression line is a good predictor of a car's horsepower.

Then I treated cylinders, displacement, manufacturer, model year, origin and weight as features and used linear regression to predict two target variables: horsepower and acceleration. Some of these variables are categorical, and the linear regression model can only handle numerical variables, so I encoded the categorical variables with ordinal encoding (StringIndexer). The encoded categorical variables were ‘Model’ and ‘Manufacturer’, which were used as features for the prediction of the target variables. After that I analyzed which of the target variables is easier to predict, in the sense that its predicted values differ less from the original values.
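The first experiment (90/10 split, one feature, one target) can be sketched in plain Python. This is a stand-in for illustration only, not the repository's PySpark code: it uses a closed-form least-squares fit instead of MLlib, the helper names `split_rows`, `fit_line` and `rmse` are hypothetical, and the (displacement, horsepower) pairs are synthetic and noise-free so the fit is easy to check. In Spark the split would typically be done with `DataFrame.randomSplit([0.9, 0.1])`, which gives approximately, not exactly, those proportions.

```python
import math
import random


def split_rows(rows, test_fraction=0.1, seed=42):
    """Shuffle the rows and hold out a fixed fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(rows, slope, intercept):
    """Root mean squared error of the fitted line on the given rows."""
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in rows]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic, exactly linear data (NOT taken from Small_Car_Data.csv).
rows = [(float(d), 0.3 * d + 40.0) for d in range(70, 470, 10)]

train, test = split_rows(rows)
slope, intercept = fit_line(train)
test_rmse = rmse(test, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} test RMSE={test_rmse:.6f}")
```

Evaluating on the held-out rows rather than the training rows is what makes the error a fair estimate of how well the relationship is predicted for unseen cars.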
pratik412/python_with_pypark
About
In this assignment I used car_data and analyzed which of the target variables is easier to predict, in the sense that its predicted values differ less from the original values.
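When comparing how far predictions fall from the original values for two targets, the scale of each target matters: horsepower is measured in much larger numbers than acceleration, so raw errors are not directly comparable. The source does not state which metric was used; the sketch below assumes RMSE and shows two scale-free alternatives (range-normalized RMSE and R²). All values are made up for illustration and the function names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the units of the target."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def normalized_rmse(actual, predicted):
    """RMSE divided by the range of the actual values (unitless)."""
    return rmse(actual, predicted) / (max(actual) - min(actual))


def r_squared(actual, predicted):
    """Share of the target's variance explained by the predictions."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual/predicted values, NOT results from the repository.
hp_actual, hp_pred = [90.0, 110.0, 150.0, 200.0], [95.0, 105.0, 155.0, 190.0]
acc_actual, acc_pred = [12.0, 14.0, 16.0, 18.0], [13.0, 13.0, 17.0, 17.0]

for name, actual, pred in [("horsepower", hp_actual, hp_pred),
                           ("acceleration", acc_actual, acc_pred)]:
    print(name,
          f"RMSE={rmse(actual, pred):.3f}",
          f"NRMSE={normalized_rmse(actual, pred):.3f}",
          f"R2={r_squared(actual, pred):.3f}")
```

In this made-up example the raw RMSE is smaller for acceleration, while the normalized RMSE and R² both favor horsepower, which is why a scale-free metric is a safer basis for deciding which target is easier to predict.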