python_with_pyspark

I used the ‘Small_Car_Data’ dataset, which contains 11 columns comprising both categorical and numerical variables. I removed the header row from the attached Small_Car_Data.csv file and then imported the data into Spark. I randomly selected 10% of the data for testing and used the remaining 90% for training.

As a first step, I looked at horsepower and displacement, treating displacement as the feature and horsepower as the target variable. I used MLlib linear regression to model the relationship and used the test data to illustrate how accurately the relationship can be predicted. Because MLlib models expect their features in a single vector column, I used the VectorAssembler function from the PySpark ML feature library to convert the feature ‘Displacement’ into vector form, with the target ‘Horsepower’ as the label column, so that MLlib could build a model from it. The scatter plot of Horsepower against Displacement showed that the actual values are close to the predicted values, so the regression line is a good predictor of a car’s horsepower.

Next, I treated cylinders, displacement, manufacturer, model year, origin and weight as features and used linear regression to predict two target variables: horsepower and acceleration. Some of these variables are categorical, and the linear regression model only handles numerical inputs, so I encoded the categorical variables ‘Model’ and ‘Manufacturer’ with ordinal encoding (StringIndexer) before using them as features to predict the target variables. Finally, I analyzed which of the two target variables is easier to predict, in the sense that its predicted values differ less from the original values.

Repository contents:
README.md
car.ipynb
car_data.docx

pratik412/python_with_pypark
