
PamWood/HARCourseProject


HARCourseProject

Getting and Cleaning Data - HAR Course Project

I have one script.

The first part creates my project directory if it does not already exist, downloads the zip file into it, unzips the file there, and changes my working directory to the unzipped folder.

    if(!file.exists("/Training/DataScienceClass/data/CourseProject")) {
      dir.create("/Training/DataScienceClass/data/CourseProject", recursive=TRUE)
    }
    FilePath<-file.path("/Training/DataScienceClass/data/CourseProject", "HAR.zip")
    FiletoUnzip<-"https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
    # mode="wb" keeps the zip file intact on Windows
    download.file(FiletoUnzip, FilePath, mode="wb")
    # exdir extracts next to the zip file instead of into the current working directory
    unzip(FilePath, exdir="/Training/DataScienceClass/data/CourseProject")
    setwd("/Training/DataScienceClass/data/CourseProject/UCI HAR Dataset")

The second part reads the "features" table. I apply Step 4 here, using "gsub" to remove every non-alphanumeric character ("()", "-", ",", etc.). Doing this effectively renames the features. I did not change the case of the names because case sensitivity becomes important later when selecting the mean and std columns.

    features<-read.table("./features.txt")
    # strip everything that is not a letter or a digit
    features$V3<-gsub("[^[:alnum:]]","", features$V2)
    features$V2<-NULL
    names(features)<-c("list","features")
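As a small illustration of what the "gsub" call does, here is a sketch run on three names written in the style of features.txt. The vector `sample_names` is made up for this example and is not part of the script.

```r
# Hypothetical sample of feature names, in the style of features.txt
sample_names <- c("tBodyAcc-mean()-X",
                  "tBodyAcc-std()-Y",
                  "angle(tBodyAccMean,gravity)")

# Same pattern as the script: drop every non-alphanumeric character
clean_names <- gsub("[^[:alnum:]]", "", sample_names)

print(clean_names)
# "tBodyAccmeanX" "tBodyAccstdY" "angletBodyAccMeangravity"
```

Note that the lower-case "mean" and the upper-case "Mean" both survive the cleanup, which is what the later column selection relies on.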

The third part reads the "activity labels" table. All I do is rename the columns.

    activitylabels<-read.table("./activity_labels.txt")
    names(activitylabels)<-c("actlabel","activity")

The fourth part reads the "test" tables, renames the columns, combines the "test" tables, then applies the "features" column of the "features" table as the column headers for the "Test" dataset.

    SubTest<-read.table("./test/subject_test.txt", header=FALSE, sep="")
    names(SubTest)<-"subject"
    XTest<-read.table("./test/X_test.txt", header=FALSE, sep="")
    # the dataset ships this file as y_test.txt (lower-case y), which matters on case-sensitive file systems
    YTest<-read.table("./test/y_test.txt", header=FALSE, sep="")
    names(YTest)<-"activity"
    Test<-cbind(SubTest,YTest,XTest)
    # columns 1 and 2 are subject and activity; the 561 measurements follow
    names(Test)[3:563]<-features[,2]

The fifth part reads the "train" tables, renames the columns, combines the "train" tables, then applies the "features" column of the "features" table as the column headers for the "Train" dataset.

    SubTrain<-read.table("./train/subject_train.txt", header=FALSE, sep="")
    names(SubTrain)<-"subject"
    XTrain<-read.table("./train/X_train.txt", header=FALSE, sep="")
    # the dataset ships this file as y_train.txt (lower-case y), which matters on case-sensitive file systems
    YTrain<-read.table("./train/y_train.txt", header=FALSE, sep="")
    names(YTrain)<-"activity"
    Train<-cbind(SubTrain,YTrain,XTrain)
    # columns 1 and 2 are subject and activity; the 561 measurements follow
    names(Train)[3:563]<-features[,2]

The sixth part completes Step 1 and merges the training and test sets to create one data set. This part also completes Step 3 by replacing the numeric "activity label" with the "activity name".

    DataSet<-rbind(Test,Train)
    # join on the numeric label, then drop it so only the activity name remains
    DataSetAll<-merge(activitylabels,DataSet, by.x="actlabel", by.y="activity", all=TRUE)
    DataSetAll$actlabel<-NULL
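To show how the merge swaps the numeric label for the activity name, here is a minimal sketch on toy data. The three-row `DataSet` and its values are invented for illustration; only the column names follow the script.

```r
# Toy lookup table: numeric label -> activity name
activitylabels <- data.frame(actlabel = c(1, 2),
                             activity = c("WALKING", "WALKING_UPSTAIRS"),
                             stringsAsFactors = FALSE)

# Toy stand-in for the combined data: "activity" still holds the numeric label
DataSet <- data.frame(subject = c(1, 1, 2),
                      activity = c(2, 1, 2),
                      tBodyAccmeanX = c(0.1, 0.2, 0.3))

# The numeric "activity" column of DataSet is used as the join key, so the
# result keeps the key under the name "actlabel" and takes the descriptive
# "activity" column from the lookup table
DataSetAll <- merge(activitylabels, DataSet,
                    by.x = "actlabel", by.y = "activity", all = TRUE)
DataSetAll$actlabel <- NULL

print(DataSetAll)
```

One thing to keep in mind is that "merge" sorts the result by the key, so the rows no longer follow the original file order. That does not matter here because the final step averages by subject and activity.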

The seventh part completes Step 2 and extracts only the mean and std for each measurement. I did this by using "grep" to find the columns I was looking for, then combined the columns together. This is why I left the "features" in mixed upper and lower case: "grep" is case sensitive, and the features containing an upper-case "Mean" were not items we wanted in this set.

    MeanData<-DataSetAll[grep("mean",names(DataSetAll))]
    StdData<-DataSetAll[grep("std",names(DataSetAll))]
    MeanStd<-cbind(MeanData,StdData)
    Sub<-DataSetAll[grep("subject",names(DataSetAll))]
    Act<-DataSetAll[grep("activity",names(DataSetAll))]
    SubAct<-cbind(Sub,Act)
    MeanStdData<-cbind(MeanStd,SubAct)
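The case-sensitive selection can be checked on a handful of column names. This is only a sketch: the vector `cols` is a made-up sample in the style of the cleaned feature names, and the stricter pattern at the end is an optional variation, not something the script does.

```r
# Hypothetical sample of cleaned column names
cols <- c("subject", "activity",
          "tBodyAccmeanX", "tBodyAccstdX",
          "fBodyAccmeanFreqX", "angletBodyAccMeangravity")

# grep is case sensitive by default, so "Mean" in the angle feature is skipped
mean_cols <- cols[grep("mean", cols)]
std_cols  <- cols[grep("std", cols)]

print(mean_cols)  # "tBodyAccmeanX" "fBodyAccmeanFreqX"
print(std_cols)   # "tBodyAccstdX"

# Optional variation: a lookahead that also leaves out the meanFreq columns
strict_mean_cols <- cols[grep("mean(?!Freq)", cols, perl = TRUE)]
print(strict_mean_cols)  # "tBodyAccmeanX"
```

The plain "mean" pattern also picks up the "meanFreq" columns. Whether those belong in the result depends on how Step 2 is read, so the script's choice to keep them is a reasonable one.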

The final part completes Step 5 and creates a second, independent tidy data set with the average of each variable for each activity and each subject. I had to install the "reshape2" package to do this. Then I write the file as both .txt and .csv.

    library('reshape2')
    DF<-data.frame(MeanStdData)
    # long format: one row per subject, activity and variable
    mdata <- melt(DF, id=c("subject","activity"))
    # back to wide format, averaging each variable per subject and activity
    FinalDataSet<-dcast(mdata,subject+activity~variable,fun.aggregate=mean)
    write.table(FinalDataSet, "HARMerged.txt", sep=";")
    write.csv(FinalDataSet, "HARMerged.csv")
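For readers who do not have "reshape2" installed, the same kind of per-subject, per-activity average can be sketched with base R's "aggregate". This is an alternative shown on toy data, not what the script itself runs; the data frame below and the name `FinalSketch` are invented for the example.

```r
# Toy stand-in for MeanStdData: two measurements, two subjects, two activities
MeanStdData <- data.frame(
  subject = c(1, 1, 1, 2),
  activity = c("WALKING", "WALKING", "SITTING", "WALKING"),
  tBodyAccmeanX = c(0.25, 0.75, 0.1, 0.5),
  tBodyAccstdX = c(1.0, 3.0, 2.0, 4.0),
  stringsAsFactors = FALSE
)

# Average every remaining column within each subject/activity pair
FinalSketch <- aggregate(. ~ subject + activity, data = MeanStdData, FUN = mean)

print(FinalSketch)
```

Subject 1 has two WALKING rows, so those collapse into one row holding their averages, while the other two pairs pass through unchanged.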
