This R script, run_analysis, cleans the data collected from the accelerometers of the Samsung Galaxy S smartphone by the University of California, Irvine.
The data is stored as a zip file. You must unzip it before running the script; otherwise the script cannot read the data. The default name of the unzipped folder is "UCI HAR Dataset".
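As a rough sketch of that prerequisite, the helper below checks for the unzipped folder and, if it is missing, extracts it from an archive. The function name `ensure_data_dir` and its `zip_path` argument are hypothetical and not part of run_analysis; only base R functions are used.

```r
# Hypothetical helper (not part of run_analysis): make sure the unzipped
# data folder exists before the analysis is run.
ensure_data_dir <- function(data_dir = "UCI HAR Dataset", zip_path = NULL) {
  # Nothing to do if the folder is already there.
  if (dir.exists(data_dir)) {
    return(invisible(data_dir))
  }
  # Without the folder we need an archive to extract it from.
  if (is.null(zip_path) || !file.exists(zip_path)) {
    stop("Cannot find '", data_dir, "'; unzip the data file first.")
  }
  # Extract next to where the folder is expected.
  unzip(zip_path, exdir = dirname(data_dir))
  if (!dir.exists(data_dir)) {
    stop("The archive did not contain '", basename(data_dir), "'.")
  }
  invisible(data_dir)
}
```

Calling `ensure_data_dir(zip_path = "your-archive.zip")` from the working directory before running the script would then either confirm the folder is present or stop with a clear message; the archive name here is a placeholder.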
The script does the following:
1. It reads the training set and the test set and combines them into one large set.
2. It reads features.txt, which lists all features of the data, and uses the feature names to name the columns of the large set.
3. It uses a for loop to find which columns contain the mean and standard deviation for each measurement, and stores the column numbers in a vector.
4. It extracts only the mean and standard deviation measurements by using the vector from step 3.
5. It reads train/subject_train.txt and test/subject_test.txt to get the subjects for the two sets and adds the subjects to the large set.
6. It reads train/y_train.txt and test/y_test.txt to get the activity labels of the two sets.
7. It reads activity_labels.txt to get the activity labels in words, and uses a for loop to replace each numeric activity label with its name.
8. It adds the activity labels to the large set, then reorders the set by activity label and subject label.
9. It splits the set by activity label and subject label, then uses a for loop to compute the mean of each measurement for each unique activity and subject, and stores the means in a data frame called table.
10. It returns table, which is the final cleaned data.
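The steps above can be illustrated with a small, self-contained sketch. It is not the code of run_analysis: it replaces the files with tiny made-up data frames, all variable names are hypothetical, the final data frame is called `result` rather than table, and matching column names on the literal strings `mean()` and `std()` is an assumption about how the mean and standard deviation columns are identified.

```r
# Made-up stand-ins for X_train.txt / X_test.txt, features.txt, the subject
# files, the y files and activity_labels.txt (values are illustrative only).
x_train <- data.frame(V1 = c(1, 2, 3), V2 = c(0.1, 0.2, 0.3), V3 = c(9, 9, 9))
x_test  <- data.frame(V1 = c(5, 7),    V2 = c(0.5, 0.7),      V3 = c(9, 9))
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
subject_train <- c(1, 1, 1); subject_test <- c(2, 2)
y_train <- c(1, 1, 2);       y_test <- c(1, 2)
activity_names <- c("WALKING", "WALKING_UPSTAIRS")  # position = numeric label

# Combine the two sets and name the columns after the features.
large <- rbind(x_train, x_test)
names(large) <- features

# Loop over the feature names and record the mean / std columns
# (assumption: they are recognised by "mean()" and "std()").
keep <- c()
for (i in seq_along(features)) {
  if (grepl("mean()", features[i], fixed = TRUE) ||
      grepl("std()", features[i], fixed = TRUE)) {
    keep <- c(keep, i)
  }
}
large <- large[, keep, drop = FALSE]

# Subjects and activity labels, with numbers replaced by words in a loop.
subject <- c(subject_train, subject_test)
activity_num <- c(y_train, y_test)
activity <- character(length(activity_num))
for (i in seq_along(activity_num)) {
  activity[i] <- activity_names[activity_num[i]]
}

# Attach the labels and reorder by activity, then subject.
large <- cbind(activity = activity, subject = subject, large)
large <- large[order(large$activity, large$subject), ]

# Split by activity and subject, then average every measurement per group.
groups <- split(large, list(large$activity, large$subject), drop = TRUE)
rows <- list()
for (g in groups) {
  rows[[length(rows) + 1]] <- data.frame(
    activity = g$activity[1],
    subject = g$subject[1],
    t(colMeans(g[, -(1:2), drop = FALSE])),
    check.names = FALSE
  )
}
result <- do.call(rbind, rows)
print(result)
```

With the real data, the made-up data frames would be replaced by `read.table()` calls on the files named in the steps. The sketch yields one row per activity and subject pair and one column per retained measurement.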