GettingandCleaningDataProject

Study Design

The source data for this dataset can be found at the following website: https://d396quasza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

Once the data is file is unzipped, the following 9 files need to be put in the working directory:

subject_test.txt
X_test.txt
Y_test.txt
subject_train.txt
X_train.txt
Y_train.txt
activity_labels.txt
features.txt
features_info.txt

-Data in the "inertial signals" files was not used. -A description of the source data can be found in the README.txt and features_info.txt file

run_analysis.R script

-The "reshape2" package will need to be installed prior to running the run_analysis.R script

Script Objective:

Merge the training and test data set to create one data set
Extract only the features that are "means" and "standard deviations"
Appropriately labels the data seet with descriptive activity names and feature names
Create a second, independent tidy data set with the average of each feature for each activity and each subject

Description of how the script works:

library(reshape2) #Loads the "reshape2 library

subject_test=read.table("subject_test.txt",sep="") #Reads in a data file

X_test=read.table("X_test.txt",sep="") #Reads in the a file

Y_test=read.table("Y_test.txt",sep="") #Reads in a data file

subject_train=read.table("subject_train.txt",sep="") #Reads in a data file

X_train=read.table("X_train.txt",sep="") #Reads in the a file

Y_train=read.table("Y_train.txt",sep="") #Reads in the a file

activity_labels=read.table("activity_labels.txt",sep="") #Reads in the a file

features=read.table("features.txt",sep="") #Reads in the a file

features_info=read.table("features_info.txt",sep="") #Reads in the a file

labs=paste(features [,2]) #creates a character vector of the feature names names(X_train)=labs #replaces the non-descriptive names from the X_train dataframe with the feature names

names(X_test)=labs #replaces the non-descriptive names from the X_test dataframe with the feature names

names(activity_labels)=c("V1","activity") #replaces the names of the columns in the activity_lables dataframe with V1 and "activity"

names(subject_test)="subject" #replaces the column name V1 with "subject"

names(subject_train)="subject" #replaces the column name V1 with "subject"

meanstdfeatures=grep(("-mean|std"),labs) # returns the indicies of features the have "-mean" or "std" in the name

meanstdfeaturesname=grep(("-mean|std"),labs,value=TRUE) #returns the name of the features that have "-mean" or "std" in the name

X_test_meanstd=X_test[,meanstdfeatures] #selects columns that has "-mean" or "std" in the feature name

X_train_meanstd=X_train[,meanstdfeatures] #selects columns that has "-mean" or "std" in the feature name

XY_train=cbind(subject_train,Y_train,X_train_meanstd) # combines columns from the "subject" dataframe, "Y_train" dataframe and the "X_train_meanstd" (only the data with "mean" or "std" in its name)

XY_test=cbind(subject_test,Y_test,X_test_meanstd) # combines columns from the "subject" dataframe, "Y_test" dataframe and the "X_test_meanstd" (only the data with "mean" or "std" in its name)

XY=rbind(XY_train,XY_test) # merges the train and test datasets together

MergeXY=merge(activity_labels,XY,sort=FALSE) #merges the XY dataset and the activity lables on the V1 variable name

MergeXY$V1=NULL #removes the V1 column from the final dataset

molten=melt(MergeXY,id=c("subject","activity")) #melts the merged dataset to facilitate the creation of the average of each variable for each activity and each subject

finaloutput=dcast(molten,subject + activity ~ variable,fun.aggregate=mean) #obtains the mean of each variable for each activity and each subject

Code Book

The features selected for this database come from the accelerometer and gyroscope 3-axial raw signals tAcc-XYZ and tGyro-XYZ. These time domain signals (prefix 't' to denote time) were captured at a constant rate of 50 Hz. Then they were filtered using a median filter and a 3rd order low pass Butterworth filter with a corner frequency of 20 Hz to remove noise. Similarly, the acceleration signal was then separated into body and gravity acceleration signals (tBodyAcc-XYZ and tGravityAcc-XYZ) using another low pass Butterworth filter with a corner frequency of 0.3 Hz.

Subsequently, the body linear acceleration and angular velocity were derived in time to obtain Jerk signals (tBodyAccJerk-XYZ and tBodyGyroJerk-XYZ). Also the magnitude of these three-dimensional signals were calculated using the Euclidean norm (tBodyAccMag, tGravityAccMag, tBodyAccJerkMag, tBodyGyroMag, tBodyGyroJerkMag).

Finally a Fast Fourier Transform (FFT) was applied to some of these signals producing fBodyAcc-XYZ, fBodyAccJerk-XYZ, fBodyGyro-XYZ, fBodyAccJerkMag, fBodyGyroMag, fBodyGyroJerkMag. (Note the 'f' to indicate frequency domain signals).

These signals were used to estimate variables of the feature vector for each pattern:
'-XYZ' is used to denote 3-axial signals in the X, Y and Z directions.

tBodyAcc-XYZ tGravityAcc-XYZ tBodyAccJerk-XYZ tBodyGyro-XYZ tBodyGyroJerk-XYZ tBodyAccMag tGravityAccMag tBodyAccJerkMag tBodyGyroMag tBodyGyroJerkMag fBodyAcc-XYZ fBodyAccJerk-XYZ fBodyGyro-XYZ fBodyAccMag fBodyAccJerkMag fBodyGyroMag fBodyGyroJerkMag

The set of variables that were estimated from these signals are in this data set are:

mean(): Mean value std(): Standard deviation

The data in the final dataset is the average of each feature for each activity and each subject

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
run_analysis.R		run_analysis.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GettingandCleaningDataProject

About

Uh oh!

Releases

Packages

Languages

sdbenson/GettingandCleaningDataProject

Folders and files

Latest commit

History

Repository files navigation

GettingandCleaningDataProject

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages