Examples of classwork:
Machine Learning:
-
Document, step by step, the creation of a program to classify handwritten numerals. Use the standard MNIST dataset to train and test, and use scikit-learn modules as necessary. Can an accuracy greater than 95% be achieved on the test data?
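A minimal sketch of what such a classifier might look like, not the classwork solution itself. Two assumptions are made here: scikit-learn's bundled 8x8 `load_digits` data stands in for the full 28x28 MNIST set so the example runs offline, and an RBF support vector machine is picked as the model. The function name `train_and_score` is hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def train_and_score(test_size=0.2, seed=0):
    """Train an SVM on digit images and return accuracy on held-out data."""
    # ASSUMPTION: load_digits (8x8 images shipped with scikit-learn) stands in
    # for MNIST. For the real 28x28 data, one option is
    # fetch_openml("mnist_784", version=1, as_frame=False), which downloads it.
    digits = load_digits()
    X = digits.data / 16.0  # pixel values range 0..16; scale them to 0..1
    y = digits.target

    # Hold out a stratified test set so every digit class is represented.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    # ASSUMPTION: an RBF-kernel SVM; the assignment only says to use
    # scikit-learn modules as necessary, so other models would also fit.
    model = SVC(kernel="rbf", gamma="scale")
    model.fit(X_train, y_train)

    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    print(f"test accuracy: {train_and_score():.3f}")
```

Scoring only on the held-out split is what makes the 95% question meaningful: accuracy measured on the training images would overstate how well the model generalizes.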
Data Mining:
-
K-means clustering. This implements a k-means clustering tool that assigns each data point to one of 3 clusters. Written in Python.
-
Decision tree classifier. This creates a decision tree of height 2 (a root node and two layers of decision nodes) to classify data. The tree is first trained to find the best attributes to use at its decision points; test data is then run through the resulting tree and classified according to the rules generated in the training phase. Written in Python.
https://github.com/tgw4uiuc/examples/tree/main/decision_tree
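The train-then-classify flow described for the decision tree can be sketched as follows. This is an independent illustration, not the code in the linked repository. It assumes numeric attributes, Gini impurity as the split criterion, and a limit of two levels of decision nodes; the names `gini`, `best_split`, `build_tree`, and `classify` are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return the (attribute index, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for attr in range(len(rows[0])):
        for threshold in sorted({row[attr] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[attr] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[attr] > threshold]
            if not left or not right:
                continue  # a split must send data to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (attr, threshold), score
    return best


def build_tree(rows, labels, depth=0, max_depth=2):
    """Training phase: pick the best attribute at each decision point."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        # Leaf: predict the most common label that reached this node.
        return Counter(labels).most_common(1)[0][0]
    attr, threshold = split
    left = [(row, y) for row, y in zip(rows, labels) if row[attr] <= threshold]
    right = [(row, y) for row, y in zip(rows, labels) if row[attr] > threshold]
    return {
        "attr": attr,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def classify(tree, row):
    """Test phase: follow the learned rules from the root down to a leaf."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["attr"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Made-up toy data, for illustration only: two numeric attributes, three classes.
train_rows = [(1, 1), (1, 5), (2, 2), (2, 6), (6, 1), (7, 2), (6, 6), (7, 7)]
train_labels = ["a", "a", "a", "a", "b", "b", "c", "c"]
tree = build_tree(train_rows, train_labels)
predictions = [classify(tree, row) for row in [(1.5, 9), (8, 0), (9, 9)]]
```

Capping the depth keeps the learned rules short enough to read by hand, at the cost of not being able to separate classes that need more than two decisions.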
Data Visualization:
-
Using Tableau, merge 2 public datasets to show cross-filtering and look for correlations. This was done in June 2020. The idea was to see whether the states with the busiest airports had the highest rates of COVID-19 infection (was airline travel the major cause of early spreading of the virus?). First it looks at which states had the busiest airports as measured by number of passenger arrivals (from the Bureau of Transportation Statistics' airport data); that was then compared to the COVID-19 infection rate per capita (from the New York Times' COVID dataset). You can select any state in either of the bar graphs, or in the map, and the same state will be highlighted in the others. Multiple states can be highlighted by holding the Control key. It did not show a strong correlation between states with the busiest airports and COVID-19 infections. (For example, highlight the top 5 states for airline arrivals: only one of those states is in the top 20 for COVID-19 rates per capita.)
-
Using JavaScript and the D3 library, read in a data file and create a visual narrative about vehicle fuel economy. The narrative at first directs the viewer along a story and, in the last part, lets them explore more details about each vehicle data point.