- import data for clickstreams and orders
- clean imported data
  - set appropriate variable types
  - mark NA values
  - delete empty columns (only NAs)
- create plots
  - simple univariate descriptive plots
  - time series
  - grouped plots for comparisons
  - more specific plots, e.g., distribution of revenue across products or product categories (long tail), Lorenz curve
- further analysis; some ideas
  - try to merge clickstreams and orders
  - customer segmentation (recency, frequency, monetary value)
  - product clustering
  - streams of customers between product clusters (Sankey diagram)
  - sequential patterns in clickstreams and orders

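
The cleaning steps in the list above (variable types, NA marking, dropping all-NA columns) could be sketched in pandas roughly as follows. This is only an illustration: the column names, the inline sample data, and the `"?"` missing-value code are assumptions, not taken from the actual clickstream or order files.

```python
import io

import pandas as pd

# Hypothetical raw export; the real column names and NA codes will differ.
raw_csv = io.StringIO(
    "order_id,order_date,revenue,coupon,notes\n"
    "1,2021-03-01,19.99,,\n"
    "2,2021-03-02,?,,\n"
    "3,2021-03-05,5.50,,\n"
)

# Mark NA values at import: "?" is an assumed custom missing-value code;
# empty fields are treated as missing by default.
orders = pd.read_csv(raw_csv, na_values=["?"])

# Set appropriate variable types.
orders["order_date"] = pd.to_datetime(orders["order_date"])  # dates, not strings
orders["order_id"] = orders["order_id"].astype(str)          # identifier, not a number

# Delete columns that contain only NAs.
orders = orders.dropna(axis="columns", how="all")

print(orders.dtypes)
print(orders)
```

In dplyr, the same steps would typically be written with `mutate()` and `select()`; the sketch only covers the Python side.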
All exercises should be done in R and Python. For the project, the following rules apply:

| Step | R or Python |
|---|---|
| Import, manipulate | both; R: tidyverse (mostly dplyr), Python: NumPy, pandas |
| Plots | up to the teams, but the packages should implement the Grammar of Graphics (R: ggplot2, Python: plotnine) |
| Documentation, tables | up to the teams |
| Inference | up to the teams |
| Prediction | up to the teams (probably better in Python) |
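
As one possible illustration of the Python side of the "Import, manipulate" row, the customer segmentation idea (recency, frequency, monetary value) might be computed with pandas along these lines. The sample orders, the column names, and the choice of snapshot date (one day after the last observed order) are assumptions made for the sketch.

```python
import pandas as pd

# Hypothetical order data: one row per order.
orders = pd.DataFrame(
    {
        "customer_id": ["a", "a", "b", "c", "c", "c"],
        "order_date": pd.to_datetime(
            [
                "2021-01-05", "2021-03-01", "2021-02-10",
                "2021-01-20", "2021-02-15", "2021-03-10",
            ]
        ),
        "revenue": [20.0, 35.0, 15.0, 10.0, 10.0, 50.0],
    }
)

# Reference date for recency (assumption: the day after the last observed order).
snapshot = orders["order_date"].max() + pd.Timedelta(days=1)

# One row per customer: last order date, number of orders, total revenue.
rfm = orders.groupby("customer_id").agg(
    last_order=("order_date", "max"),
    frequency=("order_date", "count"),
    monetary=("revenue", "sum"),
)

# Recency: days between the snapshot date and the customer's last order.
rfm["recency_days"] = (snapshot - rfm["last_order"]).dt.days

print(rfm[["recency_days", "frequency", "monetary"]])
```

The resulting three columns can then be binned (for example into quantiles) or passed to a clustering method; which of these is appropriate is left to the teams.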
