Please use this repo as a starting point for the group work. Clone it, then check out branches; the submission can be as simple as one pull request.

romantilly/data

 
 


What we are supposed to do

  1. import data for clickstreams and orders
  2. clean imported data
  • set appropriate variable types
  • mark NA values
  • delete empty columns (only NAs)
  3. create plots
  • simple univariate descriptive plots
  • time series
  • grouped plots for comparisons
  • more specific plots, e.g., distribution of revenue across products or product categories (long tail), Lorenz curve
  4. further analysis; some ideas
  • try to merge clickstreams and orders
  • customer segmentation (recency, frequency, monetary value)
  • product clustering
  • streams of customers between product clusters (Sankey diagram)
  • sequential patterns in clickstreams and orders
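
The cleaning step in the list above (set variable types, mark NA values, delete columns that contain only NAs) could be sketched in pandas roughly as follows. This is a minimal sketch, not the project's actual pipeline: the inline sample data, the column names (`order_id`, `customer_id`, `order_date`, `revenue`, `coupon`), and the placeholder strings treated as missing (`-`, `unknown`) are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical raw export; the real project files and column names may differ.
RAW_ORDERS = """order_id,customer_id,order_date,revenue,coupon
1,c1,2021-01-03,19.90,
2,c2,2021-01-05,-,
3,c1,unknown,5.00,
"""


def clean(raw_csv: str) -> pd.DataFrame:
    """Import a CSV string, mark NAs, set types, and drop all-NA columns."""
    # Mark placeholder strings as NA while importing (assumed placeholders).
    df = pd.read_csv(
        io.StringIO(raw_csv),
        na_values=["-", "unknown"],
        dtype={"customer_id": "string"},
    )
    # Set appropriate variable types; unparseable values become NA.
    df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
    df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce")
    # Delete empty columns, i.e. columns that contain only NAs.
    return df.dropna(axis="columns", how="all")


orders = clean(RAW_ORDERS)
print(orders.dtypes)
```

The same three operations map onto `readr::read_csv(na = ...)`, `dplyr::mutate()`, and `dplyr::select(where(...))` on the R side, so both implementations can follow the same order of steps.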

Regarding Programming in R and Python

All exercises should be done in R and Python. For the project, the following rules apply:

Step                  R or Python
Import, manipulate    both; R: tidyverse (mostly dplyr), Python: NumPy, pandas
Plots                 up to the teams, but the packages should implement the Grammar of Graphics (R: ggplot2, Python: plotnine)
Documentation, Table  up to the teams
Inference             up to the teams
Prediction            up to the teams (probably better in Python)
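
As an illustration of the pandas side of the "Import, manipulate" step, the customer segmentation idea (recency, frequency, monetary value) might start from a per-customer summary like the one below. This is only a sketch under assumptions: the sample orders, the column names, the helper name `rfm_table`, and the reference date `as_of` are hypothetical and not taken from the project data.

```python
import pandas as pd

# Hypothetical cleaned orders; column names are assumptions for illustration.
orders = pd.DataFrame(
    {
        "customer_id": ["c1", "c1", "c2", "c3", "c3", "c3"],
        "order_date": pd.to_datetime(
            [
                "2021-01-03", "2021-02-10", "2021-01-20",
                "2021-01-05", "2021-02-01", "2021-03-01",
            ]
        ),
        "revenue": [20.0, 5.0, 50.0, 10.0, 10.0, 10.0],
    }
)


def rfm_table(orders: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Return one row per customer with recency, frequency, monetary value."""
    grouped = orders.groupby("customer_id")
    return pd.DataFrame(
        {
            # Recency: days between the reference date and the last order.
            "recency_days": (as_of - grouped["order_date"].max()).dt.days,
            # Frequency: number of orders per customer.
            "frequency": grouped.size(),
            # Monetary value: total revenue per customer.
            "monetary": grouped["revenue"].sum(),
        }
    )


rfm = rfm_table(orders, as_of=pd.Timestamp("2021-03-02"))
print(rfm)
```

From such a table, segments could be formed by binning each column (for example with `pd.qcut`) or by clustering; which approach fits best is up to the teams.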
