Skip to content

Data Viz, Data Cleaning, Feature Engineering, Modeling and the Economic Impact of a Click-Through Prediction

Notifications You must be signed in to change notification settings

myrthings/click-prediction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

5 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Click-through Prediction For a Company With 28M MAU

This project is the summary code of my team capstone project at IE Datascience Bootcamp. My team and I worked in a click-through prediction for a company with 28M MAU to improve the ad allocation of the company. The data is propietary and not available but I leave here a summary of all the analysis we did.

Context ๐Ÿงญ

Programmatic advertising is the automated transaction of buying and selling advertising online. Ads are displayed with banners in a website and their performance is measured with CTR (Click Throught Rate). Banners holders get paid by number of clicks and by number of impressions (a lot less than for clicks). The way to monetize the banners is by selling them in ad auctions where advertisers bid for the banners. 60% of our company revenue comes from its website banners. Our problem is to increase that revenue as much as possible.

Process ๐Ÿšถโ€โ™€๏ธ๐Ÿšถโ€โ™‚๏ธ

The process we followed is more iterative and circular than linear. However, we've ordered the notebook in the following steps:

  • The problem: Explanation of the business problem and how the challenge was approached.
  • Import the libraries and the data: The libraries needed and ourtools.py file included for custom auxiliar functions. Also an explanation of the data.
  • Data Visualization: What visualizations we did that weere more important for our process. Made with Matplotlib and Seaborn.
  • Feature Engineering: How we cleaned and prepared the data.
  • Model Preparation: The division of the sample sets in test and train.
  • Model Building and Evaluation: The different algorithms we used to fit our data and its evaluation. We compared them using AUC and at the end chose Lightgbm as the best model.
  • More about the perfomance of Lightgbm: The results of our analysis with the final model.
  • Model for Revenue: An evaluation of the Economic Impact of the model. Our model duplicated the revenue of not having any model.
  • Conclusions: How much time we spent on each task.

Development โš™๏ธ

The whole analysis is displayed in a Jupyter Notebook. We also include a Python file with some auxiliar functions. At the begining of the notebook you can find all the libraries needed. On the model part, there're some other libraries for the algorithms.

How to use it ๐Ÿ‘ฉโ€๐Ÿ’ป

You can't use it as it is uploaded because the data is propietary and not available. BUT, you can download the notebook and reuse the functions for your own data. The more useful and original parts are:

  • Visualizations: We did a prety good job thinking how to display the time, maybe you're interested in that. (You can improve their appearance for sure ๐Ÿ˜Š)
  • Undersample: We created our own function to undersample the data an divide it in train and test with the less possible lose of data points.
  • Economic model: This part is completely ours because we didn't find any information on the internet. The model is just an application of probability distributions over a lift measure given ad prices (fixed in our example). It's quite useful if you want or need to explain your results to a non-technical client.

Authors

About

Data Viz, Data Cleaning, Feature Engineering, Modeling and the Economic Impact of a Click-Through Prediction

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published