This project provides SQL and Python scripts to implement and demonstrate several common multi-touch attribution models. It's designed to help marketing teams understand the contribution of different channels to user conversions.
In digital marketing, a user often interacts with multiple marketing channels (like paid search, social media, or email) before making a purchase or converting. Multi-touch attribution is the process of determining how to distribute the credit for that conversion among these various touchpoints. Different models exist because there are different theories on how to value each touch.
There are two main files in this project:
attribution_models.sqlsimulation.py
This is a SQL script designed to run in a modern SQL database (like PostgreSQL).
- Sample Data: It starts by creating a sample dataset of user journeys, showing which channels users interacted with and when they converted. This is done using a Common Table Expression (CTE), so the script is self-contained.
- Attribution Models: It then provides queries to calculate conversions attributed to each channel based on four different models:
- First Touch: Gives 100% of the credit to the very first channel the user touched.
- Last Touch: Gives 100% of the credit to the final channel the user touched before converting.
- Linear: Splits the credit equally among all channels in the user's journey.
- U-Shaped: Gives 40% of the credit to the first touch, 40% to the last touch, and divides the remaining 20% among the touches in the middle.
This is a Python script that complements the SQL file and provides a simulation of attribution vs. revenue lift.
- Revenue Simulation: It uses the
pandaslibrary to create a similar dataset of user journeys but also includes revenue generated from each conversion. - Python Implementation: It re-implements the same four attribution models in Python.
- Revenue Analysis: The main goal of this script is to calculate the total revenue attributed to each channel under each model. It prints a final summary table that allows for easy comparison, showing how the choice of attribution model can significantly change the perceived value of each marketing channel.
The attribution_models.sql script can be executed in any SQL environment that supports CTEs and window functions. The output will show the number of conversions attributed to each channel for each model.
The simulation.py script can be run from your terminal:
python simulation.pyYou will need to have the pandas library installed. If you don't have it, you can install it via pip:
pip install pandasThe script's output will show a comparison of the revenue attributed to each channel by the different models.