Repo for Insight Data Engineering Fellow program project

sz222/SwifTag

SwifTag

-- A real-time auto tagging system

Author: Stephanie (Shuyan) Zhou

Insight Data Engineering Fellowship Program Project. This project provides a real-time auto tagging pipeline for users who post questions on StackOverflow.

Gif Demo:

Auto Tagging System

Presentation Slides:

SwifTag Slides

Live Website:

SwifTag Website

Demo Video:

SwifTag Demo Video

Presentation Video:

SwifTag Presentation Video

Project Idea:

High-quality data is essential in data mining and product analytics work. For question-related data, if "tag" information is attached appropriately, data scientists can perform user identification and behavior pattern recognition better.

However, not all users choose appropriate tags for their questions, for several reasons: 1) a question can be complicated and span several areas, so the user does not know which tags fit; 2) sometimes users simply forget or cannot be bothered to attach tags.

Business Case:

This project recommends the top 3 tags in real time once an end user types in a question title. End users can also delete or add tags according to their preference. In this way, data scientists get better-quality question data and can use it to perform clustering, user behavior investigation, and recommendation model evaluation more effectively.
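As a rough illustration of the top-3 recommendation step, the sketch below ranks tags by how often they co-occurred with the title's words in previously tagged questions. This README does not describe the actual model, so the co-occurrence scoring, the helper names `tokenize`, `build_index`, and `recommend_tags`, and the sample data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import re


def tokenize(title):
    """Lowercase a title and split it into word tokens (letters, digits, '+', '#')."""
    return re.findall(r"[a-z0-9+#]+", title.lower())


def build_index(tagged_questions):
    """Map each title word to a Counter of the tags it appeared with.

    tagged_questions: iterable of (title, list_of_tags) pairs (hypothetical input shape).
    """
    index = defaultdict(Counter)
    for title, tags in tagged_questions:
        for word in set(tokenize(title)):
            index[word].update(tags)
    return index


def recommend_tags(title, index, k=3):
    """Return up to k tags with the highest summed co-occurrence counts."""
    scores = Counter()
    for word in set(tokenize(title)):
        scores.update(index.get(word, {}))
    return [tag for tag, _ in scores.most_common(k)]


# Hypothetical sample data, not taken from the real dataset.
history = [
    ("How to merge two pandas dataframes", ["python", "pandas"]),
    ("pandas groupby sum", ["python", "pandas"]),
    ("Kafka consumer lag in Spark streaming", ["apache-kafka", "apache-spark"]),
]
index = build_index(history)
suggested = recommend_tags("merge pandas dataframes by key", index)
```

A user-facing flow would show `suggested` as editable defaults, so that the end user can still delete or add tags before posting.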

Data:

Source: BigQuery Open Datasets
Table Name: Posts Questions
Data Description: id, body, comment_count, community_owned_date, creation_date, last_activity_date, last_edit_date, last_editor_display_name, last_editor_user_id, owner_display_name, owner_user_id, parent_id, post_type_id, score, tags
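Before the `tags` column can be used as labels, each record's value usually has to be split into individual tags. The sketch below assumes the field is a single pipe-delimited string (for example `python|pandas`). That delimiter, the helper name `split_tags`, and the sample record are assumptions, so check them against the actual table before relying on them.

```python
def split_tags(raw_tags, delimiter="|"):
    """Split a delimited tags string into a list of clean, non-empty tags."""
    if not raw_tags:
        # Covers None and empty strings (questions without tags).
        return []
    return [t.strip() for t in raw_tags.split(delimiter) if t.strip()]


# Hypothetical record using a few of the columns listed above.
record = {"id": 1, "score": 5, "tags": "python|pandas|dataframe"}
tags = split_tags(record["tags"])
```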

Tech Stack:

This project includes both Batch Processing and Streaming Processing:

tech stack image
