Author: Stephanie(Shuyan) Zhou
Insight Data Engineering Fellowship Program project. It provides a real-time auto-tagging pipeline for users who post questions on StackOverflow.
High-quality data is essential in data mining and product analytics. For question data, appropriately attached "tag" information lets data scientists perform user identification and behavior pattern recognition more effectively.
However, not all users choose appropriate tags for their questions, for reasons such as: 1) a question can be complicated and span several areas, so the user does not know which tags to choose; 2) users sometimes simply forget, or do not bother, to attach tags.
This project recommends the top 3 tags in real time once the end user has typed in a question title. End users can also delete or add tags according to their preference. In this way, data scientists get better-quality question data and can use it to perform clustering, user behavior investigation, and recommendation model evaluation more effectively.
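The recommend-then-edit flow described above can be illustrated with a minimal sketch. It is not the project's actual model: it assumes a simple word-to-tag co-occurrence count built from previously tagged questions, and the names `build_index`, `recommend_tags`, and `apply_user_edits` are hypothetical, introduced only for illustration.

```python
from collections import Counter, defaultdict


def build_index(examples):
    """Map each lowercase title word to a Counter of tags it co-occurred with.

    `examples` is an iterable of (title, tags) pairs from already-tagged questions.
    """
    index = defaultdict(Counter)
    for title, tags in examples:
        for word in set(title.lower().split()):
            index[word].update(tags)
    return index


def recommend_tags(title, index, k=3):
    """Return up to k tags with the highest co-occurrence score for this title."""
    scores = Counter()
    for word in set(title.lower().split()):
        scores.update(index.get(word, {}))
    # Sort by score (descending), then by tag name so that ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:k]]


def apply_user_edits(recommended, add=(), remove=()):
    """Apply the end user's deletions and additions to the recommended tags."""
    removed = set(remove)
    kept = [t for t in recommended if t not in removed]
    for t in add:
        if t not in kept:
            kept.append(t)
    return kept


# Toy training data (made up for this example).
examples = [
    ("how to merge pandas dataframe", ["python", "pandas"]),
    ("python list comprehension", ["python", "list"]),
    ("java list sort", ["java", "list"]),
]
index = build_index(examples)
suggested = recommend_tags("sort a python list", index)
final_tags = apply_user_edits(suggested, add=["sorting"], remove=["java"])
```

With this toy data, `suggested` holds the three highest-scoring tags for the title, and `final_tags` reflects the user's own corrections. A production pipeline would replace the counting step with a trained model, but the interface (title in, top 3 tags out, user edits applied last) stays the same.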
Source: BigQuery Open Datasets
Table Name: Posts Questions
Data Description: id, body, comment_count, community_owned_date, creation_date, last_activity_date, last_edit_date, last_editor_display_name, last_editor_user_id, owner_display_name, owner_user_id, parent_id, post_type_id, score, tags
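As a rough sketch of how one row of this table might be prepared for tagging, the snippet below keeps only a few of the columns listed above and splits the `tags` field into a list. It assumes that `tags` arrives as a single pipe-delimited string (for example `python|pandas`); the `Question` class and the `parse_tags` and `question_from_row` helpers are hypothetical names, not part of the project.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """A trimmed-down view of one Posts Questions row."""
    id: int
    body: str
    tags: list = field(default_factory=list)


def parse_tags(raw, sep="|"):
    """Split a delimited tag string into clean, lowercase tag names.

    Assumption: tags are stored as one string separated by `sep`.
    Missing or empty values yield an empty list.
    """
    if not raw:
        return []
    return [t.strip().lower() for t in raw.split(sep) if t.strip()]


def question_from_row(row):
    """Build a Question from a dict keyed by the column names above."""
    return Question(
        id=int(row["id"]),
        body=row.get("body") or "",
        tags=parse_tags(row.get("tags")),
    )


# Made-up row for illustration only.
row = {"id": "42", "body": "How do I merge two frames?", "tags": "Python|pandas"}
question = question_from_row(row)
```

Questions whose `tags` list comes back empty are the ones that would benefit most from an automatic recommendation.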
This project includes both Batch Processing and Streaming Processing:

