Generating Semantic Annotations for Frequent Patterns with Context Analysis

Click Here for Voiced Presentation & Demo for Grader

Users often find it hard to grasp concepts they have only just encountered. For example, how would we answer the question “What is data science?”, especially for an outsider? How could we clarify the definition of “data science” or “computer science” by borrowing more basic or common words and terms to help users understand? Our implementation of a semantic annotation algorithm, based on the paper “Generating semantic annotations for frequent patterns with context analysis”, addresses this. The end goal is to automatically explain words, terms, and even sentences by presenting frequent patterns that are highly associated with them yet distinct, in semantic text form.

More specifically, we used the algorithm to summarize the research specialties that each university has published on in major computer science conferences. The Digital Bibliography & Library Project (DBLP) computer science bibliography serves as good source material for our project. It contains the metadata of more than 1.8 million publications, written by over 1 million authors, in thousands of journals and conference proceedings series. It started as a bibliography on database systems and logic programming but has since expanded to all fields of computer science.

From this well-structured dataset, we selected three top U.S.-based universities as our use case: Massachusetts Institute of Technology (MIT), Georgia Institute of Technology (GT), and the University of Maryland. By applying the algorithm to thousands of paper titles published over the years, we can extract a set of words or terms that differentiates their academic focus: some universities lean toward data analysis, while others concentrate more on wireless systems. In the real world, automatic annotation applies more broadly: users can apply the algorithm to understand loosely defined terms such as “NLP”, “Machine Learning”, and “Deep Learning” that are not defined in dictionaries such as Merriam-Webster.
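To make the idea of context-based annotation more concrete, here is a minimal sketch in Python. It is not the code in this repository and it simplifies the paper's method: it assumes each paper title is one transaction, uses raw co-occurrence counts as context weights, and compares contexts with cosine similarity. The toy titles and the function names (`context_vector`, `cosine`, `similar_terms`) are hypothetical and exist only for illustration.

```python
from collections import Counter
from math import sqrt

# Toy paper titles, invented for illustration (not taken from DBLP).
TITLES = [
    "mining frequent patterns in large databases",
    "frequent patterns and association rules mining",
    "routing protocols for wireless sensor networks",
    "energy efficient wireless sensor networks",
    "query optimization in large databases",
]


def context_vector(term, titles):
    """Count the words that co-occur with `term` in the same title."""
    vec = Counter()
    for title in titles:
        words = title.split()
        if term in words:
            vec.update(w for w in words if w != term)
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_terms(term, titles, top_k=3):
    """Rank every other word by how similar its context is to that of `term`."""
    vocab = {w for t in titles for w in t.split()} - {term}
    target = context_vector(term, titles)
    scored = [(cosine(target, context_vector(w, titles)), w) for w in vocab]
    # Highest similarity first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(w, score) for score, w in scored[:top_k]]


if __name__ == "__main__":
    for word, score in similar_terms("wireless", TITLES):
        print(f"{word}: {score:.3f}")
```

On this toy data, the words ranked closest to "wireless" all come from the two wireless titles, while words such as "databases" share no context with it and score zero. A realistic pipeline would also need stop-word filtering, frequent pattern mining over multi-word phrases, and a more careful weighting scheme, none of which this sketch attempts.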

Reference: KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2006, Pages 337–346. https://doi.org/10.1145/1150402.1150441

