difangu/CourseProject

Generating Semantic Annotations for Frequent Patterns with Context Analysis

Users often find it hard to make sense of concepts they have barely encountered before. For example, how would we answer the question “What is data science?”, especially for an outsider? How can we clarify the meaning of “data science” or “computer science” using more basic, common words and terms that help users understand it? Our implementation of the semantic annotation algorithm from the paper “Generating Semantic Annotations for Frequent Patterns with Context Analysis” aims to do exactly this. The final goal is to automatically explain words, terms, and even sentences by annotating them with frequent patterns that are strongly associated with them yet distinct from one another, presented as readable text.
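The paper's annotations are built on top of frequent patterns. As a rough illustration of that first step, here is a minimal Python sketch that turns paper titles into term sets and counts frequent itemsets. The README does not say which pattern miner the project uses; the brute-force counter, the stopword list, and the helper names `to_transaction` and `frequent_patterns` are our own hypothetical choices, not the project's code.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with", "via", "using"}

def to_transaction(title):
    """Lowercase a title, strip surrounding punctuation, and drop stopwords."""
    words = (w.strip(".,:;()?!\"'").lower() for w in title.split())
    return frozenset(w for w in words if w and w not in STOPWORDS)

def frequent_patterns(transactions, min_support=2, max_size=2):
    """Count every itemset of up to max_size terms and keep those meeting min_support.

    Brute force: fine for short titles, but exponential in max_size in general.
    """
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for itemset in combinations(sorted(t), k):
                counts[itemset] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

titles = [
    "Mining Frequent Patterns in Data Streams",
    "Frequent Pattern Mining with Constraints",
    "Wireless Sensor Networks for Data Collection",
]
patterns = frequent_patterns([to_transaction(t) for t in titles])
print(patterns)  # → {('data',): 2, ('frequent',): 2, ('mining',): 2, ('frequent', 'mining'): 2}
```

Note that “pattern” and “patterns” are counted separately because the sketch does no stemming; a production pipeline would normally use a dedicated frequent (or closed) pattern miner instead of enumerating every combination.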

More specifically, we used the algorithm to summarize the research specialties of each university, based on the papers it published in major computer science conferences. The Digital Bibliography & Library Project (DBLP) computer science bibliography serves as a good dataset for our project. It contains metadata for more than 1.8 million publications in thousands of journals and conference proceedings series, written by over 1 million authors. It started as a bibliography on database systems and logic programming but has since expanded to all fields of computer science.
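The README does not describe how the DBLP data was loaded. Assuming the public dblp.xml dump, where conference papers appear as `<inproceedings>` records with `<title>`, `<booktitle>`, and `<year>` children, a minimal Python sketch for pulling conference titles might look like this. The function name `conference_titles` and the sample records are hypothetical. The real dump declares named character entities in dblp.dtd, which the standard-library parser will not resolve, so a full run would need a DTD-aware parser (for example lxml); mapping papers to universities is also outside the scope of this sketch.

```python
import io
import xml.etree.ElementTree as ET

def conference_titles(xml_source, venues):
    """Yield (venue, year, title) for <inproceedings> records whose <booktitle> is in venues."""
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "inproceedings":
            continue
        venue = elem.findtext("booktitle")
        title_el = elem.find("title")
        if venue in venues and title_el is not None:
            # itertext() keeps text nested inside inline markup such as <i> or <sub>.
            yield venue, elem.findtext("year"), "".join(title_el.itertext())
        elem.clear()  # release the record's children; the real dump is large

# Tiny hypothetical sample following the dblp.xml record layout.
SAMPLE = b"""<dblp>
<inproceedings key="conf/example/A1"><author>Alice Example</author>
<title>Frequent Pattern Mining with Constraints.</title>
<booktitle>KDD</booktitle><year>2006</year></inproceedings>
<article key="journals/example/B1"><author>Bob Example</author>
<title>A Journal Paper.</title><journal>Example J.</journal><year>2007</year></article>
</dblp>"""

records = list(conference_titles(io.BytesIO(SAMPLE), {"KDD"}))
print(records)  # → [('KDD', '2006', 'Frequent Pattern Mining with Constraints.')]
```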

From this well-structured dataset, we selected three top U.S. universities as our use case: the Massachusetts Institute of Technology (MIT), the Georgia Institute of Technology (GT), and the University of Maryland. By applying the algorithm to thousands of paper titles published over the years, we can extract words and terms that differentiate each university's academic focus: for instance, one university may lean toward data analysis while another concentrates more on wireless systems. Automatic annotation is useful more broadly as well: users can apply the algorithm to understand loosely defined terms such as “NLP”, “Machine Learning”, and “Deep Learning” that may not be well defined in standard dictionaries such as Merriam-Webster.
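The README does not spell out the context model. In our reading of the KDD '06 paper, a pattern's context is represented by context units weighted by how strongly they co-occur with the pattern, and mutual information is one of the weightings the paper discusses. The minimal Python sketch below applies that idea to the university use case under our own assumptions: each paper is a set of title terms plus a hypothetical `inst:` token for its university, the university token is treated as the pattern, and single terms are the context units. The helper names and the toy data are hypothetical, and the filter that drops negatively correlated terms is our addition, since plain mutual information scores positive and negative association alike.

```python
import math
from collections import Counter

def contains(transaction, pattern):
    """True if every item of the pattern occurs in the transaction."""
    return set(pattern) <= transaction

def mutual_information(transactions, p, u):
    """Mutual information between the binary events 'contains p' and 'contains u'."""
    n = len(transactions)
    joint = Counter((contains(t, p), contains(t, u)) for t in transactions)
    px = {x: (joint[(x, True)] + joint[(x, False)]) / n for x in (True, False)}
    py = {y: (joint[(True, y)] + joint[(False, y)]) / n for y in (True, False)}
    # Only observed cells appear in `joint`; unobserved cells contribute 0 * log 0 = 0.
    return sum((c / n) * math.log((c / n) / (px[x] * py[y])) for (x, y), c in joint.items())

def context_indicators(transactions, pattern, units, k=3):
    """Rank context units by mutual information with `pattern`.

    Units that overlap the pattern, or that co-occur with it no more often
    than chance, are skipped so that negatively correlated terms are not reported.
    """
    n = len(transactions)
    p_p = sum(contains(t, pattern) for t in transactions) / n
    scored = []
    for u in units:
        if set(u) & set(pattern):
            continue
        p_u = sum(contains(t, u) for t in transactions) / n
        p_pu = sum(contains(t, pattern) and contains(t, u) for t in transactions) / n
        if p_pu > p_p * p_u:
            scored.append((mutual_information(transactions, pattern, u), u))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [u for _, u in scored[:k]]

# Hypothetical toy data: each paper is its title terms plus an "inst:" token.
papers = [
    {"inst:mit", "wireless", "networks"},
    {"inst:mit", "wireless", "routing"},
    {"inst:gt", "data", "mining"},
    {"inst:gt", "data", "networks"},
]
units = sorted({(w,) for t in papers for w in t if not w.startswith("inst:")})
print(context_indicators(papers, ("inst:mit",), units))  # → [('wireless',), ('routing',)]
print(context_indicators(papers, ("inst:gt",), units))   # → [('data',), ('mining',)]
```

Here “networks” appears once for each university, so it is exactly as common as chance predicts and is not reported for either; terms that appear only with one university rise to the top.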

Reference: “Generating Semantic Annotations for Frequent Patterns with Context Analysis.” In KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2006, pp. 337–346. https://doi.org/10.1145/1150402.1150441
