Oftentimes, users found it’s hard to fathom the information that they barely learnt before. For example, how are we able to answer the question, “what is data science?”, especially to an outsider? How are we able to clarify the definition of “data science” or “computer science” by borrowing more basic or common words or terms to further help our users to understand? Our implementation of a semantic annotation algorithm based on the paper, “Generating semantic annotations for frequent patterns with context analysis”, can achieve the goal. The final goal is to automatically decipher certain words, terms, and even sentences by providing its highly-associated while distinct frequent patterns in semantic text form.
In our case, to be more specific, we used the algorithm to summarize what specialty each college published in major computer science conferences. The Digital Bibliography & Library Project (DBLP) computer science bibliography acts as good study material for our project. It contains the metadata of more than 1.8 million publications in thousands of journals and conferences proceeding series written by over 1 million authors. It first started to be a bibliography on database systems and logic programming but has since expanded to all fields of computer science.
From this well-structured dataset, we selected three top U.S.-based universities including Massachusetts Institute of Technology (MIT), Georgia Institute of Technology (GT), and the University of Maryland as our user case. By implementing the algorithm, we can extract a series of words or terms to differentiate their academic focus based on thousands of paper titles published throughout the years: some colleges will be more inclined to data analysis, the others will more concentrate on wireless systems. In the real world, the utilization of automatic annotation can also be universal: users can use the algorithm to understand not-well-defined text information such as “NLP”, “Machine Learning” and “Deep Learning” that is not defined in the dictionary such as Merriam Webster.
Reference: KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data miningAugust 2006 Pages 337–346https://doi.org/10.1145/1150402.1150441