A search engine for blogs on CSDN.net, with a WeChat Mini Program as the user interface. The project covers web crawling, text mining, inverted indexing, the vector space model (VSM), TF-IDF weighting, PageRank, and related methods.
CSDN-based Search Engine Design
[Python, C Language, user design, algorithm design, text analysis]
-
Crawled blog data from CSDN.net, processed it with word segmentation, empty-word filtering, and stop-word removal, and generated the inverted index
-
Implemented the search algorithm in Python using TF-IDF weighting, the vector space model (VSM), and PageRank
-
Built the front end as a WeChat Mini Program, achieving high retrieval precision and speed
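The pipeline described above (inverted index, TF-IDF weighting in a vector space model, and PageRank) can be illustrated with a minimal Python sketch. This is not the project's actual code: the function names, the tiny stop-word list, the whitespace tokenizer (a stand-in for real Chinese word segmentation), and the `alpha` blend between cosine similarity and PageRank are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "is", "in", "of", "and", "to"}

def tokenize(text):
    # Whitespace split stands in for a proper word-segmentation step.
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def build_index(docs):
    """Map each term to {doc_id: term frequency} (the inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index

def tfidf_vectors(index, n_docs):
    """Build one sparse TF-IDF vector per document from the index."""
    vectors = defaultdict(dict)
    for term, postings in index.items():
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            vectors[doc_id][term] = tf * idf
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pagerank(links, damping=0.85, iterations=50):
    """Power iteration; pages without outlinks spread rank evenly."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p) or pages
            share = damping * rank[p] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

def search(query, index, vectors, n_docs, ranks, alpha=0.8):
    """Rank candidates by alpha * cosine + (1 - alpha) * PageRank."""
    q_tf = Counter(tokenize(query))
    q_vec = {t: tf * math.log(n_docs / len(index[t]))
             for t, tf in q_tf.items() if t in index}
    candidates = {d for t in q_vec for d in index[t]}
    scored = [(alpha * cosine(q_vec, vectors[d])
               + (1.0 - alpha) * ranks.get(d, 0.0), d) for d in candidates]
    return [d for _, d in sorted(scored, reverse=True)]

# Toy corpus and link graph, invented for this example.
docs = {
    "a": "python crawler tutorial",
    "b": "python inverted index",
    "c": "wechat mini program",
}
links = {"a": ["b"], "b": ["a"], "c": ["a"]}

index = build_index(docs)
vectors = tfidf_vectors(index, len(docs))
ranks = pagerank(links)
print(search("inverted index", index, vectors, len(docs), ranks))
```

Blending the two scores with a single weight is only one possible design; the relevance score and the link-based score could equally be combined by re-ranking the top text matches.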