This repository was archived by the owner on Dec 30, 2021. It is now read-only.
Setup/Demo

Video Demo/Setup is here:

https://youtu.be/qNMndBBiRV0

Backend

For backend setup go to the backend folder in your terminal and run:

pip install -r requirements.txt
python3 main.py

This should spin up a Flask server at 127.0.0.1:5000. If it does not, please check the video for troubleshooting steps. PLEASE DO NOT CLOSE THIS TERMINAL.
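For reference, here is a minimal sketch of what a Flask entry point like this might look like. The /query route, payload shape, and handler body are assumptions for illustration, not the repo's actual API.

```python
# Minimal sketch of a Flask backend entry point; the /query route and
# payload shape are assumptions, not this repo's actual API.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/query", methods=["POST"])
def query():
    data = request.get_json(force=True)
    author = data.get("author", "")
    # ... the real backend would kick off arXiv retrieval + ranking here ...
    return jsonify({"status": "ok", "author": author})

# To launch (as main.py does), matching the address in the instructions above:
#   app.run(host="127.0.0.1", port=5000)
```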

Frontend

For the Chrome extension, go to chrome://extensions and press "Load unpacked". This opens your file explorer/Finder. Navigate to the frontend folder (it will appear empty in the picker) and press "Open" or your OS's equivalent. You should then see a new extension called CourseProject.

Instructions

To use it, open any PDF from Chrome, as long as the URL ends with the .pdf extension. Preferably the PDF is on arXiv, which makes data parsing easier, but as long as the URL ends in .pdf and points to a valid paper, the extension will make a best guess. Type in an author name; this starts text processing on the Flask backend.
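The author search this kicks off can be sketched against arXiv's public query API. The function name and default parameters below are illustrative assumptions; the repo's actual request code may differ.

```python
# Build an arXiv API author-query URL; the helper name and defaults are
# illustrative, not taken from this repo.
from urllib.parse import urlencode
from urllib.request import urlopen

ARXIV_API = "http://export.arxiv.org/api/query"

def author_query_url(author, max_results=5):
    """URL for an arXiv author search, using the API's au: field prefix."""
    params = urlencode({"search_query": f'au:"{author}"',
                        "max_results": max_results})
    return f"{ARXIV_API}?{params}"

# Fetching the (XML) response would then be:
#   xml_text = urlopen(author_query_url("Karam Park")).read().decode()
```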

The output is generated in the console of the Flask backend. Please take a look; it will look like this:

Sending request to arxiv for author query: Karam Park
Parsing Response....
Here is a preview of papers this author has written
	1: Growth of balls in the universal cover of surfaces and graphs 
 		URL: http://arxiv.org/abs/1304.3567v2
	2: Short Homotopically independent loops on surfaces 
 		URL: http://arxiv.org/abs/1310.1269v1
	3: Concept Embedding for Information Retrieval 
 		URL: http://arxiv.org/abs/2002.01071v1
	4: A Dynamic Residual Self-Attention Network for Lightweight Single Image
  Super-Resolution 
 		URL: http://arxiv.org/abs/2112.04488v1
	5: Neural Audio Fingerprint for High-specific Audio Retrieval based on
  Contrastive Learning 
 		URL: http://arxiv.org/abs/2010.11910v4
Tokenizing bm_25
Building BM25 for later query
BM25 successfully completed, parsing pdf now
analyzed stop words...
removed punctuation...
normalized data, building final word list...
building corpus dictionary (this takes a while)...
building term_matrix...
running lda (please wait)...
[['network', 'residual', 'image']]
Querying top 3: 
Relevant Papers Found
---------------------------------
1: A Dynamic Residual Self-Attention Network for Lightweight Single Image
  Super-Resolution http://arxiv.org/abs/2112.04488v1

2: Understanding How Image Quality Affects Deep Neural Networks http://arxiv.org/abs/1604.04004v2

3: Short Homotopically independent loops on surfaces http://arxiv.org/abs/1310.1269v1

4: Neural Audio Fingerprint for High-specific Audio Retrieval based on
  Contrastive Learning http://arxiv.org/abs/2010.11910v4

5: Concept Embedding for Information Retrieval http://arxiv.org/abs/2002.01071v1

---------------------------------
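The "Building BM25" step in the log above can be sketched as follows. This is a from-scratch Okapi BM25 for illustration only; the repo may rely on a library implementation instead.

```python
# From-scratch Okapi BM25 scorer, for illustration of the "Building BM25"
# step; not the repo's actual implementation.
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    df = Counter()                      # document frequency of each term
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)                 # term frequency within this document
        score = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            score += (idf * tf[t] * (k1 + 1)
                      / (tf[t] + k1 * (1 - b + b * len(d) / avgdl)))
        scores.append(score)
    return scores
```

In this project's terms, each document would be the tokenized summary of one of the author's papers, and the query the tokenized text of the open PDF.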

If no papers show up, you will see that under the "Here is a preview..." section of the output. The output will also report a network error when one occurs, so if that happens, please check your internet connection.

Topics generated by the LDA are displayed in list notation, as in [['network', 'residual', 'image']]. The final relevant papers are shown between the dashed lines.
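For the curious, the topic-extraction step can be illustrated with a tiny collapsed-Gibbs LDA that returns topics in the same [['word', ...]] shape as the log output. The repo's actual LDA (and its preprocessing) almost certainly uses a library, so treat this as a sketch of the idea only.

```python
# Tiny collapsed-Gibbs LDA sketch; hyperparameters and structure are
# illustrative assumptions, not the repo's actual pipeline.
import random
from collections import defaultdict

def lda_topics(docs, n_topics=1, n_top=3, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Return the n_top highest-count words per topic, as a list of lists."""
    rng = random.Random(seed)
    vocab = {w for d in docs for w in d}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # topic totals
    z = []                                             # topic of each token
    for di, d in enumerate(docs):                      # random initialization
        zs = []
        for w in d:
            k = rng.randrange(n_topics)
            zs.append(k)
            ndk[di][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zs)
    for _ in range(iters):                             # Gibbs sweeps
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                k = z[di][wi]                          # remove token's count
                ndk[di][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [(ndk[di][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + V * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights)[0]  # resample
                z[di][wi] = k
                ndk[di][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return [[w for w, _ in sorted(nkw[t].items(), key=lambda x: -x[1])[:n_top]]
            for t in range(n_topics)]
```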

Good luck!

I have a bug!

It might be addressed in the video; if not, shoot me an email at royliu2@illinois.edu. Thanks!

Progress Report

So far the project has proven to be somewhat more difficult than expected. I have accomplished the following so far:

  • Parse PDFs into a string
  • Hook up the arXiv API
  • Search for papers by author
  • Basic UI to search and parse

Due to CORS issues when running my own server to access Elsevier, I have opted to do the same thing using arXiv papers only, since arXiv has a comprehensive API. However, its responses are XML, which takes time to convert into JSON. Now that I have both the summaries of all papers an author has written and the data for any given PDF, I will start comparing the performance of different algorithms and choose the more optimal one. That should finalize the project, and it should be on good track after that.
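The XML-to-JSON conversion mentioned above can be sketched with the standard library. The {title, url} field names are my own choice for illustration; arXiv's actual Atom feeds carry more fields than this.

```python
# Sketch of converting an arXiv Atom (XML) response into JSON; the
# {title, url} record shape is an illustrative assumption.
import json
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # arXiv's API returns Atom-format XML

def atom_to_json(xml_text):
    """Convert an arXiv Atom feed into a JSON string of {title, url} records."""
    root = ET.fromstring(xml_text)
    papers = [{"title": " ".join(e.findtext(ATOM + "title").split()),
               "url": e.findtext(ATOM + "id")}
              for e in root.iter(ATOM + "entry")]
    return json.dumps(papers)
```

The `" ".join(....split())` collapses the wrapped whitespace that arXiv embeds in long titles (visible in the console log above).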

Challenges

The biggest challenge was getting the Chrome extension to work the way I want, due to its service workers behaving oddly. After a few hours of messing around and debugging, I got it working properly.

What's next

All that is left is implementing the actual algorithms and generating results; I then need to compare the algorithms for relevance.

CourseProject

Team Members

Just me, royliu2!

Topic

I have chosen to create a Chrome extension that helps you parse research papers and find other papers by the same authors, in order of relevance. This is a relevance-ranking problem. The datasets will come through API keys from dev.elsevier.com. I will query for the papers an author has published and rank them by relevance. I plan on using native JavaScript to parse the data, and will demonstrate that it works by finding individuals with papers published in different fields. If the relevance ranking is valid, then fields differing from the current paper's will not be selected. This will likely take more than 20 hours for the following reasons:

  • Parse PDFs through JS (3 hours)
  • Hook up API properly (2 hours)
  • Find relevant information using valid algorithms (10 hours)
  • Testing different tradeoffs (2 hours)
  • Debugging/Optimizations (5 hours)
