successlab/D-Explorer

D-Explorer

Crawler

The crawler is found in crawler/crawler_2.0.

Setting up selenium webdriver

First, install selenium with

pip install selenium

At this point, the driver should run correctly, assuming you are using Chrome. If you use another browser, you must change the Selenium driver setup, which currently looks like this:

driver = webdriver.Chrome()
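As a rough illustration of switching browsers, the sketch below picks a Selenium driver class by name. The `make_driver` helper and the `SUPPORTED` table are hypothetical and not part of the crawler; `webdriver.Chrome`, `webdriver.Firefox`, and `webdriver.Edge` are the standard Selenium classes.

```python
# Hypothetical helper, not part of the crawler: maps a browser name to the
# matching Selenium driver class.
SUPPORTED = {"chrome": "Chrome", "firefox": "Firefox", "edge": "Edge"}


def make_driver(browser="chrome"):
    """Return a Selenium WebDriver for the named browser."""
    key = browser.lower()
    if key not in SUPPORTED:
        raise ValueError(f"unsupported browser: {browser!r}")
    # Imported here so the name check above works even before Selenium is used.
    from selenium import webdriver
    return getattr(webdriver, SUPPORTED[key])()
```

With this in place, `driver = make_driver("firefox")` would replace the Chrome line, assuming Firefox is installed.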

Correcting selenium references

Amazon constantly changes the names and paths of the page elements we crawl. If the crawler throws an error, find the element reference that failed in the code, locate that element in the browser's inspection pane, and update the reference. You may need to do this several times. In some cases, searching the page for part of the old reference turns up the new one.
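Searching for part of an old reference can also be done from Python. The sketch below is a hypothetical helper, not part of the crawler: it scans a page's HTML for `id` or `class` values that contain a fragment of the old reference. It uses only the standard library, and the fragment `skill-title` in the usage note is made up.

```python
from html.parser import HTMLParser


class _AttrCollector(HTMLParser):
    """Collects id/class attribute values that contain a given fragment."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment.lower()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("id", "class") and value and self.fragment in value.lower():
                self.matches.append((tag, name, value))


def find_candidates(page_source, fragment):
    """Return (tag, attribute, value) tuples whose value contains fragment."""
    collector = _AttrCollector(fragment)
    collector.feed(page_source)
    return collector.matches
```

For example, `find_candidates(driver.page_source, "skill-title")` would list elements whose `id` or `class` still contains that fragment, which can then be checked in the inspection pane.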

Running the crawler

Simply run main.py. The crawler is not designed to run in parallel, although doing so is possible. Note that data already exists, so make sure you aren't overwriting anything.

In some cases, the crawler may hit a CAPTCHA. Solve it manually when it appears.
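One way to guard against overwriting existing data is to open output files in exclusive-creation mode. This is a minimal sketch, assuming the output is written to ordinary files; `open_new` is a hypothetical helper and does not reflect how the crawler actually writes its results.

```python
from pathlib import Path


def open_new(path):
    """Open path for writing, failing if the file already exists."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" is exclusive creation: it raises FileExistsError
    # instead of silently truncating an existing file.
    return path.open("x", encoding="utf-8")
```

A second run that targets the same file then stops with `FileExistsError` rather than replacing the earlier crawl.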

Crawler Analysis Tools

Assorted tools that report data about the crawler files and generate invocations. Each is a self-contained Python file. Make sure the inputs and outputs are where you want them.

Chatbot

Installation

You will need to install the OpenAI Python library. Selenium should already be installed, but if it is not, install it too.

pip install openai

OpenAI Setup

You will need an OpenAI account. Set up a project and get an API key.
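A common way to supply the key is the `OPENAI_API_KEY` environment variable, which the official `openai` Python library reads by default. The sketch below checks that the variable is set before anything starts; `load_openai_key` is a hypothetical helper, and the project itself may expect the key somewhere else.

```python
import os


def load_openai_key(env=None):
    """Return the API key from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before starting the chatbot")
    return key
```

Failing early like this avoids launching Chrome only to hit an authentication error on the first request.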

The model is currently set to gpt-4.0. If you want a cheaper option, choose gpt-4.0-mini, but note that it is rate limited and cannot handle parallel instances.

Amazon Setup

You will need an Amazon developer account, with the username, password, and URL of the developer portal set. You may have to start a skill to access the developer portal.

Running the project

Make sure the invocations are in the same directory and that the output will not collide with existing files.

Run

python3 main.py 

If you want to run in parallel, you can specify a category and then a starting letter as input arguments. This keeps multiple instances from running on the same skills, which would lead to race conditions or inefficiency. All three patterns are supported:

python3 main.py
python3 main.py CategoryName
python3 main.py CategoryName A
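The three invocation patterns can be read as "no filter", "filter by category", and "filter by category and first letter". The sketch below shows one way such arguments might be parsed and applied. `parse_args` and `selects` are hypothetical names, and the actual logic in main.py may differ.

```python
def parse_args(argv):
    """Return (category, letter) from argv; either may be None."""
    args = argv[1:]
    if len(args) > 2:
        raise SystemExit("usage: main.py [CategoryName [Letter]]")
    category = args[0] if len(args) >= 1 else None
    letter = args[1].upper() if len(args) == 2 else None
    if letter is not None and len(letter) != 1:
        raise SystemExit("the starting letter must be a single character")
    return category, letter


def selects(skill_category, skill_name, category, letter):
    """True if a skill falls inside this instance's share of the work."""
    if category is not None and skill_category != category:
        return False
    if letter is not None and not skill_name.upper().startswith(letter):
        return False
    return True
```

Because each instance only accepts skills that match its own category and letter, two instances started with different arguments never work on the same skill.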

If too many parallel instances are run, the chatbot may crash. Also, some skills freeze indefinitely (five or so in our run). Kill the Chrome windows and delete these skills' invocations; otherwise they may freeze Chrome and possibly overheat your machine.
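To notice a frozen run without watching it, the chatbot can be launched from a small watchdog script that gives up after a time limit. This is only a sketch: `run_with_timeout` is a hypothetical helper, the time limit is arbitrary, and Chrome windows opened by the stopped process may still need to be closed by hand.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd; return its exit code, or None if it exceeded the time limit."""
    try:
        # On timeout, subprocess.run kills the child before raising.
        completed = subprocess.run(cmd, timeout=seconds)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

For example, `run_with_timeout([sys.executable, "main.py", "CategoryName", "A"], 3600)` returns `None` if that instance is still running after an hour, which is a hint to look for a frozen skill.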

Analysis Tools

There are three folders of analysis tools: one for resource analysis, one for content analysis, and one for performance analysis. Each contains Python analysis files. All are self-contained, although you may need to install boto3.

Also included is sanitized result data.
