Genysys/github-crawler

github-crawler

Extract GitHub repositories metadata and README content.

STEPS:

  1. Set up the environment and install the packages

    cp .env.example .env
    python3 -m venv env
    source env/bin/activate
    pip install --upgrade pip
    pip install -r requirements.txt
    
  2. Update the .env file with the correct parameters

  3. Run the following scripts:

    i. python crawl_repos.py <topic-name> <stars-size> crawls all repos that have the given topic and at least <stars-size> stars. If <stars-size> is omitted, repos with 0 or more stars are crawled.

    ii. python get_contributors.py crawls all users who contributed to the repos crawled in step 3.i

    iii. python get_stargazers.py crawls all users who starred the repos crawled in step 3.i
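
For orientation, the three scripts above map naturally onto three public GitHub REST API endpoints: repository search, repository contributors, and repository stargazers. The sketch below only builds the request URLs and makes no network calls. The function names, the sort order, and the paging defaults are assumptions for illustration; the actual scripts in this repo may build their requests differently.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"


def build_search_url(topic, min_stars=0, page=1, per_page=100):
    """Hypothetical helper: URL for repos with `topic` and at least `min_stars` stars."""
    # GitHub search qualifiers: topic:<name> and stars:>=<n>
    query = f"topic:{topic} stars:>={min_stars}"
    params = {
        "q": query,
        "sort": "stars",       # assumed ordering, not taken from the repo
        "order": "desc",
        "per_page": per_page,  # the API caps page size at 100
        "page": page,
    }
    return f"{API_ROOT}/search/repositories?{urlencode(params)}"


def contributors_url(full_name):
    """Hypothetical helper: contributors endpoint for an 'owner/repo' name."""
    return f"{API_ROOT}/repos/{full_name}/contributors"


def stargazers_url(full_name):
    """Hypothetical helper: stargazers endpoint for an 'owner/repo' name."""
    return f"{API_ROOT}/repos/{full_name}/stargazers"


if __name__ == "__main__":
    # Mirrors: python crawl_repos.py machine-learning 50
    print(build_search_url("machine-learning", 50))
    # Mirrors steps 3.ii and 3.iii for one crawled repo
    print(contributors_url("Genysys/github-crawler"))
    print(stargazers_url("Genysys/github-crawler"))
```

Defaulting `min_stars` to 0 matches the behaviour described in step 3.i when the stars argument is omitted.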
