
Relational Stocks Crawler

❗ This only works with Python 3

Disclaimer:

I followed the robots.txt of the site:

User-agent: *
Crawl-delay: 120
Disallow: /cgi-bin/
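
In Scrapy terms, honoring that policy could look like the settings sketch below. ROBOTSTXT_OBEY, DOWNLOAD_DELAY, and CONCURRENT_REQUESTS_PER_DOMAIN are standard Scrapy settings; whether the repo sets exactly these values is an assumption.

    # settings.py (sketch) -- respect robots.txt and the 120s crawl delay
    ROBOTSTXT_OBEY = True                # fetch and obey the site's robots.txt
    DOWNLOAD_DELAY = 120                 # seconds between requests, matching Crawl-delay: 120
    CONCURRENT_REQUESTS_PER_DOMAIN = 1   # keep requests to the site strictly sequential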

There are two scripts, which crawl two types of data from the Relational Stocks site: /instshow.php and /showinsiders.php.
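
As a rough sketch, each script would run a Scrapy spider pointed at one of those pages; the class name, spider name, and placeholder host below are illustrative assumptions, not the repo's actual code.

    # sketch of a spider for /showinsiders.php -- illustrative only
    import scrapy

    class ShowInsidersSpider(scrapy.Spider):
        name = "showinsiders"                                   # hypothetical spider name
        start_urls = ["http://<relational-stocks-host>/showinsiders.php"]

        def parse(self, response):
            # yield one record per data row of the insider-trades table
            for row in response.css("table tr"):
                cells = [c.strip() for c in row.css("td ::text").getall()]
                if cells:
                    yield {"cells": cells}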

Crawled data example

Fields collected from /showinsiders.php

  • Reported Time
  • Transaction Date
  • Company
  • Ticker
  • Insider
  • Shares Traded
  • Average Price
  • Value

Output data is in CSV format:

report_time,trans_date,company,ticker,insider,shares_trader,avg_price,value
09-29-2017,2017-09-28 ,PATRICK INDUSTRIES INC,PATK,"NEMETH ANDY L
President, Director","2,000",$84.95,"$169,900"
09-29-2017,2017-09-27 ,PATRICK INDUSTRIES INC,PATK,"NEMETH ANDY L
President, Director","5,000",$83.03,"$415,150"
09-29-2017,2017-09-27 ,PATRICK INDUSTRIES INC,PATK,"Cleveland Todd M
CEO, Director","10,000",$83.16,"$831,600"
09-29-2017,2017-09-08 ,INTERSECTIONS INC,INTX,"Osmium Partners, LLC
10% owner","19,286",$3.35,"$64,608"

Install

  1. Install virtualenv via pip

    pip install virtualenv

    Test your installation

    virtualenv --version
  2. Create a virtual environment for the project and install Scrapy

    cd my_project_folder
    virtualenv my_project
    source my_project/bin/activate
    pip install Scrapy
  3. Clone and run

    git clone https://github.com/hscup/rs_crawler.git
    cd rs_crawler
    
    python run.py
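
run.py is not reproduced here; below is a minimal sketch of what a launcher like it could look like using Scrapy's CrawlerProcess. The spider import paths and the output filename are assumptions.

    # run.py (sketch) -- start both spiders and write results to CSV
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    # illustrative import paths; the real project defines its own spiders
    from rs_crawler.spiders.showinsiders import ShowInsidersSpider
    from rs_crawler.spiders.instshow import InstShowSpider

    settings = get_project_settings()
    settings.set("FEEDS", {"output.csv": {"format": "csv"}})  # CSV feed export (Scrapy >= 2.1)

    process = CrawlerProcess(settings)
    process.crawl(ShowInsidersSpider)   # queue the insider-trades spider
    process.crawl(InstShowSpider)       # queue the institutional-data spider
    process.start()                     # block until both crawls finish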
