❗ This only works with Python 3
I followed the site's robots.txt:
User-agent: *
Crawl-delay: 120
Disallow: /cgi-bin/

There are two scripts, which crawl two types of data from the Relational Stock site: /instshow.php and /showinsiders.php.
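Since the crawlers are built with Scrapy, the robots.txt policy above can be enforced through the project settings. This is a minimal sketch, assuming a standard Scrapy settings.py; the project's actual settings may differ:

# settings.py (sketch): honor the site's robots.txt policy above
BOT_NAME = 'rs_crawler'

# Respect Disallow rules such as /cgi-bin/ automatically
ROBOTSTXT_OBEY = True

# Wait 120 seconds between requests, matching Crawl-delay: 120
DOWNLOAD_DELAY = 120

# One request at a time so the delay is actually honored per domain
CONCURRENT_REQUESTS_PER_DOMAIN = 1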
Fields collected from /showinsiders.php:
- Reported Time
- Transaction Date
- Company
- Ticker
- Insider
- Shares Traded
- Average Price
- Value
Output data is in CSV format:
report_time,trans_date,company,ticker,insider,shares_trader,avg_price,value
09-29-2017,2017-09-28 ,PATRICK INDUSTRIES INC,PATK,"NEMETH ANDY L
President, Director","2,000",$84.95,"$169,900"
09-29-2017,2017-09-27 ,PATRICK INDUSTRIES INC,PATK,"NEMETH ANDY L
President, Director","5,000",$83.03,"$415,150"
09-29-2017,2017-09-27 ,PATRICK INDUSTRIES INC,PATK,"Cleveland Todd M
CEO, Director","10,000",$83.16,"$831,600"
09-29-2017,2017-09-08 ,INTERSECTIONS INC,INTX,"Osmium Partners, LLC
10% owner","19,286",$3.35,"$64,608"
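Because the insider field can contain embedded newlines and the numeric fields carry dollar signs and thousands separators, the output is easiest to consume with a CSV reader that understands quoting. A minimal sketch, assuming the output above has been saved as output.csv (the file name is an assumption):

import csv

def parse_number(text):
    # Strip '$' and thousands separators: '$169,900' -> 169900.0
    return float(text.replace('$', '').replace(',', ''))

with open('output.csv', newline='') as f:
    # csv.DictReader handles the quoted, multi-line insider field
    for row in csv.DictReader(f):
        shares = parse_number(row['shares_trader'])
        value = parse_number(row['value'])
        print(row['ticker'], row['insider'].splitlines()[0], shares, value)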
-
Install virtualenv via pip:
pip install virtualenv
Test your installation:
virtualenv --version
-
Create a virtual environment for the project and install Scrapy:
cd my_project_folder
virtualenv my_project
source my_project/bin/activate
pip install Scrapy
-
Clone and run:
git clone https://github.com/hscup/rs_crawler.git
cd rs_crawler
python run.py
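For reference, run.py typically drives both spiders in a single Scrapy process. The sketch below shows one way to do that; the spider names instshow and showinsiders are assumptions, so check the spider classes in the repository for the real names:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Run both spiders with the project settings
# (spider names 'instshow' and 'showinsiders' are assumptions)
process = CrawlerProcess(get_project_settings())
process.crawl('instshow')
process.crawl('showinsiders')
process.start()  # blocks until both crawls have finished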
