Iterates over the list pages, follows the links to each detailed page, and scrapes values from it. The output is written to a CSV file.
The executable is inside the ./bin folder.
You can either use the executable from the CLI:
scrapper -f <config filename>
OR
use it from your Node.js script:
var Scrapper = require('scrapper');
var config = {
}; // The config object, with the same fields as the config file
(new Scrapper(config)).execute();
An example is provided in config_example.json.
- site - The domain you want to scrape from, e.g. example.com
- list - Configuration for the list page.
- url - The url of the list page, e.g. http://www.example.com?pagetype=list&page=%page%
- startIndex - The starting page id. This value replaces %page% in the url above.
- pageLimit - The ending page id. This value replaces %page% in the url above. The code iterates pages from startIndex to pageLimit.
- selectorForLink - The selector to find the links to the detailed pages.
- browserDetails - Configuration to mimic a browser
- userAgent - A valid user agent string
- cookie - Add this if the site requires a cookie
- throttleTime - Time in milliseconds. Throttles the page fetch speed.
- listPageThrottleTime (optional) - Time in milliseconds. Throttles the fetch speed for list pages. If not present, throttleTime is used.
- detailed - Configuration for the detailed page
- scrapValues - List of options to scrape from the detailed page. It is an array of scrapOptions.
- output - Configuration for the output
- location - The file location where the output csv is saved
- bufferLength - The length of the buffer that holds the scraped output values. Once the buffer is full, the program flushes it to the output CSV file.
Each scrapOption supports the following fields:
- selector - The CSS selector where the content lies
- key - The field name
- split - Configuration to split the selector's content
- sep - The separator by which the content is split
- idx - The index in the resulting array of the piece to be picked as the value
- func - For complex processing of the selector's content, you can add the function content here. The function is passed one argument, the selector's content. The output of the function is picked as the value.
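To make two of the options above more concrete, here is a minimal sketch of how the %page% placeholder and the split/func options could be interpreted. It is not taken from the scrapper source: the helper names expandListUrls and extractValue are hypothetical, and it assumes that pageLimit is inclusive, that func is available as a real JavaScript function, and that func takes precedence over split when both are present.

```javascript
// Hypothetical helpers for illustration only; these names are not part of scrapper.

// Build the list-page URLs by replacing %page% with each page id.
// Assumption: pages run from startIndex to pageLimit, both inclusive.
function expandListUrls(list) {
  var urls = [];
  for (var page = list.startIndex; page <= list.pageLimit; page++) {
    // split/join replaces every occurrence of the placeholder
    urls.push(list.url.split('%page%').join(String(page)));
  }
  return urls;
}

// Turn the text found by a scrapOption's selector into the final value.
// Assumption: func (if given) wins over split; otherwise the raw content is kept.
function extractValue(content, option) {
  if (typeof option.func === 'function') {
    return option.func(content);
  }
  if (option.split) {
    return content.split(option.split.sep)[option.split.idx];
  }
  return content;
}

var urls = expandListUrls({
  url: 'http://www.example.com?pagetype=list&page=%page%',
  startIndex: 1,
  pageLimit: 3
});
console.log(urls.length); // 3
console.log(urls[0]);     // http://www.example.com?pagetype=list&page=1

var price = extractValue('Price: 42 USD', {
  key: 'price',
  split: { sep: ' ', idx: 1 }
});
console.log(price); // 42

var title = extractValue('  Some Title ', {
  key: 'title',
  func: function (text) { return text.trim().toLowerCase(); }
});
console.log(title); // some title
```

If idx points past the end of the split array, this sketch returns undefined; how the real tool handles that case is not stated here.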