GitHub - MarkLuro/requests-html: HTML Parsing for Humans™

Requests-HTML: HTML Parsing for Humans™

This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible.

When using this library you automatically get:

jQuery selectors (thanks to PyQuery).
Mocked user-agent (like a real web browser).
Automatic following of redirects.
Connection pooling and cookie persistence.
The Requests experience you know and love, with magic parsing abilities.

Other nice features include:

Markdown export of pages and elements.

Usage

Make a GET request to 'python.org', using Requests:

>>> from requests_html import session
>>> r = session.get('https://python.org/')

Grab a list of all links on the page, as-is (anchors excluded):

>>> r.html.links
{'/users/membership/', '/about/gettingstarted/', 'http://feedproxy.google.com/~r/PythonInsider/~3/zVC80sq9s00/python-364-is-now-available.html', '/about/success/', 'http://flask.pocoo.org/', 'http://www.djangoproject.com/', '/blogs/', ... '/psf-landing/', 'https://wiki.python.org/moin/PythonBooks'}
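
To make "anchors excluded" concrete, here is a minimal standard-library sketch. It is not the library's implementation: it simply collects the `href` of every `<a>` tag and skips values that start with `#`. The `LinkCollector` class, the `collect_links` function, and the sample HTML are hypothetical names introduced only for this illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip missing hrefs and pure in-page anchors such as "#top".
        if href and not href.startswith("#"):
            self.links.add(href)


def collect_links(html):
    """Return the set of non-anchor links found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample markup, not taken from python.org.
sample = (
    '<a href="/about/">About</a> '
    '<a href="#top">Top</a> '
    '<a href="http://flask.pocoo.org/">Flask</a>'
)
links = collect_links(sample)
```

Because the result is a set, duplicates collapse and the order of links is not meaningful, which matches the shape of the output shown above.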

Grab a list of all links on the page, in absolute form (anchors excluded):

>>> r.html.absolute_links
{'http://feedproxy.google.com/~r/PythonInsider/~3/zVC80sq9s00/python-364-is-now-available.html', 'https://www.python.org/downloads/mac-osx/', 'http://flask.pocoo.org/', 'https://www.python.org//docs.python.org/3/tutorial/', 'http://www.djangoproject.com/', 'https://wiki.python.org/moin/BeginnersGuide', 'https://www.python.org//docs.python.org/3/tutorial/controlflow.html#defining-functions', 'https://www.python.org/about/success/', 'http://twitter.com/ThePSF', 'https://www.python.org/events/python-user-group/634/', ..., 'https://wiki.python.org/moin/PythonBooks'}
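
The relationship between raw and absolute links can be sketched with `urllib.parse.urljoin` from the standard library, which resolves a link against a base URL. This is an illustration under assumptions, not what `absolute_links` does internally: the output above (for example `https://www.python.org//docs.python.org/3/tutorial/`) suggests the library treats protocol-relative links differently. The `to_absolute` helper, the base URL, and the sample links are hypothetical.

```python
from urllib.parse import urljoin


def to_absolute(base_url, links):
    """Resolve each link against base_url; already-absolute links are unchanged."""
    return {urljoin(base_url, link) for link in links}


# Assumed base URL and sample links, chosen only for illustration.
base = "https://www.python.org/"
raw = {
    "/about/success/",                # site-relative path
    "http://flask.pocoo.org/",        # already absolute
    "//docs.python.org/3/tutorial/",  # protocol-relative
}
absolute = to_absolute(base, raw)
```

With `urljoin`, the protocol-relative link inherits only the scheme of the base URL, so it resolves to the `docs.python.org` host instead of being appended to `www.python.org`.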

Select an element with a jQuery selector:

>>> about = r.html.find('#about')[0]

Grab an element's text contents:

>>> print(about.text)
About
Applications
Quotes
Getting Started
Help
Python Brochure

Introspect an Element's attributes:

>>> about.attrs
{'id': 'about', 'class': 'tier-1 element-1  ', 'aria-haspopup': 'true'}
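
In the output above, `class` is a single whitespace-separated string with trailing spaces. The short sketch below works on a literal copy of that dictionary, not a live element, and shows one way to turn the value into a list of class names. The variable names are chosen for illustration.

```python
# Literal copy of the attrs dictionary shown above.
attrs = {'id': 'about', 'class': 'tier-1 element-1  ', 'aria-haspopup': 'true'}

# str.split() with no arguments splits on any whitespace and drops empty pieces,
# so the trailing spaces do not produce empty class names.
class_names = attrs.get('class', '').split()

# Attribute values are strings, so compare against 'true' rather than True.
has_popup = attrs.get('aria-haspopup') == 'true'
```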

Select Elements within Elements:

>>> about.find('a')
[<Element 'a' href='/about/' title='' class=''>, <Element 'a' href='/about/apps/' title=''>, <Element 'a' href='/about/quotes/' title=''>, <Element 'a' href='/about/gettingstarted/' title=''>, <Element 'a' href='/about/help/' title=''>, <Element 'a' href='http://brochure.getpython.info/' title=''>]

Render an Element as Markdown:

Installation

$ pipenv install requests-html
✨🍰✨

