@EmilStenstrom EmilStenstrom commented Dec 9, 2025

What is this Python project?

JustHTML is a dependency-free, pure-Python HTML5 parser. It takes a string of HTML and returns a Python tree structure that you can then query and manipulate.

Comparison (A brief comparison explaining how it differs from existing alternatives.)

See comparison table.

What's the difference between this Python project and similar ones?

It's the only HTML5 parser available in Python that passes all HTML5 tests. It is very well tested, with 100% test coverage and fuzz testing.

It's fast enough: it parses Wikipedia's homepage in 0.1 s. Rust and C parsers are of course faster, but they are less correct and trickier to install.

It has a very nice query API: you pass in a CSS selector and get back all elements that match it.
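To illustrate the general idea of "HTML string in, queryable tree out", here is a minimal sketch that uses only the standard library's `html.parser`. It is not JustHTML's API and it is not an HTML5-compliant parser. The names `Node`, `TreeBuilder`, `parse`, and `query` are hypothetical, and the selector support is limited to `tag`, `.class`, and `tag.class`.

```python
from html.parser import HTMLParser

# Elements that never have children or an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """Hypothetical tree node: a tag name, attributes, text, and children."""

    def __init__(self, name, attrs=None, parent=None):
        self.name = name
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""

    def walk(self):
        """Yield this node and all descendants in document order."""
        yield self
        for child in self.children:
            yield from child.walk()


class TreeBuilder(HTMLParser):
    """Builds a Node tree from tokenizer callbacks (no HTML5 error recovery)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not None and node.name != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def query(root, selector):
    """Match a single 'tag', '.class', or 'tag.class' selector."""
    tag, _, cls = selector.partition(".")
    return [
        n for n in root.walk()
        if (not tag or n.name == tag)
        and (not cls or cls in (n.attrs.get("class") or "").split())
    ]


doc = parse("<html><body><p class='intro lead'>Hello</p><p>World<br>!</p></body></html>")
print([n.text for n in query(doc, "p.intro")])  # ['Hello']
```

A real HTML5 parser additionally implements the specification's tree-construction rules for malformed markup, which this sketch deliberately skips.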

--

Anyone who agrees with this pull request could submit an Approve review to it.
