Trafilatura cannot read gzipped pages? #781

@LaundroMat

When I fetch the Dramatiq.io homepage, fetch_url returns what looks like compressed binary data, and extract then returns None:

>>> from trafilatura import fetch_url, extract
>>> downloaded = fetch_url("https://dramatiq.io/")
>>> downloaded
(�/��X,P�Z��BD0g��������taP�-�������I��$���I"g��0ī����?�L�����������6�Aϫ�pP�����4�������0��mʖF���1�li	�x�����M����[�a���g�\�^�F�/�y�<ѿ���[�7zc��՝�+[��*c2דӡǩ�K�:fH���8�i��Q...)

>>> extract(downloaded, output_format="markdown", include_comments=False, with_metadata=False) is None
True

How can I fix this?
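
One way to narrow this down is to check which compression format the response body actually uses. The output above starts with "(" and has "/" as its third character. That pattern is consistent with the Zstandard magic number (bytes 28 B5 2F FD) rather than gzip (1F 8B), though this is only a guess from the printed text. The following is a minimal diagnostic sketch, not part of Trafilatura: `sniff_compression` and `maybe_decompress` are hypothetical helper names, and the sketch assumes the raw response body is available as bytes (which `fetch_url` may not provide, since it normally returns decoded text).

```python
import gzip

# Leading "magic" bytes of two common HTTP content encodings.
GZIP_MAGIC = b"\x1f\x8b"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def sniff_compression(payload: bytes) -> str:
    """Guess the compression format from the first bytes (hypothetical helper)."""
    if payload.startswith(GZIP_MAGIC):
        return "gzip"
    if payload.startswith(ZSTD_MAGIC):
        return "zstd"
    return "unknown"


def maybe_decompress(payload: bytes) -> bytes:
    """Decompress gzip payloads with the standard library.

    Anything else, including Zstandard, is returned unchanged because
    decoding it needs support outside this sketch.
    """
    if sniff_compression(payload) == "gzip":
        return gzip.decompress(payload)
    return payload
```

If the sniffed format turns out to be "zstd", the body would need a Zstandard decoder before extract can parse it. The gzip branch here would not help in that case.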

Labels: bug (Something isn't working)
