Tags: nimaarek/python-goose

1.0.29

Draft new release 1.0.29

* Requests is now used to fetch images, and the same HTTP session is shared by all requests.
* Analyze all possible text root nodes and select the best one instead of stopping at the first candidate.
* Improve text selection filters.
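The change in root-node selection can be illustrated with a minimal sketch. It is not the library's actual code: the node dictionaries, the `score` field, and the `min_score` threshold are hypothetical stand-ins for whatever scoring the extractor uses internally.

```python
# Hypothetical candidate nodes; in the real extractor these would be DOM
# elements with a computed text score.
CANDIDATES = [
    {"tag": "div#sidebar", "score": 2.0},
    {"tag": "article", "score": 9.5},
    {"tag": "footer", "score": 0.3},
]


def first_candidate(nodes, min_score=1.0):
    """Return the first node that passes the threshold (stop-on-first strategy)."""
    for node in nodes:
        if node["score"] >= min_score:
            return node
    return None


def best_candidate(nodes, min_score=1.0):
    """Examine every node and return the highest-scoring one above the threshold."""
    eligible = [n for n in nodes if n["score"] >= min_score]
    return max(eligible, key=lambda n: n["score"], default=None)


first = first_candidate(CANDIDATES)
best = best_candidate(CANDIDATES)
```

With this sample data the stop-on-first strategy settles on the sidebar, while scanning all candidates picks the article body.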

1.0.28

Draft new release

1.0.28:

  * Move to requests as the network library

1.0.27

Merge pull request Lol4t0#1 from Lol4t0/python_3

Python 3 support

1.0.26

Fix unicode processing + `&nbsp;` support

* STOP_WORDS are stored as unicode, so word candidates must also be kept as unicode in order to compare them against the dictionary correctly.
* In some languages, short stop words are joined to the next word in the sentence with a non-breaking space. To detect those stop words, the tokenizer must support nbsp. Russian is an example. This fixes grangier#223.
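The effect of the nbsp fix can be sketched as follows. This is an illustration only, not the library's tokenizer: the `STOP_WORDS` set here is a tiny hypothetical sample, and `tokenize` and `count_stop_words` are made-up helper names.

```python
# Tiny hypothetical sample; the real stop-word lists are loaded per language.
STOP_WORDS = {"в", "на", "и", "с"}

NBSP = "\u00a0"  # no-break space, written as &nbsp; in HTML


def tokenize(text):
    """Split on spaces, treating a no-break space like an ordinary space."""
    return [token for token in text.replace(NBSP, " ").split(" ") if token]


def count_stop_words(text):
    """Count tokens that appear in the stop-word set (compared as unicode strings)."""
    return sum(1 for token in tokenize(text.lower()) if token in STOP_WORDS)


# Russian sample in which two prepositions are glued to the next word with nbsp.
sample = "Он живёт в" + NBSP + "Москве и работает на" + NBSP + "заводе"

naive_count = sum(1 for t in sample.lower().split(" ") if t in STOP_WORDS)
fixed_count = count_stop_words(sample)
```

Splitting on plain spaces alone leaves the two prepositions attached to the following words, so only the conjunction is counted; handling nbsp recovers all three stop words.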