Tags · nma/python-goose

1.0.29

Draft new release 1.0.29

* Use Requests for image downloads. The same HTTP session is reused for all requests.
* Analyze all possible text root nodes and select the best one instead of stopping at the first candidate
* Improve text selection filters
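
The change in root-node selection can be illustrated with a minimal sketch. It is not python-goose's actual code: the candidate tuples, the `word_count` scoring function, and the threshold are hypothetical stand-ins, chosen only to show the difference between stopping at the first acceptable candidate and comparing all of them.

```python
def first_candidate(nodes, score, threshold):
    """Assumed old behaviour: return the first node scoring above threshold."""
    for node in nodes:
        if score(node) > threshold:
            return node
    return None


def best_candidate(nodes, score):
    """Score every candidate and keep the highest-scoring one."""
    return max(nodes, key=score, default=None)


def word_count(node):
    # Hypothetical score: number of whitespace-separated words in the text.
    return len(node[1].split())


# Hypothetical (label, text) candidates in document order.
candidates = [
    ("sidebar", "short teaser text here"),
    ("article", "a much longer body of text with many more words in it"),
]

first = first_candidate(candidates, word_count, threshold=3)
best = best_candidate(candidates, word_count)
```

With these inputs, the first-match strategy settles on the sidebar because it appears earlier and clears the threshold, while the exhaustive strategy picks the article body.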

Jan 21, 2016
9632746

1.0.28

Draft new release

1.0.28:

  * Move to Requests as the network library

Jan 13, 2016
87808d2

1.0.27

Merge pull request Lol4t0#1 from Lol4t0/python_3

Python 3 support

Jan 12, 2016
40cdd84

1.0.26

Fix unicode processing + `&nbsp;` support

* STOP_WORDS are stored as unicode, so candidate words must also be kept as unicode to be compared against the dictionary correctly.
* In some languages, short stop words are joined to the next word in the sentence with a non-breaking space. To recognize those stop words, the tokenizer must support nbsp. Russian is an example. This fixes grangier#223.
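
The nbsp issue can be reproduced with a small sketch. This is an illustration under stated assumptions, not the library's tokenizer: `STOP_WORDS` here is a tiny hypothetical subset, and `tokenize` and `count_stop_words` are made-up helper names.

```python
NBSP = "\u00a0"

# Hypothetical subset of Russian stop words, kept as unicode strings.
STOP_WORDS = frozenset(["в", "на", "и"])


def tokenize(text):
    """Normalise non-breaking spaces to plain spaces, then split on whitespace."""
    return text.replace(NBSP, " ").split()


def count_stop_words(tokens):
    """Count tokens that appear in the stop-word set."""
    return sum(1 for token in tokens if token.lower() in STOP_WORDS)


# Short prepositions glued to the following word with a non-breaking space.
sentence = "Он живёт в" + NBSP + "Москве и работает на" + NBSP + "заводе"

# Splitting on the ASCII space only leaves "в\u00a0Москве" as a single token.
naive_tokens = sentence.split(" ")
nbsp_aware_tokens = tokenize(sentence)

naive_hits = count_stop_words(naive_tokens)
aware_hits = count_stop_words(nbsp_aware_tokens)
```

Splitting on the ASCII space alone finds only the standalone "и", whereas the nbsp-aware tokenizer also recovers "в" and "на".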