ronybot/NLP_HebrewEmbeding

Yap word2vec model comparison to Simple wiki word2vec model

The YAP model evaluation compares the results of a fastText model built on a YAP morphological preprocessing of Hebrew Wikipedia against a fastText model built without YAP preprocessing on the same input corpus (Hebrew Wikipedia).

Here we provide:

A Python implementation of the method
A suite of matching datasets in Hebrew

Requirements

Python 3.7
PyCharm
gensim (only for the example script)

Example

Run the following file in PyCharm: gensim.py

The code in gensim.py loads a gensim word2vec YAP model and a simple model without YAP, and runs the evaluation on the different datasets. Note that the models it uses (model.vec and simpleHewiki2017.bin) cover Hebrew Wikipedia as of July 2017. For access to all results you have to use our Azure cloud Linux server; please contact Rony or Marina.
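
A minimal sketch of the kind of loading and comparison gensim.py performs, assuming gensim 3.8+ and that both model files sit in the working directory; the file names come from this README, but the variable names and the sample word pair are illustrative only:

    from gensim.models import KeyedVectors
    from gensim.models.fasttext import load_facebook_vectors

    # Vectors trained on the YAP-preprocessed Hebrew Wikipedia dump (text word2vec format).
    yap_vectors = KeyedVectors.load_word2vec_format("model.vec", binary=False)

    # fastText vectors trained on the raw (non-YAP) Hebrew Wikipedia dump (binary format).
    simple_vectors = load_facebook_vectors("simpleHewiki2017.bin")

    # Compare the two models on a single word pair; this is just a placeholder pair --
    # the actual evaluation runs over the similarity groups in the provided datasets.
    word_a, word_b = "מלך", "מלכה"
    print("YAP model similarity:   ", yap_vectors.similarity(word_a, word_b))
    print("Simple model similarity:", simple_vectors.similarity(word_a, word_b))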

The provided dataset

dataResult.xlxs contains 5180 words from similarity groups. All experiment results and intermediate data are in the eval directory. You can see all the experiment results in the file eval/outputResultsAll.txt. All histogram images related to the experiments are also in the eval directory, for example yap_adjectives.png for the adjectives similarity group with the YAP model and simple_hewiki_2017_adjectives.png for the simple model without YAP.
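
If you want to inspect dataResult.xlxs programmatically, something like the following could work, assuming pandas and openpyxl are installed; the sheet layout is an assumption and may differ from the actual file:

    import pandas as pd

    # engine="openpyxl" is passed explicitly because the non-standard ".xlxs"
    # extension prevents pandas from inferring the engine on its own.
    sheets = pd.read_excel("dataResult.xlxs", sheet_name=None, engine="openpyxl")
    for name, frame in sheets.items():
        print(name, "-", len(frame), "rows")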

References

If you make use of this software for research purposes, we would appreciate you citing the following:

@InProceedings{voloshin2018hebrew,
  author   = {Marina Voloshin and Rony Boter},
  title    = {Improving Reliability of Word Similarity Evaluation by using of Yap preprocessing},
  month    = {August},
  year     = {2018},
  address  = {Tel Aviv, Israel},
  document = {NLP 2018 course}
}
