Skip to content

huajiechen/CWS

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CWS

This code implements the word segmentation algorithm proposed in the following paper.

Deng Cai and Hai Zhao, Neural Word Segmentation Learing for Chinese. (ACL 2016)

##Usage: ###- train python train.py. To train a model, first check hyperparameter settings in train.py. The training procedure will result a config file which preserves your hyperparameter settings and trained model parameters will be saved in *.npz.

###- test python test.py params.npz input_file output_path config_file. To test a trained model whose parameters is in params.npz . The corresponding configuration should be in config_file. The test procedure will read data from input_file and output result to output_path.

###- evaluate
E.g., To see the best result on PKU dataset reported in our paper, first generate the output file through our trained model ( python test.py best_pku.npz ../data/pku_test somepath best_pku_config), and then use the command ./score ../data/dic ../data/pku_test somepath.

##Dependencies: Thanks for those excellent computing tools: Theano, Numpy, Gensim

##Author: Deng Cai. Any question, feel free to contact me through my email

##Citation: If you find this code useful, please cite our paper.

@InProceedings{cai-zhao:2016:P16-1,
  author    = {Cai, Deng  and  Zhao, Hai},
  title     = {Neural Word Segmentation Learning for Chinese},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month     = {August},
  year      = {2016},
  address   = {Berlin, Germany},
  publisher = {Association for Computational Linguistics},
  pages     = {409--420},
  url       = {http://www.aclweb.org/anthology/P16-1039}
}

About

For an ACL2016 paper

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 79.7%
  • Perl 19.9%
  • Shell 0.4%