
A Hadoop program that analyzes an XML file containing a large corpus of Wikipedia pages and filters the pages that contain certain keywords.

sawfish/textfilter

textfilter

A Hadoop program that analyzes an XML file containing a large corpus of Wikipedia pages and filters the pages that contain certain keywords (matched case-insensitively).
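The per-page keyword check described above can be sketched in plain Java. This is a minimal sketch, not the project's code: the class PageKeywordFilter and its methods are hypothetical, it assumes pages appear as <page>...</page> elements (as in MediaWiki XML dumps), and it assumes a page is kept when it contains any one of the keywords, since the README does not say whether any or all keywords must match. In a MapReduce job, a check like this would typically run in the mapper, once per page record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not taken from the textfilter sources.
class PageKeywordFilter {
    // Matches one <page>...</page> element; DOTALL lets '.' span newlines.
    private static final Pattern PAGE = Pattern.compile("<page>.*?</page>", Pattern.DOTALL);

    // Returns true if the page contains at least one keyword, ignoring case.
    // Assumption: "any keyword" semantics; the README does not specify any vs. all.
    static boolean containsAnyKeyword(String page, List<String> keywords) {
        String haystack = page.toLowerCase(Locale.ROOT);
        for (String keyword : keywords) {
            if (haystack.contains(keyword.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    // Extracts every <page> element from an XML fragment and keeps the matching ones.
    static List<String> filterPages(String xml, List<String> keywords) {
        List<String> kept = new ArrayList<>();
        Matcher m = PAGE.matcher(xml);
        while (m.find()) {
            String page = m.group();
            if (containsAnyKeyword(page, keywords)) {
                kept.add(page);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String xml = "<mediawiki>"
            + "<page><title>Hadoop</title><text>MapReduce framework</text></page>"
            + "<page><title>Cats</title><text>Small felines</text></page>"
            + "</mediawiki>";
        // Only the first page mentions "mapreduce" (case-insensitively).
        System.out.println(filterPages(xml, List.of("mapreduce")).size()); // prints 1
    }
}
```

Because the sketch runs a plain substring search over the raw page XML, a keyword such as "title" would match every page through its tags, and "cat" would also match "category". A stricter filter might search only the page's <text> element or match on word boundaries.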

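hadoop jar textfilter-0.0.1-SNAPSHOT.jar input output keyword1 keyword2 keyword3

Here input is the path to the Wikipedia XML file, output is the output path, and the remaining arguments are the keywords to filter on.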
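The argument layout in the command above (input path, output path, then keywords) can be sketched in plain Java. This is an illustrative sketch only: the class TextFilterArgs is hypothetical and is not the project's driver, and requiring at least one keyword is an assumption. A real Hadoop driver would also have to ship the keywords to the mappers, for example through the job Configuration (setStrings/getStrings).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical argument holder for illustration; not the project's actual driver code.
class TextFilterArgs {
    final String inputPath;
    final String outputPath;
    final List<String> keywords;

    TextFilterArgs(String inputPath, String outputPath, List<String> keywords) {
        this.inputPath = inputPath;
        this.outputPath = outputPath;
        this.keywords = keywords;
    }

    // args[0] = input, args[1] = output, args[2..] = keywords.
    // Assumption: at least one keyword must be given.
    static TextFilterArgs parse(String[] args) {
        if (args.length < 3) {
            throw new IllegalArgumentException(
                "usage: hadoop jar textfilter-0.0.1-SNAPSHOT.jar input output keyword1 [keyword2 ...]");
        }
        List<String> keywords = Arrays.asList(Arrays.copyOfRange(args, 2, args.length));
        return new TextFilterArgs(args[0], args[1], keywords);
    }
}
```

Collecting everything after the second argument as keywords lets the same command accept any number of keywords without extra flags.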
