mlhorizon/extractor

extractor

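Main-content (body text) extraction for Chinese web pages, implemented after the paper 《基于行块分布函数的通用网页正文抽取算法》 ("A General Web Page Main-Text Extraction Algorithm Based on a Line-Block Distribution Function").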

Installation

	go get github.com/yqingp/extractor

Usage

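	import (
		"fmt"

		"github.com/yqingp/extractor"
	)
	....

	// Create an extractor for the page at url, then extract its main text.
	extractWorker := extractor.NewExtractor(url)
	content, err := extractWorker.Extract()

	if err != nil {
		// Extraction failed: report the error instead of printing the content.
		fmt.Println(err)
	} else {
		fmt.Println(content)
	}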

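Running as a server

Start the server:

	go run example/server.go

Then submit a URL for extraction, for example from Ruby with the rest-client gem:

	require 'rest_client'
	RestClient.post("http://localhost:8000/work", {:url => "http://www.baidu.com"})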
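The same request can be made from Go. The sketch below assumes the server accepts a form-encoded `url` field on `/work` (which is what `RestClient.post` sends when given a hash) and replies with the extracted text in the response body. The response format is an assumption, and `requestExtraction` is a hypothetical helper, not part of this package.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

// requestExtraction posts pageURL as the form field "url" to endpoint
// and returns the raw response body.
func requestExtraction(endpoint, pageURL string) (string, error) {
	resp, err := http.PostForm(endpoint, url.Values{"url": {pageURL}})
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s: %s", resp.Status, body)
	}
	return string(body), nil
}

func main() {
	// Endpoint and page URL mirror the Ruby example; the server must be running.
	content, err := requestExtraction("http://localhost:8000/work", "http://www.baidu.com")
	if err != nil {
		fmt.Fprintln(os.Stderr, "extraction request failed:", err)
		os.Exit(1)
	}
	fmt.Println(content)
}
```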
