www.zhihu.com spider
Simple Spider for zhihu in china
Lib use :
requests
gevent
You can use pip to install:
pip install requests
pip install gevent
for use it ,you should create a file named user_headers.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# the content of browser's request headers, after you login zhihu.com
headers = {
'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0',
'Accept-Encoding': 'gzip, deflate, br',
'Origin': 'https://www.zhihu.com',
'Connection': 'keep-alive',
'X-Xsrftoken': 'your xsrf token',
'Cookie': 'your cookies'
}you will find cookie press f12 when you use Firefox. X-Xsrftoken you can delete this key.
if you want get someone followers
you can use it for this:
from followers import Followers
flers = Followers(url_token="chen-er-bai-18")
for i in flers.get_api_object():
print i["name"], i["url_token"]the same following.py, question_api.py, voter_profile.py, comments.py, following_topic.py
the question.py use old api it return html then parse pyquery.
use gevent improve efficiency becouse Coroutines.