
Cha 2 -编写你的第一个网络爬虫.ipynb (Chapter 2: Writing Your First Web Crawler) #2

@Github-Minghui


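When I try to run the code below, the interpreter raises an error. I suspect this is because I am on an English-language system, which cannot encode Chinese characters. Any advice would be appreciated.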
import requests
from bs4 import BeautifulSoup  # import BeautifulSoup from the bs4 library

link = "http://www.santostang.com/"
headers = {'User-Agent' : 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
r = requests.get(link, headers=headers)

soup = BeautifulSoup(r.text, "html.parser")  # parse the page HTML with BeautifulSoup
title = soup.find("h1", class_="post-title").a.text.strip()
print(title)

with open('title_test.txt', "a+") as f:
    f.write(title)
    f.close()

===================================================================
4.3 通过selenium 模拟浏览器抓取

UnicodeEncodeError Traceback (most recent call last)
in
11
12 with open('title_test.txt', "a+") as f:
---> 13 f.write(title)
14 f.close()

~\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
17 class IncrementalEncoder(codecs.IncrementalEncoder):
18 def encode(self, input, final=False):
---> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
20
21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode characters in position 4-5: character maps to <undefined>
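
The traceback passes through cp1252.py, which suggests that open() fell back to the Windows default encoding because no encoding argument was given. The following is a minimal sketch of that reading, not a confirmed diagnosis. It assumes the title is the string printed above, uses a hypothetical helper can_encode to compare the two encodings, and writes to a temporary path chosen only for illustration.

```python
import os
import tempfile

# Assumption: this is the title printed by the script above.
title = "4.3 通过selenium 模拟浏览器抓取"


def can_encode(text, encoding):
    """Hypothetical helper: True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 has no mapping for Chinese characters; UTF-8 covers all of Unicode.
print("cp1252:", can_encode(title, "cp1252"))
print("utf-8:", can_encode(title, "utf-8"))

# Passing encoding="utf-8" makes the write independent of the system locale.
# The temporary directory is for illustration; the original used 'title_test.txt'.
path = os.path.join(tempfile.mkdtemp(), "title_test.txt")
with open(path, "a+", encoding="utf-8") as f:
    f.write(title)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as f:
    saved = f.read()
```

If this matches the actual cause, adding encoding="utf-8" to the open() call in the original script should be enough. The explicit f.close() is also unnecessary, since the with block closes the file on exit.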
