
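Web scraping: scraping a novel (all-you-can-eat prison food), to be continued

One small trick a day for ending up in jail

Detailed steps for scraping a novel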

import requests
from lxml import etree

# Link to scrape: the book ID plus the catalog anchor
a = "12739599504227601#Catalog"
url = "https://www.readnovel.com/book/" + a

# Request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36"
}

# Fetch the catalog page; .decode() turns the response bytes into text (UTF-8 by default)
paqv = requests.get(url, headers=headers).content.decode()
sdq1 = etree.HTML(paqv)
# Collect the link of every chapter listed in the catalog
hrefs = sdq1.xpath('//div[@class = "volume"]/ul/li/a/@href')
for href in hrefs:
    # The hrefs are expected to be protocol-relative (//...), so prepend the scheme
    it2 = 'https:' + href
    # Send the same headers as for the catalog request
    resp = requests.get(it2, headers=headers).content.decode()
    sdq2 = etree.HTML(resp)
    bcsj = sdq2.xpath('//div[@class = "ywskythunderfont"]/p/text()')
    # Save the scraped text, one paragraph per line
    with open('1.txt', 'a', encoding='UTF-8') as f:
        for neirong in bcsj:
            # Remove all whitespace inside the paragraph
            s = ''.join(neirong.split())
            f.write(s + '\n')
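
The loop above builds each chapter URL with 'https:' + href and strips whitespace with ''.join(neirong.split()). Both steps can be tried without touching the network. Below is a minimal sketch of them as two small helpers. The names absolute_url and clean_paragraph and the example chapter paths are made up for illustration and do not come from the site. urljoin is swapped in for the string concatenation as one possible alternative, on the assumption that the catalog might also contain relative or already absolute links.

```python
from urllib.parse import urljoin

# Page the hrefs were scraped from (the book URL used in the script).
BASE_URL = "https://www.readnovel.com/book/12739599504227601"


def absolute_url(href, base=BASE_URL):
    """Resolve a scraped href against the page it came from.

    Handles protocol-relative (//host/path), root-relative (/path)
    and already absolute links.
    """
    return urljoin(base, href)


def clean_paragraph(text):
    """Remove every whitespace character, as ''.join(text.split()) does."""
    return "".join(text.split())


if __name__ == "__main__":
    # Hypothetical chapter paths, for illustration only.
    print(absolute_url("//www.readnovel.com/chapter/1/2"))
    print(absolute_url("/chapter/1/2"))
    print(clean_paragraph("  第一章\u3000 开始 \n"))
```

Note that clean_paragraph also deletes the spaces between words, which is fine for Chinese text but would glue English words together.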

Author: 我叫史迪奇
Source: https://sdq3.link/reptile-novel.html
Blog content is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.