Scraping a Novel with a Web Crawler (All-You-Can-Eat Prison Food), To Be Continued

Detailed steps for scraping a novel

A daily tip for landing yourself in jail

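import requests
from lxml import etree

# Link to scrape
a = "12739599504227601#Catalog"
url = "https://www.readnovel.com/book/" + a

# Request headers: send a browser-like User-Agent
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36"
}
# .decode() turns the response bytes into text (UTF-8 by default)
paqv = requests.get(url, headers=headers).content.decode()
sdq1 = etree.HTML(paqv)
# Collect the chapter links from the table of contents
hrefs = sdq1.xpath('//div[@class = "volume"]/ul/li/a/@href')
# Open the output file once instead of reopening it for every paragraph
with open('1.txt', 'a', encoding='UTF-8') as f:
    for href in hrefs:
        # Prepend the scheme to get a full chapter URL
        it2 = 'https:' + href
        # Send the same headers for the chapter pages as for the catalog page
        resp = requests.get(it2, headers=headers).content.decode()
        sdq2 = etree.HTML(resp)
        bcsj = sdq2.xpath('//div[@class = "ywskythunderfont"]/p/text()')
        # Save the scraped text, one paragraph per line
        for neirong in bcsj:
            # Strip all whitespace from the paragraph
            s = ''.join(neirong.split())
            f.write(s + '\n')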
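
The script above does three small string-handling jobs along the way: it builds a URL that carries a "#Catalog" fragment, it turns each chapter href into a full URL by prepending 'https:', and it strips whitespace from every paragraph. The sketch below isolates those three jobs using only the standard library, so they can be tried without touching the network. The helper names (strip_fragment, absolute_url, clean_paragraph) and the sample chapter href are hypothetical, made up for illustration; urljoin is swapped in for the manual 'https:' + href concatenation, on the assumption that the site returns hrefs starting with "//".

```python
from urllib.parse import urldefrag, urljoin


def strip_fragment(url: str) -> str:
    """Drop the "#..." fragment; a fragment is not sent in the HTTP request."""
    return urldefrag(url).url


def absolute_url(page_url: str, href: str) -> str:
    """Resolve a relative, protocol-relative ("//host/path") or absolute href."""
    return urljoin(page_url, href)


def clean_paragraph(text: str) -> str:
    """Remove all whitespace, same as ''.join(text.split()) in the script."""
    return "".join(text.split())


if __name__ == "__main__":
    book_url = "https://www.readnovel.com/book/12739599504227601#Catalog"
    # Hypothetical href, assumed to look like what the catalog page returns
    sample_href = "//www.readnovel.com/chapter/1/2"
    print(strip_fragment(book_url))
    print(absolute_url(book_url, sample_href))
    print(clean_paragraph("  第一章\u3000 开始\n"))
```

Compared with plain concatenation, urljoin also copes with hrefs that are already absolute or that are site-relative, which may matter if the page layout changes.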

Author: 我叫史迪奇
Source: https://sdq3.link/reptile-novel.html. Blog content is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
