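Web scraping: scraping a novel (all-you-can-eat prison food), to be continued
One small tip a day for landing yourself in jail
Detailed steps for scraping a novel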
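import requests
from lxml import etree

# Book ID with the "#Catalog" fragment, as it appears in the catalog page URL
a = "12739599504227601#Catalog"
url = "https://www.readnovel.com/book/" + a

# Browser-like User-Agent header, sent with every request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36"
}

# Download the catalog page and collect the chapter links
paqv = requests.get(url, headers=headers).content.decode()
sdq1 = etree.HTML(paqv)
hrefs = sdq1.xpath('//div[@class = "volume"]/ul/li/a/@href')

# Open the output file once and append every chapter's text to it
with open('1.txt', 'a', encoding='UTF-8') as f:
    for href in hrefs:
        # The hrefs are protocol-relative (//...), so prepend the scheme
        it2 = 'https:' + href
        resp = requests.get(it2, headers=headers).content.decode()
        sdq2 = etree.HTML(resp)
        bcsj = sdq2.xpath('//div[@class = "ywskythunderfont"]/p/text()')
        for neirong in bcsj:
            # Drop all whitespace inside the paragraph, then write one line
            s = ''.join(neirong.split())
            f.write(s + '\n')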
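Two small steps in the script above are easy to get wrong: turning each chapter href into a full URL, and cleaning the paragraph text. Here is a minimal sketch of both as standalone helpers, using only the standard library. The function names (`absolute_url`, `clean_paragraph`, `clean_chapter`) and the sample chapter path are made up for illustration and are not part of the original script.

```python
from urllib.parse import urljoin

# Base URL taken from the script above
BASE_URL = "https://www.readnovel.com/book/"


def absolute_url(href, base=BASE_URL):
    """Resolve a protocol-relative (//host/path) or relative href against base."""
    return urljoin(base, href)


def clean_paragraph(text):
    """Remove every whitespace character, including full-width indent spaces.

    This mirrors ''.join(text.split()) from the script. It suits Chinese
    text, but it would also glue English words together.
    """
    return "".join(text.split())


def clean_chapter(paragraphs):
    """Clean each paragraph and join the non-empty ones, one per line."""
    cleaned = (clean_paragraph(p) for p in paragraphs)
    return "\n".join(line for line in cleaned if line)


if __name__ == "__main__":
    # Hypothetical chapter path, for illustration only
    print(absolute_url("//www.readnovel.com/chapter/1/2"))
    print(clean_chapter(["\u3000\u3000第一章 开始", "   ", "正文 内容"]))
```

Compared with concatenating `'https:' + href`, `urljoin` also copes with hrefs that are already absolute or that are relative paths, which may matter if the site changes its link format.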
Author: 我叫史迪奇
This article is from:
https://sdq3.link/reptile-novel.html
Blog content is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.