Parsing Zhihu article titles and URLs with feapder

行云流水

2022-07-19

Preface

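I wanted to get the list of articles published by a particular Zhihu author. To do that, I viewed the page source in the browser and saved it to a file, then used feapder's Selector module to parse out each article's title and URL.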
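To make the assumed page structure concrete, here is a minimal sketch that runs the same kind of path lookup on a tiny hand-written snippet. It uses the standard library's xml.etree.ElementTree rather than feapder so it runs without extra dependencies. The SAMPLE markup, the article paths /p/1 and /p/2, and the extract_links helper are all hypothetical; the nesting is only inferred from the XPath expressions used in this post and is not copied from a real Zhihu page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet that mirrors the nesting the
# XPath expressions in this post expect. Not real Zhihu markup.
SAMPLE = """
<html><body>
<div class="List-item"><div><div><h2><span>
<a href="//zhuanlan.zhihu.com/p/1">First post</a>
</span></h2></div></div></div>
<div class="List-item"><div><div><h2><span>
<a href="//zhuanlan.zhihu.com/p/2">Second post</a>
</span></h2></div></div></div>
</body></html>
"""


def extract_links(markup):
    """Return (title, href) pairs for every List-item block."""
    root = ET.fromstring(markup.strip())
    results = []
    # Match divs whose class attribute is exactly "List-item".
    for item in root.findall(".//div[@class='List-item']"):
        link = item.find("./div/div/h2/span/a")
        if link is not None:
            results.append((link.text, link.get("href")))
    return results


for title, href in extract_links(SAMPLE):
    print(title, href)
```

ElementTree only accepts well-formed XML, so this is an illustration of the lookup, not a replacement for an HTML-tolerant parser such as the Selector used in the script.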

Python script

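from feapder.network.selector import Selector

# Load the page source that was saved from the browser.
with open('a.html', 'r', encoding='utf-8') as f:
    text = f.read()

selector = Selector(text)
# Each article sits in a div whose class attribute is exactly "List-item".
r_list = selector.xpath('//div[@class="List-item"]')
for r in r_list:
    title = r.xpath('./div/div/h2/span/a/text()').extract_first()
    url = r.xpath('./div/div/h2/span/a/@href').extract_first()
    if url is None:
        continue  # skip items that have no article link
    # The script assumes the href is protocol-relative ("//..."), so it prepends the scheme.
    aurl = "https:{}".format(url)
    print("{},{}".format(title, aurl))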
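The script builds the final URL by prefixing "https:" to the href, which only works when the href is protocol-relative (starts with "//"). If the saved page mixes link styles, a more tolerant option is urllib.parse.urljoin from the standard library. The sketch below is an alternative, not what the script above does; BASE_URL and the to_absolute helper are assumptions made for illustration, and the /p/1 paths are placeholders.

```python
from urllib.parse import urljoin

# Assumed address of the page the HTML was saved from (hypothetical).
BASE_URL = "https://www.zhihu.com/"


def to_absolute(href, base=BASE_URL):
    """Resolve an href against the base page URL."""
    return urljoin(base, href)


# Protocol-relative link: the scheme is taken from the base URL.
print(to_absolute("//zhuanlan.zhihu.com/p/1"))       # https://zhuanlan.zhihu.com/p/1
# Already absolute link: returned unchanged.
print(to_absolute("https://zhuanlan.zhihu.com/p/1"))  # https://zhuanlan.zhihu.com/p/1
# Site-relative link: host is taken from the base URL.
print(to_absolute("/p/1"))                            # https://www.zhihu.com/p/1
```

With this helper, the string formatting step could be swapped for a call to to_absolute(url), at the cost of having to know the page's original address.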
