Preface
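I wanted to get the list of articles published by a particular blogger. I opened the page source in the browser and saved it to a file, then used the Selector module from feapder to parse out each article's title and URL.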
Python script
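from feapder.network.selector import Selector

# Read the page source saved from the browser (assumed to be UTF-8)
with open('a.html', 'r', encoding='utf-8') as f:
    text = f.read()

selector = Selector(text)
# Each article entry sits in a div with class "List-item"
r_list = selector.xpath('//div[@class="List-item"]')
for r in r_list:
    title = r.xpath('./div/div/h2/span/a/text()').extract_first()
    url = r.xpath('./div/div/h2/span/a/@href').extract_first()
    # The href is expected to be protocol-relative (//host/path), so prepend the scheme
    aurl = "https:{}".format(url)
    print("{},{}".format(title, aurl))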
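Prepending "https:" only works when every href is protocol-relative (starts with //). If the page mixes link styles, or an entry has no link at all, the result would be malformed. A minimal sketch of a more tolerant approach, using only the standard library's urllib.parse.urljoin; the helper name normalize_url and the base URL are hypothetical placeholders, not part of the original script:

```python
from urllib.parse import urljoin

# Hypothetical base URL of the saved page; replace with the real page address.
BASE_URL = "https://example.com/people/someone/posts"


def normalize_url(href, base=BASE_URL):
    """Turn an href taken from the page into an absolute URL.

    Handles protocol-relative (//host/path), root-relative (/path)
    and already-absolute links. Returns None when href is missing.
    """
    if not href:
        return None
    return urljoin(base, href.strip())


if __name__ == "__main__":
    for href in ["//zhuanlan.example.com/p/1", "/p/2", "https://other.example.com/p/3", None]:
        print(normalize_url(href))
```

In the loop above, `aurl = normalize_url(url)` could stand in for the `"https:{}".format(url)` line, with entries whose result is None skipped before printing.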