Fetching Reddit content with PRAW
- Author: Shelton Ma
Reddit provides an official API (see the Reddit Data API Wiki), and since July 2023 it has enforced limits on requests.
This post uses PRAW to make the requests.
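Reddit app
Registration: https://www.reddit.com/prefs/apps
Rate limit: 600 requests per 10 minutes. Reddit reports usage through the response headers x-ratelimit-used, x-ratelimit-reset, and x-ratelimit-remaining
and throttles access accordingly. The limit applies per app, so registering multiple apps raises total throughput.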
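To make the header-based limit concrete, here is a minimal sketch of how a client could turn those three headers into a wait time. The function name `seconds_to_wait` and the `min_remaining` margin are hypothetical, and the sketch assumes x-ratelimit-reset holds the number of seconds until the window resets.

```python
def seconds_to_wait(headers, min_remaining=3.0):
    """Return how many seconds to sleep before the next request.

    headers: mapping of response header names to string values.
    min_remaining: hypothetical safety margin, not a value defined by Reddit.
    """
    # Header names are case-insensitive, so normalize them first.
    lowered = {key.lower(): value for key, value in headers.items()}
    if "x-ratelimit-remaining" not in lowered or "x-ratelimit-reset" not in lowered:
        # No rate-limit information in this response: do not wait.
        return 0.0

    remaining = float(lowered["x-ratelimit-remaining"])
    reset = float(lowered["x-ratelimit-reset"])  # assumed: seconds until reset

    if remaining <= min_remaining:
        # Budget is nearly exhausted: wait for the whole window to reset.
        return reset
    # Otherwise spread the remaining requests evenly over the window.
    return reset / remaining
```

A caller would pass the headers of the latest response and call `time.sleep` on the result before sending the next request.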
PRAW
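About 429 errors
At the time of writing, PRAW's handling of the boundary case was not strict enough and a 429 could still be triggered, so the source needs a small patch.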
# /lib/python3.9/site-packages/prawcore/rate_limit.py
# When 3 or fewer requests remain, wait for the window to reset to avoid a 429
def update(self, response_headers):
    ...
    self.reset_timestamp = now + seconds_to_reset
    # Add the following lines
    if self.remaining <= 3:
        self.next_request_timestamp = self.reset_timestamp
        return
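Editing files under site-packages is lost on every reinstall, so the same idea can also be applied at runtime. The sketch below is an assumption-laden alternative, not the author's method: `add_safety_margin` is a hypothetical helper that wraps any limiter class whose `update` method sets `remaining`, `reset_timestamp`, and `next_request_timestamp` (the attribute names are taken from the snippet above and may differ between prawcore versions). Unlike the in-place patch, it lets the original `update` finish and then overrides the next request time.

```python
import functools


def add_safety_margin(limiter_cls, margin=3):
    """Wrap limiter_cls.update so the next request waits for the window
    reset once `margin` or fewer requests remain."""
    original_update = limiter_cls.update

    @functools.wraps(original_update)
    def update(self, response_headers):
        # Let the original logic parse the headers first.
        original_update(self, response_headers)
        remaining = getattr(self, "remaining", None)
        reset_timestamp = getattr(self, "reset_timestamp", None)
        if remaining is None or reset_timestamp is None:
            return  # nothing to go on, leave the original decision alone
        if remaining <= margin:
            # Too close to the limit: postpone until the window resets.
            self.next_request_timestamp = reset_timestamp

    limiter_cls.update = update
    return limiter_cls
```

Assuming those attribute names match the installed version, it could be applied once with `add_safety_margin(prawcore.rate_limit.RateLimiter)` before creating the `praw.Reddit` instance.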
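About concurrency
Reddit applies its limits per app, so to run requests in parallel you can create multiple apps and run one process per app. Use processes rather than threads, because PRAW is not thread-safe.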
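A minimal sketch of the one-process-per-app layout, assuming each entry in `apps` is a dict of keyword arguments for `praw.Reddit` belonging to a different registered app. The helper names (`split_round_robin`, `fetch_titles`, `run`) and the choice of fetching the 10 newest titles per subreddit are illustrative only.

```python
from multiprocessing import Pool


def split_round_robin(items, n):
    """Distribute items over n buckets, one bucket per app."""
    buckets = [[] for _ in range(n)]
    for index, item in enumerate(items):
        buckets[index % n].append(item)
    return buckets


def fetch_titles(job):
    """Runs in a child process with its own client and its own app credentials."""
    credentials, subreddit_names = job
    import praw  # imported here so each process builds a separate instance

    reddit = praw.Reddit(**credentials)
    titles = []
    for name in subreddit_names:
        for submission in reddit.subreddit(name).new(limit=10):
            titles.append(submission.title)
    return titles


def run(apps, subreddit_names):
    """Start one process per app and give each its share of the subreddits."""
    jobs = list(zip(apps, split_round_robin(subreddit_names, len(apps))))
    with Pool(processes=len(apps)) as pool:
        return pool.map(fetch_titles, jobs)
```

Call `run(...)` from under an `if __name__ == "__main__":` guard, since `multiprocessing` re-imports the main module on platforms that spawn child processes.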
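About comment counts
Reddit hides or collapses some comments, so the number of comments actually retrieved may be lower than the count shown on the post.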
Demo
# Create the client instance
import praw

reddit = praw.Reddit(
    client_id="xxx",
    client_secret="xxx",
    password="",
    user_agent="xxx (by u/USERNAME)",
    username="",
)
# Fetch the 100 newest posts in popular (hot/top work the same way)
for submission in reddit.subreddit("popular").new(limit=100):
    print(submission.title)

# Stream new submissions; the first batch returns up to 100 existing items
for submission in reddit.subreddit("popular").stream.submissions():
    print(submission)

# Keep fetching new comments as they arrive
# https://praw.readthedocs.io/en/stable/code_overview/other/subredditstream.html#praw.models.reddit.subreddit.SubredditStream
for comment in reddit.subreddit("popular").stream.comments():
    print(comment)
# Fetch all comments of one submission
submission = reddit.submission("15ztk07")
# submission.comments contains MoreComments placeholders; replace_more swaps them for the actual comments
submission.comments.replace_more(limit=None)
comments = submission.comments.list()
# Check the comment count; the number retrieved may be lower than the displayed count
print(submission.num_comments == len(comments))
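Since the two numbers often differ, it can be more useful to report the gap than to compare for equality. The helper below is a small sketch under that assumption; `comment_coverage` is a hypothetical name, and it only relies on the `comments.replace_more`, `comments.list`, and `num_comments` members used in the demo above.

```python
def comment_coverage(submission):
    """Expand the comment tree and report how many comments were retrieved
    compared with the count Reddit displays."""
    # Replace every MoreComments placeholder with the comments it stands for.
    submission.comments.replace_more(limit=None)
    fetched = len(submission.comments.list())
    reported = submission.num_comments
    return {
        "reported": reported,
        "fetched": fetched,
        # Hidden or collapsed comments show up as a positive difference.
        "missing": max(reported - fetched, 0),
    }
```

With the demo's client this would be called as `comment_coverage(reddit.submission("15ztk07"))`.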