Using Python & Reddit API to Generate Content Ideas at Scale

Written By Ed Roberts

A so-called SEO expert. 

It always amazes me how underutilised Reddit is by marketers and SEOs when it comes to keyword research and content planning. 

Quite often subreddits can provide an untapped goldmine of ideas for a niche. However, manually checking subreddits for popular posts can be time-consuming and repetitive. So let’s automate it.

In this guide, I’m going to show you a quick and simple way of using Python and the Reddit API to generate a list of top-performing posts from any subreddit or group of subreddits. 

This can be incredibly useful for rapidly generating content ideas without the need to trawl Reddit manually. Even better, it’s completely free!

The code I’m going to walk you through is very simple, but don’t worry if you’re still put off: I’ve included a copy of the Google Colab file (just make a copy), so all you need to do is input your Reddit API info.

Step 1: Set Up Your Reddit API

For this script to work, the first thing we’re going to need to do is get ourselves set up with the Reddit API. If you haven’t already got a Reddit account, you’ll need to go and get yourself signed up.

Now you’ll want to head over to this link: https://www.reddit.com/prefs/apps

[Screenshot: the ‘create app’ button, located at the bottom left of the page underneath the list of options]

Here you’ll want to scroll to the bottom and click ‘create app’.

[Screenshot: the form for creating an application on Reddit; each field is described below]

First, let’s name our application. This can be anything, but I have simply opted for testscript.

Then you’ll want to select ‘script’ from the list of options. 

Give a brief description of the purpose of the script. In this case, the Python script is designed to quickly pull the top posts from groups of subreddits into a CSV.

You can leave the about field blank and simply populate the redirect field with http://www.example.com/unused/redirect/uri

Then just hit ‘create app’. You’ll want to make a note of your client ID, which is located in the top left under your app name, and your client secret, which should be clearly labelled. We’ll be using these later. Now that we’ve registered for the API, we’re ready to move on to the next step.

Step 2: Install and Import Python Modules

Now it’s time to get our script set up in Google Colab. To do this we’ll be installing the Asynchronous Python Reddit API Wrapper (or Async PRAW for short).

!pip install asyncpraw

Next, we’ll want to import Async PRAW and pandas, which we’ll be using to format the data we need and export it as a CSV.

import asyncpraw
import pandas as pd

And that’s it in terms of modules; we don’t really need anything else, as Async PRAW will be doing most of the work.

Step 3: Create a read-only Reddit instance

In order to use the Reddit API, we’re going to need to create a Reddit instance. To do this, we need to pass in three pieces of information: our client ID, our client secret and a user agent.

The client ID and client secret can be found by clicking ‘edit’ on your app in Reddit.

[Screenshot: the location of the client ID and client secret on the app page; both values are hidden in the image]

The ID is located just underneath where it says ‘personal use script’, and the secret will be, funnily enough, next to where it says ‘secret’. Both will be long strings of letters, numbers and special characters.

For the user agent, you’ll want to add something descriptive using the recommended format, which is –

<platform>:<app ID>:<version string> (by u/<Reddit username>)

So this could be something as simple as python:testscript:v1 (by u/username)

Once you have these three things it’s time to add the instance.

reddit = asyncpraw.Reddit(
    client_id="Paste your client ID from Reddit here",
    client_secret="Paste your client secret from Reddit here",
    user_agent="add a user agent - for example the name of your app followed by your reddit username",
)
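
If you want to quickly sanity-check the instance before moving on, Async PRAW exposes a read_only flag you can print. With just these three credentials the instance should be in read-only mode, which is all we need for pulling posts – this line is an optional check, not part of the final script.

print(reddit.read_only)  # should print True for a read-only instance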

Step 4: Select what data we want to pull

First, we want to select the data we want to pull from Reddit. In this case, we want the title of each post, the URL to the post, the number of upvotes and the number of comments.

This info will give us an idea of which topics generated the most engagement and would make good potential pieces for our content plan.

We’re going to store this information in a dictionary so that we can easily transform it into a data frame later on.

top_posts = {'URL': [], 'Title': [], 'Upvotes': [], 'Comments': []}

Step 5: Choose the subreddits we want to pull data from

Next, we’re going to select the subreddit or group of subreddits we want to pull data from and the criteria for that data.

subreddit = await reddit.subreddit("add the subreddits here")

Here you will want to add the name of the subreddit you want to pull the data from. If you want to pull from multiple subreddits then just add a + between the names, like this –

subreddit = await reddit.subreddit("seo+techseo")

Then we need to decide what criteria we want to set for the data we’re pulling. Do we want the hottest posts, the top-performing ones or maybe the most controversial?

If you want the top posts then use this code –

async for submission in subreddit.top(time_filter="year", limit=25):

You can set the time filter to “all”, “day”, “hour”, “month”, “week” or “year”; by default, it is set to “all”. You can also set a limit on the number of results returned (it defaults to 100), or set it to None if you want everything the listing will return over that timeframe – Reddit caps listings at roughly 1,000 posts.
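
So, as a sketch, if you wanted every top post the API will return from across the subreddit’s whole history, it would look like this –

async for submission in subreddit.top(time_filter="all", limit=None):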

If you want the hottest posts, then use this instead –

async for submission in subreddit.hot(limit=25):

For the most controversial over a given time period use this –

async for submission in subreddit.controversial(time_filter="hour"):

Now we’ve set the criteria, we’re going to want to iterate through each post in the subreddit and store the URL, title, number of upvotes and number of comments in the dictionary we created earlier.

subreddit = await reddit.subreddit("seo+techseo")
async for submission in subreddit.top(time_filter="year", limit=25):
  top_posts["URL"].append(submission.url)
  top_posts["Title"].append(submission.title)
  top_posts["Upvotes"].append(submission.score)
  top_posts["Comments"].append(submission.num_comments)
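
One thing worth knowing before you run this: for link posts, submission.url points at whatever was submitted (an article, an image and so on), not at the Reddit thread itself. If you’d rather capture the discussion thread, an optional tweak is to build the URL from submission.permalink instead, which is a site-relative path –

async for submission in subreddit.top(time_filter="year", limit=25):
  # permalink is relative, e.g. "/r/SEO/comments/abc123/...", so prefix the domain
  top_posts["URL"].append(f"https://www.reddit.com{submission.permalink}")
  top_posts["Title"].append(submission.title)
  top_posts["Upvotes"].append(submission.score)
  top_posts["Comments"].append(submission.num_comments)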

Step 6: Export the data from Reddit as a CSV

Now we’ve stored the data we want it’s time to export it into a CSV so we can use it in our spreadsheet software of choice.

To do this, we will be using everyone’s favourite module Pandas to convert our dictionary into a data frame.

df = pd.DataFrame.from_dict(top_posts)

Now that the Reddit information is stored in a data frame, we can easily export it to a CSV using pandas’ to_csv function.

df.to_csv('top-posts.csv', index=False)

You can name the CSV file whatever you want. You can then download it from the files section in the top left of Google Colab if you’re using the script I linked.
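
As an optional extra, if you’d like the biggest threads at the top of the sheet, you can sort the data frame by upvotes before exporting – this just reuses the column names we set up in Step 4.

df = df.sort_values("Upvotes", ascending=False)  # biggest threads first
df.to_csv('top-posts.csv', index=False)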

Once you open it, it should look something like this –

[Screenshot: a spreadsheet of the top posts from r/SEO over the last year, with columns for URL, title, upvotes and comments]
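
One final aside: the top-level await in this walkthrough only works because Colab runs an event loop for you. If you’d rather run everything as a standalone .py file outside Colab, here’s a minimal sketch of the whole script wrapped in an async function – swap in your own credentials and user agent.

import asyncio

import asyncpraw
import pandas as pd

async def main():
    # Read-only instance, exactly as in Step 3 – replace the placeholders
    reddit = asyncpraw.Reddit(
        client_id="your client ID",
        client_secret="your client secret",
        user_agent="python:testscript:v1 (by u/username)",
    )

    top_posts = {'URL': [], 'Title': [], 'Upvotes': [], 'Comments': []}

    subreddit = await reddit.subreddit("seo+techseo")
    async for submission in subreddit.top(time_filter="year", limit=25):
        top_posts["URL"].append(submission.url)
        top_posts["Title"].append(submission.title)
        top_posts["Upvotes"].append(submission.score)
        top_posts["Comments"].append(submission.num_comments)

    # Close the underlying HTTP session to avoid 'unclosed session' warnings
    await reddit.close()

    pd.DataFrame.from_dict(top_posts).to_csv('top-posts.csv', index=False)

asyncio.run(main())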

Now you can use this to quickly generate content ideas for your particular niche. Simple, right? If you’re looking for more free resources for content creation, check out my guide on using the YouTube insights tool for keyword research. If you want to dig more into using Python to automate SEO tasks, then take a look at my guide to automating alt text with Python or my tutorial on extracting URLs from sitemaps.