Reddit Scraper with Bittensor POC

This article explores how Bittensor, a blockchain platform that incentivizes creation of digital goods, can be used in conjunction with data scraping services. The presented Proof of Concept (POC) demonstrates how Bittensor subnets, which are self-contained economic systems, can be utilized to reward the creation of valuable data within a Reddit scraping service. While the POC only retrieves basic post metadata and comments, it paves the way for future integration of Bittensor with various services.

Daniel Rodriguez
March 20, 2024

Overview

In this blog post, we introduce a Proof of Concept (POC) that combines the functionality of Reddit scraping with Bittensor, a blockchain network designed to incentivize the creation of digital commodities. This project aims to showcase the integration of Bittensor's self-contained incentive mechanisms, known as subnets, with a Reddit scraping service.

The primary goal of this project is to demonstrate how Bittensor subnets can be utilized to incentivize the creation of value within the context of a Reddit scraping service. The functionality of this POC is limited to searching and retrieving metadata from subreddit posts and some top-level comments for each post.

What is Bittensor?

In a nutshell, Bittensor is a blockchain platform that hosts multiple self-contained incentive mechanisms known as subnets. These subnets act as arenas where subnet miners generate value and subnet validators establish consensus. This collaboration determines the fair distribution of TAO tokens, incentivizing the creation of digital commodities like intelligence or data within each subnet.

Each subnet comprises subnet miners and subnet validators, interacting with each other through a specific protocol that forms part of the incentive mechanism. The Bittensor API facilitates this interaction between subnet miners, subnet validators, and Bittensor's on-chain consensus engine called Yuma Consensus. The Yuma Consensus is specifically designed to foster agreement among subnet validators and subnet miners regarding value creation and its corresponding worth within the ecosystem.
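To make the reward flow concrete, here is a toy sketch of how validator scores might be turned into token shares. It uses a plain stake-weighted average, which is a deliberate simplification and not the actual Yuma Consensus algorithm. The function name, the data layout and the numbers are all assumptions made for illustration.

```python
def distribute_emission(stakes, weights, emission):
    """Split `emission` among miners by stake-weighted validator scores.

    stakes:  {validator: stake}
    weights: {validator: {miner: score}}
    """
    total_stake = sum(stakes.values())
    combined = {}
    for validator, scores in weights.items():
        # A validator's influence is proportional to its share of the stake.
        share = stakes[validator] / total_stake
        for miner, score in scores.items():
            combined[miner] = combined.get(miner, 0.0) + share * score

    total_score = sum(combined.values())
    if total_score == 0:
        return {miner: 0.0 for miner in combined}
    # Normalize so the payouts add up to the full emission.
    return {miner: emission * score / total_score for miner, score in combined.items()}


# Hypothetical example: two validators that disagree about two miners.
stakes = {"validator_a": 3.0, "validator_b": 1.0}
weights = {
    "validator_a": {"miner_1": 1.0, "miner_2": 0.0},
    "validator_b": {"miner_1": 0.0, "miner_2": 1.0},
}
rewards = distribute_emission(stakes, weights, emission=1.0)
print(rewards)
```

In this example the validator with more stake has more say, so miner_1 ends up with the larger share.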

Prerequisites

  1. Python >=3.10: A lower Python version may work, but it will require some modifications, since we use match/case statements, which were introduced in 3.10.
  2. Linux environment: At the time of writing, the prerequisite instructions for running Bittensor are only available for Linux. It may work in other environments, but this is not guaranteed. If you don’t have a Linux installation, you may want to use a VM; WSL2 on Windows is also a great alternative.
  3. Bittensor: We need to install the bittensor package for Python. For this POC I’m using version 6.8.2.
  4. Install Substrate dependencies: Begin by installing the required dependencies for running a Substrate node:
sudo apt update

sudo apt install --assume-yes make build-essential git clang curl libssl-dev llvm libudev-dev protobuf-compiler
  5. Install Rust and Cargo: Rust is the programming language used in Substrate development, and Cargo is Rust's package manager:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
  6. Clone the Subtensor repository: Fetch the subtensor codebase to your local machine; we will use it to run our local test environment. (This step only needs to be done once. Afterwards, running the last command is enough to start Subtensor.)
git clone https://github.com/opentensor/subtensor.git
./subtensor/scripts/init.sh
cd subtensor
cargo build --release --features pow-faucet
BUILD_BINARY=0 ./scripts/localnet.sh
  7. Reddit’s API credentials: We will also need an API client ID and a secret. We can get these by creating an app at https://reddit.com/prefs/apps/ after registering on Reddit. I suggest moving them to an .env file and using a virtual environment tool such as pipenv to load them automatically. Otherwise, you can load them manually with these commands:
export CLIENT_ID=<your_client_id>
export CLIENT_SECRET=<your_client_secret>
  8. Set up wallets and faucet tokens: Finally, we need to set up a wallet and obtain faucet tokens. Luckily, Bittensor offers a guide on how to do this, which also covers registering our subnet in a local environment so we can test it.
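Since the scraper depends on the CLIENT_ID and CLIENT_SECRET environment variables from the Reddit step, it can help to fail early when they are missing. The helper below is a minimal sketch of such a check; the function name load_reddit_credentials is hypothetical and not part of the POC.

```python
import os


def load_reddit_credentials(env=os.environ):
    """Return (client_id, client_secret), or raise if either one is missing."""
    missing = [name for name in ("CLIENT_ID", "CLIENT_SECRET") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["CLIENT_ID"], env["CLIENT_SECRET"]


# Example with an explicit mapping instead of the real environment.
print(load_reddit_credentials({"CLIENT_ID": "abc", "CLIENT_SECRET": "xyz"}))
```

Calling it with no arguments reads the real environment, so a missing variable surfaces as a clear error before any request to Reddit is made.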

Implementation

Now we get into the juicy part. For this POC we are going to use the official subnet template. Among many other things, this template contains three very important files that we can modify to create our own subnet and incentive mechanism. These files are:

  • template/protocol.py: Contains the definition of the protocol used by subnet miners and subnet validators.
  • neurons/miner.py: Script that defines the subnet miner's behavior, i.e., how the subnet miner responds to requests from subnet validators.
  • neurons/validator.py: This script defines the subnet validator's behavior, i.e., how the subnet validator requests information from the subnet miners and determines the scores.

All the code for this POC will be available on GitHub.
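This post does not walk through neurons/validator.py, so as a rough idea of what "determines the scores" can mean, here is a hypothetical scoring rule: the fraction of requested posts that come back with every expected field. The name score_response and the REQUIRED_FIELDS tuple are assumptions for illustration; the fields mirror the ones produced by the submission_to_dict serializer used by the miner.

```python
# Fields every returned post is expected to contain (an assumption).
REQUIRED_FIELDS = ("author", "author_flair_text", "clicked")


def score_response(posts, limit):
    """Return a score in [0, 1] for a single miner response."""
    if not posts or limit <= 0:
        return 0.0
    # Count only well-formed posts, and never more than were requested.
    complete = sum(
        1
        for post in posts[:limit]
        if all(field in post for field in REQUIRED_FIELDS)
    )
    return complete / limit


good = {"author": "alice", "author_flair_text": None, "clicked": False}
bad = {"author": "bob"}
print(score_response([good, bad], limit=2))
```

A real validator would likely also verify that the posts exist on Reddit, which this sketch does not attempt.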

For our template/protocol.py we will create a simple class that represents our input parameters and, once the request is fulfilled, also contains the output. This object is called a Synapse:

from typing import List, Literal, Optional

import bittensor as bt


class RedditProtocol(bt.Synapse):
    # Required request input, filled by sending dendrite caller.
    subreddit: str
    # Optional request input, filled by sending dendrite caller.
    sort_by: Optional[Literal['hot', 'new', 'rising', 'random_rising']] = 'new'
    limit: Optional[int] = 10

    # Optional request output, filled by receiving axon.
    output: Optional[List[dict]] = None

For this example we will keep it simple. We take one required parameter, the subreddit we are going to scrape, and two optional parameters: the sort order and the number of posts we want back from the scrape.
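To try out the defaults and the allowed sort_by values without installing bittensor, the stand-in below mirrors the same fields with a standard-library dataclass. It is not a bt.Synapse: the class name RedditRequest is hypothetical, and the checks in __post_init__ are only an assumption about how bad input might be rejected.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_SORTS = ("hot", "new", "rising", "random_rising")


@dataclass
class RedditRequest:
    # Required input.
    subreddit: str
    # Optional inputs with the same defaults as the protocol class.
    sort_by: str = "new"
    limit: int = 10
    # Filled in by whoever answers the request.
    output: Optional[List[dict]] = None

    def __post_init__(self):
        if self.sort_by not in ALLOWED_SORTS:
            raise ValueError(f"sort_by must be one of {ALLOWED_SORTS}")
        if self.limit < 1:
            raise ValueError("limit must be a positive integer")


request = RedditRequest(subreddit="python")
print(request.sort_by, request.limit, request.output)
```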

In our neurons/miner.py we find the main logic for handling requests. In Bittensor, mining means producing the digital commodity that a subnet rewards (here, scraped Reddit data) and returning it to the subnet validators, who score it. This differs from traditional mining in Proof-of-Work (PoW) blockchains like Bitcoin, where miners solve complex mathematical puzzles to validate transactions and secure the network. For now, we will only focus on the forward method:

from reddit_data import process_reddit
/.../
class Miner(BaseMinerNeuron):
/.../
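    # Annotate the synapse so the axon knows which protocol class this handler accepts.
    async def forward(self, synapse: template.protocol.RedditProtocol) -> template.protocol.RedditProtocol: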
        posts = process_reddit(synapse.subreddit, synapse.sort_by, synapse.limit)
        synapse.output = posts
        return synapse

Here we take the synapse as an argument, read its parameters and pass them to a function that handles the scraping. The output is then stored on the synapse, which we return. The process_reddit function goes as follows:

import os
from datetime import datetime

import bittensor as bt
import praw
from praw.models import Comment
from prawcore.exceptions import NotFound
/.../
reddit_client = praw.Reddit(
    client_id=os.getenv('CLIENT_ID'),
    client_secret=os.getenv('CLIENT_SECRET'),
    user_agent="Mozilla/5.0 (Windows NT 6.2; rv:20.0) Gecko/20121202 Firefox/20.0",
)
/.../
def process_reddit(subreddit_name: str, sort_by: str = "new", limit: int = 10):
    start_time = datetime.now()
    # Default to an empty list so a failed lookup still returns a value.
    result = []

    try:
        subreddit = reddit_client.subreddit(subreddit_name)
        # Pick the listing method that matches the requested sort order.
        match sort_by:
            case 'hot':
                result = [submission_to_dict(submission) for submission in subreddit.hot(limit=limit)]
            case 'new':
                result = [submission_to_dict(submission) for submission in subreddit.new(limit=limit)]
            case 'rising':
                result = [submission_to_dict(submission) for submission in subreddit.rising(limit=limit)]
            case 'random_rising':
                result = [submission_to_dict(submission) for submission in subreddit.random_rising(limit=limit)]

        bt.logging.success(
            f'Process finished. Elapsed {(datetime.now() - start_time)}.'
        )
    # HTTP 404 errors are raised by prawcore, not by praw.exceptions.
    except NotFound:
        bt.logging.error(f"Subreddit not found: {subreddit_name}")

    return result

There is a lot to unpack here, so bear with me:

  • First we initialize our API wrapper, praw, with our Reddit credentials.
  • After that we declare a function that starts a timer, which we later use for logging purposes.
  • We then use a match/case statement to determine which listing method to use for scraping.
  • We store the result using a list comprehension.
  • Additionally, we use a function called submission_to_dict, which serializes each submission returned by the API into a Python dictionary:

def submission_to_dict(submission) -> dict:
    return {
        "author": submission.author.name if submission.author else "Anonymous",
        "author_flair_text": submission.author_flair_text,
        "clicked": submission.clicked,
    }
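The serializer above only covers post metadata. For the top-level comments mentioned in the overview, a helper along the following lines could be used. This is a sketch rather than the POC's actual code: the names comment_to_dict and top_level_comments, the chosen fields and the max_comments default are assumptions. It relies on PRAW's submission.comments forest and its replace_more method.

```python
def comment_to_dict(comment) -> dict:
    """Serialize one comment into a plain dictionary."""
    return {
        "author": comment.author.name if comment.author else "Anonymous",
        "body": comment.body,
        "score": comment.score,
    }


def top_level_comments(submission, max_comments: int = 5) -> list:
    """Return up to `max_comments` top-level comments of a submission."""
    # Drop the "load more comments" placeholders so only real comments remain.
    submission.comments.replace_more(limit=0)
    # Iterating the comment forest yields top-level comments only.
    return [comment_to_dict(c) for c in list(submission.comments)[:max_comments]]
```

The result could be stored under an extra key, for example "comments", in the dictionary returned by submission_to_dict.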

We are almost ready. All we need now is a client to start making requests to our subnet miners, so let’s create a template/client.py:

import bittensor as bt

from template.protocol import RedditProtocol


async def query_synapse(subreddit, category, limit, uid, wallet_name, hotkey, network, netuid):
    # Build the request that the miner will fill in.
    syn = RedditProtocol(
        subreddit=subreddit,
        sort_by=category,
        limit=limit,
    )

    # create a wallet instance with provided wallet name and hotkey
    wallet = bt.wallet(name=wallet_name, hotkey=hotkey)

    # instantiate the metagraph with provided network and netuid
    metagraph = bt.metagraph(
        netuid=netuid, network=network, sync=True, lite=False
    )

    # Grab the axon you're serving
    axon = metagraph.axons[uid]

    # Create a Dendrite instance to handle client-side communication.
    dendrite = bt.dendrite(wallet=wallet)

    # Send the synapse to the miner's axon and wait for the response.
    responses = await dendrite(
        [axon], syn, deserialize=False, streaming=False, timeout=30
    )

    print(responses)

This is less intimidating than it looks. We first instantiate our synapse with the necessary arguments so that the miner can later fulfill our request. The rest is boilerplate that creates the objects needed to send it: the wallet, the metagraph, the axon and, finally, the dendrite.
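Because query_synapse is a coroutine, something still has to collect its arguments and run it. One possible way is a small command-line wrapper like the sketch below. The flag names and their defaults (for example "default" for the wallet and "local" for the network) are assumptions, not part of the POC.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Describe the command-line options that map onto query_synapse."""
    parser = argparse.ArgumentParser(description="Query a Reddit-scraping miner.")
    parser.add_argument("--subreddit", required=True)
    parser.add_argument("--sort-by", default="new",
                        choices=["hot", "new", "rising", "random_rising"])
    parser.add_argument("--limit", type=int, default=10)
    parser.add_argument("--uid", type=int, required=True)
    parser.add_argument("--wallet-name", default="default")
    parser.add_argument("--hotkey", default="default")
    parser.add_argument("--network", default="local")
    parser.add_argument("--netuid", type=int, default=1)
    return parser


def main(argv=None) -> None:
    # Imported lazily: these need bittensor and the POC's template package.
    import asyncio
    from template.client import query_synapse

    args = build_parser().parse_args(argv)
    asyncio.run(
        query_synapse(
            args.subreddit, args.sort_by, args.limit, args.uid,
            args.wallet_name, args.hotkey, args.network, args.netuid,
        )
    )
```

Calling main() from the bottom of a script then allows an invocation such as passing --subreddit python --uid 0 on the command line.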

Before we continue, what are dendrites and axons? The Bittensor documentation explains: “Axon is a server instance. Hence a subnet validator will instantiate a dendrite client on itself to transmit information to axons that are on the subnet miners.” You may think of a dendrite as a distribution center that sends our synapse objects to axons; each miner instantiates its own axon in order to receive and process the synapse.

As for the metagraph, it is a very useful object that contains metadata about the subnet, such as registered miners, validators and axons. You can inspect a metagraph instance without participating in a subnet, which is useful for getting information about the mainnet or testnet.
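As a small illustration of inspecting a metagraph, the sketch below lists the endpoints of the axons registered on a subnet. The helper names are hypothetical, and the filter rests on the assumption that an axon which is not serving advertises the placeholder address 0.0.0.0. The metagraph call reuses the arguments from the client code above.

```python
def serving_endpoints(axons):
    """Return "ip:port" strings for axons that advertise a reachable address."""
    return [f"{axon.ip}:{axon.port}" for axon in axons if axon.ip != "0.0.0.0"]


def load_axons(netuid: int, network: str):
    """Fetch the axon list of a subnet; needs bittensor and network access."""
    # Imported lazily so serving_endpoints can be used without bittensor installed.
    import bittensor as bt

    metagraph = bt.metagraph(netuid=netuid, network=network, sync=True, lite=False)
    return metagraph.axons
```

For example, serving_endpoints(load_axons(netuid=1, network="local")) would list the miners and validators currently reachable on the local test subnet, assuming it was registered with netuid 1.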

Conclusion

This blog post introduced a Proof of Concept (POC) project that combines Reddit scraping functionality with Bittensor subnets. While this POC is limited in functionality, it serves as a foundation for future developments and expansions in integrating Bittensor with various other services and applications. 
