RFetcher - Reddit Data Fetcher 🚀

░█████████  ░██████████              ░██               ░██
░██     ░██ ░██                      ░██               ░██
░██     ░██ ░██         ░███████  ░████████  ░███████  ░████████   ░███████  ░██░████ 
░█████████  ░█████████ ░██    ░██    ░██    ░██    ░██ ░██    ░██ ░██    ░██ ░███     
░██   ░██   ░██        ░█████████    ░██    ░██        ░██    ░██ ░█████████ ░██      
░██    ░██  ░██        ░██           ░██    ░██    ░██ ░██    ░██ ░██        ░██      
░██     ░██ ░██         ░███████      ░████  ░███████  ░██    ░██  ░███████  ░██

RFetcher is a Python CLI tool that fetches Reddit posts and comments through the Reddit API and sorts them into keyword-based categories. Designed for researchers, data scientists, and content analysts, it provides structured access to Reddit discussions while filtering out Reddit-specific references and off-topic content.

Key Features ✨

- Smart Content Filtering 🧠: Automatically skip Reddit-specific references (subreddit links, meta-discussions)
- Custom Category Management 🗂️: Define keyword-based categories and filter content dynamically
- Multi-Mode Scraping ⚙️: Supports hot/new/top/rising posts with pagination
- Comment Processing 💬: Recursive comment scraping with nested replies
- Data Organization 📂: Automatic JSON output with timestamps to data/ folder
- API-Friendly 🤝: Built-in rate limiting and error handling
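
To illustrate the content-filtering feature, here is a minimal sketch of how Reddit-specific references could be detected. The pattern and the function names (`has_reddit_reference`, `drop_reddit_references`) are assumptions made for this example; the actual rules in fetcher.py may differ.

```python
import re

# Hypothetical pattern (not taken from fetcher.py): matches subreddit and
# user references such as "r/python", "/r/python", or "u/some_user".
# The lookbehind avoids false hits inside ordinary words like "our/thing".
REDDIT_REF = re.compile(r"(?<!\w)[ru]/[A-Za-z0-9_-]+")


def has_reddit_reference(text: str) -> bool:
    """Return True if the text mentions a subreddit or user by handle."""
    return REDDIT_REF.search(text) is not None


def drop_reddit_references(texts: list[str]) -> list[str]:
    """Keep only the texts that contain no Reddit-specific reference."""
    return [t for t in texts if not has_reddit_reference(t)]


if __name__ == "__main__":
    samples = ["Try asking in r/learnpython", "Rust 1.70 was released"]
    print(drop_reddit_references(samples))
```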

Installation & Setup 🛠️

Prerequisites

Python 3.9+
Reddit API credentials

Clone repository:

git clone https://github.com/NouroGhoul/rfetcher.git
cd rfetcher

Install dependencies:

pip install -r requirements.txt

Get Reddit API credentials:
1. Go to Reddit App Preferences (https://www.reddit.com/prefs/apps)
2. Click "Create App" (select "script" type)
3. Note these values:
  - Client ID (under app name)
  - Client Secret (next to "secret")
  - Your Reddit username
  - Your Reddit password

Create .env file:

REDDIT_CLIENT_ID=your_client_id_here
REDDIT_CLIENT_SECRET=your_client_secret_here
REDDIT_USERNAME=your_reddit_username
REDDIT_PASSWORD=your_reddit_password
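
Before the first run it can help to confirm that all four variables are present. The sketch below uses only the standard library and is a standalone check, not the way fetcher.py itself loads the file (that is not shown in this README). The helper names `parse_env` and `missing_credentials` are hypothetical.

```python
from pathlib import Path

# The four variable names listed in the .env example above.
REQUIRED = (
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USERNAME",
    "REDDIT_PASSWORD",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_credentials(values: dict[str, str]) -> list[str]:
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED if not values.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text() if env_file.exists() else ""
    missing = missing_credentials(parse_env(text))
    print("Missing variables:", ", ".join(missing) if missing else "none")
```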

Usage 🖥️

Start the application:

python fetcher.py

Typical Workflow:

Configure categories (optional):
- Define keyword groups for content filtering
- Example: Programming: python,java,rust

Run fetcher:

==================================================
Reddit Fetcher - Configuration
==================================================
Enter subreddit URL or name: programming

Select post type [1-4]: 1

Number of posts: 50

Fetch Mode [1-3]: 1

Select category:

Available Categories:
1. Programming
2. Technology
3. Web Development
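
The keyword-based categories from the workflow above can be pictured with a short sketch. It assumes the `Name: keyword,keyword` form shown in the example and whole-word, case-insensitive matching; both the matching rule and the function names are assumptions, not the confirmed behavior of fetcher.py.

```python
import re


def parse_category(line: str) -> tuple[str, list[str]]:
    """Split 'Programming: python,java,rust' into a name and keyword list."""
    name, _, keywords = line.partition(":")
    cleaned = [k.strip().lower() for k in keywords.split(",") if k.strip()]
    return name.strip(), cleaned


def matches_category(text: str, keywords: list[str]) -> bool:
    """Return True if any keyword appears in the text as a whole word."""
    lowered = text.lower()
    return any(re.search(rf"\b{re.escape(k)}\b", lowered) for k in keywords)


if __name__ == "__main__":
    name, keywords = parse_category("Programming: python,java,rust")
    for title in ("Why Rust is fast", "JavaScript tips"):
        print(name, title, matches_category(title, keywords))
```

Whole-word matching keeps "java" from matching "JavaScript". Keywords that end in punctuation (for example "c++") would need a different boundary rule.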

Output Files:

All data saved to data/ folder
Filename format: data/{subreddit}_{category}_{timestamp}.json
Example: data/programming_web_development_20230815_143022.json
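
As a sketch of the naming scheme, the function below rebuilds the example filename from its parts. The slug rule (lowercase, non-alphanumeric runs replaced by underscores) is inferred from the example above, and `output_path` is a hypothetical name.

```python
import re
from datetime import datetime


def output_path(subreddit: str, category: str, when: datetime) -> str:
    """Build data/{subreddit}_{category}_{timestamp}.json."""
    # Inferred from the example: "Web Development" -> "web_development".
    slug = re.sub(r"[^a-z0-9]+", "_", category.lower()).strip("_")
    # Timestamp layout inferred from the example: YYYYMMDD_HHMMSS.
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return f"data/{subreddit}_{slug}_{stamp}.json"


if __name__ == "__main__":
    print(output_path("programming", "Web Development", datetime(2023, 8, 15, 14, 30, 22)))
```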

Output Structure:

{
  "category": "Web Development",
  "posts": [
    {
      "id": "t3_abc123",
      "title": "React 18 performance improvements",
      "author": "js_dev",
      "selftext": "Discussion about new features...",
      "score": 142,
      "url": "https://reddit.com/...",
      "created_utc": 1689264000,
      "num_comments": 38,
      "comments": [
        {
          "id": "t1_def456",
          "author": "react_fan",
          "body": "This update is game-changing!",
          "score": 42,
          "created_utc": 1689264120,
          "replies": [...]
        }
      ]
    }
  ]
}
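
Because replies nest inside comments, reading the output usually calls for a recursive walk. The sketch below counts every comment in a post, including nested replies. It relies only on the field names shown above; the sample data is made up for the example.

```python
def count_comments(comments: list[dict]) -> int:
    """Count comments and all of their nested replies."""
    total = 0
    for comment in comments:
        # Each comment counts once, plus everything under "replies".
        total += 1 + count_comments(comment.get("replies", []))
    return total


def summarize(data: dict) -> list[tuple[str, int]]:
    """Return (title, total comment count) for every post in the file."""
    return [
        (post["title"], count_comments(post.get("comments", [])))
        for post in data.get("posts", [])
    ]


if __name__ == "__main__":
    # Made-up sample in the documented shape; real files live in data/.
    sample = {
        "category": "Web Development",
        "posts": [
            {
                "title": "React 18 performance improvements",
                "comments": [
                    {"id": "t1_a", "replies": [{"id": "t1_b", "replies": []}]},
                    {"id": "t1_c", "replies": []},
                ],
            }
        ],
    }
    print(summarize(sample))
```

To use it on a real file, load the JSON with `json.load` and pass the resulting dictionary to `summarize`.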

File Structure 📁

rfetcher/
├── data/           # Output directory (auto-created)
├── fetcher.py      # Main application
├── .gitignore      # Ignores sensitive data
├── requirements.txt
└── .env            # For API credentials (EXAMPLE - create your own)

Technologies Used 🧰

Core:
Data Handling:
Environment:

Contribution Guidelines 🤝

We welcome contributions! Please follow these steps:

Setup environment:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Development workflow:

git checkout -b feature/your-feature
# Make changes
git commit -m 'Add new feature'
git push origin feature/your-feature

Testing:

Place tests in tests/ directory
Maintain consistent coding style
Include docstrings for new functions
Test edge cases and error handling

License 📄

This project is licensed under the MIT License - see the LICENSE file for details.

Created by: https://github.com/NouroGhoul
For educational purposes only

Important Notes:

- Respect Reddit's API rules.
- Data is saved in the data/ folder. The folder is normally created automatically; if it is missing, create it before running.
- Never commit your .env file; it contains your credentials.
- The tool includes rate limiting to comply with API guidelines.
