RedditMiner

Scrape images, galleries, and top-level comments from any subreddit

Lightweight Python tool using your browser session cookies — no Reddit API credentials required. Works with public, private, and NSFW subreddits (subject to your session permissions).

Quick Start

  1. Install from PyPI
    pip install redditminer
  2. Export your Reddit cookies
    Save your Reddit browser cookies as cookies.txt in the working directory (see the Installation section below).
  3. Run example commands
    redditminer --subreddit funny --limit 200 --sort top

Features

  • Cookie Authentication: use your browser session cookies for seamless access.
  • Image & Gallery Support: extracts direct image links and all gallery images.
  • Deep Pagination: efficiently fetches large numbers of posts with Reddit pagination.
  • CLI: specify subreddit and options directly from the command line.
  • Rate-limit friendly: retries and backoff for 429 responses.
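
Reddit's listing pagination is cursor-based: each page of results includes the fullname of its last post, which is passed back as `after` on the next request. As a rough sketch of how such a listing URL is built (the function name `listing_url` and the public `.json` listing endpoint are illustrative assumptions, not necessarily RedditMiner's internals):

```python
def listing_url(subreddit, sort="new", limit=100, after=None):
    """Build a Reddit listing URL. `after` is the fullname cursor
    (e.g. "t3_abc123") taken from data["data"]["after"] in the
    previous page's JSON response; Reddit caps each page at 100."""
    url = f"https://www.reddit.com/r/{subreddit}/{sort}.json?limit={min(limit, 100)}"
    if after:
        url += f"&after={after}"
    return url
```

Fetching in a loop until `after` comes back as null walks the whole listing, which is how large `--limit` values are satisfied page by page.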

Installation

Clone the repository for development, or install from PyPI for regular use.

From PyPI

pip install redditminer

From source (development)

git clone https://github.com/MisbahKhan0009/RedditMiner.git
cd RedditMiner
pip install -e .

Export your Reddit cookies

Log into Reddit, use a browser extension (e.g., "Get cookies.txt") to export cookies for reddit.com, and save the file as cookies.txt in your working directory.
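
For reference, a cookies.txt exported in the standard Netscape format can be loaded with Python's standard library alone. The sketch below shows the general approach; the names are illustrative and RedditMiner's own loading code may differ:

```python
import http.cookiejar
import urllib.request

def opener_from_cookies(path="cookies.txt"):
    """Load a Netscape-format cookies.txt and return a urllib opener
    that attaches those cookies (plus a browser-like User-Agent,
    which Reddit expects) to every request."""
    jar = http.cookiejar.MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener, jar
```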

Usage

Run the scraper with your desired subreddit:

redditminer --subreddit EarthPorn

Optional output modes

  • post (default): Full post data (JSON).
  • image_url: Only image URLs (TXT file).
  • post_with_comments: Full post data with top-level comments (JSON).

Command-line options

  • --subreddit : Subreddit name to scrape (required)
  • --limit : Number of posts to scrape (default: 100)
  • --sort : Sort order (new, hot, top; default: new)
  • --output-mode : Output format (post, image_url, post_with_comments)
  • --with-comment : Include top-level comments (JSON outputs only)
  • --download-images : Download all found images
  • --output-dir : Directory to save images (default: images)
  • --max-workers : Number of parallel downloads (default: 8)

Rate Limiting

If Reddit returns 429 (Too Many Requests) responses, the scraper waits and retries (default backoff of roughly 60 seconds). For best results, avoid running many scrapers concurrently, and consider refreshing your cookies if rate limiting persists.
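
One common way to implement this is exponential backoff starting from the ~60 s default; the retry count and cap below are illustrative assumptions, not RedditMiner's exact values:

```python
def backoff_delays(retries=5, base=60.0, cap=300.0):
    """Yield wait times for successive 429 responses:
    base, 2*base, 4*base, ... capped at `cap` seconds.
    The caller sleeps for each delay before retrying the request."""
    for attempt in range(retries):
        yield min(base * 2 ** attempt, cap)
```

A fetch loop would iterate over these delays, sleeping and retrying after each 429 and giving up once the generator is exhausted.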

Examples

  • Scrape 200 top posts and save as JSON
    redditminer --subreddit funny --limit 200 --sort top
  • Scrape only image URLs (TXT)
    redditminer --subreddit funny --output-mode image_url
  • Scrape posts with top-level comments
    redditminer --subreddit funny --output-mode post --with-comment
  • Scrape and download images
    redditminer --subreddit funny --output-mode image_url --download-images --output-dir my_images --max-workers 16

Downloaded images are organized by subreddit (e.g., images/EarthPorn/ or <output-dir>/EarthPorn/).
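
The per-subreddit layout and `--max-workers` parallelism can be sketched with a thread pool (names like `target_path` and the `fetch` callback are hypothetical, shown only to illustrate the structure):

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

def target_path(url, subreddit, output_dir="images"):
    """Map an image URL to <output-dir>/<subreddit>/<basename>."""
    name = os.path.basename(urlparse(url).path)
    return os.path.join(output_dir, subreddit, name)

def download_all(urls, subreddit, fetch, output_dir="images", max_workers=8):
    """Download images in parallel. `fetch` is a callable taking
    (url, destination_path) that performs one download."""
    os.makedirs(os.path.join(output_dir, subreddit), exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for url in urls:
            pool.submit(fetch, url, target_path(url, subreddit, output_dir))
```

Bounding the pool at `--max-workers` keeps download concurrency predictable and avoids hammering Reddit's image hosts.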

Project Structure

RedditMiner/
│
├── redditminer/
│   ├── __init__.py
│   └── scraper.py         # Core scraping logic and RedditImageScraper class
│
├── main.py                # Command-line entry point
├── cookies.txt            # Your exported Reddit cookies (not checked in)
├── README.md
└── ...

Outputs

  • JSON: output/images_[subreddit]_[timestamp].json
  • TXT (image URLs): output/images_[subreddit]_[timestamp].txt
  • Downloaded images: images/<subreddit>/ (or <output-dir>)
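
The output filenames follow the `images_[subreddit]_[timestamp]` pattern above; a sketch of how such a path might be generated (the exact timestamp format is an assumption, not confirmed by the project):

```python
import time

def output_path(subreddit, ext="json"):
    """Build an output path like output/images_funny_20240101_120000.json.
    The timestamp format here is illustrative only."""
    ts = time.strftime("%Y%m%d_%H%M%S")
    return f"output/images_{subreddit}_{ts}.{ext}"
```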

Contributing

Contributions are welcome — open issues or submit pull requests for new features, bug fixes, or improvements.

License & Disclaimer

This project is licensed under the MIT License. See the LICENSE file for details.

This tool is intended for personal and educational use. Please respect Reddit's Terms of Service and do not use this tool for spamming or violating site rules.