RedditMiner

Scrape images, galleries, and top-level comments from any subreddit

Lightweight Python tool using your browser session cookies — no Reddit API credentials required. Works with public, private, and NSFW subreddits (subject to your session permissions).

Quick Start

  1. Install from PyPI
    pip install redditminer
  2. Export your Reddit cookies
    Save your Reddit browser cookies as cookies.txt in the working directory (see the Installation section below).
  3. Run example commands
    redditminer --subreddit funny --limit 200 --sort top

Features

  • Cookie Authentication: use your browser session cookies for seamless access.
  • Image & Gallery Support: extracts direct image links and all gallery images.
  • Deep Pagination: efficiently fetches large numbers of posts with Reddit pagination.
  • CLI: specify subreddit and options directly from the command line.
  • Rate-limit friendly: retries and backoff for 429 responses.
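
Reddit's listing pagination is cursor-based: each page of results includes the fullname of its last post, which is passed back as `after` on the next request. As a rough sketch of how such a listing URL is built (the function name `listing_url` and the public `.json` listing endpoint are illustrative assumptions, not necessarily RedditMiner's internals):

```python
def listing_url(subreddit, sort="new", limit=100, after=None):
    """Build a Reddit listing URL. `after` is the fullname cursor
    (e.g. "t3_abc123") taken from data["data"]["after"] in the
    previous page's JSON response; Reddit caps each page at 100."""
    url = f"https://www.reddit.com/r/{subreddit}/{sort}.json?limit={min(limit, 100)}"
    if after:
        url += f"&after={after}"
    return url
```

Fetching in a loop until `after` comes back as null walks the whole listing, which is how large `--limit` values are satisfied page by page.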

Installation

Clone the repository for development, or install from PyPI for regular use.

From PyPI

pip install redditminer

From source (development)

git clone https://github.com/MisbahKhan0009/RedditMiner.git
cd RedditMiner
pip install -e .

Export your Reddit cookies

Log into Reddit, use a browser extension (e.g., "Get cookies.txt") to export cookies for reddit.com, and save the file as cookies.txt in your working directory.
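
For reference, a cookies.txt exported in the standard Netscape format can be loaded with Python's standard library alone. The sketch below shows the general approach; the names are illustrative and RedditMiner's own loading code may differ:

```python
import http.cookiejar
import urllib.request

def opener_from_cookies(path="cookies.txt"):
    """Load a Netscape-format cookies.txt and return a urllib opener
    that attaches those cookies (plus a browser-like User-Agent,
    which Reddit expects) to every request."""
    jar = http.cookiejar.MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener, jar
```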

Usage

Run the scraper with your desired subreddit:

redditminer --subreddit EarthPorn

Optional output modes

  • post (default): Full post data (JSON).
  • image_url: Only image URLs (TXT file).
  • post_with_comments: Full post data with top-level comments (JSON).

Command-line options

  • --subreddit : Subreddit name to scrape (required)
  • --limit : Number of posts to scrape (default: 100)
  • --sort : Sort order (new, hot, top; default: new)
  • --output-mode : Output format (post, image_url, post_with_comments)
  • --with-comment : Include top-level comments (JSON outputs only)
  • --download-images : Download all found images
  • --output-dir : Directory to save images (default: images)
  • --max-workers : Number of parallel downloads (default: 8)

Rate Limiting

If Reddit returns 429 (Too Many Requests) responses, the scraper waits and retries (default backoff of roughly 60 seconds). For best results, avoid running many scrapers concurrently, and consider refreshing your cookies if rate limiting persists.
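
One common way to implement this is exponential backoff starting from the ~60 s default; the retry count and cap below are illustrative assumptions, not RedditMiner's exact values:

```python
def backoff_delays(retries=5, base=60.0, cap=300.0):
    """Yield wait times for successive 429 responses:
    base, 2*base, 4*base, ... capped at `cap` seconds.
    The caller sleeps for each delay before retrying the request."""
    for attempt in range(retries):
        yield min(base * 2 ** attempt, cap)
```

A fetch loop would iterate over these delays, sleeping and retrying after each 429 and giving up once the generator is exhausted.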

Examples

  • Scrape 200 top posts and save as JSON
    redditminer --subreddit funny --limit 200 --sort top
  • Scrape only image URLs (TXT)
    redditminer --subreddit funny --output-mode image_url
  • Scrape posts with top-level comments
    redditminer --subreddit funny --output-mode post --with-comment
  • Scrape and download images
    redditminer --subreddit funny --output-mode image_url --download-images --output-dir my_images --max-workers 16

Downloaded images are organized by subreddit (e.g., images/EarthPorn/ or <output-dir>/EarthPorn/).
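
The per-subreddit layout and `--max-workers` parallelism can be sketched with a thread pool (names like `target_path` and the `fetch` callback are hypothetical, shown only to illustrate the structure):

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

def target_path(url, subreddit, output_dir="images"):
    """Map an image URL to <output-dir>/<subreddit>/<basename>."""
    name = os.path.basename(urlparse(url).path)
    return os.path.join(output_dir, subreddit, name)

def download_all(urls, subreddit, fetch, output_dir="images", max_workers=8):
    """Download images in parallel. `fetch` is a callable taking
    (url, destination_path) that performs one download."""
    os.makedirs(os.path.join(output_dir, subreddit), exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for url in urls:
            pool.submit(fetch, url, target_path(url, subreddit, output_dir))
```

Bounding the pool at `--max-workers` keeps download concurrency predictable and avoids hammering Reddit's image hosts.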

Project Structure

RedditMiner/
│
├── redditminer/
│   ├── __init__.py
│   └── scraper.py         # Core scraping logic and RedditImageScraper class
│
├── main.py                # Command-line entry point
├── cookies.txt            # Your exported Reddit cookies (not checked in)
├── README.md
└── ...

Outputs

  • JSON: output/images_[subreddit]_[timestamp].json
  • TXT (image URLs): output/images_[subreddit]_[timestamp].txt
  • Downloaded images: images/<subreddit>/ (or <output-dir>)
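
The output filenames follow the `images_[subreddit]_[timestamp]` pattern above; a sketch of how such a path might be generated (the exact timestamp format is an assumption, not confirmed by the project):

```python
import time

def output_path(subreddit, ext="json"):
    """Build an output path like output/images_funny_20240101_120000.json.
    The timestamp format here is illustrative only."""
    ts = time.strftime("%Y%m%d_%H%M%S")
    return f"output/images_{subreddit}_{ts}.{ext}"
```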

Contributing

Contributions are welcome — open issues or submit pull requests for new features, bug fixes, or improvements.

License & Disclaimer

This project is licensed under the MIT License. See the LICENSE file for details.

This tool is intended for personal and educational use. Please respect Reddit's Terms of Service and do not use this tool for spamming or violating site rules.