Scrape images, galleries, and top-level comments from any subreddit
Lightweight Python tool using your browser session cookies — no Reddit API credentials required. Works with public, private, and NSFW subreddits (subject to your session permissions).
Quick Start
- Install from PyPI:
pip install redditminer
- Export your Reddit cookies:
Save your Reddit browser cookies as cookies.txt in the working directory (see the Installation section below).
- Run an example command:
redditminer --subreddit funny --limit 200 --sort top
Features
- Cookie Authentication: uses your browser session cookies for seamless access.
- Image & Gallery Support: extracts direct image links and all gallery images.
- Deep Pagination: efficiently fetches large numbers of posts via Reddit's cursor-based pagination (see the sketch after this list).
- CLI: specify the subreddit and options directly from the command line.
- Rate-limit friendly: retries with backoff on 429 responses.
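For orientation, here is a minimal sketch of the cursor-based pagination used by Reddit's public JSON listings (the .json endpoint and the after parameter are standard Reddit; the function itself is illustrative, not the package's internals):

import requests

def fetch_posts(subreddit, limit=200, sort="new", session=None):
    """Page through a subreddit listing using Reddit's 'after' cursor."""
    session = session or requests.Session()
    session.headers["User-Agent"] = "redditminer-example/0.1"  # Reddit rejects blank user agents
    posts, after = [], None
    while len(posts) < limit:
        resp = session.get(
            f"https://www.reddit.com/r/{subreddit}/{sort}.json",
            params={"limit": 100, "after": after},
            timeout=30,
        )
        resp.raise_for_status()
        data = resp.json()["data"]
        posts.extend(child["data"] for child in data["children"])
        after = data["after"]
        if after is None:  # no more pages
            break
    return posts[:limit]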
Installation
Clone the repository for development, or install from PyPI for regular use.
From PyPI
pip install redditminer
From source (development)
git clone https://github.com/MisbahKhan0009/RedditMiner.git
cd RedditMiner
pip install -e .
Export your Reddit cookies
Log into Reddit, use a browser extension (e.g., "Get cookies.txt") to export cookies for reddit.com, and save the file as cookies.txt in your working directory.
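As context, loading a Netscape-format cookies.txt into a requests session needs only the standard library's http.cookiejar; a sketch (illustrative, not necessarily how the package loads cookies internally):

import http.cookiejar
import requests

# Load the Netscape-format cookies.txt exported by the browser extension.
jar = http.cookiejar.MozillaCookieJar("cookies.txt")
jar.load(ignore_discard=True, ignore_expires=True)

session = requests.Session()
session.cookies = jar  # requests accepts any cookielib-compatible jar
resp = session.get("https://www.reddit.com/api/me.json", timeout=30)
print(resp.status_code)  # expect 200 with a valid session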
Usage
Run the scraper with your desired subreddit:
redditminer --subreddit EarthPorn
Optional output modes
- post (default): full post data (JSON).
- image_url: only image URLs (TXT file); this covers both direct links and gallery items (see the sketch below).
- post_with_comments: full post data with top-level comments (JSON).
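A rough sketch of the extraction the image_url mode implies, based on Reddit's post JSON (direct image posts expose url_overridden_by_dest; galleries expose media_metadata); the field names are Reddit's, the function is illustrative:

import html

def extract_image_urls(post):
    """Collect direct and gallery image URLs from one post's JSON data."""
    urls = []
    if post.get("is_gallery") and post.get("media_metadata"):
        for item in post["media_metadata"].values():
            source = item.get("s", {})
            url = source.get("u") or source.get("gif")
            if url:
                urls.append(html.unescape(url))  # Reddit HTML-escapes these URLs (&amp;)
    else:
        url = post.get("url_overridden_by_dest") or post.get("url", "")
        if url.lower().endswith((".jpg", ".jpeg", ".png", ".gif", ".webp")):
            urls.append(url)
    return urls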
Command-line options
- --subreddit: subreddit name to scrape (required)
- --limit: number of posts to scrape (default: 100)
- --sort: sort order (new, hot, top; default: new)
- --output-mode: output format (post, image_url, post_with_comments)
- --with-comment: include top-level comments (JSON outputs only)
- --download-images: download all found images
- --output-dir: directory to save images (default: images)
- --max-workers: number of parallel downloads (default: 8)
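If you are extending the CLI, the options above map naturally onto argparse; a sketch that mirrors the documented defaults (an assumption for illustration, not the package's actual parser):

import argparse

parser = argparse.ArgumentParser(prog="redditminer")
parser.add_argument("--subreddit", required=True, help="subreddit name to scrape")
parser.add_argument("--limit", type=int, default=100, help="number of posts to scrape")
parser.add_argument("--sort", choices=["new", "hot", "top"], default="new")
parser.add_argument("--output-mode", choices=["post", "image_url", "post_with_comments"], default="post")
parser.add_argument("--with-comment", action="store_true", help="include top-level comments")
parser.add_argument("--download-images", action="store_true", help="download all found images")
parser.add_argument("--output-dir", default="images", help="directory to save images")
parser.add_argument("--max-workers", type=int, default=8, help="parallel downloads")
args = parser.parse_args()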
Rate Limiting
If Reddit returns 429 responses, the scraper waits and retries (default backoff of roughly 60 seconds). For best results, avoid running many concurrent scrapers, and consider refreshing your cookies if rate limiting persists.
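A minimal sketch of that wait-and-retry pattern (honoring Retry-After and falling back to 60 seconds is an assumption about reasonable behavior, not a transcript of the package's code):

import time
import requests

def get_with_backoff(session, url, max_retries=5, **kwargs):
    """GET with a simple wait-and-retry loop on HTTP 429 responses."""
    for attempt in range(max_retries):
        resp = session.get(url, **kwargs)
        if resp.status_code != 429:
            return resp
        # Honor Retry-After when Reddit sends it; otherwise back off ~60s.
        wait = int(resp.headers.get("Retry-After", 60))
        time.sleep(wait)
    raise RuntimeError(f"still rate-limited after {max_retries} retries: {url}")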
Examples
- Scrape 200 top posts and save as JSON:
redditminer --subreddit funny --limit 200 --sort top
- Scrape only image URLs (TXT):
redditminer --subreddit funny --output-mode image_url
- Scrape posts with top-level comments:
redditminer --subreddit funny --output-mode post --with-comment
- Scrape and download images:
redditminer --subreddit funny --output-mode image_url --download-images --output-dir my_images --max-workers 16
Downloaded images are organized by subreddit (e.g., images/EarthPorn/ or <output-dir>/EarthPorn/).
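A sketch of the parallel download step implied by --max-workers, using a thread pool and the directory convention above (the function and its names are illustrative):

import os
from concurrent.futures import ThreadPoolExecutor

import requests

def download_images(urls, subreddit, output_dir="images", max_workers=8):
    """Download image URLs into <output_dir>/<subreddit>/ in parallel."""
    target = os.path.join(output_dir, subreddit)
    os.makedirs(target, exist_ok=True)

    def fetch(url):
        # Derive a filename from the URL path, dropping any query string.
        name = url.split("/")[-1].split("?")[0] or "image"
        path = os.path.join(target, name)
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        with open(path, "wb") as fh:
            fh.write(resp.content)
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))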
Project Structure
RedditMiner/
│
├── redditminer/
│ ├── __init__.py
│ └── scraper.py # Core scraping logic and RedditImageScraper class
│
├── main.py # Command-line entry point
├── cookies.txt # Your exported Reddit cookies (not checked in)
├── README.md
└── ...
Outputs
- JSON: output/images_[subreddit]_[timestamp].json
- TXT (image URLs): output/images_[subreddit]_[timestamp].txt
- Downloaded images: images/<subreddit>/ (or <output-dir>/<subreddit>/)
Contributing
Contributions are welcome — open issues or submit pull requests for new features, bug fixes, or improvements.
License & Disclaimer
This project is licensed under the MIT License. See the LICENSE file for details.
This tool is intended for personal and educational use. Please respect Reddit's Terms of Service and do not use this tool for spamming or violating site rules.