Adaptive web crawler designed to bypass bot detection and enable comprehensive AI-assisted research.
- Multiple fetch strategies - Automatically tries different methods to successfully fetch content
- Basic HTTP fetch - Fast, lightweight fetching for unprotected sites
- Stealth mode - Playwright-based browser automation with anti-detection measures
- Wayback Machine fallback - Fetch cached versions when live sites are blocked
- Content extraction - Clean text extraction with CSS selector support
- Metadata extraction - Extract page titles, descriptions, OpenGraph data
- Link extraction - Extract all links from a page
cd ~/code/chameleon
npm install
npm link  # Makes 'chameleon' available globally

# Simple fetch (auto-selects best method)
chameleon fetch https://example.com
# Save to file
chameleon fetch https://example.com -o output.txt
# Get HTML instead of text
chameleon fetch https://example.com --format html
# Get JSON output with metadata
chameleon fetch https://example.com --format json --metadata

# Force basic HTTP fetch
chameleon fetch https://example.com --basic
# Force stealth mode (Playwright)
chameleon fetch https://example.com --stealth
# Force Wayback Machine
chameleon fetch https://g2.com/products/crunchtime/reviews --wayback

# Extract specific elements
chameleon fetch https://example.com --extract "h1"
chameleon fetch https://example.com --extract ".review-content"
chameleon fetch https://example.com --extract "blockquote"
# Extract with metadata
chameleon fetch https://example.com --extract "article" --metadata
# Extract all links
chameleon fetch https://example.com --links

# List available snapshots
chameleon snapshots https://g2.com/products/crunchtime/reviews
# Fetch from Wayback
chameleon fetch https://g2.com/products/crunchtime/reviews --wayback

# Test all fetch methods for a URL
chameleon test https://example.com

# Run browser in visible mode
chameleon fetch https://example.com --stealth --headful
# Take screenshot
chameleon fetch https://example.com --stealth --screenshot debug.png
# Increase timeout
chameleon fetch https://slow-site.com --timeout 60000
# Increase wait time for JS rendering
chameleon fetch https://spa-site.com --stealth --wait 5000

Text (default): clean, extracted text content without HTML tags.

HTML (`--format html`): raw HTML content as received.

JSON (`--format json`): structured output including:
- `url` - Final URL after redirects
- `status` - HTTP status code
- `method` - Fetch method used (basic/stealth/wayback)
- `text` - Extracted text content
- `metadata` - Page metadata (with `--metadata` flag)
- `links` - Extracted links (with `--links` flag)
- `extracted` - Content matching the CSS selector (with `--extract` flag)
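Assuming the fields above, a JSON result might look like this (values are illustrative only, not actual output):

```json
{
  "url": "https://example.com/",
  "status": 200,
  "method": "stealth",
  "text": "Example Domain ...",
  "metadata": { "title": "Example Domain" }
}
```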
- Uses `node-fetch` with browser-like headers
- Fastest method, lowest resource usage
- Works for sites without bot protection
- Includes automatic retry with backoff
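The retry-with-backoff idea can be sketched like this (a minimal illustration; the function names, base delay, and cap are assumptions, not chameleon's actual internals):

```javascript
// Exponential backoff: double the delay on each failed attempt, up to a cap.
function backoffDelays(attempts, baseMs = 500, capMs = 8000) {
  const delays = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(baseMs * 2 ** i, capMs));
  }
  return delays;
}

// Retry a fetch function, sleeping for the next backoff delay after each failure.
async function fetchWithRetry(doFetch, attempts = 3) {
  let lastErr;
  for (const delay of backoffDelays(attempts)) {
    try {
      return await doFetch();
    } catch (err) {
      lastErr = err;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastErr;
}
```

For example, `backoffDelays(5)` yields delays of 500, 1000, 2000, 4000, and 8000 ms.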
- Uses Playwright with anti-detection measures
- Bypasses basic JavaScript challenges
- Handles SPAs and JS-rendered content
- Overrides `navigator.webdriver` detection
- Randomizes viewport and user agent
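A minimal sketch of these two measures, assuming a Playwright setup (the helper names and viewport list are illustrative, not chameleon's actual code):

```javascript
// Injected into every page before site scripts run, so bot-detection code
// reading navigator.webdriver sees undefined instead of true.
const HIDE_WEBDRIVER = `
  Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
`;

// A small pool of common desktop resolutions to pick from at random.
const VIEWPORTS = [
  { width: 1280, height: 720 },
  { width: 1366, height: 768 },
  { width: 1920, height: 1080 },
];

function randomViewport() {
  return VIEWPORTS[Math.floor(Math.random() * VIEWPORTS.length)];
}

// Usage with Playwright (assumes `playwright` is installed):
// const context = await browser.newContext({ viewport: randomViewport() });
// await context.addInitScript(HIDE_WEBDRIVER);
```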
- Fetches cached versions from archive.org
- Useful when live site has heavy protection
- May have outdated content
- Doesn't include JS-rendered content
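One common way to locate a cached copy is the Internet Archive's public Availability API; a sketch of that approach (the helper names are illustrative, and this is not necessarily how chameleon implements `--wayback`):

```javascript
// Build a lookup URL for the Wayback Availability API.
// An optional timestamp (YYYYMMDD) asks for the snapshot closest to that date.
function waybackLookupUrl(target, timestamp) {
  const u = new URL('https://archive.org/wayback/available');
  u.searchParams.set('url', target);
  if (timestamp) u.searchParams.set('timestamp', timestamp);
  return u.toString();
}

// The API responds with JSON shaped like:
// { archived_snapshots: { closest: { url, timestamp, status, available } } }
function closestSnapshotUrl(response) {
  return response.archived_snapshots?.closest?.url ?? null;
}
```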
| Detection Method | Our Counter-Measure |
|---|---|
| User-Agent check | Realistic UA rotation |
| navigator.webdriver | Override to undefined |
| Headless detection | Chrome args to mask headless |
| JavaScript challenge | Playwright execution |
| Rate limiting | Retry with backoff |
| IP blocking | Wayback fallback |
| CAPTCHA | Wayback fallback (manual intervention needed) |
| Site | Protection | Recommended Method |
|---|---|---|
| example.com | None | basic |
| crunchtime.com | Low | basic |
| trustpilot.com | Medium | stealth |
| g2.com | High | wayback |
| capterra.com | High | wayback |
| indeed.com | High | wayback |
| glassdoor.com | Very High | wayback (limited) |
This tool is intended for:
- Educational purposes
- Research with proper authorization
- Accessing your own content
- Fetching publicly archived content
Please:
- Respect robots.txt
- Don't hammer servers (built-in delays help)
- Check site Terms of Service
- Consider using official APIs when available
# Run directly without global install
node src/index.js fetch https://example.com
# Run tests
npm test

"Bot detection triggered"
- Try `--stealth` mode
- Try `--wayback` for a cached version
- Site may require residential proxies (not yet implemented)
"Timeout"
- Increase the timeout: `--timeout 60000`
- Site may be slow or blocking
"No Wayback snapshot"
- URL may not have been archived
- Try `chameleon snapshots <url>` to see available dates
Stealth mode fails
- Try `--headful` to see what's happening
- Take a `--screenshot` for debugging
- Some sites detect even stealth browsers
- Proxy rotation support
- CAPTCHA solving integration
- Cookie persistence between sessions
- Rate limiting controls
- Parallel fetching
- Site-specific extractors (G2, Capterra, etc.)