hacktoolkit/chameleon

Chameleon

Adaptive web crawler designed to bypass bot detection and enable comprehensive AI-assisted research.

Features

  • Multiple fetch strategies - Automatically tries different methods to fetch content successfully:
      • Basic HTTP fetch - Fast, lightweight fetching for unprotected sites
      • Stealth mode - Playwright-based browser automation with anti-detection measures
      • Wayback Machine fallback - Fetch cached versions when live sites are blocked
  • Content extraction - Clean text extraction with CSS selector support
  • Metadata extraction - Extract page titles, descriptions, and OpenGraph data
  • Link extraction - Extract all links from a page

Installation

cd ~/code/chameleon
npm install
npm link  # Makes 'chameleon' available globally

Usage

Basic fetch

# Simple fetch (auto-selects best method)
chameleon fetch https://example.com

# Save to file
chameleon fetch https://example.com -o output.txt

# Get HTML instead of text
chameleon fetch https://example.com --format html

# Get JSON output with metadata
chameleon fetch https://example.com --format json --metadata

Force specific fetch method

# Force basic HTTP fetch
chameleon fetch https://example.com --basic

# Force stealth mode (Playwright)
chameleon fetch https://example.com --stealth

# Force Wayback Machine
chameleon fetch https://g2.com/products/crunchtime/reviews --wayback

Content extraction

# Extract specific elements
chameleon fetch https://example.com --extract "h1"
chameleon fetch https://example.com --extract ".review-content"
chameleon fetch https://example.com --extract "blockquote"

# Extract with metadata
chameleon fetch https://example.com --extract "article" --metadata

# Extract all links
chameleon fetch https://example.com --links

Wayback Machine

# List available snapshots
chameleon snapshots https://g2.com/products/crunchtime/reviews

# Fetch from Wayback
chameleon fetch https://g2.com/products/crunchtime/reviews --wayback
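
Under the hood, a Wayback fallback can locate snapshots through archive.org's public availability API. A minimal sketch in Node.js (the helper names are assumptions for illustration, not Chameleon's actual internals):

```javascript
// Query archive.org's availability API for the snapshot closest to a target
// URL. Helper names (availabilityUrl, latestSnapshot) are illustrative only.
function availabilityUrl(target, timestamp) {
  const u = new URL('https://archive.org/wayback/available');
  u.searchParams.set('url', target);
  if (timestamp) u.searchParams.set('timestamp', timestamp); // YYYYMMDD
  return u.toString();
}

async function latestSnapshot(target) {
  const res = await fetch(availabilityUrl(target)); // global fetch, Node 18+
  const data = await res.json();
  // Response shape: { archived_snapshots: { closest: { url, timestamp, available } } }
  return data.archived_snapshots?.closest?.url ?? null;
}
```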

Test which methods work

# Test all fetch methods for a URL
chameleon test https://example.com

Debugging

# Run browser in visible mode
chameleon fetch https://example.com --stealth --headful

# Take screenshot
chameleon fetch https://example.com --stealth --screenshot debug.png

# Increase timeout
chameleon fetch https://slow-site.com --timeout 60000

# Increase wait time for JS rendering
chameleon fetch https://spa-site.com --stealth --wait 5000

Output Formats

Text (default)

Clean, extracted text content without HTML tags.

HTML

Raw HTML content as received.

JSON

Structured output including:

  • url - Final URL after redirects
  • status - HTTP status code
  • method - Fetch method used (basic/stealth/wayback)
  • text - Extracted text content
  • metadata - Page metadata (with --metadata flag)
  • links - Extracted links (with --links flag)
  • extracted - Content matching CSS selector (with --extract flag)
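
For illustration, a plausible shape of the JSON output for a fetch of https://example.com with --metadata, --links, and --extract "h1" enabled (field values here are invented; the exact shape may differ):

```json
{
  "url": "https://example.com/",
  "status": 200,
  "method": "basic",
  "text": "Example Domain. This domain is for use in illustrative examples.",
  "metadata": { "title": "Example Domain" },
  "links": ["https://www.iana.org/domains/example"],
  "extracted": ["Example Domain"]
}
```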

Fetch Methods

Basic (--basic)

  • Uses node-fetch with browser-like headers
  • Fastest method, lowest resource usage
  • Works for sites without bot protection
  • Includes automatic retry with backoff
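
The "retry with backoff" behavior described above can be sketched as follows; the function names (fetchWithRetry, backoffDelay) and parameters are illustrative assumptions, not Chameleon's actual API:

```javascript
// Exponential backoff: attempt 0 waits 1s before retrying, then 2s, 4s, ...
const backoffDelay = (attempt, baseMs = 1000) => baseMs * 2 ** attempt;

async function fetchWithRetry(url, { retries = 3, headers = {} } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      const res = await fetch(url, { headers }); // global fetch, Node 18+
      if (res.ok) return res;
      if (attempt >= retries) throw new Error(`HTTP ${res.status}`);
    } catch (err) {
      if (attempt >= retries) throw err;
    }
    // Wait longer after each failure before trying again.
    await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
  }
}
```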

Stealth (--stealth)

  • Uses Playwright with anti-detection measures
  • Bypasses basic JavaScript challenges
  • Handles SPAs and JS-rendered content
  • Overrides navigator.webdriver detection
  • Randomizes viewport size and user agent
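
The navigator.webdriver override boils down to redefining the property before any site script runs; in Playwright this kind of function is passed to context.addInitScript(). A minimal sketch, with maskWebdriver as an assumed helper name:

```javascript
// Redefine navigator.webdriver so detection scripts see `undefined` instead
// of `true`. In a real setup this body would run inside the page, before any
// site script, via Playwright's addInitScript().
function maskWebdriver(nav) {
  Object.defineProperty(nav, 'webdriver', { get: () => undefined });
  return nav;
}
```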

Wayback (--wayback)

  • Fetches cached versions from archive.org
  • Useful when the live site has heavy protection
  • May have outdated content
  • Doesn't include JS-rendered content

Bot Detection Methods (and how we handle them)

Detection Method       Our Counter-Measure
--------------------   --------------------------------------------
User-Agent check       Realistic UA rotation
navigator.webdriver    Override to undefined
Headless detection     Chrome args to mask headless
JavaScript challenge   Playwright execution
Rate limiting          Retry with backoff
IP blocking            Wayback fallback
CAPTCHA                Wayback fallback (manual intervention needed)
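
"Realistic UA rotation" amounts to picking a plausible desktop user agent per request. A sketch (the pool below is illustrative; it is not Chameleon's actual list, and real crawlers keep a larger, up-to-date set):

```javascript
// A small pool of realistic desktop user-agent strings to rotate through.
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0',
];

// Pick one at random for each outgoing request.
const randomUserAgent = () =>
  USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];
```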

Site Compatibility

Site             Protection   Recommended Method
--------------   ----------   ------------------
example.com      None         basic
crunchtime.com   Low          basic
trustpilot.com   Medium       stealth
g2.com           High         wayback
capterra.com     High         wayback
indeed.com       High         wayback
glassdoor.com    Very High    wayback (limited)

Legal & Ethical Notes

This tool is intended for:

  • Educational purposes
  • Research with proper authorization
  • Accessing your own content
  • Fetching publicly archived content

Please:

  • Respect robots.txt
  • Don't hammer servers (built-in delays help)
  • Check site Terms of Service
  • Consider using official APIs when available
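
As a starting point for respecting robots.txt, here is a deliberately minimal disallow check. It is not shipped with Chameleon, and a real parser should also handle Allow rules, wildcards, and per-agent groups:

```javascript
// Return true if `path` is disallowed for `agent` by the given robots.txt
// text. Only plain Disallow prefix rules are handled in this sketch.
function isDisallowed(robotsTxt, path, agent = '*') {
  let applies = false;
  const rules = [];
  for (const raw of robotsTxt.split('\n')) {
    const line = raw.split('#')[0].trim(); // strip comments and whitespace
    const [key, ...rest] = line.split(':');
    const value = rest.join(':').trim();
    if (/^user-agent$/i.test(key)) {
      applies = value === agent || value === '*';
    } else if (applies && /^disallow$/i.test(key) && value) {
      rules.push(value);
    }
  }
  return rules.some((prefix) => path.startsWith(prefix));
}
```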

Development

# Run directly without global install
node src/index.js fetch https://example.com

# Run tests
npm test

Troubleshooting

"Bot detection triggered"

  • Try --stealth mode
  • Try --wayback for cached version
  • The site may require residential proxies (not yet implemented)

"Timeout"

  • Increase timeout: --timeout 60000
  • The site may be slow, or it may be actively blocking requests

"No Wayback snapshot"

  • URL may not have been archived
  • Try chameleon snapshots <url> to see available dates

Stealth mode fails

  • Try --headful to see what's happening
  • Take --screenshot for debugging
  • Some sites detect even stealth browsers

Future Enhancements

  • Proxy rotation support
  • CAPTCHA solving integration
  • Cookie persistence between sessions
  • Rate limiting controls
  • Parallel fetching
  • Site-specific extractors (G2, Capterra, etc.)
