AI-powered codebase intelligence. Ask questions about any GitHub repo in plain English.
Point CodeLens at any GitHub repository and instantly:
- 💬 Ask questions in natural language
- 🔍 Search code semantically (not just keywords)
- 📖 Explain functions with full context
- 🔗 Understand dependencies between files
# 1. Clone & install
git clone [repo link]
cd codelens && pip install -r requirements.txt
# 2. Add your Groq API key
echo "GROQ_API_KEY=your_key" > .env
# 3. Run
streamlit run streamlit_app.py| Metric | Value |
|---|---|
| Retrieval Latency | 38ms |
| Answer Generation | 1.9s |
| Retrieval Accuracy | 85% |
| Chunks Supported | 2,000+ |
Tested on tiangolo/typer (605 files, 2,117 chunks)
GitHub Repo → AST Parser → Chunker → Embeddings → Vector DB
↓
User Question → Hybrid Search (Dense + BM25) → Reranker → LLM → Answer
Key techniques:
- Hybrid retrieval: 70% semantic + 30% keyword search
- AST-based chunking: Preserves function/class boundaries
- Dependency expansion: Adds related files automatically
- Reciprocal Rank Fusion: Combines multiple search strategies
| Component | Technology |
|---|---|
| LLM | Groq (Llama 3.3 70B) |
| Embeddings | MiniLM-L6-v2 (384 dim) |
| Vector Store | ChromaDB |
| Sparse Search | BM25 |
| Backend | FastAPI |
| Frontend | Streamlit |
Ask anything about the codebase:
- "How does authentication work?"
- "What does the process_data function do?"
- "Where is error handling implemented?"
Combines semantic understanding with keyword matching for best results.
- Explain Function: Detailed breakdown of any function
- Find Similar: Discover similar code patterns
- Usage Analysis: Track where symbols are used
- Auto-Documentation: Generate docs for any file
- Python 3.10+
- Groq API Key (free)
# Clone
git clone https://github.com/amr-khalil/codelens.git
cd codelens
# Install
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Configure
echo "GROQ_API_KEY=your_key" > .env
# Run
streamlit run streamlit_app.pystreamlit run streamlit_app.pypython cli.py ingest https://github.com/tiangolo/typer
python cli.py query "How do I create a CLI command?"
python cli.py chat # Interactive modeuvicorn src.api.main:app --reload
# Index
curl -X POST http://localhost:8000/api/v1/ingest \
-H "Content-Type: application/json" \
-d '{"repo_url": "https://github.com/tiangolo/typer"}'
# Query
curl -X POST http://localhost:8000/api/v1/query \
-H "Content-Type: application/json" \
-d '{"query": "How does argument parsing work?"}'codelens/
├── src/
│ ├── api/ # FastAPI REST API
│ ├── ingestion/ # GitHub loader, AST parser
│ ├── chunking/ # AST & semantic chunkers
│ ├── embeddings/ # Embedding model
│ ├── retrieval/ # Vector store, BM25, hybrid search
│ ├── generation/ # LLM integration, prompts
│ └── utils/ # Config, logging, dependency graph
├── streamlit_app.py # Web UI
├── cli.py # CLI interface
└── api.py # Standalone API
- Hybrid retrieval (dense + sparse)
- AST-based chunking
- Dependency graph expansion
- Multi-language AST (tree-sitter)
- Streaming responses
- Redis caching
- Evaluation metrics (RAGAS)
# Setup
pip install pytest black isort
# Test
pytest tests/ -v
# Format
black src/ && isort src/MIT © 2025 Amr
Built with Groq, ChromaDB, HuggingFace, Streamlit
