
Code LoD

Code descriptions targeted at LLMs.

Code LoD (Levels of Detail) is a CLI tool that generates, manages, and updates detailed descriptions of code entities such as functions, classes, modules, packages, and projects. By leveraging LLMs and AST-based change detection, Code LoD provides multi-level descriptions that help LLMs understand codebases more effectively.

Features

  • Multi-language support: Parses 20+ languages via tree-sitter
  • AST-based hashing: Detects semantic changes, ignores cosmetic formatting
  • Staleness tracking: Knows exactly which descriptions need updating
  • Revert detection: Recognizes when code reverts to a previous version
  • Git hooks: Pre-commit integration to ensure descriptions stay fresh
  • LLM-consumable output: Export descriptions in text, JSON, or markdown

Installation

uv add code-lod

Quick Start

# Initialize in your project
code-lod init

# Generate descriptions
code-lod generate

# Check status
code-lod status

# Output for LLM consumption
code-lod read --format json

Why Code LoD?

Reading a project's README and source code works for small projects, but becomes impractical as codebases grow. Code LoD provides several advantages:

Incremental understanding at scale

  • LLMs have finite context windows. A large codebase won't fit entirely in context.
  • Code LoD provides hierarchical summaries (project → package → module → class → function) that let you load only the relevant detail level.
  • Drill down from high-level architecture to specific implementation as needed.

Targeted descriptions for LLMs

  • READMEs are written for humans. Code LoD descriptions are written for LLMs—focusing on structure, dependencies, contracts, and behavior.
  • Avoids conversational fluff and marketing language that wastes tokens.

Semantic change detection

  • AST-based hashing means descriptions only update when code behavior changes, not when you add whitespace or reformat.
  • Revert detection recognizes when code returns to a previous state, avoiding unnecessary regeneration.
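code-lod parses with tree-sitter across many languages; as a minimal single-language sketch of the same hashing idea (a stand-in, not code-lod's implementation), Python's stdlib `ast` module can serve as the normalizer:

```python
import ast
import hashlib

def ast_hash(source: str) -> str:
    """Hash the parsed structure of the code, not its raw text."""
    tree = ast.parse(source)
    # ast.dump without attributes drops line/column info, so whitespace
    # and formatting changes do not affect the hash; comments are not
    # part of the AST at all.
    normalized = ast.dump(tree, include_attributes=False)
    return hashlib.sha256(normalized.encode()).hexdigest()

a = ast_hash("def f(x):\n    return x + 1\n")
b = ast_hash("def f(x):   return x + 1")       # reformatted only
c = ast_hash("def f(x):\n    return x + 2\n")  # behavior changed

assert a == b   # cosmetic change: same hash, description stays fresh
assert a != c   # semantic change: different hash, description is stale
```

The same principle applies per entity (function, class, module), so a reformat-only commit touches no descriptions.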

Staleness tracking

  • Know exactly which descriptions are out-of-date without regenerating everything.
  • Pre-commit hooks ensure descriptions never become stale.
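The staleness check itself reduces to comparing stored hashes against current ones. A minimal sketch, assuming a flat `hash_index` mapping (the index layout is illustrative; code-lod stores it in SQLite):

```python
import ast
import hashlib

def ast_hash(source):
    return hashlib.sha256(
        ast.dump(ast.parse(source), include_attributes=False).encode()
    ).hexdigest()

# Hypothetical hash index: entity name -> hash recorded when its
# description was last generated.
hash_index = {"f": ast_hash("def f(x):\n    return x + 1\n")}

def stale_entities(index, current_sources):
    """Entities whose current AST hash differs from the recorded one."""
    return [
        name for name, src in current_sources.items()
        if index.get(name) != ast_hash(src)
    ]

current = {"f": "def f(x):\n    return x + 2\n"}  # behavior changed
print(stale_entities(hash_index, current))  # → ['f']
```

A pre-commit hook only has to run this comparison and fail the commit when the list is non-empty; no LLM call is needed to detect staleness.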

Dual storage

  • Database enables fast queries and staleness tracking.
  • .lod files alongside source code let you version-control descriptions and read them inline with the code they describe.

Commands

Command             Description
init                Initialize code-lod in the current directory
generate            Generate descriptions for code entities
status              Show freshness status of descriptions
validate            Validate description freshness
update              Update stale descriptions
read                Output descriptions in LLM-consumable format
config              Get or set configuration values
config set-model    Configure LLM models per scope
install-hook        Install git pre-commit hook
uninstall-hook      Remove git hook
clean               Remove all code-lod data

Architecture

Code LoD generates, manages, and updates code descriptions through a multi-layered pipeline:

  1. Parsing (parsers/): Tree-sitter based parsers extract code entities (functions, classes, modules) with AST hashes
  2. Hashing (hashing.py): AST hashes are computed on normalized source to detect semantic changes
  3. Staleness Tracking (staleness.py): Uses the hash index to determine if descriptions need regeneration
  4. Generation (llm/description_generator/): LLM provider implementations (OpenAI, Anthropic, Ollama, Mock) with auto-detection and scope-specific model selection
  5. Storage (db.py, lod_file/): Dual storage system with SQLite database and .lod files
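One consequence of keeping a hash index is cheap revert detection: if a hash has been seen before, its description can be reused instead of regenerated. A hedged sketch (the cache structure and `describe` function are illustrative, not code-lod's internals):

```python
# Hypothetical revert-detection cache: every (hash, description) pair
# ever stored is kept, so code that changes back to a known state
# reuses the old description instead of triggering a new LLM call.
history = {}  # ast hash -> previously generated description

def describe(source_hash, generate):
    """Return a cached description for a known hash, else generate one."""
    if source_hash in history:
        return history[source_hash], False  # revert detected, no LLM call
    description = generate()
    history[source_hash] = description
    return description, True

d1, fresh1 = describe("hash-v1", lambda: "adds one to x")
d2, fresh2 = describe("hash-v2", lambda: "adds two to x")
d3, fresh3 = describe("hash-v1", lambda: "should never be called")
assert d3 == "adds one to x" and not fresh3  # v1 description reused
```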

Storage

Code LoD uses a dual storage system:

  1. SQLite database (hash_index.db) - Stores metadata, hashes, and descriptions with staleness tracking (should not be version controlled)
  2. .lod files - Structured comment files alongside source code with @lod annotations

Descriptions are organized by hierarchical scope: project > package > module > class > function.

LLM Provider Configuration

Code LoD supports multiple LLM providers for generating descriptions:

Supported Providers

  • OpenAI: GPT-4, GPT-4o, GPT-3.5-turbo
  • Anthropic: Claude Sonnet, Claude Haiku, Claude Opus
  • Ollama: Local models (e.g., llama2, mistral, codellama)
  • Mock: Placeholder descriptions for testing (no API key required)

Configuration

Set your API key via environment variables:

# For OpenAI
export OPENAI_API_KEY="sk-..."

# For Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
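code-lod auto-detects the provider from these variables. A minimal sketch of how such detection might work (the check order and fallback are assumptions, not code-lod's actual logic):

```python
# Hypothetical provider auto-detection from environment variables.
# Ollama runs locally and needs no key, so it is not detectable this way.
def detect_provider(env):
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    return "mock"  # no key found: fall back to placeholder descriptions

print(detect_provider({"ANTHROPIC_API_KEY": "sk-ant-..."}))  # → anthropic
```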

Code LoD auto-detects the available provider from environment variables. Configure different models for different scopes:

# Set model for all scopes
code-lod config set-model --provider openai --model gpt-4o

# Set model for specific scope
code-lod config set-model --scope function --provider openai --model gpt-4o
code-lod config set-model --scope project --provider anthropic --model claude-sonnet

For Ollama (local models):

code-lod config set-model --provider ollama --model codellama

Development

# Install dependencies
uv sync

# Run tests
uv run pytest

# Lint and format
uv run ruff check .
uv run ruff format .

License

MIT
