This project helps programmers write documentation for their code, using a fine-tuned T5 model to generate readable documentation from code chunks. Training runs from the command line, and documentation is generated through a Streamlit web interface.
- 🔧 Advanced Configuration System: Centralized configuration management with environment variable support
- ⚙️ Flexible Model Parameters: Configurable temperature, beam search, and token limits
- 🎛️ Interactive Web Controls: Real-time parameter adjustment in the web interface
- 📁 File Upload Support: Upload code files directly with validation
- 🛠️ CLI Configuration Tools: Command-line utilities for configuration management
- 📊 Enhanced Evaluation: Multiple metrics and better performance tracking
- 🐳 Production Ready: Environment-specific configurations and Docker support
- Documentation generation for code chunks with configurable AI parameters
- Interactive Web Interface made using Streamlit with real-time controls
- Model fine-tuned on code-documentation pairs
- Comprehensive configuration management system
- Support for multiple programming languages
- File upload and download capabilities
```bash
git clone https://github.com/Nebu0528/CodeDocAI
```

Run the command below to install all the dependencies needed:

```bash
pip install -r requirements.txt
```

The application comes with sensible defaults, but you can customize it:
Option A: Using Environment Variables
```bash
export MODEL_NAME=t5-base
export TEMPERATURE=1.2
export NUM_BEAMS=6
export WEB_PORT=8502
```

Option B: Using Configuration Files
```bash
# Copy and edit the example configuration
cp config.json my_config.json
# Edit my_config.json with your preferred settings

# Or use the CLI tool
python3 config_manager.py show      # View current config
python3 config_manager.py validate  # Validate settings
```

Option C: Using Environment File
```bash
cp .env.example .env
# Edit .env with your settings
```

Training is necessary to teach the model how to interpret code and generate corresponding documentation. The model is trained on this dataset:
https://huggingface.co/datasets/jtatman/python-code-dataset-500k
- Train the model using:

```bash
python src/model.py
```

- After training, the model will be saved in the `models/codetext_t5/` directory.
Once the model is trained, run the app:
```bash
streamlit run interface/website.py
```

Once the app is running, open the URL provided by Streamlit:

```
http://localhost:8501
```

The application now includes a powerful configuration system. Use the CLI tool to manage settings:
```bash
# View all current settings
python3 config_manager.py show

# Validate your configuration
python3 config_manager.py validate

# Export current config to file
python3 config_manager.py export my_settings.json

# Import config from file
python3 config_manager.py import production_settings.json

# Create environment template
python3 config_manager.py template
```

The web interface now provides real-time controls for:
- Temperature: Controls creativity vs focus (0.1-2.0)
- Number of Beams: Quality vs speed trade-off (1-10)
- Max Tokens: Length of generated documentation (50-500)
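These sliders correspond to standard text-generation arguments. As a minimal sketch of how such values might be validated and mapped to Hugging Face `generate()` keyword arguments (the helper name and the `do_sample` choice are illustrative assumptions, not the app's actual code):

```python
def build_generation_kwargs(temperature=1.0, num_beams=4, max_tokens=150):
    """Validate slider values and map them to generate() keyword arguments."""
    if not 0.1 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.1 and 2.0")
    if not 1 <= num_beams <= 10:
        raise ValueError("num_beams must be between 1 and 10")
    if not 50 <= max_tokens <= 500:
        raise ValueError("max_tokens must be between 50 and 500")
    return {
        "temperature": temperature,
        "num_beams": num_beams,
        "max_new_tokens": max_tokens,
        # temperature only affects output when sampling is enabled
        "do_sample": temperature != 1.0,
    }
```

Centralizing the range checks this way keeps the web sliders and any CLI flags consistent with each other.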
- Upload code files directly (supports .py, .js, .java, .cpp, .c, .go, .rs)
- Automatic file validation and size limits
- Download generated documentation as markdown files
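The extension and size checks can be sketched in a few lines; the 1 MB cap and function name below are assumptions for illustration, not the app's actual limits:

```python
import os

ALLOWED_EXTENSIONS = {".py", ".js", ".java", ".cpp", ".c", ".go", ".rs"}
MAX_FILE_BYTES = 1_000_000  # illustrative 1 MB cap; the real limit may differ

def validate_upload(filename, size_bytes):
    """Return True if the uploaded file passes extension and size checks."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError("file exceeds the upload size limit")
    return True
```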
```bash
# Development
export LOG_LEVEL=DEBUG
export BATCH_SIZE=4

# Production
export MODEL_PATH=/opt/models/production_model
export WEB_HOST=0.0.0.0
export WEB_PORT=80
```

- Web Interface: Paste a code snippet into the text area or upload a code file
- Adjust Parameters: Use the sidebar to customize generation settings
- Generate: Click "Generate Documentation" to create documentation
- Download: Save the generated documentation as a markdown file
```python
def multiply(a, b):
    """This function returns the product of variables a and b"""
    return a * b
```

Generated Documentation Example:
```
Function: multiply

Description:
Calculates the product of two input parameters and returns the result.

Parameters:
- a: First numeric value for multiplication
- b: Second numeric value for multiplication

Returns:
The mathematical product of parameters a and b
```

Build and run with Docker:
```bash
# Build the image
docker build -t codedocai .

# Run the container
docker run -p 8501:8501 codedocai

# Run with custom config
docker run -p 8501:8501 -v $(pwd)/my_config.json:/app/config.json codedocai
```

```
CodeDocAI/
├── src/                 # Core application code
│   ├── config.py        # Configuration management
│   ├── model.py         # T5 model training
│   ├── inference.py     # Documentation generation
│   ├── evaluate.py      # Model evaluation
│   └── data_loader.py   # Data loading utilities
├── interface/
│   └── website.py       # Streamlit web interface
├── models/              # Trained model storage
├── data/                # Training data
├── logs/                # Application logs
├── config.json          # Example configuration
├── .env.example         # Environment template
├── config_manager.py    # CLI configuration tool
├── test_system.py       # System tests
├── requirements.txt     # Python dependencies
├── CONFIGURATION.md     # Configuration documentation
└── README.md            # This file
```
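As a rough sketch of what `src/config.py` might do, the snippet below merges built-in defaults, an optional `config.json`, and environment-variable overrides. The key names mirror the variables shown earlier, but the merge logic is an assumption, not the repository's actual code:

```python
import json
import os

# Illustrative defaults; the real project may define more keys
DEFAULTS = {"MODEL_NAME": "t5-base", "TEMPERATURE": 1.0,
            "NUM_BEAMS": 4, "WEB_PORT": 8501}

def load_config(path="config.json"):
    """Merge defaults, optional config.json values, and environment overrides."""
    cfg = dict(DEFAULTS)
    if os.path.exists(path):
        with open(path) as f:
            cfg.update(json.load(f))
    for key, default in DEFAULTS.items():
        if key in os.environ:  # environment variables take highest precedence
            cfg[key] = type(default)(os.environ[key])
    return cfg
```

This ordering (defaults, then file, then environment) matches the customization options described above, where exported variables override file settings.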
- ✅ Advanced configuration system (implemented)
- ✅ File upload support (implemented)
- ✅ Download generated documentation (implemented)
- Handle different programming languages beyond Python
- Integrate additional models (BART, CodeBERT) for enhanced generation
- Support for batch processing multiple files
- API endpoint for programmatic access
- Integration with IDEs and code editors
- Train on larger datasets like The Stack
- Add support for different documentation formats (JSDoc, Sphinx, etc.)
To evaluate the model's performance using BLEU scores, run the following command:
```bash
python src/evaluate.py
```

This will calculate the average BLEU score for the model's generated documentation compared to the reference documentation.
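BLEU measures n-gram overlap between generated and reference text. For intuition, here is a self-contained sketch of the core computation (modified n-gram precision up to 4-grams plus a brevity penalty); `src/evaluate.py` may differ in details such as smoothing and corpus-level aggregation:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Compute a simple sentence-level BLEU score from two token lists."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        overlap = sum((cand & ref).values())  # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # smooth to avoid log(0)
    # penalize candidates shorter than the reference
    brevity = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 1.0; partially overlapping outputs score between 0 and 1.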
Configuration Errors:
```bash
# Check configuration validity
python3 config_manager.py validate

# View current settings
python3 config_manager.py show
```

Import Errors:
- Ensure you're running commands from the project root directory
- Verify all dependencies are installed:
```bash
pip install -r requirements.txt
```
Model Not Found:
- Train the model first:

```bash
python src/model.py
```

- Check model path in configuration:

```bash
echo $MODEL_PATH
```
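That path check can also be done programmatically; the helper below is illustrative (not part of the repo), with the default matching the training step's output directory:

```python
import os

def resolve_model_path(default="models/codetext_t5"):
    """Return the model path from $MODEL_PATH, falling back to the default."""
    path = os.environ.get("MODEL_PATH", default)
    if not os.path.isdir(path):
        raise FileNotFoundError(
            f"No trained model at '{path}'. Run 'python src/model.py' first."
        )
    return path
```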
Port Already in Use:
```bash
# Use different port
export WEB_PORT=8502
streamlit run interface/website.py
```

- Fork the repository
- Create a feature branch:
- Create a feature branch:

```bash
git checkout -b feature-name
```

- Make your changes and test:

```bash
python3 test_system.py
```

- Validate configuration:

```bash
python3 config_manager.py validate
```

- Submit a pull request
This project is open source and available under the MIT License.
For detailed configuration options, see CONFIGURATION.md