
CodeDocAI: An NLP-Powered Code Documentation Generator

Project Overview

This project helps programmers document their code: a fine-tuned T5 model generates readable documentation from code chunks. Training runs from the command line, and documentation is generated through a Streamlit web interface.

✨ New Features (v2.0)

  • 🔧 Advanced Configuration System: Centralized configuration management with environment variable support
  • ⚙️ Flexible Model Parameters: Configurable temperature, beam search, and token limits
  • 🎛️ Interactive Web Controls: Real-time parameter adjustment in the web interface
  • 📁 File Upload Support: Upload code files directly with validation
  • 🛠️ CLI Configuration Tools: Command-line utilities for configuration management
  • 📊 Enhanced Evaluation: Multiple metrics and better performance tracking
  • 🐳 Production Ready: Environment-specific configurations and Docker support

Features

  • Documentation generation for code chunks with configurable AI parameters
  • Interactive Web Interface made using Streamlit with real-time controls
  • Model fine-tuned on code-documentation pairs
  • Comprehensive configuration management system
  • Support for multiple programming languages
  • File upload and download capabilities

Getting Started

1. Clone Repository

git clone https://github.com/Nebu0528/CodeDocAI

2. Install Dependencies

Run the command below to install all the dependencies needed:

pip install -r requirements.txt

3. Configure the Application (Optional)

The application comes with sensible defaults, but you can customize it:

Option A: Using Environment Variables

export MODEL_NAME=t5-base
export TEMPERATURE=1.2
export NUM_BEAMS=6
export WEB_PORT=8502
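
As a sketch of how these environment variables might be consumed, the snippet below reads them with sensible fallbacks. The `load_config` helper and its defaults are illustrative; the project's actual loader lives in `src/config.py` and may differ:

```python
import os

def load_config():
    """Read settings from environment variables, falling back to defaults.

    Illustrative helper, not the project's actual API; the variable
    names mirror the examples above.
    """
    return {
        "model_name": os.getenv("MODEL_NAME", "t5-small"),
        "temperature": float(os.getenv("TEMPERATURE", "1.0")),
        "num_beams": int(os.getenv("NUM_BEAMS", "4")),
        "web_port": int(os.getenv("WEB_PORT", "8501")),
    }

os.environ["TEMPERATURE"] = "1.2"
cfg = load_config()
print(cfg["temperature"])  # 1.2
```

Reading everything through one function keeps type conversion (strings to floats/ints) in a single place.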

Option B: Using Configuration Files

# Copy and edit the example configuration
cp config.json my_config.json
# Edit my_config.json with your preferred settings

# Or use the CLI tool
python3 config_manager.py show     # View current config
python3 config_manager.py validate # Validate settings
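
For orientation, a configuration file might look like the fragment below. The key names are illustrative (chosen to match the environment variables above); check `config.json` in the repository for the actual schema:

```json
{
  "model_name": "t5-base",
  "temperature": 1.2,
  "num_beams": 6,
  "max_tokens": 300,
  "web_port": 8502
}
```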

Option C: Using Environment File

cp .env.example .env
# Edit .env with your settings

4. Train the Model

This step teaches the model to interpret code and generate corresponding documentation. Training uses the following dataset:

https://huggingface.co/datasets/jtatman/python-code-dataset-500k

  1. Train the model using:

python src/model.py

  2. After training, the model will be saved in the models/codetext_t5/ directory.

5. Run the Streamlit App

Once the model is trained, run the app:

streamlit run interface/website.py

6. Access App

Once the app is running, open the URL provided by Streamlit:

http://localhost:8501

Configuration Management

The application now includes a powerful configuration system. Use the CLI tool to manage settings:

# View all current settings
python3 config_manager.py show

# Validate your configuration
python3 config_manager.py validate

# Export current config to file
python3 config_manager.py export my_settings.json

# Import config from file
python3 config_manager.py import production_settings.json

# Create environment template
python3 config_manager.py template

Advanced Usage

Customizable Parameters

The web interface now provides real-time controls for:

  • Temperature: Controls creativity vs focus (0.1-2.0)
  • Number of Beams: Quality vs speed trade-off (1-10)
  • Max Tokens: Length of generated documentation (50-500)
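
A minimal sketch of how these ranges could be enforced before values reach the model (the `clamp_params` helper and its defaults are hypothetical, not part of the project's code; the ranges mirror the sliders described above):

```python
def clamp_params(temperature=1.0, num_beams=4, max_tokens=200):
    """Clamp generation parameters to the ranges exposed in the UI.

    Hypothetical helper; ranges mirror the web controls above.
    """
    return {
        "temperature": min(max(temperature, 0.1), 2.0),  # 0.1-2.0
        "num_beams": min(max(int(num_beams), 1), 10),    # 1-10
        "max_tokens": min(max(int(max_tokens), 50), 500) # 50-500
    }

print(clamp_params(temperature=3.5, num_beams=0, max_tokens=1000))
# {'temperature': 2.0, 'num_beams': 1, 'max_tokens': 500}
```

Clamping rather than rejecting out-of-range values keeps the interface forgiving while guaranteeing the model only ever sees valid settings.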

File Upload Support

  • Upload code files directly (supports .py, .js, .java, .cpp, .c, .go, .rs)
  • Automatic file validation and size limits
  • Download generated documentation as markdown files
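
Validation along these lines can be sketched in a few lines of Python. The extension set follows the list above; the size cap and the `validate_upload` helper are illustrative, not the project's actual limits:

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".py", ".js", ".java", ".cpp", ".c", ".go", ".rs"}
MAX_SIZE_BYTES = 1_000_000  # illustrative 1 MB cap, not the project's real limit

def validate_upload(filename: str, size_bytes: int) -> bool:
    """Accept a file only if its extension is allowed and it fits the size cap."""
    has_allowed_ext = Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
    return has_allowed_ext and 0 < size_bytes <= MAX_SIZE_BYTES

print(validate_upload("utils.py", 2048))   # True
print(validate_upload("notes.txt", 2048))  # False: extension not allowed
```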

Environment-Specific Configurations

# Development
export LOG_LEVEL=DEBUG
export BATCH_SIZE=4

# Production  
export MODEL_PATH=/opt/models/production_model
export WEB_HOST=0.0.0.0
export WEB_PORT=80

Usage

  1. Web Interface: Paste a code snippet into the text area or upload a code file
  2. Adjust Parameters: Use the sidebar to customize generation settings
  3. Generate: Click "Generate Documentation" to create documentation
  4. Download: Save the generated documentation as a markdown file

Example Code Snippet

def multiply(a, b):
    """This function returns the product of variables a and b"""
    return a * b

Generated Documentation Example:

Function: multiply

Description:
Calculates the product of two input parameters and returns the result.

Parameters:
- a: First numeric value for multiplication
- b: Second numeric value for multiplication  

Returns:
The mathematical product of parameters a and b

Docker Support

Build and run with Docker:

# Build the image
docker build -t codedocai .

# Run the container
docker run -p 8501:8501 codedocai

# Run with custom config
docker run -p 8501:8501 -v $(pwd)/my_config.json:/app/config.json codedocai

Project Structure

CodeDocAI/
├── src/                          # Core application code
│   ├── config.py                # Configuration management
│   ├── model.py                 # T5 model training
│   ├── inference.py             # Documentation generation
│   ├── evaluate.py              # Model evaluation
│   └── data_loader.py           # Data loading utilities
├── interface/
│   └── website.py               # Streamlit web interface
├── models/                      # Trained model storage
├── data/                        # Training data
├── logs/                        # Application logs
├── config.json                  # Example configuration
├── .env.example                 # Environment template
├── config_manager.py            # CLI configuration tool
├── test_system.py               # System tests
├── requirements.txt             # Python dependencies
├── CONFIGURATION.md             # Configuration documentation
└── README.md                    # This file

Future Improvements

  1. ✅ Advanced configuration system (implemented)
  2. ✅ File upload support (implemented)
  3. ✅ Download generated documentation (implemented)
  4. Handle different programming languages beyond Python
  5. Integrate additional models (BART, CodeBERT) for enhanced generation
  6. Support for batch processing multiple files
  7. API endpoint for programmatic access
  8. Integration with IDEs and code editors
  9. Train on larger datasets like The Stack
  10. Add support for different documentation formats (JSDoc, Sphinx, etc.)

Evaluate the Model

To evaluate the model's performance using BLEU scores, run the following command:

python src/evaluate.py

This will calculate the average BLEU score for the model's generated documentation compared to the reference documentation.
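
To illustrate the idea behind BLEU, which scores n-gram overlap between generated and reference text, here is a simplified clipped n-gram precision, the core ingredient of the metric. This is a teaching sketch, not the implementation used by src/evaluate.py:

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision: the fraction of candidate n-grams that
    also appear in the reference, with repeats capped at the reference count.
    Simplified single-reference version of one BLEU component.
    """
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

cand = "returns the product of a and b".split()
ref = "this function returns the product of a and b".split()
print(ngram_precision(cand, ref))  # 1.0: every candidate word appears in the reference
```

Full BLEU combines precisions for n = 1..4 with a brevity penalty; libraries such as NLTK or sacrebleu handle those details in practice.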

Troubleshooting

Common Issues

Configuration Errors:

# Check configuration validity
python3 config_manager.py validate

# View current settings
python3 config_manager.py show

Import Errors:

  • Ensure you're running commands from the project root directory
  • Verify all dependencies are installed: pip install -r requirements.txt

Model Not Found:

  • Train the model first: python src/model.py
  • Check model path in configuration: echo $MODEL_PATH

Port Already in Use:

# Use different port
export WEB_PORT=8502
streamlit run interface/website.py

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes and test: python3 test_system.py
  4. Validate configuration: python3 config_manager.py validate
  5. Submit a pull request

License

This project is open source and available under the MIT License.


For detailed configuration options, see CONFIGURATION.md
