A specialized automation system and AI assistant built for the IIT Madras Tools in Data Science (TDS) course.
The most accurate TDS assignment solver with guaranteed results
Live Demo • API Documentation • GitHub Repository
- Overview
- Key Features
- Architecture
- How It Works
- Assignment Coverage
- Technology Stack
- Project Structure
- Quick Start
- Docker Deployment
- API Reference
- Advanced Features
- Contributing
- License
- Author
Vicky is a hybrid intelligent system designed to assist students with the IIT Madras Tools in Data Science (TDS) course. Unlike generic AI wrappers, this project utilizes a deterministic pattern-matching engine to ensure 100% accuracy for assignment submissions, while leveraging Groq LLaMA 3.1-70B for general conversational assistance.
| Deterministic Accuracy | Sub-second Response | Zero Hallucinations |
|---|---|---|
| Guaranteed correct answers | Lightning-fast processing | Rule-based execution |
- Assignment Solver: Automatically solves questions from GA1 to GA5 using 55+ hardcoded logical functions
- Intelligent Chat: Separate module using Groq LLaMA 3.1-70B for conceptual doubts and feedback
- Multi-Platform Notifications: Real-time integration with Discord, Slack, and Telegram
- High Performance: Sub-second query processing with a localized backend
- Containerized: Ready-to-deploy Docker setup
- Web Interface: Responsive HTML5 frontend with vanilla JavaScript
- Sophisticated Pattern Matching: Hierarchical matching system with domain classification
- Intelligent File Management: Content-based file identification with signature verification
- Real-time Notifications: Webhook integrations for instant feedback
- Robust Error Handling: Graceful fallbacks and comprehensive logging
- Performance Monitoring: Built-in metrics and processing time tracking
- Secure Processing: Isolated execution environment with proper validation
The core of this application is NOT a hallucinating AI. It is a strict rule-based engine for assignments, ensuring deterministic and accurate results.
```mermaid
graph TB
    subgraph "User Interface"
        A[Web Frontend] --> B[API Gateway]
        C[REST API] --> B
    end

    subgraph "Core Engine"
        B --> D[Question Router]
        D --> E[Pattern Matcher]
        E --> F{Question Type?}
    end

    subgraph "Solver Functions"
        F -->|GA1-GA5| G[55+ Specialized Functions]
        F -->|Unknown| H[LLM Fallback]
    end

    subgraph "Data Processing"
        G --> I[File Manager]
        I --> J[Content Signatures]
        G --> K[HTTP Client]
    end

    subgraph "Output"
        G --> L[Deterministic Answer]
        H --> M[Conversational Response]
        L --> N[Notification System]
    end
```
- Deterministic Execution: Every question maps to a specific, tested function
- Hierarchical Matching: Multi-stage pattern recognition for accuracy
- Modular Design: Clean separation between routing, processing, and output
- Error Resilience: Comprehensive error handling with graceful degradation
- Performance Optimized: Sub-second response times with efficient algorithms
```mermaid
graph TD
    A[User Question] --> B[Input Validation]
    B --> C[Pattern Analysis]
    C --> D[Domain Classification]
    D --> E[Similarity Scoring]
    E --> F{Match Found?}
    F -->|Yes| G[Execute Specific Solver]
    F -->|No| H[Return Error Message]
    G --> I[Process with File Manager]
    I --> J[Return Deterministic Answer]
```
1. Input Reception: Question received via API endpoint with optional file upload
2. Pattern Matching: Hierarchical analysis using domain classification and similarity scoring
3. Function Routing: Question mapped to one of 55+ specialized solver functions
4. Execution: Deterministic processing with proper error handling
5. Response: Formatted JSON output with metadata and processing statistics
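As a rough sketch of how these five stages might hang together in a FastAPI endpoint (illustrative only; the real `/api/vicky` handler in `vicky_app.py` and the matcher in `vicky_server.py` are far more elaborate, and the tiny registry below is just a stand-in):

```python
import time
from typing import Callable, Optional
from fastapi import FastAPI, Form

app = FastAPI()

# Tiny stand-in registry; the real project maps questions to 55+ solvers in vicky_server.py.
SOLVERS: dict[str, Callable[[str], str]] = {
    "visual studio code": lambda q: "placeholder answer for the VS Code question",
}

def find_best_question_match(question: str) -> Optional[Callable[[str], str]]:
    """Stand-in matcher: a plain keyword lookup instead of the real hierarchical matching."""
    q = question.lower()
    return next((fn for kw, fn in SOLVERS.items() if kw in q), None)

@app.post("/api/vicky")
async def solve(question: str = Form(...)) -> dict:
    start = time.time()
    solver = find_best_question_match(question)   # stages 1-3: receive, match, route
    if solver is not None:
        answer = solver(question)                 # stage 4: deterministic execution
    else:
        answer = "I couldn't find a matching question in the TDS assignment system."
    return {                                      # stage 5: JSON output with metadata
        "answer": answer,
        "metadata": {
            "processing_time_seconds": round(time.time() - start, 2),
            "api_version": "1.0",
        },
    }
```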
The system uses a sophisticated multi-stage matching algorithm:
- Stage 1: Direct pattern detection (high-confidence matches)
- Stage 2: Domain classification with weighted scoring
- Stage 3: Semantic similarity with keyword analysis
- Stage 4: Fallback to conversational AI for unmatched queries
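The snippet below is a hedged sketch of what such staged matching can look like; the patterns, domains, and thresholds are made up for illustration, and the real logic lives in `vicky_server.py` and `vickys.json`:

```python
import re
from difflib import SequenceMatcher

# Hypothetical examples; the real patterns live in vickys.json.
DIRECT_PATTERNS = {
    r"uv run --with httpie": "ga1_second_solution",
    r"\bprettier\b": "ga1_third_solution",
}
DOMAIN_KEYWORDS = {
    "ga1_eighth_solution": {"zip", "csv", "extract"},
    "ga2_image_solution": {"image", "compress", "pixels"},
}

def match_question(question: str, threshold: float = 0.6) -> str | None:
    q = question.lower()

    # Stage 1: direct regex patterns (high-confidence matches)
    for pattern, solver_name in DIRECT_PATTERNS.items():
        if re.search(pattern, q):
            return solver_name

    # Stage 2: domain classification by weighted keyword overlap
    scores = {
        name: sum(1 for kw in keywords if kw in q) / len(keywords)
        for name, keywords in DOMAIN_KEYWORDS.items()
    }
    best, score = max(scores.items(), key=lambda kv: kv[1])
    if score >= threshold:
        return best

    # Stage 3: fuzzy similarity against stored question text
    known = {"ga1_fourth_solution": "google sheets formula sequence sum"}
    for name, text in known.items():
        if SequenceMatcher(None, q, text).ratio() >= threshold:
            return name

    # Stage 4: no match -- the caller falls back to the conversational LLM
    return None
```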
The system natively supports the following graded assignments:
| Assignment | Solvers | Key Topics | Accuracy |
|---|---|---|---|
| GA1 | 18 Functions | VS Code, Git, JSON/CSV sorting, File processing | 100% |
| GA2 | 10 Functions | Image compression, Docker, API integration | 100% |
| GA3 | 9 Functions | Web scraping, HTTP requests, Data extraction | 100% |
| GA4 | 10 Functions | BeautifulSoup (IMDb), Wikipedia API, Weather data | 100% |
| GA5 | 10 Functions | Advanced Data Cleaning, PDF extraction, Excel automation | 100% |
- `ga1_first_solution()` - VS Code command execution
- `ga1_second_solution()` - HTTP requests with parameters
- `ga1_third_solution()` - File hashing with Prettier
- `ga1_fourth_solution()` - Google Sheets formulas
- `ga1_fifth_solution()` - Excel formula calculations
- `ga1_sixth_solution()` - Hidden input extraction
- `ga1_seventh_solution()` - Date range calculations
- `ga1_eighth_solution()` - ZIP file CSV extraction
- Plus 10 more specialized functions...
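To give a flavour of these solvers, here is a hedged sketch in the spirit of `ga1_second_solution()` (HTTP requests with parameters); the actual function in `vicky_server.py` will differ in how it parses the question and formats the answer:

```python
import json
import re
import requests

def ga1_second_solution_sketch(question: str) -> str:
    """Send a GET request with an email parameter pulled from the question text
    and return the JSON body, mirroring what the httpie command would print."""
    url_match = re.search(r"https://\S+", question)
    email_match = re.search(r"[\w.+-]+@[\w.-]+", question)
    if not url_match or not email_match:
        return "Could not extract URL or email from the question."

    response = requests.get(url_match.group(0).rstrip("?.,"),
                            params={"email": email_match.group(0)},
                            timeout=10)
    # The assignment expects only the JSON body, pretty-printed.
    return json.dumps(response.json(), indent=2)
```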
- Image processing and compression algorithms
- Web scraping with BeautifulSoup and requests
- Data analysis with pandas and openpyxl
- API integrations with proper error handling
- File processing for multiple formats (PDF, Excel, JSON, etc.)
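As an example of the kind of processing the later assignments call for, the sketch below re-encodes an image losslessly with Pillow, roughly what an image-compression task in GA2 asks; the size limit and file handling are illustrative assumptions:

```python
from io import BytesIO
from PIL import Image

def compress_image_lossless(path: str, max_bytes: int = 1500) -> bytes:
    """Re-encode an image as PNG with maximum compression and check the result size."""
    img = Image.open(path)
    buffer = BytesIO()
    # optimize=True lets Pillow pick the smallest PNG encoding it can.
    img.save(buffer, format="PNG", optimize=True)
    data = buffer.getvalue()
    if len(data) > max_bytes:
        raise ValueError(f"Compressed size {len(data)} bytes exceeds {max_bytes} bytes")
    return data
```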
| Component | Technology | Purpose |
|---|---|---|
| Core Backend | Python 3.11+ | Primary programming language |
| Web Framework | FastAPI | High-performance API server |
| ASGI Server | Uvicorn | Production-ready server |
| Frontend | HTML5, CSS3, Vanilla JS | Responsive web interface |
| Containerization | Docker | Deployment and scaling |
| Pattern Matching | Custom Regex Engine | Question classification |
| File Processing | Multiple Libraries | ZIP, PDF, Excel, Image handling |
| HTTP Client | Requests | API integrations |
| Notifications | Webhooks | Discord, Slack, Telegram |
```text
# Core Dependencies (requirements.txt)
fastapi==0.104.1          # Web framework
uvicorn==0.24.0           # ASGI server
python-multipart==0.0.6   # File uploads
requests==2.31.0          # HTTP client
beautifulsoup4==4.12.2    # HTML parsing
pandas==2.1.4             # Data analysis
openpyxl==3.1.2           # Excel processing
Pillow==10.1.0            # Image processing
groq==0.4.1               # LLM integration
```

This project contains over 22,000 lines of code across multiple specialized modules.
```text
assistant_chatbot/
├── config/                       # Configuration files
│   ├── azure.yaml                # Azure deployment config
│   ├── docker-entrypoint.sh      # Container startup
│   ├── gunicorn.conf.py          # Production server config
│   └── nginx.conf                # Reverse proxy config
├── docs/                         # Documentation
│   ├── AZURE_DEPLOYMENT.md       # Azure deployment guide
│   ├── DOCKER.md                 # Docker setup guide
│   └── step_analysis.md          # Development notes
├── infra/                        # Infrastructure as Code
│   └── main.bicep                # Azure Bicep templates
├── src/                          # Source code
│   ├── core/                     # Core utilities
│   ├── solvers/                  # Assignment solvers
│   └── utils/                    # Helper functions
├── static/                       # Frontend assets
│   ├── index.html                # Main web interface
│   ├── css/styles.css            # Styling
│   └── js/main.js                # Frontend logic
├── templates/                    # HTML templates
├── tests/                        # Test suite
│   ├── test_api.py               # API endpoint tests
│   └── test_assignment_solver.py # Solver function tests
├── vicky_app.py                  # Main FastAPI application (7,700+ lines)
├── vicky_server.py               # Core engine & 55 solvers (14,200+ lines)
├── vickys.json                   # Question pattern database
├── requirements.txt              # Python dependencies
├── Dockerfile                    # Container configuration
├── docker-compose.yml            # Multi-container setup
├── .env.example                  # Environment template
└── README.md                     # This file
```
- `vicky_app.py`: FastAPI application with 3 main endpoints (`/ask`, `/api/`, `/api/vicky`)
- `vicky_server.py`: The "brain" - pattern matching engine + 55+ solver functions
- `vickys.json`: Database of question patterns and expected solutions
- `FileManager`: Advanced file handling system with content signatures
- Python 3.11 or higher
- Git
- Docker (optional, for containerized deployment)
1. Clone the repository

   ```bash
   git clone https://github.com/algsoch/assistant_chatbot.git
   cd assistant_chatbot
   ```

2. Set up a virtual environment (critical for dependency isolation)

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

4. Configure the environment

   Create a `.env` file in the root directory:

   ```bash
   # Required
   GROQ_API_KEY=your_groq_api_key

   # Optional - Notification Services
   DISCORD_WEBHOOK_URL=your_discord_url
   SLACK_WEBHOOK_URL=your_slack_url
   TELEGRAM_BOT_TOKEN=your_telegram_token
   ```

5. Run the server

   ```bash
   uvicorn vicky_app:app --host 0.0.0.0 --port 8000
   ```

   Visit `http://localhost:8000` to access the interface.
```bash
# Build the image
docker build -t vicky-assistant .

# Run the container
docker run -p 8000:8000 --env-file .env vicky-assistant
```

Or with Docker Compose:

```bash
docker-compose up -d

# Using docker-compose.prod.yml for production
docker-compose -f docker-compose.prod.yml up -d
```

The API provides three main endpoints for solving TDS assignment questions:
Method: POST
Content-Type: application/x-www-form-urlencoded
Request:
```bash
curl -X POST "http://localhost:8000/ask" \
  -F "question=Your TDS assignment question here"
```

Method: POST
Content-Type: application/x-www-form-urlencoded
Request:
```bash
curl -X POST "http://localhost:8000/api/" \
  -F "question=Your TDS assignment question here" \
  -F "file=@path/to/your/file"  # Optional file upload
```

Method: POST
Content-Type: application/x-www-form-urlencoded
Request:
```bash
curl -X POST "http://localhost:8000/api/vicky" \
  -F "question=Your TDS assignment question here" \
  -F "file=@path/to/your/file" \
  -F "format=json" \
  -F "notify=true"
# file is an optional upload; format sets the response format; notify enables notifications
```

Question: Send a HTTPS request to httpbin.org with email parameter
API Call:
```bash
curl -X POST "http://localhost:8000/api/vicky" \
  -F "question=Running uv run --with httpie -- https [URL] installs the Python package httpie and sends a HTTPS request to the URL.
Send a HTTPS request to https://httpbin.org/get with the URL encoded parameter email set to 24f2006438@ds.study.iitm.ac.in
What is the JSON output of the command? (Paste only the JSON body, not the headers)"
```

Response:
```json
{
  "answer": "{\n \"args\": {\n \"email\": \"24f2006438@ds.study.iitm.ac.in\"\n },\n \"headers\": {\n \"host\": \"postman-echo.com\",\n \"accept-encoding\": \"gzip, br\",\n \"accept\": \"*/*\",\n \"x-forwarded-proto\": \"https\",\n \"user-agent\": \"python-requests/2.32.5\"\n },\n \"url\": \"https://postman-echo.com/get?email=24f2006438%40ds.study.iitm.ac.in\"\n}",
  "metadata": {
    "processing_time_seconds": 0.43,
    "timestamp": "2025-11-26T00:49:38.112735",
    "api_version": "1.0"
  }
}
```

All endpoints return JSON responses with the following structure:
```json
{
  "answer": "The solution to your TDS question",
  "metadata": {
    "processing_time_seconds": 0.43,
    "timestamp": "2025-11-26T00:49:38.112735",
    "api_version": "1.0"
  }
}
```

The system automatically recognizes and solves questions from:
- GA1-GA5 Assignments (55+ specific functions)
- File Processing (ZIP, CSV, PDF, images)
- Web Scraping (HTTP requests, API calls)
- Data Analysis (Excel, JSON, SQL)
- Image Processing (compression, pixel analysis)
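For programmatic access, a minimal Python client against the endpoints documented above might look like this (the example question is illustrative):

```python
import requests

def ask_vicky(question: str, file_path: str | None = None,
              base_url: str = "http://localhost:8000") -> dict:
    """POST a question (and optionally a file) to the /api/vicky endpoint."""
    data = {"question": question, "format": "json"}
    if file_path:
        with open(file_path, "rb") as fh:
            response = requests.post(f"{base_url}/api/vicky", data=data,
                                     files={"file": fh}, timeout=60)
    else:
        response = requests.post(f"{base_url}/api/vicky", data=data, timeout=60)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    result = ask_vicky("What is the output of code -s?")
    print(result["answer"])
    print(result["metadata"]["processing_time_seconds"], "seconds")
```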
If a question cannot be matched to a known assignment:
```json
{
  "answer": "I couldn't find a matching question in the TDS assignment system. This might be a new question or the query needs to be rephrased. Please check if your question matches one of the existing TDS assignments.",
  "metadata": {
    "processing_time_seconds": 0.02,
    "timestamp": "2025-11-26T00:49:01.885728",
    "api_version": "1.0"
  }
}
```

The system uses a sophisticated hierarchical pattern matching algorithm:
- Direct Pattern Detection: High-confidence matches for specific question types
- Domain Classification: Categorizes questions by topic (VS Code, Git, Excel, etc.)
- Weighted Scoring: Combines multiple similarity metrics for accuracy
- Semantic Analysis: Understands context and intent beyond keyword matching
Advanced file handling with content-based identification:
- Content Signatures: MD5 hashing for file verification
- Multi-format Support: ZIP, PDF, Excel, CSV, JSON, images
- Remote File Handling: Automatic download and caching
- Path Resolution: Intelligent file location across multiple directories
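The content-signature idea can be sketched with `hashlib`; this is an illustration only, and the project's `FileManager` may compute and store signatures differently:

```python
import hashlib
from pathlib import Path

def content_signature(path: str | Path, chunk_size: int = 8192) -> str:
    """Return an MD5 hex digest of a file's bytes, used to identify files by content."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Example: detect whether an uploaded file matches a previously seen one.
# known_signatures = {"d41d8cd98f00b204e9800998ecf8427e": "empty_file.csv"}
# original_name = known_signatures.get(content_signature("upload.csv"))
```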
Real-time notifications via multiple platforms:
- Discord Webhooks: Instant notifications to Discord channels
- Slack Integration: Team notifications with rich formatting
- Telegram Bots: Direct messaging capabilities
- Configurable Triggers: Notifications on success/failure events
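As a point of reference, a Discord webhook notification is just an HTTP POST of a JSON payload; the sketch below is illustrative, and the project's actual notification helpers may send richer payloads and cover Slack and Telegram as well:

```python
import os
import requests

def notify_discord(message: str) -> None:
    """Send a plain-text notification to the Discord webhook configured in .env."""
    webhook_url = os.environ.get("DISCORD_WEBHOOK_URL")
    if not webhook_url:
        return  # notifications are optional; skip silently if not configured
    requests.post(webhook_url, json={"content": message}, timeout=10)

# notify_discord("GA1 question solved in 0.43s")
```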
- Response Time: Sub-second processing for most queries
- Memory Efficient: Optimized algorithms for large datasets
- Concurrent Processing: Handles multiple requests simultaneously
- Caching System: Intelligent result caching for repeated queries
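Result caching for a deterministic engine can be as simple as memoising on the normalised question text; the sketch below is an assumption about how such a cache could work, not a description of the project's implementation:

```python
from functools import lru_cache

def _solve(normalized_question: str) -> str:
    # Placeholder for the real deterministic solver dispatch.
    return f"answer for: {normalized_question}"

@lru_cache(maxsize=256)
def cached_solve(question: str) -> str:
    """Memoise answers; safe because identical questions always map to identical
    answers in a rule-based engine."""
    return _solve(" ".join(question.lower().split()))  # normalise case and whitespace
```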
- Input Validation: Comprehensive sanitization of user inputs
- File Type Verification: Strict checking of uploaded files
- Rate Limiting: Protection against abuse
- Isolated Execution: Sandboxed processing environment
We welcome contributions! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
```bash
# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
pytest

# Run linting
flake8
```

To add support for new assignment questions:
1. Add the question pattern to `vickys.json`
2. Implement the solver function in `vicky_server.py`
3. Update the routing logic in `find_best_question_match()`
4. Add appropriate tests in `tests/`
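A hedged sketch of what such an addition might look like (the GA6 name, the solver body, and the `vickys.json` entry format are all illustrative; check the existing entries for the real conventions):

```python
# vicky_server.py (illustrative names and structure)
def ga6_first_solution(question: str) -> str:
    """Hypothetical solver for a new GA6 question."""
    # ...deterministic logic for the new question goes here...
    return "computed answer"

# vickys.json (illustrative entry; follow the format of the existing records)
# {
#   "id": "ga6_first_solution",
#   "pattern": "keywords or phrasing that identify the GA6 question",
#   "solver": "ga6_first_solution"
# }
```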
This project is licensed under the MIT License - see the LICENSE file for details.
Vicky Kumar
Built with Curiosity for the IIT Madras TDS Course