This project is a multimodal voice-activated AI assistant named "Ringo", built with Python. It uses natural speech recognition, image analysis, clipboard extraction, and generative AI models to respond to user prompts.
You activate it by saying "Ringo" followed by your command.
- Voice command recognition with wake word ("Ringo")
- Automatic webcam image capture, screenshot, or clipboard extraction based on prompt context
- Image context interpretation using Google Gemini vision API
- Clipboard text analysis
- Generative responses using Groq (LLaMA3) or OpenAI (GPT-4)
- Text-to-speech response with OpenAI TTS (echo voice)
- Real-time Whisper speech-to-text with FasterWhisper
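The wake-word feature can be illustrated with a small sketch. It assumes the speech-to-text step has already produced a transcript string; the function name extract_prompt and the regular expression are illustrative and not taken from the project's code.

```python
import re
from typing import Optional

WAKE_WORD = "ringo"  # wake word named in this README


def extract_prompt(transcript: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the text spoken after the wake word, or None if there is none."""
    # Match the wake word as a whole word (case-insensitive), skip any
    # punctuation or spaces that follow it, and capture the rest.
    pattern = rf"\b{re.escape(wake_word)}\b[\s,.:;!?-]*(.*)"
    match = re.search(pattern, transcript, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return None
    prompt = match.group(1).strip()
    return prompt or None


if __name__ == "__main__":
    print(extract_prompt("Ringo, what do you see on my screen?"))
```

Transcripts that do not contain the wake word return None, so background speech can simply be ignored.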
- Install dependencies:
  Run this command in your virtual environment: pip install -r requirements.txt
- Install system-level dependencies:
  - Windows users may need to install portaudio and PyAudio:
    - Download PyAudio wheel: https://pypi.org/project/pyaudio-wheels
    - Install with pip: pip install path_to_downloaded_whl_file
- Set your API keys: Edit the main Python file and replace the placeholders with your actual API keys:
  - GROQ API Key
  - Google Gemini API Key
  - OpenAI API Key
- Run the assistant: python main.py (use your own file name if you renamed the main script)
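If you would rather not paste keys into the source file, one possible alternative is to read them from environment variables. This is only a sketch: the variable names below are assumptions, not names the project defines, and the main script would need to be adapted to call it.

```python
import os

# Hypothetical environment variable names; the project itself uses
# in-file placeholders for the three keys.
KEY_NAMES = ("GROQ_API_KEY", "GOOGLE_API_KEY", "OPENAI_API_KEY")


def load_api_keys(env=os.environ) -> dict:
    """Return the three API keys, failing early if any is missing or empty."""
    missing = [name for name in KEY_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {name: env[name] for name in KEY_NAMES}
```

Failing at startup with the list of missing names is usually easier to debug than an authentication error in the middle of a voice session.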
Once it starts, say "Ringo" followed by your prompt.
- "Ringo, what’s on my clipboard?"
- "Ringo, what do you see on my screen?"
- "Ringo, describe my current appearance from webcam."
- "Ringo, summarize this code from clipboard."
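The example prompts above each imply a different input source. The sketch below shows one simple way to pick the source from keywords in the prompt. It is a stand-in for illustration only: the project may decide this differently (for example by asking a language model), and the function name and return values are hypothetical.

```python
def choose_source(prompt: str) -> str:
    """Guess which input the prompt refers to, using plain keyword matching."""
    text = prompt.lower()
    if "clipboard" in text:
        return "clipboard"
    if "screen" in text:  # also matches "screenshot"
        return "screenshot"
    if "webcam" in text or "camera" in text:
        return "webcam"
    return "none"  # plain conversation, no extra context needed


if __name__ == "__main__":
    for example in (
        "what's on my clipboard?",
        "what do you see on my screen?",
        "describe my current appearance from webcam.",
    ):
        print(example, "->", choose_source(example))
```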
main.py................... Main assistant logic
requirements.txt.......... All required Python packages
README.txt................ This file
screenshot.jpg............ Temporary screenshot image
webcam.jpg................ Temporary webcam image
prompt.wav................ Temporary audio file for whisper STT
- Python 3.10+
- Working microphone and optionally a webcam
- Stable internet connection
- Compatible audio drivers (for PyAudio and SpeechRecognition)
- Microphone errors can occur if the device is in use or misconfigured.
- Webcam must be accessible by OpenCV (cv2.VideoCapture(0)).
- Ensure clipboard contains text before using "clipboard" mode.
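To check the webcam note above before starting the assistant, a small standalone test such as the following may help. It is a sketch, not part of the project: the helper name webcam_available is hypothetical, and it assumes OpenCV was installed through requirements.txt.

```python
def webcam_available(index: int = 0) -> bool:
    """Return True if OpenCV can open the camera and read one frame."""
    try:
        import cv2  # imported lazily so a missing install gives a clear message
    except ImportError:
        print("OpenCV (cv2) is not installed; run: pip install -r requirements.txt")
        return False

    cap = cv2.VideoCapture(index)
    try:
        if not cap.isOpened():
            return False
        ok, _frame = cap.read()  # ok is False when no frame could be grabbed
        return bool(ok)
    finally:
        cap.release()  # always free the device so other apps can use it


if __name__ == "__main__":
    print("Webcam OK" if webcam_available() else "Webcam not accessible")
```

If this reports that the webcam is not accessible, close other applications that might be holding the camera and try again.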
- OpenAI API (TTS + GPT)
- Google Gemini Vision API
- Groq LLaMA3
- FasterWhisper
- PyAudio, SpeechRecognition, OpenCV, Pillow
MIT License