DocuFlow-AI is a high-performance Retrieval-Augmented Generation (RAG) engine designed for speed and precision.
Unlike complex agentic workflows that prioritize reasoning and web search, DocuFlow is built on the streamlined ConversationalRetrievalChain. It focuses strictly on your uploaded Knowledge Base (PDF, CSV, TXT, MD) to deliver instant, hallucination-free answers with lower latency. It features transparent source verification, incremental indexing, and strict language control.
Powered by ConversationalRetrievalChain, DocuFlow removes the overhead of "Agent reasoning loops."
- Direct Retrieval: Connects your question directly to the most relevant document chunks.
- Low Latency: Optimized for rapid Q&A, making it ideal for quick document analysis and data extraction.
- Stability: Reduces the risk of "looping" errors common in complex agents.
Trust but verify. DocuFlow provides transparency for every answer:
- Source Citations: Click the "🔍 View Reference Context" expander to see exactly which document and page the answer came from.
- Similarity Scores: Displays the relevance score for each retrieved chunk, allowing you to gauge the confidence of the AI's retrieval.
- Smart Processing: The app tracks filenames (
processed_filesstate). If you add a new file to an existing batch, it only processes the new file without re-embedding the old ones. - Optimization: Saves time and API quota by appending to the existing FAISS vector store instead of rebuilding it from scratch.
Features a custom "Translator-Researcher" prompt injection.
- User Control: Select English or Indonesian in the sidebar.
- Behavior: The AI ignores the source document's language and forces the final output into your selected language (e.g., querying an English Journal but getting the answer in Indonesian).
- Chat History Export: Download your entire conversation analysis as a JSON file.
- Robust Error Handling: Specifically handles API Quotas (429), Invalid Keys, and Corrupt Files with user-friendly toast notifications and error messages.
- LLM: Google Gemini 2.5 Flash.
- Framework: Streamlit.
- Orchestration: LangChain (
ConversationalRetrievalChain). - Vector Database: FAISS (In-memory).
- Embeddings: GoogleGenerativeAIEmbeddings (
models/gemini-embedding-001). - File Handling:
PyPDFLoader,CSVLoader,TextLoader.
-
Clone the Repository
git clone https://github.com/viochris/DocuFlow-AI.git cd DocuFlow-AI -
Install Dependencies
pip install -r requirements.txt
-
Run the Application
streamlit run app.py
- Setup:
- Get your API Key from Google AI Studio.
- Enter the key in the sidebar.
- Configuration:
- Choose Response Language (English/Indonesian).
- Adjust Creativity Level (Lower for facts, Higher for creative summaries).
- Build Knowledge Base:
- Upload your documents (PDF, CSV, MD, or TXT).
- Click "🚀 Process & Embed Documents".
- Wait for the "Successfully added" toast notification.
- Chat & Verify:
- Ask questions like "What are the main risks mentioned?" or "Summarize the table in the CSV".
- Expand "🔍 View Reference Context" below the answer to audit the source evidence.
- Export:
- Click "📥 Download Chat History" to save your insights JSON.
- Session Volatility: Since FAISS is stored in RAM (Session State), refreshing the browser will clear the document index.
- Context Window: Extremely large files may be split into chunks; answers depend on the relevance of the retrieved
k=3chunks. - No Web Search: This engine is strictly limited to your documents for maximum privacy and accuracy. It does not search the internet.

The clean, high-performance landing page. Designed for quick file uploads and immediate processing.

Comprehensive sidebar for API security, model tuning, and multi-file ingestion.

The core experience: Instant answers accompanied by the "View Reference Context" panel, showing source text and similarity scores for full transparency.
A detailed view of the "View Reference Context" expander. It provides full transparency by showing the exact document excerpts and similarity scores used by the AI to generate its answer.
Author: Silvio Christian, Joe "Instant answers from your files. No overhead."