A High-Performance RAG Engine using Streamlit, LangChain, & Gemini 2.5 Flash. Built on ConversationalRetrievalChain for instant, precise document analysis (PDF, CSV, MD, TXT) without agentic overhead.


⚡ DocuFlow-AI (Fast Document Insights)


📌 Overview

DocuFlow-AI is a high-performance Retrieval-Augmented Generation (RAG) engine designed for speed and precision.

Unlike complex agentic workflows that prioritize multi-step reasoning and web search, DocuFlow is built on the streamlined ConversationalRetrievalChain. It answers strictly from your uploaded knowledge base (PDF, CSV, TXT, MD), grounding every answer in the source documents while keeping latency low. It also features transparent source verification, incremental indexing, and strict language control.

✨ Key Features

⚡ Streamlined RAG Architecture

Powered by ConversationalRetrievalChain, DocuFlow removes the overhead of agent reasoning loops.

  • Direct Retrieval: Connects your question directly to the most relevant document chunks.
  • Low Latency: Optimized for rapid Q&A, making it ideal for quick document analysis and data extraction.
  • Stability: Reduces the risk of "looping" errors common in complex agents.
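For readers unfamiliar with this pattern, a minimal wiring sketch follows. It is not the repository's actual app.py: the model and embedding names and the k=3 retriever setting come from this README, while the memory setup, the sample document, and the question are assumptions. Running it requires a valid GOOGLE_API_KEY plus the langchain, langchain-community, langchain-google-genai, and faiss-cpu packages.

```python
# Sketch only, not the repository's code. Assumes GOOGLE_API_KEY is set.
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

# Stand-in for the chunks produced by the document loaders.
chunks = [Document(page_content="Example passage from an uploaded file.")]

embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
vectorstore = FAISS.from_documents(chunks, embeddings)

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# No agent loop: the question goes straight to retrieval, then to the LLM.
chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    memory=memory,
)
result = chain.invoke({"question": "What are the main risks mentioned?"})
```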

🔍 Explainable AI (Verification)

Trust but verify. DocuFlow provides transparency for every answer:

  • Source Citations: Click the "🔍 View Reference Context" expander to see exactly which document and page the answer came from.
  • Similarity Scores: Displays the relevance score for each retrieved chunk, allowing you to gauge the confidence of the AI's retrieval.

🧠 Cost-Efficient Incremental Indexing

  • Smart Processing: The app tracks filenames (processed_files state). If you add a new file to an existing batch, it only processes the new file without re-embedding the old ones.
  • Optimization: Saves time and API quota by appending to the existing FAISS vector store instead of rebuilding it from scratch.
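The incremental check itself is simple set logic. A minimal sketch, assuming `processed_files` mirrors the app's session-state set of already-embedded filenames (the exact state handling in app.py may differ):

```python
# Sketch of the incremental-indexing check: only filenames not seen
# before are embedded and appended to the existing FAISS store.

def select_new_files(uploaded_names, processed_files):
    """Return only the filenames that have not been embedded yet."""
    return [name for name in uploaded_names if name not in processed_files]

processed = {"report.pdf", "data.csv"}          # already embedded
uploads = ["report.pdf", "notes.md", "data.csv"]  # current upload batch

new_files = select_new_files(uploads, processed)  # only "notes.md" remains
processed.update(new_files)  # record it so the next batch skips it too
```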

🌐 Strict Language Enforcement

Features a custom "Translator-Researcher" prompt injection.

  • User Control: Select English or Indonesian in the sidebar.
  • Behavior: The AI ignores the source document's language and forces the final output into your selected language (e.g., querying an English Journal but getting the answer in Indonesian).
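The enforcement can be done entirely in the prompt. A hedged sketch of a "Translator-Researcher" style template; the actual wording in app.py is not shown here, this only illustrates how the selected language is injected:

```python
# Hypothetical prompt template: the selected output language is forced
# regardless of the language of the retrieved context.
PROMPT_TEMPLATE = (
    "You are a researcher answering strictly from the provided context.\n"
    "Regardless of the language of the source documents, answer ONLY in "
    "{language}.\n\nContext:\n{context}\n\nQuestion: {question}"
)

def build_prompt(language, context, question):
    return PROMPT_TEMPLATE.format(
        language=language, context=context, question=question
    )

prompt = build_prompt(
    "Indonesian", "...retrieved chunks...", "What are the main risks?"
)
```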

🛠️ Advanced Session Management

  • Chat History Export: Download your entire conversation analysis as a JSON file.
  • Robust Error Handling: Specifically handles API Quotas (429), Invalid Keys, and Corrupt Files with user-friendly toast notifications and error messages.
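Both features reduce to small pieces of plumbing. A sketch of the JSON export and the error mapping; the message schema and the exact error wording are assumptions, not copied from app.py:

```python
import json

# Assumed chat-history schema: a list of role/content records.
history = [
    {"role": "user", "content": "What are the main risks mentioned?"},
    {"role": "assistant", "content": "The document lists three risks: ..."},
]

# Serialized payload handed to the download button.
export = json.dumps(history, indent=2, ensure_ascii=False)

# Hypothetical mapping from failure modes to user-friendly messages
# (the app's actual wording may differ).
ERROR_MESSAGES = {
    429: "API quota exceeded - please wait or switch to another key.",
    401: "Invalid API key - check your Google AI Studio credentials.",
}
```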

🛠️ Tech Stack

  • LLM: Google Gemini 2.5 Flash.
  • Framework: Streamlit.
  • Orchestration: LangChain (ConversationalRetrievalChain).
  • Vector Database: FAISS (In-memory).
  • Embeddings: GoogleGenerativeAIEmbeddings (models/gemini-embedding-001).
  • File Handling: PyPDFLoader, CSVLoader, TextLoader.
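Loader selection is presumably a dispatch on file extension. A sketch using the loader names from the list above; the real app would instantiate the LangChain classes, which are represented here by name only:

```python
import os

# Hypothetical extension-to-loader dispatch table, by class name.
LOADER_BY_EXT = {
    ".pdf": "PyPDFLoader",
    ".csv": "CSVLoader",
    ".txt": "TextLoader",
    ".md": "TextLoader",  # Markdown is read as plain text
}

def loader_for(filename):
    """Pick the loader name for a file, defaulting to plain text."""
    ext = os.path.splitext(filename)[1].lower()
    return LOADER_BY_EXT.get(ext, "TextLoader")
```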

📦 Installation

  1. Clone the Repository

    git clone https://github.com/viochris/DocuFlow-AI.git
    cd DocuFlow-AI
  2. Install Dependencies

    pip install -r requirements.txt
  3. Run the Application

    streamlit run app.py

🚀 Usage Guide

  1. Setup:
    • Enter your Google API key in the sidebar.
  2. Configuration:
    • Choose Response Language (English/Indonesian).
    • Adjust Creativity Level (Lower for facts, Higher for creative summaries).
  3. Build Knowledge Base:
    • Upload your documents (PDF, CSV, MD, or TXT).
    • Click "🚀 Process & Embed Documents".
    • Wait for the "Successfully added" toast notification.
  4. Chat & Verify:
    • Ask questions like "What are the main risks mentioned?" or "Summarize the table in the CSV".
    • Expand "🔍 View Reference Context" below the answer to audit the source evidence.
  5. Export:
    • Click "📥 Download Chat History" to save your insights as a JSON file.

⚠️ Limitations

  • Session Volatility: Since FAISS is stored in RAM (Session State), refreshing the browser will clear the document index.
  • Context Window: Large files are split into many chunks; answer quality depends on whether the k=3 retrieved chunks contain the relevant passage.
  • No Web Search: This engine is strictly limited to your documents for maximum privacy and accuracy. It does not search the internet.

📷 Gallery

1. Landing Interface

Home UI
The clean, high-performance landing page. Designed for quick file uploads and immediate processing.

2. Configuration & Uploads

Sidebar Settings
Comprehensive sidebar for API security, model tuning, and multi-file ingestion.

3. Analysis with Verification

Active Session
The core experience: Instant answers accompanied by the "View Reference Context" panel, showing source text and similarity scores for full transparency.

4. Source Verification Details

Source Reference UI A detailed view of the "View Reference Context" expander. It provides full transparency by showing the exact document excerpts and similarity scores used by the AI to generate its answer.


Author: Silvio Christian Joe · "Instant answers from your files. No overhead."
