A demonstration project featuring a beautiful MkDocs documentation site with an embedded chat assistant powered by a custom RAG (Retrieval-Augmented Generation) pipeline using Google Gemini.
Overview
This project showcases an intelligent documentation assistant that allows users to ask questions in natural language and receive answers sourced directly from the documentation, complete with citations. The system combines modern web documentation with advanced AI retrieval techniques to create an interactive learning experience.
Key Features:
- 📚 Beautiful Documentation Site - MkDocs with Material theme
- 💬 AI Chat Assistant - Natural language Q&A powered by Gemini
- 🔍 RAG Pipeline - Custom retrieval using embeddings and vector search
- 📎 Source Citations - Every answer includes cited documentation sections
- 🎯 Multiple Models - Support for Gemini, Groq Llama, and Mixtral
Architecture
The system implements a complete RAG (Retrieval-Augmented Generation) pipeline: documentation is chunked and embedded at ingestion time; at query time the user's question is embedded with the same model, the closest chunks are retrieved from the vector store, and the retrieved context is passed to the LLM, which generates a cited answer.
Technology Stack
Frontend:
- MkDocs with Material for MkDocs theme
- Vanilla JavaScript for chat interface
- Responsive design with light/dark mode
Backend:
- FastAPI (Python 3.12+)
- Google Gemini API (embeddings + generation)
- ChromaDB for vector storage
- PostgreSQL + pgvector for production
Infrastructure:
- Google Cloud Platform (Cloud Run, Firebase Hosting)
- Docker containerization
- Automated CI/CD deployment
Key Components
Document Ingestion
The system processes markdown documentation by:
- Scanning all .md files in the configured docs directory
- Chunking documents by headers with configurable overlap
- Generating embeddings using Gemini's embedding-001 model
- Storing vectors and metadata in ChromaDB
Retrieval System
When a user asks a question:
- The query is embedded using the same Gemini model
- Semantic search finds the top-k most relevant chunks
- Retrieved context is formatted with source metadata
- The complete prompt is sent to Gemini 2.5 Flash
- The response includes both the answer and citations
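The retrieval steps above reduce to a small amount of code. This dependency-free sketch uses cosine similarity over in-memory vectors as a stand-in for ChromaDB's search; build_prompt is a hypothetical formatter showing how source metadata travels with each chunk so the model can cite it:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec: list[float], store: list[dict], k: int = 3) -> list[dict]:
    """Return the top-k stored chunks ranked by similarity to the query."""
    ranked = sorted(store, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:k]

def build_prompt(question: str, chunks: list[dict]) -> str:
    """Format retrieved chunks with their sources so the answer can cite them."""
    context = "\n\n".join(f"[source: {c['source']}]\n{c['text']}" for c in chunks)
    return (
        "Answer using only the context below, and cite the sources you use.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The resulting prompt string is what gets sent to Gemini 2.5 Flash in the final step.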
Chat Interface
The frontend provides:
- Clean, intuitive chat interface
- Model selection (Gemini, Groq Llama, Mixtral)
- Real-time streaming responses
- Clickable source citations linking back to documentation
- Mobile-responsive design
Learning Resources
The project includes comprehensive Jupyter notebooks for learning RAG concepts:
1. Local RAG (No Cloud Required) - Build RAG from scratch using HuggingFace embeddings and FAISS, running entirely locally to understand fundamentals.
2. Vertex AI RAG Engine - Leverage Google Cloud's managed RAG service for production-ready deployments with minimal code.
These notebooks provide a progressive learning path from basic concepts to production deployment.
API Endpoints
The FastAPI backend exposes several endpoints:
- POST /api/chat - Chat with documentation
- GET /api/models - List available AI models
- POST /api/reindex - Rebuild vector index when docs change
- GET /health - Health check
- GET /docs - Interactive API documentation (Swagger UI)
Reindexing System
A key feature is the ability to automatically reindex documentation when content changes. The reindexing process:
- Clears the existing vector store
- Scans for all markdown files
- Generates fresh embeddings
- Updates the searchable index
This can be triggered via API endpoint or scheduled as part of CI/CD pipelines.
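The four reindexing steps map onto a function like the one below; this is a self-contained sketch in which the list-based store and embed callback stand in for the real ChromaDB collection and Gemini embedding call:

```python
from pathlib import Path

def reindex(docs_dir: str, store: list, embed) -> int:
    """Rebuild the index: clear old vectors, rescan docs, generate fresh embeddings."""
    store.clear()                                   # 1. clear the existing vector store
    for path in sorted(Path(docs_dir).rglob("*.md")):  # 2. scan for all markdown files
        text = path.read_text(encoding="utf-8")
        store.append({                              # 3. fresh embeddings,
            "source": str(path),                    # 4. updated searchable index
            "text": text,
            "vector": embed(text),
        })
    return len(store)
```

Wrapped in the POST /api/reindex handler, this is what a CI/CD step would trigger after a docs deploy.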
Deployment
The project includes production deployment configurations for:
Backend (Cloud Run):
gcloud builds submit --tag gcr.io/PROJECT_ID/mkdocs-rag-backend
gcloud run deploy mkdocs-rag-backend \
--image gcr.io/PROJECT_ID/mkdocs-rag-backend \
--platform managed \
--allow-unauthenticated
Frontend (Firebase Hosting):
mkdocs build
firebase deploy
Technical Highlights
Semantic Search
Unlike traditional keyword search, the system uses semantic embeddings to understand the meaning of queries and retrieve contextually relevant information, even when exact keywords don't match.
Chunking Strategy
Documents are split intelligently by headers with overlap to maintain context. This ensures that retrieved chunks contain complete, coherent information rather than arbitrary text fragments.
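One way to realize header splitting with overlap is to prefix each chunk with the tail of the previous one. This is a sketch of the idea, not the project's exact chunker; the overlap parameter and its character-based granularity are illustrative assumptions:

```python
def chunk_with_overlap(text: str, overlap: int = 100) -> list[str]:
    """Split markdown at heading lines, then prefix each chunk with the last
    `overlap` characters of the previous section to preserve context."""
    sections, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    chunks = []
    for i, section in enumerate(sections):
        prefix = sections[i - 1][-overlap:] if i > 0 else ""
        chunks.append(prefix + ("\n" if prefix else "") + section)
    return chunks
```

Production chunkers often measure overlap in tokens rather than characters, but the trade-off is the same: a little redundancy buys coherent retrieved chunks.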
Context Window Management
The system carefully manages the context window by:
- Retrieving only the top-k most relevant chunks
- Formatting context efficiently
- Including metadata for proper citation
- Balancing between context richness and token limits
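The budgeting step above can be sketched as a greedy loop over the relevance-ranked chunks. The word-count approximation of token cost is an assumption for illustration; a real system would use the model's tokenizer:

```python
def fit_context(chunks: list[dict], max_tokens: int = 2000) -> list[dict]:
    """Keep adding top-ranked chunks until the rough token budget is spent.
    Chunks are assumed to arrive ordered by relevance; token cost is
    approximated as whitespace-separated word count."""
    kept, used = [], 0
    for chunk in chunks:
        cost = len(chunk["text"].split())
        if used + cost > max_tokens:
            break                     # budget exhausted: stop adding context
        kept.append(chunk)
        used += cost
    return kept
```

Because the loop stops at the first chunk that would overflow, the most relevant context is always the last to be sacrificed.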
Multi-Model Support
The architecture supports multiple LLM providers, allowing users to choose between:
- Gemini 2.5 Flash - Fast, cost-effective, Google's latest
- Groq Llama 3.1 - Open-source alternative with fast inference
- Mixtral - Mixture-of-experts model for complex reasoning
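Multi-provider support typically comes down to a dispatch table keyed by model name. The registry below is a hypothetical sketch; the lambdas stand in for the real Gemini and Groq API clients, and the model keys are illustrative:

```python
# Hypothetical provider registry; the real backend wires these entries
# to actual Gemini and Groq API clients.
PROVIDERS = {
    "gemini-2.5-flash": lambda prompt: f"[gemini] {prompt}",
    "llama-3.1": lambda prompt: f"[groq] {prompt}",
    "mixtral": lambda prompt: f"[mixtral] {prompt}",
}

def generate(model: str, prompt: str) -> str:
    """Dispatch a prompt to the selected provider, defaulting to Gemini."""
    handler = PROVIDERS.get(model, PROVIDERS["gemini-2.5-flash"])
    return handler(prompt)
```

Keeping the dispatch behind one function means the chat endpoint and the frontend model picker never need to know provider-specific details.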
Future Enhancements
The HybridRetriever class provides an extension point for:
- Web-grounded search - Fallback to Google Search when docs lack information
- User feedback loop - Improve retrieval based on user ratings
- Conversation history - Multi-turn conversations with context awareness
- Advanced chunking - Document hierarchy and relationship preservation
Key Learnings
Building this project provided hands-on experience with:
- RAG Pipeline Design - Understanding the trade-offs between retrieval quality, latency, and cost
- Vector Databases - Practical experience with embedding storage and similarity search
- LLM Integration - Working with modern language models and prompt engineering
- Full-Stack Development - Integrating AI capabilities into a production web application
- Cloud Deployment - Deploying and scaling ML-powered applications on GCP
Project Links
- GitHub Repository: https://github.com/ThamuMnyulwa/mkdocs_rag
- Documentation: Includes comprehensive README and Jupyter notebooks
- Technologies: Python, FastAPI, MkDocs, Google Gemini, ChromaDB, Docker, GCP
Conclusion
This project demonstrates a practical implementation of RAG technology, making technical documentation more accessible through conversational AI. It showcases the entire pipeline from document ingestion through embedding generation to intelligent retrieval and response generation, providing a template for building similar documentation assistants.
The combination of beautiful static documentation with dynamic AI-powered assistance creates a superior user experience, especially for large documentation sites where finding specific information can be challenging.