MkDocs RAG Documentation Assistant

A demonstration project featuring a beautiful MkDocs documentation site with an embedded chat assistant powered by a custom RAG (Retrieval-Augmented Generation) pipeline using Google Gemini.

Overview

This project showcases an intelligent documentation assistant that allows users to ask questions in natural language and receive answers sourced directly from the documentation, complete with citations. The system combines modern web documentation with advanced AI retrieval techniques to create an interactive learning experience.

Key Features:

  • 📚 Beautiful Documentation Site - MkDocs with Material theme
  • 💬 AI Chat Assistant - Natural language Q&A powered by Gemini
  • 🔍 RAG Pipeline - Custom retrieval using embeddings and vector search
  • 📎 Source Citations - Every answer includes cited documentation sections
  • 🎯 Multiple Models - Support for Gemini, Groq Llama, and Mixtral

Architecture

The system implements a complete RAG (Retrieval-Augmented Generation) pipeline with the following flow:

graph TD
    A("User Question") --> B("Frontend (MkDocs)")
    B --> C("Backend API (FastAPI)")
    C --> D("Query Embedding (Gemini)")
    D --> E("Vector Search (ChromaDB)")
    E --> F("Retrieve Top-K Chunks")
    F --> G("Build Prompt + Context")
    G --> H("Gemini Generate Answer")
    H --> I("Return Answer + Citations")

Technology Stack

Frontend:

  • MkDocs with Material for MkDocs theme
  • Vanilla JavaScript for chat interface
  • Responsive design with light/dark mode

Backend:

  • FastAPI (Python 3.12+)
  • Google Gemini API (embeddings + generation)
  • ChromaDB for vector storage
  • PostgreSQL + pgvector for production

Infrastructure:

  • Google Cloud Platform (Cloud Run, Firebase Hosting)
  • Docker containerization
  • Automated CI/CD deployment

Key Components

Document Ingestion

The system processes markdown documentation by:

  1. Scanning all .md files in the configured docs directory
  2. Chunking documents by headers with configurable overlap
  3. Generating embeddings using Gemini’s embedding-001 model
  4. Storing vectors and metadata in ChromaDB
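
The ingestion steps above can be sketched as follows. This is a minimal, self-contained illustration, not the project's actual code: `embed` is a toy placeholder standing in for a Gemini embedding call, the store is a plain list rather than ChromaDB, and chunking here is a naive split on `## ` headers.

```python
from pathlib import Path

def chunk_by_headers(text: str):
    """Naively split markdown into chunks, one per '## ' header."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("## ") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

def embed(text: str):
    """Placeholder for a Gemini embedding call; returns a toy 2-d vector."""
    return [float(len(text)), float(text.count(" "))]

def ingest(docs_dir: str):
    """Scan .md files, chunk them, embed each chunk, and store with metadata."""
    store = []
    for path in sorted(Path(docs_dir).rglob("*.md")):
        for i, chunk in enumerate(chunk_by_headers(path.read_text())):
            store.append({
                "id": f"{path.name}#{i}",      # stable id: filename + chunk index
                "source": str(path),           # metadata used later for citations
                "text": chunk,
                "embedding": embed(chunk),
            })
    return store
```

In the real pipeline, the `store.append` step would instead call ChromaDB's `collection.add` with the same ids, embeddings, documents, and metadata.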

Retrieval System

When a user asks a question:

  1. The query is embedded using the same Gemini model
  2. Semantic search finds the top-k most relevant chunks
  3. Retrieved context is formatted with source metadata
  4. The complete prompt is sent to Gemini 2.5 Flash
  5. The response includes both the answer and citations
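
The retrieval steps reduce to ranking stored chunks by similarity and formatting them into a prompt. A minimal sketch, assuming chunk records shaped like those produced at ingestion time (the similarity search is done in pure Python here; ChromaDB performs it internally in the real system):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, store, k=3):
    """Return the top-k chunks most similar to the query embedding."""
    ranked = sorted(store, key=lambda r: cosine(query_vec, r["embedding"]),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, chunks):
    """Format retrieved chunks with source metadata so the model can cite them."""
    context = "\n\n".join(f"[{c['source']}]\n{c['text']}" for c in chunks)
    return ("Answer using only the context below and cite sources in [brackets].\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```

The resulting prompt string is what gets sent to the generation model, with the bracketed source tags enabling the citation step.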

Chat Interface

The frontend provides:

  • Clean, intuitive chat interface
  • Model selection (Gemini, Groq Llama, Mixtral)
  • Real-time streaming responses
  • Clickable source citations linking back to documentation
  • Mobile-responsive design

Learning Resources

The project includes comprehensive Jupyter notebooks for learning RAG concepts:

1. Local RAG (No Cloud Required) - Build RAG from scratch using HuggingFace embeddings and FAISS, running entirely locally to understand fundamentals.

2. Vertex AI RAG Engine - Leverage Google Cloud’s managed RAG service for production-ready deployments with minimal code.

These notebooks provide a progressive learning path from basic concepts to production deployment.

API Endpoints

The FastAPI backend exposes several endpoints:

  • POST /api/chat - Chat with documentation
  • GET /api/models - List available AI models
  • POST /api/reindex - Rebuild vector index when docs change
  • GET /health - Health check
  • GET /docs - Interactive API documentation (Swagger UI)

Reindexing System

A key feature is the ability to automatically reindex documentation when content changes. The reindexing process:

  • Clears the existing vector store
  • Scans for all markdown files
  • Generates fresh embeddings
  • Updates the searchable index

This can be triggered via API endpoint or scheduled as part of CI/CD pipelines.
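
The reindex flow is essentially "clear, rescan, re-embed". A simplified sketch (one chunk per file for brevity, with the embedding function injected so any backend can be swapped in; names here are illustrative, not the project's actual API):

```python
from pathlib import Path

def reindex(store: list, docs_dir: str, embed) -> int:
    """Clear the vector store, then rebuild it from the current markdown files.
    Returns the number of documents indexed."""
    store.clear()  # drop stale vectors so removed docs disappear from results
    for path in sorted(Path(docs_dir).rglob("*.md")):
        text = path.read_text()
        store.append({"source": str(path), "text": text, "embedding": embed(text)})
    return len(store)
```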

Deployment

The project includes production deployment configurations for:

Backend (Cloud Run):

gcloud builds submit --tag gcr.io/PROJECT_ID/mkdocs-rag-backend
gcloud run deploy mkdocs-rag-backend \
  --image gcr.io/PROJECT_ID/mkdocs-rag-backend \
  --platform managed \
  --allow-unauthenticated

Frontend (Firebase Hosting):

mkdocs build
firebase deploy

Technical Highlights

Unlike traditional keyword search, the system uses semantic embeddings to understand the meaning of queries and retrieve contextually relevant information, even when exact keywords don’t match.

Chunking Strategy

Documents are split intelligently by headers with overlap to maintain context. This ensures that retrieved chunks contain complete, coherent information rather than arbitrary text fragments.
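
One way to implement header-aware splitting with overlap is to cut at header lines and carry a short tail of the previous section into the next chunk. This is an illustrative sketch of the idea, not the project's exact chunker:

```python
import re

def split_with_overlap(markdown: str, overlap_chars: int = 100):
    """Split markdown at header lines, prepending the tail of the previous
    section to each chunk so boundary context is preserved."""
    # Split just before any line starting with 1-6 '#' characters.
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown)
    sections = [p.strip() for p in parts if p.strip()]
    chunks = []
    for i, section in enumerate(sections):
        if i > 0 and overlap_chars:
            tail = sections[i - 1][-overlap_chars:]
            chunks.append(tail + "\n" + section)
        else:
            chunks.append(section)
    return chunks
```

A character-based tail is the simplest form of overlap; sentence- or token-based overlap trades a little more code for cleaner boundaries.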

Context Window Management

The system carefully manages the context window by:

  • Retrieving only the top-k most relevant chunks
  • Formatting context efficiently
  • Including metadata for proper citation
  • Balancing context richness against token limits
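
The budget-balancing step can be sketched as greedy packing: take chunks in ranked order until the budget is spent. The word-count approximation here is a deliberate simplification; a production system would use the model's actual tokenizer.

```python
def pack_context(chunks, max_tokens=2000):
    """Greedily keep top-ranked chunks until the rough token budget is spent.
    Token cost is approximated by word count."""
    selected, used = [], 0
    for chunk in chunks:
        cost = len(chunk.split())
        if used + cost > max_tokens:
            break  # stop rather than skip, so ranking order is respected
        selected.append(chunk)
        used += cost
    return selected
```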

Multi-Model Support

The architecture supports multiple LLM providers, allowing users to choose among:

  • Gemini 2.5 Flash - Fast, cost-effective, Google’s latest
  • Groq Llama 3.1 - Open-source alternative with fast inference
  • Mixtral - Mixture-of-experts model for complex reasoning
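
Multi-provider support typically comes down to a dispatch table mapping model names to provider calls. A minimal sketch with placeholder generators standing in for the real Gemini and Groq API clients (model keys here are illustrative):

```python
def gemini_generate(prompt: str) -> str:
    # Placeholder for a Gemini API call.
    return f"[gemini] {prompt[:20]}"

def groq_generate(prompt: str) -> str:
    # Placeholder for a Groq API call (serves Llama and Mixtral).
    return f"[groq] {prompt[:20]}"

MODELS = {
    "gemini-2.5-flash": gemini_generate,
    "llama-3.1": groq_generate,
    "mixtral": groq_generate,
}

def generate(model: str, prompt: str) -> str:
    """Route a prompt to the provider registered for the chosen model."""
    try:
        return MODELS[model](prompt)
    except KeyError:
        raise ValueError(f"unknown model: {model}")
```

Adding a provider is then a one-line registry entry, which is what keeps the frontend's model selector and the backend's `/api/models` endpoint in sync.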

Future Enhancements

The HybridRetriever class provides an extension point for:

  • Web-grounded search - Fallback to Google Search when docs lack information
  • User feedback loop - Improve retrieval based on user ratings
  • Conversation history - Multi-turn conversations with context awareness
  • Advanced chunking - Document hierarchy and relationship preservation

Key Learnings

Building this project provided hands-on experience with:

  1. RAG Pipeline Design - Understanding the trade-offs between retrieval quality, latency, and cost
  2. Vector Databases - Practical experience with embedding storage and similarity search
  3. LLM Integration - Working with modern language models and prompt engineering
  4. Full-Stack Development - Integrating AI capabilities into a production web application
  5. Cloud Deployment - Deploying and scaling ML-powered applications on GCP

Conclusion

This project demonstrates a practical implementation of RAG technology, making technical documentation more accessible through conversational AI. It showcases the entire pipeline from document ingestion through embedding generation to intelligent retrieval and response generation, providing a template for building similar documentation assistants.

The combination of beautiful static documentation with dynamic AI-powered assistance creates a superior user experience, especially for large documentation sites where finding specific information can be challenging.

Reference:

MkDocs RAG GitHub Repository