MkDocs RAG Documentation Assistant

A demonstration project featuring a beautiful MkDocs documentation site with an embedded chat assistant powered by a custom RAG (Retrieval-Augmented Generation) pipeline using Google Gemini.

Overview

This project showcases an intelligent documentation assistant that allows users to ask questions in natural language and receive answers sourced directly from the documentation, complete with citations. The system combines modern web documentation with advanced AI retrieval techniques to create an interactive learning experience.

Key Features:

  • 📚 Beautiful Documentation Site - MkDocs with Material theme
  • 💬 AI Chat Assistant - Natural language Q&A powered by Gemini
  • 🔍 RAG Pipeline - Custom retrieval using embeddings and vector search
  • 📎 Source Citations - Every answer includes cited documentation sections
  • 🎯 Multiple Models - Support for Gemini, Groq Llama, and Mixtral

Architecture

The system implements a complete RAG (Retrieval-Augmented Generation) pipeline with the following flow:

graph TD
    A("User Question") --> B("Frontend (MkDocs)")
    B --> C("Backend API (FastAPI)")
    C --> D("Query Embedding (Gemini)")
    D --> E("Vector Search (ChromaDB)")
    E --> F("Retrieve Top-K Chunks")
    F --> G("Build Prompt + Context")
    G --> H("Gemini Generate Answer")
    H --> I("Return Answer + Citations")

Technology Stack

Frontend:

  • MkDocs with Material for MkDocs theme
  • Vanilla JavaScript for chat interface
  • Responsive design with light/dark mode

Backend:

  • FastAPI (Python 3.12+)
  • Google Gemini API (embeddings + generation)
  • ChromaDB for vector storage
  • PostgreSQL + pgvector for production

Infrastructure:

  • Google Cloud Platform (Cloud Run, Firebase Hosting)
  • Docker containerization
  • Automated CI/CD deployment

Key Components

Document Ingestion

The system processes markdown documentation by:

  1. Scanning all .md files in the configured docs directory
  2. Chunking documents by headers with configurable overlap
  3. Generating embeddings using Gemini’s embedding-001 model
  4. Storing vectors and metadata in ChromaDB
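
The ingestion steps above can be sketched as follows. This is a minimal, self-contained illustration, not the project's actual code: `embed` is a toy placeholder standing in for a Gemini embedding call, the store is a plain list rather than ChromaDB, and chunking here is a naive split on `## ` headers.

```python
from pathlib import Path

def chunk_by_headers(text: str):
    """Naively split markdown into chunks, one per '## ' header."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("## ") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

def embed(text: str):
    """Placeholder for a Gemini embedding call; returns a toy 2-d vector."""
    return [float(len(text)), float(text.count(" "))]

def ingest(docs_dir: str):
    """Scan .md files, chunk them, embed each chunk, and store with metadata."""
    store = []
    for path in sorted(Path(docs_dir).rglob("*.md")):
        for i, chunk in enumerate(chunk_by_headers(path.read_text())):
            store.append({
                "id": f"{path.name}#{i}",      # stable id: filename + chunk index
                "source": str(path),           # metadata used later for citations
                "text": chunk,
                "embedding": embed(chunk),
            })
    return store
```

In the real pipeline, the `store.append` step would instead call ChromaDB's `collection.add` with the same ids, embeddings, documents, and metadata.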

Retrieval System

When a user asks a question:

  1. The query is embedded using the same Gemini model
  2. Semantic search finds the top-k most relevant chunks
  3. Retrieved context is formatted with source metadata
  4. The complete prompt is sent to Gemini 2.5 Flash
  5. The response includes both the answer and citations
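
The retrieval steps reduce to ranking stored chunks by similarity and formatting them into a prompt. A minimal sketch, assuming chunk records shaped like those produced at ingestion time (the similarity search is done in pure Python here; ChromaDB performs it internally in the real system):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, store, k=3):
    """Return the top-k chunks most similar to the query embedding."""
    ranked = sorted(store, key=lambda r: cosine(query_vec, r["embedding"]),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, chunks):
    """Format retrieved chunks with source metadata so the model can cite them."""
    context = "\n\n".join(f"[{c['source']}]\n{c['text']}" for c in chunks)
    return ("Answer using only the context below and cite sources in [brackets].\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```

The resulting prompt string is what gets sent to the generation model, with the bracketed source tags enabling the citation step.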

Chat Interface

The frontend provides:

  • Clean, intuitive chat interface
  • Model selection (Gemini, Groq Llama, Mixtral)
  • Real-time streaming responses
  • Clickable source citations linking back to documentation
  • Mobile-responsive design

Learning Resources

The project includes comprehensive Jupyter notebooks for learning RAG concepts:

1. Local RAG (No Cloud Required) - Build RAG from scratch using HuggingFace embeddings and FAISS, running entirely locally to understand fundamentals.

2. Vertex AI RAG Engine - Leverage Google Cloud’s managed RAG service for production-ready deployments with minimal code.

These notebooks provide a progressive learning path from basic concepts to production deployment.

API Endpoints

The FastAPI backend exposes several endpoints:

  • POST /api/chat - Chat with documentation
  • GET /api/models - List available AI models
  • POST /api/reindex - Rebuild vector index when docs change
  • GET /health - Health check
  • GET /docs - Interactive API documentation (Swagger UI)

Reindexing System

A key feature is the ability to automatically reindex documentation when content changes. The reindexing process:

  • Clears the existing vector store
  • Scans for all markdown files
  • Generates fresh embeddings
  • Updates the searchable index

This can be triggered via API endpoint or scheduled as part of CI/CD pipelines.
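
The reindex flow is essentially "clear, rescan, re-embed". A simplified sketch (one chunk per file for brevity, with the embedding function injected so any backend can be swapped in; names here are illustrative, not the project's actual API):

```python
from pathlib import Path

def reindex(store: list, docs_dir: str, embed) -> int:
    """Clear the vector store, then rebuild it from the current markdown files.
    Returns the number of documents indexed."""
    store.clear()  # drop stale vectors so removed docs disappear from results
    for path in sorted(Path(docs_dir).rglob("*.md")):
        text = path.read_text()
        store.append({"source": str(path), "text": text, "embedding": embed(text)})
    return len(store)
```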

Deployment

The project includes production deployment configurations for:

Backend (Cloud Run):

gcloud builds submit --tag gcr.io/PROJECT_ID/mkdocs-rag-backend
gcloud run deploy mkdocs-rag-backend \
  --image gcr.io/PROJECT_ID/mkdocs-rag-backend \
  --platform managed \
  --allow-unauthenticated

Frontend (Firebase Hosting):

mkdocs build
firebase deploy

Technical Highlights

Unlike traditional keyword search, the system uses semantic embeddings to understand the meaning of queries and retrieve contextually relevant information, even when exact keywords don’t match.

Chunking Strategy

Documents are split intelligently by headers with overlap to maintain context. This ensures that retrieved chunks contain complete, coherent information rather than arbitrary text fragments.
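
One way to implement header-aware splitting with overlap is to cut at header lines and carry a short tail of the previous section into the next chunk. This is an illustrative sketch of the idea, not the project's exact chunker:

```python
import re

def split_with_overlap(markdown: str, overlap_chars: int = 100):
    """Split markdown at header lines, prepending the tail of the previous
    section to each chunk so boundary context is preserved."""
    # Split just before any line starting with 1-6 '#' characters.
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown)
    sections = [p.strip() for p in parts if p.strip()]
    chunks = []
    for i, section in enumerate(sections):
        if i > 0 and overlap_chars:
            tail = sections[i - 1][-overlap_chars:]
            chunks.append(tail + "\n" + section)
        else:
            chunks.append(section)
    return chunks
```

A character-based tail is the simplest form of overlap; sentence- or token-based overlap trades a little more code for cleaner boundaries.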

Context Window Management

The system carefully manages the context window by:

  • Retrieving only the top-k most relevant chunks
  • Formatting context efficiently
  • Including metadata for proper citation
  • Balancing context richness against token limits
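
The budget-balancing step can be sketched as greedy packing: take chunks in ranked order until the budget is spent. The word-count approximation here is a deliberate simplification; a production system would use the model's actual tokenizer.

```python
def pack_context(chunks, max_tokens=2000):
    """Greedily keep top-ranked chunks until the rough token budget is spent.
    Token cost is approximated by word count."""
    selected, used = [], 0
    for chunk in chunks:
        cost = len(chunk.split())
        if used + cost > max_tokens:
            break  # stop rather than skip, so ranking order is respected
        selected.append(chunk)
        used += cost
    return selected
```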

Multi-Model Support

The architecture supports multiple LLM providers, allowing users to choose among:

  • Gemini 2.5 Flash - Fast, cost-effective, Google’s latest
  • Groq Llama 3.1 - Open-source alternative with fast inference
  • Mixtral - Mixture-of-experts model for complex reasoning
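
Multi-provider support typically comes down to a dispatch table mapping model names to provider calls. A minimal sketch with placeholder generators standing in for the real Gemini and Groq API clients (model keys here are illustrative):

```python
def gemini_generate(prompt: str) -> str:
    # Placeholder for a Gemini API call.
    return f"[gemini] {prompt[:20]}"

def groq_generate(prompt: str) -> str:
    # Placeholder for a Groq API call (serves Llama and Mixtral).
    return f"[groq] {prompt[:20]}"

MODELS = {
    "gemini-2.5-flash": gemini_generate,
    "llama-3.1": groq_generate,
    "mixtral": groq_generate,
}

def generate(model: str, prompt: str) -> str:
    """Route a prompt to the provider registered for the chosen model."""
    try:
        return MODELS[model](prompt)
    except KeyError:
        raise ValueError(f"unknown model: {model}")
```

Adding a provider is then a one-line registry entry, which is what keeps the frontend's model selector and the backend's `/api/models` endpoint in sync.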

Future Enhancements

The HybridRetriever class provides an extension point for:

  • Web-grounded search - Fallback to Google Search when docs lack information
  • User feedback loop - Improve retrieval based on user ratings
  • Conversation history - Multi-turn conversations with context awareness
  • Advanced chunking - Document hierarchy and relationship preservation

Key Learnings

Building this project provided hands-on experience with:

  1. RAG Pipeline Design - Understanding the trade-offs between retrieval quality, latency, and cost
  2. Vector Databases - Practical experience with embedding storage and similarity search
  3. LLM Integration - Working with modern language models and prompt engineering
  4. Full-Stack Development - Integrating AI capabilities into a production web application
  5. Cloud Deployment - Deploying and scaling ML-powered applications on GCP

Conclusion

This project demonstrates a practical implementation of RAG technology, making technical documentation more accessible through conversational AI. It showcases the entire pipeline from document ingestion through embedding generation to intelligent retrieval and response generation, providing a template for building similar documentation assistants.

The combination of beautiful static documentation with dynamic AI-powered assistance creates a superior user experience, especially for large documentation sites where finding specific information can be challenging.

Reference:

MkDocs RAG GitHub Repository