bonolive ~/portfolio

RAG Knowledge Assistant

A self-hostable assistant that answers questions over internal docs with citations, using OpenAI in the cloud or Ollama on-prem.

Problem
Teams kept re-asking the same questions because answers were buried across wikis, PDFs and chat history.
Solution
A retrieval-augmented pipeline over PostgreSQL/pgvector with a FastAPI service, swappable between OpenAI and local Ollama models.
Impact
Cut time-to-answer for common internal questions from minutes of searching to a single grounded reply with sources.
Stack
  • Python
  • FastAPI
  • PostgreSQL
  • pgvector
  • OpenAI
  • Ollama
  • Redis

Context

Internal knowledge was spread across wikis, exported PDFs and months of chat history. People burned real time re-discovering answers that already existed somewhere.

Architecture

Documents are chunked, embedded and stored in PostgreSQL with pgvector. A FastAPI service handles retrieval (similarity search followed by reranking) and generation. A thin abstraction swaps the model provider between OpenAI and a local Ollama runtime, so the same deployment works in the cloud or fully on-prem. Redis caches hot queries and embeddings to keep latency low.
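To make the flow concrete, here is a minimal offline sketch of the retrieve-then-generate path with a swappable provider. It is an illustration under stated assumptions, not the project's actual code: `ModelProvider`, `FakeProvider`, `Chunk` and `answer` are hypothetical names, a plain dict stands in for Redis, an in-memory list stands in for the pgvector table, and the reranking step is left out.

```python
import math
from dataclasses import dataclass
from typing import Protocol


class ModelProvider(Protocol):
    """Hypothetical interface that an OpenAI or Ollama backend would implement."""

    def embed(self, text: str) -> list[float]: ...
    def generate(self, prompt: str) -> str: ...


class FakeProvider:
    """Toy stand-in so the sketch runs offline: letter-count embeddings, canned reply."""

    def embed(self, text: str) -> list[float]:
        vec = [0.0] * 26
        for ch in text.lower():
            if "a" <= ch <= "z":
                vec[ord(ch) - ord("a")] += 1.0
        return vec

    def generate(self, prompt: str) -> str:
        return f"(answer generated from a {len(prompt)}-character prompt)"


@dataclass
class Chunk:
    source: str  # where the chunk came from, used for the citation
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def answer(question: str, chunks: list[Chunk], provider: ModelProvider,
           cache: dict, top_k: int = 2) -> dict:
    """Return a generated answer plus the sources of the chunks it was given."""
    if question in cache:  # hot query: skip embedding and generation
        return cache[question]

    query_vec = provider.embed(question)
    # With pgvector, this ranking would typically be a query of the form
    # ORDER BY embedding <=> query_vector LIMIT k (<=> is cosine distance).
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c.embedding),
                    reverse=True)[:top_k]

    context = "\n".join(f"[{i + 1}] {c.text}" for i, c in enumerate(ranked))
    prompt = f"Answer using only these sources:\n{context}\n\nQuestion: {question}"
    result = {"answer": provider.generate(prompt),
              "sources": [c.source for c in ranked]}
    cache[question] = result
    return result


if __name__ == "__main__":
    provider = FakeProvider()
    docs = [("wiki/vacation.md", "vacation policy"), ("misc/notes.pdf", "xyz xyz")]
    store = [Chunk(src, txt, provider.embed(txt)) for src, txt in docs]
    print(answer("vacation", store, provider, cache={}, top_k=1))
```

Because `answer` depends only on the two-method interface, switching between a cloud and a local backend would mean passing a different provider object, with no change to the retrieval code.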

Details

  • Every answer cites its source chunks, so users can verify it.
  • Ingestion is incremental: only changed documents are re-embedded.
  • Provider abstraction means no lock-in: switch to local models for sensitive data.
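
One way incremental ingestion can work is to store a content hash per document and re-embed only when the hash changes. The sketch below assumes that approach; the page does not say how changes are detected, and `content_hash` and `plan_reembedding` are hypothetical names.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reembedding(documents: dict[str, str], seen_hashes: dict[str, str]) -> list[str]:
    """Return the ids of new or changed documents and record their hashes.

    documents maps document id -> current text.
    seen_hashes maps document id -> hash from the previous ingestion run
    (assumed here to be persisted somewhere, for example a table in PostgreSQL).
    """
    changed = []
    for doc_id, text in documents.items():
        digest = content_hash(text)
        if seen_hashes.get(doc_id) != digest:
            changed.append(doc_id)
            seen_hashes[doc_id] = digest
    return changed


if __name__ == "__main__":
    seen: dict[str, str] = {}
    docs = {"handbook": "v1 text", "faq": "v1 text"}
    print(plan_reembedding(docs, seen))  # first run: every document is new
    docs["faq"] = "v2 text"
    print(plan_reembedding(docs, seen))  # later run: only the edited document
```

Unchanged documents never reach the embedding call, which is where most ingestion cost would be expected to sit.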

What I’d do next

Add evaluation harnesses for answer quality and a feedback loop that promotes frequently-confirmed answers into a faster cache.