03 Case Study

Enterprise RAG Knowledge Assistant

RAG-style retrieval over structured technical documents — embeddings, vector search, and prompt design that surfaces source citations on every answer.

  • Python
  • JavaScript
  • Embeddings
  • Vector Search
  • REST APIs
  • RAG

/ Outcomes

  • Reduced manual document lookup time by 35%
  • Embedding-based retrieval over a structured technical-doc corpus
  • Every answer cites the source document and page — no ungrounded responses

Overview

Engineers were losing real hours every week digging through a deep technical document corpus to answer questions a prompt could resolve in seconds — if it had access to the right context. The trap is the obvious one: a model with no grounding will confidently make things up, and once an engineer is burned by one bad answer they stop trusting the tool entirely.

I built a RAG (Retrieval Augmented Generation) assistant that retrieves passages from the corpus first, then asks the model to answer only from what it retrieved, citing the source document on every response.

Approach

The architecture is three layers, each independently testable:

  • Ingestion. Parse the technical docs, split them into semantically meaningful chunks (not by character count — by section / paragraph), embed each chunk with a sentence-embedding model, and store the vectors alongside their source metadata.
  • Retrieval. At query time, embed the question, run vector search against the chunk index, return the top-k most relevant chunks with their source pointers.
  • Generation. Compose a prompt that gives the model the question and the retrieved chunks with instructions: answer from these chunks only, cite the source on each claim, say “I don’t know” when the chunks don’t cover the question. A sketch of this query-time flow follows the list.
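
A minimal sketch of that query-time path (retrieval, then grounded generation). The `sentence-transformers` model, the in-memory cosine search, and the `generate` callable are illustrative stand-ins for the production embedding model, vector store, and LLM client, which the sketch does not specify:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

SYSTEM_PROMPT = (
    "Answer only from the context chunks below. Cite the source document and "
    "page for every claim. If the chunks do not cover the question, reply: I don't know."
)

def retrieve(question, chunk_texts, chunk_vectors, chunk_meta, top_k=5):
    """Embed the question and return the top-k chunks by cosine similarity."""
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vectors @ q  # vectors are pre-normalized, so dot product = cosine
    best = np.argsort(scores)[::-1][:top_k]
    return [(chunk_texts[i], chunk_meta[i]) for i in best]

def build_prompt(question, retrieved):
    """Compose the grounded prompt: instructions, retrieved chunks with sources, question."""
    context = "\n\n".join(
        f"[{meta['source']}, p. {meta['page']}]\n{text}" for text, meta in retrieved
    )
    return f"{SYSTEM_PROMPT}\n\nContext:\n{context}\n\nQuestion: {question}"

def answer(question, index, generate):
    """`index` is (texts, vectors, meta); `generate` is whatever LLM call the deployment uses."""
    retrieved = retrieve(question, *index)
    if not retrieved:
        return {"answer": "I don't know", "citations": []}
    return {
        "answer": generate(build_prompt(question, retrieved)),
        "citations": [meta for _, meta in retrieved],
    }
```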

What I built

  • Document ingestion pipeline in Python that handles the corpus formats, splits intelligently, embeds, and writes to the vector store (sketched after this list)
  • REST API in JavaScript exposing a /query endpoint that takes a natural-language question and returns the answer + citations
  • Prompt engineering focused on grounding — system prompts that explicitly forbid ungrounded answers, examples that show the model what citation format to use, fallback messaging when retrieval returns nothing
  • Quality measurement loop — track which queries return “I don’t know” so we can identify gaps in the corpus rather than accepting low confidence as an answer (also sketched below)
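
For the ingestion side, a minimal sketch of paragraph-level chunking with source and page metadata, producing an index in the shape the query-time sketch above expects. The page-oriented input, the metadata fields, and the `sentence-transformers` model are illustrative assumptions; the real pipeline handles the corpus's actual formats and writes to a persistent vector store rather than an in-memory tuple:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

def chunk_pages(pages, source):
    """Split each page of one document into paragraph chunks, keeping (source, page)."""
    chunks = []
    for page_number, page_text in enumerate(pages, start=1):
        for paragraph in (p.strip() for p in page_text.split("\n\n")):
            if paragraph:
                chunks.append({"text": paragraph, "source": source, "page": page_number})
    return chunks

def build_index(documents):
    """documents: {source_name: [page_text, ...]}. Returns (texts, vectors, meta)."""
    chunks = [
        chunk
        for source, pages in documents.items()
        for chunk in chunk_pages(pages, source)
    ]
    texts = [c["text"] for c in chunks]
    vectors = model.encode(texts, normalize_embeddings=True)
    meta = [{"source": c["source"], "page": c["page"]} for c in chunks]
    return texts, np.asarray(vectors), meta
```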
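
And a small sketch of the quality-measurement loop: log every query that comes back as “I don’t know” and surface the most frequent ones as candidate corpus gaps. The Counter-based aggregation is an illustrative simplification, not the production implementation:

```python
from collections import Counter

class GapTracker:
    """Collects queries the assistant could not ground, to surface corpus gaps."""

    def __init__(self):
        self.unanswered = Counter()

    def record(self, question: str, answer: str) -> None:
        # The grounded prompt makes the model reply "I don't know" when retrieval
        # does not cover the question; those are the cases worth tracking.
        if answer.strip().lower().startswith("i don't know"):
            self.unanswered[question.strip().lower()] += 1

    def top_gaps(self, n: int = 10):
        """Most frequently unanswered questions, i.e. the documents worth adding next."""
        return self.unanswered.most_common(n)
```

Reviewed periodically, that list points at documents to add to the corpus rather than at prompt tweaks.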

Results

  • Manual document lookup time down 35% for the engineers using it
  • Citation requirement caught the model attempting to answer beyond its grounding multiple times during testing — those cases became corpus gaps to fill, not bugs to debug
  • The retrieval layer is the lever that matters most: prompt tuning on top of bad retrieval is wasted effort. We invested where it counted.

Lessons

RAG quality is dominated by retrieval quality. The prompt engineering matters but it’s a multiplier on what retrieval gives you, not a substitute. If the right chunks aren’t in the top-k, no prompt rescues that. Keep the chunking semantically aware, keep the embeddings up to date with the corpus, and the rest of the system gets a lot easier.