
Production AI Agent System

Case Study — AI/ML Integration

Role: AI Architect
Duration: 6+ months
Team: 3 developers
Status: In production

The Challenge

An enterprise SaaS product needed AI-powered features to stay competitive. The team had no AI/ML expertise. Previous attempts at "adding AI" had resulted in a basic ChatGPT wrapper that hallucinated domain-specific answers and provided no real value to users. Leadership wanted AI that actually understood the product's domain.

The Approach

I architected a production AI agent system from scratch, building domain-aware intelligence layer by layer:

  1. Domain Analysis — Mapped the product's knowledge base: documentation, help articles, support tickets, feature specs. Identified the gap between what users actually asked and what a generic model could answer
  2. RAG Pipeline — Built a retrieval-augmented generation pipeline: document ingestion, chunking, embedding, vector storage, semantic search, context-augmented prompting
  3. MCP Server Configuration — Set up Model Context Protocol servers to give the AI agent structured access to product APIs, user data, and business logic — not just document search
  4. Structured Prompting — Designed a prompt engineering system using claude.md files, skills, and plans to ensure consistent, accurate, and on-brand responses
  5. Evaluation & Tuning — Built an evaluation pipeline to measure response quality, relevance, and accuracy. Iteratively tuned retrieval parameters, chunk sizes, and prompting strategies
  6. Production Deployment — Streaming responses, error handling, cost monitoring, rate limiting, and usage analytics
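The ingestion side of step 2 can be sketched as a chunking pass with overlap. This is a simplified illustration: the production pipeline chunks by tokens via a tokenizer, while here words stand in for tokens, and the default sizes are placeholders rather than the tuned values:

```typescript
// Sketch of ingestion-side chunking (step 2). Words approximate tokens;
// the real pipeline used a tokenizer and tuned sizes.
function chunkText(text: string, chunkSize = 200, overlap = 40): string[] {
  if (overlap >= chunkSize) throw new Error('overlap must be < chunkSize');
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

Overlap matters because a fact split across a chunk boundary becomes invisible to retrieval; both values were tuned against the evaluation pipeline described in step 5.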

Technical Decisions

  • Claude API over OpenAI for better reasoning and instruction following on complex domain queries
  • MCP servers for structured tool use instead of function calling — more reliable for multi-step workflows
  • Hybrid retrieval: semantic search + keyword search for better recall on technical terms
  • Cost monitoring dashboard to track per-user and per-feature AI spend
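The cost dashboard is driven by a per-request token estimate. A toy version of that kind of estimator follows; the prices are illustrative placeholders (not current API pricing) and chars/4 is a crude token heuristic, not the real tokenizer:

```typescript
// Rough per-request cost estimate. Prices are placeholders, NOT real
// API pricing; length/4 is a crude stand-in for real token counting.
const PRICE_PER_MTOK = { input: 3.0, output: 15.0 }; // USD per million tokens (assumed)

function estimateTokenCost(promptText: string, maxOutputTokens: number): number {
  const inputTokens = Math.ceil(promptText.length / 4);
  const usd =
    (inputTokens / 1_000_000) * PRICE_PER_MTOK.input +
    (maxOutputTokens / 1_000_000) * PRICE_PER_MTOK.output;
  return Number(usd.toFixed(6));
}
```

Aggregating these estimates by user and by feature is what makes per-feature spend visible before the monthly invoice does.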

Code: RAG Pipeline

The retrieval-augmented generation pipeline — semantic search with keyword fallback for technical terms:

// Simplified RAG pipeline
async function answerQuery(query: string, context: UserContext) {
  // 1. Hybrid retrieval: semantic + keyword
  const [semantic, keyword] = await Promise.all([
    vectorStore.similaritySearch(query, { k: 5 }),
    fullTextSearch(query, { boost: ['title', 'code_refs'] }),
  ])

  // 2. Deduplicate and rank by relevance
  const chunks = deduplicateAndRank([...semantic, ...keyword], {
    maxTokens: 4000,
    minScore: 0.7,
  })

  // 3. Build context-aware prompt (system instructions + message list)
  const prompt = buildPrompt({
    systemPrompt: await loadClaudeMd(context.feature),
    retrievedContext: chunks,
    userQuery: query,
    userRole: context.role,
    conversationHistory: context.history.slice(-5),
  })

  // 4. Stream response with cost tracking; the Messages API takes the
  // system prompt separately from the conversation messages
  const stream = claude.messages.stream({
    model: 'claude-sonnet-4-20250514',
    system: prompt.system,
    messages: prompt.messages,
    max_tokens: 1024,
  })

  return {
    stream,
    sources: chunks.map(c => c.metadata.source),
    estimatedCost: estimateTokenCost(prompt, 1024),
  }
}
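The `deduplicateAndRank` helper above does the merge between the two result sets. A minimal version might look like this; the `Chunk` shape, normalized scores, and precomputed token counts are assumptions of this sketch, not the production types:

```typescript
// Hypothetical chunk shape for this sketch.
interface Chunk {
  id: string;                    // stable id for dedup (e.g. source + chunk index)
  text: string;
  score: number;                 // normalized relevance in [0, 1] (assumed)
  tokens: number;                // precomputed token count (assumed)
  metadata: { source: string };
}

function deduplicateAndRank(
  results: Chunk[],
  opts: { maxTokens: number; minScore: number },
): Chunk[] {
  // Keep the highest-scoring copy of each chunk id.
  const best = new Map<string, Chunk>();
  for (const c of results) {
    const prev = best.get(c.id);
    if (!prev || c.score > prev.score) best.set(c.id, c);
  }
  // Drop low-confidence chunks, rank by score, then greedily fill the budget.
  const ranked = [...best.values()]
    .filter(c => c.score >= opts.minScore)
    .sort((a, b) => b.score - a.score);
  const out: Chunk[] = [];
  let used = 0;
  for (const c of ranked) {
    if (used + c.tokens > opts.maxTokens) break;
    out.push(c);
    used += c.tokens;
  }
  return out;
}
```

The token budget is what keeps retrieval from crowding out the conversation history in the context window; the score floor is what keeps marginally related chunks from diluting the answer.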

Results

  • 85% of queries resolved without human help
  • < 2s average response time
  • 40% reduction in support tickets
  • $0.02 average cost per query

Key Takeaways

  • RAG is not "just add embeddings" — chunk size, overlap, and retrieval strategy make or break accuracy
  • MCP servers are a game-changer for giving AI structured access to your product's data and APIs
  • An evaluation pipeline is non-negotiable — you can't improve what you can't measure
  • Cost monitoring from day one prevents surprises when usage scales
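On the evaluation point: even a small labeled set of query-to-expected-source pairs makes retrieval changes measurable instead of vibes-based. A minimal recall@k check, where the `EvalCase` shape is an assumption of this sketch and the real pipeline also scored answer quality and relevance:

```typescript
// Hypothetical eval record: one labeled query plus the pipeline's output.
interface EvalCase {
  query: string;
  expectedSource: string;      // doc id that should be retrieved
  retrievedSources: string[];  // top results actually returned, best first
}

// Fraction of cases where the expected source appears in the top-k results.
function recallAtK(cases: EvalCase[], k: number): number {
  if (cases.length === 0) return 0;
  const hits = cases.filter(c =>
    c.retrievedSources.slice(0, k).includes(c.expectedSource),
  ).length;
  return hits / cases.length;
}
```

Tracking this number across chunk-size and retrieval-parameter changes is what turned tuning from guesswork into iteration.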

Looking to add production AI to your product?

Let's Talk