Case Study

Generative AI

Multimodal AI Chat Platform

A chat platform handling text, image, voice and document inputs across three frontier models — built in eight weeks.

Next.js, TypeScript, OpenAI, Anthropic, LangChain

At a glance

What this shipped

The numbers that mattered to the client — measured before and after.

First-token latency: <800ms
Cost per conversation: −38%
Build duration: 8 weeks
Models supported: 4

The problem

What we were called in to fix

The client, an AI startup, needed a single product where users could chat across modalities — paste an image and ask about it, transcribe a voice note, upload a PDF and query it — using whichever of GPT-4o, Claude or Gemini was best for the job.

Their existing prototype supported a single model and text-only input. Latency ran to several seconds, conversation memory was broken, and adding a new modality meant rewriting half the stack.

Our approach

How we actually built it

No magic — just the right architectural calls in the right order.

We rebuilt the message pipeline around a provider-agnostic interface, so model selection, model swapping and cost routing all happen in one place.
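To make the idea concrete, here is a minimal sketch of what a provider-agnostic interface could look like. It is not the production code: `ChatModel`, `ModelRegistry`, `stubModel` and the model ids are hypothetical names chosen for illustration, and the stub stands in for real SDK calls.

```typescript
// Illustrative sketch only: every name here is hypothetical.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// The single contract every model adapter implements.
interface ChatModel {
  readonly id: string;
  complete(messages: ChatMessage[]): Promise<string>;
}

// One place to register, look up and swap models.
class ModelRegistry {
  private readonly models = new Map<string, ChatModel>();

  register(model: ChatModel): void {
    this.models.set(model.id, model);
  }

  get(id: string): ChatModel {
    const model = this.models.get(id);
    if (!model) throw new Error(`Unknown model: ${id}`);
    return model;
  }

  ids(): string[] {
    return Array.from(this.models.keys());
  }
}

// Stub adapter standing in for a real provider SDK call.
function stubModel(id: string): ChatModel {
  return {
    id,
    complete: async (messages) => {
      const last = messages[messages.length - 1];
      return `[${id}] echo: ${last ? last.content : ""}`;
    },
  };
}

const registry = new ModelRegistry();
["model-a", "model-b", "model-c"].forEach((id) => registry.register(stubModel(id)));

// Adding another model is one more adapter plus one registration.
registry.register(stubModel("model-d"));
```

With this shape, the chat loop only ever calls `registry.get(id).complete(...)`, so nothing downstream needs to know which provider sits behind an id.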

Streaming-first architecture: tokens render the moment the LLM emits them. No spinner-on-spinner experience.
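A rough sketch of the streaming idea, assuming the provider response is exposed as an async iterable of tokens. `fakeModelStream` and `streamReply` are hypothetical names; the fake stream stands in for a real provider call, and the first-token timing is measured in-process only.

```typescript
// Stand-in for a provider's streaming response (hypothetical).
async function* fakeModelStream(reply: string): AsyncGenerator<string> {
  let first = true;
  for (const word of reply.split(" ")) {
    await Promise.resolve(); // simulate tokens arriving asynchronously
    yield first ? word : " " + word;
    first = false;
  }
}

// Forwards each token to the UI callback as soon as it arrives,
// and records how long the first token took.
async function streamReply(
  stream: AsyncIterable<string>,
  onToken: (token: string) => void,
): Promise<{ text: string; firstTokenMs: number | null }> {
  const started = Date.now();
  let firstTokenMs: number | null = null;
  let text = "";
  for await (const token of stream) {
    if (firstTokenMs === null) firstTokenMs = Date.now() - started;
    onToken(token); // render immediately rather than waiting for the full reply
    text += token;
  }
  return { text, firstTokenMs };
}
```

In a real UI, `onToken` would append to the visible message. The same loop works unchanged for any provider whose adapter yields tokens this way.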

Modality handlers (text, image, audio transcription, document RAG) compose cleanly — each one is a small, testable module rather than a special case in the chat loop.
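One way such handlers might compose, sketched under assumptions: each handler is a small function that either claims an input or passes on it. The `ChatInput` shape and handler names are hypothetical. The sketch also assumes transcription and document retrieval have already run upstream, and it is synchronous for brevity where real handlers would likely be async.

```typescript
// Hypothetical input shapes, one per modality.
type ChatInput =
  | { kind: "text"; text: string }
  | { kind: "image"; url: string }
  | { kind: "audio"; transcript: string }
  | { kind: "document"; filename: string; excerpt: string };

// A handler returns a prompt fragment, or undefined if the input is not its modality.
type ModalityHandler = (input: ChatInput) => string | undefined;

const textHandler: ModalityHandler = (input) =>
  input.kind === "text" ? input.text : undefined;

const imageHandler: ModalityHandler = (input) =>
  input.kind === "image" ? `[image: ${input.url}]` : undefined;

const audioHandler: ModalityHandler = (input) =>
  input.kind === "audio" ? `[voice note] ${input.transcript}` : undefined;

const documentHandler: ModalityHandler = (input) =>
  input.kind === "document" ? `[document ${input.filename}] ${input.excerpt}` : undefined;

const handlers: ModalityHandler[] = [textHandler, imageHandler, audioHandler, documentHandler];

// The chat loop asks each handler in turn and never special-cases a modality.
function toPromptPart(input: ChatInput, registered: ModalityHandler[] = handlers): string {
  for (const handler of registered) {
    const part = handler(input);
    if (part !== undefined) return part;
  }
  throw new Error(`No handler for modality: ${input.kind}`);
}

function buildPrompt(inputs: ChatInput[]): string {
  return inputs.map((input) => toPromptPart(input)).join("\n");
}
```

Because each handler is a plain function, it can be unit-tested in isolation. Supporting a new modality means adding one variant to `ChatInput` and one handler to the list.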

Conversation memory is durable and queryable in Postgres, with a summary-on-overflow strategy so long threads stay coherent without blowing the context window.
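A simplified, in-memory sketch of a summary-on-overflow strategy. Postgres persistence is left out, the four-characters-per-token estimate is a crude assumption rather than a real tokenizer, and `fitToBudget` and the `Summarizer` callback are hypothetical names. In practice the summarizer would probably be a model call.

```typescript
interface Turn {
  role: "system" | "user" | "assistant";
  content: string;
}

// Crude heuristic (assumption): roughly 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Produces a short summary of older turns; stubbed in this sketch.
type Summarizer = (turns: Turn[]) => string;

// If the thread exceeds the budget, fold everything except the most
// recent turns into a single summary turn at the front.
function fitToBudget(
  turns: Turn[],
  maxTokens: number,
  keepRecent: number,
  summarize: Summarizer,
): Turn[] {
  const total = turns.reduce((sum, turn) => sum + estimateTokens(turn.content), 0);
  if (total <= maxTokens || turns.length <= keepRecent) return turns;

  const older = turns.slice(0, turns.length - keepRecent);
  const recent = turns.slice(turns.length - keepRecent);
  const summary: Turn = {
    role: "system",
    content: `Summary of earlier conversation: ${summarize(older)}`,
  };
  return [summary, ...recent];
}

// Stub summarizer for illustration only.
const countingSummarizer: Summarizer = (turns) => `${turns.length} earlier turns omitted`;
```

The full history can stay in storage untouched. Only the view sent to the model is compacted, so nothing is lost if the summary later needs to be regenerated.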

The outcome

What changed for the client

First-token latency under 800ms across all three providers, end-to-end.

Added Claude 3.5 Sonnet as a fourth model behind the same interface in under two days.

Reduced inference cost per conversation by 38% via smart model routing — cheap models for cheap turns, frontier models when complexity demands it.
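For illustration, routing of this kind might be sketched as below. The heuristic (attachments, message length, a keyword check) and the model ids are assumptions made up for this example. The real routing rules are not described here and may differ considerably.

```typescript
type Tier = "cheap" | "frontier";

interface IncomingTurn {
  text: string;
  hasAttachment: boolean;
}

// Placeholder ids, not real model names.
const MODEL_FOR_TIER: Record<Tier, string> = {
  cheap: "small-model",
  frontier: "frontier-model",
};

// Hypothetical complexity check: long messages, attachments, or
// reasoning-style requests go to the frontier tier.
function pickTier(turn: IncomingTurn): Tier {
  const isLong = turn.text.length > 500;
  const asksForReasoning = /\b(why|explain|compare|analy[sz]e|debug|prove)\b/i.test(turn.text);
  return turn.hasAttachment || isLong || asksForReasoning ? "frontier" : "cheap";
}

function routeTurn(turn: IncomingTurn): string {
  return MODEL_FOR_TIER[pickTier(turn)];
}
```

A production router would more likely be tuned against measured quality and cost data than rely on fixed keywords, but the decision point stays the same: one function, called once per turn.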

Tech stack

Every meaningful piece

Next.js 15, TypeScript, tRPC, OpenAI, Anthropic, Google Gemini, LangChain, pgvector, PostgreSQL, Redis, AWS

We don't do generic case-study writeups. Want the unredacted version with names, screenshots and architecture diagrams? We share those on a call.
