Quantal AI
A RAG assistant over millions of documents with evaluation and guardrails.
- Client: Quantal
- Year: 2025
- Duration: 4 months
- Our role: AI Strategy · Engineering
The project
Quantal's teams were drowning in documents. They needed an AI assistant that could answer accurately from millions of internal files — safely and economically.
The challenge
Generic AI hallucinated and leaked context. Quantal needed grounded, cited answers, strict access control, and predictable cost at enterprise scale.
Our solution
A retrieval-augmented assistant with hybrid search, citations, an automated evaluation harness, guardrails, and per-team access control — deployed privately on Quantal's cloud.
Technologies
The stack behind it
Frontend
Next.js · React · TypeScript
Backend
Python · FastAPI
Database
PostgreSQL · pgvector
Cloud
AWS (private VPC)
DevOps
Docker · GitHub Actions
AI Integration
OpenAI · LangChain · evals
Third-party APIs
Internal document stores
Security
Per-team access · no public training
Architecture
RAG with hybrid search + guardrails
How it came together
Business problem
Teams couldn't find answers across millions of documents — and generic AI wasn't trustworthy.
Research
We profiled the document corpus, access rules and the questions teams actually asked.
Strategy
Ground every answer in retrieval, cite sources, and measure quality continuously.
Design process
A chat UI built around trust — inline citations and clear sourcing.
Development
Hybrid RAG, guardrails, an evaluation harness and per-team access control.
Testing
Automated evals on a labelled question set, plus red-teaming for safety.
Deployment
Private deployment on Quantal's AWS VPC with monitoring.
Results
A 99.2% eval pass rate and a median response time under 500 ms across 4M+ indexed documents.
Lessons learned
Evaluation-as-code turned 'it feels right' into a measurable, improvable number.
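To make "evaluation-as-code" concrete, here is a minimal sketch of what a labelled question set and a pass-rate check could look like. It is an illustration, not Quantal's actual harness: the names `EvalCase`, `Answer`, `passes`, `pass_rate` and `stub_assistant` are hypothetical, the pass criterion (expected phrase present and expected source cited) is an assumption, and the stub stands in for the real assistant.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    """One labelled question (hypothetical schema)."""
    question: str
    must_mention: str  # phrase a correct answer is expected to contain
    must_cite: str     # source a grounded answer is expected to cite


@dataclass(frozen=True)
class Answer:
    text: str
    citations: tuple[str, ...]


def passes(case: EvalCase, answer: Answer) -> bool:
    """Assumed criterion: the expected phrase appears and the expected source is cited."""
    mentioned = case.must_mention.lower() in answer.text.lower()
    cited = case.must_cite in answer.citations
    return mentioned and cited


def pass_rate(cases: Sequence[EvalCase], assistant: Callable[[str], Answer]) -> float:
    """Run every labelled case through the assistant and return the fraction that pass."""
    if not cases:
        raise ValueError("the labelled question set is empty")
    passed = sum(passes(case, assistant(case.question)) for case in cases)
    return passed / len(cases)


def stub_assistant(question: str) -> Answer:
    """Stand-in for the real assistant, with canned answers for the demo."""
    canned = {
        "How long is parental leave?": Answer(
            "Parental leave is 16 weeks.", ("hr-policy.pdf",)
        ),
        # Correct wording but no citation, so this case should fail.
        "When are expense reports due?": Answer("They are due monthly.", ()),
    }
    return canned.get(question, Answer("I don't know.", ()))


CASES = [
    EvalCase("How long is parental leave?", "16 weeks", "hr-policy.pdf"),
    EvalCase("When are expense reports due?", "monthly", "finance-handbook.pdf"),
]

if __name__ == "__main__":
    print(f"pass rate: {pass_rate(CASES, stub_assistant):.1%}")
```

Run in CI, a check like this can block a release when the pass rate drops below an agreed threshold, which is what turns answer quality into a number a team can track.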
- A trustworthy chat UI with inline citations
- Clear sourcing so users can verify every answer
- Admin view for evaluation and cost monitoring
Documents are chunked and embedded into pgvector. Hybrid (vector + keyword) retrieval feeds the model context that carries its source citations, guardrails and evals run on every response, and access is enforced per team.
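The retrieval step described above can be sketched in a few lines. This is a simplified, in-memory illustration under stated assumptions, not the production code: the real system stores embeddings in pgvector, whereas here `Chunk`, `hybrid_search`, the single `team` field per chunk, the word-overlap keyword score and the use of reciprocal rank fusion to merge the two rankings are all hypothetical choices made for the example.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    team: str                     # assumption: each chunk belongs to one team
    text: str
    embedding: tuple[float, ...]  # toy vector; a real one comes from an embedding model


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query: str, text: str) -> int:
    """Toy keyword signal: number of distinct query words found in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def hybrid_search(
    chunks: list[Chunk],
    query: str,
    query_embedding: tuple[float, ...],
    user_teams: set[str],
    k: int = 3,
    rrf_k: int = 60,
) -> list[Chunk]:
    # Enforce access control before ranking, so chunks from other teams
    # can never reach the model's context.
    visible = [c for c in chunks if c.team in user_teams]

    by_vector = sorted(
        visible, key=lambda c: cosine(query_embedding, c.embedding), reverse=True
    )
    by_keyword = sorted(
        visible, key=lambda c: keyword_score(query, c.text), reverse=True
    )

    # Reciprocal rank fusion: each ranking contributes 1 / (rrf_k + rank).
    fused: dict[str, float] = {}
    for ranking in (by_vector, by_keyword):
        for rank, chunk in enumerate(ranking, start=1):
            fused[chunk.chunk_id] = fused.get(chunk.chunk_id, 0.0) + 1.0 / (rrf_k + rank)

    return sorted(visible, key=lambda c: fused[c.chunk_id], reverse=True)[:k]


CHUNKS = [
    Chunk("hr-1", "hr", "parental leave policy lasts sixteen weeks", (1.0, 0.0)),
    Chunk("hr-2", "hr", "expense reports are due monthly", (0.0, 1.0)),
    Chunk("fin-1", "finance", "parental leave budget forecast", (1.0, 0.0)),
]

if __name__ == "__main__":
    hits = hybrid_search(CHUNKS, "parental leave policy", (0.9, 0.1), {"hr"})
    print([c.chunk_id for c in hits])
```

Filtering by team before ranking, rather than after, is the design choice worth noting: a chunk the user may not see is never scored, so it cannot leak into the prompt or the citations.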
Quantal teams now get accurate, cited answers in seconds — with measured quality and controlled cost.
99.2% eval pass rate
<500 ms median response
4M+ documents indexed
4 months to production
“Their AI work is the real thing: evaluated, guard-railed and in production.”