About Quelm
We built Quelm because the tooling layer for LLM-powered products has a gap that no observability dashboard, format validator, or gateway product closes: the gap between what happened and whether it was correct.
The problem we're solving
Building on LLM APIs is fundamentally different from building on any other API. A REST endpoint either works or it doesn't. An LLM endpoint returns something plausible-looking every time — even when the behaviour has silently changed, the output contradicts itself, or the model you're calling today is not the model you called last month.
In April 2025, a silent update to GPT-4o introduced extreme sycophantic behaviour within 48 hours — without a changelog entry, without a version bump, without any alert to the teams whose products broke. In August 2025, Anthropic infrastructure bugs degraded Claude's quality across 30% of production calls for several weeks before any official acknowledgement. Google silently redirected a dated model endpoint to a different build.
These are not edge cases. They are the normal operating conditions of LLM production systems in 2026. The teams building on these APIs deserve the same reliability infrastructure that every other part of the software stack takes for granted: regression testing, live monitoring, and real-time output validation. That infrastructure does not exist yet. Quelm is building it.
How the platform works
Quelm operates across three layers, each targeting a different failure mode at a different point in time. Layer 1 fires on a schedule before any user sees a change. Layer 2 fires asynchronously after every live response. Layer 3 fires inline at the exact instant of generation.
[Diagram: the Quelm reliability stack. Three layers, one SDK, no proxy.
Layer 1, regression suite: a golden set of 47 prompts is replayed on a schedule and each output is scored against a baseline by cosine similarity over the last 7 runs (45 passed, 2 failed; e.g. prompt_001 at cos_sim 0.97 passes, prompt_003 at 0.37 is flagged). 3 new prompts were auto-promoted from live traffic.
Layer 2, SDK integration: npm install quelm, then import { quelm } from '@quelm/sdk' and const client = quelm.wrap(new Anthropic({ ... })). The agent summarises 24 hours of live traffic and surfaces a provider drift signal; in the example, a fleet advisory is issued 5.4 hours before individual alerts fire.
Layer 3, certification engine: recomputes the relationships a structured output declares (output fields duration_months: 12, monthly_fee: 2500, total_value: 30000, date_start: 2026-01-15, date_end: 2026-04-15; the arithmetic 12 × 2500 = 30000 passes, date_start < date_end passes, but the 3-month date span against 12 declared months fails) and records a SHA-256 output fingerprint (sha256: 9f3a...c7e1, match).]
Our approach
SDK, not proxy
Quelm runs as a lightweight agent inside your infrastructure. No traffic routes through our servers. Your API keys stay with you. GDPR, HIPAA, and SOC 2 compliance follows from the architecture, not from policy.
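A minimal sketch of what in-process wrapping can look like, assuming a Proxy-based recorder. The wrap helper, Recorder type, and fake client below are illustrative stand-ins, not Quelm's actual implementation; the only API names taken from this page are quelm.wrap and @quelm/sdk.

```typescript
// Hypothetical sketch of in-process wrapping: the provider call goes straight
// from your process to the provider, and a local recorder sees metadata after
// the fact. Nothing is routed through a third-party server.

type Recorder = (event: { method: string; durationMs: number }) => void;

function wrap<T extends object>(client: T, record: Recorder): T {
  return new Proxy(client, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      if (typeof value !== "function") return value;
      return async (...args: unknown[]) => {
        const start = Date.now();
        // The provider call itself is unchanged and unproxied.
        const result = await value.apply(target, args);
        // Fire-and-forget: recording never blocks or alters the response.
        record({ method: String(prop), durationMs: Date.now() - start });
        return result;
      };
    },
  });
}

// Stand-in client for illustration; in practice this would be something like
// quelm.wrap(new Anthropic({ ... })).
const events: { method: string; durationMs: number }[] = [];
const fakeClient = { complete: async (prompt: string) => `echo: ${prompt}` };
const client = wrap(fakeClient, (e) => events.push(e));
```

The design choice the page describes falls out directly: because the wrapper lives in your process, keys and payloads never leave your infrastructure.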
Cross-provider
Native provider tooling cannot instrument its own silent updates. Quelm aggregates anonymised signals across customers and providers simultaneously — detecting fleet-wide drift hours before individual teams notice.
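One way a drift signal like this can be scored is cosine similarity between a stored baseline embedding and the embedding of a fresh response to the same prompt, as the regression-suite figures on this page suggest. The sketch below assumes embeddings are already available; the threshold value is illustrative, not Quelm's.

```typescript
// Sketch: score behavioural drift by comparing a current output embedding
// against a stored baseline embedding for the same fixed prompt.

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumed cut-off: a score like 0.97 passes, a score like 0.37 is flagged.
const DRIFT_THRESHOLD = 0.8;

function isDrifted(baseline: number[], current: number[]): boolean {
  return cosineSimilarity(baseline, current) < DRIFT_THRESHOLD;
}
```

Aggregating such per-prompt scores across many customers of the same provider is what turns a single noisy signal into a fleet advisory.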
External verification
Layer 3 certification is not another model call. It is a deterministic computation. The same system that generates errors cannot generate the certificate that catches them — that is the architectural guarantee.
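A sketch of what deterministic recomputation can look like for the contract fields shown earlier on this page. The field names come from that example; the specific check list and function shape are illustrative, not Quelm's actual engine.

```typescript
// Sketch of deterministic cross-field certification: recompute the
// relationships a structured output declares, with no model in the loop.

interface ContractFields {
  duration_months: number;
  monthly_fee: number;
  total_value: number;
  date_start: string; // ISO date, e.g. "2026-01-15"
  date_end: string;
}

function certify(f: ContractFields): { check: string; pass: boolean }[] {
  const start = new Date(f.date_start);
  const end = new Date(f.date_end);
  // Whole-month span between the two dates.
  const spanMonths =
    (end.getFullYear() - start.getFullYear()) * 12 +
    (end.getMonth() - start.getMonth());
  return [
    { check: "duration_months × monthly_fee = total_value",
      pass: f.duration_months * f.monthly_fee === f.total_value },
    { check: "date_start < date_end",
      pass: start < end },
    { check: "date span matches declared duration_months",
      pass: spanMonths === f.duration_months },
  ];
}
```

On the page's example (12 months at 2,500 totalling 30,000, but dates spanning January to April 2026), the arithmetic and date-order checks pass while the span check fails: exactly the kind of internally inconsistent output a model cannot be trusted to catch in itself.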
Who's behind Quelm
Georges Lieben
Co-founder
Georges has spent his career building companies at the intersection of technology, energy, and automation. He co-founded June Energy, a smart energy platform that automatically switches 20,000+ Belgian households to the best available tariff — an early exercise in deploying reliable, autonomous decision-making at scale in a regulated, high-stakes environment.
He has followed the evolution of generative AI from the beginning and has been integrating LLMs into product workflows since the early API era. His writing focuses on what he calls the shift from AI-as-conversation to AI-as-execution: the move from chatbots to autonomous operational layers. That shift is precisely what makes LLM reliability a first-order infrastructure problem — and what convinced him to build Quelm.
His companies today employ over 100 people. He is based between Antwerp and Porto.
Tiemen Schotsaert
Co-founder
Tiemen is Operations Director Property BeLux at CED, one of Europe's leading independent claims management organisations. In that role he oversees large-scale property claims processing across Belgium and Luxembourg — managing expert networks, insurer relationships, and the operational workflows that turn damage reports into settled claims.
Claims processing is an ideal stress test for LLM reliability: outputs are structured, errors have direct financial and legal consequences, and silent failures — a misread date, a misattributed amount, an internally inconsistent summary — can propagate through insurer systems for weeks before anyone notices. Tiemen brings the domain perspective that shapes Quelm's design from the use case inward rather than from the technology outward.
He holds a degree from KU Leuven and is based in the Ghent metropolitan area.