December 15, 2025

The Model Constellation Gambit

Because generalist models are non-deterministic, AI application-layer companies cannot trust their output. To compensate, they build Constellations: complex model chains in which, to give one example, a router classifies the input, a frontier model writes a draft, a supervisor model grades it, and a reasoning model critiques the draft and reprompts to fix errors.
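
To make the pattern concrete, here is a minimal sketch of such a chain in Python. The model names and the call_model helper are hypothetical placeholders for this illustration, not any vendor's actual API:

    # Minimal sketch of the Constellation pattern described above.
    # call_model() and the model names are hypothetical placeholders,
    # not any vendor's real API.

    def call_model(model: str, prompt: str) -> str:
        """Stand-in for a real LLM API call."""
        raise NotImplementedError

    def constellation(user_input: str, max_retries: int = 2) -> str:
        route = call_model("router", f"Classify the intent: {user_input}")
        draft = call_model("frontier", f"[{route}] Respond to: {user_input}")
        for _ in range(max_retries):
            verdict = call_model("supervisor", f"Grade this response: {draft}")
            if verdict == "PASS":
                return draft
            critique = call_model("reasoner", f"Explain what is wrong: {draft}")
            draft = call_model("frontier",
                               f"Rewrite, fixing: {critique}\nOriginal: {user_input}")
        return draft  # best effort once retries are exhausted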

Some present this Rube Goldberg machine as a flex: proof of sophisticated technology, and a reason enterprise customers couldn't possibly build this themselves. In reality, it is an admission that the engine is unreliable. Here is why the Constellation approach is an architectural trap:

The Physics of Stacked Error Rates

When you chain probabilistic models, errors do not cancel; they compound. In a workflow with five steps where each model is 95% reliable (a generous assumption), the math is unforgiving: 0.95^5 ≈ 77%. Nearly one in four transactions will contain an error somewhere in the chain.
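
The arithmetic is trivial to verify; the per-stage reliability figure is the illustrative one above:

    # Five chained stages, each 95% reliable; errors compound multiplicatively.
    per_stage_reliability = 0.95
    stages = 5
    end_to_end = per_stage_reliability ** stages
    print(f"{end_to_end:.1%}")  # 77.4% -- nearly 1 in 4 transactions has an error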

The Latency Spiral

In a live CX system, latency is the enemy. The human ear easily detects pauses of 500ms. The router, foundation, supervisor, and reprompting models each add latency, and the network hops between the frontier lab's endpoint and wherever the other models are hosted add still more. Suddenly, the customer is waiting multiple seconds or longer for a reply.
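
A back-of-the-envelope latency budget shows how quickly the stages add up. Every figure below is an assumption chosen for illustration, not a measurement of any real system:

    # Illustrative latency budget for a single Constellation turn.
    # All figures are assumptions, not measurements of any real system.
    stage_ms = {
        "router classify": 150,
        "frontier draft": 900,
        "supervisor grade": 400,
        "reasoner critique + reprompt": 1200,
    }
    cross_provider_hop_ms = 80  # assumed network cost per hop between providers

    total_ms = sum(stage_ms.values()) + len(stage_ms) * cross_provider_hop_ms
    print(f"{total_ms} ms")  # 2970 ms -- six times the ~500 ms pause the ear notices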

Economic Implications (Tokens & Compute)

Reprompting is the most expensive and least certain way to address reliability. When a supervisor model detects a problem, the system must discard the first answer and pay another model, often burning 3x or more the tokens, to try again. These systems pay for the mistake and for the correction (if the output can even be corrected). Over millions of transactions, the cost delta between one-shot-correct and generate-check-regenerate-route is the difference between software margins and no margins.
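
A toy cost model makes the delta visible. The price, token counts, and failure rate below are illustrative assumptions only:

    # Toy cost comparison: one-shot-correct vs. generate-check-regenerate.
    # Prices, token counts, and the failure rate are illustrative assumptions.
    PRICE_PER_1K_TOKENS = 0.01  # assumed blended USD price

    def cost(tokens: int) -> float:
        return tokens / 1000 * PRICE_PER_1K_TOKENS

    one_shot = cost(1_000)  # a single correct answer
    # Draft and supervisor grading always run; the 3x-token reprompt runs on failures.
    failure_rate = 0.23  # from the 0.95^5 arithmetic above
    constellation = cost(1_000) + cost(500) + failure_rate * cost(3_000)
    print(f"one-shot ${one_shot:.4f} vs constellation ${constellation:.4f} per call")
    # roughly 2.2x the per-call cost, before paying for any second retry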

Infrastructure Fragility

The Constellation relies on a fragile web of disparate providers. The base model might be an OpenAI endpoint. The supervisor model might be running in a separate tenant on Azure or AWS. If any single API in this chain degrades, the entire workflow fails. The system has introduced multiple points of failure.
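
Availability composes the same way error rates do: a serial chain is only as available as the product of its parts. The SLA figures below are illustrative, not any provider's published numbers:

    # A serial chain is only as available as the product of its parts.
    # SLA figures are illustrative, not any provider's published numbers.
    import math

    slas = {
        "router":     0.999,
        "frontier":   0.995,
        "supervisor": 0.999,
        "reasoner":   0.995,
    }
    chain = math.prod(slas.values())
    print(f"{chain:.2%}")  # 98.80% -- roughly 8.6 hours of downtime per month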

The Prompt Maintenance Nightmare

Finally, there is the human cost. In a Constellation, you are not just prompting one model; you are maintaining a delicate equilibrium among many. When one provider updates its model weights, the new behavior starts confusing the others. The engineering team is trapped in an endless cycle of updating prompts across many models to keep the Constellation aligned. It is an equilibrium that breaks at scale.

The Constellation is a gambit, not a moat: an attempt to cast technological weakness as a reason customers should fear in-sourcing, and a fragile attempt to force a probabilistic poet to act like a deterministic banker.

Dan Roth

Co-Founder & CEO, Scaled Cognition

Dan is Co-Founder and CEO of Scaled Cognition. Before starting Scaled Cognition, he was CVP of Conversational AI at Microsoft and Co-Founder and CEO of Semantic Machines (acquired by Microsoft), Shaser BioScience (acquired by Spectrum Brands), and Voice Signal Technologies (acquired by Nuance).