A great voice agent lives or dies on latency. If a human has to wait more than a heartbeat for a reply, the conversation stops feeling natural and starts feeling like an IVR. We obsess over this internally, and over the last year we have pushed the MediaBloom Agent Runtime to consistent sub-second turns across every deployment, from real estate intake bots to insurance renewal calls at scale.
The runtime is structured as a streaming pipeline. Audio chunks flow into a speech recognizer the moment they are captured, partial transcripts flow into a planner while the caller is still speaking, and the planner streams tokens back out while tools run in parallel. Nothing waits for a full turn. The moment we have enough signal to act, we act. That single design choice is responsible for most of the latency wins our customers see versus the legacy dialer stacks they replace.
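The "nothing waits for a full turn" idea can be sketched with Python async generators. This is a hypothetical illustration, not the MediaBloom runtime: `microphone`, `recognize`, `lookup`, and `plan` are made-up names, each audio "chunk" is simplified to one word, and keyword matching stands in for a real planner. The point it shows is that the planner sees growing partial transcripts and launches a tool task mid-utterance, while listening continues.

```python
import asyncio
from typing import AsyncIterator


async def microphone(chunks: list[str]) -> AsyncIterator[str]:
    # Stand-in for audio capture: each "chunk" is one word, released as soon as it exists.
    for chunk in chunks:
        await asyncio.sleep(0)
        yield chunk


async def recognize(audio: AsyncIterator[str]) -> AsyncIterator[str]:
    # Hypothetical streaming recognizer: emits a growing partial transcript per chunk.
    words: list[str] = []
    async for chunk in audio:
        words.append(chunk)
        yield " ".join(words)


async def lookup(action: str) -> str:
    # Hypothetical tool call; it runs concurrently with the rest of the listening.
    await asyncio.sleep(0)
    return f"{action}:ready"


async def plan(partials: AsyncIterator[str], triggers: dict[str, str]):
    fired: list[tuple[str, int]] = []  # (action, words heard when it fired)
    tools = []
    async for partial in partials:
        heard = partial.split()
        for keyword, action in triggers.items():
            if keyword in heard and all(a != action for a, _ in fired):
                fired.append((action, len(heard)))
                # Start the tool now instead of waiting for the caller to finish.
                tools.append(asyncio.create_task(lookup(action)))
    return fired, await asyncio.gather(*tools)


async def run_turn(utterance: str, triggers: dict[str, str]):
    return await plan(recognize(microphone(utterance.split())), triggers)
```

For example, `asyncio.run(run_turn("I want to book a tour on Friday", {"book": "start_booking"}))` fires `start_booking` after four of the eight words, so the tool is already running while the caller is still talking.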
Barge-in was the hardest piece. Humans interrupt. They interrupt to correct, to redirect, to agree, and to argue. An agent that keeps talking over them is unusable. We built a duplex audio path that can detect voice activity while the agent is still speaking, cancel the in-flight generation without losing state, and smoothly pivot to the new intent. When we nailed barge-in, our user testing scores for "felt natural" jumped more than any other change we have shipped.
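The cancel-without-losing-state part of barge-in maps naturally onto task cancellation. The following is a minimal asyncio sketch under stated assumptions: `TurnState`, `speak`, `simulated_caller`, and `run_turn` are hypothetical names, the simulated caller stands in for voice activity detection on the inbound leg, and a real duplex path would stop audio playback and model generation rather than a word list. What survives the cancellation is the record of what the caller actually heard, which the next turn can build on.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class TurnState:
    # Hypothetical state that must survive a cancelled reply.
    spoken: list[str] = field(default_factory=list)  # words the caller actually heard
    interrupted: bool = False


async def speak(reply: list[str], state: TurnState) -> None:
    # Stream the reply one word at a time; every await is a cancellation point.
    for word in reply:
        await asyncio.sleep(0)
        state.spoken.append(word)


async def simulated_caller(state: TurnState, vad: asyncio.Event, after_words: int) -> None:
    # Stand-in for voice activity detection: the caller starts talking after N words.
    while len(state.spoken) < after_words:
        await asyncio.sleep(0)
    vad.set()


async def run_turn(reply: list[str], interrupt_after: int) -> TurnState:
    state = TurnState()
    vad = asyncio.Event()
    speaking = asyncio.create_task(speak(reply, state))
    caller = asyncio.create_task(simulated_caller(state, vad, interrupt_after))
    barge_in = asyncio.create_task(vad.wait())
    # Whichever happens first wins: the reply finishes, or the caller cuts in.
    await asyncio.wait({speaking, barge_in}, return_when=asyncio.FIRST_COMPLETED)
    if not speaking.done():
        speaking.cancel()  # stop the in-flight reply mid-sentence
        try:
            await speaking
        except asyncio.CancelledError:
            pass
        state.interrupted = True  # state.spoken still records what was heard
    for task in (caller, barge_in):
        task.cancel()
    await asyncio.gather(caller, barge_in, return_exceptions=True)
    return state
```

Keeping `spoken` outside the cancelled task is the design choice that matters here: the planner can pivot to the new intent knowing exactly how much of the old reply landed.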
The tool loop is where real business value gets created. Agents call typed actions (book a tour, create a case, charge a card, update an opportunity), and the runtime handles retries, idempotency, and audit logging. Every tool call is signed and replayable. If a multi-step action fails halfway through, the runtime knows exactly which step to resume from, so it does not double-charge a customer or double-book a calendar.
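One way to get "resume from the failed step" is a journal keyed by an idempotency key derived from the call itself. This is a hypothetical sketch, not MediaBloom's implementation: `ToolRunner`, `sign`, `verify`, and the hard-coded `SECRET` are illustrative assumptions, HMAC-SHA256 over canonical JSON is one possible signing scheme, and the journal lives in memory. It covers idempotent resume and signed audit records; retry policy and durable storage are left out.

```python
import hashlib
import hmac
import json
from typing import Any, Callable

SECRET = b"demo-signing-key"  # hypothetical; a real deployment would use a managed key


def canonical(payload: dict[str, Any]) -> bytes:
    # Stable byte encoding so the same call always hashes and signs identically.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign(payload: dict[str, Any]) -> str:
    return hmac.new(SECRET, canonical(payload), hashlib.sha256).hexdigest()


def verify(record: dict[str, Any]) -> bool:
    # Recompute the signature over everything except the signature itself.
    payload = {k: v for k, v in record.items() if k != "signature"}
    return hmac.compare_digest(record["signature"], sign(payload))


class ToolRunner:
    """Hypothetical runner: one journal entry per completed step, keyed by idempotency key."""

    def __init__(self, tools: dict[str, Callable[..., Any]]) -> None:
        self.tools = tools
        self.journal: dict[str, dict[str, Any]] = {}  # idempotency key -> signed record
        self.audit: list[dict[str, Any]] = []

    def run(self, run_id: str, steps: list[tuple[str, dict[str, Any]]]) -> list[Any]:
        results = []
        for index, (name, args) in enumerate(steps):
            call = {"run_id": run_id, "step": index, "tool": name, "args": args}
            key = hashlib.sha256(canonical(call)).hexdigest()
            if key in self.journal:  # finished on an earlier attempt: do not redo it
                results.append(self.journal[key]["result"])
                continue
            result = self.tools[name](**args)  # may raise; nothing is journaled then
            record = {**call, "result": result}
            record["signature"] = sign(record)
            self.journal[key] = record
            self.audit.append(record)
            results.append(result)
        return results
```

If the second step of a "charge, then book" run times out, calling `run` again with the same `run_id` skips the charge and only retries the booking. There is still a window between a side effect and its journal write, which is why many payment APIs also accept an idempotency key on the request itself.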
Under the hood, we treat the LLM as one participant in a larger system, not as the system itself. A deterministic state machine sits above the model and owns the conversation flow: who is speaking, what turn we are on, which tools are in flight, and what invariants must hold before we move forward. The model contributes intelligence; the state machine contributes reliability.
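A small sketch of a state machine that owns the flow while the model only proposes events. Everything here is a hypothetical illustration: the three phases, the event names, and the invariant "a turn cannot close while a tool is still in flight" are assumptions chosen for the example, not MediaBloom's actual rules. Any event missing from the transition table, or one that would break an invariant, is rejected instead of silently changing state.

```python
from enum import Enum


class Phase(Enum):
    LISTENING = "listening"  # caller holds the floor
    PLANNING = "planning"    # model is deciding what to do
    SPEAKING = "speaking"    # agent holds the floor


class InvariantViolation(Exception):
    pass


# (current phase, event) -> next phase. Anything not listed is rejected.
TRANSITIONS: dict[tuple[Phase, str], Phase] = {
    (Phase.LISTENING, "caller_finished"): Phase.PLANNING,
    (Phase.PLANNING, "reply_started"): Phase.SPEAKING,
    (Phase.SPEAKING, "barge_in"): Phase.LISTENING,
    (Phase.SPEAKING, "reply_finished"): Phase.LISTENING,
}


class ConversationMachine:
    def __init__(self) -> None:
        self.phase = Phase.LISTENING
        self.turn = 0
        self.in_flight: set[str] = set()  # tool call ids that have not resolved yet

    def tool_started(self, call_id: str) -> None:
        if self.phase is Phase.LISTENING:
            raise InvariantViolation("tools are only started by the planner")
        self.in_flight.add(call_id)

    def tool_finished(self, call_id: str) -> None:
        if call_id not in self.in_flight:
            raise InvariantViolation(f"unknown tool call {call_id!r}")
        self.in_flight.remove(call_id)

    def handle(self, event: str) -> Phase:
        nxt = TRANSITIONS.get((self.phase, event))
        if nxt is None:
            raise InvariantViolation(f"{event!r} not allowed while {self.phase.value}")
        if event == "reply_finished":
            # Example invariant: a turn only closes once every tool has resolved.
            if self.in_flight:
                raise InvariantViolation("cannot close a turn with tools in flight")
            self.turn += 1
        self.phase = nxt
        return nxt
```

Because the table is plain data, a transition the model hallucinates (for example `reply_started` while the caller is still talking) surfaces as an exception the runtime can handle, rather than as a corrupted conversation.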
Observability is baked in at every layer. Every turn produces a structured trace with the recognized transcript, the planner’s reasoning, the tools attempted, their arguments, their responses, and the final reply. Teams using MediaBloom do not debug production agents the way they debug LLM prototypes: they open a single trace and see the full story.
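A per-turn trace with the fields listed above can be modeled as nested dataclasses serialized to JSON. The schema below (`TurnTrace`, `ToolCallTrace`, and their field names) is an assumption for illustration, not MediaBloom's trace format; `record_tool` is a hypothetical helper that captures a tool's arguments together with its response or error.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ToolCallTrace:
    name: str
    arguments: dict[str, Any]
    response: Any = None
    error: Optional[str] = None


@dataclass
class TurnTrace:
    conversation_id: str
    turn: int
    transcript: str = ""
    planner_reasoning: str = ""
    tool_calls: list[ToolCallTrace] = field(default_factory=list)
    reply: str = ""

    def record_tool(self, name: str, fn: Callable[..., Any], **arguments: Any) -> Any:
        # Log the call before running it, so failures still show up in the trace.
        call = ToolCallTrace(name=name, arguments=arguments)
        self.tool_calls.append(call)
        try:
            call.response = fn(**arguments)
        except Exception as exc:
            call.error = f"{type(exc).__name__}: {exc}"
            raise
        return call.response

    def to_json(self) -> str:
        # asdict() recurses into the nested ToolCallTrace entries.
        return json.dumps(asdict(self), sort_keys=True)
```

Emitting one self-contained JSON document per turn is what makes the "open a single trace" workflow possible: the record can be searched, diffed, or replayed without stitching together separate logs.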
The payoff is a product that feels less like software and more like a coworker. Our customers deploy these agents to take calls at 2am, qualify leads over SMS at 11am, and close deals inside the CRM by lunch, and the humans on the other end keep saying the same thing: it did not feel like AI. That is the bar we hold the runtime to, and everything we ship is measured against it.



