- Connecting an AI voice agent to the PSTN is a carrier infrastructure problem, not a software problem.
- Twilio Programmable Voice is the default PSTN layer, but it is not unified carrier infrastructure.
- Latency, connection reliability, and five-nines uptime at the carrier layer directly affect AI voice performance.
- An independent carrier layer keeps numbers in the network so the AI platform can change without a rebuild.
AI voice agents are software. The public telephone network is infrastructure that has been built, regulated, and operated for over a century. Getting the first to work reliably on the second is not a software problem - it is a carrier infrastructure problem. And most organizations deploying AI voice agents do not realize they have made a carrier infrastructure decision until they try to change something.
What is the gap between an AI voice model and the phone network?
An AI voice model - OpenAI, Google, Grok, Azure, or a specialized platform - operates in the cloud. It processes audio, generates responses, and manages conversation logic. What it does not natively do is make and receive phone calls on the public switched telephone network.
Carriers expose the PSTN to IP applications over SIP (Session Initiation Protocol), and reaching it requires carrier relationships, number provisioning, and call routing infrastructure, all of which sit entirely outside the AI model's architecture. Connecting an AI agent to a real phone number therefore requires a carrier layer between the AI platform and the telephone network. That carrier layer is where most of the long-term consequences of an AI voice deployment get determined.
How do most AI voice agents connect to the PSTN today?
The dominant approach in the AI voice developer ecosystem is Twilio Programmable Voice. Vapi, Bland, Retell, ElevenLabs, and most emerging AI voice platforms use Twilio as their PSTN layer. You provision a number inside Twilio, wire it to your AI platform through the API, and calls flow. It is fast to set up and well-documented, which is why it became the default.
| | Twilio Programmable Voice | Independent Carrier Layer |
|---|---|---|
| Setup speed | Fast: API-driven number provisioning | More setup required upfront |
| Unified routing | Not unified: Programmable Voice & Elastic SIP Trunking are separate products | Single routing layer via SBC across all voice applications |
| AI-to-live-agent failover | Requires custom development | Handled at the carrier layer, no custom dev |
| Number portability when switching AI platforms | Numbers tied to Twilio; platform change requires rebuild | Numbers stay in the network; routing update points at new platform |
| Best fit | Single AI use case, one phone number, low complexity | AI voice alongside contact center, Teams Phone, & outbound dialer |
For straightforward deployments - a single AI use case, one phone number, no integration with other voice applications - Twilio Programmable Voice works. The problem emerges when the deployment grows or the organization's voice environment becomes more complex.
Twilio offers two separate products that serve voice: Programmable Voice for AI and application-layer voice, and Elastic SIP Trunking for UCaaS and contact center connectivity. These are not the same product and they do not share a routing layer. Moving a call from one to the other - say, failing over from an AI agent to a live Teams agent - requires custom development. There is no native cross-product routing built in.
For organizations running AI voice alongside a contact center, Teams Phone, and an outbound dialer - which increasingly describes the enterprise voice environment - the absence of a unified routing layer is a real operational constraint.
Why does carrier-layer performance matter for AI voice?
Regardless of which PSTN layer an organization uses, three infrastructure variables directly affect AI voice performance.
Latency. Conversation quality is affected by the delay between the AI generating a response and that audio reaching the caller. The carrier path contributes to this delay. Low-latency routing is a baseline requirement for natural-sounding AI conversation.
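To make the carrier path's contribution concrete, here is a minimal sketch of a latency budget. Every stage name and millisecond value is a made-up placeholder for illustration, not a measurement of any platform or carrier; the only point is that the same AI stack can feel noticeably slower on a longer carrier route.

```python
# Illustrative latency budget. All numbers are hypothetical placeholders.
from typing import Dict


def response_delay_ms(stage_delays_ms: Dict[str, float]) -> float:
    """Total delay the caller experiences: the sum of every stage."""
    return sum(stage_delays_ms.values())


def carrier_share(stage_delays_ms: Dict[str, float], carrier_key: str = "carrier_path") -> float:
    """Fraction of the total delay attributable to the carrier path."""
    return stage_delays_ms[carrier_key] / response_delay_ms(stage_delays_ms)


# Same AI stack, two hypothetical carrier routes.
direct_route = {
    "speech_recognition": 150.0,
    "model_response": 400.0,
    "speech_synthesis": 150.0,
    "carrier_path": 50.0,
}
indirect_route = {**direct_route, "carrier_path": 200.0}

for name, route in (("direct", direct_route), ("indirect", indirect_route)):
    total = response_delay_ms(route)
    share = carrier_share(route)
    print(f"{name}: {total:.0f} ms total, carrier share {share:.0%}")
```

In this sketch nothing about the AI model changes between the two routes; the extra delay comes entirely from the carrier path entry.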
Connection reliability. An AI voice agent for inbound order taking or appointment scheduling is only useful if calls connect. STIR/SHAKEN attestation, route quality, and number reputation affect inbound delivery just as they affect outbound answer rates.
Uptime. An AI voice agent handling the majority of inbound calls at peak hours is a core operational dependency, not a supplemental feature. The reliability standard is five nines: 99.999% availability, or roughly five minutes of downtime per year. Downtime during peak hours has a direct, calculable cost.
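The arithmetic behind an availability target and the cost of an outage can be sketched in a few lines. The availability math is standard; the call volume and value per call in the cost example are hypothetical inputs chosen only to show the calculation.

```python
# Downtime budget for an availability target, plus a simple outage cost estimate.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability: float, period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Minutes of downtime permitted over the period at the given availability."""
    return (1.0 - availability) * period_minutes


def downtime_cost(minutes_down: float, calls_per_minute: float, value_per_call: float) -> float:
    """Rough cost of an outage: missed calls multiplied by the value of each call."""
    return minutes_down * calls_per_minute * value_per_call


for label, availability in (("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)):
    minutes = allowed_downtime_minutes(availability)
    print(f"{label}: {minutes:.2f} minutes of downtime per year")

# Hypothetical peak-hour outage: 10 minutes, 20 calls per minute, 15.00 in value per call.
print(downtime_cost(minutes_down=10, calls_per_minute=20, value_per_call=15.0))
```

Five nines works out to about 5.26 minutes of downtime per year, compared with roughly 8.8 hours at three nines.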
What architecture scales AI voice beyond a single use case?
The alternative to Twilio Programmable Voice as a PSTN layer is an independent carrier infrastructure where numbers live in the network - not in any application - and routing is controlled at the carrier layer.
In this architecture, the AI platform connects to the carrier network as one of several applications. Teams Phone, a Five9 contact center, and an outbound dialer all connect to the same network. The routing layer - an SBC - directs traffic to the right application based on defined rules. When an AI agent transfers to a live agent, that happens at the carrier layer without custom development. When the AI platform changes, numbers stay in the network and routing updates point at the new platform.
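The routing behavior described above can be sketched as a small table keyed by phone number. This is a toy model, not a real SBC configuration or API: `CarrierRouter`, `Route`, the application names, and the phone number are all hypothetical, and a production SBC would express the same rules in its own configuration language.

```python
# Toy model of carrier-layer routing: numbers live in the routing table,
# applications are interchangeable targets. All names are hypothetical.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Route:
    primary: str                    # application that normally receives the call
    fallback: Optional[str] = None  # application used when the primary is down


class CarrierRouter:
    """Stand-in for SBC routing rules; not a real SBC API."""

    def __init__(self) -> None:
        self._routes: Dict[str, Route] = {}
        self._healthy: Dict[str, bool] = {}

    def set_health(self, app: str, healthy: bool) -> None:
        self._healthy[app] = healthy

    def assign(self, number: str, primary: str, fallback: Optional[str] = None) -> None:
        # Re-assigning a number only changes where it points; the number itself stays put.
        self._routes[number] = Route(primary, fallback)

    def resolve(self, number: str) -> str:
        route = self._routes[number]
        for app in (route.primary, route.fallback):
            if app is not None and self._healthy.get(app, False):
                return app
        raise LookupError(f"no healthy application for {number}")


router = CarrierRouter()
for app in ("ai_platform_a", "ai_platform_b", "contact_center"):
    router.set_health(app, True)

number = "+15555550100"
router.assign(number, primary="ai_platform_a", fallback="contact_center")
print(router.resolve(number))  # the AI platform answers

router.set_health("ai_platform_a", False)
print(router.resolve(number))  # failover to live agents, no application code involved

router.assign(number, primary="ai_platform_b", fallback="contact_center")
print(router.resolve(number))  # platform switch is a routing update; the number is unchanged
```

The design choice worth noticing is that the phone number is the key and the application is only a value, so replacing the AI platform never touches number ownership.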
It requires more setup than pointing Twilio at an AI platform. But for organizations where AI voice is a durable operational capability rather than a pilot project, it is the architecture that does not need to be rebuilt every time something changes.
The Bottom Line
Twilio Programmable Voice is a reasonable starting point for AI voice PSTN connectivity. It is not a unified carrier infrastructure. For organizations running AI voice alongside other voice applications - and expecting those applications to share routing, failover, and carrier-layer visibility - the gap between a CPaaS API and a carrier-grade network is the gap between a working pilot and a scalable deployment.
Frequently Asked Questions
What is the PSTN and why does it matter for AI voice agents?
The PSTN (Public Switched Telephone Network) is the infrastructure that carries real phone calls. AI voice models operate in the cloud and do not natively connect to it, so a carrier layer must sit between the AI platform and the phone network to provision numbers, handle SIP routing, and deliver calls.
Why do most AI voice platforms use Twilio Programmable Voice?
Twilio Programmable Voice is fast to set up and well-documented, which is why platforms like Vapi, Bland, Retell, and ElevenLabs adopted it as their default PSTN layer. Developers can provision a number inside Twilio, wire it to their AI platform through the API, and start taking calls.
What are the limits of Twilio Programmable Voice for enterprise AI voice deployments?
Twilio offers two separate products: Programmable Voice for AI and application-layer voice, and Elastic SIP Trunking for UCaaS and contact center connectivity. These products do not share a routing layer, so moving a call from an AI agent to a live Teams agent requires custom development rather than native cross-product routing.
How does latency at the carrier layer affect AI voice quality?
The carrier path contributes to the delay between the AI generating a response and that audio reaching the caller. Low-latency routing at the carrier layer is a baseline requirement for natural-sounding AI conversation.
What uptime standard should an AI voice carrier layer meet?
An AI voice agent handling the majority of inbound calls at peak hours is a core operational dependency, not a supplemental feature. The reliability standard is five nines: 99.999% availability, or roughly five minutes of downtime per year. Downtime during peak hours has a direct, calculable cost.
How does an independent carrier layer make it easier to switch AI voice platforms?
When numbers live in the network rather than inside any single application, switching AI platforms does not require a rebuild. Routing updates at the carrier layer point at the new platform, and the phone numbers stay in place.
