Every "we built a phone system" blog post eventually arrives at the same question: did you roll your own SBC, or did you build on top of someone else's network? We didn't roll our own. This is the post explaining why that was the right call, and what it actually meant in practice.
The choice you're making
When you build voice, the spectrum looks roughly like this:
| Approach | What you operate | When it makes sense |
|---|---|---|
| Bare-metal SBC | SIP routing, RTP, peering, fraud controls, failover | You have a voice team and >100M minutes/year |
| Cloud SBC + carrier PSTN | Asterisk or FreeSWITCH, BYOC trunks, cloud networking | You want control without metal |
| CPaaS (Telnyx/Twilio/Plivo) | Their API, their network, your product workflow | You're building a product, not a phone company |
The mistake we see startups make is believing the bare-metal tier makes them special. It doesn't. Customers don't care which kernel handled their RTP. They care whether the call connects, sounds clean, and routes to the right person. The CPaaS path gets you to all three faster.
Why Telnyx specifically
We tried the obvious three. The decision came down to four things that matter when you're multi-tenant from day one:
- Programmable Voice that's actually programmable. Telnyx's Call Control delivers call events as webhooks and accepts per-call commands that we drive from a Node worker. Our flow engine is essentially a state machine reacting to those events. No proprietary scripting language to learn, no vendor-flavored XML.
- Real-time billing data. Every call leg returns a CDR with carrier costs broken out. We mark those up per-tenant and bill in our own UI without reconciling against a bill that arrives 30 days later. Twilio buries this.
- Number portability without drama. LOAs go through their portal, port completions come back as webhooks, and rejected ports come back with a rejection code we can map to plain English for the customer.
- Wholesale-friendly pricing. When we resell, our agency partners pay close to wholesale carrier rates rather than a "communications tax." That's the difference between a 70% partner margin and a 35% one.
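A minimal sketch of the per-tenant markup step described above, written in TypeScript since the worker runs on Node. The CDR shape, the integer micro-dollar units, and the basis-point markup table are all assumptions for illustration. They are not Telnyx's CDR schema or the actual billing code.

```typescript
// Hypothetical, simplified CDR shape (not Telnyx's actual CDR schema).
interface CallLegCdr {
  callLegId: string;
  tenantId: string;
  // Carrier cost in integer micro-dollars (1 USD = 1_000_000) to avoid float drift.
  carrierCostMicros: number;
}

// Hypothetical per-tenant markup, in basis points (5_000 bps = 50% over cost).
type MarkupTable = Map<string, number>;

interface BilledLeg {
  callLegId: string;
  tenantId: string;
  costMicros: number;
  priceMicros: number;
}

function billLeg(cdr: CallLegCdr, markups: MarkupTable, defaultBps = 0): BilledLeg {
  const bps = markups.get(cdr.tenantId) ?? defaultBps;
  // Round up so a fractional micro-dollar never under-bills the tenant.
  const priceMicros = Math.ceil((cdr.carrierCostMicros * (10_000 + bps)) / 10_000);
  return {
    callLegId: cdr.callLegId,
    tenantId: cdr.tenantId,
    costMicros: cdr.carrierCostMicros,
    priceMicros,
  };
}

// Gross margin as a fraction of the billed price: (price - cost) / price.
function grossMargin(leg: BilledLeg): number {
  return leg.priceMicros === 0 ? 0 : (leg.priceMicros - leg.costMicros) / leg.priceMicros;
}

const leg = billLeg(
  { callLegId: "leg-1", tenantId: "tenant-a", carrierCostMicros: 4_000 },
  new Map([["tenant-a", 5_000]]),
);
console.log(leg.priceMicros, grossMargin(leg).toFixed(3)); // logs: 6000 0.333
```

Keeping money in integer micro-dollars avoids floating-point drift when many legs are summed into an invoice. The example also shows why markup and margin are different numbers: a 50% markup over carrier cost is a one-third gross margin on price.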
What we actually run
The runtime stack is easier to understand as two planes:
Media plane: Browser softphone, desk phone, or PSTN endpoint connects through Telnyx. The audio path stays off our servers.
Control plane: Telnyx emits call-control events, our worker turns those events into flow-engine transitions, and Postgres stores tenant state, flow definitions, and billing facts.
The worker is the interesting part. Every call event (call.initiated, call.answered, call.hangup, call.dtmf.received) lands on a queue, and a state-machine processor decides what happens next based on that tenant's flow definition. The flow editor in the UI compiles down to that same definition.
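A sketch of what a state-machine processor driven by a compiled flow definition can look like. The event type names come from the post; the `FlowDefinition` shape, the state names, and the command descriptors are hypothetical, and a real worker would translate each command into the corresponding Call Control request.

```typescript
// Event types named in the post; the payload shape here is a simplified, hypothetical one.
type CallEventType = "call.initiated" | "call.answered" | "call.dtmf.received" | "call.hangup";

interface CallEvent {
  type: CallEventType;
  callControlId: string;
  digit?: string; // set only on call.dtmf.received
}

// Hypothetical command descriptors; the worker would map each to a Call Control request.
type Command =
  | { action: "answer" }
  | { action: "play_menu" }
  | { action: "transfer"; to: string }
  | { action: "hangup" };

interface Transition {
  on: CallEventType;
  digit?: string; // optional DTMF match
  to: string;
  commands: Command[];
}

// Hypothetical compiled output of the flow editor, stored per tenant.
interface FlowDefinition {
  initial: string;
  states: Record<string, Transition[]>;
}

interface StepResult {
  state: string;
  commands: Command[];
}

// Pure transition function: current state + event -> next state + commands to issue.
function step(flow: FlowDefinition, current: string, event: CallEvent): StepResult {
  // A hangup ends the flow from any state.
  if (event.type === "call.hangup") return { state: "ended", commands: [] };
  const transitions = flow.states[current] ?? [];
  const match = transitions.find(
    (t) => t.on === event.type && (t.digit === undefined || t.digit === event.digit),
  );
  // Unmatched events (e.g. an unmapped digit) leave the state unchanged.
  return match ? { state: match.to, commands: match.commands } : { state: current, commands: [] };
}

// Example tenant flow: answer, play a menu, route on "1" or hang up on "2".
const ivr: FlowDefinition = {
  initial: "ringing",
  states: {
    ringing: [{ on: "call.initiated", to: "answering", commands: [{ action: "answer" }] }],
    answering: [{ on: "call.answered", to: "menu", commands: [{ action: "play_menu" }] }],
    menu: [
      { on: "call.dtmf.received", digit: "1", to: "transferred", commands: [{ action: "transfer", to: "sales" }] },
      { on: "call.dtmf.received", digit: "2", to: "done", commands: [{ action: "hangup" }] },
    ],
  },
};
```

Keeping `step` free of I/O means each tenant's flow can be tested by replaying recorded event sequences, while the worker owns the side effects of actually issuing commands.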
We deliberately do not keep call media on our infrastructure. RTP flows between Telnyx's network and the endpoint. We're a control plane, not a data plane. That means:
- No SBC to operate, no DDoS surface for SIP scanners.
- No PCI-style scope expansion when calls get recorded - recordings are Telnyx-stored, fetched on demand by our worker.
- We can add a region tomorrow by changing a config, not by deploying bare metal.
The hard parts the docs don't tell you
Three things ate more time than we expected:
Idempotency on call events. Telnyx will redeliver a call.answered event if the worker doesn't ACK in time. The naive thing ("respond to every event") leads to double-processed states. We key every state transition by call_control_id + sequence. Cheap, and it prevents the worst class of bug.
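A minimal in-process version of that guard, assuming each event envelope carries a `sequence` value. `sequence`, `TransitionGuard`, and `handle` are hypothetical names standing in for whatever ordering key and plumbing the worker actually uses.

```typescript
// Hypothetical event envelope; `sequence` is an assumed per-call ordering key.
interface CallEventEnvelope {
  callControlId: string;
  sequence: number;
  type: string;
}

// In-process dedupe guard keyed by callControlId + sequence.
class TransitionGuard {
  private readonly seen = new Set<string>();
  private readonly maxKeys: number;

  constructor(maxKeys = 100_000) {
    this.maxKeys = maxKeys;
  }

  // True the first time a (callControlId, sequence) pair is seen, false on redelivery.
  claim(event: CallEventEnvelope): boolean {
    const key = `${event.callControlId}:${event.sequence}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    // Bound memory: Sets iterate in insertion order, so the first key is the oldest.
    if (this.seen.size > this.maxKeys) {
      const oldest = this.seen.values().next().value;
      if (oldest !== undefined) this.seen.delete(oldest);
    }
    return true;
  }
}

// ACK upstream regardless; apply the state transition only once per key.
function handle(
  guard: TransitionGuard,
  event: CallEventEnvelope,
  apply: (e: CallEventEnvelope) => void,
): void {
  if (guard.claim(event)) apply(event);
}
```

With more than one worker, the same key would need a shared store, such as a unique index in Postgres, so two processes can't both claim a redelivered event.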
Multi-tenant DID hygiene. When you have 200 tenants and 8,000 numbers, "this number rang, who does it belong to?" needs to be O(1) on the hot path. We cache the DID-to-tenant map in the worker process, hydrate on event, and reload it on number.assigned webhooks. Doing this in Postgres on the hot path was our first 99th-percentile latency issue.
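A sketch of the cached lookup, assuming the loader reads DID-to-tenant rows from Postgres and reading "hydrate on event" as a lazy load on the first event. `DidTenantCache`, `DidLoader`, and `normalizeDid` (with its digits-plus-leading-"+" rule) are illustrative, not taken from the original code.

```typescript
// Hypothetical row source; in the post this data lives in Postgres.
type DidLoader = () => Promise<Array<{ did: string; tenantId: string }>>;

// Hypothetical normalization: keep digits and a leading "+" so "+1 (555) 010-0001"
// and "+15550100001" hit the same key.
function normalizeDid(did: string): string {
  const digits = did.replace(/\D/g, "");
  return did.trim().startsWith("+") ? `+${digits}` : digits;
}

class DidTenantCache {
  private map = new Map<string, string>();
  private hydrated = false;
  private readonly load: DidLoader;

  constructor(load: DidLoader) {
    this.load = load;
  }

  // Build a fresh map, then swap it in, so lookups never see a half-built table.
  // Intended to be called from the number.assigned webhook handler.
  async reload(): Promise<void> {
    const rows = await this.load();
    const next = new Map<string, string>();
    for (const row of rows) next.set(normalizeDid(row.did), row.tenantId);
    this.map = next;
    this.hydrated = true;
  }

  // Hot path: hydrate once on the first event, then every lookup is a Map.get.
  // Concurrent first events may each trigger a load; the atomic swap keeps that harmless.
  async resolve(did: string): Promise<string | undefined> {
    if (!this.hydrated) await this.reload();
    return this.map.get(normalizeDid(did));
  }
}
```

Rebuilding the whole map on each number.assigned webhook is the simplest correct option at this scale; 8,000 entries reload quickly, and the swap means in-flight lookups always see either the old table or the new one.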
10DLC throughput. SMS deliverability is a function of campaign trust score, not raw API capacity. A new campaign starts at low throughput and ramps as carriers see clean traffic. We surface throughput limits and historic delivery rates in the dashboard so customers don't blame the platform when really they need to age their campaign.
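A minimal sketch of the historic delivery-rate figure a dashboard like this might surface, assuming a simplified three-value message status. The status names and record shape are hypothetical; real delivery receipts carry more states than this.

```typescript
// Hypothetical, simplified message status.
type DeliveryStatus = "delivered" | "failed" | "pending";

interface MessageRecord {
  campaignId: string;
  status: DeliveryStatus;
}

interface CampaignDeliveryStats {
  campaignId: string;
  delivered: number;
  failed: number;
  pending: number;
  // delivered / (delivered + failed); null until at least one message has settled.
  deliveryRate: number | null;
}

function deliveryStats(messages: MessageRecord[]): Map<string, CampaignDeliveryStats> {
  const stats = new Map<string, CampaignDeliveryStats>();
  for (const m of messages) {
    let s = stats.get(m.campaignId);
    if (!s) {
      s = { campaignId: m.campaignId, delivered: 0, failed: 0, pending: 0, deliveryRate: null };
      stats.set(m.campaignId, s);
    }
    s[m.status] += 1; // count by status
  }
  for (const s of stats.values()) {
    const settled = s.delivered + s.failed;
    s.deliveryRate = settled === 0 ? null : s.delivered / settled;
  }
  return stats;
}
```

Pending messages stay out of the rate until they settle, so a burst of fresh traffic doesn't make a campaign look worse than it is.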
What rolling our own would have cost
We did the math. To replicate the Telnyx pieces we use - global PSTN peering, 10DLC submission, LRN dipping, carrier escalation contacts - the floor is roughly:
- 1 senior voice engineer ($300k+ fully loaded)
- ~$8-15k/month in transit + colo for two regions
- 6-9 months before parity with what we'd ship in week one on Telnyx
For a customer base under tens of millions of monthly minutes, that math never closes. Above that, you re-evaluate. We're nowhere near that inflection.
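The floor above, worked as arithmetic. The salary and transit/colo figures come from the list; the per-minute saving versus CPaaS pricing is a hypothetical input, and the sketch ignores the 6-9 months of build time before parity.

```typescript
// Figures from the post: one senior voice engineer ($300k+ fully loaded) plus
// $8k-$15k/month in transit and colo for two regions.
const ENGINEER_USD_PER_YEAR = 300_000;
const INFRA_USD_PER_MONTH = { low: 8_000, high: 15_000 };

// Annual cost floor for self-hosting, in whole dollars.
function annualFloor(infraUsdPerMonth: number): number {
  return ENGINEER_USD_PER_YEAR + 12 * infraUsdPerMonth;
}

// Hypothetical break-even: minutes per year at which an assumed per-minute saving
// (in integer micro-dollars) covers the annual floor. Integer inputs keep it exact.
function breakEvenMinutesPerYear(annualCostUsd: number, savingMicrosPerMinute: number): number {
  if (savingMicrosPerMinute <= 0) return Infinity;
  return Math.ceil((annualCostUsd * 1_000_000) / savingMicrosPerMinute);
}

for (const [label, infra] of Object.entries(INFRA_USD_PER_MONTH)) {
  const floor = annualFloor(infra);
  console.log(label, floor, breakEvenMinutesPerYear(floor, 4_000));
}
```

The floor works out to $396k-$480k a year. At a hypothetical saving of $0.004 per minute (4,000 micro-dollars), the low end alone needs 99 million minutes a year to break even, roughly the >100M minutes/year line in the table, before counting the months spent reaching parity.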
What we'd tell another team
If you're building a voice product and you don't have a voice team, the honest answer is: pick a CPaaS, build on top, treat the carrier as a boring dependency, and spend your engineering budget on the product your customers actually see - the softphone, the flow editor, the analytics, the white-label experience.
The hard problems in business telephony in 2026 aren't on the wire. They're in the UX.
