Cartesia
Real-time voice agent TTS built on a state-space model, with Sonic 3 hitting roughly 40–90ms time-to-first-audio and 3-second voice cloning.
Our editorial assessment as of Jun 4, 2026, based on the public sources listed below.
Rank in Voice / TTS
#2
Tier
A
Score
84.0
Underlying model
—
Strengths
- Industry-leading time-to-first-audio (~40–90ms) makes it well-suited to live voice agents
- Free plan plus Pro starting around $4/month (annual) keeps entry cost low for developers
- Instant voice cloning from as little as 3 seconds of reference audio
Considerations
- Language coverage is narrower than that of major commercial competitors
- Out-of-the-box agent personas and prebuilt voices are less extensive than those of ElevenLabs
- Costs can ramp up quickly at the higher-volume Startup ($39/mo) and Scale ($239/mo) tiers
Price
Free plan; Pro from $4/month (annual); Startup $39/month, Scale $239/month
Score breakdown
Performance
92.0
Reputation
80.0
Price
80.0
Recency
95.0
Verification status
Each field carries its own re-verification date. Amber means older than 90 days — treat those figures as stale until re-checked.
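The 90-day staleness rule described above can be sketched as a simple date comparison. This is a minimal illustration, not the site's actual implementation: the function name `is_stale`, the field names, the example dates, and the "current" label for non-amber fields are all hypothetical. Only the 90-day threshold and the "amber" label come from the text.

```python
from datetime import date, timedelta

# Threshold stated in the text: fields older than 90 days are shown as amber.
STALE_AFTER = timedelta(days=90)


def is_stale(verified_on: date, today: date) -> bool:
    """Return True when the re-verification date is more than 90 days old."""
    return today - verified_on > STALE_AFTER


# Hypothetical per-field re-verification dates, for illustration only.
fields = {
    "price": date(2026, 6, 4),
    "latency": date(2026, 2, 1),
}

today = date(2026, 6, 4)
for name, verified_on in fields.items():
    # "current" is an assumed label; the source only names the amber state.
    status = "amber" if is_stale(verified_on, today) else "current"
    print(f"{name}: {status}")
```

A field verified exactly 90 days ago is not flagged under this reading, since the text says "older than 90 days".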
Sources
- Cekura — 7 Best TTS APIs for AI Voice Agents 2026. Accessed: 2026-06-04
- Deepgram Learn — 10 Best Text-to-Speech APIs in 2025/2026. Accessed: 2026-06-04