AI Rankings

Cartesia

Real-time text-to-speech (TTS) for voice agents, built on a state-space model. Sonic 3 reaches roughly 40–90 ms time-to-first-audio and clones a voice from about 3 seconds of reference audio.

Our editorial assessment as of Jun 4, 2026, based on the public sources listed below.

Rank in Voice / TTS: #2
Tier: A
Score: 84.0
Underlying model

Strengths

  • Industry-leading time-to-first-audio (~40–90 ms) makes it well suited to live voice agents
  • A free plan, plus Pro starting around $4/month (billed annually), keeps entry cost low for developers
  • Instant voice cloning from as little as 3 seconds of reference audio

Considerations

  • Language coverage is narrower than that of major commercial competitors
  • Out-of-the-box agent personas and prebuilt voices are less extensive than ElevenLabs'
  • Costs can ramp up quickly at the higher-volume Startup ($39/month) and Scale ($239/month) tiers

Price

Free plan; Pro from $4/month (billed annually); Startup $39/month; Scale $239/month

Score breakdown

Performance: 92.0
Reputation: 80.0
Price: 80.0
Recency: 95.0

Verification status

Each field carries its own re-verification date. Amber marks a date older than 90 days; treat those figures as stale until re-checked.

Pricing: 2026-06-04
Benchmark: 2026-06-04
Reputation: 2026-06-04
Feature list: 2026-06-04
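
The 90-day staleness rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the site's actual implementation: the names `STALE_AFTER_DAYS`, `is_stale`, and `stale_fields` are hypothetical, and it assumes "older than 90 days" means strictly more than 90 days have passed since the verification date.

```python
from datetime import date

# Threshold taken from the page text; the constant name is hypothetical.
STALE_AFTER_DAYS = 90


def is_stale(verified_on: date, today: date) -> bool:
    """Return True when the verification date is more than 90 days old.

    Assumption: exactly 90 days old still counts as fresh.
    """
    return (today - verified_on).days > STALE_AFTER_DAYS


def stale_fields(fields: dict[str, date], today: date) -> list[str]:
    """List the field names whose verification date would be flagged amber."""
    return [name for name, checked in fields.items() if is_stale(checked, today)]


# Verification dates copied from the listing above.
verified = {
    "Pricing": date(2026, 6, 4),
    "Benchmark": date(2026, 6, 4),
    "Reputation": date(2026, 6, 4),
    "Feature list": date(2026, 6, 4),
}
```

Calling `stale_fields(verified, date.today())` returns the fields that should be re-checked before relying on their figures. Under the assumption above, all four fields would first be flagged on 2026-09-03, which is 91 days after 2026-06-04.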

Sources