Voice Agent Design — Finding the Brand's Tessitura

Core thesis: In vocal performance, tessitura is the range where a voice sounds most natural, and it is narrower than the voice's full range. Every brand has a tessitura too. This case study finds Minted's: a 5-state pacing model in which each conversational state sits at a measured point within the brand's emotional range (speedAlpha 0.83–1.30). The improvement comes from constraining the voice to its tessitura, not from switching to a different voice.

Sample A — Weaker Model Reference

Original test sample, outside the brand's tessitura. Higher bandwidth (32 kHz), but pacing is length-based. The opening rushes, the empathy line is flat, and the goodbye carries a +46% pitch spike.

Sample B — Stronger Model Reference

Original test sample, outside the brand's tessitura. Warmer opening and a cleaner brand name than Sample A. Pacing is still length-based: empathy lines show a 0% rate delta from informational lines.

Baseline — Adjective-Prompted Only (Outside Tessitura)

Length-based pacing with no tessitura awareness. All states cluster at ~3.4 WPS (words per second). Prompted only with the adjectives "friendly, warm, professional, empathetic" and given no pacing rules. Same voice (Rime Luna).

Designed — 5-State Pacing Within Tessitura (In Tessitura)

5-state pacing within Minted's tessitura (speedAlpha 0.83–1.30). Each state sits at a distinct position in the range. Same voice, different behavior: slower for empathy, moderate for options, firm for boundary, slowest for confirmation, warm for close.
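
A minimal sketch of how the five states could be mapped to speed settings and kept inside the range. The per-state values below are hypothetical placeholders, since the case study does not publish its actual settings, and the sketch assumes that a larger speedAlpha produces slower speech. The names `State`, `SPEED_ALPHA`, `clamp_to_tessitura`, and `speed_for` are illustrative only.

```python
from enum import Enum

# Tessitura bounds quoted in the text (speedAlpha units).
TESSITURA_MIN = 0.83
TESSITURA_MAX = 1.30


class State(Enum):
    EMPATHY = "empathy"
    OPTIONS = "options"
    BOUNDARY = "boundary"
    CONFIRMATION = "confirmation"
    CLOSE = "close"


# HYPOTHETICAL values for illustration only; not the study's real settings.
# Assumption: a larger speedAlpha means slower speech.
SPEED_ALPHA = {
    State.EMPATHY: 1.20,       # slower
    State.OPTIONS: 1.00,       # moderate
    State.BOUNDARY: 1.05,      # firm
    State.CONFIRMATION: 1.30,  # slowest
    State.CLOSE: 1.10,         # warm
}


def clamp_to_tessitura(alpha: float) -> float:
    """Keep a requested speedAlpha inside the brand's range."""
    return max(TESSITURA_MIN, min(TESSITURA_MAX, alpha))


def speed_for(state: State) -> float:
    """Return the clamped speedAlpha to send with a line in this state."""
    return clamp_to_tessitura(SPEED_ALPHA[state])
```

The clamp is the part that encodes the thesis: the voice can go outside 0.83–1.30, but the design never asks it to.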

Prosodic Measurements

Metric	Baseline	Designed	Target	Verdict
Opening pace	3.5 WPS	3.4 WPS	3.2–3.8	✓ In range
Empathy pace	3.4 WPS	2.9 WPS	Distinctly slower than other states	✓ Slowest non-confirmation state
Empathy vs non-empathy avg	−3.0%	+16%	≥15%	✓ 2.9 vs 3.5 avg (+19pp swing)
Empathy pre-pause	~0ms	500ms	250–450ms	Above target (observed); intentionally stronger
Options structure	Stacked in one sentence	Separated into two options	One option per sentence	✓ Easier to follow
Boundary contrast	—	3.3 WPS	Distinct from empathy	✓ +0.4 WPS above empathy
Critical-detail isolation	Dense confirmation sentence	Date and watchpoint broken out	Stand-alone chunks	✓
Closing pitch continuity	+20–46% spike	Consistent	<10%	✓ Same persona at close
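
The pace metrics in the table can be reproduced with a few lines of arithmetic. The sketch below assumes WPS means words per second and that the empathy delta is the relative slowdown against the non-empathy average; the exact formula used in the study is not stated, so treat both as assumptions. The function names are illustrative.

```python
def words_per_second(text: str, duration_s: float) -> float:
    """Pace of one utterance: whitespace-separated words over audio duration."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(text.split()) / duration_s


def relative_slowdown(state_wps: float, reference_wps: float) -> float:
    """Fraction by which a state is slower than the reference (negative = faster)."""
    return (reference_wps - state_wps) / reference_wps


def in_range(value: float, low: float, high: float) -> bool:
    """Check a measurement against an inclusive target band."""
    return low <= value <= high


# Values taken from the table above.
empathy_wps, non_empathy_avg_wps, boundary_wps = 2.9, 3.5, 3.3

slowdown = relative_slowdown(empathy_wps, non_empathy_avg_wps)
swing_pp = 16 - (-3)                       # baseline -3% to designed +16%
boundary_gap = round(boundary_wps - empathy_wps, 1)
```

With the rounded table values, `slowdown` comes out near 17%, slightly above the reported +16%, which was presumably computed from unrounded measurements. `swing_pp` is 19 and `boundary_gap` is 0.4, matching the table. `in_range(500, 250, 450)` is false, which is why the pre-pause row is marked as observed rather than in range.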

Key Finding: TTS Models Don't Know Their Tessitura

TTS models know their range: every speed they can produce. They do not know their tessitura. Rime's speed parameter (speedAlpha) gives directional control over pacing, but not precise per-state control, because the engine's internal prosody model partially and non-deterministically overrides speed hints. True tessitura-aware pacing requires either SSML-level control (pause tags, emphasis markers, per-phrase rate) or a model that accepts emotional state as an input parameter, not just a global speed knob. The same limitation likely applies to any TTS provider that exposes speed-only controls.
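
To make the SSML-level control concrete, here is a minimal sketch that wraps one phrase in standard W3C SSML `<break>` and `<prosody>` elements. Whether a given TTS provider honors these tags is not confirmed by this case study, and the helper names and the sample lines are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def ssml_phrase(text: str, rate: str = "medium", pre_pause_ms: int = 0) -> str:
    """Wrap one phrase with an optional leading pause and a per-phrase rate."""
    pause = f'<break time="{pre_pause_ms}ms"/>' if pre_pause_ms > 0 else ""
    return f"{pause}<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"


def ssml_document(phrases: list[str]) -> str:
    """Join phrase fragments under a single <speak> root."""
    return "<speak>" + "".join(phrases) + "</speak>"


# Hypothetical lines: an empathy phrase with a 500 ms pre-pause at a slower
# rate, followed by an informational phrase at the default rate.
doc = ssml_document([
    ssml_phrase("I'm so sorry the gift is running late.", rate="85%", pre_pause_ms=500),
    ssml_phrase("Here are two options.", rate="medium"),
])
```

Per-phrase markup like this is what a single global speed value cannot express: the pause and the rate change attach to the empathy line only.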

What This Proves

Even with limited TTS control, the design intervention produces a measurable improvement. The empathy delta swung by 19 percentage points (from −3% to +16%). The boundary state creates a distinct register for honesty moments. Critical details are isolated instead of buried in one dense confirmation sentence. With SSML or state-aware TTS, these improvements would likely be larger and more reliable. The improvement comes from design, not from a different voice.

Brand Tessitura Comparison

Same methodology, different tessitura. Change the brand, change the range.

Solid = proven (voiced and measured). Dashed = designed (tessitura mapped, not yet voiced).

Built by Mical Neill · Voice: Rime Luna (Arcana) · Scenario: Minted Mother's Day Gift Rescue · 5-State Tessitura-Based Pacing · May 2026