Voice Agent Design Portfolio

Every brand has a tessitura. The designer's job is to find it.

In vocal performance, tessitura is the range where a voice sounds most natural — narrower than its full range. I apply this principle to voice agent design: find the brand's tessitura, map conversational states within it, and measure whether the voice stays in range. I spent 12 years at Amazon building knowledge architecture, voice evaluation, and multilingual systems, and hold an MA in Comparative Linguistics from Paris-Sorbonne.

Core finding: current voice agents pace by length — they know their range but not their tessitura. I designed a 5-state pacing model in which each conversational state sits at a measured point within the brand's emotional range (0.83–1.30 speedAlpha). The designed agent hit a 16% slower empathy pace with a 400ms pre-pause — same voice, same brand, constrained to its tessitura.
5 Pacing States · 16% Empathy Rate Delta · 400ms Empathy Pre-Pause · 47 Voices Auditioned · 4 Languages
Voice Quality — Evaluate, Compare, Iterate
Finding Minted's Tessitura

I evaluated two voice models for a customer support scenario. Both had the same structural failure: the voice left the brand's tessitura on every emotional moment. Short empathy phrases rushed at informational speed. Closing lines spiked +46% in pitch — a persona break. The model had no signal that the content was emotionally weighted. I found the tessitura, mapped 5 states within it, and measured the improvement.

States: Acceptance · Resolution · Boundary · Confirmation · Close

The Problem

Both models showed 0–7% empathy rate delta (target: ≥15%). A +46% pitch spike on goodbye broke persona coherence. The model slows for longer sentences, not for emotional content. This is a pacing design problem, not a model quality problem.

The Design

5-state pacing model within Minted's tessitura (0.83–1.30 speedAlpha). Each state is a measured point within the brand's emotional range: Acceptance at the deepest point, Resolution at the center, Boundary firm but still warm, Confirmation slowest for detail capture, Close matching the opening. Going outside that band is the vocal equivalent of straining.
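The model above can be sketched as a small data structure plus a tessitura check. This is an illustrative sketch: the band bounds, the 400ms empathy pre-pause, and the per-state directions (Confirmation slowest, Acceptance slow and deep) come from the case study, but the individual speedAlpha values are placeholders, not the shipped parameters.

```python
from dataclasses import dataclass

# Minted's measured tessitura (speedAlpha bounds) from the case study.
TESSITURA = (0.83, 1.30)

@dataclass(frozen=True)
class PacingState:
    name: str
    speed_alpha: float  # TTS speed multiplier for this state
    pre_pause_ms: int   # silence inserted before the agent speaks

# Placeholder values; only the directions follow the design described
# above (Confirmation slowest, Acceptance slow with a 400ms pre-pause,
# Close matching the opening).
STATES = [
    PacingState("acceptance", 0.87, 400),
    PacingState("resolution", 1.00, 0),
    PacingState("boundary", 1.08, 0),
    PacingState("confirmation", 0.83, 150),
    PacingState("close", 1.00, 0),
]

def in_tessitura(state: PacingState, band: tuple = TESSITURA) -> bool:
    """A state is valid only if its pace stays inside the brand band."""
    lo, hi = band
    return lo <= state.speed_alpha <= hi

# Every designed state must sit inside the band; going outside it is
# the vocal equivalent of straining.
assert all(in_tessitura(s) for s in STATES)
```

The point of the check is that pacing becomes a constraint you can test on every voice, prompt, or model change, not a taste judgment.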

Play the A/B comparison →

The TTS Finding

TTS models know their range — every speed they can produce. They don't know their tessitura — which speeds serve which emotional moments. In cascade architecture, the text boundary strips paralinguistic intent. Speed parameters are an imprecise proxy. Real tessitura-aware pacing needs SSML-level control or state-aware TTS. That's the engineering gap worth solving.
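As a sketch of what SSML-level control could look like: `<prosody rate>` and `<break time>` are standard SSML elements, but the function and the specific values here are illustrative, not any particular vendor's API.

```python
def render_ssml(text: str, speed_alpha: float, pre_pause_ms: int = 0) -> str:
    """Encode a pacing state's intent in SSML instead of hoping the
    TTS model infers emotional weight from the text alone."""
    rate = f"{round(speed_alpha * 100)}%"
    pause = f'<break time="{pre_pause_ms}ms"/>' if pre_pause_ms else ""
    return f'<speak>{pause}<prosody rate="{rate}">{text}</prosody></speak>'

# An empathy turn: 400ms of silence, then speech at 84% of default rate.
print(render_ssml("I hear you. Let's fix this.", 0.84, 400))
```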

| Metric | Baseline | Designed | Why it matters |
|---|---|---|---|
| Empathy pace | 3.4 WPS | 2.9 WPS | Empathy sounds felt, not recited |
| Empathy vs non-empathy | −3% | −16% | Voice changes for emotional state |
| Empathy pre-pause | ~0ms | 400ms | Caller feels heard before agent speaks |
| Boundary contrast | — | +0.4 WPS above empathy | Honesty register is firmer than empathy |
| Closing pitch delta | +46% | Consistent | Same person says goodbye |
Repeatable Evaluation Frameworks
Unit Tests for Tessitura

Current benchmarks — tau-Voice, EVA, Scale AI Voice Showdown — measure task completion, latency, and preference. None measure whether a voice stays within a brand's tessitura across conversational states. My evaluation framework adds that dimension: not just whether the agent resolved the issue, but whether it stayed in range while doing it.

Layer 1: Prosodic Metrics

Automated, brand-agnostic. WPS per state, pause duration, pitch delta, brand name articulation, time to first audio. Same measurements work for any brand. Runs on every voice change, prompt change, or model update.
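The two core Layer 1 measurements reduce to a few lines. A minimal sketch, assuming turn-level transcripts with audio durations (function names are illustrative):

```python
def words_per_second(transcript: str, duration_s: float) -> float:
    """Pace of one turn: word count over audio duration."""
    return len(transcript.split()) / duration_s

def rate_delta(state_wps: float, baseline_wps: float) -> float:
    """Relative pace change vs. the non-empathy baseline.
    Negative means slower; the design target is <= -0.15."""
    return (state_wps - baseline_wps) / baseline_wps

# Example with the scorecard's figures: empathy turns at 2.9 WPS
# against a 3.4 WPS non-empathy baseline (negative = slower).
delta = rate_delta(2.9, 3.4)
```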

Automated · Brand-Agnostic

Layer 2: Interaction Quality

Human-rated 1–5 across 5 dimensions: emotional acknowledgment (25%), clarity under stress (20%), confidence without coldness (20%), trust during confirmation (20%), closure quality (15%). The dimensions are universal — only the calibration changes per brand.
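The weighting can be expressed directly. The weights are the ones listed above; the dictionary keys are illustrative shorthand:

```python
# Layer 2 dimension weights; they must sum to 1.0.
WEIGHTS = {
    "emotional_acknowledgment": 0.25,
    "clarity_under_stress": 0.20,
    "confidence_without_coldness": 0.20,
    "trust_during_confirmation": 0.20,
    "closure_quality": 0.15,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9

def interaction_score(ratings: dict) -> float:
    """Combine human 1-5 ratings into one weighted score."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"unrated dimensions: {missing}")
    return sum(WEIGHTS[d] * ratings[d] for d in WEIGHTS)
```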

Human-RatedUniversal Dimensions

Layer 3: Task Outcome + Sentiment

Binary checklist: customer understands root cause, accepts rescue plan, details confirmed, knows next watchpoint, feels confident at close. Plus closing sentiment delta — did the customer's emotional state improve? The metric that only emotional pacing can move.

Binary · Per Use Case

5 Sim Tests for Tessitura

If I were choosing a brand's voice in Agent Studio, I wouldn't just listen — I'd define 5 sim tests that tell me whether this voice can stay in the brand's tessitura across emotional states. Then I'd audition voices against those sims, not against my taste. The sim results are the design decision.

| Sim | Tests | Pass criteria |
|---|---|---|
| Empathy Detection | Agent's first response to emotional disclosure | Acknowledgment state first. Pace ≥15% slower. Pre-pause exists. |
| Boundary Clarity | "So you guarantee they'll take it back?" | Firm, specific language. No false hope. Constraint named directly. |
| Confirmation Capture | Customer needs to write down date + order number | Slowest pace. Grouped digits. Offers to repeat. |
| Persona Coherence | Full scenario, opening to close | Closing pitch within 10% of mean. No brightness spike. |
| State Transition | Scenario hitting all 5 states | Each state has measurably different WPS. Not length-based. |
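Two of the pass criteria above are mechanical enough to sketch as code. The thresholds come from the table; the function names and signatures are illustrative:

```python
def empathy_detection_pass(empathy_wps: float, baseline_wps: float,
                           pre_pause_ms: int) -> bool:
    """Empathy Detection sim: pace at least 15% slower than baseline,
    and a pre-pause exists before the agent speaks."""
    slower_enough = (baseline_wps - empathy_wps) / baseline_wps >= 0.15
    return slower_enough and pre_pause_ms > 0

def persona_coherence_pass(closing_pitch_hz: float,
                           mean_pitch_hz: float) -> bool:
    """Persona Coherence sim: closing pitch within 10% of the
    conversation mean -- the same person says goodbye."""
    return abs(closing_pitch_hz - mean_pitch_hz) / mean_pitch_hz <= 0.10
```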

Voice Evaluation Workbench

Interactive tool for comparing and scoring AI agent voices across 4 languages and 3 customer journey stages. Structured rubric, TTFA measurement, exportable scorecards.

Try it live →

Customer Response Signals

V2 of the live bot tracks the other side of the loop: customer word count trajectory, sentiment, and closing sentiment delta. When the signals show the empathy didn't land, the agent stops progressing and recalibrates. Design + measurement in one interface.

Open live demo →
Multilingual Voice Design
Empathy prosody is language-specific

The methodology is universal — the parameters are not. Each locale needs its own pacing profile because emotional expression varies across language families. My comparative linguistics training is specifically about how meaning maps across linguistic systems.

4 Languages, 3 Countries, 13 Locales

English (native), French (fluent — 8 years in Paris), Italian (fluent — 7 years in Milan/Sardinia), Spanish (proficient). Evaluated voice quality across 47 providers and 13 locales at Alexa. MA Comparative Linguistics from Paris-Sorbonne.

Locale-Specific Pacing

Italian empathy uses longer pauses and wider pitch variation. French empathy uses controlled formality — slower pace, level contour, precise articulation. Korean has politeness levels affecting the entire speech register. Thai tones carry meaning that English pitch changes don't. Each locale needs its own profile.
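A locale profile can be plain data. The directions below follow the observations above (Italian: longer pauses, wider pitch; French: slower and level); the numeric values, keys, and fallback logic are illustrative placeholders to be calibrated per brand:

```python
# Illustrative empathy-state profiles per locale; numbers are
# placeholders, only the directions follow the locale observations.
LOCALE_PROFILES = {
    "it-IT": {"speed_alpha": 0.83, "pre_pause_ms": 550, "pitch_contour": "wide"},
    "fr-FR": {"speed_alpha": 0.85, "pre_pause_ms": 400, "pitch_contour": "level"},
    "en-US": {"speed_alpha": 0.84, "pre_pause_ms": 400, "pitch_contour": "moderate"},
}

def profile_for(locale: str) -> dict:
    """Exact locale first, then fall back to any profile sharing the
    base language (e.g. fr-CA falls back to fr-FR)."""
    if locale in LOCALE_PROFILES:
        return LOCALE_PROFILES[locale]
    base = locale.split("-")[0]
    for key, profile in LOCALE_PROFILES.items():
        if key.split("-")[0] == base:
            return profile
    raise KeyError(f"no pacing profile for {locale}")
```

Keeping the parameters in data rather than in prompts is what makes the methodology portable: same pipeline, different profile per locale.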

The Structural Insight

At Alexa, engineers were building vector databases to fix what was a cultural mapping problem. Italian automotive categories don't map 1:1 to American ones. I fixed the taxonomy, not the model — one of the biggest quality improvements the system had seen. The same discipline applies to voice: fix the structure, not the model.

Voice Hub

Multi-agent voice interface with 5 LLM agents routable through voice in 4 languages. Rime TTS, Whisper STT, taxonomy-routed retrieval, sentence-boundary streaming.

View source →
Enabling Teams to Scale
Builder of builders

I build the playbooks, frameworks, and evaluation tools that let others deliver great voice design without me in the room. The system does the teaching — I design the system.

Enterprise Enablement at Scale

Built Ask Pathfinder — knowledge retrieval for 12,000+ users/month. Reclassified taxonomy from industry-based to function-based, improving retrieval accuracy 28% without changing the model. 250+ enablement sessions/year reaching 20,000+ users. +17% engagement, +50% satisfaction, +67% feature adoption.

Learning Architecture

121 curriculum modules across 5 certification tracks. 3-layer assessment pipeline: automated checks catch structural issues, AI rubrics evaluate thinking quality against calibration exemplars, human review handles edge cases. Scales quality without scaling headcount.

View case study →

MCP Integrations Lab

Technical enablement for sales teams on complex AI integrations. Presentation framework, readiness guide, demo prototype with annotated code. Built to be delivered by others, not just by me.

View presentation →
Agent Personas & Interaction Patterns
Voice design as tessitura design

Choosing a voice isn't picking what sounds nicest. It's finding the brand's tessitura — the narrow range where the voice sounds like itself — and designing states that stay within it. Each brand has a different tessitura. The methodology is the same.

Brand Voice Lens

Translated "Honor the Craft" brand values into measurable voice parameters. Warm but composed (not bubbly). Precise but human (not scripted). Steady under stress (slow down, never speed up). Every pacing choice traces to a brand value — not taste.

Voice Audition: Finding the Tessitura Match

Does this voice's natural tessitura overlap with the brand's? Can it slow down without sounding sleepy? Be firm without sounding cold? Hold persona from opening to close? Cove passed for Minted because its natural range overlaps with the brand's. A brighter voice could technically hit the same notes — but it would be singing outside its tessitura.

OKA — Voice-First Learning

Adaptive learning companion for a child with dyslexia. Voice button with 3 states. Frustration detection triggers automatic simplification. Same design principle: the system responds to emotional state, not just task state. Built with a real child. Tested with a real child.

Live Bot V1 — Agent Pacing

Real-time state tracking shows the agent staying within the tessitura as conversation state changes. Metrics panel proves the pacing model is designable and measurable. Tessitura band visualization tracks the voice's position within the brand range.

Open live demo →

Live Bot V2 — Customer Response Signals

Adds the other side: tracks customer word count, sentiment, and closing delta. When convergence detection fires, Mirror Sync pulls the voice back toward the center of the tessitura. Not just designing how the agent sounds — measuring whether the design kept it in range.

Open live demo →
Career Arc
The same problem, evolving tools

How meaning maps across systems — cultural, linguistic, and technical. 20 years of solving it.

2006
University Lecturer — Paris
Sciences Po · Nanterre · École des Métiers du Livre
Designed immersive curricula across 4 institutions in 3 languages. MA Comparative Linguistics, Paris-Sorbonne. Built a publishing simulation where students learned English without knowing they were studying language.
2013
Amazon — Knowledge Engineer
Milan & San Francisco
Built knowledge classification systems across 4 countries in 4 languages. Led Kaizen program — an L2 employee's idea won the company-wide Think Big competition. Creating the environment for innovation matters more than being the innovator.
2019
Amazon Alexa — Voice Quality & Knowledge Architecture
San Francisco
Evaluated voice quality across 47 providers for automotive POI data in 13 locales. Found that what engineers were solving with vector databases was a cultural mapping problem. Fixed the taxonomy, not the model — one of the biggest quality improvements the system had seen.
2023
Amazon AGI — AI Evaluation Design
San Francisco
Designed structured evaluation frameworks and inter-rater reliability studies for AI model output. The measure-refine-measure loop that ensures systems do what they claim.
2024
AWS — Learning Architect & Systems Lead
San Francisco
Built Ask Pathfinder — knowledge retrieval for 12,000+ users/month. Taxonomy reclassification improved retrieval 28% without changing the model. 250+ sessions/year. Builder of builders at enterprise scale.
2025
Independent — Voice Agent Design & Multi-Agent Systems
San Francisco
Built a multi-agent intelligence system with voice evaluation workbench, a voice-first adaptive learning companion, and the tessitura-based pacing case study — from diagnosis to working prototype with measured results. Every brand has a tessitura. I built the tools to find it.
Design Philosophy
Tessitura, not range

Measure tessitura, not aesthetics

"Does it sound good?" is the wrong question. "Did the voice stay within the brand's tessitura across every emotional state, and did that produce a measurable improvement in the customer's ability to trust, understand, and act?" That's voice design. The numbers — 2.9 WPS empathy, 400ms pre-pause, consistent closing — are tessitura measurements.

Design in code

I don't mock and hand off. I build the system prompt, the pacing model, the evaluation matrix, and the measurement infrastructure. Every artifact in this portfolio is working code. The design judgment IS the implementation.

Tessitura, not range

TTS models optimize for range — every speed, every pitch they can produce. Voice design optimizes for tessitura — the narrow band where the brand sounds like itself. Prompting "be warm and empathetic" uses range (0% empathy delta). Designing 5 states within a measured tessitura uses design intelligence (16% empathy delta). Same voice. Different constraint.

Evaluation is design infrastructure

Voice sims are unit tests for agent behavior. My framework adds pacing dimensions — so you're not just testing whether the agent resolved the issue, you're testing whether the customer felt heard while it happened. The sim results are the design decision.

Empathy prosody is language-specific

Italian uses longer pauses and wider pitch. French uses controlled formality. Korean has politeness levels affecting the entire register. Each locale needs its own pacing profile. The methodology is universal — the parameters are not.

Craftsmanship at speed

Nothing is perfect, but build something nearly perfect. The 5-state model is a system — applied to any brand with different parameter values. Ship the architecture, refine the parameters. Speed and craftsmanship are not in tension when the architecture is right.

Every brand has a tessitura. I find it.

Not aesthetic judgment — measured range, iterated parameters, proven results. I'd welcome the chance to show how this applies to your platform.