Architecture
- Voice Interface Agent - An interface agent that uses a low-latency, real-time audio model to handle greetings, chitchat, and simple clarifications directly, and delegates everything else to the primary agent.
- Primary Agent - The main agent, with tools and all the capabilities of Autonomy agents. It handles complex questions, database lookups, and tool-based tasks.
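The split between the two agents can be pictured as a routing decision made on every turn. The sketch below is illustrative only and makes several assumptions. `classify_intent` is a hypothetical keyword matcher; in the real system, the realtime model decides whether a turn is within its allowed actions. `primary_agent` is any callable standing in for the full agent.

```python
import re
from typing import Callable

# Interactions the voice agent answers itself (mirrors the documented defaults).
ALLOWED_ACTIONS = {"greeting", "chitchat", "collecting information", "clarification"}
GREETING_WORDS = {"hello", "hi", "hey"}


def classify_intent(utterance: str) -> str:
    """Hypothetical keyword classifier; a real voice agent lets the model decide."""
    text = utterance.lower()
    words = set(re.findall(r"[a-z']+", text))
    if words & GREETING_WORDS:
        return "greeting"
    if "how are you" in text:
        return "chitchat"
    return "complex request"


def handle_turn(utterance: str, primary_agent: Callable[[str], str]) -> str:
    """Answer simple turns directly and delegate everything else."""
    intent = classify_intent(utterance)
    if intent in ALLOWED_ACTIONS:
        return f"[voice agent] handled {intent} directly"
    # Anything outside the allowed actions goes to the primary agent.
    return primary_agent(utterance)


if __name__ == "__main__":
    primary = lambda text: f"[primary agent] answered: {text}"
    print(handle_turn("Hi there!", primary))
    print(handle_turn("What is the status of order 12345?", primary))
```

Routing cheap, conversational turns away from the primary agent is what keeps greetings fast. Anything that needs tools pays the extra delegation hop.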
Create a Voice Agent
Add a `voice` configuration to any agent to enable voice capabilities.
Voice Configuration
The `voice` parameter accepts a dictionary with the following options:
| Option | Description | Default |
|---|---|---|
| `realtime_model` | Model used by the voice agent (must support the realtime API) | `gpt-4o-realtime-preview` |
| `voice` | TTS voice ID (`alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`) | `echo` |
| `allowed_actions` | Actions the voice agent handles without delegating | See below |
| `instructions` | Custom voice agent instructions (auto-generated if not set) | `None` |
| `filler_phrases` | Phrases spoken before delegating to the primary agent | See below |
| `input_audio_format` | Input audio format (`pcm16`, `g711_ulaw`, `g711_alaw`) | `pcm16` |
| `output_audio_format` | Output audio format (`pcm16`, `g711_ulaw`, `g711_alaw`) | `pcm16` |
| `vad_threshold` | Voice Activity Detection sensitivity (0.0 to 1.0) | `0.5` |
| `vad_prefix_padding_ms` | Audio, in milliseconds, kept from before detected speech | `300` |
| `vad_silence_duration_ms` | Silence, in milliseconds, that marks the end of speech | `500` |
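To make the defaults concrete, here is a hedged sketch of how a `voice` dictionary could be merged with the documented defaults and checked before use. `resolve_voice_config` and `VOICE_DEFAULTS` are hypothetical helpers and are not part of the Autonomy API. The values come from the option table and from the default lists in the next sections. The checks accept only the voices and formats that the table lists.

```python
import copy

# Defaults from the option table; the two lists use the documented default lists.
VOICE_DEFAULTS = {
    "realtime_model": "gpt-4o-realtime-preview",
    "voice": "echo",
    "allowed_actions": ["Greetings", "Chitchat", "Collecting information", "Clarifications"],
    "instructions": None,  # auto-generated when left unset
    "filler_phrases": [
        "Just a second.",
        "Let me check.",
        "One moment.",
        "Let me look into that.",
        "Give me a moment.",
        "Let me see.",
    ],
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "vad_threshold": 0.5,
    "vad_prefix_padding_ms": 300,
    "vad_silence_duration_ms": 500,
}

VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
AUDIO_FORMATS = {"pcm16", "g711_ulaw", "g711_alaw"}


def resolve_voice_config(overrides: dict) -> dict:
    """Hypothetical helper: merge overrides onto the defaults and validate them."""
    unknown = set(overrides) - set(VOICE_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown voice options: {sorted(unknown)}")
    config = copy.deepcopy(VOICE_DEFAULTS)  # never mutate the shared defaults
    config.update(overrides)
    if config["voice"] not in VOICES:
        raise ValueError(f"unsupported voice: {config['voice']}")
    for key in ("input_audio_format", "output_audio_format"):
        if config[key] not in AUDIO_FORMATS:
            raise ValueError(f"unsupported {key}: {config[key]}")
    if not 0.0 <= config["vad_threshold"] <= 1.0:
        raise ValueError("vad_threshold must be between 0.0 and 1.0")
    return config


if __name__ == "__main__":
    config = resolve_voice_config({"voice": "nova", "vad_threshold": 0.7})
    print(config["voice"], config["vad_threshold"], config["realtime_model"])
```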
Default Allowed Actions
By default, the voice agent handles these interactions directly:
- Greetings
- Chitchat
- Collecting information
- Clarifications
Default Filler Phrases
Before delegating complex requests, the voice agent says one of the following:
- “Just a second.”
- “Let me check.”
- “One moment.”
- “Let me look into that.”
- “Give me a moment.”
- “Let me see.”
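This section does not say how the voice agent chooses among the default phrases. The sketch below is one plausible strategy and is assumed, not documented: pick at random, but never repeat the phrase used on the previous delegation. `FillerPicker` is a hypothetical name.

```python
import random
from typing import Optional, Sequence

DEFAULT_FILLERS = (
    "Just a second.",
    "Let me check.",
    "One moment.",
    "Let me look into that.",
    "Give me a moment.",
    "Let me see.",
)


class FillerPicker:
    """Hypothetical picker: random filler phrase, avoiding an immediate repeat."""

    def __init__(self, phrases: Sequence[str] = DEFAULT_FILLERS, seed: Optional[int] = None):
        if not phrases:
            raise ValueError("at least one filler phrase is required")
        self._phrases = list(phrases)
        self._rng = random.Random(seed)  # seedable for reproducible tests
        self._last: Optional[str] = None

    def pick(self) -> str:
        # Exclude the last phrase unless it is the only one available.
        choices = [p for p in self._phrases if p != self._last] or self._phrases
        phrase = self._rng.choice(choices)
        self._last = phrase
        return phrase


if __name__ == "__main__":
    picker = FillerPicker(seed=7)
    for _ in range(3):
        print(picker.pick())
```

Avoiding back-to-back repeats is a small touch that keeps a multi-step call from sounding robotic.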
Customizing Behavior
Specify What Voice Agent Handles Directly
Control which interactions the voice agent handles without delegating by setting `allowed_actions`.
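For example, a restaurant reservation line might let the voice agent collect booking details itself. The table notes that voice instructions are auto-generated when not set, so the sketch below shows one hypothetical way a custom `allowed_actions` list could be rendered into such instructions. Only the `voice` and `allowed_actions` key names come from the option table. We assume the action strings are free-form descriptions, and `build_voice_instructions` is illustrative, not Autonomy's generator.

```python
from typing import List


def build_voice_instructions(allowed_actions: List[str]) -> str:
    """Hypothetical: render allowed actions into a voice-agent prompt."""
    bullet_list = "\n".join(f"- {action}" for action in allowed_actions)
    return (
        "You are the voice interface. Handle only the following yourself:\n"
        f"{bullet_list}\n"
        "For anything else, say a short filler phrase and delegate to the primary agent."
    )


# Hypothetical voice settings for a reservation line.
voice = {
    "voice": "nova",
    "allowed_actions": [
        "Greetings",
        "Collecting the party size, date, and time for a reservation",
        "Confirming details the caller has already given",
    ],
}

if __name__ == "__main__":
    print(build_voice_instructions(voice["allowed_actions"]))
```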
Custom Filler Phrases
Set context-appropriate filler phrases for your use case with `filler_phrases`.
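The phrases below are hypothetical ones for a banking support line. The helper flags phrases that may take too long to say. It assumes a rough speaking rate of 150 words per minute and an arbitrary 2.5-second budget. Both numbers are illustrative assumptions, not Autonomy settings. The idea is that a filler should finish quickly rather than compete with the answer that follows it.

```python
from typing import List

# Hypothetical filler phrases for a banking support line.
BANKING_FILLERS = [
    "Let me pull up your account.",
    "One moment while I check that.",
    "Checking your recent transactions.",
]

WORDS_PER_MINUTE = 150  # rough conversational speaking rate (assumption)


def estimated_seconds(phrase: str) -> float:
    """Rough time to speak a phrase at WORDS_PER_MINUTE."""
    return len(phrase.split()) * 60 / WORDS_PER_MINUTE


def check_fillers(phrases: List[str], max_seconds: float = 2.5) -> List[str]:
    """Return the phrases likely to take longer than max_seconds to say."""
    return [p for p in phrases if estimated_seconds(p) > max_seconds]


voice = {"voice": "alloy", "filler_phrases": BANKING_FILLERS}

if __name__ == "__main__":
    print(check_fillers(voice["filler_phrases"]))
```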
VAD Settings for Responsive Interaction
Tune Voice Activity Detection for your environment with the `vad_threshold`, `vad_prefix_padding_ms`, and `vad_silence_duration_ms` options.
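To see how the three VAD options interact, here is a simplified frame-based detector over per-frame speech scores in the range 0.0 to 1.0. It is a sketch under stated assumptions (fixed 20 ms frames, precomputed scores), not the realtime API's server-side VAD. Speech starts when a score reaches the threshold. The segment is extended backwards by the prefix padding, and it ends after the configured duration of sub-threshold frames. A higher threshold tolerates background noise. A shorter silence duration makes turn-taking snappier but can cut off speakers who pause.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FRAME_MS = 20  # assumed duration of one audio frame


@dataclass
class VadSettings:
    threshold: float = 0.5          # mirrors vad_threshold
    prefix_padding_ms: int = 300    # mirrors vad_prefix_padding_ms
    silence_duration_ms: int = 500  # mirrors vad_silence_duration_ms


def detect_segments(scores: List[float], settings: VadSettings) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) for each utterance found in per-frame speech scores."""
    pad_frames = settings.prefix_padding_ms // FRAME_MS
    silence_needed = max(1, settings.silence_duration_ms // FRAME_MS)
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last_speech = 0
    silence_run = 0
    floor = 0  # padding never reaches back into the previous segment
    for i, score in enumerate(scores):
        if score >= settings.threshold:
            if start is None:
                start = max(floor, i - pad_frames)  # keep audio just before onset
            last_speech = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= silence_needed:
                # Enough trailing silence: the segment ends after the last speech frame.
                floor = last_speech + 1
                segments.append((start * FRAME_MS, floor * FRAME_MS))
                start, silence_run = None, 0
    if start is not None:  # audio ended mid-utterance
        segments.append((start * FRAME_MS, (last_speech + 1) * FRAME_MS))
    return segments


if __name__ == "__main__":
    # Two bursts of speech separated by a 200 ms pause, then trailing silence.
    scores = [0.9] * 10 + [0.1] * 10 + [0.9] * 10 + [0.1] * 30
    print(detect_segments(scores, VadSettings()))                         # one turn
    print(detect_segments(scores, VadSettings(silence_duration_ms=100)))  # two turns
```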
Voice Agents with Tools
Voice agents work with tools. The primary agent has access to all tools and uses them when handling delegated requests. For example, when a user asks about order 12345:
- Voice agent says “Let me look up your order.”
- Voice agent delegates to the primary agent
- Primary agent calls `lookup_order("12345")`
- Primary agent returns “Your order has shipped and will arrive tomorrow.”
- Voice agent speaks the response verbatim
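The sequence above can be mimicked with plain `asyncio`. Everything here is a hypothetical stand-in. `lookup_order` returns canned data. `primary_agent` calls the tool directly instead of letting a model choose it. The transcript list takes the place of speech output. The sketch shows the ordering: the filler phrase comes first, then the delegation, and the primary agent's answer is relayed unchanged.

```python
import asyncio
import re
from typing import List


async def lookup_order(order_id: str) -> dict:
    """Hypothetical tool; a real one would query an order database."""
    await asyncio.sleep(0.05)  # simulate I/O latency
    return {"order_id": order_id, "status": "shipped", "eta": "tomorrow"}


async def primary_agent(request: str) -> str:
    """Hypothetical primary agent: extracts the order ID and calls the tool."""
    match = re.search(r"\d+", request)
    if match is None:
        return "Which order number should I look up?"
    order = await lookup_order(match.group())
    return f"Your order has {order['status']} and will arrive {order['eta']}."


async def voice_turn(request: str, transcript: List[str]) -> None:
    """Speak a filler phrase, delegate, then relay the answer verbatim."""
    transcript.append("Let me look up your order.")  # filler before delegating
    answer = await primary_agent(request)              # delegation
    transcript.append(answer)                          # spoken unchanged


if __name__ == "__main__":
    spoken: List[str] = []
    asyncio.run(voice_turn("Where is order 12345?", spoken))
    print(spoken)
```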
Voice Agents with Knowledge
Combine voice with knowledge search so that the primary agent can answer spoken questions from a knowledge base.
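As a rough picture of the primary agent's side of a knowledge-backed answer, the sketch below ranks document snippets by word overlap with the question. It returns the best match for the voice agent to speak. This is a deliberately naive stand-in: it uses keyword overlap rather than embeddings and is not Autonomy's knowledge search. `search_knowledge` and the sample documents are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical knowledge base.
DOCS: Dict[str, str] = {
    "returns": "Items can be returned within 30 days of delivery for a full refund.",
    "shipping": "Standard shipping takes three to five business days.",
    "hours": "Support is available Monday to Friday, 9am to 5pm.",
}

STOPWORDS = {"the", "a", "an", "to", "of", "for", "is", "can", "i", "how", "do", "what"}


def tokens(text: str) -> Set[str]:
    """Lowercase word set with common stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def search_knowledge(question: str, docs: Dict[str, str], top_k: int = 1) -> List[Tuple[str, int]]:
    """Rank documents by how many question words they share; drop zero scores."""
    q = tokens(question)
    scored = [(doc_id, len(q & tokens(body))) for doc_id, body in docs.items()]
    scored = [item for item in scored if item[1] > 0]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    hits = search_knowledge("How long does shipping take?", DOCS)
    answer = DOCS[hits[0][0]] if hits else "I couldn't find that in our documentation."
    print(answer)
```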
Memory Isolation
Voice sessions support the same memory isolation as text conversations. Pass the `scope` and `conversation` parameters when connecting.
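Assume memory is keyed by the pair of `scope` and `conversation` values. Under that assumption, two users, or two sessions of the same user, never see each other's history. The sketch below models that isolation with an in-memory store. `IsolatedMemory` and the sample scope and conversation strings are hypothetical, and this is not Autonomy's storage implementation.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Key = Tuple[str, str]  # (scope, conversation)


class IsolatedMemory:
    """Hypothetical per-(scope, conversation) message history."""

    def __init__(self) -> None:
        self._messages: DefaultDict[Key, List[str]] = defaultdict(list)

    def append(self, scope: str, conversation: str, message: str) -> None:
        self._messages[(scope, conversation)].append(message)

    def history(self, scope: str, conversation: str) -> List[str]:
        # .get avoids creating empty entries; the copy protects stored history.
        return list(self._messages.get((scope, conversation), []))


if __name__ == "__main__":
    memory = IsolatedMemory()
    memory.append("user-alice", "call-1", "My order number is 12345.")
    memory.append("user-bob", "call-1", "I want to change my address.")
    print(memory.history("user-alice", "call-1"))
    print(memory.history("user-alice", "call-2"))  # a new conversation starts empty
```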
Complete Example: Software Engineering Interviewer
This example demonstrates a voice agent that conducts first-round screening interviews for software engineering candidates. The agent assesses technical fundamentals, problem-solving ability, and communication skills.
Using the Interviewer
Connect via WebSocket for voice. Once connected, the interviewer will:
- Greet the candidate and explain the interview format
- Ask about their background and experience
- Pose technical questions adapted to their level
- Explore behavioral scenarios
- Answer questions about the role
- Provide feedback on their performance
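One way to keep the interview on track is to walk through the stages above in order. The sketch below does that and pairs it with a hypothetical `voice` dictionary for the interviewer. The option names come from the option table, but the values are our own choices. The 800 ms silence window is an assumed tweak so that candidates who pause to think are not cut off. `next_stage` is an illustrative helper, not part of Autonomy.

```python
from typing import Optional

# Stages taken from the list above.
INTERVIEW_STAGES = [
    "Greet the candidate and explain the interview format",
    "Ask about their background and experience",
    "Pose technical questions adapted to their level",
    "Explore behavioral scenarios",
    "Answer questions about the role",
    "Provide feedback on their performance",
]


def next_stage(current: Optional[str]) -> Optional[str]:
    """Return the stage after `current`, or None when the interview is over."""
    if current is None:
        return INTERVIEW_STAGES[0]
    index = INTERVIEW_STAGES.index(current)
    return INTERVIEW_STAGES[index + 1] if index + 1 < len(INTERVIEW_STAGES) else None


# Hypothetical voice settings for the interviewer.
interviewer_voice = {
    "voice": "nova",
    "allowed_actions": [
        "Greetings",
        "Explaining the interview format",
        "Asking follow-up and clarifying questions",
    ],
    "filler_phrases": [
        "Good question, give me a moment.",
        "Let me think about that.",
    ],
    "vad_silence_duration_ms": 800,  # candidates often pause while thinking
}

if __name__ == "__main__":
    stage = None
    while (stage := next_stage(stage)) is not None:
        print(stage)
```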

