# Voice

> Give agents the ability to listen and speak.

Agents can have a voice interface. A fast voice model handles immediate user interaction and delegates complex tasks to a more powerful primary agent.

***

## Architecture

1. **Voice Interface Agent** - An interface agent that uses a low-latency, real-time audio model to handle greetings, chitchat, and simple clarifications directly.
2. **Primary Agent** - The main agent with tools and all the capabilities of Autonomy agents. Handles complex questions, database lookups, and tool-based tasks.

When the voice agent receives a complex request, it says a filler phrase (like "Let me check on that") and delegates to the primary agent. The primary agent processes the request, potentially calling tools, and returns a response that the voice agent speaks verbatim.

***

## Create a Voice Agent

Add a `voice` configuration to any agent to enable voice capabilities:

```python images/main/main.py theme={null}
from autonomy import Agent, Model, Node


async def main(node):
  await Agent.start(
    node=node,
    name="assistant",
    instructions="You are a helpful customer service agent.",
    model=Model("claude-sonnet-4-v1"),
    voice={"voice": "nova"},
  )


Node.start(main)
```

Once the agent is running, connect to it over WebSocket at `/agents/assistant/voice`. The agent also remains available through the standard HTTP API for text interactions.
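The browser client in the complete example at the end of this page sends microphone audio over this WebSocket as JSON messages of the form `{"type": "audio", "audio": <base64 PCM16>}` and receives `audio` messages in the same encoding. The sketch below builds and parses such messages using only the Python standard library. It assumes 16-bit little-endian mono PCM, as that client produces; the helper names are illustrative and are not part of the Autonomy API.

```python
import base64
import json
import struct


def encode_audio_message(samples: list[float]) -> str:
  """Convert float samples in [-1.0, 1.0] to a JSON "audio" message."""
  ints = []
  for s in samples:
    # Clamp, then scale to signed 16-bit the same way the browser client does.
    s = max(-1.0, min(1.0, s))
    ints.append(int(s * 0x8000) if s < 0 else int(s * 0x7FFF))
  pcm16 = struct.pack(f"<{len(ints)}h", *ints)
  audio = base64.b64encode(pcm16).decode("ascii")
  return json.dumps({"type": "audio", "audio": audio})


def decode_audio_message(message: str) -> list[float]:
  """Parse a JSON "audio" message back into float samples."""
  data = json.loads(message)
  if data.get("type") != "audio":
    raise ValueError(f"expected an audio message, got {data.get('type')!r}")
  pcm16 = base64.b64decode(data["audio"])
  count = len(pcm16) // 2  # two bytes per 16-bit sample
  ints = struct.unpack(f"<{count}h", pcm16[: count * 2])
  return [i / 32768.0 for i in ints]


message = encode_audio_message([0.0, 0.5, -1.0, 1.0])
decoded = decode_audio_message(message)
```

A Python client would send the string returned by `encode_audio_message` over the socket and pass incoming `audio` messages to `decode_audio_message` before playback.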

***

## Voice Configuration

The `voice` parameter accepts a dictionary with the following options:

| Option                    | Description                                                        | Default                   |
| ------------------------- | ------------------------------------------------------------------ | ------------------------- |
| `realtime_model`          | Model for voice agent (must support realtime API)                  | `gpt-4o-realtime-preview` |
| `voice`                   | TTS voice ID (`alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`) | `echo`                    |
| `instructions`            | Custom voice agent instructions (auto-generated if not set)        | `None`                    |
| `input_audio_format`      | Audio format for input (`pcm16`, `g711_ulaw`, `g711_alaw`)         | `pcm16`                   |
| `output_audio_format`     | Audio format for output (`pcm16`, `g711_ulaw`, `g711_alaw`)        | `pcm16`                   |
| `vad_threshold`           | VAD activation threshold (0.0-1.0); lower is more sensitive        | `0.5`                     |
| `vad_prefix_padding_ms`   | Audio to include before detected speech, in milliseconds           | `300`                     |
| `vad_silence_duration_ms` | Silence required to detect end of speech, in milliseconds          | `500`                     |
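It can help to check a `voice` dictionary before passing it to `Agent.start`, so that a misspelled option name fails early. The following is a minimal sketch of such a check, built only from the options and defaults in the table above. `resolve_voice_config` is a hypothetical helper, not part of Autonomy, and the library may apply its own validation.

```python
VOICE_DEFAULTS = {
  "realtime_model": "gpt-4o-realtime-preview",
  "voice": "echo",
  "instructions": None,
  "input_audio_format": "pcm16",
  "output_audio_format": "pcm16",
  "vad_threshold": 0.5,
  "vad_prefix_padding_ms": 300,
  "vad_silence_duration_ms": 500,
}
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
AUDIO_FORMATS = {"pcm16", "g711_ulaw", "g711_alaw"}


def resolve_voice_config(voice: dict) -> dict:
  """Merge user options over the documented defaults and check their values."""
  unknown = set(voice) - set(VOICE_DEFAULTS)
  if unknown:
    raise ValueError(f"unknown voice options: {sorted(unknown)}")
  config = {**VOICE_DEFAULTS, **voice}
  if config["voice"] not in VOICES:
    raise ValueError(f"unsupported voice: {config['voice']!r}")
  for key in ("input_audio_format", "output_audio_format"):
    if config[key] not in AUDIO_FORMATS:
      raise ValueError(f"unsupported {key}: {config[key]!r}")
  if not 0.0 <= config["vad_threshold"] <= 1.0:
    raise ValueError("vad_threshold must be between 0.0 and 1.0")
  return config


config = resolve_voice_config({"voice": "nova", "vad_threshold": 0.3})
```

Here `config` contains every documented option, with `voice` and `vad_threshold` overridden and the rest left at their defaults.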

### Default Allowed Actions

By default, the voice agent handles these interactions directly:

* Greetings
* Chitchat
* Collecting information
* Clarifications

### Default Filler Phrases

Before delegating complex requests, the voice agent says one of:

* "Just a second."
* "Let me check."
* "One moment."
* "Let me look into that."
* "Give me a moment."
* "Let me see."

***

### VAD Settings for Responsive Interaction

Tune Voice Activity Detection for your environment:

```python images/main/main.py theme={null}
from autonomy import Agent, Model, Node


async def main(node):
  await Agent.start(
    node=node,
    name="assistant",
    instructions="You are a helpful assistant.",
    model=Model("claude-sonnet-4-v1"),
    voice={
      "voice": "alloy",
      # More sensitive detection (lower threshold)
      "vad_threshold": 0.3,
      # Wait longer before considering speech ended
      "vad_silence_duration_ms": 700,
    },
  )


Node.start(main)
```
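To build intuition for how the three VAD options interact, here is a toy model in plain Python. It assumes each audio frame already has a speech score between 0 and 1, which is a simplification for illustration; the actual detection happens inside the realtime model and may behave differently.

```python
def detect_turns(scores, frame_ms=100, threshold=0.5,
                 prefix_padding_ms=300, silence_duration_ms=500):
  """Return (start_ms, end_ms) speech turns from per-frame speech scores.

  A turn starts when a score reaches the threshold and ends once scores stay
  below it for silence_duration_ms. The start is moved back by
  prefix_padding_ms so the beginning of the utterance is not cut off.
  """
  turns = []
  start = None   # start time of the current turn, or None when idle
  silent_ms = 0  # how long the current turn has been silent
  for i, score in enumerate(scores):
    t = i * frame_ms
    if score >= threshold:
      if start is None:
        start = max(0, t - prefix_padding_ms)
      silent_ms = 0
    elif start is not None:
      silent_ms += frame_ms
      if silent_ms >= silence_duration_ms:
        # The turn ended where the silence began.
        turns.append((start, t + frame_ms - silent_ms))
        start, silent_ms = None, 0
  if start is not None:
    turns.append((start, len(scores) * frame_ms - silent_ms))
  return turns


# Two bursts of speech separated by a 600 ms pause, then trailing silence.
scores = [0.1] * 5 + [0.8] * 4 + [0.2] * 6 + [0.9] * 3 + [0.1] * 8

print(detect_turns(scores))                           # [(200, 900), (1200, 1800)]
print(detect_turns(scores, silence_duration_ms=700))  # [(200, 1800)]
```

With the default 500 ms silence duration, the 600 ms pause splits the speech into two turns. Raising the value to 700 ms keeps it as one turn, which suits speakers who pause mid-sentence at the cost of a slower response.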

***

## Voice Agents with Tools

Voice agents can use tools. The voice agent does not call tools itself; the primary agent has access to all tools and uses them when handling delegated requests:

```python images/main/main.py theme={null}
from autonomy import Agent, Model, Node, Tool


async def lookup_order(order_id: str) -> dict:
  """Look up an order by ID."""
  # Your order lookup logic
  return {"order_id": order_id, "status": "shipped", "eta": "Tomorrow"}


async def main(node):
  await Agent.start(
    node=node,
    name="support",
    instructions="""You are a customer support agent.
    Use the lookup_order tool to find order information.""",
    model=Model("claude-sonnet-4-v1"),
    tools=[Tool(lookup_order)],
    voice={"voice": "nova"},
  )


Node.start(main)
```

When a user asks "Where is my order 12345?", the flow is:

1. Voice agent says a filler phrase such as "Let me check."
2. Voice agent delegates to primary agent
3. Primary agent calls `lookup_order("12345")`
4. Primary agent returns "Your order has shipped and will arrive tomorrow."
5. Voice agent speaks the response verbatim
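The steps above can be traced with a small simulation. In this sketch, plain async functions stand in for the two agents: `voice_agent` and `primary_agent` are hypothetical stand-ins, not Autonomy APIs, and a real primary agent decides when to call the tool through its model rather than through string handling.

```python
import asyncio


async def lookup_order(order_id: str) -> dict:
  """Look up an order by ID (stubbed, as in the example above)."""
  return {"order_id": order_id, "status": "shipped", "eta": "Tomorrow"}


async def primary_agent(request: str) -> str:
  """Stand-in for the primary agent: extracts the order ID and calls the tool."""
  order_id = "".join(ch for ch in request if ch.isdigit())
  order = await lookup_order(order_id)
  return f"Your order has {order['status']} and will arrive {order['eta'].lower()}."


async def voice_agent(request: str) -> list[str]:
  """Stand-in for the voice agent: returns everything it would speak, in order."""
  spoken = ["Let me check."]               # step 1: filler phrase
  response = await primary_agent(request)  # steps 2-4: delegate and wait
  spoken.append(response)                  # step 5: speak the response verbatim
  return spoken


spoken = asyncio.run(voice_agent("Where is my order 12345?"))
print(spoken)
```

The filler phrase is spoken before the delegated call starts, so the user hears a reply while the tool call is still in progress.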

***

## Voice Agents with Knowledge

Combine voice with knowledge search so the agent can answer spoken questions from your documents:

```python images/main/main.py theme={null}
from autonomy import Agent, Model, Node, Knowledge, KnowledgeTool, NaiveChunker


async def main(node):
  # Create knowledge base
  knowledge = Knowledge(
    name="product_docs",
    searchable=True,
    model=Model("embed-english-v3"),
    max_results=5,
    chunker=NaiveChunker(max_characters=1024),
  )

  # Add documents
  await knowledge.add_document(
    document_name="user-guide",
    document_url="https://example.com/docs/user-guide.md",
    content_type="text/markdown",
  )

  # Create agent with voice and knowledge
  await Agent.start(
    node=node,
    name="docs",
    instructions="""You are a product expert.
    Search the knowledge base to answer questions accurately.""",
    model=Model("claude-sonnet-4-v1"),
    tools=[KnowledgeTool(knowledge=knowledge, name="search_docs")],
    voice={"voice": "shimmer"},
  )


Node.start(main)
```

***

## Memory Isolation

Voice sessions support the same memory isolation as text conversations. Pass `scope` and `conversation` parameters when connecting:

```bash theme={null}
# WebSocket connection with scope and conversation
ws://.../agents/assistant/voice?scope=user-123&conversation=session-456
```

This ensures each user's voice conversation history is isolated.
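A client can build this URL with properly escaped query parameters. The sketch below uses the Python standard library; `voice_url` is a hypothetical helper, and the host passed to it is a placeholder that you would replace with the address of your own deployment.

```python
from urllib.parse import quote, urlencode


def voice_url(host: str, agent: str, scope: str, conversation: str,
              secure: bool = True) -> str:
  """Build the voice WebSocket URL for one user's conversation."""
  protocol = "wss" if secure else "ws"
  # urlencode escapes characters such as "&" or spaces in the identifiers.
  query = urlencode({"scope": scope, "conversation": conversation})
  return f"{protocol}://{host}/agents/{quote(agent, safe='')}/voice?{query}"


# "my-zone.example.com" is a placeholder host, not a real endpoint.
url = voice_url("my-zone.example.com", "assistant", "user-123", "session-456")
print(url)
```

Deriving `scope` from a stable user identifier and `conversation` from a per-session identifier keeps each user's sessions separate from one another.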

***

## Complete Example: Software Engineering Interviewer

This example demonstrates a voice agent that conducts first-round screening interviews for software engineering candidates. The agent assesses technical fundamentals, problem-solving ability, and communication skills.

<CodeGroup>
  ```python images/main/main.py theme={null}
  from autonomy import Node, Agent, Model


  INSTRUCTIONS = """
  You are an experienced software engineering interviewer conducting first-round
  screening interviews. Your goal is to assess candidates on technical fundamentals,
  problem-solving ability, and communication skills.

  Interview structure:
  1. Brief introduction and put the candidate at ease
  2. Ask about their background and experience (2-3 minutes)
  3. Technical questions appropriate to their level (10-15 minutes)
  4. Behavioral questions about teamwork and challenges (5 minutes)
  5. Answer any questions they have about the role

  Guidelines:
  - Be warm and professional to help candidates perform their best
  - Ask follow-up questions to understand their thought process
  - Probe deeper if answers are surface-level
  - Give hints if they're stuck, but note that you did
  - Keep responses concise since this is a voice conversation
  - Adapt difficulty based on their stated experience level

  Technical topics to cover:
  - Data structures and algorithms fundamentals
  - System design basics (for senior candidates)
  - Language-specific questions based on their background
  - Problem-solving approach and debugging strategies

  After the interview, provide a brief summary of strengths and areas for improvement.
  """


  async def main(node: Node):
    await Agent.start(
      node=node,
      name="interviewer",
      instructions=INSTRUCTIONS,
      model=Model("claude-sonnet-4-v1"),
      voice={"voice": "alloy"},
    )


  Node.start(main)
  ```

  ```html images/main/index.html theme={null}
  <!doctype html>
  <html lang="en">
    <head>
      <meta charset="UTF-8" />
      <meta name="viewport" content="width=device-width, initial-scale=1.0" />
      <title>Software Engineering Interview</title>

      <style>
        * {
          margin: 0;
          padding: 0;
          box-sizing: border-box;
        }

        body {
          font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
          background: #fffef9;
          min-height: 100vh;
          display: flex;
          align-items: center;
          justify-content: center;
          color: #3d3d3d;
          overflow: hidden;
        }

        .container {
          display: flex;
          flex-direction: row;
          width: 100%;
          height: 100vh;
          padding: 0;
          max-width: none;
        }

        .transcript-panel {
          width: 70%;
          height: 100%;
          display: flex;
          flex-direction: column;
          padding: 40px;
          background: #fffef9;
        }

        .voice-panel {
          width: 30%;
          height: 100%;
          display: flex;
          flex-direction: column;
          align-items: center;
          justify-content: center;
          gap: 40px;
          padding: 40px;
          background: #fffef9;
        }

        .circle-container {
          position: relative;
          width: 200px;
          height: 200px;
        }

        .voice-circle {
          width: 200px;
          height: 200px;
          border-radius: 50%;
          background: linear-gradient(135deg, #e8956c 0%, #d97342 100%);
          cursor: pointer;
          transition: all 0.3s ease;
          display: flex;
          align-items: center;
          justify-content: center;
          box-shadow:
            0 10px 40px rgba(217, 115, 66, 0.25),
            inset 0 2px 10px rgba(255, 255, 255, 0.1);
          position: relative;
          border: 2px solid rgba(232, 149, 108, 0.3);
        }

        .voice-circle:hover {
          transform: scale(1.05);
          box-shadow:
            0 15px 50px rgba(217, 115, 66, 0.35),
            inset 0 2px 10px rgba(255, 255, 255, 0.15);
        }

        .voice-circle.listening {
          animation: pulse-listening 2s ease-in-out infinite;
          background: linear-gradient(135deg, #f0a47a 0%, #e8956c 100%);
        }

        .voice-circle.speaking {
          animation: pulse-speaking 1.5s ease-in-out infinite;
          background: linear-gradient(135deg, #f4a460 0%, #e8956c 100%);
        }

        .voice-circle.processing {
          animation: pulse-processing 1s ease-in-out infinite;
          background: linear-gradient(135deg, #ec9f6e 0%, #e8956c 100%);
        }

        .voice-circle.delegating {
          animation: pulse-delegating 0.8s ease-in-out infinite;
          background: linear-gradient(135deg, #d97342 0%, #c56535 100%);
        }

        .waveform-icon {
          width: 80px;
          height: 80px;
          display: flex;
          align-items: center;
          justify-content: center;
          gap: 6px;
        }

        .waveform-bar {
          width: 8px;
          background: #fffef9;
          border-radius: 4px;
          transition: all 0.3s ease;
        }

        .waveform-bar:nth-child(1) {
          height: 30px;
        }
        .waveform-bar:nth-child(2) {
          height: 50px;
        }
        .waveform-bar:nth-child(3) {
          height: 40px;
        }
        .waveform-bar:nth-child(4) {
          height: 60px;
        }
        .waveform-bar:nth-child(5) {
          height: 35px;
        }

        .voice-circle:hover .waveform-bar {
          background: #ffffff;
        }

        .voice-circle.listening .waveform-bar,
        .voice-circle.speaking .waveform-bar,
        .voice-circle.processing .waveform-bar,
        .voice-circle.delegating .waveform-bar {
          animation: waveform-pulse 1.2s ease-in-out infinite;
        }

        .voice-circle.listening .waveform-bar:nth-child(1) {
          animation-delay: 0s;
        }
        .voice-circle.listening .waveform-bar:nth-child(2) {
          animation-delay: 0.1s;
        }
        .voice-circle.listening .waveform-bar:nth-child(3) {
          animation-delay: 0.2s;
        }
        .voice-circle.listening .waveform-bar:nth-child(4) {
          animation-delay: 0.3s;
        }
        .voice-circle.listening .waveform-bar:nth-child(5) {
          animation-delay: 0.4s;
        }

        .voice-circle.speaking .waveform-bar:nth-child(1) {
          animation-delay: 0.4s;
        }
        .voice-circle.speaking .waveform-bar:nth-child(2) {
          animation-delay: 0.3s;
        }
        .voice-circle.speaking .waveform-bar:nth-child(3) {
          animation-delay: 0.2s;
        }
        .voice-circle.speaking .waveform-bar:nth-child(4) {
          animation-delay: 0.1s;
        }
        .voice-circle.speaking .waveform-bar:nth-child(5) {
          animation-delay: 0s;
        }

        @keyframes waveform-pulse {
          0%,
          100% {
            transform: scaleY(0.6);
            opacity: 0.7;
          }
          50% {
            transform: scaleY(1.2);
            opacity: 1;
          }
        }

        @keyframes pulse-listening {
          0%,
          100% {
            box-shadow:
              0 10px 40px rgba(217, 115, 66, 0.25),
              0 0 30px rgba(232, 149, 108, 0.2),
              inset 0 2px 10px rgba(255, 255, 255, 0.1);
            transform: scale(1);
          }
          50% {
            box-shadow:
              0 10px 60px rgba(217, 115, 66, 0.4),
              0 0 60px rgba(232, 149, 108, 0.35),
              inset 0 2px 15px rgba(255, 255, 255, 0.15);
            transform: scale(1.05);
          }
        }

        @keyframes pulse-speaking {
          0%,
          100% {
            box-shadow:
              0 10px 40px rgba(244, 164, 96, 0.3),
              0 0 35px rgba(244, 164, 96, 0.3),
              inset 0 2px 10px rgba(255, 255, 255, 0.1);
            transform: scale(1);
          }
          50% {
            box-shadow:
              0 10px 60px rgba(244, 164, 96, 0.45),
              0 0 70px rgba(244, 164, 96, 0.45),
              inset 0 2px 15px rgba(255, 255, 255, 0.15);
            transform: scale(1.08);
          }
        }

        @keyframes pulse-processing {
          0%,
          100% {
            box-shadow:
              0 10px 40px rgba(217, 115, 66, 0.25),
              0 0 25px rgba(236, 159, 110, 0.25),
              inset 0 2px 10px rgba(255, 255, 255, 0.1);
            transform: scale(1);
          }
          50% {
            box-shadow:
              0 10px 60px rgba(217, 115, 66, 0.35),
              0 0 50px rgba(236, 159, 110, 0.35),
              inset 0 2px 15px rgba(255, 255, 255, 0.15);
            transform: scale(1.03);
          }
        }

        @keyframes pulse-delegating {
          0%,
          100% {
            box-shadow:
              0 10px 40px rgba(197, 101, 53, 0.3),
              0 0 30px rgba(217, 115, 66, 0.25),
              inset 0 2px 10px rgba(255, 255, 255, 0.1);
            transform: scale(1);
          }
          50% {
            box-shadow:
              0 10px 60px rgba(197, 101, 53, 0.45),
              0 0 60px rgba(217, 115, 66, 0.4),
              inset 0 2px 15px rgba(255, 255, 255, 0.15);
            transform: scale(1.06);
          }
        }

        .audio-wave {
          position: absolute;
          top: 50%;
          left: 50%;
          transform: translate(-50%, -50%);
          width: 240px;
          height: 240px;
          border-radius: 50%;
          border: 2px solid rgba(232, 149, 108, 0.25);
          opacity: 0;
          animation: wave-expand 2s ease-out infinite;
          pointer-events: none;
        }

        .audio-wave:nth-child(2) {
          animation-delay: 0.5s;
        }
        .audio-wave:nth-child(3) {
          animation-delay: 1s;
        }

        .voice-circle.listening ~ .audio-wave,
        .voice-circle.speaking ~ .audio-wave {
          opacity: 1;
        }

        @keyframes wave-expand {
          0% {
            width: 200px;
            height: 200px;
            opacity: 0.8;
          }
          100% {
            width: 300px;
            height: 300px;
            opacity: 0;
          }
        }

        .status-text {
          font-size: 18px;
          color: #d97342;
          text-align: center;
          min-height: 30px;
          transition: all 0.3s ease;
          font-weight: 500;
          letter-spacing: 0.5px;
        }

        .status-text.active {
          color: #c56535;
        }

        .transcript-container {
          width: 100%;
          flex: 1;
          overflow-y: auto;
          background: rgba(232, 149, 108, 0.05);
          border: 1px solid rgba(232, 149, 108, 0.2);
          border-radius: 12px;
          padding: 24px;
        }

        .transcript-item {
          margin-bottom: 12px;
          padding: 8px 12px;
          border-radius: 8px;
          font-size: 14px;
          line-height: 1.5;
        }

        .transcript-item.user {
          background: rgba(232, 149, 108, 0.15);
          margin-left: 20px;
        }

        .transcript-item.assistant {
          background: rgba(217, 115, 66, 0.15);
          margin-right: 20px;
        }

        .transcript-item .role {
          font-size: 11px;
          text-transform: uppercase;
          letter-spacing: 1px;
          color: #8b7355;
          margin-bottom: 4px;
        }

        .controls {
          display: flex;
          gap: 16px;
          margin-top: 20px;
        }

        .control-button {
          padding: 12px 32px;
          background: rgba(232, 149, 108, 0.1);
          border: 2px solid rgba(232, 149, 108, 0.3);
          border-radius: 24px;
          color: #d97342;
          font-size: 16px;
          cursor: pointer;
          transition: all 0.3s ease;
          font-weight: 500;
          letter-spacing: 0.5px;
        }

        .control-button:hover {
          background: rgba(244, 164, 96, 0.2);
          border-color: rgba(244, 164, 96, 0.6);
          transform: translateY(-2px);
          box-shadow: 0 5px 20px rgba(217, 115, 66, 0.25);
        }

        .control-button.danger {
          border-color: rgba(197, 101, 53, 0.4);
          color: #c56535;
        }

        .control-button.danger:hover {
          background: rgba(197, 101, 53, 0.2);
          border-color: rgba(197, 101, 53, 0.6);
        }

        .connection-status {
          position: fixed;
          top: 20px;
          right: 20px;
          padding: 8px 16px;
          border-radius: 20px;
          font-size: 12px;
          background: rgba(255, 254, 249, 0.9);
          border: 1px solid rgba(232, 149, 108, 0.3);
          font-weight: 500;
        }

        .connection-status.connected {
          border-color: rgba(217, 115, 66, 0.5);
          color: #d97342;
        }

        .connection-status.disconnected {
          border-color: rgba(197, 101, 53, 0.5);
          color: #c56535;
        }
      </style>
    </head>
    <body>
      <div id="connectionStatus" class="connection-status disconnected">Disconnected</div>

      <div class="container">
        <div class="transcript-panel">
          <div id="transcriptContainer" class="transcript-container">
            <div style="color: #8b7355; font-size: 13px; text-align: center">
              Interview transcript will appear here...
            </div>
          </div>
        </div>

        <div class="voice-panel">
          <div class="circle-container">
            <button id="voiceCircle" class="voice-circle">
              <div class="waveform-icon">
                <div class="waveform-bar"></div>
                <div class="waveform-bar"></div>
                <div class="waveform-bar"></div>
                <div class="waveform-bar"></div>
                <div class="waveform-bar"></div>
              </div>
            </button>
            <div class="audio-wave"></div>
            <div class="audio-wave"></div>
            <div class="audio-wave"></div>
          </div>

          <div id="status" class="status-text">Click to start</div>

          <div class="controls">
            <button id="endButton" class="control-button danger" style="display: none">End Interview</button>
          </div>
        </div>
      </div>

      <script>
        // State
        let ws = null;
        let mediaStream = null;
        let audioContext = null;
        let workletNode = null;
        let isRecording = false;
        let isConnected = false;

        // Audio playback
        let playbackAudioContext = null;
        let nextPlayTime = 0;
        let scheduledSources = []; // Track audio sources for interruption handling
        let isSpeaking = false; // Track if assistant is currently speaking

        // DOM elements
        const voiceCircle = document.getElementById("voiceCircle");
        const status = document.getElementById("status");
        const endButton = document.getElementById("endButton");
        const connectionStatus = document.getElementById("connectionStatus");
        const transcriptContainer = document.getElementById("transcriptContainer");

        // Connect to WebSocket
        async function connect() {
          const protocol = window.location.protocol === "https:" ? "wss:" : "ws:";
          const wsUrl = `${protocol}//${window.location.host}/agents/interviewer/voice`;

          try {
            updateStatus("Connecting...");
            console.log("🔌 Connecting to:", wsUrl);

            ws = new WebSocket(wsUrl);

            ws.onopen = () => {
              console.log("✅ WebSocket connected");
              isConnected = true;
              connectionStatus.textContent = "Connected";
              connectionStatus.className = "connection-status connected";
              updateStatus("Connected! Click to start talking");

              // Send initial config
              ws.send(JSON.stringify({ type: "config" }));
            };

            ws.onmessage = async (event) => {
              try {
                const data = JSON.parse(event.data);
                await handleServerMessage(data);
              } catch (err) {
                console.error("❌ Error handling message:", err);
              }
            };

            ws.onerror = (error) => {
              console.error("❌ WebSocket error:", error);
              updateStatus("Connection error");
            };

            ws.onclose = (event) => {
              console.log("📪 WebSocket closed:", event.code, event.reason);
              isConnected = false;
              connectionStatus.textContent = "Disconnected";
              connectionStatus.className = "connection-status disconnected";
              if (isRecording) {
                stopRecording();
              }
              updateStatus("Disconnected. Refresh to reconnect.");
            };
          } catch (error) {
            console.error("❌ Connection error:", error);
            updateStatus("Failed to connect");
          }
        }

        // Handle messages from server
        async function handleServerMessage(data) {
          const eventType = data.type;

          switch (eventType) {
            case "connected":
              console.log("✅ Server confirmed:", data.message);
              if (data.config) {
                console.log("   Config:", data.config);
              }
              break;

            case "audio":
              // Play audio from realtime API
              isSpeaking = true;
              await playAudioChunk(data.audio);
              setCircleState("speaking");
              updateStatus("Speaking...");
              break;

            case "transcript":
              // Complete transcript (user or assistant)
              if (data.role === "user") {
                console.log("🗣️ User:", data.text);
                addTranscript("user", data.text);
              } else {
                console.log("🤖 Assistant:", data.text);
                addTranscript("assistant", data.text);
              }
              break;

            case "transcript_delta":
              // Streaming delta (optional - for real-time display if needed)
              // Currently we just wait for the complete transcript
              break;

            case "speech_started":
              // Always clear audio on speech start for responsive interruption
              console.log("🎤 Speech started - clearing any playing audio");
              clearAudioQueue();
              isSpeaking = false;
              setCircleState("listening");
              updateStatus("Listening...");
              break;

            case "speech_stopped":
              console.log("⏸️ Speech stopped");
              setCircleState("processing");
              updateStatus("Processing...");
              break;

            case "response_complete":
              console.log("✅ Response complete");
              isSpeaking = false;
              setCircleState("listening");
              updateStatus("Listening...");
              break;

            case "error":
              // Ignore "no active response" errors - these are normal when cancelling
              if (data.error && !data.error.includes("no active response")) {
                console.error("❌ Server error:", data.error);
                updateStatus("Error: " + data.error);
              }
              break;
          }
        }

        // Start recording
        async function startRecording() {
          if (!isConnected) {
            // Try to reconnect if not connected
            updateStatus("Reconnecting...");
            await connect();
            // Wait a bit for connection to establish
            await new Promise((resolve) => setTimeout(resolve, 500));
            if (!isConnected) {
              updateStatus("Failed to connect. Please try again.");
              return;
            }
          }

          if (isRecording) {
            stopRecording();
            return;
          }

          try {
            mediaStream = await navigator.mediaDevices.getUserMedia({
              audio: {
                channelCount: 1,
                sampleRate: 24000,
                echoCancellation: true,
                noiseSuppression: true,
                autoGainControl: true,
              },
            });

            audioContext = new (window.AudioContext || window.webkitAudioContext)({
              sampleRate: 24000,
            });

            // Create AudioWorklet for processing (replaces deprecated ScriptProcessorNode)
            const workletCode = `
              class PCMProcessor extends AudioWorkletProcessor {
                constructor() {
                  super();
                  this.bufferSize = 4096;
                  this.buffer = new Float32Array(this.bufferSize);
                  this.bufferIndex = 0;
                }

                process(inputs, outputs, parameters) {
                  const input = inputs[0];
                  if (input && input[0]) {
                    const inputData = input[0];
                    for (let i = 0; i < inputData.length; i++) {
                      this.buffer[this.bufferIndex++] = inputData[i];
                      if (this.bufferIndex >= this.bufferSize) {
                        const pcm16 = new Int16Array(this.bufferSize);
                        for (let j = 0; j < this.bufferSize; j++) {
                          const s = Math.max(-1, Math.min(1, this.buffer[j]));
                          pcm16[j] = s < 0 ? s * 0x8000 : s * 0x7FFF;
                        }
                        this.port.postMessage({ pcm16: pcm16.buffer, maxLevel: Math.max(...this.buffer.map(Math.abs)) }, [pcm16.buffer]);
                        this.buffer = new Float32Array(this.bufferSize);
                        this.bufferIndex = 0;
                      }
                    }
                  }
                  return true;
                }
              }
              registerProcessor('pcm-processor', PCMProcessor);
            `;

            const blob = new Blob([workletCode], { type: "application/javascript" });
            const workletUrl = URL.createObjectURL(blob);

            try {
              await audioContext.audioWorklet.addModule(workletUrl);
            } finally {
              URL.revokeObjectURL(workletUrl);
            }

            const source = audioContext.createMediaStreamSource(mediaStream);
            workletNode = new AudioWorkletNode(audioContext, "pcm-processor");

            workletNode.port.onmessage = (e) => {
              if (!isRecording || !ws || ws.readyState !== WebSocket.OPEN) return;

              const { pcm16, maxLevel } = e.data;

              // Client-side interruption detection
              if (isSpeaking && scheduledSources.length > 0 && maxLevel > 0.02) {
                console.log("⚡ Client-side interruption detected (level:", maxLevel.toFixed(3), ")");
                clearAudioQueue();
                isSpeaking = false;
                setCircleState("listening");
                updateStatus("Listening...");
              }

              const audioBase64 = btoa(String.fromCharCode(...new Uint8Array(pcm16)));
              ws.send(JSON.stringify({ type: "audio", audio: audioBase64 }));
            };

            source.connect(workletNode);
            workletNode.connect(audioContext.destination);

            isRecording = true;
            setCircleState("listening");
            updateStatus("Listening...");
            endButton.style.display = "block";

            // Clear old transcripts
            transcriptContainer.innerHTML =
              '<div style="color: #8b7355; font-size: 13px; text-align: center">Conversation started...</div>';

            console.log("🎤 Recording started");
          } catch (error) {
            console.error("❌ Error starting recording:", error);
            updateStatus("Microphone access denied");
          }
        }

        // Stop recording
        function stopRecording() {
          if (!isRecording) return;

          console.log("⏹️ Stopping recording");
          isRecording = false;

          if (workletNode) {
            workletNode.disconnect();
            workletNode = null;
          }

          if (mediaStream) {
            mediaStream.getTracks().forEach((track) => track.stop());
            mediaStream = null;
          }

          if (audioContext) {
            audioContext.close();
            audioContext = null;
          }

          setCircleState("");
          updateStatus("Stopped");
          endButton.style.display = "none";

          if (ws && ws.readyState === WebSocket.OPEN) {
            ws.send(JSON.stringify({ type: "close" }));
          }
        }

        // Clear audio queue - stops all scheduled audio immediately (for interruption handling)
        function clearAudioQueue() {
          if (scheduledSources.length === 0) return; // Nothing to clear

          console.log("🔇 Clearing audio queue (" + scheduledSources.length + " sources)");

          // Stop all scheduled audio sources
          scheduledSources.forEach((source) => {
            try {
              source.stop();
            } catch (e) {
              // Source may have already finished playing
            }
          });
          scheduledSources = [];

          // Reset playback timing to allow immediate new playback
          if (playbackAudioContext) {
            nextPlayTime = playbackAudioContext.currentTime;
          }
        }

        // Play audio chunk
        async function playAudioChunk(base64Audio) {
          try {
            if (!playbackAudioContext) {
              playbackAudioContext = new (window.AudioContext || window.webkitAudioContext)({
                sampleRate: 24000,
              });
              nextPlayTime = playbackAudioContext.currentTime;
            }

            const audioBytes = Uint8Array.from(atob(base64Audio), (c) => c.charCodeAt(0));
            const pcm16 = new Int16Array(audioBytes.buffer);
            const float32 = new Float32Array(pcm16.length);

            for (let i = 0; i < pcm16.length; i++) {
              float32[i] = pcm16[i] / 32768.0;
            }

            const audioBuffer = playbackAudioContext.createBuffer(1, float32.length, 24000);
            audioBuffer.getChannelData(0).set(float32);

            const source = playbackAudioContext.createBufferSource();
            source.buffer = audioBuffer;
            source.connect(playbackAudioContext.destination);

            // Track this source for potential interruption
            source.onended = () => {
              scheduledSources = scheduledSources.filter((s) => s !== source);
            };
            scheduledSources.push(source);

            if (nextPlayTime < playbackAudioContext.currentTime) {
              nextPlayTime = playbackAudioContext.currentTime;
            }
            source.start(nextPlayTime);
            nextPlayTime += audioBuffer.duration;
          } catch (error) {
            console.error("❌ Error playing audio:", error);
          }
        }

        // Set circle visual state
        function setCircleState(state) {
          voiceCircle.classList.remove("listening", "speaking", "processing", "delegating");
          if (state) {
            voiceCircle.classList.add(state);
          }
        }

        // Update status text
        function updateStatus(message) {
          status.textContent = message;
          status.className = "status-text" + (message.includes("...") ? " active" : "");
        }

        // Add transcript to UI
        function addTranscript(role, text) {
          // Remove placeholder if present
          const placeholder = transcriptContainer.querySelector('div[style*="color: #8b7355"]');
          if (placeholder) {
            placeholder.remove();
          }

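          const item = document.createElement("div");
          item.className = "transcript-item " + role;

          const roleLabel = document.createElement("div");
          roleLabel.className = "role";
          roleLabel.textContent = role === "user" ? "Candidate" : "Interviewer";

          // Use textContent so transcript text is never parsed as HTML
          const body = document.createElement("div");
          body.textContent = text;

          item.append(roleLabel, body);
          transcriptContainer.appendChild(item);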
          transcriptContainer.scrollTop = transcriptContainer.scrollHeight;
        }

        // End session
        function endSession() {
          stopRecording();
          clearAudioQueue(); // Stop any playing audio
          scheduledSources = []; // Reset tracking array
          isSpeaking = false;
          if (ws) {
            ws.close();
            ws = null;
          }
          if (playbackAudioContext) {
            playbackAudioContext.close();
            playbackAudioContext = null;
          }
          updateStatus("Interview ended. Click to start again.");
        }

        // Event listeners
        voiceCircle.addEventListener("click", startRecording);
        endButton.addEventListener("click", endSession);

        // Connect on load
        window.addEventListener("load", connect);

        // Cleanup on unload
        window.addEventListener("beforeunload", () => {
          stopRecording();
          if (ws) ws.close();
        });
      </script>
    </body>
  </html>
  ```

  ```dockerfile images/main/Dockerfile theme={null}
  FROM ghcr.io/build-trust/autonomy-python
  COPY . .
  ENTRYPOINT ["python", "main.py"]
  ```

  ```yaml autonomy.yaml theme={null}
  name: interview
  pods:
    - name: main-pod
      public: true
      containers:
        - name: main
          image: main
  ```
</CodeGroup>
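
A client that is not a browser has to reproduce the playback path of `index.html` without the Web Audio API. The sketch below assumes, as `playAudioChunk` does, that each chunk is base64-encoded 16-bit PCM, mono, at 24 kHz; little-endian byte order is an assumption. `decode_chunk` and `PlaybackClock` are hypothetical names used for illustration and are not part of Autonomy.

```python
import base64
import struct

SAMPLE_RATE = 24000  # matches the sampleRate used by playAudioChunk


def decode_chunk(base64_audio: str) -> list[float]:
    """Decode base64 PCM16 (assumed little-endian, mono) into floats in [-1.0, 1.0)."""
    raw = base64.b64decode(base64_audio)
    raw = raw[: len(raw) - (len(raw) % 2)]  # drop a trailing odd byte, if any
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [s / 32768.0 for s in samples]


class PlaybackClock:
    """Tracks when the next chunk should start so chunks play back to back."""

    def __init__(self) -> None:
        self.next_play_time = 0.0

    def schedule(self, num_samples: int, now: float) -> float:
        """Return the start time for a chunk and advance the clock by its duration."""
        # If playback has fallen behind, start now rather than in the past.
        if self.next_play_time < now:
            self.next_play_time = now
        start = self.next_play_time
        self.next_play_time += num_samples / SAMPLE_RATE
        return start

    def reset(self, now: float) -> None:
        """Mirror clearAudioQueue: discard pending timing after an interruption."""
        self.next_play_time = now
```

For example, a chunk holding the samples 0, 16384, and -32768 decodes to `[0.0, 0.5, -1.0]`. Keeping a running `next_play_time` is what lets consecutive chunks play without gaps, and resetting it on interruption lets new audio start immediately.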

### Using the Interviewer

Connect via WebSocket for voice. The cluster endpoint is served over TLS, so use the `wss://` scheme:

```
wss://${CLUSTER}-${ZONE}.cluster.autonomy.computer/agents/interviewer/voice
```

Or use HTTP for text:

```bash curl theme={null}
curl --request POST \
  --header "Content-Type: application/json" \
  --data '{"message": "Hi, I am ready to start the interview."}' \
  "https://${CLUSTER}-${ZONE}.cluster.autonomy.computer/agents/interviewer"
```
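
The same text request can be sent from Python. This is a minimal sketch using only the standard library. It assumes `CLUSTER` and `ZONE` hold the same values as in the curl example, and it returns the raw response body because the response format is not described here. `build_request` and `send_message` are hypothetical helper names.

```python
import json
import urllib.request


def build_request(cluster: str, zone: str, message: str) -> urllib.request.Request:
    """Build the POST request that the curl example sends to the interviewer agent."""
    url = f"https://{cluster}-{zone}.cluster.autonomy.computer/agents/interviewer"
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_message(cluster: str, zone: str, message: str) -> str:
    """Send the message and return the raw response body as text."""
    request = build_request(cluster, zone, message)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

To use it, call `send_message(os.environ["CLUSTER"], os.environ["ZONE"], "Hi, I am ready to start the interview.")` after importing `os`.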

The interviewer will:

1. Greet the candidate and explain the interview format
2. Ask about their background and experience
3. Pose technical questions adapted to their level
4. Explore behavioral scenarios
5. Answer questions about the role
6. Provide feedback on their performance
