
What You’ll Build
- Users speak questions and hear answers grounded in your documentation.
- Your docs are indexed for semantic search using vector embeddings.
- A fast voice agent handles immediate interaction while a primary agent retrieves accurate information.
- Documentation is periodically reloaded to stay current.
- Streaming text and voice interfaces.
Before You Begin
Before starting, ensure you have:
- Signed up and installed the `autonomy` command.
- Documentation hosted somewhere with a URL (Mintlify, GitBook, GitHub, etc.).
- Docker running on your machine.
How It Works
When a user speaks to the agent:
- A voice agent receives the audio over a WebSocket and transcribes it. It speaks a filler phrase ("Good question.") and then delegates the question to a primary agent.
- The primary agent searches a knowledge base for relevant documentation and composes a concise answer from the retrieved docs, which the voice agent speaks verbatim.
Application Structure
File Structure:
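Based on the files referenced throughout this guide, the layout looks like this:

```
autonomy.yaml        # deployment configuration
images/
  main/
    Dockerfile       # container image for the agent
    main.py          # knowledge base and agents
    index.html       # voice UI served at the root URL
```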
Step 1: Create the Knowledge Base
First, set up a knowledge base that will index your documentation:

images/main/main.py
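A minimal sketch of this step, assuming a `Knowledge` class exposed by an `autonomy` Python SDK; the import path and constructor signature are assumptions, but the options match the table below:

```python
# Sketch only: `autonomy.Knowledge` and these constructor arguments are
# assumptions; consult the SDK documentation for the real API.
from autonomy import Knowledge

knowledge = Knowledge(
    name="docs",
    model="embed-english-v3",  # embedding model for vector search
    max_results=5,             # relevant chunks returned per query
    max_distance=0.4,          # similarity threshold; lower is stricter
    chunker="markdown",        # document-splitting strategy
)
```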
| Option | Description |
|---|---|
| `model` | Embedding model for vector search. `embed-english-v3` works well for English docs. |
| `max_results` | Number of relevant chunks to retrieve per query. |
| `max_distance` | Similarity threshold (0.0 = exact match, 1.0 = very different). Lower values are stricter. |
| `chunker` | Strategy for splitting documents. Larger chunks preserve context; smaller chunks improve precision. |
Step 2: Load Your Documentation
Option A: From a URL Index (Mintlify, GitBook)
Many documentation platforms provide an `llms.txt` or sitemap file listing all pages. Here's how to load docs from URLs:

images/main/main.py
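As a sketch, assuming the `knowledge` object from Step 1 and an `add_text` loader method (an assumed name), the pages listed in `llms.txt` could be fetched like this:

```python
# Sketch: fetch the llms.txt index, then load each linked page.
# `knowledge.add_text` is an assumed method name.
import urllib.request

INDEX_URL = "https://your-docs.mintlify.dev/llms.txt"

def load_docs(knowledge):
    index = urllib.request.urlopen(INDEX_URL).read().decode("utf-8")
    for line in index.splitlines():
        # llms.txt lines look like "- [Page Title](https://.../page.md)"
        if "](" in line:
            url = line.split("](", 1)[1].rstrip(") ")
            page = urllib.request.urlopen(url).read().decode("utf-8")
            knowledge.add_text(page, source=url)
```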
Option B: From Text Content Directly
If you have documentation content as text:

images/main/main.py
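For inline text, a sketch (again assuming an `add_text` method):

```python
# Sketch: load documentation passed directly as a string.
knowledge.add_text(
    """Deployments are described in autonomy.yaml. Each image under
    images/ is built with Docker and pushed on deploy.""",
    source="deployment-overview",
)
```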
Option C: From Various File Formats
The Knowledge class supports many formats via text extraction:

images/main/main.py
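A sketch of loading mixed file formats, assuming an `add_file` method that performs format-specific text extraction:

```python
# Sketch: `add_file` is an assumed method; PDF, Markdown, and HTML
# would each go through their own text extraction.
for path in ["docs/guide.pdf", "docs/reference.md", "docs/faq.html"]:
    knowledge.add_file(path)
```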
Step 3: Create the Agents
Now create the agent with voice capabilities and the knowledge tool.

Primary Agent Instructions
The primary agent handles complex questions using the knowledge base:

images/main/main.py
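A sketch of the primary agent, assuming an `Agent` class and a `tools` parameter (both assumed names); the instructions mirror the behavior described above:

```python
from autonomy import Agent  # assumed import

primary = Agent(
    name="primary",
    instructions=(
        "You answer questions about our documentation. "
        "Always search the knowledge base before answering. "
        "Answer concisely using only the retrieved passages; "
        "if nothing relevant is found, say you don't know."
    ),
    tools=[knowledge],  # assumed: the knowledge base is exposed as a tool
)
```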
Voice Agent Instructions
The voice agent handles immediate interaction and delegates to the primary agent:

images/main/main.py
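A sketch of the voice agent; the `voice`, `realtime_model`, and `vad_*` options come from the configuration table below, while the delegation wiring is an assumption:

```python
voice_agent = Agent(
    name="voice",
    instructions=(
        "You are the spoken interface. When the user asks a question, "
        'say a short filler phrase such as "Good question." and delegate '
        "the question to the primary agent. Speak its answer verbatim."
    ),
    realtime_model="gpt-4o-realtime-preview",
    voice="echo",
    vad_threshold=0.5,
    vad_silence_duration_ms=500,
    delegate=primary,  # assumed parameter for delegation
)
```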
Starting the Agent
images/main/main.py
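A sketch of the entry point, assuming a `serve` helper that exposes the HTTP and WebSocket endpoints:

```python
from autonomy import serve  # assumed import

if __name__ == "__main__":
    serve(voice_agent)  # serves HTTP, the voice WebSocket, and index.html
```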
Voice Configuration Options
| Option | Description | Default |
|---|---|---|
| `voice` | TTS voice: alloy, echo, fable, onyx, nova, shimmer | `echo` |
| `realtime_model` | Model for the voice agent | `gpt-4o-realtime-preview` |
| `vad_threshold` | Voice activity detection sensitivity (0.0-1.0). Higher = less sensitive. | `0.5` |
| `vad_silence_duration_ms` | Milliseconds of silence before end-of-speech is detected | `500` |
Step 4: Add Auto-Refresh
Keep your knowledge base current by periodically reloading documentation:

images/main/main.py
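A sketch of periodic reloading, reusing the `load_docs` helper from Step 2; the one-hour interval is an arbitrary choice:

```python
import asyncio

async def auto_refresh():
    # Sketch: schedule with asyncio.create_task(auto_refresh()) at startup.
    while True:
        await asyncio.sleep(3600)  # reload every hour (arbitrary interval)
        load_docs(knowledge)       # re-run the Step 2 loader
```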
Step 5: Create the Deployment Configuration
Create the autonomy.yaml file:
autonomy.yaml
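A sketch of what this file might contain; the schema shown here (a zone name plus one image per directory under `images/`) is an assumption, not the confirmed autonomy.yaml format:

```yaml
# Sketch only: these keys are assumptions about the autonomy.yaml schema.
name: docs-voice-agent
images:
  main:
    build: images/main
```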
images/main/Dockerfile
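And a minimal Dockerfile sketch; the base image, package name, and entry point are assumptions:

```dockerfile
# Sketch: the base image and `autonomy` package name are assumptions.
FROM python:3.12-slim
WORKDIR /app
RUN pip install autonomy
COPY main.py index.html ./
CMD ["python", "main.py"]
```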
Step 6: Build the Voice UI
Create an index.html file in your container image directory. When present, Autonomy automatically serves it at the root URL. See User Interfaces for more options.
Key Components
The voice UI handles several important tasks:

1. WebSocket Connection for Voice (Multi-Tenant)

Connect to the voice agent via WebSocket. Notice how `scope` and `conversation` are set in the URL; this enables multi-tenant isolation:
images/main/index.html
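A sketch of the connection code; the `/voice` endpoint path and query-parameter names are assumptions based on the description of `scope` and `conversation` below:

```javascript
// Sketch: scope and conversation are passed in the URL for
// multi-tenant isolation; the /voice path is an assumption.
let visitorId = localStorage.getItem("visitorId");
if (!visitorId) {
  visitorId = crypto.randomUUID();
  localStorage.setItem("visitorId", visitorId);
}
const conversationId = crypto.randomUUID(); // fresh ID per session

const ws = new WebSocket(
  `wss://${location.host}/voice?scope=${visitorId}&conversation=${conversationId}`
);
ws.binaryType = "arraybuffer"; // audio frames are sent as binary
```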
- `scope` - Set to `visitorId` (stored in localStorage); isolates memory per user. Each visitor gets their own conversation history.
- `conversation` - Set to a unique ID per session; isolates memory per conversation. A user can have multiple separate conversations.
The full index.html, including the remaining components, is included in the Complete Example below.
Step 7: Deploy
Deploy to Autonomy Computer:

Using Your Agent
Voice Interface
Once deployed, open your zone URL in a browser to access the voice interface.

Text Chat Interface
You can also build a text chat interface that uses the streaming HTTP API with a typewriter effect. The Autonomy website uses this approach; a sketch follows the list below.

/dev/null/chat.js

- Streaming API - Use `?stream=true` to get Server-Sent Events as the agent responds.
- Typewriter effect - Queue incoming text and display it gradually for a natural feel.
- Multi-tenant - Pass `scope` and `conversation` for user isolation (same as voice).
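A sketch of such a client; the `/chat` endpoint path, the SSE payload shape, and the element ID are assumptions:

```javascript
// Sketch: endpoint path and "data: <text>" SSE framing are assumptions.
const output = document.getElementById("chat-output"); // assumed element ID
const visitorId = localStorage.getItem("visitorId") ?? "anonymous";
const conversationId = crypto.randomUUID();
const queue = [];

async function ask(question) {
  const res = await fetch(
    `/chat?stream=true&scope=${visitorId}&conversation=${conversationId}`,
    { method: "POST", body: question }
  );
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    for (const line of decoder.decode(value).split("\n")) {
      if (line.startsWith("data: ")) queue.push(...line.slice(6));
    }
  }
}

// Typewriter effect: drain the queue one character at a time.
setInterval(() => {
  if (queue.length) output.textContent += queue.shift();
}, 25);
```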
HTTP API
You can also interact via HTTP for text-based queries:
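For example, a text query could look like this; the `/chat` path and parameters mirror the text chat interface above and are assumptions:

```python
# Sketch: endpoint path and parameters are assumptions.
import urllib.request

req = urllib.request.Request(
    "https://your-zone.example.com/chat?scope=cli&conversation=demo",
    data="How do I load docs from llms.txt?".encode("utf-8"),
    method="POST",
)
print(urllib.request.urlopen(req).read().decode("utf-8"))
```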
Manual Refresh
Trigger a knowledge base refresh:
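The `/refresh` endpoint (also referenced in Troubleshooting below) can be called directly; a sketch, assuming it accepts a POST:

```python
import urllib.request

# Sketch: POST to /refresh to reload the knowledge base on demand.
urllib.request.urlopen(
    urllib.request.Request("https://your-zone.example.com/refresh", method="POST")
)
```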
Complete Example
Here are all the files for the complete example:

Alternative: Using Filesystem Instead of Knowledge
If you prefer not to use vector embeddings, you can use Filesystem Tools as an alternative approach. This is simpler to set up but uses keyword matching instead of semantic search.

Useful Filesystem Tools for Documentation
| Tool | Description |
|---|---|
| `search_in_files` | Search for regex patterns across files. Great for finding specific terms or code snippets. |
| `find_files` | Find files matching glob patterns like `**/*.md` or `docs/*.json`. Useful for discovering what documentation exists. |
| `list_directory` | List files and directories. Helps the agent navigate the documentation structure. |
| `read_file` | Read file contents. Use after finding relevant files. |
Filesystem Example
images/main/main.py
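A sketch of the filesystem-based agent; the tool names come from the table above, while the `Filesystem` class and the placement of the `visibility` parameter are assumptions:

```python
from autonomy import Agent, Filesystem  # assumed imports

# Sketch: expose documentation files through filesystem tools instead
# of vector search.
fs = Filesystem(root="/docs", visibility="agent")

agent = Agent(
    name="docs-agent",
    instructions=(
        "Answer questions using the documentation under /docs. "
        "Use find_files and search_in_files to locate relevant pages, "
        "then read_file before answering."
    ),
    tools=fs.tools(),  # search_in_files, find_files, list_directory, read_file
)
```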
visibility="agent", all documentation files are shared across all conversations, making them accessible to every user.
Filesystem access
Learn more about filesystem tools and visibility options.
Platform-Specific Tips
Mintlify
Mintlify provides an llms.txt file at https://your-docs.mintlify.dev/llms.txt containing links to all documentation pages in markdown format.
GitBook
GitBook exports can be accessed via their API or by scraping the sitemap at https://your-space.gitbook.io/sitemap.xml.
ReadTheDocs
ReadTheDocs provides downloadable formats. Use the HTML or PDF export URLs.

GitHub Pages / Docusaurus
For static site generators, parse the sitemap or maintain a list of documentation URLs.

Troubleshooting
Knowledge base returns no results
- Check that `max_distance` isn't too strict (try 0.5 or higher).
- Verify documents loaded successfully by checking the `/refresh` endpoint response.
- Ensure the embedding model matches your content language.
Voice not working in browser
- Ensure your browser has microphone permissions.
- Use Chrome or Edge for best WebSocket and Web Audio API support.
- Check the browser console for WebSocket connection errors.
Agent gives inaccurate answers
- Adjust the instructions to emphasize using the search tool first.
- Increase `max_results` to provide more context.
- Lower `max_distance` to retrieve more relevant chunks.
Responses are too slow
- Reduce `max_tokens` in the model configuration.
- Use a faster model for the primary agent.
- Ensure your knowledge base isn't too large.

