# A guide for coding agents to use workers and messaging
Keywords: workers, messaging, distributed computing, `send_and_receive`, `handle_message`, `Zone.nodes`

Key Terms: See the definitions section of the main guide.

This guide shows how to create workers that process messages asynchronously, send messages between workers, and build distributed applications that scale across multiple pods.
## Understanding Workers
Workers are message-processing components that follow the actor model. Each worker:

- Has a unique name within its node
- Receives messages asynchronously
- Processes one message at a time
- Can reply to the sender
- Can run on the local node or on remote nodes in other containers and pods
Why use workers?

- Asynchronous processing: Handle long-running tasks without blocking
- Parallelism: Process multiple requests concurrently across workers
- Distribution: Spread workload across multiple machines
- Isolation: Each worker maintains its own state independently
## Nodes and Containers
Important: Not all containers in a zone are Autonomy Nodes.

- Autonomy Node containers: Containers that run `Node.start()`; these can run workers and exchange messages
- Non-Node containers: Containers that don't run Autonomy Nodes, such as MCP servers, databases, or other services

In this guide:

- The `main` and `runner` containers are Autonomy Nodes (they call `Node.start()`)
- You could also have containers running MCP servers or other services that don't call `Node.start()`
- Only containers that are Autonomy Nodes can run workers and participate in messaging
## ⚠️ Message Serialization
CRITICAL: Worker messages MUST be strings. For structured data, use JSON encoding: `json.dumps()` to send and `json.loads()` to receive.
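Since messages must be strings, a typical round-trip looks like this. This sketch uses only the standard `json` module; the send/receive step in between is the framework's job:

```python
import json

# Sender side: encode structured data to a string before sending.
payload = json.dumps({"task": "resize", "width": 800})

# ... the string travels through send_and_receive / handle_message ...

# Receiver side: decode the string back into a dict.
message = json.loads(payload)
print(message["task"], message["width"])  # resize 800
```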
## Create a Basic Worker

### Step 1: Define a Worker Class
Create a worker class with a `handle_message` method:

- Worker Class: `Echoer` defines `async def handle_message(self, context, message)`
- Start Worker: `await node.start_worker("echoer", Echoer())` creates a worker named "echoer"
- Send Message: `await node.send_and_receive("echoer", "hello")` sends "hello" and waits for a reply
- Reply: `await context.reply(message)` sends the response back to the caller
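A minimal sketch of this pattern. The `Echoer` class follows the `handle_message` shape described above; because the handler takes plain arguments, we can exercise it here with a hypothetical stub context instead of a live Node. In a real deployment you would instead call `await node.start_worker("echoer", Echoer())` and `await node.send_and_receive("echoer", "hello")`:

```python
import asyncio

class Echoer:
    """A minimal worker: echoes every message back to the sender."""
    async def handle_message(self, context, message):
        # Messages are strings; reply() sends the response to the caller.
        await context.reply(message)

class StubContext:
    """Hypothetical stand-in for the framework's reply context."""
    def __init__(self):
        self.replies = []
    async def reply(self, message):
        self.replies.append(message)

async def main():
    ctx = StubContext()
    await Echoer().handle_message(ctx, "hello")
    return ctx.replies

print(asyncio.run(main()))  # ['hello']
```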
### Step 2: Deploy and Test

Create the complete file structure.

## Worker Message Patterns
This example demonstrates send-and-forget, timeouts, stateful workers, and cleanup:

- Send-and-forget: `send_message()` for logging and notifications (no reply)
- Timeouts: Always pass a timeout in milliseconds, e.g. `timeout=1000`, to `send_and_receive()`
- State: Workers maintain state across messages (Counter keeps a count)
- Cleanup: Call `stop_worker()` when done
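A sketch of a stateful worker like the Counter mentioned above, exercised with a hypothetical stub context (the class name and JSON command shape are illustrative, not from the framework). In a real zone you would start it with `node.start_worker("counter", Counter())`, message it via `send_and_receive(..., timeout=1000)`, and call `stop_worker("counter")` when done:

```python
import asyncio
import json

class Counter:
    """A stateful worker: keeps a running count across messages."""
    def __init__(self):
        self.count = 0

    async def handle_message(self, context, message):
        # Messages are strings: decode, update state, reply with JSON.
        command = json.loads(message)
        if command.get("op") == "increment":
            self.count += 1
        await context.reply(json.dumps({"count": self.count}))

class StubContext:
    """Hypothetical stand-in for the framework's reply context."""
    def __init__(self):
        self.replies = []
    async def reply(self, message):
        self.replies.append(message)

async def demo():
    worker, ctx = Counter(), StubContext()
    for _ in range(3):
        await worker.handle_message(ctx, json.dumps({"op": "increment"}))
    return json.loads(ctx.replies[-1])

print(asyncio.run(demo()))  # {'count': 3}
```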
## Error Handling and Cleanup

Production-ready pattern for distributed worker management:

- `asyncio.wait_for()` prevents indefinite waiting
- `try/except` handles individual failures gracefully
- `try/finally` ensures cleanup always happens
- Continue processing other workers on failure
- Return partial results rather than failing completely
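The bullets above can be sketched as follows. `flaky_worker_call` is a hypothetical stand-in for a `send_and_receive` call (one simulated worker hangs); the timeout, per-call exception handling, and `finally` cleanup are the pattern itself:

```python
import asyncio

async def flaky_worker_call(name: str) -> str:
    # Stand-in for node.send_and_receive(name, ...): worker-2 never answers.
    if name == "worker-2":
        await asyncio.sleep(10)
    return f"{name}: ok"

async def process_all(names):
    results = {}
    try:
        for name in names:
            try:
                # Bound each call so one stuck worker cannot hang the batch.
                results[name] = await asyncio.wait_for(
                    flaky_worker_call(name), timeout=0.1
                )
            except asyncio.TimeoutError:
                # Record the failure and keep going with the other workers.
                results[name] = "error: timed out"
    finally:
        # In a real deployment, stop_worker() calls belong here so cleanup
        # runs even if processing raised.
        pass
    return results  # partial results rather than total failure

results = asyncio.run(process_all(["worker-1", "worker-2", "worker-3"]))
print(results)
```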
## Distributed Workers Across Multiple Machines

Workers can run on different machines in your zone. This enables true parallel processing at scale.

### Step 1: Configure Multiple Pods

Create an `autonomy.yaml` with multiple pods:
- main-pod: The primary pod that coordinates work
- runner-pod: Worker pods that process tasks
- `clones: 5`: Creates 5 separate machines running the runner container
- Each clone is a separate machine with its own Node
- Both `main` and `runner` containers are Autonomy Nodes (they call `Node.start()`)
- You could also add non-Node containers to a Zone (like MCP servers) that don't call `Node.start()`
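A sketch of what such a file might look like. Only the pod names, container names, and `clones: 5` come from this guide; the surrounding schema is an assumption, so check the main guide for the authoritative format:

```yaml
pods:
  - name: main-pod
    containers:
      - name: main        # Autonomy Node: calls Node.start()
  - name: runner-pod
    clones: 5             # five separate machines, each with its own Node
    containers:
      - name: runner      # Autonomy Node: calls Node.start()
```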
IMPORTANT: If your distributed workers need MCP tools, each pod must have its own MCP server container since agents and workers access MCP servers over localhost. See the tools guide section on multi-pod deployments for complete examples.
### Step 2: Discover Nodes in a Zone

Use `Zone.nodes()` to find nodes running in other containers:

- `Zone.nodes(node, filter="runner")`: Returns a list of Node objects from pods whose names contain "runner"
- Each Node object represents a remote node running in a different pod
- You can interact with remote nodes the same way as the local node
### Step 3: Start Workers on Remote Nodes

Start workers on remote nodes just like local workers:

- Use `runner.start_worker()` instead of `node.start_worker()` to start a worker on the remote node
- Use `runner.send_and_receive()` to send messages to workers on that remote node
- Each runner node operates independently
### Step 4: Distribute Work Across Nodes

Process work in parallel across multiple machines:

- Round-robin: `runners[i % len(runners)]` distributes work evenly
- Parallel execution: `asyncio.gather(*futures)` runs all workers simultaneously
- Auto-cleanup: Each worker is stopped after processing
- Work runs on 5 different machines in parallel
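Steps 3 and 4 can be sketched end to end. `StubRunner` is a hypothetical in-memory stand-in for a remote Node (real code would obtain runners from `Zone.nodes(node, filter="runner")`), but the round-robin assignment, `asyncio.gather()` fan-out, and per-task cleanup are the same shape:

```python
import asyncio

class StubRunner:
    """Hypothetical in-memory stand-in for a remote Node."""
    def __init__(self, name):
        self.name = name
        self.workers = {}

    async def start_worker(self, worker_name, worker):
        self.workers[worker_name] = worker

    async def send_and_receive(self, worker_name, message, timeout=1000):
        # Simulate remote processing: each runner tags its own replies.
        return f"{self.name} processed {message}"

    async def stop_worker(self, worker_name):
        self.workers.pop(worker_name, None)

async def distribute(tasks, runners):
    async def run_one(i, task):
        runner = runners[i % len(runners)]   # round-robin assignment
        name = f"worker-{i}"
        await runner.start_worker(name, object())
        try:
            return await runner.send_and_receive(name, task, timeout=1000)
        finally:
            await runner.stop_worker(name)   # auto-cleanup after each task
    # Run every task concurrently across all runners.
    return await asyncio.gather(*(run_one(i, t) for i, t in enumerate(tasks)))

runners = [StubRunner(f"runner-{n}") for n in range(3)]
results = asyncio.run(distribute(["a", "b", "c", "d"], runners))
print(results)
```

`asyncio.gather()` preserves input order, so results line up with tasks even though the calls run concurrently.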
## Complete Example: Distributed Code Analyzer

Analyze GitHub repositories in parallel across multiple runner pods. File structure: `images/main/Dockerfile` and `images/runner/Dockerfile`.
## List Workers on Nodes

Check which workers are running on each node.

## Best Practices
✅ Do:

- Always use JSON: Use `json.dumps()` to send and `json.loads()` to receive structured data
- Use timeouts: Specify timeouts in `send_and_receive()` to prevent hanging (10-30s for cold starts, 1-5s for warm workers)
- Use try/finally: Always clean up workers, even when errors occur
- Handle errors: Use `asyncio.wait_for()` and try/except for graceful degradation
- Unique names: Use `secrets.token_hex(3)` to generate unique worker names
- Test progressively: Start simple (nodes → workers → messages) before full distribution
- Distribute work: Use multiple runners with `asyncio.gather()` for parallelism
❌ Don't:

- Send dicts/objects: Never send `{"data": 123}`; use `json.dumps({"data": 123})` instead
- Skip cleanup: Always stop workers in a `finally` block to prevent resource leaks
- Omit error handling: Always use timeouts and exception handling on async operations
- Test immediately: Wait 2-3 minutes after deployment before testing
- Use the same name twice: Each worker needs a unique name on its node
- Assume order: Workers run concurrently, not sequentially
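The unique-names practice is one line with the standard library; `secrets.token_hex(3)` yields 6 hex characters, and the `worker-` prefix here is just an illustrative naming convention:

```python
import secrets

# 3 random bytes -> 6 hex characters, e.g. "worker-a1b2c3"
worker_name = f"worker-{secrets.token_hex(3)}"
print(worker_name)
```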
## Key Concepts Summary

Workers run on Nodes:

- Each pod has one or more containers
- Containers that call `Node.start()` are Autonomy Nodes
- Not all containers are Nodes (e.g., MCP servers, databases)
- Each Autonomy Node can run multiple workers
- Workers have unique names within their Node

Message flow:

1. Caller sends a message to a worker
2. Worker processes the message asynchronously
3. Worker optionally replies
4. Caller receives the reply

Distribution:

- Use `clones` in `autonomy.yaml` to create multiple pods
- Use `Zone.nodes()` to discover nodes in the zone
- Start workers on remote nodes with `remote_node.start_worker()`
- Work executes in parallel across all nodes

