Autonomy applications have multiple timeout layers that work together to ensure reliable execution. Understanding how these layers interact is essential for building robust agents, especially for long-running tasks like research or batch processing.

Understanding Timeout Layers

When a request flows through an Autonomy application, it passes through several timeout boundaries:
HTTP API → Agent Execution → Model Calls → [Throttle Queue] → Gateway → LLM Provider
Each layer has its own timeout configuration. The outermost timeout (HTTP API) acts as the ultimate limit—if inner operations exceed it, the entire request fails.

The Multi-Iteration Challenge

Agents don’t make single requests—they iterate through a loop of thinking, acting, and gathering responses. A typical agent conversation involves multiple model calls:
User Message

┌─────────────────────────────────────────────┐
│            Agent State Machine              │
│  ┌───────────────────────────────────────┐  │
│  │ Iteration 1: Model call (~5-30s)      │  │
│  ├───────────────────────────────────────┤  │
│  │ Iteration 2: Tool + Model call (~30s) │  │
│  ├───────────────────────────────────────┤  │
│  │ Iteration 3: Model call (~5-30s)      │  │
│  ├───────────────────────────────────────┤  │
│  │ ... more iterations ...               │  │
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘

Response to User
Time compounds across iterations. A 10-iteration agent with 30-second iterations needs 300 seconds total—but the default HTTP timeout is only 180 seconds.

Quick Reference

| Use Case | HTTP Timeout | max_execution_time | Throttle | Notes |
| --- | --- | --- | --- | --- |
| Simple Chat | 60s | 60s | No | Quick Q&A, 1-3 iterations |
| Tool-Augmented Chat | 120s | 120s | No | Tool calls, 3-5 iterations |
| Research Agent | 660s+ | 600s | Yes | Deep work, many iterations |
| Batch Processing | 180s | 120s | Yes | Per-item timeout |
| Voice Interface | 30s | 30s | No | Low latency critical |
| Subagent Workflows | 660s | 600s | Yes | Parent + child time |

Layer 1: HTTP API Timeout

The HTTP layer is the outermost timeout boundary. Configure it when making requests to the built-in agent endpoints:
curl
# Default: 180 seconds
curl --request POST \
  --header "Content-Type: application/json" \
  --data '{"message": "Research this topic"}' \
  "https://${CLUSTER}-${ZONE}.cluster.autonomy.computer/agents/researcher?timeout=600"
For custom FastAPI endpoints, handle timeouts explicitly:
images/main/main.py
from autonomy import Agent, HttpServer, Model, Node, NodeDep
from fastapi import FastAPI

app = FastAPI()

@app.post("/research")
async def research(request: dict, node: NodeDep):
    agent = await Agent.start(
        node=node,
        name="researcher",
        instructions="You are a thorough researcher",
        model=Model("claude-sonnet-4-v1"),
        max_execution_time=600.0,  # 10 minutes
    )
    
    # Blocking send: the timeout must exceed max_execution_time.
    # For tasks that may outlive the HTTP connection, prefer
    # streaming (see "Streaming for Long Tasks" below).
    response = await agent.send(
        request.get("query", ""),
        timeout=660.0  # max_execution_time + 60s buffer
    )
    
    return {"result": response[-1].content.text}

Node.start(http_server=HttpServer(app=app))
The HTTP timeout must exceed max_execution_time plus overhead for startup and teardown (typically a 60-second buffer).

Layer 2: Agent Execution Limits

Control how long an agent can run and how many iterations it can perform:
images/main/main.py
from autonomy import Agent, Model, Node


async def main(node):
    await Agent.start(
        node=node,
        name="researcher",
        instructions="You are a research assistant",
        model=Model("claude-sonnet-4-v1"),
        max_execution_time=600.0,  # Total execution limit (seconds)
        max_iterations=100,         # Maximum reasoning loops
    )


Node.start(main)

Configuration Options

| Parameter | Default | Description |
| --- | --- | --- |
| max_execution_time | 600s (10 min) | Total time allowed for agent execution |
| max_iterations | 1000 | Maximum number of think-act loops |

Estimating Execution Time

Use this formula to estimate the time budget:
max_execution_time >= expected_iterations × average_iteration_time + buffer
| Task Type | Expected Iterations | Avg Time/Iteration | Recommended max_execution_time |
| --- | --- | --- | --- |
| Simple Q&A | 1-2 | 5s | 30s |
| Tool usage | 3-5 | 20s | 120s |
| Research | 10-20 | 30s | 600s |
| Complex analysis | 20-50 | 30s | 1800s |
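The formula is simple enough to encode directly. A minimal sketch (estimate_execution_time is a hypothetical helper, not part of the Autonomy API):

def estimate_execution_time(
    expected_iterations: int,
    avg_iteration_seconds: float,
    buffer_seconds: float = 60.0,
) -> float:
    """Apply the formula above: iterations × average time + buffer."""
    return expected_iterations * avg_iteration_seconds + buffer_seconds

# Example: a research task expected to take ~18 iterations of ~30s each
print(estimate_execution_time(18, 30.0))  # 600.0 -> matches the table's 600s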

Layer 3: Agent Lifecycle Timeouts

Separate from execution, agent lifecycle operations have their own timeouts:
images/main/main.py
from autonomy import Agent, Model, Node


async def main(node):
    # Start with timeout (registration should be fast)
    agent = await Agent.start(
        node=node,
        name="assistant",
        instructions="You are helpful",
        model=Model("claude-sonnet-4-v1"),
        timeout=30.0,  # Startup timeout
    )
    
    # Send message with timeout
    response = await agent.send(
        "Hello",
        timeout=120.0  # Covers entire multi-iteration execution
    )
    
    # Stop with timeout
    await Agent.stop(node, agent.name, timeout=10.0)


Node.start(main)

Lifecycle Timeout Guidelines

| Operation | Recommended Timeout | Rationale |
| --- | --- | --- |
| Agent.start() | 30s | Registration should be quick |
| agent.send() | max_execution_time + 60s | Full execution plus buffer |
| Agent.stop() | 10s | Cleanup should be fast |
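One way to keep these three timeouts consistent is to derive them from a single execution budget. A sketch of that pattern, reusing the lifecycle calls above (the constant name is illustrative):

from autonomy import Agent, Model, Node

MAX_EXECUTION_TIME = 600.0  # single source of truth for the budget


async def main(node):
    agent = await Agent.start(
        node=node,
        name="assistant",
        instructions="You are helpful",
        model=Model("claude-sonnet-4-v1"),
        max_execution_time=MAX_EXECUTION_TIME,
        timeout=30.0,  # registration should be quick
    )

    # send timeout = execution budget + 60s buffer, per the table above
    response = await agent.send("Hello", timeout=MAX_EXECUTION_TIME + 60.0)

    await Agent.stop(node, agent.name, timeout=10.0)


Node.start(main)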

Layer 4: Model Configuration

Each model call has its own timeout settings:
images/main/main.py
from autonomy import Agent, Model, Node


async def main(node):
    model = Model(
        "claude-sonnet-4-v1",
        request_timeout=120.0,   # Per-call timeout (default: 120s)
        connect_timeout=10.0,    # Connection establishment (default: 10s)
        stream_timeout=300.0,    # Streaming responses (default: 300s)
    )
    
    await Agent.start(
        node=node,
        name="assistant",
        instructions="You are helpful",
        model=model,
    )


Node.start(main)

Model Timeout Guidelines

| Parameter | Default | When to Adjust |
| --- | --- | --- |
| request_timeout | 120s | Increase for reasoning models (o1, o3) that think longer |
| connect_timeout | 10s | Increase if network latency is high |
| stream_timeout | 300s | Increase for very long streaming responses |
These are per-call timeouts. A 10-iteration agent makes 10+ model calls, so total time can be iterations × request_timeout.
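The worst case is easy to see with plain arithmetic (illustrative numbers using the defaults):

iterations = 10
request_timeout = 120.0  # default per-call ceiling

# Worst case: every call runs to its per-call timeout
worst_case = iterations * request_timeout
print(worst_case)  # 1200.0s, double the default max_execution_time of 600s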

Layer 5: Throttle Configuration

When throttle=True, requests queue when rate limits are approached. This prevents 429 errors but adds latency:
images/main/main.py
from autonomy import Agent, Model, Node


async def main(node):
    model = Model(
        "claude-sonnet-4-v1",
        throttle=True,
        throttle_requests_per_minute=60.0,       # Starting rate
        throttle_max_requests_in_progress=10,    # Concurrent limit
        throttle_max_requests_waiting_in_queue=100,
        throttle_max_seconds_to_wait_in_queue=60.0,  # Queue timeout
        throttle_max_retry_attempts=3,
        throttle_initial_seconds_between_retry_attempts=1.0,
    )
    
    await Agent.start(
        node=node,
        name="batch_processor",
        instructions="Process items efficiently",
        model=model,
    )


Node.start(main)

Throttle Timing Impact

With throttling enabled, each iteration can wait in the queue:
Iteration 1: queue wait (up to 60s) + model call (up to 120s)
Iteration 2: queue wait (up to 60s) + model call (up to 120s)
Iteration 3: queue wait (up to 60s) + model call (up to 120s)
...
Worst case for 3 iterations:
  • Queue waits: 3 × 60s = 180s
  • Model calls: 3 × 120s = 360s
  • Total: 540s
When using throttling, ensure your HTTP timeout accounts for queue wait time multiplied by expected iterations.
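A minimal sketch of that worst-case arithmetic (throttled_worst_case is a hypothetical helper for sizing timeouts, not a library function):

def throttled_worst_case(
    iterations: int,
    queue_wait_seconds: float,
    request_timeout_seconds: float,
) -> float:
    """Worst case: every iteration waits the full queue timeout,
    then its model call runs to the per-call timeout."""
    return iterations * (queue_wait_seconds + request_timeout_seconds)

# Matches the 3-iteration example above: 3 × (60 + 120) = 540
print(throttled_worst_case(3, 60.0, 120.0))  # 540.0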

Throttle Configuration by Use Case

| Use Case | throttle_max_seconds_to_wait_in_queue | Rationale |
| --- | --- | --- |
| Interactive | 30s | Fast feedback on overload |
| Research | 60s | Balance iterations and timeout |
| Batch | 120s+ | Allow queue absorption |

Layer 6: Subagent Timeouts

Subagents have their own execution time that counts against the parent’s budget:
images/main/main.py
from autonomy import Agent, Model, Node


async def main(node):
    await Agent.start(
        node=node,
        name="coordinator",
        instructions="Coordinate research across specialists",
        model=Model("claude-sonnet-4-v1"),
        max_execution_time=600.0,  # Parent has 10 minutes
        subagents={
            "researcher": {
                "instructions": "Research topics thoroughly",
                "model": Model("claude-sonnet-4-v1"),
                "max_execution_time": 120.0,  # 2 minutes per subagent
            },
            "analyst": {
                "instructions": "Analyze findings",
                "model": Model("claude-sonnet-4-v1"),
                "max_execution_time": 180.0,  # 3 minutes for analysis
            }
        }
    )


Node.start(main)

Subagent Timeout Guidelines

  1. Subagent time counts against parent time:
    parent_time_remaining = max_execution_time - time_spent - subagent_time
    
  2. For parallel subagents, the slowest determines wait time:
    parallel_wait = max(subagent1_time, subagent2_time, ...)
    
  3. Rule of thumb:
    subagent_timeout <= parent_max_execution_time / expected_num_delegations
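A sketch of the rule-of-thumb budget split (subagent_budget is a hypothetical helper):

def subagent_budget(
    parent_max_execution_time: float,
    expected_delegations: int,
) -> float:
    """Divide the parent's budget across the delegations it expects to make."""
    return parent_max_execution_time / expected_delegations

# A 600s coordinator expecting ~4 sequential delegations
# leaves at most 150s per subagent
print(subagent_budget(600.0, 4))  # 150.0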
    

Configuration Examples

Interactive Chat Application

Fast responses for conversational AI:
images/main/main.py
from autonomy import Agent, Model, Node


async def main(node):
    await Agent.start(
        node=node,
        name="assistant",
        instructions="You are a helpful assistant",
        model=Model("claude-sonnet-4-v1"),
        max_execution_time=60.0,
        max_iterations=10,
    )


Node.start(main)
HTTP timeout: 90 seconds
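Invoked with a matching client-side limit (same query-parameter pattern as Layer 1; the message is illustrative):
curl
curl --request POST \
  --header "Content-Type: application/json" \
  --data '{"message": "What are your hours?"}' \
  "https://${CLUSTER}-${ZONE}.cluster.autonomy.computer/agents/assistant?timeout=90"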

Research Agent

Deep work with many iterations:
images/main/main.py
from autonomy import Agent, FilesystemTools, Model, Node


async def main(node):
    await Agent.start(
        node=node,
        name="researcher",
        instructions="""
        You are a thorough researcher. Take your time to:
        1. Break down the problem
        2. Research each aspect
        3. Take notes in your filesystem
        4. Synthesize findings
        """,
        model=Model(
            "claude-sonnet-4-v1",
            throttle=True,
            throttle_max_seconds_to_wait_in_queue=60.0,
        ),
        max_execution_time=1800.0,  # 30 minutes
        max_iterations=100,
        tools=[FilesystemTools(visibility="conversation")],
    )


Node.start(main)
HTTP timeout: 1860 seconds (31 minutes), or use streaming

Batch Processing

High throughput with rate limiting:
images/main/main.py
from autonomy import Agent, HttpServer, Model, Node, NodeDep
from fastapi import FastAPI
from asyncio import gather, create_task

app = FastAPI()

async def process_item(node, item: str, timeout: float = 60.0):
    agent = None
    try:
        agent = await Agent.start(
            node=node,
            name=f"processor_{id(item)}",
            instructions="Process the item concisely",
            model=Model(
                "nova-micro-v1",  # Fast, cheap model for batch
                throttle=True,
                throttle_requests_per_minute=100.0,
                throttle_max_seconds_to_wait_in_queue=30.0,
            ),
            max_execution_time=30.0,  # Short per-item timeout
            max_iterations=5,
        )
        
        response = await agent.send(item, timeout=timeout)
        return {"item": item, "result": response[-1].content.text}
    except Exception as e:
        return {"item": item, "error": str(e)}
    finally:
        if agent:
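            # Fire-and-forget cleanup: one slow stop won't block batch results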
            create_task(Agent.stop(node, agent.name))

@app.post("/batch")
async def batch_process(request: dict, node: NodeDep):
    items = request.get("items", [])
    results = await gather(*(process_item(node, item) for item in items))
    return {"results": results}

Node.start(http_server=HttpServer(app=app))

Voice Interface

Ultra-low latency for real-time:
images/main/main.py
from autonomy import Agent, Model, Node


async def main(node):
    await Agent.start(
        node=node,
        name="voice_assistant",
        instructions="Respond briefly and conversationally",
        model=Model(
            "nova-micro-v1",  # Fast model
            request_timeout=30.0,  # Quick timeout
        ),
        max_execution_time=30.0,
        max_iterations=3,
    )


Node.start(main)
HTTP timeout: 45 seconds

Timeout Hierarchy

For consistent behavior, configure timeouts from outermost to innermost:
┌─────────────────────────────────────────────────────────────┐
│ HTTP API timeout: max_execution_time + 60s buffer           │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ agent.send() timeout: max_execution_time + 30s        │  │
│  │  ┌─────────────────────────────────────────────────┐  │  │
│  │  │ Agent max_execution_time                        │  │  │
│  │  │  ┌───────────────────────────────────────────┐  │  │  │
│  │  │  │ Per-iteration: model_timeout + queue_wait │  │  │  │
│  │  │  │  ┌─────────────────────────────────────┐  │  │  │  │
│  │  │  │  │ Subagent max_execution_time         │  │  │  │  │
│  │  │  │  └─────────────────────────────────────┘  │  │  │  │
│  │  │  └───────────────────────────────────────────┘  │  │  │
│  │  └─────────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
Key rule: Each outer layer’s timeout must be greater than the sum of all possible inner timeouts.
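That rule can be spot-checked mechanically before deploying a configuration. A minimal sketch (check_hierarchy is a hypothetical helper; the example numbers come from the research row of the quick-reference table):

def check_hierarchy(
    http_timeout: float,
    send_timeout: float,
    max_execution_time: float,
    request_timeout: float,
    queue_wait: float = 0.0,
) -> None:
    """Assert the key rule: each outer timeout exceeds what it wraps."""
    assert http_timeout > send_timeout, "HTTP timeout must exceed send timeout"
    assert send_timeout > max_execution_time, "send timeout must exceed the budget"
    assert max_execution_time > request_timeout + queue_wait, \
        "budget must cover at least one full iteration"

check_hierarchy(
    http_timeout=660.0,        # max_execution_time + 60s buffer
    send_timeout=630.0,        # max_execution_time + 30s
    max_execution_time=600.0,
    request_timeout=120.0,
    queue_wait=60.0,
)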

Streaming for Long Tasks

For tasks that may exceed HTTP timeout limits, use streaming to keep the connection alive:
images/main/main.py
from autonomy import Agent, HttpServer, Model, Node, NodeDep
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import json

app = FastAPI()

@app.post("/research")
async def research(request: dict, node: NodeDep):
    async def generate():
        agent = await Agent.start(
            node=node,
            name="researcher",
            instructions="Research thoroughly",
            model=Model("claude-sonnet-4-v1"),
            max_execution_time=1800.0,  # 30 minutes
        )
        
        async for chunk in agent.stream(request.get("query", "")):
            if chunk.content and chunk.content.text:
                yield json.dumps({"text": chunk.content.text}) + "\n"
    
    return StreamingResponse(generate(), media_type="application/x-ndjson")

Node.start(http_server=HttpServer(app=app))
Streaming keeps the connection alive with periodic chunks, avoiding HTTP timeout issues for long-running research tasks.
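On the client side, consume the stream line by line and bound only the gap between chunks, not the total duration. A sketch with httpx (the URL is illustrative):

import json

import httpx

# read=300.0 bounds the wait between chunks; the periodic chunks
# keep the connection alive, so no overall deadline is needed.
timeout = httpx.Timeout(10.0, read=300.0)

with httpx.stream(
    "POST",
    "https://example.cluster.autonomy.computer/research",
    json={"query": "Research this topic"},
    timeout=timeout,
) as response:
    for line in response.iter_lines():
        if line:
            print(json.loads(line)["text"])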

Troubleshooting

"Request timed out" at HTTP layer

Symptom: Agent task fails with an HTTP timeout even though the agent should have more time.
Cause: HTTP timeout (default 180s) < max_execution_time (default 600s).
Solution: Increase the HTTP timeout or use streaming:
curl
curl "https://.../agents/researcher?timeout=660"

Agent stops mid-task

Symptom: Agent stops before completing complex reasoning.
Cause: max_execution_time too short for the number of iterations needed.
Solution: Increase max_execution_time and max_iterations:
max_execution_time=600.0,
max_iterations=50,

Subagent timeouts

Symptom: Subagent tasks fail with timeout errors.
Cause: Default subagent timeout (60s) too short for multi-step work.
Solution: Increase the subagent's max_execution_time:
subagents={
    "researcher": {
        "max_execution_time": 300.0,  # 5 minutes
    }
}

Throttle queue timeouts under load

Symptom: Many requests fail with queue timeouts when the system is busy.
Cause: throttle_max_seconds_to_wait_in_queue too short for the load.
Solution: Increase the queue timeout or reduce concurrency:
Model(
    "claude-sonnet-4-v1",
    throttle=True,
    throttle_max_seconds_to_wait_in_queue=120.0,
)

Best Practices

Start Conservative

Begin with shorter timeouts and increase based on observed behavior. Long timeouts can mask performance issues.

Use Streaming

For tasks over 3 minutes, use streaming to avoid HTTP timeout issues and provide progress to users.

Match Layers

Ensure outer timeouts are always greater than inner timeouts plus overhead.

Monitor Iterations

Track how many iterations your agents typically use to right-size timeouts.