AI Agent
Otto's AI agent is the engine that turns natural language requests into completed work. This page explains how the agent operates, what tools it has access to, and how you can configure its behavior.
The Agent Loop
When you submit a task, Otto follows a structured loop:
1. Receive the task. The user's request is queued to a background worker (ARQ) and the agent begins execution.
2. Plan steps. The agent analyzes the request, considers available tools and context, and decides on an approach.
3. Use tools. The agent calls tools -- reading files, searching the web, sending emails, querying memory -- as needed to complete the work.
4. Evaluate. After each tool call, the agent reviews the results and decides whether to call more tools or deliver a final answer.
5. Deliver the result. When the agent determines the work is done, it saves the output and notifies the user.
This is a ReAct (Reason + Act) loop powered by a LangGraph state machine. The agent reasons about what to do next, takes an action, observes the result, and repeats until the task is complete.
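The loop can be sketched in plain Python. This is only an illustration of the reason-act-observe cycle; Otto's real implementation is a LangGraph state machine, and the function and message shapes here are assumptions:

```python
# Hypothetical sketch of a ReAct loop; Otto's actual agent is a
# LangGraph state machine, which this plain loop only approximates.
def run_agent_loop(llm_step, execute_tool, max_iterations=10):
    """Reason -> act -> observe until the model returns a final answer."""
    messages = []
    for _ in range(max_iterations):
        decision = llm_step(messages)                 # reason: pick next action
        if decision["type"] == "final":
            return decision["content"]                # deliver the result
        result = execute_tool(decision["tool"], decision["args"])  # act
        messages.append({"tool": decision["tool"], "result": result})  # observe
    return "stopped at iteration limit"

# Toy run: the fake model calls one tool, then answers.
def fake_llm(messages):
    if not messages:
        return {"type": "tool", "tool": "search", "args": {"q": "otto"}}
    return {"type": "final", "content": "done: " + messages[-1]["result"]}

print(run_agent_loop(fake_llm, lambda tool, args: f"{tool} ok"))
```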
LLM Provider Flexibility
Otto is not locked to a single AI provider. You choose which large language model powers the agent by setting environment variables:
| Provider | Configuration | Default Model |
|---|---|---|
| Google Gemini | GOOGLE_API_KEY | gemini-3.1-pro-preview-customtools |
| OpenAI | OPENAI_API_KEY | gpt-5-mini |
| Ollama (local) | OLLAMA_MODEL | gpt-oss:latest |
The provider is selected based on which API key is configured. If multiple are present, the priority order is Gemini, then Ollama, then OpenAI.
OpenAI-compatible endpoints are also supported via the OPEN_AI_BASE environment variable, so you can point Otto at any API that implements the OpenAI chat completions interface.
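The priority order above can be expressed as a simple check. The environment variable names come from this page; the selection logic itself is an assumption sketched for illustration:

```python
import os

# Illustrative sketch of the documented priority order:
# Gemini, then Ollama, then OpenAI.
def select_provider(env=None):
    env = os.environ if env is None else env
    if env.get("GOOGLE_API_KEY"):
        return "gemini"
    if env.get("OLLAMA_MODEL"):
        return "ollama"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    raise RuntimeError("no LLM provider configured")

# Ollama outranks OpenAI when both are configured.
print(select_provider({"OPENAI_API_KEY": "sk-...", "OLLAMA_MODEL": "gpt-oss:latest"}))
```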
Tools
The agent has access to a built-in toolkit plus any additional tools provided by MCP servers.
Built-in Tools
| Tool | What It Does |
|---|---|
| ask_human | Send a question to a team member and optionally pause until they respond |
| dispatch_subagent | Break work into subtasks that run in parallel (see Subagents) |
| send_email | Send emails with CC support and file attachments via SMTP |
| read_file | Read text files from the sandbox |
| write_file | Write content to text files in the sandbox |
| list_directory | List files in the sandbox storage area |
| attach_file | Mark a file for delivery to the user on task completion |
| convert_markdown_to_docx | Convert markdown to Word documents using pandoc |
| read_document | Convert PDF and Word documents to markdown for analysis |
| memory_search | Search Otto's persistent memory for relevant past knowledge |
| memory_save | Save new information to persistent memory with tags |
| run_skill_script | Execute scripts bundled with matched skills |
| wait | Pause execution for a specified duration |
MCP Tools
Otto supports the Model Context Protocol (MCP) for connecting to external tool servers. MCP tools extend what Otto can do without modifying the core codebase. Common MCP integrations include:
- Tavily -- web search (included by default)
- Context7 -- documentation lookup
- Slack -- advanced Slack messaging
- Gamma -- presentation creation
- Codebase -- code search and analysis
MCP servers are configured via JSON (see MCP Server Integration) and can be added or removed at runtime without restarting Otto.
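A server entry might look like the following. The structure shown follows the common MCP convention; Otto's exact schema, and the `tavily-mcp` package name, are assumptions -- see MCP Server Integration for the authoritative format:

```json
{
  "mcpServers": {
    "tavily": {
      "command": "npx",
      "args": ["-y", "tavily-mcp"],
      "env": {"TAVILY_API_KEY": "your-key-here"}
    }
  }
}
```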
Iteration Limits
To prevent runaway execution and manage costs, Otto enforces iteration limits on the agent loop.
- Soft limit (8 iterations). The agent receives a warning message in its prompt: "You are approaching the execution limit. Begin wrapping up your work." This encourages the agent to consolidate results and finish.
- Hard limit (10 iterations). The agent is forcibly stopped. Whatever substantive work the agent has produced up to this point is saved as the task result.
Each iteration is one cycle through the agent node (LLM call) and action node (tool execution). For most tasks, Otto completes well within these limits. Complex tasks that need more work can use subagents to parallelize and get more total compute.
Subagents have their own, higher limits (soft: 12, hard: 15) to accommodate research-heavy workflows.
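The two-tier limit behaves roughly as follows. The thresholds and warning text come from this page; the helper function is a hypothetical sketch:

```python
# Hedged sketch of the documented iteration limits; constants are from
# the docs, the warning mechanism is an assumption.
SOFT_LIMIT, HARD_LIMIT = 8, 10  # subagents use 12 and 15

def check_limits(iteration):
    """Return a prompt warning near the soft limit, or None to force-stop."""
    if iteration >= HARD_LIMIT:
        return None  # stop; save whatever work exists as the result
    if iteration >= SOFT_LIMIT:
        return ("You are approaching the execution limit. "
                "Begin wrapping up your work.")
    return ""  # no warning yet
```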
Context Management
Long conversations with many tool calls can exceed the LLM's context window. Otto handles this with conversation compaction rather than simply dropping old messages.
How compaction works:
- When the message history exceeds 12 messages, the compactor activates.
- Older messages are separated from the 8 most recent messages.
- The older messages are summarized by a lightweight LLM call that preserves the original intent, key decisions, tool results, and current approach.
- The summary replaces the older messages, keeping the conversation within context limits while retaining important information.
The compactor is careful not to split tool-call sequences -- if an AI message requested a tool call and the tool returned a result, they stay together. Summaries are cached per task for efficiency across multiple compaction rounds.
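The compaction rules above can be sketched as follows. The threshold and window sizes match the documented defaults; the message shape and the way the tool-call boundary is detected are assumptions:

```python
# Illustrative compaction sketch; the real compactor summarizes with a
# lightweight LLM call and caches summaries per task.
COMPACTION_THRESHOLD, MAX_RECENT = 12, 8

def compact(messages, summarize):
    if len(messages) <= COMPACTION_THRESHOLD:
        return messages
    split = len(messages) - MAX_RECENT
    # Never split a tool-call pair: if the first "recent" message is a
    # tool result, pull its requesting AI message into the recent window.
    while split > 0 and messages[split].get("role") == "tool":
        split -= 1
    summary = {"role": "system", "content": summarize(messages[:split])}
    return [summary] + messages[split:]
```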
Tool result truncation. Large tool outputs are automatically truncated to prevent context bloat:
- Results under 4,000 characters are returned verbatim.
- Results between 4,000 and 12,000 characters are truncated with a metadata hint.
- Results over 12,000 characters show only the first 20 lines with a suggestion to use read_file with offset and limit parameters.
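The three tiers can be sketched directly. The character thresholds come from this page; the exact hint wording is an assumption:

```python
# Sketch of the documented truncation tiers; hint text is illustrative.
VERBATIM_LIMIT, HEAD_ONLY_LIMIT = 4_000, 12_000

def truncate_result(text):
    if len(text) < VERBATIM_LIMIT:
        return text  # small results pass through unchanged
    if len(text) <= HEAD_ONLY_LIMIT:
        return text[:VERBATIM_LIMIT] + f"\n[truncated: {len(text)} chars total]"
    head = "\n".join(text.splitlines()[:20])
    return head + "\n[use read_file with offset/limit to read the rest]"
```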
Intent Preprocessing
For deployments with many MCP tools, Otto offers an optional intent preprocessing system that narrows down which tools are relevant before the main agent loop begins. This reduces token usage and helps the agent focus.
How it works:
- The user's request is decomposed into 3-5 concrete work steps via a lightweight LLM call.
- Each step is matched against all available tool descriptions using embedding-based cosine similarity.
- Only the matched tools are made available to the agent (built-in tools are always included).
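The matching step boils down to cosine similarity over embedding vectors. In this toy illustration the embeddings are supplied directly rather than computed by a model, and the threshold value is hypothetical:

```python
import math

# Toy illustration of embedding-based tool matching; real embeddings
# come from an embedding model, and the 0.5 threshold is an assumption.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def match_tools(step_vec, tool_vecs, threshold=0.5):
    """Keep tools whose description embedding is close to the work step."""
    return [name for name, vec in tool_vecs.items()
            if cosine(step_vec, vec) >= threshold]
```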
Intent preprocessing is off by default. Enable it by setting ENABLE_INTENT_PREPROCESSING=true. It is most useful when Otto has dozens of MCP tools configured and you want to reduce noise in the agent's tool selection.
Skill Matching
Independent of intent preprocessing, Otto automatically matches incoming tasks against its skills library. Skills are instruction sets that guide Otto's behavior for specific types of work -- things like research methodology, email writing guidelines, or debugging workflows.
When a task arrives, Otto computes semantic similarity between the task description and all available skill descriptions. Skills above the similarity threshold (default 0.35) are injected into the agent's system prompt, giving it specialized knowledge for the task at hand.
Skills can also restrict which tools the agent uses. For example, a presentation-creation skill might limit the agent to only the Gamma API tools, preventing it from trying other approaches.
See the Skills Writing Guide for details on creating and managing skills.
Persona Customization
You can customize Otto's personality and specialization on a per-project basis using a SOUL.md file.
Create a file at .otto/SOUL.md in your project directory with YAML frontmatter:
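For example (a hypothetical persona; all values are illustrative):

```markdown
---
name: Ada
role: Research Assistant
specialization: Competitive market analysis
tone: Concise and direct; avoid jargon
---

Always cite sources for factual claims, and prefer primary
sources over press coverage when both are available.
```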
Available frontmatter fields:
| Field | Effect |
|---|---|
| name | The agent's name (used in notifications and responses) |
| role | The agent's job title or function |
| specialization | Overrides the default specialization in the system prompt |
| tone | Guidance on communication style |
The body of the file can contain any additional instructions, guidelines, or context that should shape how Otto approaches tasks in this project. The persona is loaded at task execution time and cached per project path.
Automatic Memory
When Otto completes a task, it automatically saves a memory of the work to its persistent vector store. This means Otto builds organizational knowledge over time without any manual intervention.
The auto-saved memory includes what was requested, what approach was taken, and what the outcome was. Future tasks can benefit from this accumulated context through the memory_search tool. See the Memory documentation for details.
Environment Variables Reference
Key configuration options for the agent:
| Variable | Default | Description |
|---|---|---|
| GOOGLE_API_KEY | -- | Google Gemini API key |
| OPENAI_API_KEY | -- | OpenAI API key |
| OLLAMA_MODEL | gpt-oss:latest | Ollama model name |
| ENABLE_INTENT_PREPROCESSING | false | Enable semantic tool filtering |
| SPECIALIZATION | General Assistance | Agent specialization (overridden by SOUL.md) |
| ADDITIONAL_INSTRUCTIONS | -- | Extra prompt instructions (overridden by SOUL.md) |
| MAX_PARALLEL_TOOL_CALLS | 5 | Max simultaneous tool executions per turn |
| COMPACTION_THRESHOLD | 12 | Message count before compaction activates |
| MAX_RECENT_MESSAGES | 8 | Messages kept verbatim during compaction |
| OTTO_SKILL_THRESHOLD | 0.35 | Minimum similarity for skill matching |
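A minimal configuration for a local-only deployment might look like this (values illustrative; only OLLAMA_MODEL is required to select the Ollama provider):

```shell
# Run against a local Ollama model and enable semantic tool filtering.
OLLAMA_MODEL=gpt-oss:latest
ENABLE_INTENT_PREPROCESSING=true
SPECIALIZATION="General Assistance"
```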