AI Agent
Otto's AI agent is the engine that turns natural language requests into completed work. This page explains how the agent operates, what tools it has access to, and how you can configure its behavior.
The Agent Loop
When you submit a task, Otto follows a structured loop:
1. Receive the task. The user's request is queued to a background worker (ARQ) and the agent begins execution.
2. Plan steps. The agent analyzes the request, considers available tools and context, and decides on an approach.
3. Use tools. The agent calls tools -- reading files, searching the web, sending emails, querying memory -- as needed to complete the work.
4. Evaluate. After each tool call, the agent reviews the results and decides whether to call more tools or deliver a final answer.
5. Deliver the result. When the agent determines the work is done, it saves the output and notifies the user.
This is a ReAct (Reason + Act) loop powered by a LangGraph state machine. The agent reasons about what to do next, takes an action, observes the result, and repeats until the task is complete.
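The loop can be sketched in plain Python. This is only an illustration of the reason-act-observe cycle; Otto's real implementation is a LangGraph state machine, and the function and message shapes here are assumptions:

```python
# Hypothetical sketch of a ReAct loop; Otto's actual agent is a
# LangGraph state machine, which this plain loop only approximates.
def run_agent_loop(llm_step, execute_tool, max_iterations=10):
    """Reason -> act -> observe until the model returns a final answer."""
    messages = []
    for _ in range(max_iterations):
        decision = llm_step(messages)                 # reason: pick next action
        if decision["type"] == "final":
            return decision["content"]                # deliver the result
        result = execute_tool(decision["tool"], decision["args"])  # act
        messages.append({"tool": decision["tool"], "result": result})  # observe
    return "stopped at iteration limit"

# Toy run: the fake model calls one tool, then answers.
def fake_llm(messages):
    if not messages:
        return {"type": "tool", "tool": "search", "args": {"q": "otto"}}
    return {"type": "final", "content": "done: " + messages[-1]["result"]}

print(run_agent_loop(fake_llm, lambda tool, args: f"{tool} ok"))
```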
LLM Provider Flexibility
Otto is not locked to a single AI provider. You choose which large language model powers the agent by setting environment variables:
| Provider | Configuration | Default Model |
|---|---|---|
| Google Gemini | GOOGLE_API_KEY | gemini-3.1-pro-preview-customtools |
| OpenAI | OPENAI_API_KEY | gpt-5-mini |
| Ollama (local) | OLLAMA_MODEL | gpt-oss:latest |
The provider is selected based on which API key is configured. If multiple are present, the priority order is Gemini, then Ollama, then OpenAI.
OpenAI-compatible endpoints are also supported via the OPEN_AI_BASE environment variable, so you can point Otto at any API that implements the OpenAI chat completions interface.
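The priority order above can be expressed as a simple check. The environment variable names come from this page; the selection logic itself is an assumption sketched for illustration:

```python
import os

# Illustrative sketch of the documented priority order:
# Gemini, then Ollama, then OpenAI.
def select_provider(env=None):
    env = os.environ if env is None else env
    if env.get("GOOGLE_API_KEY"):
        return "gemini"
    if env.get("OLLAMA_MODEL"):
        return "ollama"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    raise RuntimeError("no LLM provider configured")

# Ollama outranks OpenAI when both are configured.
print(select_provider({"OPENAI_API_KEY": "sk-...", "OLLAMA_MODEL": "gpt-oss:latest"}))
```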
Tools
The agent has access to a built-in toolkit plus any additional tools provided by MCP servers.
Built-in Tools
| Tool | What It Does |
|---|---|
| ask_human | Send a question to a team member and optionally pause until they respond |
| dispatch_subagent | Break work into subtasks that run in parallel (see Subagents) |
| send_email | Send emails with CC support and file attachments via SMTP |
| read_file | Read text files from the sandbox |
| write_file | Write content to text files in the sandbox |
| list_directory | List files in the sandbox storage area |
| attach_file | Mark a file for delivery to the user on task completion |
| convert_markdown_to_docx | Convert markdown to Word documents using pandoc |
| read_document | Convert PDF and Word documents to markdown for analysis |
| memory_search | Search Otto's persistent memory for relevant past knowledge |
| memory_save | Save new information to persistent memory with tags |
| run_skill_script | Execute scripts bundled with matched skills |
| wait | Pause execution for a specified duration |
MCP Tools
Otto supports the Model Context Protocol (MCP) for connecting to external tool servers. MCP tools extend what Otto can do without modifying the core codebase. Common MCP integrations include:
- Tavily -- web search (included by default)
- Context7 -- documentation lookup
- Slack -- advanced Slack messaging
- Gamma -- presentation creation
- Codebase -- code search and analysis
MCP servers are configured via JSON (see MCP Server Integration) and can be added or removed at runtime without restarting Otto.
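A server entry might look like the following. The structure shown follows the common MCP convention; Otto's exact schema, and the `tavily-mcp` package name, are assumptions -- see MCP Server Integration for the authoritative format:

```json
{
  "mcpServers": {
    "tavily": {
      "command": "npx",
      "args": ["-y", "tavily-mcp"],
      "env": {"TAVILY_API_KEY": "your-key-here"}
    }
  }
}
```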
Iteration Limits
To prevent runaway execution and manage costs, Otto enforces iteration limits on the agent loop.
- Soft limit (8 iterations). The agent receives a warning message in its prompt: "You are approaching the execution limit. Begin wrapping up your work." This encourages the agent to consolidate results and finish.
- Hard limit (10 iterations). The agent is forcibly stopped. Whatever substantive work the agent has produced up to this point is saved as the task result.
Each iteration is one cycle through the agent node (LLM call) and action node (tool execution). For most tasks, Otto completes well within these limits. Complex tasks that need more work can use subagents to parallelize and get more total compute.
Subagents have their own, higher limits (soft: 12, hard: 15) to accommodate research-heavy workflows.
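The two-tier limit behaves roughly as follows. The thresholds and warning text come from this page; the helper function is a hypothetical sketch:

```python
# Hedged sketch of the documented iteration limits; constants are from
# the docs, the warning mechanism is an assumption.
SOFT_LIMIT, HARD_LIMIT = 8, 10  # subagents use 12 and 15

def check_limits(iteration):
    """Return a prompt warning near the soft limit, or None to force-stop."""
    if iteration >= HARD_LIMIT:
        return None  # stop; save whatever work exists as the result
    if iteration >= SOFT_LIMIT:
        return ("You are approaching the execution limit. "
                "Begin wrapping up your work.")
    return ""  # no warning yet
```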
Context Management
Long conversations with many tool calls can exceed the LLM's context window. Otto handles this with conversation compaction rather than simply dropping old messages.
How compaction works:
- When the message history exceeds 12 messages, the compactor activates.
- Older messages are separated from the 8 most recent messages.
- The older messages are summarized by a lightweight LLM call that preserves the original intent, key decisions, tool results, and current approach.
- The summary replaces the older messages, keeping the conversation within context limits while retaining important information.
The compactor is careful not to split tool-call sequences -- if an AI message requested a tool call and the tool returned a result, they stay together. Summaries are cached per task for efficiency across multiple compaction rounds.
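The compaction rules above can be sketched as follows. The threshold and window sizes match the documented defaults; the message shape and the way the tool-call boundary is detected are assumptions:

```python
# Illustrative compaction sketch; the real compactor summarizes with a
# lightweight LLM call and caches summaries per task.
COMPACTION_THRESHOLD, MAX_RECENT = 12, 8

def compact(messages, summarize):
    if len(messages) <= COMPACTION_THRESHOLD:
        return messages
    split = len(messages) - MAX_RECENT
    # Never split a tool-call pair: if the first "recent" message is a
    # tool result, pull its requesting AI message into the recent window.
    while split > 0 and messages[split].get("role") == "tool":
        split -= 1
    summary = {"role": "system", "content": summarize(messages[:split])}
    return [summary] + messages[split:]
```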
Tool result truncation. Large tool outputs are automatically truncated to prevent context bloat:
- Results under 4,000 characters are returned verbatim.
- Results between 4,000 and 12,000 characters are truncated with a metadata hint.
- Results over 12,000 characters show only the first 20 lines with a suggestion to use read_file with offset and limit parameters.
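The three tiers can be sketched directly. The character thresholds come from this page; the exact hint wording is an assumption:

```python
# Sketch of the documented truncation tiers; hint text is illustrative.
VERBATIM_LIMIT, HEAD_ONLY_LIMIT = 4_000, 12_000

def truncate_result(text):
    if len(text) < VERBATIM_LIMIT:
        return text  # small results pass through unchanged
    if len(text) <= HEAD_ONLY_LIMIT:
        return text[:VERBATIM_LIMIT] + f"\n[truncated: {len(text)} chars total]"
    head = "\n".join(text.splitlines()[:20])
    return head + "\n[use read_file with offset/limit to read the rest]"
```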
Intent Preprocessing
For deployments with many MCP tools, Otto offers an optional intent preprocessing system that narrows down which tools are relevant before the main agent loop begins. This reduces token usage and helps the agent focus.
How it works:
- The user's request is decomposed into 3-5 concrete work steps via a lightweight LLM call.
- Each step is matched against all available tool descriptions using embedding-based cosine similarity.
- Only the matched tools are made available to the agent (built-in tools are always included).
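The matching step boils down to cosine similarity over embedding vectors. In this toy illustration the embeddings are supplied directly rather than computed by a model, and the threshold value is hypothetical:

```python
import math

# Toy illustration of embedding-based tool matching; real embeddings
# come from an embedding model, and the 0.5 threshold is an assumption.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def match_tools(step_vec, tool_vecs, threshold=0.5):
    """Keep tools whose description embedding is close to the work step."""
    return [name for name, vec in tool_vecs.items()
            if cosine(step_vec, vec) >= threshold]
```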
Intent preprocessing is off by default. Enable it by setting ENABLE_INTENT_PREPROCESSING=true. It is most useful when Otto has dozens of MCP tools configured and you want to reduce noise in the agent's tool selection.
Skill Matching
Independent of intent preprocessing, Otto automatically matches incoming tasks against its skills library. Skills are instruction sets that guide Otto's behavior for specific types of work -- things like research methodology, email writing guidelines, or debugging workflows.
When a task arrives, Otto computes semantic similarity between the task description and all available skill descriptions. Skills above the similarity threshold (default 0.35) are injected into the agent's system prompt, giving it specialized knowledge for the task at hand.
Skills can also restrict which tools the agent uses. For example, a presentation-creation skill might limit the agent to only the Gamma API tools, preventing it from trying other approaches.
See the Skills Writing Guide for details on creating and managing skills.
Persona Customization
You can customize Otto's personality and specialization on a per-project basis using a SOUL.md file.
Create a file at .otto/SOUL.md in your project directory with YAML frontmatter:
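For example (a hypothetical persona; all values are illustrative):

```markdown
---
name: Ada
role: Research Assistant
specialization: Competitive market analysis
tone: Concise and direct; avoid jargon
---

Always cite sources for factual claims, and prefer primary
sources over press coverage when both are available.
```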
Available frontmatter fields:
| Field | Effect |
|---|---|
| name | The agent's name (used in notifications and responses) |
| role | The agent's job title or function |
| specialization | Overrides the default specialization in the system prompt |
| tone | Guidance on communication style |
The body of the file can contain any additional instructions, guidelines, or context that should shape how Otto approaches tasks in this project. The persona is loaded at task execution time and cached per project path.
Automatic Memory
When Otto completes a task, it automatically saves a memory of the work to its persistent vector store. This means Otto builds organizational knowledge over time without any manual intervention.
The auto-saved memory includes what was requested, what approach was taken, and what the outcome was. Future tasks can benefit from this accumulated context through the memory_search tool. See the Memory documentation for details.
Environment Variables Reference
Key configuration options for the agent:
| Variable | Default | Description |
|---|---|---|
| GOOGLE_API_KEY | -- | Google Gemini API key |
| OPENAI_API_KEY | -- | OpenAI API key |
| OLLAMA_MODEL | gpt-oss:latest | Ollama model name |
| ENABLE_INTENT_PREPROCESSING | false | Enable semantic tool filtering |
| SPECIALIZATION | General Assistance | Agent specialization (overridden by SOUL.md) |
| ADDITIONAL_INSTRUCTIONS | -- | Extra prompt instructions (overridden by SOUL.md) |
| MAX_PARALLEL_TOOL_CALLS | 5 | Max simultaneous tool executions per turn |
| COMPACTION_THRESHOLD | 12 | Message count before compaction activates |
| MAX_RECENT_MESSAGES | 8 | Messages kept verbatim during compaction |
| OTTO_SKILL_THRESHOLD | 0.35 | Minimum similarity for skill matching |
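A minimal configuration for a local-only deployment might look like this (values illustrative; only OLLAMA_MODEL is required to select the Ollama provider):

```shell
# Run against a local Ollama model and enable semantic tool filtering.
OLLAMA_MODEL=gpt-oss:latest
ENABLE_INTENT_PREPROCESSING=true
SPECIALIZATION="General Assistance"
```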