AI Agent

Otto's AI agent is the engine that turns natural language requests into completed work. This page explains how the agent operates, what tools it has access to, and how you can configure its behavior.


The Agent Loop

When you submit a task, Otto follows a structured loop:

  1. Receive the task. The user's request is queued to a background worker (ARQ) and the agent begins execution.
  2. Plan steps. The agent analyzes the request, considers available tools and context, and decides on an approach.
  3. Use tools. The agent calls tools -- reading files, searching the web, sending emails, querying memory -- as needed to complete the work.
  4. Evaluate. After each tool call, the agent reviews the results and decides whether to call more tools or deliver a final answer.
  5. Deliver the result. When the agent determines the work is done, it saves the output and notifies the user.

This is a ReAct (Reason + Act) loop powered by a LangGraph state machine. The agent reasons about what to do next, takes an action, observes the result, and repeats until the task is complete.
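
The loop above can be sketched as plain Python (a simplified illustration, not Otto's actual LangGraph implementation; the `finish` convention and the toy LLM are stand-ins):

```python
def react_loop(llm_step, tools, task, max_iterations=10):
    """Minimal ReAct loop: reason, act, observe, repeat."""
    history = [("user", task)]
    for _ in range(max_iterations):
        action, arg = llm_step(history)        # reason: decide the next action
        if action == "finish":
            return arg                         # deliver the final result
        observation = tools[action](arg)       # act: run the chosen tool
        history.append(("tool", observation))  # observe: feed the result back
    return "stopped at iteration limit"

# Toy "LLM" that searches once, then finishes with whatever it found.
def fake_llm(history):
    if history[-1][0] == "tool":
        return ("finish", history[-1][1])
    return ("search", "otto docs")

result = react_loop(fake_llm, {"search": lambda q: f"results for {q}"}, "find docs")
```

In the real system, each pass through this loop corresponds to one agent-node (LLM call) plus one action-node (tool execution) cycle in the LangGraph state machine.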


LLM Provider Flexibility

Otto is not locked to a single AI provider. You choose which large language model powers the agent by setting environment variables:

Provider          Configuration    Default Model
Google Gemini     GOOGLE_API_KEY   gemini-3.1-pro-preview-customtools
OpenAI            OPENAI_API_KEY   gpt-5-mini
Ollama (local)    OLLAMA_MODEL     gpt-oss:latest

The provider is selected automatically based on which of these variables is set. If more than one is configured, the priority order is Gemini, then Ollama, then OpenAI.

OpenAI-compatible endpoints are also supported via the OPEN_AI_BASE environment variable, so you can point Otto at any API that implements the OpenAI chat completions interface.
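
The documented priority order can be sketched as a small selection function (illustrative only; the variable names match the table above, but Otto's actual selection logic may differ in detail):

```python
def select_provider(env):
    """Pick a provider using the documented priority: Gemini, then Ollama, then OpenAI."""
    if env.get("GOOGLE_API_KEY"):
        return "gemini"
    if env.get("OLLAMA_MODEL"):
        return "ollama"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    raise RuntimeError("no LLM provider configured")
```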


Tools

The agent has access to a built-in toolkit plus any additional tools provided by MCP servers.

Built-in Tools

Tool                       What It Does
ask_human                  Send a question to a team member and optionally pause until they respond
dispatch_subagent          Break work into subtasks that run in parallel (see Subagents)
send_email                 Send emails with CC support and file attachments via SMTP
read_file                  Read text files from the sandbox
write_file                 Write content to text files in the sandbox
list_directory             List files in the sandbox storage area
attach_file                Mark a file for delivery to the user on task completion
convert_markdown_to_docx   Convert markdown to Word documents using pandoc
read_document              Convert PDF and Word documents to markdown for analysis
memory_search              Search Otto's persistent memory for relevant past knowledge
memory_save                Save new information to persistent memory with tags
run_skill_script           Execute scripts bundled with matched skills
wait                       Pause execution for a specified duration

MCP Tools

Otto supports the Model Context Protocol (MCP) for connecting to external tool servers. MCP tools extend what Otto can do without modifying the core codebase. Common MCP integrations include:

  • Tavily -- web search (included by default)
  • Context7 -- documentation lookup
  • Slack -- advanced Slack messaging
  • Gamma -- presentation creation
  • Codebase -- code search and analysis

MCP servers are configured via JSON (see MCP Server Integration) and can be added or removed at runtime without restarting Otto.
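
A hypothetical server entry might look like the following. The `mcpServers`/`command`/`args` layout shown here is the common MCP configuration convention, not necessarily Otto's exact schema; see MCP Server Integration for the authoritative format:

```json
{
  "mcpServers": {
    "tavily": {
      "command": "npx",
      "args": ["-y", "tavily-mcp"],
      "env": { "TAVILY_API_KEY": "your-key-here" }
    }
  }
}
```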


Iteration Limits

To prevent runaway execution and manage costs, Otto enforces iteration limits on the agent loop.

  • Soft limit (8 iterations). The agent receives a warning message in its prompt: "You are approaching the execution limit. Begin wrapping up your work." This encourages the agent to consolidate results and finish.
  • Hard limit (10 iterations). The agent is forcibly stopped. Whatever substantive work the agent has produced up to this point is saved as the task result.

Each iteration is one cycle through the agent node (LLM call) and action node (tool execution). For most tasks, Otto completes well within these limits. Complex tasks that need more work can use subagents to parallelize and get more total compute.

Subagents have their own, higher limits (soft: 12, hard: 15) to accommodate research-heavy workflows.
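
The two-tier check can be sketched as follows (the limit constants come from this section; the directive names are illustrative):

```python
SOFT_LIMIT, HARD_LIMIT = 8, 10  # subagents use 12 / 15

def limit_directive(iteration):
    """Decide what the loop should do at the start of the given iteration."""
    if iteration >= HARD_LIMIT:
        return "stop"      # forcibly end; save whatever work exists so far
    if iteration >= SOFT_LIMIT:
        return "wrap_up"   # inject the wrap-up warning into the prompt
    return "continue"
```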


Context Management

Long conversations with many tool calls can exceed the LLM's context window. Otto handles this with conversation compaction rather than simply dropping old messages.

How compaction works:

  1. When the message history exceeds 12 messages, the compactor activates.
  2. Older messages are separated from the 8 most recent messages.
  3. The older messages are summarized by a lightweight LLM call that preserves the original intent, key decisions, tool results, and current approach.
  4. The summary replaces the older messages, keeping the conversation within context limits while retaining important information.

The compactor is careful not to split tool-call sequences -- if an AI message requested a tool call and the tool returned a result, they stay together. Summaries are cached per task for efficiency across multiple compaction rounds.

Tool result truncation. Large tool outputs are automatically truncated to prevent context bloat:

  • Results under 4,000 characters are returned verbatim.
  • Results between 4,000 and 12,000 characters are truncated with a metadata hint.
  • Results over 12,000 characters show only the first 20 lines with a suggestion to use read_file with offset and limit parameters.
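
These tiers can be sketched as a single function (thresholds from the list above; the hint wording is illustrative):

```python
def truncate_tool_result(text, soft=4_000, hard=12_000, preview_lines=20):
    """Apply the three documented truncation tiers to a tool result."""
    if len(text) < soft:
        return text  # small results pass through verbatim
    if len(text) <= hard:
        return text[:soft] + f"\n[truncated: {len(text)} chars total]"
    head = "\n".join(text.splitlines()[:preview_lines])
    return head + "\n[large output: use read_file with offset/limit to page through it]"
```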

Intent Preprocessing

For deployments with many MCP tools, Otto offers an optional intent preprocessing system that narrows down which tools are relevant before the main agent loop begins. This reduces token usage and helps the agent focus.

How it works:

  1. The user's request is decomposed into 3-5 concrete work steps via a lightweight LLM call.
  2. Each step is matched against all available tool descriptions using embedding-based cosine similarity.
  3. Only the matched tools are made available to the agent (built-in tools are always included).

Intent preprocessing is off by default. Enable it by setting ENABLE_INTENT_PREPROCESSING=true. It is most useful when Otto has dozens of MCP tools configured and you want to reduce noise in the agent's tool selection.
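
Step 2 can be sketched with plain cosine similarity over precomputed embedding vectors (the toy 2-dimensional vectors below are stand-ins for real embeddings of tool descriptions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def match_tools(step_vec, tool_vecs, top_k=3):
    """Rank tools by embedding similarity to one decomposed work step."""
    ranked = sorted(tool_vecs, key=lambda name: cosine(step_vec, tool_vecs[name]),
                    reverse=True)
    return ranked[:top_k]

# Toy example: a step whose embedding leans toward "send_email".
vecs = {"send_email": [1.0, 0.0], "web_search": [0.0, 1.0], "read_file": [0.7, 0.7]}
top = match_tools([0.9, 0.1], vecs, top_k=1)
```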


Skill Matching

Independent of intent preprocessing, Otto automatically matches incoming tasks against its skills library. Skills are instruction sets that guide Otto's behavior for specific types of work -- things like research methodology, email writing guidelines, or debugging workflows.

When a task arrives, Otto computes semantic similarity between the task description and all available skill descriptions. Skills above the similarity threshold (default 0.35) are injected into the agent's system prompt, giving it specialized knowledge for the task at hand.

Skills can also restrict which tools the agent uses. For example, a presentation-creation skill might limit the agent to only the Gamma API tools, preventing it from trying other approaches.

See the Skills Writing Guide for details on creating and managing skills.


Persona Customization

You can customize Otto's personality and specialization on a per-project basis using a SOUL.md file.

Create a file at .otto/SOUL.md in your project directory with YAML frontmatter:

---
name: Otto
role: Marketing Assistant
specialization: Digital Marketing and Content Creation
tone: Professional but friendly
---
 
Additional instructions go here as free-form markdown. These are
injected into the agent's system prompt for all tasks in this project.

Available frontmatter fields:

Field             Effect
name              The agent's name (used in notifications and responses)
role              The agent's job title or function
specialization    Overrides the default specialization in the system prompt
tone              Guidance on communication style

The body of the file can contain any additional instructions, guidelines, or context that should shape how Otto approaches tasks in this project. The persona is loaded at task execution time and cached per project path.
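
A minimal sketch of how such a file might be parsed (Otto presumably uses a real YAML parser; this illustration handles only simple `key: value` frontmatter lines):

```python
def parse_soul(text):
    """Split a SOUL.md-style document into frontmatter fields and a markdown body."""
    fields, body = {}, text
    if text.startswith("---"):
        # split("---", 2) yields: text before the first marker (empty),
        # the frontmatter, and the body after the second marker.
        _, frontmatter, body = text.split("---", 2)
        for line in frontmatter.strip().splitlines():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields, body.strip()

fields, body = parse_soul(
    "---\nname: Otto\nrole: Marketing Assistant\n---\nAlways be concise."
)
```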


Automatic Memory

When Otto completes a task, it automatically saves a memory of the work to its persistent vector store. This means Otto builds organizational knowledge over time without any manual intervention.

The auto-saved memory includes what was requested, what approach was taken, and what the outcome was. Future tasks can benefit from this accumulated context through the memory_search tool. See the Memory documentation for details.


Environment Variables Reference

Key configuration options for the agent:

Variable                       Default              Description
GOOGLE_API_KEY                 --                   Google Gemini API key
OPENAI_API_KEY                 --                   OpenAI API key
OLLAMA_MODEL                   gpt-oss:latest       Ollama model name
ENABLE_INTENT_PREPROCESSING    false                Enable semantic tool filtering
SPECIALIZATION                 General Assistance   Agent specialization (overridden by SOUL.md)
ADDITIONAL_INSTRUCTIONS        --                   Extra prompt instructions (overridden by SOUL.md)
MAX_PARALLEL_TOOL_CALLS        5                    Max simultaneous tool executions per turn
COMPACTION_THRESHOLD           12                   Message count before compaction activates
MAX_RECENT_MESSAGES            8                    Messages kept verbatim during compaction
OTTO_SKILL_THRESHOLD           0.35                 Minimum similarity for skill matching
