Architecture Reference

> Technical architecture of the Otto AI agent system: components, data flows, queues, checkpointing, and the skills matching pipeline.


System Components

graph TB
    subgraph "External Services"
        Ollama[Ollama<br/>LLM & Embeddings]
        OpenAI[OpenAI API<br/>LLM & Embeddings]
        Gemini[Google Gemini<br/>LLM & Embeddings]
    end
 
    subgraph "Communication Platforms"
        Slack[Slack]
        Email[Email/SMTP]
        UI[Web UI]
    end
 
    subgraph "Otto Core System"
        subgraph "API Layer"
            FastAPI[FastAPI Backend<br/>Port 8000]
        end
 
        subgraph "Agent Layer"
            ParentGraph[Parent Agent<br/>LangGraph StateGraph]
            SubagentGraph[Subagent<br/>LangGraph StateGraph]
            Preprocess[Preprocess Intent<br/>Skill + Tool Matching]
            AgentNode[Agent Node<br/>LLM Invocation]
            ActionNode[Action Node<br/>Tool Execution]
            Router[Router<br/>Conditional Edge]
        end
 
        subgraph "Worker Layer"
            ARQWorker[ARQ Worker Process]
            InteractiveQueue[arq:interactive<br/>Task Queue]
            SubagentQueue[arq:subagent<br/>Subagent Queue]
            CronJobs[Cron Jobs<br/>Scheduled Tasks]
        end
 
        subgraph "Tool Ecosystem"
            StaticTools[Static Tools<br/>13 built-in]
            MCPTools[MCP Dynamic Tools<br/>Runtime loaded]
            MCPManager[MCP Manager<br/>Multi-Server Client]
        end
 
        subgraph "Memory & Skills"
            VectorStore[Vector Memory Store<br/>SQLite + Embeddings]
            SkillsMgr[Skills Manager<br/>Matching + Injection]
            Compactor[Conversation Compactor<br/>LLM Summarization]
        end
 
        subgraph "Communication Layer"
            CommManager[Communication Manager]
            SlackProvider[Slack Provider]
            EmailProvider[Email Provider]
            UIProvider[UI Provider]
        end
 
        subgraph "Data Layer"
            Redis[(Redis<br/>Queues, Checkpoints, Cache)]
            SQLite[(SQLite<br/>Tasks, Users, Logs)]
            MemoryDB[(Memory DB<br/>Vector Store)]
            FileStore[File Storage<br/>Local / GCS / Drive]
        end
    end
 
    FastAPI --> InteractiveQueue
    InteractiveQueue --> ARQWorker
    SubagentQueue --> ARQWorker
    ARQWorker --> ParentGraph
    ARQWorker --> SubagentGraph
 
    ParentGraph --> Preprocess
    Preprocess --> AgentNode
    AgentNode --> Router
    Router --> ActionNode
    ActionNode --> AgentNode
 
    AgentNode -->|LLM Calls| OpenAI
    AgentNode -->|LLM Calls| Gemini
    AgentNode -->|LLM Calls| Ollama
 
    ActionNode --> StaticTools
    ActionNode --> MCPTools
    MCPManager --> MCPTools
 
    AgentNode --> VectorStore
    Preprocess --> SkillsMgr
    AgentNode --> Compactor
 
    CommManager --> SlackProvider
    CommManager --> EmailProvider
    CommManager --> UIProvider
    SlackProvider --> Slack
    EmailProvider --> Email
    UIProvider --> UI
 
    ARQWorker --> Redis
    ARQWorker --> SQLite
    VectorStore --> MemoryDB
    ActionNode --> FileStore
 
    classDef api fill:#4A90E2,stroke:#2E5C8A,stroke-width:2px,color:#fff
    classDef agent fill:#7B68EE,stroke:#4B0082,stroke-width:2px,color:#fff
    classDef worker fill:#FF8C00,stroke:#CC7000,stroke-width:2px,color:#fff
    classDef tool fill:#50C878,stroke:#2F7C4F,stroke-width:2px,color:#fff
    classDef data fill:#FFA07A,stroke:#CD5C5C,stroke-width:2px,color:#fff
    classDef external fill:#DDA0DD,stroke:#9370DB,stroke-width:2px,color:#fff
    classDef comm fill:#FF6B6B,stroke:#C92A2A,stroke-width:2px,color:#fff
 
    class FastAPI api
    class ParentGraph,SubagentGraph,Preprocess,AgentNode,ActionNode,Router agent
    class ARQWorker,InteractiveQueue,SubagentQueue,CronJobs worker
    class StaticTools,MCPTools,MCPManager tool
    class Redis,SQLite,MemoryDB,FileStore data
    class Ollama,OpenAI,Gemini,Slack,Email,UI external
    class CommManager,SlackProvider,EmailProvider,UIProvider comm

Component Summary

| Component | Technology | Purpose |
|---|---|---|
| API Server | FastAPI (Python) | REST API, request routing, Slack event handling |
| ARQ Worker | ARQ (async Redis queue) | Background agent execution, subagent processing, cron jobs |
| Redis | Redis or Upstash Redis | Task queues, LangGraph checkpoints, caching |
| SQLite | SQLite with WAL mode | Tasks, users, logs, outputs, notifications, channels |
| Memory DB | SQLite (separate file) | Vector memory store for semantic search |
| File Storage | Local / GCS / Google Drive | User file uploads, sandbox for file tools, skill scripts |
| Frontend | Next.js 15, React 19 | Web UI with 5-second polling |
| MCP Manager | Model Context Protocol | Dynamic tool loading from external MCP servers |
| Communication Manager | Channel abstraction | Routes notifications to Slack, Email, or Web UI |

Agent Execution Flow

Otto's core execution engine is a LangGraph StateGraph with three nodes and one conditional edge. The agent follows a ReAct (Reason + Act) loop.

Parent Agent Graph

Entry
  |
  v
[preprocess_intent]  --->  [agent]  <---->  [action]
                              |                  |
                              |   (conditional)  |
                              v                  |
                             END <---------------+

Nodes:

| Node | Function | Purpose |
|---|---|---|
| preprocess_intent | preprocess_intent_node | Match skills and (optionally) semantically filter tools before the first LLM call |
| agent | call_model | Assemble system prompt, inject context, invoke LLM with bound tools |
| action | tool_node_with_logging | Execute tool calls, truncate results, handle interrupts |

Conditional edge (should_call_tool_or_pause):

| Condition | Route |
|---|---|
| LLM response contains tool calls | action |
| LLM response contains text, no tool calls | END (task complete) |
| Hard iteration limit reached (10) | END (forced completion) |
| Task not found in DB | END |
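
The routing conditions above can be sketched as a plain function. This is a minimal illustration, not Otto's actual should_call_tool_or_pause: the FakeAIMessage class, the function name route_after_agent, and the order in which the conditions are checked are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List

HARD_ITERATION_LIMIT = 10  # parent agent hard limit, as listed in the table


@dataclass
class FakeAIMessage:
    """Hypothetical stand-in for the LLM's most recent response."""
    content: str = ""
    tool_calls: List[dict] = field(default_factory=list)


def route_after_agent(last_message: FakeAIMessage,
                      iteration_count: int,
                      task_exists: bool) -> str:
    """Return the next node name: "action" or "END"."""
    if not task_exists:
        return "END"  # task record missing in the DB
    if iteration_count >= HARD_ITERATION_LIMIT:
        return "END"  # forced completion (check order is an assumption)
    if last_message.tool_calls:
        return "action"  # tool calls pending, run the action node
    return "END"  # text-only response, task complete
```

In a LangGraph graph, a function of this shape would typically be registered as the conditional edge leaving the agent node.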

Subagent Graph

Subagents use a simplified graph with no preprocessing node:

Entry
  |
  v
[agent]  <---->  [action]
   |                |
   v                |
  END <-------------+

Key differences from parent:

| Property | Parent Agent | Subagent |
|---|---|---|
| Preprocessing | Yes (skill + tool matching) | No |
| Soft iteration limit | 8 | 12 |
| Hard iteration limit | 10 | 15 |
| LangGraph recursion limit | 25 | 35 |
| Step timeout | 300s (5 min) | 600s (10 min) |
| ask_human tool | Available | Blocked |
| dispatch_subagent tool | Available | Blocked |
| MCP session | Shared (parent's) | Own fresh session |
| Token budget | Unlimited | Configurable via token_budget |

AgentState Schema

| Field | Type | Purpose |
|---|---|---|
| messages | Sequence[BaseMessage] | Append-only conversation history |
| db | DatabaseManager | Database handle |
| task_id | str | Current task ID |
| user_id | str | Task creator / requester |
| input | str | Original user request text |
| mcp_manager | MCPManager | MCP tool server manager |
| communication_manager | CommunicationManager | Notification routing |
| skills_manager | SkillsManager | Skill matching and context injection |
| semantically_matched_tools | List[Any] | Tools matched by intent preprocessing |
| relevant_skills | List[Any] | Skills matched for prompt injection |
| intent_steps | List[str] | Extracted work steps from user query |
| project_id | str | Slack channel / project for context |
| iteration_count | int | Loop counter for runaway prevention |
| attachments | List[str] | File paths to attach to completion notification |

Data Flow

Task Lifecycle (End-to-End)

1. User submits task (API POST /chat, Slack @mention, or webhook)
     |
2. API creates task record (status: pending)
   Enqueues `process_agent_task` to arq:interactive
     |
3. ARQ worker picks up job
   Lazy-initializes heavy components (LLM, MCP, embeddings) if needed
     |
4. run_agent_in_background():
   - Generate checkpoint thread_id ("task-{task_id}")
   - Save checkpoint_thread_id to DB
   - Set task status: running
   - Create initial HumanMessage from user input
     |
5. Graph execution:
   |
   |-- [preprocess_intent_node]
   |     Match skills (always)
   |     Extract steps + match tools (if ENABLE_INTENT_PREPROCESSING=true)
   |
   |-- [call_model] <-----------+
   |     Assemble system prompt  |
   |     Inject context blocks:  |
   |       - Project context     |
   |       - Skills instructions |
   |       - Team info           |
   |       - Vector memories     |
   |       - Persona (SOUL.md)   |
   |     Compact messages if >12 |
   |     Invoke LLM with tools   |
   |                             |
   |-- [should_call_tool_or_pause]
   |     |                       |
   |     |-- tool_calls? -----> [tool_node_with_logging]
   |     |                       Execute tools
   |     |                       Truncate large results
   |     |                       Detect file attachments
   |     |                       Handle ask_human interrupt
   |     |                       Auto-park on subagent dispatch
   |     |                       Return tool results --------+
   |     |
   |     |-- no tool_calls + content --> complete_task_with_result()
   |     |     Save TaskOutput
   |     |     Set status: completed, progress: 100
   |     |     Auto-save memory
   |     |     Send completion notification (with attachments)
   |     |     Clean up checkpoint
   |     |     --> END
   |     |
   |     |-- hard iteration limit --> force complete
   |
6. Interrupt handling (if applicable):
   |
   |-- ask_human: status → waiting_input
   |     Human responds via API or Slack
   |     resume_agent_task enqueued
   |     Graph resumes from Redis checkpoint
   |
   |-- subagent dispatch: status → waiting_subagents
         Subagent(s) run in parallel on arq:interactive
         Each has own graph, MCP session, iteration budget
         On completion: _check_and_resume_parent()
         All siblings terminal? Collect results, resume parent

Communication Flow

Agent completes task or calls ask_human
     |
     v
CommunicationManager.send_notification()
     |
     |--> SlackProvider  --> Slack API  --> User's Slack DM / channel
     |--> EmailProvider  --> SMTP       --> User's email
     |--> UIProvider     --> DB         --> Frontend polls /users/{id}/notifications

File Upload Flow

User uploads file via POST /upload-file
     |
     v
Validate extension (allowlist)
Check storage quota (STORAGE_QUOTA_GB)
     |
     v
Save to user sandbox: {LOCAL_STORAGE_PATH}/{user_id}/{filename}
     |
     v
Agent accesses via read_file / list_directory tools (sandboxed)
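
The validation and path-building steps in this flow can be sketched as follows. This is an illustration under stated assumptions: the allowlist contents, the function name resolve_upload_path, the byte-based quota arithmetic (1 GB taken as 1024**3 bytes), and the stripping of directory components from the filename are not confirmed by the source.

```python
from pathlib import Path

# Hypothetical allowlist; the real set of extensions is not documented here.
ALLOWED_EXTENSIONS = {".txt", ".md", ".pdf", ".csv"}


def resolve_upload_path(storage_root: str, user_id: str, filename: str,
                        used_bytes: int, size_bytes: int,
                        quota_gb: float) -> Path:
    """Validate an upload and return its target path in the user sandbox."""
    # Keep only the final path component so "../" cannot escape the sandbox.
    name = Path(filename).name
    if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension not allowed: {name}")
    # Quota check; treating 1 GB as 1024**3 bytes is an assumption.
    if used_bytes + size_bytes > quota_gb * 1024 ** 3:
        raise ValueError("storage quota exceeded")
    # Mirrors {LOCAL_STORAGE_PATH}/{user_id}/{filename} from the flow above.
    return Path(storage_root) / user_id / name
```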

Queue Architecture

Otto uses ARQ (async Redis queues) for all background processing.

Queues

| Queue Name | Purpose | Producers | Consumers |
|---|---|---|---|
| arq:interactive | Primary task and subagent processing | API server (POST /chat, /human_response, /slack/dm_response, webhooks) | ARQ worker |
| arq:subagent | Subagent task processing (can share workers with interactive) | dispatch_subagent tool | ARQ worker |

Job Types

| Job Function | Queue | Triggered By | Description |
|---|---|---|---|
| process_agent_task | arq:interactive | POST /chat, Slack events, webhooks | Run the parent agent graph for a new task |
| resume_agent_task | arq:interactive | POST /human_response, subagent completion | Resume a paused agent from its Redis checkpoint |
| process_subagent_task_v2 | arq:interactive | dispatch_subagent tool | Run a subagent graph for a subtask |

Cron Jobs

The ARQ worker also runs scheduled tasks:

| Job | Schedule | Purpose |
|---|---|---|
| Email monitor | Every IMAP_CHECK seconds (default 60) | Check IMAP inbox for new emails, create tasks |
| Scheduler service | Every 60 seconds | Execute due scheduled/recurring tasks |
| Skills refresh | Every SKILLS_REFRESH_INTERVAL seconds (default 60) | Rescan skill directory for changes |

Worker Initialization

The ARQ worker performs lazy initialization of heavy components on first use:

  1. LLM model -- configured via USE_GEMINI / USE_OLLAMA / OpenAI default
  2. MCP Manager -- connects to all configured MCP servers
  3. Embedding model -- matches the active LLM provider
  4. Skills Manager -- loads and indexes skills from filesystem
  5. Compiled LangGraph -- parent agent graph with Redis checkpointer

These are cached for the worker's lifetime and reused across jobs.
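
One way to picture this lazy, cached initialization is a small async cache that builds each component on first request and reuses it afterwards. The LazyComponents class and its get method are hypothetical names for illustration; Otto's actual worker code may be organized differently.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class LazyComponents:
    """Hypothetical per-worker cache for expensive components."""

    def __init__(self) -> None:
        self._cache: Dict[str, Any] = {}
        self._lock = asyncio.Lock()

    async def get(self, name: str,
                  factory: Callable[[], Awaitable[Any]]) -> Any:
        # The lock keeps two concurrent jobs from building the same
        # component twice on first use.
        async with self._lock:
            if name not in self._cache:
                self._cache[name] = await factory()
            return self._cache[name]
```

A job handler would then call something like `await components.get("llm", build_llm)`, where `build_llm` is whatever coroutine constructs the model client.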


Checkpoint and Resume Mechanism

Otto uses LangGraph's AsyncRedisSaver to persist full agent state to Redis. This enables pausing and resuming the agent across process boundaries.

Checkpoint Lifecycle

Task Created
  |
  v
run_agent_in_background()
  |-- thread_id = "task-{task_id}"
  |-- Save checkpoint_thread_id to task record
  |-- agent.astream() with AsyncRedisSaver checkpointer
  |
  |-- Every graph node execution: full AgentState serialized to Redis
  |
  |-- [GraphInterrupt: ask_human or subagent park]
  |     |
  |     v
  |   Task paused
  |   Status: waiting_input or waiting_subagents
  |   last_tool_call_id saved to DB
  |   Checkpoint preserved in Redis
  |     |
  |     v  (human responds or subagents complete)
  |   resume_agent_in_background()
  |     |-- Read checkpoint_thread_id from DB
  |     |-- Read last_tool_call_id from DB
  |     |-- Construct ToolMessage(s) with response
  |     |-- agent.astream() with same thread_id
  |     |-- LangGraph restores full state from Redis
  |     |-- ReAct loop continues
  |
  v
Task completed
  |-- cleanup_task_checkpoint(): delete all Redis keys for thread
  |-- Clear checkpoint_thread_id from DB

Redis Key Pattern

Checkpoint keys follow the pattern: checkpoints:task-{task_id}:*

Redis Connection Pool

| Setting | Value |
|---|---|
| decode_responses | False (binary for serialization) |
| health_check_interval | 30s |
| max_connections | 10 |
| retry_on_timeout | True |
| socket_keepalive | True |

Resume Scenarios

| Trigger | How It Works |
|---|---|
| Human response | POST /human_response enqueues resume_agent_task. last_tool_call_id is a single string. One ToolMessage is constructed with the human's answer. |
| Subagent completion | _check_and_resume_parent() runs after each subagent finishes. When all siblings are terminal, it acquires a Redis SETNX lock (resume_lock:{parent_task_id}, 60s TTL) to prevent double-resume, then collects results from all subtasks. last_tool_call_id is a JSON array of tool call IDs. Multiple ToolMessages are constructed. |
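
The two shapes of last_tool_call_id (a single string versus a JSON array) can be handled by one helper. The sketch below uses plain dicts in place of LangChain ToolMessage objects, and the function name build_tool_messages and the results mapping keyed by tool call ID are assumptions made for illustration.

```python
import json
from typing import Dict, List, Union


def build_tool_messages(last_tool_call_id: str,
                        response: Union[str, Dict[str, str]]) -> List[dict]:
    """Build one ToolMessage-like dict per pending tool call."""
    try:
        parsed = json.loads(last_tool_call_id)
    except json.JSONDecodeError:
        parsed = None  # not JSON, so it is a plain tool call ID

    if isinstance(parsed, list):
        # Subagent completion: one message per dispatch tool call.
        results = response if isinstance(response, dict) else {}
        return [{"tool_call_id": call_id, "content": results.get(call_id, "")}
                for call_id in parsed]

    # Human response: a single tool call ID and a single answer.
    return [{"tool_call_id": last_tool_call_id, "content": str(response)}]
```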

Subagent System

The parent agent can delegate focused subtasks to independent subagent workers via the dispatch_subagent tool.

Dispatch Flow

Parent agent calls dispatch_subagent(task_description, context, agent_type, max_iterations)
  |
  v
New task record created (task_type: "subtask", parent_task_id: parent)
  |
  v
process_subagent_task_v2 enqueued to arq:interactive
  |
  v
Subtask ID appended to _pending_dispatch_ids (module-level list)
  |
  v
After all tools execute, tool_node_with_logging calls take_pending_dispatches()
  |
  |--> Fast path: All subtasks already terminal? Collect results inline. Parent continues.
  |
  |--> Slow path (common): Update parent status to waiting_subagents.
       Save dispatch tool_call_ids as JSON array in last_tool_call_id.
       Raise GraphInterrupt. Parent paused.

Agent Types

| Type | Description | Excluded Tools | Default Iterations |
|---|---|---|---|
| general | General-purpose subagent | ask_human, dispatch_subagent | 10 |
| research | Research-focused, read-only | ask_human, dispatch_subagent, write_file, send_email | 10 |
| tool-specialist | Chained tool workflows | ask_human, dispatch_subagent, send_email | 10 |

Token and Context Management

Conversation Compaction

When len(messages) > COMPACTION_THRESHOLD (default 12):

  1. Split messages into "older" and "recent" at len - MAX_RECENT_MESSAGES (default 8).
  2. Adjust split point to never break a tool-call sequence (AIMessage with tool_calls stays with its ToolMessages).
  3. Summarize older messages via LLM call (preserves: original intent, key decisions, tool results, approach, errors).
  4. Replace older messages with a single SystemMessage containing the summary.
  5. Cache summary by task_id for incremental updates on subsequent compactions.
  6. Fallback: if summarization fails, trim to most recent messages and strip leading ToolMessages.
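
Steps 1 and 2 (choosing the split point and keeping tool-call sequences intact) can be sketched as below. Messages are modeled as dicts with a "role" key, which is a hypothetical shape, and moving the split point earlier (rather than later) when it lands on a tool result is an assumption.

```python
from typing import List, Tuple

COMPACTION_THRESHOLD = 12  # default from the text above
MAX_RECENT_MESSAGES = 8    # default from the text above


def split_for_compaction(messages: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (older, recent); older is empty when no compaction is needed."""
    if len(messages) <= COMPACTION_THRESHOLD:
        return [], messages
    split = len(messages) - MAX_RECENT_MESSAGES
    # If the first "recent" message is a tool result, move the split back
    # until the AI message that issued the call is also in "recent".
    while split > 0 and messages[split]["role"] == "tool":
        split -= 1
    return messages[:split], messages[split:]
```

The "older" list is what would be passed to the summarization call in step 3.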

Tool Result Truncation

| Tier | Size | Behavior |
|---|---|---|
| Small | <= 4,000 chars | Returned verbatim |
| Medium | 4,001 to 12,000 chars | Truncated to 4,000 chars with metadata hint |
| Large | > 12,000 chars | First 20 lines only, with hint to use read_file with offset/limit |
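
The three tiers map directly onto a small function. The thresholds come from the table; the function name truncate_tool_result and the exact wording of the hints are hypothetical.

```python
SMALL_LIMIT = 4_000       # chars returned verbatim
LARGE_LIMIT = 12_000      # above this, only the first lines are kept
LARGE_HEAD_LINES = 20


def truncate_tool_result(text: str) -> str:
    """Apply the small / medium / large truncation tiers to a tool result."""
    if len(text) <= SMALL_LIMIT:
        return text
    if len(text) <= LARGE_LIMIT:
        hint = f"\n[truncated: showing {SMALL_LIMIT} of {len(text)} chars]"
        return text[:SMALL_LIMIT] + hint
    head = "\n".join(text.splitlines()[:LARGE_HEAD_LINES])
    hint = (f"\n[truncated: showing first {LARGE_HEAD_LINES} lines of "
            f"{len(text)} chars; use read_file with offset/limit]")
    return head + hint
```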

Parallel Tool Call Limiting

MAX_PARALLEL_TOOL_CALLS (default 5) caps how many tools execute in a single turn. Excess calls receive synthetic "skipped -- too many simultaneous requests" responses.
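
A minimal sketch of this cap, assuming the first MAX_PARALLEL_TOOL_CALLS calls are the ones that run (the source does not say which calls are kept). The function name limit_tool_calls and the dict shapes are hypothetical.

```python
from typing import List, Tuple

MAX_PARALLEL_TOOL_CALLS = 5
SKIPPED_MESSAGE = "skipped -- too many simultaneous requests"


def limit_tool_calls(tool_calls: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Split tool calls into those to execute and synthetic skip responses."""
    to_run = tool_calls[:MAX_PARALLEL_TOOL_CALLS]
    skipped = [{"tool_call_id": call["id"], "content": SKIPPED_MESSAGE}
               for call in tool_calls[MAX_PARALLEL_TOOL_CALLS:]]
    return to_run, skipped
```

Returning a synthetic response for every skipped call keeps each tool call paired with a result, which chat-style LLM APIs generally require.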

Iteration Limits

| Agent | Soft Limit | Hard Limit | Soft Limit Behavior | Hard Limit Behavior |
|---|---|---|---|---|
| Parent | 8 | 10 | Wrap-up warning injected into system prompt | Force task completion with last substantive content |
| Subagent | 12 | 15 | Wrap-up warning injected | Forced completion |

Token Tracker (Subagents Only)

  • At 80% of budget: warning injected into system prompt.
  • At 100%: exhausted flag set, router forces completion.
  • Budget None: unlimited (no warnings or limits).
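
The three rules above can be expressed as a small tracker. This is a sketch only: the class shape, the attribute names, and the use of >= at the 80% and 100% boundaries are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenTracker:
    """Hypothetical per-subagent token budget tracker."""
    budget: Optional[int] = None  # None means unlimited
    used: int = 0

    def add(self, tokens: int) -> None:
        self.used += tokens

    @property
    def should_warn(self) -> bool:
        # Warning is injected once 80% of the budget is consumed.
        return self.budget is not None and self.used >= 0.8 * self.budget

    @property
    def exhausted(self) -> bool:
        # At 100% the router would force completion.
        return self.budget is not None and self.used >= self.budget
```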

Skills Matching Pipeline

Skills are markdown instruction files that get injected into the agent's system prompt when they match the user's query. Skills can also declare required tools (e.g., specific MCP tools) that are force-included.

Matching Process

                              +--------------------+
                              |   USER REQUEST     |
                              |  "Create a gamma   |
                              |   presentation"    |
                              +---------+----------+
                                        |
                                        v
                         +-----------------------------+
                         | PREPROCESS_INTENT_NODE      |
                         |                             |
                         |  1. Extract Work Steps      |
                         |     (LLM structured output) |
                         |     "research AI topic"     |
                         |     "create presentation"   |
                         |                             |
                         |  2. Semantic Tool Matching   |
                         |     (if ENABLE_INTENT_      |
                         |      PREPROCESSING=true)    |
                         |     Embed steps + tools     |
                         |     Cosine similarity       |
                         |     Top-k per step          |
                         |                             |
                         |  3. Skill Matching (always) |
                         |     a. Keyword boost:       |
                         |        tag in query? +0.1   |
                         |     b. Semantic matching:   |
                         |        embed description    |
                         |        cosine similarity    |
                         |     c. Threshold: 0.35      |
                         +---------+-------------------+
                                   |
                                   v
                         +-----------------------------+
                         | AGENT NODE (call_model)     |
                         |                             |
                         | 4. Skill Tool Resolution    |
                         |    For each matched skill:  |
                         |      allowed_tools list     |
                         |      Exact match first      |
                         |      Then partial match     |
                         |      Case insensitive       |
                         |                             |
                         | 5. Union Logic              |
                         |    Skills-required tools    |
                         |    (guaranteed) + semantic  |
                         |    matches (deduplicated)   |
                         |                             |
                         | 6. Prompt Injection         |
                         |    Skill instructions       |
                         |    appended to system prompt|
                         |                             |
                         | 7. LLM Invocation           |
                         |    bind_tools(final set)    |
                         +-----------------------------+

Skill Matching Algorithm

Step 1: Keyword Boost (first priority)

  • For each skill's tags: if tag appears in query.lower(), boost score to threshold + 0.1.

Step 2: Semantic Matching

  • Embed skill description + tags using the configured embedding model.
  • Compute cosine similarity against query embedding.

Step 3: Threshold Check

  • Default threshold: 0.35 (OTTO_SKILL_THRESHOLD).
  • Boosted skills exceed threshold automatically.
  • Maximum skills returned: 3 (OTTO_MAX_SKILLS).
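
The three steps can be combined into one scoring function. In this sketch, embeddings are passed in as plain float lists so the example runs without an embedding model. The skill dict shape, the function names, raising a boosted score to at least threshold + 0.1, and treating a score equal to the threshold as a match are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

SKILL_THRESHOLD = 0.35  # OTTO_SKILL_THRESHOLD default
MAX_SKILLS = 3          # OTTO_MAX_SKILLS default
KEYWORD_BOOST = 0.1


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_skills(query: str, query_vec: Sequence[float],
                 skills: List[dict]) -> List[Tuple[str, float]]:
    """Return up to MAX_SKILLS (name, score) pairs, best first."""
    lowered = query.lower()
    scored = []
    for skill in skills:
        score = cosine(query_vec, skill["vector"])
        # Keyword boost: a tag found in the query lifts the score
        # above the threshold regardless of semantic similarity.
        if any(tag.lower() in lowered for tag in skill["tags"]):
            score = max(score, SKILL_THRESHOLD + KEYWORD_BOOST)
        if score >= SKILL_THRESHOLD:
            scored.append((skill["name"], score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:MAX_SKILLS]
```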

Tool Matching Algorithm (Intent Preprocessing)

Only runs when ENABLE_INTENT_PREPROCESSING=true.

  1. Extract 3-5 work steps from user query (1 LLM call with structured output).
  2. Embed all available tool descriptions (static + MCP).
  3. For each step, compute cosine similarity against all tool embeddings.
  4. Return top-3 tools per step, deduplicated.
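
Steps 3 and 4 reduce to a top-k selection with deduplication. The sketch below takes precomputed similarity scores (one dict per work step, mapping tool name to cosine similarity) so it stays independent of any embedding model; the function name select_tools and that input shape are assumptions.

```python
from typing import Dict, List

TOP_K_PER_STEP = 3


def select_tools(scores_per_step: List[Dict[str, float]],
                 top_k: int = TOP_K_PER_STEP) -> List[str]:
    """Pick the top_k tools for each step, deduplicated in first-seen order."""
    selected: List[str] = []
    for scores in scores_per_step:
        ranked = sorted(scores, key=scores.get, reverse=True)
        for name in ranked[:top_k]:
            if name not in selected:
                selected.append(name)
    return selected
```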

Skill-to-Tool Resolution

When a matched skill declares allowed_tools, those tools are resolved against all available tools:

| Resolution Step | Example |
|---|---|
| 1. Exact match | "wait" matches tool named "wait" |
| 2. Partial match (case-insensitive) | "gamma_create" matches "mcp__gamma__gamma_create_generation" |
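
A sketch of the two-step resolution, assuming "partial match" means substring containment and that the exact-match step is also case-insensitive (the source only marks the partial step explicitly). The function name resolve_allowed_tools is hypothetical.

```python
from typing import List


def resolve_allowed_tools(allowed_tools: List[str],
                          available: List[str]) -> List[str]:
    """Resolve a skill's allowed_tools against the available tool names."""
    by_lower = {name.lower(): name for name in available}
    resolved: List[str] = []
    for wanted in allowed_tools:
        key = wanted.lower()
        if key in by_lower:
            matches = [by_lower[key]]  # step 1: exact match wins
        else:
            # Step 2: fall back to substring match on the tool name.
            matches = [name for name in available if key in name.lower()]
        for name in matches:
            if name not in resolved:
                resolved.append(name)
    return resolved
```

Checking the exact match first keeps a short name such as "wait" from also pulling in every tool whose name merely contains it.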

Final Tool Selection

Skills-Required Tools (guaranteed inclusion)
  + Semantic Matched Tools (added if not duplicate)
  + Static Tools (always included)
  = Final available tools bound to LLM

Static Tool Inventory

These tools are always available to the parent agent:

| Tool | Module | Purpose |
|---|---|---|
| ask_human | tools/human_tool.py | Send questions/messages to humans. wait_for_answer=True pauses via GraphInterrupt |
| send_email | tools/email_tool.py | Send emails via SMTP |
| list_directory | tools/file_tools.py | List files in sandbox directory |
| read_file | tools/file_tools.py | Read file contents with offset/limit support |
| write_file | tools/file_tools.py | Write content to sandbox file |
| wait | tools/file_tools.py | Sleep for N seconds (for async MCP workflows) |
| convert_markdown_to_docx | tools/file_tools.py | Convert markdown to .docx |
| attach_file | tools/file_tools.py | Mark file for attachment to completion message |
| read_document | tools/document_tools.py | Read uploaded documents |
| memory_search | tools/memory_tools.py | Search the vector memory store |
| memory_save | tools/memory_tools.py | Save a memory to the vector store |
| dispatch_subagent | tools/subagent_tools.py | Dispatch a subtask to a background subagent |
| run_skill_script | tools/script_tool.py | Execute a skill's script with configurable timeout |

LLM Provider Configuration

Provider selection is checked in priority order; the first matching row wins:

| Priority | Env Var | Provider | Default Model | Temperature |
|---|---|---|---|---|
| 1 | USE_GEMINI=true | Google Gemini (ChatGoogleGenerativeAI) | gemini-3.1-pro-preview-customtools | 1.0 |
| 2 | USE_OLLAMA=true | Ollama (ChatOllama) | gpt-oss:latest | 0 |
| 3 | Default | OpenAI (ChatOpenAI) | gpt-5-mini | 1.0 |

Embedding provider matches the active LLM provider:

| LLM Provider | Embedding Model | Implementation |
|---|---|---|
| Gemini | models/gemini-embedding-001 | GoogleGenerativeAIEmbeddings |
| Ollama | nomic-embed-text:latest | Custom OllamaEmbeddings |
| OpenAI | text-embedding-3-small | OpenAIEmbeddings |
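
The priority order and the provider-to-embedding pairing can be sketched as a lookup over environment variables. The function name select_provider and the rule that only the literal value "true" (case-insensitive) enables a provider are assumptions; the model names are the defaults listed above.

```python
from typing import Mapping, Tuple


def select_provider(env: Mapping[str, str]) -> Tuple[str, str, str]:
    """Return (provider, default LLM model, embedding model)."""

    def enabled(name: str) -> bool:
        return env.get(name, "").strip().lower() == "true"

    if enabled("USE_GEMINI"):  # priority 1
        return ("gemini", "gemini-3.1-pro-preview-customtools",
                "models/gemini-embedding-001")
    if enabled("USE_OLLAMA"):  # priority 2
        return ("ollama", "gpt-oss:latest", "nomic-embed-text:latest")
    # Priority 3: OpenAI is the default when neither flag is set.
    return ("openai", "gpt-5-mini", "text-embedding-3-small")
```

In a real process this would be called with `os.environ`; a plain dict works for testing.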

Persona System

Otto supports per-project personality customization via .otto/SOUL.md files.

SOUL.md Format

---
name: Otto
role: Marketing Assistant
specialization: Digital Marketing and Content Creation
tone: Professional but friendly
---
 
Additional instructions and personality details as free-form markdown.

The YAML frontmatter is optional. The body becomes the instructions field.
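
A simplified sketch of splitting a SOUL.md file into frontmatter fields and the instructions body. It handles only flat "key: value" lines rather than full YAML, and the function name parse_soul is hypothetical; the real loader presumably uses a YAML parser.

```python
from typing import Dict, Tuple


def parse_soul(text: str) -> Tuple[Dict[str, str], str]:
    """Return (frontmatter fields, instructions body) for a SOUL.md string."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text.strip()  # frontmatter is optional

    meta: Dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the body.
            return meta, "\n".join(lines[index + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()

    # No closing delimiter found: treat the whole file as instructions.
    return {}, text.strip()
```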

Loading Behavior

  • load_persona(project_path) checks {project_path}/.otto/SOUL.md.
  • If found: specialization overrides SPECIALIZATION env var; instructions overrides ADDITIONAL_INSTRUCTIONS.
  • Results are cached per project path.

System Prompt Assembly

The agent node assembles the system prompt by combining these layers (in order):

| Layer | Source | Always Present |
|---|---|---|
| Base prompt | MODEL_PROMPT template with {specialization}, {input}, {requester_info}, {team_info} | Yes |
| Additional instructions | ADDITIONAL_INSTRUCTIONS env var or SOUL.md | If configured |
| Project context | Slack channel summary + pending messages | If project_id set |
| Skills context | Matched skill instructions | If skills matched |
| Team context | Channel member roles and responsibilities | If project has channel |
| Memory context | Top 3 relevant memories from vector store | If memories found |
| Wrap-up warning | Iteration count and remaining budget | At soft limit |