# Architecture Reference
> Technical architecture of the Otto AI agent system: components, data flows, queues, checkpointing, and the skills matching pipeline.
## System Components

### Component Summary
| Component | Technology | Purpose |
|---|---|---|
| API Server | FastAPI (Python) | REST API, request routing, Slack event handling |
| ARQ Worker | ARQ (async Redis queue) | Background agent execution, subagent processing, cron jobs |
| Redis | Redis or Upstash Redis | Task queues, LangGraph checkpoints, caching |
| SQLite | SQLite with WAL mode | Tasks, users, logs, outputs, notifications, channels |
| Memory DB | SQLite (separate file) | Vector memory store for semantic search |
| File Storage | Local / GCS / Google Drive | User file uploads, sandbox for file tools, skill scripts |
| Frontend | Next.js 15, React 19 | Web UI with 5-second polling |
| MCP Manager | Model Context Protocol | Dynamic tool loading from external MCP servers |
| Communication Manager | Channel abstraction | Routes notifications to Slack, Email, or Web UI |
## Agent Execution Flow
Otto's core execution engine is a LangGraph StateGraph with three nodes and one conditional edge. The agent follows a ReAct (Reason + Act) loop.
### Parent Agent Graph
Nodes:
| Node | Function | Purpose |
|---|---|---|
| preprocess_intent | preprocess_intent_node | Match skills and (optionally) semantically filter tools before the first LLM call |
| agent | call_model | Assemble system prompt, inject context, invoke LLM with bound tools |
| action | tool_node_with_logging | Execute tool calls, truncate results, handle interrupts |
Conditional edge (`should_call_tool_or_pause`):
| Condition | Route |
|---|---|
| LLM response contains tool calls | action |
| LLM response contains text, no tool calls | END (task complete) |
| Hard iteration limit reached (10) | END (forced completion) |
| Task not found in DB | END |
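The routing table above can be sketched as a plain function (a simplified illustration, not Otto's actual implementation; the dict-based state and message shapes are assumptions, and `"END"` stands in for LangGraph's END sentinel):

```python
# Sketch of the should_call_tool_or_pause conditional edge.
# State and message shapes are simplified assumptions, not Otto's real classes.
HARD_ITERATION_LIMIT = 10  # parent agent


def should_call_tool_or_pause(state: dict) -> str:
    """Decide the next node after the LLM responds."""
    if not state.get("task_exists", True):  # task deleted from DB mid-run
        return "END"
    if state["iteration_count"] >= HARD_ITERATION_LIMIT:
        return "END"  # forced completion
    last = state["messages"][-1]
    if last.get("tool_calls"):  # LLM requested tool execution
        return "action"
    return "END"  # plain text response: task complete
```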
### Subagent Graph
Subagents use a simplified graph with no preprocessing node:
Key differences from parent:
| Property | Parent Agent | Subagent |
|---|---|---|
| Preprocessing | Yes (skill + tool matching) | No |
| Soft iteration limit | 8 | 12 |
| Hard iteration limit | 10 | 15 |
| LangGraph recursion limit | 25 | 35 |
| Step timeout | 300s (5 min) | 600s (10 min) |
| ask_human tool | Available | Blocked |
| dispatch_subagent tool | Available | Blocked |
| MCP session | Shared (parent's) | Own fresh session |
| Token budget | Unlimited | Configurable via token_budget |
### AgentState Schema
| Field | Type | Purpose |
|---|---|---|
| messages | Sequence[BaseMessage] | Append-only conversation history |
| db | DatabaseManager | Database handle |
| task_id | str | Current task ID |
| user_id | str | Task creator / requester |
| input | str | Original user request text |
| mcp_manager | MCPManager | MCP tool server manager |
| communication_manager | CommunicationManager | Notification routing |
| skills_manager | SkillsManager | Skill matching and context injection |
| semantically_matched_tools | List[Any] | Tools matched by intent preprocessing |
| relevant_skills | List[Any] | Skills matched for prompt injection |
| intent_steps | List[str] | Extracted work steps from user query |
| project_id | str | Slack channel / project for context |
| iteration_count | int | Loop counter for runaway prevention |
| attachments | List[str] | File paths to attach to completion notification |
## Data Flow

### Task Lifecycle (End-to-End)

### Communication Flow

### File Upload Flow
## Queue Architecture
Otto uses ARQ (async Redis queues) for all background processing.
### Queues
| Queue Name | Purpose | Producers | Consumers |
|---|---|---|---|
arq:interactive | Primary task and subagent processing | API server (POST /chat, /human_response, /slack/dm_response, webhooks) | ARQ worker |
arq:subagent | Subagent task processing (can share workers with interactive) | dispatch_subagent tool | ARQ worker |
### Job Types
| Job Function | Queue | Triggered By | Description |
|---|---|---|---|
| process_agent_task | arq:interactive | POST /chat, Slack events, webhooks | Run the parent agent graph for a new task |
| resume_agent_task | arq:interactive | POST /human_response, subagent completion | Resume a paused agent from its Redis checkpoint |
| process_subagent_task_v2 | arq:interactive | dispatch_subagent tool | Run a subagent graph for a subtask |
### Cron Jobs
The ARQ worker also runs scheduled tasks:
| Job | Schedule | Purpose |
|---|---|---|
| Email monitor | Every IMAP_CHECK seconds (default 60) | Check IMAP inbox for new emails, create tasks |
| Scheduler service | Every 60 seconds | Execute due scheduled/recurring tasks |
| Skills refresh | Every SKILLS_REFRESH_INTERVAL seconds (default 60) | Rescan skill directory for changes |
### Worker Initialization
The ARQ worker performs lazy initialization of heavy components on first use:
- LLM model -- configured via `USE_GEMINI` / `USE_OLLAMA` (OpenAI by default)
- MCP Manager -- connects to all configured MCP servers
- Embedding model -- matches the active LLM provider
- Skills Manager -- loads and indexes skills from filesystem
- Compiled LangGraph -- parent agent graph with Redis checkpointer
These are cached for the worker's lifetime and reused across jobs.
## Checkpoint and Resume Mechanism
Otto uses LangGraph's AsyncRedisSaver to persist full agent state to Redis. This enables pausing and resuming the agent across process boundaries.
### Checkpoint Lifecycle

### Redis Key Pattern

Checkpoint keys follow the pattern: `checkpoints:task-{task_id}:*`
### Redis Connection Pool
| Setting | Value |
|---|---|
| decode_responses | False (binary for serialization) |
| health_check_interval | 30s |
| max_connections | 10 |
| retry_on_timeout | True |
| socket_keepalive | True |
### Resume Scenarios
| Trigger | How It Works |
|---|---|
| Human response | POST /human_response enqueues resume_agent_task. last_tool_call_id is a single string. One ToolMessage is constructed with the human's answer. |
| Subagent completion | _check_and_resume_parent() runs after each subagent finishes. When all siblings are terminal, acquires a Redis SETNX lock (resume_lock:{parent_task_id}, 60s TTL) to prevent double-resume. Collects results from all subtasks. last_tool_call_id is a JSON array of tool call IDs. Multiple ToolMessages are constructed. |
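The double-resume guard can be sketched as follows. This is a minimal illustration: `FakeRedis` is an in-memory stand-in for a real Redis client (its `set` mirrors redis-py's `nx`/`ex` semantics), and the helper reflects the `resume_lock:{parent_task_id}` key with a 60s TTL described above.

```python
import time


class FakeRedis:
    """Tiny in-memory stand-in for a Redis client (illustration only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, nx=False, ex=None):
        # Mirrors redis-py: with nx=True, only set if the key is absent.
        if nx and key in self._data:
            return None
        self._data[key] = (value, None if ex is None else time.time() + ex)
        return True


def acquire_resume_lock(redis, parent_task_id: str) -> bool:
    """SETNX-style lock so only one subagent completion resumes the parent."""
    return bool(redis.set(f"resume_lock:{parent_task_id}", "1", nx=True, ex=60))
```

The first completing sibling wins the lock and enqueues `resume_agent_task`; any concurrent sibling sees the key already set and skips the resume.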
## Subagent System
The parent agent can delegate focused subtasks to independent subagent workers via the dispatch_subagent tool.
### Dispatch Flow

### Agent Types
| Type | Description | Excluded Tools | Default Iterations |
|---|---|---|---|
| general | General-purpose subagent | ask_human, dispatch_subagent | 10 |
| research | Research-focused, read-only | ask_human, dispatch_subagent, write_file, send_email | 10 |
| tool-specialist | Chained tool workflows | ask_human, dispatch_subagent, send_email | 10 |
## Token and Context Management

### Conversation Compaction
When `len(messages) > COMPACTION_THRESHOLD` (default 12):

- Split messages into "older" and "recent" at `len - MAX_RECENT_MESSAGES` (default 8).
- Adjust the split point so it never breaks a tool-call sequence (an AIMessage with `tool_calls` stays with its ToolMessages).
- Summarize the older messages via an LLM call (preserving: original intent, key decisions, tool results, approach, errors).
- Replace the older messages with a single `SystemMessage` containing the summary.
- Cache the summary by `task_id` for incremental updates on subsequent compactions.
- Fallback: if summarization fails, trim to the most recent messages and strip leading ToolMessages.
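The split-point adjustment can be sketched as below. This is a simplified illustration assuming messages are dicts with a `type` field; the real implementation works on LangChain message objects.

```python
COMPACTION_THRESHOLD = 12
MAX_RECENT_MESSAGES = 8


def split_for_compaction(messages: list) -> tuple:
    """Pick the older/recent split, never separating a tool-call sequence."""
    if len(messages) <= COMPACTION_THRESHOLD:
        return [], messages  # below threshold: nothing to compact
    split = len(messages) - MAX_RECENT_MESSAGES
    # Walk the split point back while the "recent" half would start with a
    # ToolMessage, so an AIMessage with tool_calls keeps its ToolMessages.
    while split > 0 and messages[split].get("type") == "tool":
        split -= 1
    return messages[:split], messages[split:]
```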
### Tool Result Truncation
| Tier | Size | Behavior |
|---|---|---|
| Small | <= 4,000 chars | Returned verbatim |
| Medium | 4,001 -- 12,000 chars | Truncated to 4,000 chars with metadata hint |
| Large | > 12,000 chars | First 20 lines only, with hint to use read_file with offset/limit |
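A sketch of the tiered policy (illustrative only; the exact hint wording is an assumption):

```python
SMALL_LIMIT = 4_000
LARGE_LIMIT = 12_000


def truncate_tool_result(text: str) -> str:
    """Apply the three-tier truncation policy from the table above."""
    if len(text) <= SMALL_LIMIT:
        return text  # small: verbatim
    if len(text) <= LARGE_LIMIT:  # medium: clip with metadata hint
        return text[:SMALL_LIMIT] + f"\n[truncated: {len(text)} chars total]"
    head = "\n".join(text.splitlines()[:20])  # large: first 20 lines only
    return head + "\n[large output: use read_file with offset/limit]"
```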
### Parallel Tool Call Limiting

`MAX_PARALLEL_TOOL_CALLS` (default 5) caps how many tools execute in a single turn. Excess calls receive synthetic "skipped -- too many simultaneous requests" responses.
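Illustratively (the tool-call shape and skip-message wording here are assumptions):

```python
MAX_PARALLEL_TOOL_CALLS = 5


def limit_tool_calls(tool_calls: list) -> tuple:
    """Keep the first N calls; synthesize skip responses for the rest."""
    to_run = tool_calls[:MAX_PARALLEL_TOOL_CALLS]
    skipped = [
        {"tool_call_id": call.get("id"),
         "content": "skipped -- too many simultaneous requests"}
        for call in tool_calls[MAX_PARALLEL_TOOL_CALLS:]
    ]
    return to_run, skipped
```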
### Iteration Limits
| Agent | Soft Limit | Hard Limit | Soft Limit Behavior | Hard Limit Behavior |
|---|---|---|---|---|
| Parent | 8 | 10 | Wrap-up warning injected into system prompt | Force task completion with last substantive content |
| Subagent | 12 | 15 | Wrap-up warning injected | Forced completion |
### Token Tracker (Subagents Only)
- At 80% of budget: warning injected into system prompt.
- At 100%: `exhausted` flag set, router forces completion.
- Budget `None`: unlimited (no warnings or limits).
## Skills Matching Pipeline
Skills are markdown instruction files that get injected into the agent's system prompt when they match the user's query. Skills can also declare required tools (e.g., specific MCP tools) that are force-included.
### Matching Process

### Skill Matching Algorithm
**Step 1: Keyword Boost (first priority)**

- For each skill's tags: if a tag appears in `query.lower()`, boost the score to `threshold + 0.1`.
**Step 2: Semantic Matching**
- Embed skill description + tags using the configured embedding model.
- Compute cosine similarity against query embedding.
**Step 3: Threshold Check**

- Default threshold: `0.35` (`OTTO_SKILL_THRESHOLD`).
- Boosted skills exceed the threshold automatically.
- Maximum skills returned: `3` (`OTTO_MAX_SKILLS`).
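The three steps can be sketched together. This is a simplified illustration: skills are plain dicts with precomputed embedding vectors, and the embeddings themselves would come from the configured embedding model.

```python
import math

THRESHOLD = 0.35  # OTTO_SKILL_THRESHOLD
MAX_SKILLS = 3    # OTTO_MAX_SKILLS


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def match_skills(query: str, query_vec, skills):
    """skills: dicts with 'name', 'tags', and a precomputed 'vec' embedding."""
    scored = []
    for skill in skills:
        score = cosine(query_vec, skill["vec"])       # step 2: semantic match
        if any(tag in query.lower() for tag in skill["tags"]):
            score = max(score, THRESHOLD + 0.1)       # step 1: keyword boost
        if score >= THRESHOLD:                        # step 3: threshold check
            scored.append((score, skill["name"]))
    scored.sort(reverse=True)
    return [name for _, name in scored[:MAX_SKILLS]]
```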
### Tool Matching Algorithm (Intent Preprocessing)

Only runs when `ENABLE_INTENT_PREPROCESSING=true`.
- Extract 3-5 work steps from user query (1 LLM call with structured output).
- Embed all available tool descriptions (static + MCP).
- For each step, compute cosine similarity against all tool embeddings.
- Return top-3 tools per step, deduplicated.
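The per-step top-3 selection with deduplication can be sketched as follows (embeddings are plain lists here for illustration; real vectors come from the embedding provider):

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def match_tools(step_embeddings, tools, top_k=3):
    """tools: list of (name, embedding). Returns deduped top-k per step."""
    selected = []
    for step_vec in step_embeddings:
        ranked = sorted(tools, key=lambda t: cosine(step_vec, t[1]), reverse=True)
        for name, _ in ranked[:top_k]:
            if name not in selected:  # dedupe, preserving first-seen order
                selected.append(name)
    return selected
```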
### Skill-to-Tool Resolution
When a matched skill declares allowed_tools, those tools are resolved against all available tools:
| Resolution Step | Example |
|---|---|
| 1. Exact match | "wait" matches tool named "wait" |
| 2. Partial match (case-insensitive) | "gamma_create" matches "mcp__gamma__gamma_create_generation" |
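A minimal sketch of the two-step resolution (assuming tool names are plain strings):

```python
def resolve_allowed_tools(allowed, available):
    """Resolve a skill's allowed_tools against available tool names:
    exact match first, then case-insensitive substring match."""
    resolved = []
    for wanted in allowed:
        if wanted in available:  # 1. exact match
            resolved.append(wanted)
            continue
        for name in available:   # 2. partial, case-insensitive
            if wanted.lower() in name.lower():
                resolved.append(name)
                break
    return resolved
```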
### Final Tool Selection

## Static Tool Inventory
These tools are always available to the parent agent:
| Tool | Module | Purpose |
|---|---|---|
| ask_human | tools/human_tool.py | Send questions/messages to humans. wait_for_answer=True pauses via GraphInterrupt |
| send_email | tools/email_tool.py | Send emails via SMTP |
| list_directory | tools/file_tools.py | List files in sandbox directory |
| read_file | tools/file_tools.py | Read file contents with offset/limit support |
| write_file | tools/file_tools.py | Write content to sandbox file |
| wait | tools/file_tools.py | Sleep for N seconds (for async MCP workflows) |
| convert_markdown_to_docx | tools/file_tools.py | Convert markdown to .docx |
| attach_file | tools/file_tools.py | Mark file for attachment to completion message |
| read_document | tools/document_tools.py | Read uploaded documents |
| memory_search | tools/memory_tools.py | Search the vector memory store |
| memory_save | tools/memory_tools.py | Save a memory to the vector store |
| dispatch_subagent | tools/subagent_tools.py | Dispatch a subtask to a background subagent |
| run_skill_script | tools/script_tool.py | Execute a skill's script with configurable timeout |
## LLM Provider Configuration
Provider selection checked in order of priority:
| Priority | Env Var | Provider | Default Model | Temperature |
|---|---|---|---|---|
| 1 | USE_GEMINI=true | Google Gemini (ChatGoogleGenerativeAI) | gemini-3.1-pro-preview-customtools | 1.0 |
| 2 | USE_OLLAMA=true | Ollama (ChatOllama) | gpt-oss:latest | 0 |
| 3 | Default | OpenAI (ChatOpenAI) | gpt-5-mini | 1.0 |
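The priority order can be sketched as a plain function over the environment (env handling simplified; actual model construction uses the LangChain chat classes named in the table):

```python
def select_provider(env: dict) -> tuple:
    """Return (provider, default_model, temperature) per the priority table."""
    if env.get("USE_GEMINI", "").lower() == "true":      # priority 1
        return "gemini", "gemini-3.1-pro-preview-customtools", 1.0
    if env.get("USE_OLLAMA", "").lower() == "true":      # priority 2
        return "ollama", "gpt-oss:latest", 0.0
    return "openai", "gpt-5-mini", 1.0                   # default
```

Note that `USE_GEMINI` wins even if both flags are set, since it is checked first.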
Embedding provider matches the active LLM provider:
| LLM Provider | Embedding Model | Implementation |
|---|---|---|
| Gemini | models/gemini-embedding-001 | GoogleGenerativeAIEmbeddings |
| Ollama | nomic-embed-text:latest | Custom OllamaEmbeddings |
| OpenAI | text-embedding-3-small | OpenAIEmbeddings |
## Persona System
Otto supports per-project personality customization via .otto/SOUL.md files.
### SOUL.md Format
The YAML frontmatter is optional. The body becomes the instructions field.
### Loading Behavior
- `load_persona(project_path)` checks `{project_path}/.otto/SOUL.md`.
- If found: `specialization` overrides the `SPECIALIZATION` env var; `instructions` overrides `ADDITIONAL_INSTRUCTIONS`.
- Results are cached per project path.
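The loading behavior can be sketched as below. This uses a naive frontmatter parser for illustration (flat `key: value` pairs only); the real loader may use a YAML library.

```python
from pathlib import Path

_persona_cache = {}  # cache keyed by project path


def load_persona(project_path: str):
    """Load {project_path}/.otto/SOUL.md with optional frontmatter, cached."""
    if project_path in _persona_cache:
        return _persona_cache[project_path]
    soul = Path(project_path) / ".otto" / "SOUL.md"
    persona = None
    if soul.exists():
        text = soul.read_text()
        persona = {"specialization": None, "instructions": text}
        if text.startswith("---"):  # optional YAML frontmatter
            header, _, body = text[3:].partition("---")
            persona["instructions"] = body.strip()  # body -> instructions
            for line in header.strip().splitlines():
                key, _, value = line.partition(":")
                persona[key.strip()] = value.strip()
    _persona_cache[project_path] = persona  # cache even a miss
    return persona
```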
## System Prompt Assembly
The agent node assembles the system prompt by combining these layers (in order):
| Layer | Source | Always Present |
|---|---|---|
| Base prompt | MODEL_PROMPT template with {specialization}, {input}, {requester_info}, {team_info} | Yes |
| Additional instructions | ADDITIONAL_INSTRUCTIONS env var or SOUL.md | If configured |
| Project context | Slack channel summary + pending messages | If project_id set |
| Skills context | Matched skill instructions | If skills matched |
| Team context | Channel member roles and responsibilities | If project has channel |
| Memory context | Top 3 relevant memories from vector store | If memories found |
| Wrap-up warning | Iteration count and remaining budget | At soft limit |