# Architecture Reference
> Technical architecture of the Otto AI agent system: components, data flows, queues, checkpointing, and the skills matching pipeline.
## System Components

### Component Summary
| Component | Technology | Purpose |
|---|---|---|
| API Server | FastAPI (Python) | REST API, request routing, Slack event handling |
| ARQ Worker | ARQ (async Redis queue) | Background agent execution, subagent processing, cron jobs |
| Redis | Redis or Upstash Redis | Task queues, LangGraph checkpoints, caching |
| SQLite | SQLite with WAL mode | Tasks, users, logs, outputs, notifications, channels |
| Memory DB | SQLite (separate file) | Vector memory store for semantic search |
| File Storage | Local / GCS / Google Drive | User file uploads, sandbox for file tools, skill scripts |
| Frontend | Next.js 15, React 19 | Web UI with 5-second polling |
| MCP Manager | Model Context Protocol | Dynamic tool loading from external MCP servers |
| Communication Manager | Channel abstraction | Routes notifications to Slack, Email, or Web UI |
## Agent Execution Flow
Otto's core execution engine is a LangGraph StateGraph with three nodes and one conditional edge. The agent follows a ReAct (Reason + Act) loop.
### Parent Agent Graph
Nodes:
| Node | Function | Purpose |
|---|---|---|
| preprocess_intent | preprocess_intent_node | Match skills and (optionally) semantically filter tools before the first LLM call |
| agent | call_model | Assemble system prompt, inject context, invoke LLM with bound tools |
| action | tool_node_with_logging | Execute tool calls, truncate results, handle interrupts |
Conditional edge (`should_call_tool_or_pause`):
| Condition | Route |
|---|---|
| LLM response contains tool calls | action |
| LLM response contains text, no tool calls | END (task complete) |
| Hard iteration limit reached (10) | END (forced completion) |
| Task not found in DB | END |
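The routing table above can be sketched as a plain function (a simplified illustration, not Otto's actual implementation; the dict-based state and message shapes are assumptions, and `"END"` stands in for LangGraph's END sentinel):

```python
# Sketch of the should_call_tool_or_pause conditional edge.
# State and message shapes are simplified assumptions, not Otto's real classes.
HARD_ITERATION_LIMIT = 10  # parent agent


def should_call_tool_or_pause(state: dict) -> str:
    """Decide the next node after the LLM responds."""
    if not state.get("task_exists", True):  # task deleted from DB mid-run
        return "END"
    if state["iteration_count"] >= HARD_ITERATION_LIMIT:
        return "END"  # forced completion
    last = state["messages"][-1]
    if last.get("tool_calls"):  # LLM requested tool execution
        return "action"
    return "END"  # plain text response: task complete
```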
### Subagent Graph
Subagents use a simplified graph with no preprocessing node:
Key differences from parent:
| Property | Parent Agent | Subagent |
|---|---|---|
| Preprocessing | Yes (skill + tool matching) | No |
| Soft iteration limit | 8 | 12 |
| Hard iteration limit | 10 | 15 |
| LangGraph recursion limit | 25 | 35 |
| Step timeout | 300s (5 min) | 600s (10 min) |
| ask_human tool | Available | Blocked |
| dispatch_subagent tool | Available | Blocked |
| MCP session | Shared (parent's) | Own fresh session |
| Token budget | Unlimited | Configurable via token_budget |
### AgentState Schema
| Field | Type | Purpose |
|---|---|---|
| messages | Sequence[BaseMessage] | Append-only conversation history |
| db | DatabaseManager | Database handle |
| task_id | str | Current task ID |
| user_id | str | Task creator / requester |
| input | str | Original user request text |
| mcp_manager | MCPManager | MCP tool server manager |
| communication_manager | CommunicationManager | Notification routing |
| skills_manager | SkillsManager | Skill matching and context injection |
| semantically_matched_tools | List[Any] | Tools matched by intent preprocessing |
| relevant_skills | List[Any] | Skills matched for prompt injection |
| intent_steps | List[str] | Extracted work steps from user query |
| project_id | str | Slack channel / project for context |
| iteration_count | int | Loop counter for runaway prevention |
| attachments | List[str] | File paths to attach to completion notification |
## Data Flow

### Task Lifecycle (End-to-End)

### Communication Flow

### File Upload Flow
## Queue Architecture
Otto uses ARQ (async Redis queues) for all background processing.
### Queues
| Queue Name | Purpose | Producers | Consumers |
|---|---|---|---|
arq:interactive | Primary task and subagent processing | API server (POST /chat, /human_response, /slack/dm_response, webhooks) | ARQ worker |
arq:subagent | Subagent task processing (can share workers with interactive) | dispatch_subagent tool | ARQ worker |
### Job Types
| Job Function | Queue | Triggered By | Description |
|---|---|---|---|
| process_agent_task | arq:interactive | POST /chat, Slack events, webhooks | Run the parent agent graph for a new task |
| resume_agent_task | arq:interactive | POST /human_response, subagent completion | Resume a paused agent from its Redis checkpoint |
| process_subagent_task_v2 | arq:interactive | dispatch_subagent tool | Run a subagent graph for a subtask |
### Cron Jobs
The ARQ worker also runs scheduled tasks:
| Job | Schedule | Purpose |
|---|---|---|
| Email monitor | Every IMAP_CHECK seconds (default 60) | Check IMAP inbox for new emails, create tasks |
| Scheduler service | Every 60 seconds | Execute due scheduled/recurring tasks |
| Skills refresh | Every SKILLS_REFRESH_INTERVAL seconds (default 60) | Rescan skill directory for changes |
### Worker Initialization
The ARQ worker performs lazy initialization of heavy components on first use:
- LLM model -- configured via `USE_GEMINI` / `USE_OLLAMA` (OpenAI by default)
- MCP Manager -- connects to all configured MCP servers
- Embedding model -- matches the active LLM provider
- Skills Manager -- loads and indexes skills from filesystem
- Compiled LangGraph -- parent agent graph with Redis checkpointer
These are cached for the worker's lifetime and reused across jobs.
## Checkpoint and Resume Mechanism
Otto uses LangGraph's AsyncRedisSaver to persist full agent state to Redis. This enables pausing and resuming the agent across process boundaries.
### Checkpoint Lifecycle

### Redis Key Pattern

Checkpoint keys follow the pattern: `checkpoints:task-{task_id}:*`
### Redis Connection Pool
| Setting | Value |
|---|---|
| decode_responses | False (binary for serialization) |
| health_check_interval | 30s |
| max_connections | 10 |
| retry_on_timeout | True |
| socket_keepalive | True |
### Resume Scenarios
| Trigger | How It Works |
|---|---|
| Human response | POST /human_response enqueues resume_agent_task. last_tool_call_id is a single string. One ToolMessage is constructed with the human's answer. |
| Subagent completion | _check_and_resume_parent() runs after each subagent finishes. When all siblings are terminal, acquires a Redis SETNX lock (resume_lock:{parent_task_id}, 60s TTL) to prevent double-resume. Collects results from all subtasks. last_tool_call_id is a JSON array of tool call IDs. Multiple ToolMessages are constructed. |
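The double-resume guard can be sketched as follows. This is a minimal illustration: `FakeRedis` is an in-memory stand-in for a real Redis client (its `set` mirrors redis-py's `nx`/`ex` semantics), and the helper reflects the `resume_lock:{parent_task_id}` key with a 60s TTL described above.

```python
import time


class FakeRedis:
    """Tiny in-memory stand-in for a Redis client (illustration only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, nx=False, ex=None):
        # Mirrors redis-py: with nx=True, only set if the key is absent.
        if nx and key in self._data:
            return None
        self._data[key] = (value, None if ex is None else time.time() + ex)
        return True


def acquire_resume_lock(redis, parent_task_id: str) -> bool:
    """SETNX-style lock so only one subagent completion resumes the parent."""
    return bool(redis.set(f"resume_lock:{parent_task_id}", "1", nx=True, ex=60))
```

The first completing sibling wins the lock and enqueues `resume_agent_task`; any concurrent sibling sees the key already set and skips the resume.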
## Subagent System
The parent agent can delegate focused subtasks to independent subagent workers via the dispatch_subagent tool.
### Dispatch Flow

### Agent Types
| Type | Description | Excluded Tools | Default Iterations |
|---|---|---|---|
| general | General-purpose subagent | ask_human, dispatch_subagent | 10 |
| research | Research-focused, read-only | ask_human, dispatch_subagent, write_file, send_email | 10 |
| tool-specialist | Chained tool workflows | ask_human, dispatch_subagent, send_email | 10 |
## Token and Context Management

### Conversation Compaction
When `len(messages) > COMPACTION_THRESHOLD` (default 12):

- Split messages into "older" and "recent" at `len - MAX_RECENT_MESSAGES` (default 8).
- Adjust the split point so it never breaks a tool-call sequence (an AIMessage with `tool_calls` stays with its ToolMessages).
- Summarize the older messages via an LLM call (preserving: original intent, key decisions, tool results, approach, errors).
- Replace the older messages with a single `SystemMessage` containing the summary.
- Cache the summary by `task_id` for incremental updates on subsequent compactions.
- Fallback: if summarization fails, trim to the most recent messages and strip leading ToolMessages.
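The split-point adjustment can be sketched as below. This is a simplified illustration assuming messages are dicts with a `type` field; the real implementation works on LangChain message objects.

```python
COMPACTION_THRESHOLD = 12
MAX_RECENT_MESSAGES = 8


def split_for_compaction(messages: list) -> tuple:
    """Pick the older/recent split, never separating a tool-call sequence."""
    if len(messages) <= COMPACTION_THRESHOLD:
        return [], messages  # below threshold: nothing to compact
    split = len(messages) - MAX_RECENT_MESSAGES
    # Walk the split point back while the "recent" half would start with a
    # ToolMessage, so an AIMessage with tool_calls keeps its ToolMessages.
    while split > 0 and messages[split].get("type") == "tool":
        split -= 1
    return messages[:split], messages[split:]
```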
### Tool Result Truncation
| Tier | Size | Behavior |
|---|---|---|
| Small | <= 4,000 chars | Returned verbatim |
| Medium | 4,001 -- 12,000 chars | Truncated to 4,000 chars with metadata hint |
| Large | > 12,000 chars | First 20 lines only, with hint to use read_file with offset/limit |
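A sketch of the tiered policy (illustrative only; the exact hint wording is an assumption):

```python
SMALL_LIMIT = 4_000
LARGE_LIMIT = 12_000


def truncate_tool_result(text: str) -> str:
    """Apply the three-tier truncation policy from the table above."""
    if len(text) <= SMALL_LIMIT:
        return text  # small: verbatim
    if len(text) <= LARGE_LIMIT:  # medium: clip with metadata hint
        return text[:SMALL_LIMIT] + f"\n[truncated: {len(text)} chars total]"
    head = "\n".join(text.splitlines()[:20])  # large: first 20 lines only
    return head + "\n[large output: use read_file with offset/limit]"
```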
### Parallel Tool Call Limiting

`MAX_PARALLEL_TOOL_CALLS` (default 5) caps how many tools execute in a single turn. Excess calls receive synthetic "skipped -- too many simultaneous requests" responses.
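Illustratively (the tool-call shape and skip-message wording here are assumptions):

```python
MAX_PARALLEL_TOOL_CALLS = 5


def limit_tool_calls(tool_calls: list) -> tuple:
    """Keep the first N calls; synthesize skip responses for the rest."""
    to_run = tool_calls[:MAX_PARALLEL_TOOL_CALLS]
    skipped = [
        {"tool_call_id": call.get("id"),
         "content": "skipped -- too many simultaneous requests"}
        for call in tool_calls[MAX_PARALLEL_TOOL_CALLS:]
    ]
    return to_run, skipped
```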
### Iteration Limits
| Agent | Soft Limit | Hard Limit | Soft Limit Behavior | Hard Limit Behavior |
|---|---|---|---|---|
| Parent | 8 | 10 | Wrap-up warning injected into system prompt | Force task completion with last substantive content |
| Subagent | 12 | 15 | Wrap-up warning injected | Forced completion |
### Token Tracker (Subagents Only)
- At 80% of budget: warning injected into system prompt.
- At 100%: `exhausted` flag set, router forces completion.
- Budget `None`: unlimited (no warnings or limits).
## Skills Matching Pipeline
Skills are markdown instruction files that get injected into the agent's system prompt when they match the user's query. Skills can also declare required tools (e.g., specific MCP tools) that are force-included.
### Matching Process

### Skill Matching Algorithm
**Step 1: Keyword Boost (first priority)**

- For each skill's tags: if a tag appears in `query.lower()`, boost the score to `threshold + 0.1`.
**Step 2: Semantic Matching**
- Embed skill description + tags using the configured embedding model.
- Compute cosine similarity against query embedding.
**Step 3: Threshold Check**

- Default threshold: `0.35` (`OTTO_SKILL_THRESHOLD`).
- Boosted skills exceed the threshold automatically.
- Maximum skills returned: `3` (`OTTO_MAX_SKILLS`).
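The three steps can be sketched together. This is a simplified illustration: skills are plain dicts with precomputed embedding vectors, and the embeddings themselves would come from the configured embedding model.

```python
import math

THRESHOLD = 0.35  # OTTO_SKILL_THRESHOLD
MAX_SKILLS = 3    # OTTO_MAX_SKILLS


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def match_skills(query: str, query_vec, skills):
    """skills: dicts with 'name', 'tags', and a precomputed 'vec' embedding."""
    scored = []
    for skill in skills:
        score = cosine(query_vec, skill["vec"])       # step 2: semantic match
        if any(tag in query.lower() for tag in skill["tags"]):
            score = max(score, THRESHOLD + 0.1)       # step 1: keyword boost
        if score >= THRESHOLD:                        # step 3: threshold check
            scored.append((score, skill["name"]))
    scored.sort(reverse=True)
    return [name for _, name in scored[:MAX_SKILLS]]
```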
### Tool Matching Algorithm (Intent Preprocessing)

Only runs when `ENABLE_INTENT_PREPROCESSING=true`.
- Extract 3-5 work steps from user query (1 LLM call with structured output).
- Embed all available tool descriptions (static + MCP).
- For each step, compute cosine similarity against all tool embeddings.
- Return top-3 tools per step, deduplicated.
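The per-step top-3 selection with deduplication can be sketched as follows (embeddings are plain lists here for illustration; real vectors come from the embedding provider):

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def match_tools(step_embeddings, tools, top_k=3):
    """tools: list of (name, embedding). Returns deduped top-k per step."""
    selected = []
    for step_vec in step_embeddings:
        ranked = sorted(tools, key=lambda t: cosine(step_vec, t[1]), reverse=True)
        for name, _ in ranked[:top_k]:
            if name not in selected:  # dedupe, preserving first-seen order
                selected.append(name)
    return selected
```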
### Skill-to-Tool Resolution
When a matched skill declares allowed_tools, those tools are resolved against all available tools:
| Resolution Step | Example |
|---|---|
| 1. Exact match | "wait" matches tool named "wait" |
| 2. Partial match (case-insensitive) | "gamma_create" matches "mcp__gamma__gamma_create_generation" |
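A minimal sketch of the two-step resolution (assuming tool names are plain strings):

```python
def resolve_allowed_tools(allowed, available):
    """Resolve a skill's allowed_tools against available tool names:
    exact match first, then case-insensitive substring match."""
    resolved = []
    for wanted in allowed:
        if wanted in available:  # 1. exact match
            resolved.append(wanted)
            continue
        for name in available:   # 2. partial, case-insensitive
            if wanted.lower() in name.lower():
                resolved.append(name)
                break
    return resolved
```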
### Final Tool Selection

## Static Tool Inventory
These tools are always available to the parent agent:
| Tool | Module | Purpose |
|---|---|---|
| ask_human | tools/human_tool.py | Send questions/messages to humans. wait_for_answer=True pauses via GraphInterrupt |
| send_email | tools/email_tool.py | Send emails via SMTP |
| list_directory | tools/file_tools.py | List files in sandbox directory |
| read_file | tools/file_tools.py | Read file contents with offset/limit support |
| write_file | tools/file_tools.py | Write content to sandbox file |
| wait | tools/file_tools.py | Sleep for N seconds (for async MCP workflows) |
| convert_markdown_to_docx | tools/file_tools.py | Convert markdown to .docx |
| attach_file | tools/file_tools.py | Mark file for attachment to completion message |
| read_document | tools/document_tools.py | Read uploaded documents |
| memory_search | tools/memory_tools.py | Search the vector memory store |
| memory_save | tools/memory_tools.py | Save a memory to the vector store |
| dispatch_subagent | tools/subagent_tools.py | Dispatch a subtask to a background subagent |
| run_skill_script | tools/script_tool.py | Execute a skill's script with configurable timeout |
## LLM Provider Configuration
Provider selection checked in order of priority:
| Priority | Env Var | Provider | Default Model | Temperature |
|---|---|---|---|---|
| 1 | USE_GEMINI=true | Google Gemini (ChatGoogleGenerativeAI) | gemini-3.1-pro-preview-customtools | 1.0 |
| 2 | USE_OLLAMA=true | Ollama (ChatOllama) | gpt-oss:latest | 0 |
| 3 | Default | OpenAI (ChatOpenAI) | gpt-5-mini | 1.0 |
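The priority order can be sketched as a plain function over the environment (env handling simplified; actual model construction uses the LangChain chat classes named in the table):

```python
def select_provider(env: dict) -> tuple:
    """Return (provider, default_model, temperature) per the priority table."""
    if env.get("USE_GEMINI", "").lower() == "true":      # priority 1
        return "gemini", "gemini-3.1-pro-preview-customtools", 1.0
    if env.get("USE_OLLAMA", "").lower() == "true":      # priority 2
        return "ollama", "gpt-oss:latest", 0.0
    return "openai", "gpt-5-mini", 1.0                   # default
```

Note that `USE_GEMINI` wins even if both flags are set, since it is checked first.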
Embedding provider matches the active LLM provider:
| LLM Provider | Embedding Model | Implementation |
|---|---|---|
| Gemini | models/gemini-embedding-001 | GoogleGenerativeAIEmbeddings |
| Ollama | nomic-embed-text:latest | Custom OllamaEmbeddings |
| OpenAI | text-embedding-3-small | OpenAIEmbeddings |
## Persona System
Otto supports per-project personality customization via .otto/SOUL.md files.
### SOUL.md Format
The YAML frontmatter is optional. The body becomes the instructions field.
### Loading Behavior
- `load_persona(project_path)` checks `{project_path}/.otto/SOUL.md`.
- If found: `specialization` overrides the `SPECIALIZATION` env var; `instructions` overrides `ADDITIONAL_INSTRUCTIONS`.
- Results are cached per project path.
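The loading behavior can be sketched as below. This uses a naive frontmatter parser for illustration (flat `key: value` pairs only); the real loader may use a YAML library.

```python
from pathlib import Path

_persona_cache = {}  # cache keyed by project path


def load_persona(project_path: str):
    """Load {project_path}/.otto/SOUL.md with optional frontmatter, cached."""
    if project_path in _persona_cache:
        return _persona_cache[project_path]
    soul = Path(project_path) / ".otto" / "SOUL.md"
    persona = None
    if soul.exists():
        text = soul.read_text()
        persona = {"specialization": None, "instructions": text}
        if text.startswith("---"):  # optional YAML frontmatter
            header, _, body = text[3:].partition("---")
            persona["instructions"] = body.strip()  # body -> instructions
            for line in header.strip().splitlines():
                key, _, value = line.partition(":")
                persona[key.strip()] = value.strip()
    _persona_cache[project_path] = persona  # cache even a miss
    return persona
```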
## System Prompt Assembly
The agent node assembles the system prompt by combining these layers (in order):
| Layer | Source | Always Present |
|---|---|---|
| Base prompt | MODEL_PROMPT template with {specialization}, {input}, {requester_info}, {team_info} | Yes |
| Additional instructions | ADDITIONAL_INSTRUCTIONS env var or SOUL.md | If configured |
| Project context | Slack channel summary + pending messages | If project_id set |
| Skills context | Matched skill instructions | If skills matched |
| Team context | Channel member roles and responsibilities | If project has channel |
| Memory context | Top 3 relevant memories from vector store | If memories found |
| Wrap-up warning | Iteration count and remaining budget | At soft limit |