Memory
Otto maintains a persistent memory system that allows it to accumulate knowledge across tasks and sessions. Rather than starting from scratch every time, Otto can recall past decisions, project context, and learned information -- building organizational intelligence over time.
What Memory Is
Memory in Otto is a collection of text entries stored in a vector database. Each entry represents a piece of knowledge -- a completed task summary, a project decision, a user preference, or any information Otto determines is worth retaining.
Memories are stored as embedding vectors alongside their text content, enabling semantic search. When Otto needs to recall relevant information, it searches by meaning rather than exact keywords.
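As a rough sketch of what such storage could look like (the schema, table name, and helper below are assumptions for illustration, not Otto's actual code), a SQLite table can hold each entry's text alongside its embedding serialized as a float32 blob:

```python
import sqlite3
import struct

# Hypothetical schema: text, embedding blob, and comma-joined tags.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, text TEXT, embedding BLOB, tags TEXT)"
)

def save_memory(text: str, embedding: list[float], tags: list[str]) -> None:
    # Serialize the vector as packed float32 values for compact storage.
    blob = struct.pack(f"{len(embedding)}f", *embedding)
    conn.execute(
        "INSERT INTO memories (text, embedding, tags) VALUES (?, ?, ?)",
        (text, blob, ",".join(tags)),
    )

save_memory("Chose Postgres for the billing service", [0.1, -0.2, 0.3], ["billing", "decision"])
row = conn.execute("SELECT text, tags FROM memories").fetchone()
```

Storing the vector next to the text keeps every search result self-describing: once a row scores well, its text can be returned without a second lookup.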
How Otto Saves Memories
Otto saves memories in two ways:
Automatic Saves
When Otto completes a task, it automatically saves a summary of the work to the memory store. This includes what was requested, what approach was taken, and what the outcome was. No user action is required.
Explicit Saves
During task execution, Otto can use the memory_save tool to store specific pieces of information it encounters. For example, if Otto discovers an important fact during research or receives a key decision from a team member, it can save that to memory with relevant tags for future retrieval.
Tags improve retrieval accuracy. When a future search query contains words that match tags, those memories receive a relevance boost in the results.
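A hypothetical shape for an explicit memory_save tool call might look like the following (the argument names and project identifier are assumptions; Otto's actual tool schema may differ):

```python
# Illustrative payload for a memory_save tool call.
memory_save_call = {
    "tool": "memory_save",
    "arguments": {
        "text": "Team decided to ship the beta behind a feature flag.",
        "tags": ["beta", "feature-flag", "decision"],  # aid later retrieval
        "project_id": "proj-123",  # hypothetical project identifier
    },
}
```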
How Otto Recalls Memories
Otto searches memory using the memory_search tool with a natural language query. The search works through semantic similarity -- the query is converted to an embedding vector and compared against all stored memories using cosine similarity.
Results are ranked by:
- Semantic similarity -- how closely the memory's meaning matches the query.
- Tag boost -- memories with tags that appear in the search query receive a +0.1 similarity bonus.
- Project scope -- when a project ID is provided, results are filtered to that project's memories.
The agent's system prompt is also enriched with the top 3 most relevant memories from the store at the start of each task, providing ambient context without the agent needing to explicitly search.
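The ranking rules above can be sketched in a few lines of Python (a simplified illustration using the documented +0.1 tag boost and top-3 cutoff; the function and field names are assumptions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def rank_memories(query_vec, query_text, memories, project_id=None, top_k=3):
    scored = []
    query_words = set(query_text.lower().split())
    for m in memories:
        if project_id is not None and m.get("project_id") != project_id:
            continue  # project scope is a hard filter, not a score adjustment
        score = cosine(query_vec, m["embedding"])
        if query_words & {t.lower() for t in m["tags"]}:
            score += 0.1  # tag boost when a query word matches a tag
        scored.append((score, m["text"]))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

Note that the tag boost is additive on top of cosine similarity, so a tagged memory can outrank a slightly more similar untagged one.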
Project-Scoped Memory
Memories can be scoped to specific projects. When Otto is working within a project context (for example, a Slack channel), memories are tagged with the project ID and searches can be filtered accordingly.
This means Otto maintains separate knowledge bases for different teams or workstreams. A marketing project's memories do not pollute results when Otto is working on an engineering task.
How Project Context Is Built
When Otto is added to a Slack channel, it runs a background onboarding process:
- Channel message history is fetched and formatted.
- An LLM summarizes the history into a project-level summary.
- The summary is stored and used as context for future tasks in that channel.
- As new messages come in, they are buffered and periodically compacted into updated project memories.
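The onboarding flow above can be sketched as follows (a minimal illustration; summarize_with_llm is a stand-in for the actual LLM call, and the message shape is an assumption):

```python
def summarize_with_llm(history: str) -> str:
    # Placeholder: a real implementation would prompt an LLM here.
    first_line = history.splitlines()[0]
    return f"Project summary seeded from channel history ({first_line} ...)"

def onboard_channel(messages: list[dict]) -> str:
    # Format the fetched channel history, then summarize it into a
    # project-level memory entry.
    history = "\n".join(f"{m['user']}: {m['text']}" for m in messages)
    return summarize_with_llm(history)

summary = onboard_channel([
    {"user": "ana", "text": "Kickoff for the Q3 launch"},
    {"user": "ben", "text": "Design review is Friday"},
])
```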
Conversation Compaction
Separate from the persistent memory system, Otto uses conversation compaction to manage context within a single task's execution. As a task progresses and the conversation history grows, older messages are summarized rather than dropped. For additional context on how compaction fits into the agent's execution loop, see AI Agent -- Context Management.
How It Works
- Trigger. When the message history exceeds 12 messages, compaction activates.
- Split. Messages are divided into "older" and "recent" groups. The 8 most recent messages are kept verbatim. The split point is adjusted to avoid breaking tool-call sequences -- an AI message that requested a tool call stays together with the tool's response.
- Summarize. The older messages are summarized by a lightweight LLM call. The summary preserves the original intent, key decisions, tool results, the current approach, and any error-handling context.
- Replace. The older messages are replaced with a single system message containing the summary. Subsequent compaction rounds build incrementally on the cached summary.
- Fallback. If summarization fails for any reason, a safe trim fallback keeps the most recent messages and strips any orphaned tool responses from the beginning.
This approach preserves significantly more context than naive message truncation. The agent retains awareness of early decisions and findings even as the conversation grows long.
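The split step can be sketched like this (a simplified illustration, not Otto's actual code; it assumes each message carries a "role" field):

```python
MAX_RECENT_MESSAGES = 8  # documented default

def find_split(messages: list[dict]) -> int:
    """Return the index where 'older' ends and 'recent' begins.

    The split point walks backward so a tool response is never
    separated from the AI message that requested it.
    """
    split = max(0, len(messages) - MAX_RECENT_MESSAGES)
    while split > 0 and messages[split]["role"] == "tool":
        split -= 1
    return split
```

Everything before the returned index is summarized; everything from it onward is kept verbatim, with tool-call pairs guaranteed to land on the same side.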
Pending Messages Buffer
For Slack-integrated projects, messages sent to a channel between tasks are captured in a Redis-backed buffer. When Otto starts a new task in that project, pending messages are included in the agent's context so it is aware of recent team communication.
The buffer uses a FIFO queue per project with a 30-day TTL. Messages are consumed when they are included in a task's context or when they are compacted into a project summary.
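The buffer's semantics can be sketched with an in-memory stand-in (the real system uses Redis per-project queues with expiry; the names here are illustrative):

```python
import time
from collections import defaultdict

TTL_SECONDS = 30 * 24 * 3600  # 30-day TTL from the document

# In-memory stand-in for the Redis-backed, per-project FIFO buffer.
_buffers: dict[str, list[tuple[float, str]]] = defaultdict(list)

def buffer_message(project_id: str, text: str) -> None:
    _buffers[project_id].append((time.time(), text))  # FIFO append

def consume_pending(project_id: str) -> list[str]:
    """Drain the buffer for a project, dropping anything past the TTL."""
    now = time.time()
    fresh = [t for ts, t in _buffers[project_id] if now - ts < TTL_SECONDS]
    _buffers[project_id] = []  # consumed once included in a task's context
    return fresh
```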
Vector Store Details
Otto's memory store uses SQLite with vector embeddings for persistence. This is a lightweight, file-based approach that requires no external database server.
Key characteristics:
- Embedding dimensions: 768 (default)
- Similarity metric: Cosine similarity via the simsimd library (SIMD-accelerated)
- Embedding providers: Matches the configured LLM provider
  - Google Gemini: gemini-embedding-001
  - Ollama: nomic-embed-text
  - OpenAI: text-embedding-3-small
- Storage: Single SQLite file at the path configured by OTTO_MEMORY_DB
Configuration
| Variable | Default | Description |
|---|---|---|
| OTTO_MEMORY_DB | <LOCAL_STORAGE_PATH>/otto_memory.db | File path for the SQLite memory database |
| COMPACTION_THRESHOLD | 12 | Message count in a task before conversation compaction activates |
| MAX_RECENT_MESSAGES | 8 | Number of recent messages kept verbatim during compaction |
The memory database is created automatically on first use. No manual setup is required beyond setting the file path.
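A minimal configuration might look like the following (the two thresholds show the documented defaults; the database path is illustrative):

```shell
# Example Otto memory configuration
export OTTO_MEMORY_DB=/var/lib/otto/otto_memory.db
export COMPACTION_THRESHOLD=12
export MAX_RECENT_MESSAGES=8
```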
The embedding provider is determined by your LLM configuration -- if you are using Gemini for the agent, Gemini embeddings are used for memory. The same applies for OpenAI and Ollama.