Eight providers compared — hosting, free tiers, benchmarks, architecture, and a concrete pick for each major use case.
🕐 Last verified April 2026. Scroll right on mobile; "Best for" and pricing are the most decision-relevant columns.
| Provider | Best for | Free tier | Paid from | Hosting | License | Stars | Tools | Benchmark |
|---|---|---|---|---|---|---|---|---|
| Mem0 | All-around default | 10K adds / 1K recalls | $19/mo Starter · $249 Pro (Graph) | Cloud + self-host | Apache 2.0 | 51.4K ⭐ | 3 | LongMemEval-S 67.6% |
| Hindsight | Best benchmarks, coding | Full local, free | $15/M retain · $0.75/M recall · $3/M reflect | Local + cloud | MIT | 2.4K ⭐ | 4 | LongMemEval 91.4–94.6% · BEAM 64.1% · LoCoMo 89.6% |
| ByteRover | Multi-hop / temporal | Local CLI free | $19/mo Pro · $35/user/mo Team | Local + cloud | Partial OSS | 4.2K ⭐ | 3 | LoCoMo 92.2% · single-hop 95.4% · temporal 94.4% |
| Supermemory | Search-heavy / RAG | 1M tokens · 10K searches | $19/mo Pro · $399/mo Scale | Cloud only | Proprietary | ~18K ⭐ | 4 | LongMemEval 81.6% (with GPT-4o) |
| Holographic | Zero cost, privacy-first | Fully local, free | — | Local only | MIT | — | 2 | — |
| OpenViking | On-prem / air-gapped | Self-host, free | — | Self-host only | AGPL-3.0 | ~17.9K ⭐ | 5 | — |
| Honcho | User modeling | $100 free credits | $2/M ingested · $0.001+/query | Cloud + self-host | AGPL-3.0 | 414 ⭐ | 4 | — |
| RetainDB | Structured schema recall | None | $20/mo | Cloud only | Proprietary | — | 5 | — |
Architecture, pricing, tools exposed, and what each provider actually does well.
**Mem0.** Hybrid triple-store: vector + key-value + knowledge graph. An LLM pass extracts structured facts from conversations and stores them across all three layers. The most widely integrated option, with first-class Python, TypeScript, and OpenAI-compatible SDKs.
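A toy sketch of the triple-store idea (illustrative only; the class and field names below are invented, not Mem0's API): one extracted fact fans out to a vector index, a key-value map, and a knowledge graph, so different query styles can each hit their natural layer.

```python
from dataclasses import dataclass, field

@dataclass
class TripleStoreMemory:
    """Hypothetical hybrid store: each fact is written to all three layers."""
    vectors: list = field(default_factory=list)   # (embedding, fact) pairs
    kv: dict = field(default_factory=dict)        # key -> fact
    graph: list = field(default_factory=list)     # (subject, relation, object)

    def add(self, key, fact, embedding, triple):
        self.vectors.append((embedding, fact))
        self.kv[key] = fact
        self.graph.append(triple)

mem = TripleStoreMemory()
mem.add(
    key="user.editor",
    fact="Alice prefers VS Code",
    embedding=[0.1, 0.7, 0.2],                    # stand-in for a real embedding
    triple=("Alice", "prefers", "VS Code"),
)
print(mem.kv["user.editor"])   # exact key-value recall
print(mem.graph[0])            # graph-style recall by relation
```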
**Hindsight.** TEMPR architecture: four parallel retrieval strategies (temporal, entity, metadata, and BM25 for exact keyword matches). Strong at structured technical recall: port numbers, error codes, service names, deployment configs. Three-stage pipeline: retain (ingest) → recall (retrieve) → reflect (synthesize across stored knowledge).
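A minimal sketch of the parallel-strategies idea, assuming nothing about Hindsight's internals: two toy scorers (a small BM25 and a crude entity matcher) run over the same memories and merge by max score, so an exact identifier like a service name can win even when lexical overlap is otherwise weak.

```python
import math

DOCS = [
    "service auth-api listens on port 8443",
    "redis cache deployed 2024-06-01 in us-east",
    "error code E502 means upstream timeout",
]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Minimal BM25 over whitespace tokens (illustrative, not Hindsight's code)."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    scores = [0.0] * n
    for term in query.lower().split():
        df = sum(term in t for t in tokenized)
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        for i, toks in enumerate(tokenized):
            tf = toks.count(term)
            scores[i] += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(toks) / avgdl))
    return scores

def entity_scores(query, docs):
    """Crude entity strategy: match query tokens that look like identifiers."""
    ents = {t for t in query.lower().split()
            if any(c.isdigit() for c in t) or "-" in t}
    return [sum(t in d.lower().split() for t in ents) for d in docs]

query = "which port does auth-api use"
# Run the strategies in parallel and merge by max score per document.
merged = [max(b_, e_) for b_, e_ in zip(bm25_scores(query, DOCS), entity_scores(query, DOCS))]
best = DOCS[merged.index(max(merged))]
print(best)  # the exact-keyword strategies surface the auth-api port fact
```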
**ByteRover.** Leads the LoCoMo benchmark, which is specifically designed for multi-hop and temporal reasoning across long conversation histories. The local CLI is open source and free; cloud sync for cross-device persistence costs $19/mo. Built with coding agents as the primary use case.
**Supermemory.** Optimized for search-heavy workloads. Ingests content from many sources and surfaces it via semantic search. Generous free tier (1M tokens). The star count reflects the consumer app frontend; the core memory engine is closed source. Strong LongMemEval score, but behind Hindsight.
**Holographic.** Uses Holographic Reduced Representations (HRR) algebra on a local SQLite + FTS5 store. Zero external dependencies: no API keys, no network calls, no Docker. Memory lives in a single file in your Hermes home directory, making it the most private option by definition. The fact_store tool exposes 9 actions: add, search, probe, related, reason, contradict, update, remove, list.
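The HRR primitive itself is easy to demonstrate. A pure-Python sketch (not this provider's code): bind a role to a filler with circular convolution, then approximately recover the filler by probing the trace with the role via circular correlation.

```python
import random

def circ_conv(a, b):
    """Circular convolution: the HRR 'bind' operator."""
    n = len(a)
    return [sum(a[k] * b[(i - k) % n] for k in range(n)) for i in range(n)]

def circ_corr(a, b):
    """Circular correlation: the approximate 'unbind' operator."""
    n = len(a)
    return [sum(a[k] * b[(i + k) % n] for k in range(n)) for i in range(n)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

random.seed(0)
dim = 512

def rand_vec():
    # HRR vectors: i.i.d. Gaussian elements with variance 1/dim
    return [random.gauss(0, 1 / dim ** 0.5) for _ in range(dim)]

role, filler, other = rand_vec(), rand_vec(), rand_vec()
trace = circ_conv(role, filler)    # store "role:filler" as one fixed-size vector
decoded = circ_corr(role, trace)   # probe with the role to recover the filler

print(cosine(decoded, filler))  # high similarity: the filler comes back
print(cosine(decoded, other))   # near zero: unrelated vectors stay unrelated
```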
**OpenViking.** Tiered context loading by resolution depth: L0 loads ~50-token abstracts, L1 loads ~500-token overviews, L2 loads full content on demand. Only the detail level needed for each query gets pushed into the context window; that's the mechanism behind the 80–90% token savings. Self-hosted only, AGPL. Requires Docker and an LLM provider for extraction.
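The tiering mechanism can be sketched in a few lines. The routing heuristic below is an assumption for illustration, not OpenViking's actual logic: shallow questions get abstracts, deeper questions get overviews or full content.

```python
# Toy memory store with three resolution tiers per document.
MEMORY = {
    "deploy-runbook": {
        "L0": "Abstract: production deploy steps (~50 tokens).",
        "L1": "Overview: build, canary, then full rollout; rollback via blue/green.",
        "L2": "Full runbook text (thousands of tokens in a real store).",
    },
}

def load_context(doc_id, depth):
    """Return only the resolution tier the query needs (L0 < L1 < L2)."""
    return MEMORY[doc_id][depth]

def route(query):
    """Hypothetical depth router -- keyword heuristic, purely illustrative."""
    q = query.lower()
    if "exact" in q or "step" in q:
        return "L2"
    if "how" in q or "why" in q:
        return "L1"
    return "L0"

print(load_context("deploy-runbook", route("what docs do we have?")))  # abstract tier
print(load_context("deploy-runbook", route("how do we roll back?")))   # overview tier
```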
**Honcho.** Three specialized LLM agents: Deriver (extracts user preferences), Dialectic (surfaces them in context), Dreamer (synthesizes across sessions). The only provider focused on building a persistent user model ("dialect") rather than just storing facts. Available as cloud or self-hosted via the AGPL repo.
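A toy sketch of the three-agent pipeline. The function names mirror the agents, but the extraction logic is a keyword stand-in, not Honcho's implementation (which uses LLM calls at each stage).

```python
from collections import Counter

def deriver(transcript):
    """Extract candidate facts from one session (stub: lines tagged 'pref:')."""
    return [ln.split("pref:", 1)[1].strip() for ln in transcript if "pref:" in ln]

def dreamer(session_facts):
    """Synthesize across sessions: promote facts observed more than once."""
    counts = Counter(f for facts in session_facts for f in facts)
    return sorted(f for f, c in counts.items() if c >= 2)

def dialectic(user_model, query):
    """Surface the slice of the user model relevant to the current query."""
    words = set(query.lower().split())
    return [f for f in user_model if words & set(f.lower().split())]

sessions = [
    deriver(["pref: replies should be concise", "let's debug the build"]),
    deriver(["pref: replies should be concise", "pref: python over js"]),
]
model = dreamer(sessions)                        # persistent cross-session model
print(dialectic(model, "how concise should replies be?"))
```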
**RetainDB.** Database-style memory with a structured schema. Explicit control over what gets stored and how it's queried; more like a managed database than an LLM memory layer. No free tier. Domain availability was inconsistent at the time of writing; verify before depending on it.
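A hypothetical sketch of schema-constrained memory (not RetainDB's actual API): writes must match a declared schema, and reads are exact queries rather than semantic search.

```python
from dataclasses import dataclass

@dataclass
class Preference:
    """An explicit schema: only fields declared here can be stored."""
    user: str
    key: str
    value: str

class SchemaMemory:
    """Toy database-style memory layer, purely illustrative."""
    def __init__(self):
        self.rows = []

    def insert(self, row: Preference):
        self.rows.append(row)

    def query(self, **where):
        """Exact-match filtering, like a WHERE clause."""
        return [r for r in self.rows
                if all(getattr(r, k) == v for k, v in where.items())]

db = SchemaMemory()
db.insert(Preference(user="alice", key="editor", value="vscode"))
print(db.query(user="alice", key="editor")[0].value)  # prints "vscode"
```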
Ranked by fit for each scenario — not by partnership or popularity.
- **Coding agents:** state across long sessions, multi-hop reasoning, exact technical recall (ports, configs, error codes).
- **Search-heavy / RAG:** indexing a large, growing body of content and surfacing the relevant slice at query time.
- **All-around default:** the best option for most use cases when you don't have a strong constraint pushing you elsewhere.
- **Zero cost, instant start:** you need memory working now with no billing setup, or your budget is zero.
- **Privacy / air-gapped:** data cannot leave your infrastructure. No external API calls, no cloud.
- **User modeling:** the agent needs to learn who you are (your style, preferences, working patterns) and apply that across every session.
- **Enterprise:** SSO, audit logs, SLA, on-prem support, no AGPL/copyleft risk in your product.
Add to your config.yaml — full docs at the link below each snippet.
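A minimal illustrative snippet; the key names below are assumptions for illustration, since the actual schema lives in the linked docs.

```yaml
# Illustrative only -- key names are assumed, not the documented schema.
memory:
  provider: mem0              # one of the providers compared above
  api_key: ${MEM0_API_KEY}    # keep credentials in the environment
```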
Full config reference and advanced options at hermes-agent.nousresearch.com/docs/…/memory-providers
When a single dimension drives the decision.
Full config reference, advanced options, and provider-specific setup guides in the docs.
Memory provider docs →