Run LLM Agents That Remember: Setting Up Memori for Persistent, Multi-User, Multi-Session Apps
Forget stateless chatbots. With Memori, you can build LLM apps that actually remember user context—across sessions, users, and even agent personas. Set this up right, and your models stop treating every message like the first. Here’s how to get persistent, multi-user LLM memory running in your own Google Colab notebook, according to MarkTechPost.
Prepare Your Environment for Building Persistent LLM Applications with Memori
Start in a Clean Google Colab Notebook
Colab gives you an isolated, fresh Python environment that avoids dependency clashes. This is crucial: Memori's latest features require recent package versions.

Install Required Packages
Use pip to get the right toolchain. Paste this into a Colab cell:

```python
import subprocess, sys

def _pip(*pkgs):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", *pkgs])

_pip("memori>=3.3.0", "openai>=1.40.0", "nest_asyncio")
```

This brings in Memori, OpenAI's SDK, and `nest_asyncio` for async compatibility.

Set API Keys Securely
Don’t hardcode secrets. Use:

```python
import os, getpass

if not os.getenv("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY: ")

if not os.getenv("MEMORI_API_KEY"):
    v = getpass.getpass("MEMORI_API_KEY (leave blank for rate-limited tier): ")
    if v.strip():
        os.environ["MEMORI_API_KEY"] = v.strip()
    else:
        print("→ No MEMORI_API_KEY set. Continuing with rate-limited tier.")
```

This prompts you for keys at runtime. No accidental GitHub leaks.
Enable Async in the Colab Runtime
Run:

```python
import nest_asyncio
nest_asyncio.apply()
```

This sidesteps Colab’s event loop limitations, letting you test async LLM calls.
Watch out for:
- Typos in package names or missing API keys will halt progress immediately.
- Free-tier Memori API keys are rate-limited, which can slow down multi-session testing.
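A minimal readiness check can catch these pitfalls before they halt a run. This is a sketch using only the standard library; the package and environment-variable names come from the setup cells above, and the `check_ready` helper itself is illustrative, not part of Memori.

```python
import importlib.util
import os

def check_ready(packages=("memori", "openai", "nest_asyncio"),
                env_vars=("OPENAI_API_KEY",)):
    """Return a list of readable problems; an empty list means you're set."""
    problems = []
    for pkg in packages:
        # find_spec returns None (instead of raising) when a package is absent.
        if importlib.util.find_spec(pkg) is None:
            problems.append(f"missing package: {pkg}")
    for var in env_vars:
        if not os.getenv(var):
            problems.append(f"missing env var: {var}")
    return problems

print(check_ready())
```

Run it right after the install and key cells; an empty list means the environment is ready.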
Analysis:
These steps ensure your coding environment won’t trip over missing dependencies or async pitfalls. The source’s use of `nest_asyncio` is key: without it, nested async LLM calls fail inside Colab’s already-running event loop.
Initialize Memori as the Agent-Native Memory Layer for Your LLM Application
Import and Instantiate Memori
```python
from memori import Memori

mem = Memori()
```

This single object will intercept and track all your LLM interactions.
Understand Attribution: User and Role Separation
Memori uses `entity_id` (user identity) and `process_id` (agent role) to isolate memory.
Example:

```python
mem.attribution(entity_id="[email protected]", process_id="personal-assistant")
```

Set Session IDs for Granular Context
For each thread or topic, generate unique session IDs:

```python
import uuid

session_id = f"project-fastapi-{uuid.uuid4().hex[:8]}"
mem.set_session(session_id)
```

This lets you scope memory to, say, a specific project or conversation.
Test Initialization
No need for complex setup. If `mem` instantiates and accepts attribution, you’re ready. Any errors here usually mean a missing or invalid API key.
Why it matters:
This setup makes it trivial to store and recall facts per user, per agent persona, and per session. You’re not just tossing everything into a single memory bucket—context stays sharp and relevant.
What we know:
The source demonstrates that Memori’s memory can persist facts and isolate them by user and agent role, tested explicitly with multiple real and synthetic identities.
Connect Memori to Synchronous and Asynchronous OpenAI Clients for Seamless Memory Integration
Register OpenAI Clients
```python
from openai import OpenAI, AsyncOpenAI

client = OpenAI()
async_client = AsyncOpenAI()

mem.llm.register(client)
mem.llm.register(async_client)
```

Both sync and async clients are supported, meaning you can run blocking or concurrent model calls.
Route All Model Calls Through Memori
Any call to `client.chat.completions.create()` or its async twin is now intercepted by Memori. This is automatic after registration, with no wrapper code needed.

Implement Helper Functions
Clean up repeated logic:

```python
MODEL = "gpt-4o-mini"

def ask(prompt, system=None):
    msgs = []
    if system:
        msgs.append({"role": "system", "content": system})
    msgs.append({"role": "user", "content": prompt})
    r = client.chat.completions.create(model=MODEL, messages=msgs)
    return r.choices[0].message.content
```

Handle Streaming and Async Calls
With the async client and proper attribution, you can support streaming responses and concurrent sessions. The source tests this in both basic and advanced scenarios.

Error Handling
If registration fails, double-check API keys and Memori version. In the provided code, errors at this stage typically stem from misconfigured authentication or version mismatches.
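The concurrency win from registering the async client can be sketched with plain asyncio. `fake_completion` below is only a stand-in for a call through the registered async client (the real call would go through `async_client.chat.completions.create(...)` as shown above); in Colab, `nest_asyncio.apply()` is what makes `asyncio.run()` usable.

```python
import asyncio

async def fake_completion(prompt):
    # Stand-in for a registered async client call; the sleep simulates
    # network latency so the concurrency pattern is observable.
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"

async def main():
    # Fire several "model calls" at once instead of serially.
    prompts = ["summarize my notes", "what's my 5K goal?", "list my allergies"]
    return await asyncio.gather(*(fake_completion(p) for p in prompts))

results = asyncio.run(main())
print(results)
```

Swapping `fake_completion` for a real async helper keeps the same `asyncio.gather` shape while Memori records each call under the active attribution.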
What remains unclear:
- The internal mechanism by which Memori intercepts and persists context is not described in the source.
- The source does not specify limits on concurrent sessions or memory storage.
Analysis:
The dual sync/async support is significant. Many memory layers only cover one or the other, but Memori’s integration with both lets you scale your app’s architecture—from single-user bots to concurrent multi-agent systems—without code rewrites.
Build Persistent Multi-User and Multi-Session LLM Applications Leveraging Memori’s Memory Infrastructure
Isolate Memory by User and Agent Role
Switch attribution before every interaction.
Example (from the source):

```python
mem.attribution(entity_id="[email protected]", process_id="personal-assistant")
ask("My name is Alice. I love hiking, Italian food, and I'm allergic to peanuts.")

mem.attribution(entity_id="[email protected]", process_id="personal-assistant")
ask("I'm Bob. Vegetarian, write Rust for a living, live in Berlin.")
```

Test Context Recall
After storing facts, prompt the model to recall them:

```python
mem.attribution(entity_id="[email protected]", process_id="personal-assistant")
print("[Alice]", ask("What's my favorite cuisine and any dietary issues?"))
```

The model’s answer should reflect only Alice’s context; Bob’s data stays walled off.
Support Multiple Agent Personas per User
Use different `process_id` values for the same user:

```python
mem.attribution(entity_id="[email protected]", process_id="fitness-coach")
ask("Goal: sub-25-minute 5K by June. Currently I run 30 minutes flat.")
```

Switch roles, and each context persists independently.
Group Multi-Turn Sessions
For each new thread, assign a fresh session:

```python
mem.set_session(session_id)
ask("Notes: building a FastAPI app called 'Lighthouse', Python 3.12, deploying to Fly.io.")
ask("Decision: SQLAlchemy + Alembic for the data layer.")
```

Later, retrieve facts by returning to the same session ID.
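One hypothetical way to keep session IDs reusable across a notebook is a small registry helper. The `session_for` function and its dict are illustrative conveniences, not part of Memori's API; they just make "return to the same session ID" a one-liner.

```python
import uuid

# Illustrative registry: maps a topic name to a stable session ID.
_sessions = {}

def session_for(topic):
    """Return a stable session ID for a topic, creating one on first use."""
    if topic not in _sessions:
        _sessions[topic] = f"{topic}-{uuid.uuid4().hex[:8]}"
    return _sessions[topic]

sid = session_for("project-fastapi")
assert session_for("project-fastapi") == sid  # same topic, same session
```

With this, `mem.set_session(session_for("project-fastapi"))` always lands in the same scoped memory for that project.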
Validate Isolation and Persistence
The source confirms that facts do not bleed between users, roles, or sessions. Multi-turn memory persists even after context switches, as long as you maintain correct attribution and session assignment.
What to watch:
- Rate limits: Free Memori API keys can throttle multi-session tests.
- Memory bloat: The source does not discuss pruning or scaling strategies if user/session counts grow large.
Analysis:
This setup is more than just a demo. It’s a template for any SaaS app requiring persistent, context-aware AI—think customer support, tutoring, or personal assistants that remember and adapt.
Recap Key Steps to Implement Agent-Native Memory Infrastructure with Memori for LLMs
You’ve now seen how to:
- Set up a Colab environment that’s ready for persistent memory integration.
- Instantiate Memori and configure it for strict user, agent, and session isolation.
- Register both synchronous and asynchronous OpenAI clients, making every LLM call context-aware by default.
- Build and validate multi-user, multi-session apps that remember facts and keep contexts separate.
What remains to be explored is how Memori performs under real-world scaling and whether it needs manual memory management as user/session counts grow. The source does not address these operational questions, but the coding pattern is robust for prototyping and early deployment scenarios.
Next action:
Experiment with more complex session flows and user/role combinations. Stress-test memory boundaries if you plan to scale. Memori’s architecture, as documented by MarkTechPost, offers a clear path to LLM agents that finally act less like goldfish and more like attentive collaborators.
Key Takeaways
- Memori enables LLM applications to retain user context across sessions, improving personalization and continuity.
- Setting up persistent memory infrastructure allows multi-user and multi-agent environments, unlocking richer AI experiences.
- Securely managing API keys and dependencies in Colab ensures safe and reliable deployment of advanced LLM memory features.



