How to Build an Obsidian-Style 3D Agent Memory Graph Using Gemini Video API and Claude Code
Screen-record Obsidian, send the video to Gemini's video understanding API, ask Claude Code to replicate it. Zero graph libraries required.
You Can Build a 3D Agent Memory Graph by Showing Gemini a Screen Recording
The trick is embarrassingly simple: screen-record Obsidian’s graph view using Loom, feed the video file to Gemini’s video understanding API, and ask Claude Code to replicate what it sees. No graph library research. No hours spent reading D3.js documentation. You describe what you want by showing it.
This is the core technique behind the 3D agent memory visualization that Mark Kashef built as part of his multi-agent operating system — a system where every agent task, memory, conversation, and scheduled job lives in a local SQLite database with zero cloud database costs. The 3D graph is the most visually striking piece, but the method used to build it is the part worth stealing.
The approach generalizes. If you can record it, Gemini can describe it, and Claude Code can build it.
The Technique: Video-to-Code via Gemini’s Understanding API
Here’s the exact sequence. You open Obsidian, navigate to the graph view, and screen-record yourself exploring it — clicking nodes, watching edges animate, filtering by tag. You narrate as you go: “This is what I love about the Obsidian graph view. Each bullet is a project or task. I want this but synchronized in real time to what my agents are doing.”
You download the Loom recording as an .mp4. You pass it to Gemini’s video understanding API using the Gemini skill inside Claude Code. Then you prompt Claude Code: build this, but wire it to my hive mind database instead of Obsidian’s vault.
Gemini’s video understanding API processes the full temporal sequence of the recording — not just a frame grab. It can describe the interaction patterns, the visual hierarchy, the animation behavior when nodes are filtered. That’s the signal Claude Code needs to reconstruct the UI from scratch.
The Gemini skill in this system is a Claude Code plugin that exposes Gemini’s model family, including the video understanding API and image generation via Nano Banana. Once the skill is installed globally, every agent in the system inherits it. That’s the architecture: global skills propagate automatically, so you build the capability once.
If you’ve been following the Claude Code memory architecture work that surfaced from the source leak, this pattern of layering capabilities through skills rather than baking them into individual agents will feel familiar. The skill is the unit of reuse.
What the Graph Is Actually Visualizing
Before you build the visualization, you need to understand what it’s pointing at.
The underlying data structure is a shared SQLite database — the hive mind — that stores every agent task, every memory, every conversation, and every scheduled cron job locally. There’s no Supabase, no Neon, no cloud database bill. The entire multi-agent OS runs on a file on your laptop.
Each node in the 3D graph represents a task that an agent has completed. The edges represent relationships — which agent ran the task, which memories were involved, which other tasks share context. The graph is live: as agents complete work and write to SQLite, the visualization updates.
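The node/edge derivation can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `tasks` table with `id`, `agent`, and `summary` columns (the actual hive mind schema isn’t published), and using only the shared-agent relationship for edges:

```python
import sqlite3
from itertools import combinations

def build_graph(conn: sqlite3.Connection):
    """Derive graph nodes and edges from completed tasks.

    Nodes are tasks; edges connect tasks handled by the same agent.
    Column names are illustrative -- adapt to your actual schema.
    """
    rows = conn.execute("SELECT id, agent, summary FROM tasks").fetchall()
    nodes = [{"id": r[0], "agent": r[1], "label": r[2]} for r in rows]

    # Group task ids by agent, then link every pair within a group.
    by_agent: dict[str, list[int]] = {}
    for task_id, agent, _ in rows:
        by_agent.setdefault(agent, []).append(task_id)
    edges = [
        {"source": a, "target": b}
        for ids in by_agent.values()
        for a, b in combinations(ids, 2)
    ]
    return nodes, edges

# Tiny demo against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, agent TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO tasks (agent, summary) VALUES (?, ?)",
    [("comms", "Drafted reply"), ("comms", "Archived thread"), ("meta", "Pulled ad report")],
)
nodes, edges = build_graph(conn)
print(len(nodes), len(edges))  # 3 nodes, 1 edge (the two comms tasks)
```

A production version would also link tasks through shared memories and conversation context, but the shape is the same: the graph is a pure function of the database.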
The system also has a 2D version (closer to Obsidian’s flat graph view) and a list view that shows every action of every agent in a table. The list view is the foundation. If the table is populated correctly and agents are logging in real time, the 2D and 3D views are just rendering layers on top. The creator is explicit about this: “If the list view is actually working… everything else is just additive.”
This matters for how you build it. Don’t start with the 3D graph. Start with the SQLite schema and make sure agents are writing to it reliably. The visualization is the last 20% of the work.
Why This Method Beats Starting from a Library
The conventional path to building a graph visualization is: pick a library (D3.js, Three.js, Sigma.js, Cytoscape), read the docs, adapt an example, wire in your data. That path takes days if you’re not already fluent in the library’s API.
The video-to-code path takes an afternoon. You’re not asking Claude Code to invent a graph visualization from first principles. You’re giving it a reference implementation — a working example captured on video — and asking it to replicate the behavior. The model’s job shifts from design to translation.
There’s a deeper principle here. Claude Code is better at “make it look and behave like this” than “design something that looks good.” Showing is more precise than describing. The Gemini video API is the bridge between showing and prompting.
One caveat the creator mentions: the 3D version is resource-intensive, consuming significantly more rendering power than the flat version. If you’re on a machine with a limited GPU, the 2D version is the practical choice. Know your hardware before you commit to an implementation.
The Non-Obvious Detail: Gemini Flash Does the Cheap Work
The 3D visualization is the flashy part, but the more instructive detail is how Gemini models are used throughout this system for the tasks where cost matters.
The mission control kanban’s auto-assign feature uses Gemini 3 Flash — described explicitly as “the cheapest model from Gemini” — to classify tasks to agents. The system prompt is dynamic: it includes descriptions of all current agents and asks the model to route the task. The cost is negligible. The classification is good enough.
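The dynamic part of that system prompt could be assembled like this — a sketch with made-up agent descriptions, not the actual prompt from the system:

```python
def build_routing_prompt(agents: dict[str, str], task: str) -> str:
    """Assemble a classification prompt that lists every current agent.

    `agents` maps agent name -> one-line description. The roster is
    rebuilt on every call, so newly created agents become routable
    without any prompt maintenance.
    """
    roster = "\n".join(f"- {name}: {desc}" for name, desc in agents.items())
    return (
        "You are a task router. Pick exactly one agent for the task.\n"
        f"Available agents:\n{roster}\n\n"
        f"Task: {task}\n"
        "Answer with only the agent name."
    )

prompt = build_routing_prompt(
    {"comms": "email, WhatsApp, and messaging",
     "meta": "Meta Ads reporting and campaign queries"},
    "Pull yesterday's ad spend by campaign",
)
# The assembled prompt is then sent to a cheap model for routing.
```

The point of the pattern is that the prompt is regenerated from the agents table on every classification, so routing quality tracks the current roster for free.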
The agent suggestion system works the same way. Gemini Flash scans JSON conversation history, identifies overburdened agents (the comms agent handling WhatsApp, school, mail, and everything else), and recommends new agents to create. You’re feeding entire JSON files of conversation history into a large context window and asking for pattern recognition. Gemini Flash’s context window makes this practical; its price makes it sustainable.
The war room’s /standup command pings all agents simultaneously, each one querying its own slice of the SQLite database and returning a 24-hour status report. The /discuss command opens a multi-agent conversation where each agent has context on the previous responses. /pin designates a lead agent who frames every reply. These slash commands are just system prompts with database queries attached — the sophistication is in the data, not the prompting.
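Under the hood, each agent’s standup slice could be little more than a time-windowed query. A sketch, assuming a hypothetical `tasks` table with `agent`, `summary`, and ISO-timestamp `completed_at` columns:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def standup_report(conn: sqlite3.Connection, agent: str) -> list[str]:
    """Return the agent's completed-task summaries from the last 24 hours."""
    cutoff = (datetime.now(timezone.utc) - timedelta(hours=24)).isoformat()
    rows = conn.execute(
        "SELECT summary FROM tasks WHERE agent = ? AND completed_at >= ? "
        "ORDER BY completed_at",
        (agent, cutoff),
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (agent TEXT, summary TEXT, completed_at TEXT)")
now = datetime.now(timezone.utc)
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?)",
    [
        ("comms", "Cleared inbox", now.isoformat()),
        ("comms", "Old task", (now - timedelta(days=3)).isoformat()),
    ],
)
print(standup_report(conn, "comms"))  # only the fresh task survives the cutoff
```

ISO-8601 timestamps in a single timezone compare correctly as strings, which is why a plain `>=` works here. The /standup command would run this per agent and hand each slice back to that agent for summarization.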
The Anthropic SDK bridge is what connects a Claude Code subscription to the Telegram interface. You’re not paying for a separate API tier; you’re routing your existing subscription through the bridge so every skill, every project, every plugin is accessible from your phone.
Building the Memory Layer That Feeds the Graph
The graph is only as interesting as the memory system behind it. This system uses five to six layers organized by importance, salience, and recency. Salience is the technical term for how significant a memory is — not just when it was created, but how much it should influence future behavior.
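One common way to combine salience with recency is exponential decay. The scoring function below is hypothetical — the system’s actual layer weights aren’t disclosed — but it shows the mechanics:

```python
from datetime import datetime, timezone

def memory_score(salience: float, created_at: datetime,
                 half_life_days: float = 30.0) -> float:
    """Score a memory: salience discounted by age.

    Salience (0..1) is how much the memory should influence future
    behavior; recency decays it with a configurable half-life, so a
    highly salient memory fades slowly and a trivial one fades fast.
    """
    age_days = (datetime.now(timezone.utc) - created_at).total_seconds() / 86400
    decay = 0.5 ** (age_days / half_life_days)
    return salience * decay

# A pinned memory could simply bypass decay (score == salience);
# an archival layer could catch anything whose score drops below a floor.
```

Ranking memories by this score at retrieval time gives you the importance/salience/recency layering without ever deleting a row.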
The memory tab is searchable. Type “Gmail” and you get every memory related to Gmail across all agents. You can also run an insights pass: a cheap language model scans your memory corpus and derives patterns you haven’t noticed — the equivalent of running /insights in Claude Code to get a 30-day behavioral summary.
The creator’s advice on memory architecture is worth quoting directly: have Claude Code interview you about how you want to handle fresh memories versus fading memories. Do important memories get pinned permanently? Do fading memories decay to nothing or get archived? These are design decisions, not technical ones. The technical implementation follows from the decisions.
For the graph to be meaningful, agents need to log their tasks with enough semantic richness that the nodes are distinguishable. A task logged as “completed email” is a dead node. A task logged with the agent name, the skill used, the outcome, and a summary is a node with edges worth rendering.
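A logging helper can enforce that richness at the write site. A sketch with illustrative column names:

```python
import sqlite3
from datetime import datetime, timezone

def log_task(conn: sqlite3.Connection, *, agent: str, skill: str,
             outcome: str, summary: str) -> None:
    """Write a completed task with enough context to render as a graph node.

    Keyword-only, required arguments make it hard to log a bare
    "completed email": every field a node or edge might need is
    captured up front.
    """
    conn.execute(
        "INSERT INTO tasks (agent, skill, outcome, summary, completed_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (agent, skill, outcome, summary,
         datetime.now(timezone.utc).isoformat()),
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (agent TEXT, skill TEXT, outcome TEXT, "
    "summary TEXT, completed_at TEXT)"
)
log_task(conn, agent="meta", skill="meta-ads-cli", outcome="success",
         summary="Generated 7:30am spend report linking 3 underperforming ads")
```

The discipline lives in the function signature, not in agent instructions — agents can’t write a dead node because the insert path won’t accept one.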
If you want to go deeper on memory architecture for Claude Code specifically, the self-evolving memory system using Obsidian and Claude Code hooks covers the session-log-to-wiki pipeline that complements this approach well.
Agent Configuration: The YAML + claude.md Pattern
Each agent in this system is two files: a claude.md that defines its role, personality, and instructions, and a YAML file that defines its configuration: model, skills, and rules. That’s the entire agent definition.
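The shape of the pair might look like the following. The field names here are illustrative — the actual config format isn’t shown in the source:

```yaml
# agents/meta/config.yaml -- hypothetical field names
name: meta
model: claude-sonnet
skills:
  global: inherit        # picks up the Gemini skill, etc. automatically
  local:
    - meta-ads-cli
rules:
  - "Send the daily campaign report via Telegram at 7:30am"
```

The matching claude.md would hold the prose: role, personality, and instructions. Everything machine-readable goes in the YAML; everything the model reads as context goes in the markdown.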
Skills can be global (every agent inherits them) or project-level (specific to one agent or task). The Gemini skill is global. The Meta Ads CLI skill — which lets Claude Code query the full Meta campaign API and generate a daily report with hyperlinks to specific ads, sent via Telegram at 7:30am — is specific to the meta agent.
Creating a new agent takes five steps: write the name and description in the UI, create a new bot with Telegram’s BotFather, copy the token, paste it into the frontend, and activate. The agent is live. It inherits all global skills immediately.
This is the architecture decision that makes the system scale: global skills as infrastructure, agent-specific skills as specialization. When you add a new capability — say, a new CLI integration or a new Gemini model — you add it once at the global level and every agent gets it.
For teams building multi-agent systems at this level of complexity, platforms like MindStudio offer a different path: 200+ models, 1,000+ integrations, and a visual builder for chaining agents and workflows, without writing the orchestration layer from scratch. The tradeoff is customization depth versus setup speed.
Replicating This: The Practical Sequence
If you want to build the 3D graph visualization specifically, here’s the sequence that follows from the source material:
Step 1: Build the SQLite schema first. Define tables for agents, tasks, memories, conversations, and scheduled jobs. Make sure agents are writing to the tasks table with rich summaries after every completion. The graph is useless without populated data.
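As a starting point, the schema could look something like this — table and column names are assumptions derived from the article’s description, since the real schema isn’t published:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS agents        (id INTEGER PRIMARY KEY, name TEXT UNIQUE, description TEXT);
CREATE TABLE IF NOT EXISTS tasks         (id INTEGER PRIMARY KEY, agent_id INTEGER REFERENCES agents(id),
                                          skill TEXT, outcome TEXT, summary TEXT, completed_at TEXT);
CREATE TABLE IF NOT EXISTS memories      (id INTEGER PRIMARY KEY, agent_id INTEGER REFERENCES agents(id),
                                          content TEXT, salience REAL, created_at TEXT);
CREATE TABLE IF NOT EXISTS conversations (id INTEGER PRIMARY KEY, agent_id INTEGER REFERENCES agents(id),
                                          transcript TEXT, started_at TEXT);
CREATE TABLE IF NOT EXISTS jobs          (id INTEGER PRIMARY KEY, agent_id INTEGER REFERENCES agents(id),
                                          cron TEXT, description TEXT);
"""

# The entire hive mind is one local file -- swap in a path like
# "hive_mind.db" in practice; :memory: keeps this sketch side-effect free.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

Everything downstream — list view, 2D graph, 3D graph — reads from these five tables, which is why getting them right first is the real work.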
Step 2: Install the Gemini skill in Claude Code. This gives Claude Code access to Gemini’s video understanding API. The skill is a global plugin — install it once.
Step 3: Screen-record Obsidian’s graph view. Use Loom or any screen recorder. Narrate what you like about it. Show filtering, node interaction, edge rendering. Keep it under 5 minutes. Download as .mp4.
Step 4: Pass the video to Gemini via the skill. Prompt: “Here’s a screen recording of Obsidian’s graph view. I want to build something that looks and behaves like this, but reads from a SQLite database with this schema: [paste schema]. The nodes should be agent tasks. Edges should connect tasks that share the same agent or memory context.”
Step 5: Let Claude Code build the component. It will produce a React component (or vanilla JS, depending on your stack) that renders the graph. Expect iteration — the first pass will get the structure right but miss interaction details. Feed it specific corrections.
Step 6: Wire in the live data connection. The graph should poll or subscribe to the SQLite database. As agents complete tasks and write rows, the graph updates. This is the part that makes it a live system rather than a static visualization.
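A simple way to make the view live without a pub/sub layer is to poll with a rowid watermark. A sketch — in a React app this would sit behind an API endpoint or WebSocket rather than run client-side:

```python
import sqlite3

def poll_new_tasks(conn: sqlite3.Connection, last_seen: int):
    """Return tasks inserted since `last_seen`, plus the new watermark.

    The caller keeps the watermark between polls and appends new rows
    to the rendered graph instead of rebuilding it from scratch.
    """
    rows = conn.execute(
        "SELECT id, agent, summary FROM tasks WHERE id > ? ORDER BY id",
        (last_seen,),
    ).fetchall()
    new_watermark = rows[-1][0] if rows else last_seen
    return rows, new_watermark

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, agent TEXT, summary TEXT)")
conn.execute("INSERT INTO tasks (agent, summary) VALUES ('comms', 'Cleared inbox')")

rows, mark = poll_new_tasks(conn, last_seen=0)       # first poll sees the row
rows2, mark2 = poll_new_tasks(conn, last_seen=mark)  # nothing new yet
print(len(rows), len(rows2))  # 1 0
```

Polling every second or two against a local SQLite file is cheap; a trigger-plus-notification scheme is only worth adding if the poll interval ever becomes the bottleneck.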
The 2D version is faster to build and cheaper to run. If you’re starting from scratch, build the 2D version first, validate that the data pipeline works, then layer on the 3D version once you know the underlying data is reliable.
For the broader question of going from a spec like this to a deployed full-stack app, Remy takes a different approach: you write an annotated markdown spec describing your application’s data model, rules, and edge cases, and it compiles that into a complete TypeScript backend, SQLite database with auto-migrations, frontend, and deployment. The spec is the source of truth; the code is derived output. If your graph visualization is part of a larger application, that compilation step can save significant scaffolding time.
The Underlying Principle
The video-to-code technique works because it sidesteps the hardest part of UI development: translating a mental image into a technical specification. Most developers are better at recognizing good UI than describing it. Gemini’s video understanding API converts recognition into description. Claude Code converts description into code.
This is a general pattern, not a one-off trick. Any time you find yourself trying to describe a UI behavior in words and failing, consider whether you could just record something that already does it. The recording is a more precise prompt than prose.
The work on animated 3D websites built with Claude Code and AI video generation follows a similar logic — using video as a design reference rather than a written spec. The pattern is converging on something: video is becoming a first-class input for code generation, and the tools are catching up to that.
The deeper lesson from this entire system is the one the creator keeps returning to: this is a data engineering problem, not an AI problem. The graph visualization is impressive. The SQLite schema that makes it meaningful is the actual work. Get the back of house right, and the front end is just rendering.