Markdown as a Protocol for Agentic UI: Making AIs Build UIs on the Fly!
Yo, fam! This article is straight-up 🔥. It's about how AI agents can build user interfaces as they're talking to you, right in the middle of a conversation. Forget static web pages; we're talking about dynamic UI components popping up when you need them.
WHY does this even exist? 🤔 The Pain Before the Protocol
Look, bro, think about how you usually build software or even use AI today.
- ❌ Building traditional UIs is slow and rigid. You gotta design it, code it, deploy it. If something changes, you gotta do it all again. It's like pouring concrete every time you want to move a couch.
- ❌ AI chatbots are mostly text-based. You ask a question, they give you text. If you want to, say, fill out a form or interact with a graph, the AI can't just whip that up for you. It's stuck in text mode.
- ❌ Current AI tools often have a "tool calling" step. The AI identifies it needs a tool, tells you what it's doing, calls the tool, then gives you the result. That's a lot of back-and-forth and not very fluid.
The "OHHHHHH" Moment: What if the AI could just generate the UI itself, natively, as part of its response, whenever it needed to get information from you or show you something visually? Like, boom, here's a form, fill it out. Or, boom, here's a live chart. This system aims to solve that pain, making AI interactions way more dynamic and, dare I say, agentic.
The BIG PICTURE: How Markdown Becomes a UI Wizard 🧙‍♂️
The core idea is pretty wild: Markdown isn't just for formatting text anymore; it's a communication protocol where AI, server, and client all speak the same language.
Imagine your AI chatbot not just sending you text, but also:
- Running code on a server.
- Streaming live data.
- Popping up interactive forms or charts right there in your chat window.
All of this happens inside the same conversation flow, using the syntax the LLM (Large Language Model, aka the AI) already knows best: Markdown.
Here's the mental model for how it glues together:
┌──────────────────┐                  ┌─────────────────────┐
│    User (You)    │                  │   AI (LLM Agent)    │
│ (Chat Interface) │                  │ (The smarty pants)  │
└────────┬─────────┘                  └──────────┬──────────┘
         │ (Prompt)                              │ Generates
         │                                       │ Markdown
         ▼                                       │ (interleaved)
┌──────────────────┐   Text, Code, Data          │
│ Client (Browser) │ ◄───────────────────────────┤
│ (Parses Markdown)│   (WebSockets, HTTP/2...)   │
└────────┬─────────┘                             │
         │ Execute Code Blocks                   │
         │ Render UI Components                  │
         ▼                                       │
┌─────────────────────┐                          │
│  Server (Runtime)   │ ◄────────────────────────┘
│ (bun-streaming-exec)│
│ (Mounts React UIs)  │
└────────┬────────────┘
         │ (Console logs, errors, user input)
         │
         └────────────────────────────────────────►
           (Feeds back to LLM for next turn)
The magic happens because the LLM uses Markdown to tell the client and server what to do. The client parses it, renders UIs, and sends user input back. The server executes code blocks and sends outputs back to the LLM. It's a continuous loop!
The MECHANICS: Three Pillars Holding Up This UI Temple
This whole system rests on three super clever ideas:
1. Markdown as Protocol: The Universal Translator 🗣️
WHY: LLMs are literally trained on the internet. And what's all over the internet? Markdown! Code fences, formatting, tablesβthey get it. So, why invent a new language for them when they already speak one fluently? Using Markdown means the AI doesn't need special training; it's already pre-wired for this.
HOW IT WORKS: The agent outputs a single stream of Markdown, but it's not just text. This Markdown stream contains three different types of "blocks":
- Text Blocks:
  - Syntax: Just regular Markdown (**like this**, *or this*).
  - Purpose: This is the standard chat output. It streams directly to your eyeballs, token by token.
  - Example: "Hey! I am the assistant. This text is streamed to the user token by token."
- Code Fences:
  - Syntax: a fenced block opened with `` ```tsx agent.run `` (or similar, with an `agent.run` hint).
  - Purpose: These blocks contain actual code (TypeScript/JavaScript) that the server needs to execute. The `agent.run` part tells the system, "Hey, run this code!"
  - Example: `const messages = await fetchMessages(); // Server will run this!`
- Data Fences:
  - Syntax: a fenced block opened with `` ```json agent.data => "id" `` (or similar, with an `agent.data` hint).
  - Purpose: These are for streaming structured data (like JSON) directly into a UI component that's already mounted on the client. The `"id"` part tells it which UI component should receive this data.
  - Example: `[ { "name": "Blade Runner", "rating": 4.5 } ]` streamed into the UI component with id "fake-movies".
KEY TAKEAWAY: All these different types of information are interleaved in one single Markdown stream. The client and server just parse it as it comes in and handle each block type appropriately. No complicated, separate APIs!
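To make that concrete, here's a tiny sketch of how a client might dispatch on the three block types as they stream in. Every name here (`MarkdownBlock`, `renderChatText`, and so on) is an illustrative placeholder, not the project's actual API:

```ts
// Illustrative only — these names are not the project's real API.
type MarkdownBlock =
  | { kind: "text"; content: string }                  // plain Markdown prose
  | { kind: "code"; lang: string; source: string }     // a tsx agent.run fence
  | { kind: "data"; targetId: string; json: string };  // a json agent.data => "id" fence

declare function renderChatText(text: string): void;            // stream to the chat window
declare function sendToServerForExecution(code: string): void;  // server runs it incrementally
declare function pushToMountedComponent(id: string, json: string): void; // feed a mounted UI

function handleBlock(block: MarkdownBlock): void {
  switch (block.kind) {
    case "text":
      renderChatText(block.content);
      break;
    case "code":
      sendToServerForExecution(block.source);
      break;
    case "data":
      pushToMountedComponent(block.targetId, block.json);
      break;
  }
}
```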
2. Streaming Execution: No Waiting Around ⏰
WHY: Imagine asking an AI for something, and it starts generating code for a UI. If you have to wait for the entire block of code to be sent before it starts running, that's a delay. And if it's a huge block, you're just staring at a blank screen. That's a bad user experience. We live in an instant gratification world, bro!
HOW IT WORKS:
- As the LLM generates the Markdown, the system doesn't wait for a full code block to finish.
- As soon as a complete statement within a code fence arrives, it's immediately executed on the server.
- This means:
- API calls start instantly.
- UI components can begin rendering right away.
- Errors surface faster, so the AI can correct itself sooner.
The "Cursed" Part: This is tricky to implement in standard runtimes. The author had to build a custom solution (bun-streaming-exec) to achieve this, using some clever vm.Script magic. It's like feeding code line by line to the server and having it run instantly, even though it's part of a larger, still-being-generated block.
3. The mount() Primitive: Your UI Creation Superpower ✨
WHY: You've got text, code, and data streaming. But how do you actually turn that into a visible, interactive user interface? You need a specific function that says, "Hey, put this UI on the screen now!" React is perfect because LLMs have seen millions of examples of React components and JSX.
HOW IT WORKS:
- The `mount()` function is the core primitive the agent uses to create UIs.
- When the LLM generates a code block like `mount({ ui: () => <Card>Hello from the agent!</Card> }); // This is React JSX!`, the server executes it.
- `mount()` takes that React component, serializes it (turns it into a format that can be sent over the wire), and sends it to the client.
- The client then receives this serialized component and renders it directly inside the chat interface.
This is where the agent becomes a UI designer on the fly!
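To keep the data-flow patterns below readable, here's my best guess at the rough shape of `mount()`, pieced together from the article's examples (forms, Data objects, callbacks, slots). Treat every detail of this signature as an assumption, not documentation:

```tsx
import type { ReactElement } from "react";
import type { ZodTypeAny } from "zod";

// Guessed signature only — the real API may differ.
declare function mount(options: {
  ui: (props: {
    data: any;                                             // live-updating Data / StreamedData
    callbacks: Record<string, (...args: any[]) => unknown>; // server-side handlers
  }) => ReactElement;
  data?: object;
  callbacks?: Record<string, (...args: any[]) => unknown>;
  outputSchema?: ZodTypeAny;       // shape of user input the server will await
}): {
  result: Promise<unknown>;        // resolves when the user submits (forms pattern)
  mountSlot: (name: string, ui: () => ReactElement) => void; // fill a Slot later (slots pattern)
};
```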
FOUR Ways Data Moves: The Dance of Information
This system is all about dynamic interaction, and that means data needs to flow seamlessly between the user, the server, and the AI. The article highlights four key patterns:
1. Client → Server (Forms: Getting User Input)
- The Pain: How does the AI get structured input from you beyond just text? Like when you need to fill out a field or pick an option.
- The Solution: The `mount` function can render forms (see the sketch below).
  - The `outputSchema` (using `z.object` from Zod) tells the system what kind of data to expect from the form.
  - `await form.result` is a crucial line on the server. It pauses the server-side code execution until the user fills out and submits the form on the client.
  - Once submitted, the `console.log` sends that user input back to the LLM as part of its "runtime transcript."
- Imagine: AI says, "What's your name and email?" Pop! A form appears. You fill it. Press submit. The AI immediately gets your data and continues the conversation.
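Putting those pieces together, a hedged sketch of the agent-generated code might look like this. The `Form`/`TextField`/`SubmitButton` components and the field names are my placeholders; `mount`, `outputSchema`, `await form.result`, and the `console.log` are the parts the article actually names:

```tsx
import { z } from "zod";

const form = mount({
  ui: () => (
    <Form>
      <TextField name="name" label="Name" />
      <TextField name="email" label="Email" />
      <SubmitButton>Send</SubmitButton>
    </Form>
  ),
  // Tells the system what structured data to expect back from the user.
  outputSchema: z.object({ name: z.string(), email: z.string().email() }),
});

// Server-side execution pauses here until the user submits the form in the chat.
const input = await form.result;

// Logging feeds the submitted values back to the LLM via the runtime transcript.
console.log("User submitted:", input);
```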
2. Server → Client (Live Updates: Reacting to Server Changes)
- The Pain: If something changes on the server (e.g., a progress bar update, a new message coming in), how do you get that reflected in the UI without the AI needing to regenerate the whole UI or know about every tiny change?
- The Solution: `Data` objects (a minimal sketch of the proxy trick follows this list).
  - The server creates a `new Data({ progress: 0 })`.
  - This `Data` object is passed to `mount()`, and the UI uses `data.progress`.
  - When server-side code later sets `data.progress = 40`, the system detects the mutation (because `Data` objects are "proxies").
  - It serializes just that change (a "patch"), sends it over a WebSocket, and the client applies the patch to update the UI instantly.
- Imagine: AI starts a long task. Pop! A progress bar appears. As the task progresses on the server, the bar updates in real-time in your chat.
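The article doesn't show the Data class itself, but the core trick — a proxy that turns property writes into patches — looks roughly like this. `makeData` and `Patch` are my names, and a real implementation would also diff nested objects and ship the patches over the WebSocket:

```ts
type Patch = { path: string; value: unknown };

// Wrap an object so every property write produces a patch we could push to the client.
function makeData<T extends object>(initial: T, onPatch: (patch: Patch) => void): T {
  return new Proxy(initial, {
    set(target, prop, value) {
      (target as Record<PropertyKey, unknown>)[prop] = value;
      onPatch({ path: String(prop), value }); // e.g. ws.send(JSON.stringify(patch))
      return true;
    },
  });
}

const data = makeData({ progress: 0 }, (patch) => {
  console.log("would send patch to the client:", patch);
});

data.progress = 40; // fires the patch callback; the mounted progress bar re-renders
```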
3. LLM → Client (Streaming: Data Arriving Piecemeal)
- The Pain: What if the data for a UI component is large or arrives slowly? You don't want to wait for all of it before showing anything.
- The Solution: `StreamedData` objects and `jsonriver` (rough sketch after this list).
  - The agent mounts a UI that expects `streamedData`.
  - The LLM then uses a `json agent.data => "id"` fence to stream JSON data directly to that `StreamedData` object.
  - The client uses `jsonriver` to incrementally parse the incoming JSON. As each piece of JSON arrives, the UI updates, showing partial data.
- Imagine: AI is fetching a list of movies. Pop! A list appears. As each movie's data arrives from the AI, it pops onto the list in real-time, one by one, instead of waiting for the full list.
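I haven't verified jsonriver's actual API, so the sketch below swaps in a placeholder `parseIncrementally` function (and a placeholder `streamedData` target) just to show the shape of the flow: each partial parse lands in the StreamedData target, and the mounted list re-renders with whatever has arrived so far.

```ts
// parseIncrementally stands in for jsonriver's incremental parser,
// and streamedData stands in for the article's StreamedData object.
declare function parseIncrementally(
  chunks: AsyncIterable<string>,
  onPartial: (partialValue: unknown) => void
): Promise<void>;

declare const streamedData: { value: unknown }; // the target a mounted UI reads from

// Called with the body of a json agent.data => "id" fence as it streams in.
async function receiveDataFence(chunks: AsyncIterable<string>): Promise<void> {
  await parseIncrementally(chunks, (partial) => {
    streamedData.value = partial; // each partial parse updates the mounted UI
  });
}
```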
4. Client → Server (Callbacks: Interactive UI Actions)
- The Pain: How do you let the user click a button or perform an action within an AI-generated UI, and have that action trigger server-side logic without needing the AI to be involved in every single click?
- The Solution: `callbacks` in `mount()` (see the sketch after this list).
  - The server defines an `async` function, like `onRefresh`.
  - This function is passed to `mount()` in the `callbacks` prop.
  - The UI component's button (e.g., `onClick={callbacks.onRefresh}`) calls this server-side function when clicked.
  - The server executes `onRefresh`, which can fetch new data, update `Data` objects (triggering live updates as above), and so on.
- Imagine: AI shows you a list of messages. Pop! A refresh button appears. You click it. The client calls the server-side `onRefresh` function, which fetches new messages and updates the displayed data, all without prompting the AI for a new turn.
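Here's a hedged sketch of that flow. `fetchMessages`, `Data`, and `mount` come from the article's own examples, while the component names and exact props are my guesses:

```tsx
const messages = new Data({ items: await fetchMessages() });

// Runs on the server whenever the user clicks Refresh — no new LLM turn required.
async function onRefresh(): Promise<void> {
  messages.items = await fetchMessages(); // mutating Data pushes a patch to the client
}

mount({
  ui: ({ data, callbacks }) => (
    <Card>
      <MessageList items={data.items} />
      <Button onClick={callbacks.onRefresh}>Refresh</Button>
    </Card>
  ),
  data: messages,
  callbacks: { onRefresh },
});
```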
Slots: Building Complex UIs Incrementally
WHY: If an AI has to generate a very complex UI, it might take a while to write all the code. That means the user has to wait longer to see anything. This is bad for responsiveness.
HOW IT WORKS:
- The agent first creates a "shell" UI with `mount()`. This shell has `Slot` components where more complex parts will go. These slots can show a `Skeleton` (placeholder) while waiting.
- Then, the agent uses `shell.mountSlot("slotName", uiComponent)` to fill in each slot as soon as it finishes generating the code for that specific part (rough code sketch after the diagram below).
- This means the user sees the basic structure of the UI very quickly, and then individual sections pop in as they're ready.
- Slots inherit context (data, callbacks) from their parent, so they remain reactive and connected.
AI thinks 🧠...
Sends shell UI code.

┌─────────────────────────┐
│ Card Header             │
├─────────────────────────┤
│ ┌─────────────────────┐ │
│ │ Skeleton (Stats)    │ │  <-- User sees this FAST!
│ └─────────────────────┘ │
├─────────────────────────┤
│ ┌─────────────────────┐ │
│ │ Skeleton (Blockers) │ │
│ └─────────────────────┘ │
└─────────────────────────┘

AI thinks 🧠...
Sends stats slot code.

┌─────────────────────────┐
│ Card Header             │
├─────────────────────────┤
│ ┌─────────────────────┐ │
│ │ Actual Stats        │ │  <-- Stats pop in!
│ └─────────────────────┘ │
├─────────────────────────┤
│ ┌─────────────────────┐ │
│ │ Skeleton (Blockers) │ │
│ └─────────────────────┘ │
└─────────────────────────┘

AI thinks 🧠...
Sends blockers slot code.

┌─────────────────────────┐
│ Card Header             │
├─────────────────────────┤
│ ┌─────────────────────┐ │
│ │ Actual Stats        │ │
│ └─────────────────────┘ │
├─────────────────────────┤
│ ┌─────────────────────┐ │
│ │ Actual Blockers     │ │  <-- Blockers pop in!
│ └─────────────────────┘ │
└─────────────────────────┘
This makes the whole experience feel much snappier.
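In code, the flow sketched above might look something like this. `mount()`, `Slot`, `Skeleton`, and `shell.mountSlot` are the pieces the article names; the `fallback` prop, the panel components, and the card contents are my guesses:

```tsx
// Step 1: the shell goes up immediately, with Skeleton placeholders in each Slot.
const shell = mount({
  ui: () => (
    <Card>
      <CardHeader>Project status</CardHeader>
      <Slot name="stats" fallback={<Skeleton />} />
      <Slot name="blockers" fallback={<Skeleton />} />
    </Card>
  ),
});

// Step 2: as the agent finishes generating each section, it fills the slot in.
shell.mountSlot("stats", () => <StatsPanel />);
shell.mountSlot("blockers", () => <BlockersPanel />);
```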
On Security: The Elephants in the Room 🐘
The article gives a brief but important nod to security.
- The Problem: Executing AI-generated code is inherently risky. What if the AI generates malicious code?
- The Stance: This project doesn't solve security. It assumes that sandboxing, permissions, and static analysis (all things companies like Google and OpenAI are working on) will eventually handle the code execution risks.
- The Unsolved Problem: Prompt injection. If a user can trick the AI into generating bad code, that's still a huge challenge across all agent architectures, not just this one.
TL;DR: This system is built on the premise that the underlying code execution can be made safe.
WHY This Works So Well (LLM Ergonomics)
The author found that LLMs picked up this protocol immediately. Why?
- Markdown: LLMs are trained on it. It's their native tongue.
- TypeScript/React: The most popular modern dev tools on GitHub. LLMs have seen millions of examples of how to write JSX components, how to use state, how to define callbacks.
- `mount()` arguments (`await` results, `callbacks`, Zod schemas): These are common programming patterns that LLMs have seen everywhere.
The core genius: Instead of trying to teach the LLM a new language or a new way to think about UI, this system just takes patterns the LLM already knows and arranges them into a functional system. It's like giving a master chef all the ingredients they know and asking them to cook, rather than teaching them a whole new cuisine.
BURN THIS IN YOUR BRAIN 🔥
- Markdown is the new API: It's not just for text; it's a multimodal protocol for AI, client, and server.
- AI as a UI designer: LLMs can generate and render interactive UIs on the fly, mid-conversation.
- Streaming is key: Code executes incrementally, data streams in, UIs update instantly for a snappy experience.
- `mount()` is your entry point: This function transforms AI-generated code into real, interactive React components.
- Data flow patterns: Understand the four ways data can move for dynamic interactions.
- LLM Ergonomics: This whole thing is designed around what LLMs already know, making them naturally adapt to it.
This is a wild look into one possible future of AI interaction, where your chatbot isn't just chatting; it's building things for you as you speak. Pretty mind-blowing, right? 🤯