mcp-ui-bridge: Bridging Web UIs and LLMs for True Digital Accessibility
Introducing a new approach to web interaction that lets you code once and serve both human users and Large Language Models with full feature parity.
The Bottleneck: LLMs Navigating a Human-Centric Web
Large Language Models (LLMs) are revolutionizing countless fields, yet their ability to meaningfully interact with the vast majority of web applications remains surprisingly clunky. The web, in its current form, is overwhelmingly designed for human visual perception and interaction. This creates a fundamental mismatch for LLMs, which operate natively in the realm of text and structured data.
We typically see two main strategies to bridge this divide, each with significant drawbacks:
- Giving LLMs "Eyes": Some approaches attempt to equip LLMs with visual interpretation capabilities, essentially trying to teach them to "see" a webpage and "click" buttons like a human. While an interesting research avenue, this often forces LLMs to operate outside their core strengths, leading to solutions that can be brittle, inefficient, and struggle with the dynamic nature of modern UIs.
- Building Separate LLM Tools/APIs: The more common method involves developers creating bespoke, secondary interfaces (APIs or simplified tools) for LLMs to access some functionalities of a web application. This immediately creates a two-tiered system: the rich, full-featured web application for humans, and a restricted, often lagging, interface for LLMs. The result? Inevitable feature gaps, increased development and maintenance overhead (coding features twice), and an LLM that never truly gets to use the "real" application with its complete capabilities.
Both paths highlight a core problem: we're either trying to make LLMs less like LLMs, or we're not giving them access to the actual application.
Our experience, and that of many in the field, has shown that while the idea of an LLM directly "seeing" and navigating a visual UI is appealing, it's fraught with practical challenges. LLMs, in their current dominant form, are primarily text-processing engines. Forcing them to interpret complex visual layouts, infer button functionalities from subtle design cues, or accurately determine interaction points (like precise mouse coordinates for a click) often leads to unreliability and errors. They might misread text embedded in images, misunderstand the purpose of a visually ambiguous icon, or struggle with dynamic content that shifts layout.
This is akin to asking a brilliant textual scholar to interpret a complex architectural blueprint without any prior training in visual design or engineering symbols. While they might grasp some high-level concepts, the nuances critical for precise interaction would likely be lost. The LLM is essentially guessing based on incomplete data, leading to a frustratingly high error rate.
Instead of attempting to force-fit LLMs into a visual paradigm where they are inherently weaker, mcp-ui-bridge takes a different stance. We believe in meeting LLMs where they excel: structured text. Our approach effectively "down-projects" the rich, two-dimensional (or even N-dimensional, considering interactive states and layers) visual user interface into a one-dimensional, semantically rich, textual representation. This isn't about "dumbing down" the interface; it's about translating it into a high-fidelity language the LLM fluently understands, ensuring clarity, precision, and robust interaction. We provide the LLM with the meaning and structure of the UI, not just a flat image of it.
Introducing mcp-ui-bridge: The Solution for Seamless Interaction
This is where mcp-ui-bridge comes in. We asked: What if we could build web applications that are natively and equally accessible to both humans and LLMs through a single, unified development effort?
mcp-ui-bridge is a tool and a methodology designed to achieve precisely this. It is currently a work in progress, and the repository serves as a demonstration of these concepts. You can explore its codebase and progress on GitHub: https://github.com/SDCalvo/mcp-ui-bridge. It enables developers to create web applications that, while offering a rich visual experience for human users, can simultaneously expose a structured, text-based, and fully interactive representation of themselves to LLMs via the Model Context Protocol (MCP).
The Core Principle: LLM-Oriented Accessibility
Think of this as the next evolution of web accessibility. Just as ARIA attributes make web pages understandable and navigable for users relying on assistive technologies like screen readers, mcp-ui-bridge leverages a specific set of semantic data-mcp-* attributes to make web applications "readable" and "operable" by LLMs.
Humans interact with the intuitive visual UI; LLMs interact with a semantically equivalent, text-based interface derived from the exact same underlying codebase and application logic. This is the heart of LLM-Oriented Accessibility.
Code Once, Serve All: True Feature Parity
The most significant advantage of the mcp-ui-bridge approach is true feature parity with minimal extra effort.
- No More Coding Features Twice: Developers build their web application as they normally would. The semantic data-mcp-* attributes are lightweight additions, not a separate development track.
- LLMs Use the Real App: The LLM isn't interacting with a watered-down API. It's engaging with the full application, just through a different modality more suited to its capabilities.
- Consistent Experience: Any feature available to a human user is inherently available to the LLM, ensuring consistency and completeness.
This paradigm respects the strengths of both humans (visual processing, intuition) and LLMs (text processing, structured data analysis, automation at scale) by providing each with an interface optimized for their needs, all stemming from a single source of truth: your application code.
A Deep Dive: How mcp-ui-bridge Works
mcp-ui-bridge seamlessly integrates with any web application (regardless of the specific JavaScript framework, though our examples often use React) to provide this LLM-friendly bridge. Here's a look at its core components:
1. Semantic Instrumentation: The data-mcp-* Attributes
At the foundation are the data-mcp-* attributes. Developers annotate their HTML elements with these attributes to provide the semantic meaning that mcp-ui-bridge needs to understand the UI's structure, purpose, and interactive capabilities.
Key attributes include (but are not limited to):
- data-mcp-interactive-element="unique-id": Marks any element the user can interact with (buttons, links, input fields, checkboxes, etc.). The value serves as a stable identifier.
- data-mcp-element-label="Descriptive Label": Provides a clear, human-readable (and LLM-readable) label for the element.
- data-mcp-element-type="button | text-input | checkbox | select | radio": Explicitly defines the element's role, aiding the LLM in understanding how to interact with it.
- data-mcp-purpose="Description of what the element/container does": A crucial semantic hint that explains the function or goal of an element or a group of elements (e.g., "Adds a new item to the todo list," "Filters search results by date").
- data-mcp-display-container="unique-id": Identifies areas that display lists, tables, or collections of data items.
- data-mcp-display-item-text: Marks the primary textual content within a display item (e.g., the text of a todo item).
- data-mcp-value="current-value": Can provide the current value of an element if it's not easily inferable from standard DOM properties.
- data-mcp-disabled="true|false" and data-mcp-readonly="true|false": Explicitly convey the state of an element.
These attributes are designed to be simple to add and manage, augmenting existing HTML rather than requiring a completely different way of building UIs.
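To make this concrete, here is a minimal sketch of an annotated component in a React todo app. The attribute names are the real data-mcp-* vocabulary described above; the specific IDs, labels, and purposes are illustrative values chosen for this example:

```tsx
import React from "react";

// Illustrative sketch: the data-mcp-* attribute names are the real
// vocabulary; the ids, labels, and purposes are example values.
function TodoApp({ todos }: { todos: { id: string; text: string }[] }) {
  return (
    <div>
      <input
        data-mcp-interactive-element="new-todo-input"
        data-mcp-element-type="text-input"
        data-mcp-element-label="New Todo Text"
        data-mcp-purpose="Holds the text for a new todo item"
        placeholder="What needs to be done?"
      />
      <button
        data-mcp-interactive-element="add-todo-button"
        data-mcp-element-type="button"
        data-mcp-element-label="Add Todo"
        data-mcp-purpose="Adds a new item to the todo list"
      >
        Add
      </button>
      <ul data-mcp-display-container="todo-list">
        {todos.map((todo) => (
          <li key={todo.id}>
            <span data-mcp-display-item-text="true">{todo.text}</span>
          </li>
        ))}
      </ul>
    </div>
  );
}

export default TodoApp;
```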
2. The DomParser: Your UI's Interpreter
The DomParser module is the intelligent agent that analyzes the live DOM of the target web application. It leverages the power of Playwright (a robust browser automation library) to:
- Scan the Page: It meticulously scans the current page for all data-mcp-* attributes.
- Extract Information: For each annotated element, it extracts crucial details: its unique ID, label, type, purpose, current state (e.g., value of an input, whether a checkbox is checked, disabled status), and any relationships to other elements (e.g., an input field belonging to a specific form region).
- Infer Intelligently: Where explicit attributes are missing, the DomParser attempts to infer information. For example, it can derive labels from aria-label, placeholder text, or even an element's textContent if no data-mcp-element-label is provided.
The output of the DomParser is a rich, structured JSON representation of the page's interactive elements and display data, ready to be consumed by an LLM.
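To give a feel for what the LLM receives, here is an illustrative TypeScript approximation of that output's shape. The field names below are assumptions inferred from the attribute set, not the library's published schema:

```ts
// Illustrative shape only: these field names are assumptions inferred
// from the data-mcp-* attribute set, not the library's published schema.
interface InteractiveElement {
  id: string;          // from data-mcp-interactive-element
  label: string;       // from data-mcp-element-label, or inferred
  elementType: string; // from data-mcp-element-type
  purpose?: string;    // from data-mcp-purpose
  value?: string;      // current value, where applicable
  disabled?: boolean;  // from data-mcp-disabled or the DOM state
  readonly?: boolean;  // from data-mcp-readonly or the DOM state
}

interface DisplayContainer {
  id: string;      // from data-mcp-display-container
  items: string[]; // texts marked with data-mcp-display-item-text
}

interface ScreenData {
  currentUrl: string;
  interactiveElements: InteractiveElement[];
  displayContainers: DisplayContainer[];
}
```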
3. The PlaywrightController: The LLM's Hands
Once the LLM, guided by the DomParser's output, decides on an action, the PlaywrightController takes over. It also uses Playwright to interact directly with the browser, performing actions such as:
- Clicking buttons, links, and other elements (click #element-id).
- Typing text into input fields (type #element-id "text to type").
- Selecting options from dropdown/select elements (select #element-id "valueToSelect").
- Checking or unchecking checkboxes (check #element-id, uncheck #element-id).
- Choosing radio button options (choose #radio-id "value").
- Hovering over elements (hover #element-id).
- Clearing content from input fields (clear #element-id).
The PlaywrightController ensures that these interactions are performed reliably on the live web page.
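Under the hood, each of these commands maps onto a standard Playwright call. The following is a simplified sketch of such a dispatcher, assuming the command string has already been parsed into an action, a target ID, and an optional value; the real controller also handles waiting, retries, and error reporting:

```ts
import { Page } from "playwright";

// Simplified dispatcher sketch: resolves an element by its
// data-mcp-interactive-element id and runs the matching Playwright
// action. Command-string parsing and error reporting are omitted.
async function runAction(
  page: Page,
  action: string,
  id: string,
  value?: string
): Promise<void> {
  const locator = page.locator(`[data-mcp-interactive-element="${id}"]`);
  switch (action) {
    case "click":
      await locator.click();
      break;
    case "type":
      await locator.fill(value ?? "");
      break;
    case "select":
      await locator.selectOption(value ?? "");
      break;
    case "check":
      await locator.check();
      break;
    case "uncheck":
      await locator.uncheck();
      break;
    case "hover":
      await locator.hover();
      break;
    case "clear":
      await locator.clear();
      break;
    default:
      throw new Error(`Unsupported action: ${action}`);
  }
}
```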
4. The MCP Server: The Communication Bridge
mcp-ui-bridge runs a lightweight MCP (Model Context Protocol) server, built using the FastMCP library. This server exposes a standardized set of tools that an LLM can call:
- get_current_screen_data: This is the LLM's primary way to "see" the page. It returns a detailed JSON object containing:
  - The current URL.
  - A list of all interactive elements found by DomParser, including their IDs, labels, types, purposes, current values/states, and any other relevant metadata.
  - Structured data from any data-mcp-display-container elements, presenting lists or collections of information in an LLM-friendly format.
- get_current_screen_actions: To make interaction even more straightforward, this tool provides a list of suggested actions the LLM can take based on the current screen elements. Each action includes:
  - The ID of the target element.
  - A descriptive label.
  - The element type and purpose.
  - A precise commandHint (e.g., click #submit-button, type #username-input "your_text_here") that the LLM can use directly with the send_command tool.
- send_command: This is the LLM's action tool. It accepts a command string (often derived from get_current_screen_actions hints) and executes the specified action (e.g., click, type, select) on the target element using the PlaywrightController. It then returns the result of the action (success/failure, error messages).
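For a rough idea of how such a tool is wired up, here is a minimal sketch using the fastmcp TypeScript package's addTool pattern. The executeCommand helper is hypothetical, standing in for the command parsing and PlaywrightController plumbing:

```ts
import { FastMCP } from "fastmcp";
import { z } from "zod";

// Hypothetical helper standing in for command parsing and the
// PlaywrightController plumbing; not a published API.
declare function executeCommand(
  command: string
): Promise<{ success: boolean; message?: string }>;

const server = new FastMCP({ name: "mcp-ui-bridge", version: "0.1.0" });

server.addTool({
  name: "send_command",
  description: 'Execute a UI command such as "click #element-id".',
  parameters: z.object({ command: z.string() }),
  execute: async ({ command }) => {
    // Run the command against the live page and report the outcome.
    const result = await executeCommand(command);
    return JSON.stringify(result);
  },
});

server.start({ transportType: "stdio" });
```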
(Note: A get_page_screenshot tool was also implemented to capture the page as a base64 PNG. However, as most LLMs cannot directly interpret visual data, this tool is currently not exposed via the MCP server to maintain focus on semantic, text-based interaction. The underlying code remains available for specific use cases where an external image processing step might be involved.)
The Grand Plan: From Project to Versatile Library
Our vision for mcp-ui-bridge extends beyond its current form. We are actively working towards packaging it as an easy-to-use, turnkey tool and library (Phase 3.5 of our development plan). The goal is to enable developers to:
- Easily Integrate: Install mcp-ui-bridge as a dependency or run it as a standalone tool against their existing web application.
- Automatic Server Setup: Have the DOM parser, Playwright integration, and MCP server start automatically, configured with sensible defaults but overridable for advanced needs (target URL, headless/headed browser mode, MCP port, etc.).
- Flexible Usage:
  - Run it as a background server for LLM agents to connect to.
  - Potentially use it with an interactive CLI for developers to directly debug or test the data-mcp-* annotations and LLM interactions.
This will lower the barrier to entry for making any web application LLM-accessible.
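While the packaged API is still taking shape, the developer experience we are aiming for might look something like the sketch below. Everything here is hypothetical: the runMcpServer name and its option fields are placeholders, not a published interface:

```ts
// Purely hypothetical usage sketch: runMcpServer and its options are
// placeholders for the planned turnkey API, not a published interface.
import { runMcpServer } from "mcp-ui-bridge";

await runMcpServer({
  targetUrl: "http://localhost:5173", // the web app to bridge
  headless: true,                     // headless/headed browser mode
  port: 3000,                         // port for the MCP server
});
```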
Unlocking the Benefits: Why mcp-ui-bridge is a Game-Changer
Adopting the mcp-ui-bridge methodology and toolset offers compelling advantages:
- True Feature Parity: As emphasized, LLMs interact with the same application logic and features as human users, because it is the same application, just accessed via a different, more suitable interface.
- Simplified and Future-Proof Development: Developers code their UI and features once. The LLM-accessible interface is derived, not separately built and maintained. This significantly reduces development overhead and ensures that as the visual UI evolves, the LLM's understanding evolves with it (as long as data-mcp-* attributes are maintained).
- Enhanced LLM Capabilities & Reliability: By providing LLMs with clear, structured, text-based information and action primitives, they can interact with web applications more reliably, deeply, and intelligently than if they were trying to interpret pixels or navigate via brittle selectors.
- Cross-Framework Compatibility: While our initial development might use React for examples, the data-mcp-* attribute approach and DOM parsing are fundamentally framework-agnostic. mcp-ui-bridge is designed to work with any web application, regardless of the underlying JavaScript framework (Angular, Vue, Svelte, or even vanilla JS).
Exciting Use Cases Enabled by mcp-ui-bridge
The implications of making web applications truly LLM-accessible are vast. Here are a couple of immediate, powerful use cases:
1. LLMs as Advanced Testers and QA Automation Agents
Imagine an LLM that can act as a sophisticated Quality Assurance engineer. With mcp-ui-bridge:
- Deep Understanding: The LLM can receive a full semantic breakdown of each screen.
- Intelligent Interaction: It can formulate and execute complex test scenarios by calling send_command based on the get_current_screen_actions and get_current_screen_data output.
- State Verification: It can read data from the screen to verify that actions had the intended effects.
- Comprehensive Coverage: LLMs can be instructed to explore application flows, test edge cases, and identify bugs with a level of semantic understanding that traditional scripted UI automation often lacks.
This moves beyond simple click-and-assert testing to a more intelligent, adaptive, and thorough form of automated QA.
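As a concrete illustration, a scripted QA pass over these tools might look like the sketch below. The callTool helper is a stand-in for whatever MCP client library is used, and the element IDs and screen-data shape are the illustrative ones from earlier in this post:

```ts
// Sketch of an automated QA pass driven through the three MCP tools.
// callTool is an assumed helper wrapping an MCP client's tools/call
// request; the element ids and screen-data shape are illustrative.
declare function callTool(
  name: string,
  args?: Record<string, unknown>
): Promise<string>;

async function smokeTestTodoApp(): Promise<void> {
  // "See" the screen and confirm the expected input is present.
  const screen = JSON.parse(await callTool("get_current_screen_data"));
  const hasInput = screen.interactiveElements.some(
    (el: { id: string }) => el.id === "new-todo-input"
  );
  if (!hasInput) throw new Error("Expected #new-todo-input on this screen");

  // Act: type a todo and add it.
  await callTool("send_command", { command: 'type #new-todo-input "Buy milk"' });
  await callTool("send_command", { command: "click #add-todo-button" });

  // Verify: the display container now shows the new item.
  const after = JSON.parse(await callTool("get_current_screen_data"));
  const list = after.displayContainers.find(
    (c: { id: string }) => c.id === "todo-list"
  );
  if (!list || !list.items.includes("Buy milk")) {
    throw new Error("Todo was not added");
  }
}
```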
2. LLM-Powered In-IDE Development and Interaction Workflow
For developers, mcp-ui-bridge can revolutionize the inner development loop:
1. Develop a Feature: A developer codes a new UI component or feature, annotating it with data-mcp-* attributes as they go.
2. Instantly Interact via LLM/MCP: Without leaving their IDE, or by using a simple companion tool, they can have an LLM (or a script acting as an MCP client) immediately interact with this new feature through the mcp-ui-bridge server running locally.
3. Rapid Prototyping & Testing: The developer can instruct the LLM to:
   - "Fetch the current screen data. Does my new button '#save-profile' appear with the label 'Save Profile' and purpose 'Saves user profile data'?"
   - "Try to type 'John Doe' into '#full-name-input' and click '#save-profile'."
   - "Verify that after clicking save, the status message container '#profile-status' displays 'Profile saved successfully'."
4. Iterate and Refine: Based on the LLM's interaction and feedback (all facilitated via MCP text-based exchanges), the developer can quickly refine their code and data-mcp-* annotations, then re-test, all within a tight, efficient loop.
This brings the power of LLM-driven interaction and verification directly into the development process, enabling faster iterations and more robust, semantically rich UIs.
The Future is Accessible – For Everyone and Every LLM
LLM-Oriented Accessibility, powered by tools like mcp-ui-bridge, is not about diminishing the human-centered visual web. It's about enriching the underlying application architecture so that it can be natively understood and operated by different kinds of intelligence.
We believe this paradigm shift will unlock new potentials for how AI agents integrate with the web, fostering a digital ecosystem that is truly and equally usable for both human and artificial intelligence, without overburdening developers.
What are your thoughts on mcp-ui-bridge and the concept of LLM-Oriented Accessibility? How else can we make the web a more LLM-friendly space while enhancing human usability and streamlining developer workflows? We'd love to hear your ideas and feedback in the comments below!
About the Authors:
This post was co-authored by Santiago Calvo and Gemini 2.5 Pro, a large language model from Google. Together, we explored the concepts, developed the code for mcp-ui-bridge, and drafted this article to share our vision for a more LLM-accessible web.