From Blind to Aware: Teaching Claude Code to See Your React Native App

Note: This post describes a setup I designed and installed, but haven't yet battle-tested in a real debugging session or release cycle. The architecture is based on documented tool capabilities and community patterns. I'll update with lessons learned once I put it through real work.

This is a companion piece to my earlier post on closing the React Native + Claude Code workflow gaps. That post covered what I set up. This one covers how I got there — the process of diagnosing DX gaps, researching solutions, and designing a layered architecture for making Claude Code genuinely useful for mobile development.

The Starting Point: What Was Already Working

Before tackling the gaps, I audited what was already in place:

ios-simulator MCP — connected globally, providing screenshot, ui_tap, ui_describe_all, and other low-level simulator tools
cmux — Ghostty-based AI terminal with notification hooks (cmux notify)
RTK — Rust Token Killer proxy saving 60-90% on dev operations
react-native-best-practices skill — Callstack's performance guidelines
A well-structured CLAUDE.md — with Unistyles patterns, Reanimated conventions, domain layer architecture, and product philosophy

This setup was solid for writing code. But for debugging or verifying changes, there was no feedback loop — the information was scattered across separate windows and terminals.

Diagnosing the Three Blind Spots

I started by comparing the web developer workflow in cmux against what mobile developers get:

| Capability | Web (cmux) | Mobile (before) |
|---|---|---|
| Runtime errors | Browser DevTools + console | Copy-paste from Metro terminal |
| Visual state | WebKit browser pane | Separate simulator window |
| Build errors | Webpack/Vite output in terminal | Scroll through Metro logs |
| DX integration | Browser embedded in cmux | Nothing |

Three specific gaps emerged:

Metro blindness — Claude has no visibility into Metro bundler output. Build failures, console.log statements, network requests — all invisible unless manually copy-pasted into the conversation.
Screen blindness — The ios-simulator MCP has screenshot and ui_describe_all, but they're reactive tools. Claude only sees the simulator when you explicitly ask. There's no proactive awareness of what changed on screen after a code edit.
cmux blindness — cmux gives web developers a browser pane with DevTools embedded right in the terminal. For mobile, the simulator is a separate macOS window with no cmux integration.

The Research: Three Parallel Investigations

Rather than tackling gaps sequentially, I ran three research tracks in parallel:

Track 1: Metro Integration

The key discovery was metro-mcp (steve228uk/metro-mcp). It connects to Metro's Hermes runtime via Chrome DevTools Protocol (CDP) — the same protocol Chrome uses for its own DevTools.

The architecture is clever: Metro/Hermes only allows a single CDP connection, but metro-mcp runs an internal proxy (port 9222) that multiplexes connections. This means Chrome DevTools and the MCP server can work simultaneously.
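To make the multiplexing idea concrete, here is a minimal in-memory sketch of how a proxy could share one CDP connection between two clients. It is not metro-mcp's implementation; the class name, method names, and client labels are all hypothetical. The only protocol facts it relies on are that CDP requests carry an `id`, responses echo that `id`, and events carry no `id`.

```python
import json
from itertools import count


class CdpMultiplexer:
    """Share one upstream CDP connection between several clients.

    Requests get a fresh upstream id so two clients can both use id 1
    without colliding; responses get their original id restored.
    """

    def __init__(self):
        self._next_id = count(1)
        self._pending = {}  # upstream id -> (client name, original id)

    def to_upstream(self, client, raw):
        """Rewrite a client request so its id is unique upstream."""
        msg = json.loads(raw)
        upstream_id = next(self._next_id)
        self._pending[upstream_id] = (client, msg["id"])
        msg["id"] = upstream_id
        return json.dumps(msg)

    def from_upstream(self, raw, clients):
        """Return (client, message) pairs to deliver for one upstream message."""
        msg = json.loads(raw)
        if "id" not in msg:
            # Events such as Runtime.consoleAPICalled go to every client.
            return [(client, msg) for client in clients]
        client, original_id = self._pending.pop(msg["id"])
        msg["id"] = original_id
        return [(client, msg)]
```

With this shape, Chrome DevTools and the MCP server can each send `{"id": 1, "method": "Runtime.enable"}` and each gets back only its own response, while console events reach both.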

What sealed the decision: ~90 tools covering console logs, bundle errors, network tracking, JS REPL, Redux/Apollo state inspection, and component tree exploration — all documented, all accessible through natural language in Claude Code.

Track 2: Screen Awareness

I considered two approaches:

Approach A: Continuous polling. A background process takes simulator screenshots on a timer. This is noisy and wastes tokens on unchanged screens.

Approach B: Event-driven capture. Use Claude Code's hook system to trigger a screenshot after specific events. I went with a PostToolUse hook that fires after any Edit or Write and takes a screenshot when the edited file is a .ts, .tsx, .js, or .jsx file. The screenshot saves to /tmp/claude-sim-latest.png, and Claude can read it when needed.

This is the same hybrid pattern that a viral tweet by @0x__tom (101K views) described: accessibility tree for structure + screenshots for visual verification. Per the ios-simulator-skill docs, the accessibility tree costs 10-50 tokens per screen; a screenshot costs 1,600-6,300 tokens. Use the cheap one by default, the expensive one when you need visual confirmation.
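The gap between those two ranges compounds quickly over a session. A small worked calculation, using only the per-screen figures quoted above and a hypothetical count of 20 screen checks:

```python
# Per-screen token ranges quoted above (from the ios-simulator-skill docs).
TREE_TOKENS = (10, 50)
SCREENSHOT_TOKENS = (1_600, 6_300)


def cost_range(per_use, uses):
    """Scale a (low, high) per-use token range by the number of uses."""
    low, high = per_use
    return (low * uses, high * uses)


checks = 20  # hypothetical number of screen checks in one session
print("tree only:       ", cost_range(TREE_TOKENS, checks))        # (200, 1000)
print("screenshots only:", cost_range(SCREENSHOT_TOKENS, checks))  # (32000, 126000)
```

Even at the low end, twenty screenshots cost 32,000 tokens against at most 1,000 for twenty accessibility-tree reads.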

Track 3: cmux Integration

I researched the awesome-cmux ecosystem — 170+ community projects across 10 feature dimensions. cmux exposes three primitives:

CMUX_WORKSPACE_ID — workspace context
CMUX_SURFACE_ID — UI surface reference
CMUX_SOCKET_PATH — IPC channel
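A plugin or hook script would typically start by checking whether it is running inside cmux at all. The sketch below assumes the three primitives are exposed as environment variables, which their names suggest but which I haven't verified; `cmux_context` is a hypothetical helper name.

```python
import os

CMUX_VARS = ("CMUX_WORKSPACE_ID", "CMUX_SURFACE_ID", "CMUX_SOCKET_PATH")


def cmux_context(environ=os.environ):
    """Return the three cmux values as a dict, or None if any is missing."""
    found = {name: environ[name] for name in CMUX_VARS if environ.get(name)}
    return found if len(found) == len(CMUX_VARS) else None
```

Returning None outside cmux lets the same script degrade quietly when it runs in a plain terminal.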

Plugins can write status pills, progress bars, and log entries to the sidebar. But the WebKit browser pane — the thing that makes web development so seamless in cmux — is a first-party feature with no documented plugin API.

Reality check: Embedding the iOS simulator in cmux isn't possible today. The WebKit rendering is built into cmux, not a plugin pattern others can replicate. A workaround using sixel/kitty image protocol could render frames in a terminal pane, but that's a significant build.

Pragmatic choice: Use cmux's notification and status-pill systems to push Metro errors to the sidebar. Not visual embedding, but awareness.

The Layered Design

Each layer builds on the previous one:

Layer 1: Metro Awareness (metro-mcp)

claude mcp add --scope project metro-mcp -- bunx metro-mcp

This single command should give Claude visibility into the entire Metro/Hermes runtime. The intended workflow: Claude checks get_bundle_errors before investigating UI bugs — many apparent UI issues are actually build failures.

Layer 2: Screen Awareness (PostToolUse hooks)

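{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "command": "jq -r '.tool_input.file_path // empty' | grep -qE '\\.(tsx?|jsx?)$' && xcrun simctl io booted screenshot /tmp/claude-sim-latest.png 2>/dev/null; exit 0"
      }]
    }]
  }
}

After every edit to a .ts, .tsx, .js, or .jsx file, the simulator state is captured. Claude Code passes the hook's input as JSON on stdin, so the command uses jq (which must be installed) to pull out the edited file's path and matches on its extension. Anchoring the pattern to the end of the path keeps .json files from triggering a capture. Claude doesn't automatically read the screenshot (that would waste tokens), but it's there when Claude needs to investigate a visual issue.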
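If the one-line shell command gets hard to maintain, the same check can live in a small script. This is a sketch, assuming the hook payload arrives as JSON on stdin with the edited path under `tool_input.file_path`; the function names and the suffix set are my own choices.

```python
import json
import subprocess
import sys
from pathlib import PurePath

SOURCE_SUFFIXES = {".ts", ".tsx", ".js", ".jsx"}
SCREENSHOT_PATH = "/tmp/claude-sim-latest.png"


def should_capture(payload):
    """True when the edited file looks like app source code."""
    file_path = payload.get("tool_input", {}).get("file_path", "")
    return PurePath(file_path).suffix in SOURCE_SUFFIXES


def main(stream):
    """Read one hook payload and capture the booted simulator if it qualifies."""
    payload = json.load(stream)
    if should_capture(payload):
        # Same simctl invocation as the hook; a failure here is ignored on purpose.
        subprocess.run(
            ["xcrun", "simctl", "io", "booted", "screenshot", SCREENSHOT_PATH],
            stderr=subprocess.DEVNULL,
            check=False,
        )
```

Saved as a script that ends with a call to `main(sys.stdin)`, this could replace the inline command in the hook. Comparing the real file suffix avoids false matches such as package.json.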

Layer 3: Notification Pipeline (cmux notify)

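{
  "matcher": "mcp__metro-mcp__get_bundle_errors",
  "hooks": [{
    "type": "command",
    "command": "jq -c '.tool_response' | grep -qiE 'error|failed' && cmux notify --title 'Metro' --body 'Bundle error detected'; exit 0"
  }]
}

This entry sits alongside the screenshot hook in the PostToolUse array. The hook payload arrives as JSON on stdin, and the command greps only the tool_response field, because the tool name itself contains "errors" and would match every time. The pattern is deliberately loose; if metro-mcp reports a clean bundle with text that includes the word "error", it will need tightening.

Errors flow from Metro → metro-mcp → cmux sidebar. The goal is to surface errors without having to watch the Metro terminal.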
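The decision the hook makes can be isolated into a function that is easy to test without cmux installed. This sketch assumes the hook payload is JSON with a `tool_response` field; `notify_command` is a hypothetical name, and the `cmux notify` flags are the ones used in the hook above.

```python
import json
import re

ERROR_PATTERN = re.compile(r"error|failed", re.IGNORECASE)


def notify_command(payload):
    """Return the cmux argv to run, or None when the response looks clean."""
    # Scan only the response: the tool name contains "errors" and would always match.
    response_text = json.dumps(payload.get("tool_response", ""))
    if not ERROR_PATTERN.search(response_text):
        return None
    return ["cmux", "notify", "--title", "Metro", "--body", "Bundle error detected"]
```

A wrapper script would read the payload from stdin and pass the returned list to `subprocess.run` when it is not None.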

Layer 4: Autonomous Testing (ios-simulator-skill)

The ios-simulator-skill adds 21 Python scripts for semantic UI navigation:

# Analyze what's on screen (roughly 10-50 tokens, versus 1,600-6,300 for a screenshot)
python scripts/screen_mapper.py

# Find and tap by meaning, not coordinates
python scripts/navigator.py --find-text "Settings" --tap

# WCAG accessibility audit
python scripts/accessibility_audit.py

The design bet: semantic navigation should survive layout changes. Finding a button by its label ("Start Focus") should work whether you're on an iPhone SE or an iPad Pro, whereas coordinate-based taps break whenever the screen size or layout changes.
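A toy version of that bet, to show why the label lookup is layout-independent. The tree shape (`label`, `frame`, `children`) and both layouts are hypothetical and are not the skill's actual data format; the point is only that the tap coordinate is derived from the live tree rather than hard-coded.

```python
def find_by_label(node, label):
    """Depth-first search for the first element whose label matches."""
    if node.get("label") == label:
        return node
    for child in node.get("children", []):
        match = find_by_label(child, label)
        if match is not None:
            return match
    return None


def tap_point(element):
    """Centre of the element's frame, computed from the current layout."""
    frame = element["frame"]
    return (frame["x"] + frame["width"] / 2, frame["y"] + frame["height"] / 2)


# Two hypothetical layouts of the same screen.
phone = {"label": "Root", "children": [
    {"label": "Start Focus", "frame": {"x": 40, "y": 500, "width": 240, "height": 48}}]}
tablet = {"label": "Root", "children": [
    {"label": "Start Focus", "frame": {"x": 300, "y": 900, "width": 400, "height": 60}}]}

for tree in (phone, tablet):
    print(tap_point(find_by_label(tree, "Start Focus")))
# (160.0, 524.0)
# (500.0, 930.0)
```

The same call finds the button in both layouts, while a hard-coded (160, 524) would miss it on the tablet.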

Combined with test flow definitions in CLAUDE.md, Claude should be able to autonomously crawl the app, check every screen for accessibility violations, and report bugs — the same pattern @0x__tom's tweet demonstrated.

CLAUDE.md as "Brain"

The most deliberate design choice wasn't any tool — it was teaching Claude when to use which tool via CLAUDE.md instructions:

### Debugging Priority
1. Check get_bundle_errors first (is it a build error?)
2. Check get_logs for runtime errors
3. Read /tmp/claude-sim-latest.png for visual state
4. Use ui_describe_all for accessibility tree
5. Use execute_js to inspect runtime state

This mirrors how a human developer debugs: check if it compiles, check the logs, look at the screen. The hypothesis is that without this priority order, Claude would default to expensive operations like screenshots or deep code inspection instead of the cheaper options.
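The priority list amounts to "run the cheapest check first and stop as soon as one reports something". A minimal sketch of that control flow, with hypothetical lambdas standing in for the real tool calls:

```python
def first_finding(checks):
    """Run checks in order and stop at the first one that reports something."""
    for name, check in checks:
        finding = check()
        if finding:
            return name, finding
    return None


# Hypothetical stand-ins for the tool calls, ordered cheapest first.
checks = [
    ("get_bundle_errors", lambda: []),
    ("get_logs", lambda: ["TypeError: undefined is not a function"]),
    ("screenshot", lambda: "never reached in this run"),
]

print(first_finding(checks))
# ('get_logs', ['TypeError: undefined is not a function'])
```

Because the loop returns early, the expensive screenshot step is never reached when the logs already explain the problem.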

The Full Stack

| Layer | Tool | Purpose | Tokens/use |
|---|---|---|---|
| Metro awareness | metro-mcp | Build errors, logs, REPL, state inspection | ~50-200 |
| Screen awareness | PostToolUse hook | Auto-capture simulator after code edits | 0 (file on disk) |
| Notifications | cmux notify hook | Push errors to cmux sidebar | 0 |
| Low-level sim | ios-simulator MCP | Screenshots, taps, swipes, accessibility tree | 10-6,300 |
| High-level testing | ios-simulator-skill | Semantic navigation, WCAG audit, test recording | 10-50 |
| Debugging workflow | CLAUDE.md | Priority order, tool selection, caveats | 0 |
| Token savings | RTK | CLI proxy reducing dev operation tokens | -60-90% |

Design Principles

These aren't "lessons learned" — I haven't put this through a real debugging session yet. They're the principles that guided the architecture decisions:

1. Audit before building. I spent the first chunk of time just reading what was already configured — MCP servers, hooks, settings, CLAUDE.md. Half the infrastructure was already there. Worth doing before reaching for new tools.

2. Research in parallel. Three investigation tracks running simultaneously informed each other. Learning about CDP from metro-mcp research helped clarify why the accessibility tree approach from ios-simulator-skill is cheaper than screenshots.

3. Layer for independent adoption. Each layer is designed to be independently useful. metro-mcp works without the screenshot hook. The screenshot hook works without cmux. The goal is to let the user adopt one layer at a time instead of all-or-nothing.

4. Prefer cheap signals by default. Accessibility tree (10-50 tokens) over screenshots (1,600-6,300 tokens). get_bundle_errors (one call) over reading Metro logs (scroll through everything). Whether this actually plays out in debugging sessions depends on how Claude weighs the choices — which is part of what I'll learn by using it.

What's Still a Gap

Simulator embedding in cmux — The dream. No plugin API for custom visual surfaces exists yet.
Continuous Metro log streaming — metro-mcp buffers logs but doesn't push. A background watcher script piping to cmux sidebar would close this.
Hot reload awareness — Claude doesn't know when Metro finishes a hot reload. A hook detecting reload completion + screenshot would tighten the feedback loop.
Autonomous test flow execution — The skill and tools are wired up. Whether Claude actually executes end-to-end flows cleanly (crawl all screens, check accessibility, report bugs) is untested.
Real-world validation — Everything above is a design. The lessons learned post comes after I've actually debugged something with it and shipped a release through it.
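For the log-streaming gap, the core of a background watcher would be a filter that picks out error lines and avoids re-notifying on repeats. This is a sketch of that filter only; where the lines come from (a Metro log file, a pipe) and how they reach cmux are left open, and `new_errors` is a hypothetical name.

```python
import re

ERROR_LINE = re.compile(r"error|failed", re.IGNORECASE)


def new_errors(lines):
    """Yield error-looking lines, skipping one identical to the last error yielded."""
    last = None
    for line in lines:
        line = line.strip()
        if ERROR_LINE.search(line) and line != last:
            last = line
            yield line
```

Fed an iterable of log lines, it would emit a repeated "Unable to resolve module" error once instead of on every rebuild.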

The design is a step toward closing the gap between mobile and web DX in Claude Code. Whether it actually feels that way in practice is something I'll find out when I put it through real work.