NAT Traversal for Peer-to-Peer Streaming: STUN, TURN, and MQTT Signaling

Remote desktop streaming over the internet requires solving a fundamental networking problem: both the host Mac and the client iPad are usually behind NATs (Network Address Translators) that block unsolicited incoming connections. Neither device has a public IP address, so neither can accept a connection initiated by the other.

This post covers the complete NAT traversal stack used in remote desktop apps like Astropad Workbench.

The Problem

Mac (host)                                    iPad (client)
192.168.1.10 -- [NAT Router] -- 98.51.x.x     10.0.0.5 -- [NAT Router] -- 72.84.x.x

Neither side can initiate a connection to the other.

The Solution Stack

1. SIGNALING (MQTT)  — Discover peers, exchange connection info
2. STUN              — Discover your own public IP:port
3. ICE               — Try all candidate connections in parallel
4. TURN              — Relay fallback when direct connection fails
5. QUIC              — Actual data transport once connected

Layer 1: MQTT Signaling

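Before any peer-to-peer connection, both devices need a meeting point. Astropad uses MQTT over TLS (its production URL uses port 1883, which is conventionally the plaintext MQTT port; 8883 is the usual port for MQTT over TLS):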

Production: mqtts://signal.astropad.com:1883
Cloud:      mqtts://signal.astropadcloud.com

Setting Up a Signaling Server

# Install Mosquitto (MQTT broker)
brew install mosquitto

# Configure with TLS and authentication
cat > /opt/homebrew/etc/mosquitto/mosquitto.conf << EOF
listener 8883
protocol mqtt
cafile /path/to/ca.crt
certfile /path/to/server.crt
keyfile /path/to/server.key
require_certificate false
allow_anonymous false
password_file /opt/homebrew/etc/mosquitto/passwords
EOF

# Create user
mosquitto_passwd -c /opt/homebrew/etc/mosquitto/passwords liquid-user

# Start
mosquitto -c /opt/homebrew/etc/mosquitto/mosquitto.conf

Signaling Protocol

Each device publishes its presence and connection candidates:

Topics:
  liquid/devices/{device_id}/online     — Presence (retained message)
  liquid/devices/{device_id}/offer      — Connection offer with candidates
  liquid/devices/{device_id}/answer     — Connection answer with candidates
  liquid/devices/{device_id}/ice        — Additional ICE candidates (trickle)

Offer message:

{
  "device_id": "abc123",
  "device_name": "My MacBook Pro",
  "public_key": "ed25519:base64...",
  "candidates": [
    {"type": "host", "ip": "192.168.1.10", "port": 9876},
    {"type": "srflx", "ip": "98.51.42.17", "port": 45123},
    {"type": "relay", "ip": "turn.example.com", "port": 3478}
  ]
}
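The topic layout and offer message above map naturally onto Codable types. The following is a minimal sketch, not Astropad's actual schema: the type names `Candidate` and `Offer` and the helper `signalingTopic(deviceID:channel:)` are hypothetical, and only the JSON field names and topic pattern come from the examples above.

```swift
import Foundation

// Hypothetical model of one entry in the "candidates" array.
struct Candidate: Codable, Equatable {
    let type: String   // "host", "srflx", or "relay"
    let ip: String
    let port: UInt16
}

// Hypothetical model of the offer message; CodingKeys map the
// snake_case JSON field names onto Swift property names.
struct Offer: Codable, Equatable {
    let deviceID: String
    let deviceName: String
    let publicKey: String
    let candidates: [Candidate]

    enum CodingKeys: String, CodingKey {
        case deviceID = "device_id"
        case deviceName = "device_name"
        case publicKey = "public_key"
        case candidates
    }
}

// Builds a topic following the liquid/devices/{device_id}/{channel} pattern.
func signalingTopic(deviceID: String, channel: String) -> String {
    "liquid/devices/\(deviceID)/\(channel)"
}
```

The payload for an MQTT publish would then be `try JSONEncoder().encode(offer)`, and the receiving side decodes it with `JSONDecoder().decode(Offer.self, from: payload)`.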

Layer 2: STUN — Discover Your Public IP

STUN (Session Traversal Utilities for NAT) tells you your public-facing IP:port.

How STUN Works

Your device (192.168.1.10:12345)
    → Sends UDP packet to STUN server (stun.l.google.com:19302)
    → NAT rewrites the source address to 98.51.42.17:45123
    → STUN server replies: "I see you as 98.51.42.17:45123"
    → Now you know your public endpoint

Swift STUN Client

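import Network

enum STUNError: Error { case noResponse, timeout }

func discoverPublicEndpoint() async throws -> (String, UInt16) {
    let connection = NWConnection(
        host: NWEndpoint.Host("stun.l.google.com"),
        port: NWEndpoint.Port(rawValue: 19302)!,
        using: .udp
    )
    // Serial queue: callbacks never run concurrently, so `finished` needs no lock
    let queue = DispatchQueue(label: "stun-client")

    return try await withCheckedThrowingContinuation { continuation in
        // A continuation must be resumed exactly once, whichever callback fires first
        var finished = false
        func finish(_ result: Result<(String, UInt16), Error>) {
            guard !finished else { return }
            finished = true
            connection.cancel()
            continuation.resume(with: result)
        }

        connection.stateUpdateHandler = { state in
            switch state {
            case .ready:
                // Send STUN Binding Request
                let request = buildSTUNBindingRequest()
                connection.send(content: request, completion: .contentProcessed { error in
                    if let error = error { finish(.failure(error)) }
                })

                // Receive STUN Binding Response
                connection.receiveMessage { data, _, _, error in
                    if let data = data, !data.isEmpty {
                        finish(.success(parseSTUNResponse(data)))
                    } else if let error = error {
                        finish(.failure(error))
                    } else {
                        finish(.failure(STUNError.noResponse))
                    }
                }
            case .failed(let error):
                finish(.failure(error))
            default:
                break
            }
        }
        connection.start(queue: queue)

        // UDP is lossy: give up if no response arrives within 3 seconds
        queue.asyncAfter(deadline: .now() + 3) { finish(.failure(STUNError.timeout)) }
    }
}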

func buildSTUNBindingRequest() -> Data {
    var data = Data(capacity: 20)
    // Message Type: Binding Request (0x0001)
    data.append(contentsOf: [0x00, 0x01])
    // Message Length: 0 (no attributes)
    data.append(contentsOf: [0x00, 0x00])
    // Magic Cookie: 0x2112A442
    data.append(contentsOf: [0x21, 0x12, 0xA4, 0x42])
    // Transaction ID: 12 random bytes
    data.append(contentsOf: (0..<12).map { _ in UInt8.random(in: 0...255) })
    return data
}

func parseSTUNResponse(_ data: Data) -> (String, UInt16) {
    // Parse XOR-MAPPED-ADDRESS attribute
    // ... (RFC 5389 parsing logic)
    // Returns the public IP and port
}

A minimal STUN client is about 200 lines. For production, use a library.
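The XOR-MAPPED-ADDRESS parsing that is elided above can be sketched as follows. This is a minimal illustration under stated assumptions, not production code: the function name `parseXorMappedAddress` is hypothetical, it handles IPv4 only, returns an optional instead of the non-optional tuple used above, and does not check that the transaction ID matches the request.

```swift
import Foundation

/// Extracts the IPv4 address and port from the XOR-MAPPED-ADDRESS attribute
/// of a STUN Binding Success Response. Returns nil if the message is
/// malformed or carries no IPv4 XOR-MAPPED-ADDRESS attribute.
func parseXorMappedAddress(_ message: Data) -> (ip: String, port: UInt16)? {
    let bytes = [UInt8](message)
    let cookie: [UInt8] = [0x21, 0x12, 0xA4, 0x42]

    // Header: type (2) + length (2) + magic cookie (4) + transaction ID (12)
    guard bytes.count >= 20,
          bytes[0] == 0x01, bytes[1] == 0x01,   // 0x0101 = Binding Success Response
          Array(bytes[4..<8]) == cookie else { return nil }

    let bodyLength = Int(bytes[2]) << 8 | Int(bytes[3])
    let end = min(bytes.count, 20 + bodyLength)
    var offset = 20

    while offset + 4 <= end {
        let attrType = UInt16(bytes[offset]) << 8 | UInt16(bytes[offset + 1])
        let attrLength = Int(bytes[offset + 2]) << 8 | Int(bytes[offset + 3])
        let value = offset + 4
        guard value + attrLength <= end else { return nil }

        // 0x0020 = XOR-MAPPED-ADDRESS; family byte 0x01 = IPv4 (8-byte value)
        if attrType == 0x0020, attrLength >= 8, bytes[value + 1] == 0x01 {
            let xPort = UInt16(bytes[value + 2]) << 8 | UInt16(bytes[value + 3])
            let port = xPort ^ 0x2112   // XOR with the top 16 bits of the cookie
            let octets = (0..<4).map { bytes[value + 4 + $0] ^ cookie[$0] }
            let ip = octets.map { String($0) }.joined(separator: ".")
            return (ip, port)
        }

        // Attribute values are padded to a multiple of 4 bytes
        offset = value + (attrLength + 3) / 4 * 4
    }
    return nil
}
```

The XOR with the magic cookie exists so that NATs which rewrite any IP address they find in a payload cannot corrupt the reported address.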

Layer 3: ICE — Try All Candidates

ICE (Interactive Connectivity Establishment) tries all possible connection paths in parallel:

struct ConnectionCandidate {
    enum CandidateType { case host, serverReflexive, relay }
    let type: CandidateType
    let ip: String
    let port: UInt16
    let priority: Int
}

func gatherCandidates() async -> [ConnectionCandidate] {
    var candidates: [ConnectionCandidate] = []
    
    // 1. Host candidates (local IPs)
    for interface in getLocalInterfaces() {
        candidates.append(ConnectionCandidate(
            type: .host,
            ip: interface.ip,
            port: listeningPort,
            priority: 100
        ))
    }
    
    // 2. Server-reflexive candidates (STUN); skipped if the lookup fails
    if let (publicIP, publicPort) = try? await discoverPublicEndpoint() {
        candidates.append(ConnectionCandidate(
            type: .serverReflexive,
            ip: publicIP,
            port: publicPort,
            priority: 50
        ))
    }
    
    // 3. Relay candidates (TURN); skipped if the allocation fails
    if let (relayIP, relayPort) = try? await allocateTURNRelay() {
        candidates.append(ConnectionCandidate(
            type: .relay,
            ip: relayIP,
            port: relayPort,
            priority: 10
        ))
    }
    
    return candidates.sorted { $0.priority > $1.priority }
}
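The fixed priorities above (100, 50, 10) are a simplification. Full ICE computes a 32-bit priority per candidate using the formula and recommended type preferences from RFC 8445. A sketch of that formula, with hypothetical names `ICECandidateType` and `icePriority`:

```swift
// Recommended type preferences from RFC 8445: higher means more preferred.
enum ICECandidateType: UInt32 {
    case host = 126
    case peerReflexive = 110
    case serverReflexive = 100
    case relay = 0
}

/// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentID)
/// localPreference ranks interfaces on a multihomed device (0...65535);
/// componentID is 1 for a single-component stream (valid range 1...256).
func icePriority(
    type: ICECandidateType,
    localPreference: UInt32 = 65535,
    componentID: UInt32 = 1
) -> UInt32 {
    precondition(localPreference <= 65535, "localPreference must fit in 16 bits")
    precondition((1...256).contains(componentID), "componentID must be in 1...256")
    return (type.rawValue << 24) + (localPreference << 8) + (256 - componentID)
}
```

With the defaults, a host candidate gets priority 2130706431 and a server-reflexive candidate gets 1694498815, so direct paths are always checked before relayed ones.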

Simultaneous Connection Attempts

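The key insight from Tailscale's NAT traversal blog: send packets from both sides simultaneously. Each outbound packet opens a mapping in the sender's own NAT, so when both sides send at the same time, each NAT treats the peer's packets as replies to traffic it already allowed. This only works if the probes leave from the same local port that STUN observed.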

func attemptConnection(to candidates: [ConnectionCandidate]) async -> NWConnection? {
    // Try all candidates in parallel
    return await withTaskGroup(of: NWConnection?.self) { group in
        for candidate in candidates {
            group.addTask {
                let connection = NWConnection(
                    host: NWEndpoint.Host(candidate.ip),
                    port: NWEndpoint.Port(rawValue: candidate.port)!,
                    using: .quic(alpn: ["liquid-v1"])
                )
                // Attempt with timeout
                return try? await withTimeout(seconds: 5) {
                    await connectAndVerify(connection)
                }
            }
        }
        
        // Return the first successful connection
        for await connection in group {
            if let connection = connection {
                group.cancelAll()
                return connection
            }
        }
        return nil
    }
}
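`withTimeout` in the code above is not part of the Swift standard library. One way to write such a helper, as a sketch under the assumption that the wrapped operation cooperates with task cancellation (`TimeoutError` is a hypothetical type):

```swift
import Foundation

struct TimeoutError: Error {}

/// Runs `operation` and throws TimeoutError if it has not finished
/// within `seconds`. The loser of the race is cancelled.
func withTimeout<T: Sendable>(
    seconds: Double,
    operation: @escaping @Sendable () async throws -> T
) async throws -> T {
    try await withThrowingTaskGroup(of: T.self) { group in
        // Child 1: the real work
        group.addTask { try await operation() }
        // Child 2: a timer that throws once the deadline passes
        group.addTask {
            try await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            throw TimeoutError()
        }
        // Whichever child finishes first decides the outcome
        let result = try await group.next()!
        group.cancelAll()
        return result
    }
}
```

Cancellation in Swift is cooperative: if the operation ignores it, the group still waits for that child to finish before returning, so the connection attempt itself should also be cancelled (for example with `connection.cancel()`) when the task is cancelled.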

Layer 4: TURN Relay (Fallback)

When direct connection fails (symmetric NATs, corporate firewalls), TURN provides a relay:

Deploying coturn

# On a VPS (Ubuntu)
apt install coturn

# Configure /etc/turnserver.conf
listening-port=3478
tls-listening-port=5349
realm=liquid.example.com
server-name=liquid.example.com
fingerprint
lt-cred-mech
user=liquid:secretpassword
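# Maximum number of simultaneous allocations across all users
total-quota=100
# Per-session bandwidth cap in bytes per second (6250000 B/s = 50 Mbps)
max-bps=6250000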
cert=/etc/letsencrypt/live/liquid.example.com/fullchain.pem
pkey=/etc/letsencrypt/live/liquid.example.com/privkey.pem

Multi-Region Deployment

Astropad has "11 relay regions." For a self-hosted solution:

| Region | Provider | Cost |
|--------|----------|------|
| US East | AWS Lightsail ($5/mo) | ~$60/yr |
| US West | AWS Lightsail ($5/mo) | ~$60/yr |
| EU | Hetzner ($4/mo) | ~$48/yr |
| Asia | Vultr ($5/mo) | ~$60/yr |

Use DNS-based routing (Route 53 latency-based routing or Cloudflare Workers) to direct clients to the nearest relay.

Relay Statistics

From Tailscale's blog:

  • Direct UDP connectivity succeeds ~90% of the time
  • Only ~10% of connections need TURN relay
  • Relay adds 10-50ms latency depending on relay location

Layer 5: End-to-End Encryption

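Even with QUIC's TLS 1.3, you may want additional application-layer encryption tied to each device's identity key, as defense in depth for relayed sessions (a plain TURN relay only forwards encrypted QUIC packets, but a relay that terminates QUIC could inspect traffic):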

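import CryptoKit
import Foundation

// Generate a long-lived Ed25519 identity keypair per device (store in Keychain)
let identityKey = Curve25519.Signing.PrivateKey()
let publicKey = identityKey.publicKey

// Ed25519 keys can only sign, so key agreement needs a separate X25519 keypair
let agreementKey = Curve25519.KeyAgreement.PrivateKey()

// Sign the X25519 public key so the peer can tie it to this device's identity,
// then exchange public keys and the signature during MQTT signaling
let signature = try identityKey.signature(for: agreementKey.publicKey.rawRepresentation)

// Verify peer identity: Trust On First Use (TOFU) on the peer's Ed25519 key.
// peerIdentityKey, peerAgreementKey, peerSignature, and sessionSalt are assumed
// to have been received and decoded from the signaling messages.
guard peerIdentityKey.isValidSignature(peerSignature, for: peerAgreementKey.rawRepresentation) else {
    fatalError("Invalid peer signature")  // in real code, reject the connection instead
}

// Derive shared secret for AES-256-GCM
let sharedSecret = try agreementKey.sharedSecretFromKeyAgreement(with: peerAgreementKey)
let symmetricKey = sharedSecret.hkdfDerivedSymmetricKey(
    using: SHA256.self,
    salt: sessionSalt,
    sharedInfo: Data("liquid-v1".utf8),
    outputByteCount: 32
)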

What Astropad Does (From Binary Analysis)

Astropad's networking stack revealed in their binary:

  • Signaling: MQTT via mqtts://signal.astropad.com:1883
  • Transport: QUIC via quinn-proto 0.11.9
  • NAT Traversal: STUN + TURN (error strings: STUN Error:, TURN Error:, QUIC/TURN task destination:)
  • Auth: JWT tokens (HS256, ES256, EdDSA) with shared sessions
  • Peer Identity: Ed25519-based (astro-key-store, astro-peer-identity)
  • Connection Types: QUIC, USB, QR code, manual IP, auto-discovery
  • Encryption: QUIC TLS 1.3 + AES-256-GCM + CHACHA20_POLY1305

Their liquid_net::connectivity_state::inner_state manages the full state machine of connection establishment, and liquid_net::router handles message routing across channels.

Complete Connection Flow

1. Host starts → publishes to MQTT (online, public key)
2. Client starts → subscribes to MQTT, sees host
3. Client gathers candidates (host IPs, STUN, TURN)
4. Client sends offer to host via MQTT
5. Host gathers its own candidates
6. Host sends answer to client via MQTT
7. Both sides start simultaneous QUIC connection attempts
8. First successful connection wins
9. Verify peer identity (public key from MQTT matches QUIC cert)
10. Streaming begins
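The client side of the flow above can be modeled as a small state machine. This is an illustrative sketch only: the state and event names are hypothetical and are not taken from Astropad's `connectivity_state` implementation.

```swift
// Hypothetical client-side states, roughly one per phase of the flow.
enum ConnectionState: Equatable {
    case idle, gathering, offerSent, probing, verifying, streaming, failed
}

// Hypothetical events that move the client from one phase to the next.
enum ConnectionEvent {
    case hostDiscovered       // step 2: host seen via MQTT
    case candidatesGathered   // steps 3-4: candidates ready, offer published
    case answerReceived       // step 6: host's answer arrived
    case pathEstablished      // steps 7-8: first QUIC attempt succeeded
    case identityVerified     // step 9: public key matches
    case error
}

/// Returns the next state; events that do not apply in the current
/// state are ignored, and any error moves to .failed.
func nextState(_ state: ConnectionState, on event: ConnectionEvent) -> ConnectionState {
    switch (state, event) {
    case (_, .error):                        return .failed
    case (.idle, .hostDiscovered):           return .gathering
    case (.gathering, .candidatesGathered):  return .offerSent
    case (.offerSent, .answerReceived):      return .probing
    case (.probing, .pathEstablished):       return .verifying
    case (.verifying, .identityVerified):    return .streaming
    default:                                 return state
    }
}
```

Keeping the transitions in one pure function makes out-of-order signaling messages harmless: an answer that arrives before an offer was sent simply leaves the state unchanged.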

Keeping Connections Alive

  • Send keepalive packets every 15 seconds (NAT mappings expire in ~30s)
  • Monitor connection quality (RTT, packet loss)
  • If quality degrades, re-probe the other candidates and switch paths (e.g., fall back from direct to relay, then upgrade back to direct once a direct path works again)
  • QUIC connection migration handles WiFi↔cellular transitions automatically
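The keepalive rule above can be sketched as a small timer that tracks the last outbound packet. The type `KeepaliveTimer` is hypothetical, and the 30-second NAT timeout is the assumption stated in the list, not a measured value.

```swift
import Foundation

/// Tracks when the last outbound packet was sent and reports when a
/// keepalive is due. The interval is half the assumed NAT mapping
/// lifetime, matching the 15 s / 30 s figures above.
struct KeepaliveTimer {
    let interval: TimeInterval
    private(set) var lastSent: Date

    init(natTimeout: TimeInterval = 30, now: Date = Date()) {
        self.interval = natTimeout / 2
        self.lastSent = now
    }

    /// Any outbound packet refreshes the NAT mapping, not just keepalives.
    mutating func recordOutboundPacket(at now: Date = Date()) {
        lastSent = now
    }

    func isKeepaliveDue(at now: Date = Date()) -> Bool {
        now.timeIntervalSince(lastSent) >= interval
    }
}
```

During active streaming the timer never fires, because every video packet resets it; keepalives are only sent while the session is idle.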

Part 10 of the "Building a Remote Desktop from Scratch" series. Based on reverse engineering analysis of Astropad Workbench 1.1.0.
