Building a Screen Capture Pipeline with ScreenCaptureKit
ScreenCaptureKit is Apple's modern framework for high-performance screen capture on macOS. Introduced at WWDC 2022, it replaced the older CGDisplayStream and AVCaptureScreenInput APIs with a GPU-accelerated pipeline that delivers sub-frame latency and fine-grained content filtering.
This post walks through building a production-ready screen capture pipeline — the kind used in remote desktop apps like Astropad Workbench.
Why ScreenCaptureKit?
| API | Introduced | Latency | GPU Accelerated | Content Filtering |
|-----|------------|---------|-----------------|-------------------|
| CGDisplayStream | macOS 10.8 | ~20-50 ms | No | No |
| AVCaptureScreenInput | macOS 10.7 | ~50-100 ms | No | No |
| ScreenCaptureKit | macOS 12.3 | <10 ms | Yes | Yes |
ScreenCaptureKit is the only option for real-time streaming. Astropad Workbench uses it exclusively (confirmed via otool -L showing ScreenCaptureKit.framework linked).
The Pipeline
```
SCShareableContent        (discover displays/windows)
  → SCContentFilter       (select what to capture)
  → SCStreamConfiguration (resolution, FPS, pixel format)
  → SCStream              (start capture)
  → SCStreamOutput        (delegate receives CMSampleBuffers)
  → CVPixelBuffer         (backed by IOSurface — zero-copy GPU)
```
Step 1: Discover Available Content
```swift
let content = try await SCShareableContent.excludingDesktopWindows(
    false,
    onScreenWindowsOnly: true
)

// Enumerate displays
for display in content.displays {
    print("Display \(display.displayID): \(display.width)x\(display.height)")
}

// Enumerate windows
for window in content.windows {
    print("Window: \(window.title ?? "untitled") — \(window.owningApplication?.applicationName ?? "")")
}
```
SCShareableContent is a snapshot — it doesn't update live. Call it again when you need a fresh list.
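Because the snapshot goes stale, one approach is to re-fetch it whenever the display configuration changes. This sketch uses AppKit's real `didChangeScreenParametersNotification`; what you do with the fresh snapshot (the cached-list update) is left as a placeholder:

```swift
import AppKit
import ScreenCaptureKit

// Sketch: re-fetch shareable content whenever displays are added,
// removed, or resized.
NotificationCenter.default.addObserver(
    forName: NSApplication.didChangeScreenParametersNotification,
    object: nil,
    queue: .main
) { _ in
    Task {
        // Fresh snapshot — the previous one no longer reflects reality
        let content = try? await SCShareableContent.excludingDesktopWindows(
            false,
            onScreenWindowsOnly: true
        )
        // ... update your cached display/window list here
        _ = content
    }
}
```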
Step 2: Create a Content Filter
```swift
// Capture the entire display, excluding our own app
let excludedApps = content.applications.filter {
    $0.bundleIdentifier == Bundle.main.bundleIdentifier
}

let filter = SCContentFilter(
    display: content.displays[0],
    excludingApplications: excludedApps,
    exceptingWindows: []
)
```
You can also capture a single window:
```swift
let filter = SCContentFilter(desktopIndependentWindow: targetWindow)
```
Step 3: Configure the Stream
```swift
let config = SCStreamConfiguration()

// Resolution — match the display or downscale
config.width = 1920
config.height = 1080

// Frame rate — 60 FPS for smooth streaming
config.minimumFrameInterval = CMTime(value: 1, timescale: 60)

// Pixel format — BGRA for Metal compatibility
config.pixelFormat = kCVPixelFormatType_32BGRA

// Cursor
config.showsCursor = true

// Audio capture (macOS 13+)
config.capturesAudio = true
config.sampleRate = 48_000
config.channelCount = 2
```
Critical detail: Set pixelFormat to kCVPixelFormatType_32BGRA (not YUV). While YUV is more efficient for video encoding, BGRA gives you direct Metal texture compatibility for the tile diffing pipeline. VideoToolbox accepts BGRA as input and handles the color space conversion internally on the GPU.
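If you later decide to feed the encoder directly and skip Metal-side processing, a YUV configuration is a plausible alternative. This sketch uses the standard CoreVideo NV12 constant:

```swift
// Alternative sketch: biplanar YUV (NV12) for encoder-bound pipelines.
// VideoToolbox H.264/HEVC sessions consume this format without an extra
// BGRA→YUV conversion pass, at the cost of Metal convenience.
config.pixelFormat = kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange
```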
Step 4: Start Capturing
```swift
let stream = SCStream(filter: filter, configuration: config, delegate: self)

// Add output handlers — separate queues for video and audio
let videoQueue = DispatchQueue(label: "com.liquid.capture.video", qos: .userInteractive)
let audioQueue = DispatchQueue(label: "com.liquid.capture.audio", qos: .userInteractive)

try stream.addStreamOutput(self, type: .screen, sampleHandlerQueue: videoQueue)
try stream.addStreamOutput(self, type: .audio, sampleHandlerQueue: audioQueue)

try await stream.startCapture()
```
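Passing `delegate: self` above implies conformance to `SCStreamDelegate`, whose stop callback is worth handling. A minimal sketch — the `restartCapture()` recovery helper is hypothetical:

```swift
extension ScreenCaptureService: SCStreamDelegate {
    // Called when the stream stops outside your control, e.g. the user
    // revokes Screen Recording permission or the captured display goes away.
    func stream(_ stream: SCStream, didStopWithError error: Error) {
        print("Capture stopped: \(error.localizedDescription)")
        // Hypothetical recovery: re-discover content and rebuild the stream
        Task { await self.restartCapture() }
    }
}
```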
Step 5: Handle Captured Frames
```swift
extension ScreenCaptureService: SCStreamOutput {
    func stream(_ stream: SCStream,
                didOutputSampleBuffer sampleBuffer: CMSampleBuffer,
                of type: SCStreamOutputType) {
        switch type {
        case .screen:
            handleVideoFrame(sampleBuffer)
        case .audio:
            handleAudioFrame(sampleBuffer)
        @unknown default:
            break
        }
    }
}

func handleVideoFrame(_ sampleBuffer: CMSampleBuffer) {
    // CRITICAL: Check frame status before processing
    guard let attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, createIfNecessary: false)
            as? [[SCStreamFrameInfo: Any]],
          let statusValue = attachments.first?[.status] as? Int,
          let status = SCFrameStatus(rawValue: statusValue),
          status == .complete else {
        return // Skip incomplete, idle, or blank frames
    }

    // Extract the pixel buffer (IOSurface-backed, zero-copy)
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    // Get the presentation timestamp
    let pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)

    // Pass to the encoder pipeline
    encoder.encode(pixelBuffer: pixelBuffer, presentationTime: pts)
}
```
The frame status check is essential. Without it, you'll process blank frames (when screen is idle), incomplete frames (during transitions), and suspended frames. Astropad's binary shows they check SCFrameStatus in their liquid_screencap::capture::capture_macos::sc_kit::stream::output module.
The IOSurface Zero-Copy Advantage
The CVPixelBuffer from ScreenCaptureKit is backed by an IOSurface — a GPU memory buffer shared across frameworks:
```swift
// Get the underlying IOSurface.
// Note: CVPixelBufferGetIOSurface returns Unmanaged<IOSurfaceRef>?,
// so unwrap it without taking ownership.
guard let ioSurface = CVPixelBufferGetIOSurface(pixelBuffer)?.takeUnretainedValue() else { return }

// Create a Metal texture directly from it — ZERO CPU COPY
let textureDesc = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .bgra8Unorm,
    width: CVPixelBufferGetWidth(pixelBuffer),
    height: CVPixelBufferGetHeight(pixelBuffer),
    mipmapped: false
)

let metalTexture = device.makeTexture(
    descriptor: textureDesc,
    iosurface: ioSurface,
    plane: 0
)
```
This is how you achieve sub-millisecond capture-to-GPU time. The pixel data never touches the CPU.
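An equivalent zero-copy route — often more convenient because CoreVideo handles the IOSurface plumbing for you — is a Metal texture cache. A sketch, assuming a `device: MTLDevice` and the per-frame `pixelBuffer` are in scope:

```swift
// One-time setup: a texture cache bound to the Metal device
var textureCache: CVMetalTextureCache?
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache)

// Per frame: wrap the pixel buffer as a Metal texture (no CPU copy)
var cvTexture: CVMetalTexture?
CVMetalTextureCacheCreateTextureFromImage(
    kCFAllocatorDefault,
    textureCache!,
    pixelBuffer,
    nil,
    .bgra8Unorm,
    CVPixelBufferGetWidth(pixelBuffer),
    CVPixelBufferGetHeight(pixelBuffer),
    0, // plane index
    &cvTexture
)
let metalTexture = cvTexture.flatMap(CVMetalTextureGetTexture)
```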
Permissions
ScreenCaptureKit requires the Screen Recording permission. The first capture attempt triggers the system dialog, but you can check (and prompt) ahead of time with the CoreGraphics TCC helpers:
```swift
// Returns true if Screen Recording permission has already been granted
if !CGPreflightScreenCaptureAccess() {
    // Shows the system prompt the first time; after a denial the user
    // must enable access manually in System Settings
    CGRequestScreenCaptureAccess()
}
```
SCShareableContent queries also fail (or come back empty) while permission is denied, so handle that error path as well.
Add to your Info.plist:
```xml
<key>NSScreenCaptureUsageDescription</key>
<string>Screen capture is required for remote desktop streaming.</string>
```
Performance Tips
- Use `.userInteractive` QoS for the output queue — this ensures frames are delivered with minimal scheduling delay
- Don't block the output callback — hand off to an async pipeline immediately
- Match the display's native resolution when possible — downscaling happens on the GPU anyway
- Set `minimumFrameInterval` appropriately — 60 FPS is good for streaming, but 30 FPS halves the encoding load
- Filter out your own app — this prevents the infinite mirror effect and saves GPU work
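One way to implement the "don't block the callback" rule is to drop frames whenever the downstream stage is still busy. This sketch reuses the `encoder` pipeline object from earlier; the queue label is hypothetical:

```swift
// Sketch: drop frames instead of queueing them when the encoder is busy.
// Remote desktop favors freshness over completeness — a stale frame is
// worse than a skipped one.
let processingQueue = DispatchQueue(label: "com.example.capture.processing")
let inFlight = DispatchSemaphore(value: 1)

func handOff(_ pixelBuffer: CVPixelBuffer, pts: CMTime) {
    // Non-blocking check: if the previous frame is still encoding, skip this one
    guard inFlight.wait(timeout: .now()) == .success else { return }
    processingQueue.async {
        defer { inFlight.signal() }
        encoder.encode(pixelBuffer: pixelBuffer, presentationTime: pts)
    }
}
```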
What Astropad Does Differently
From our binary analysis, Astropad's liquid_screencap module also:
- Uses a `display_stream` abstraction that can switch between SCStream backends
- Implements `img_processing` directly on the captured IOSurface (GPU diffing)
- Has a `filters::windows_wrappers` layer for fine-grained window capture
- Tracks `backend` state for capture lifecycle management
Next Steps
With screen capture working, the next step is encoding these frames into H.264/HEVC using VideoToolbox — which we cover in the next post.
*Part 2 of the "Building a Remote Desktop from Scratch" series. Based on reverse engineering analysis of Astropad Workbench 1.1.0.*