Hardware Video Encoding with VideoToolbox: Zero to HEVC in 1ms
VideoToolbox is Apple's framework for hardware-accelerated video encoding and decoding. On Apple Silicon, the dedicated Media Engine can encode a 4K HEVC frame in under 1 millisecond, making real-time screen streaming not just possible but computationally trivial.
This post covers configuring VideoToolbox for the lowest possible latency in a remote desktop streaming context.
Why Hardware Encoding Matters
| Approach | Encode Time (1080p60) | CPU Usage | Power |
|----------|----------------------|-----------|-------|
| x264 (software) | 15-30ms | 100%+ | High |
| VideoToolbox H.264 | ~2ms | <5% | Low |
| VideoToolbox HEVC | ~2ms | <5% | Low |
| VideoToolbox HEVC (M4) | <1ms | <2% | Minimal |
Astropad Workbench uses VideoToolbox exclusively (the binary links VideoToolbox.framework directly). Its internal module liquid_codec::video_encoder::apple wraps the VTCompressionSession API.
The Encoding Pipeline
CVPixelBuffer (from ScreenCaptureKit)
→ VTCompressionSession (hardware encoder)
→ CMSampleBuffer (encoded NAL units)
→ Extract H.264/HEVC bitstream
→ Network transport
Step 1: Create the Compression Session
// VTCompressionOutputCallback is a C function pointer, so it cannot capture
// Swift context. Pass the enclosing encoder instance (here assumed to be a
// class called VideoEncoder) through the refcon pointer instead.
let callback: VTCompressionOutputCallback = { refcon, _, status, _, sampleBuffer in
    guard status == noErr, let sampleBuffer = sampleBuffer, let refcon = refcon else { return }
    let encoder = Unmanaged<VideoEncoder>.fromOpaque(refcon).takeUnretainedValue()
    encoder.handleEncodedFrame(sampleBuffer)
}

var session: VTCompressionSession?
let status = VTCompressionSessionCreate(
    allocator: kCFAllocatorDefault,
    width: Int32(width),
    height: Int32(height),
    codecType: kCMVideoCodecType_HEVC, // or kCMVideoCodecType_H264
    encoderSpecification: nil,         // nil selects the default (hardware) encoder
    imageBufferAttributes: nil,
    compressedDataAllocator: nil,
    outputCallback: callback,
    refcon: UnsafeMutableRawPointer(Unmanaged.passUnretained(self).toOpaque()),
    compressionSessionOut: &session
)

guard status == noErr, let session = session else {
    fatalError("Failed to create compression session: \(status)")
}
Step 2: Configure for Real-Time Streaming
This is where most developers go wrong. The default settings optimize for file recording, not streaming.
// CRITICAL: Enable real-time encoding
VTSessionSetProperty(session, key: kVTCompressionPropertyKey_RealTime, value: kCFBooleanTrue)
// Disable B-frames (they add latency due to reordering)
VTSessionSetProperty(session, key: kVTCompressionPropertyKey_AllowFrameReordering, value: kCFBooleanFalse)
// Keyframe interval — every 2 seconds for recovery
VTSessionSetProperty(session, key: kVTCompressionPropertyKey_MaxKeyFrameIntervalDuration, value: 2.0 as CFNumber)
// Target bitrate — 20 Mbps for Retina quality
VTSessionSetProperty(session, key: kVTCompressionPropertyKey_AverageBitRate, value: 20_000_000 as CFNumber)
// Expected frame rate — helps the rate controller
VTSessionSetProperty(session, key: kVTCompressionPropertyKey_ExpectedFrameRate, value: 60 as CFNumber)
// Profile — Main for broad compatibility
VTSessionSetProperty(session, key: kVTCompressionPropertyKey_ProfileLevel,
value: kVTProfileLevel_HEVC_Main_AutoLevel)
// Allow open GOP — slightly better compression
VTSessionSetProperty(session, key: kVTCompressionPropertyKey_AllowOpenGOP, value: kCFBooleanTrue)
// Prepare the session
VTCompressionSessionPrepareToEncodeFrames(session)
Key insight from Lumen (Sunshine fork): The upstream Sunshine had a bug on Apple Silicon where it set max_ref_frames=1 for H.264, forcing every frame to be a keyframe. This tripled bandwidth. The fix: only set max_ref_frames=1 for HEVC, not H.264. Let the encoder use P-frames.
Step 3: Encode Frames
func encode(pixelBuffer: CVPixelBuffer, presentationTime: CMTime) {
    let duration = CMTime(value: 1, timescale: 60) // 60fps
    VTCompressionSessionEncodeFrame(
        session,
        imageBuffer: pixelBuffer,
        presentationTimeStamp: presentationTime,
        duration: duration,
        frameProperties: nil,
        sourceFrameRefcon: nil,
        infoFlagsOut: nil // Output arrives via the session-level callback
    )
}
The CVPixelBuffer from ScreenCaptureKit is IOSurface-backed, so VideoToolbox processes it directly on the GPU — zero CPU copy.
Step 4: Extract the Bitstream
The output handler receives CMSampleBuffer containing encoded data. Extract it:
func handleEncodedFrame(_ sampleBuffer: CMSampleBuffer) {
    // Check if this is a keyframe
    let attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, createIfNecessary: false)
    let isKeyframe: Bool
    if let dict = (attachments as? [[CFString: Any]])?.first {
        isKeyframe = !(dict[kCMSampleAttachmentKey_NotSync] as? Bool ?? false)
    } else {
        isKeyframe = true // No attachments means every sample is a sync sample
    }

    // For keyframes, extract parameter sets (SPS/PPS for H.264, VPS/SPS/PPS for HEVC)
    if isKeyframe {
        extractParameterSets(from: sampleBuffer)
    }

    // Get the encoded data
    guard let dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) else { return }
    var length: Int = 0
    var dataPointer: UnsafeMutablePointer<Int8>?
    CMBlockBufferGetDataPointer(dataBuffer, atOffset: 0, lengthAtOffsetOut: nil,
                                totalLengthOut: &length, dataPointerOut: &dataPointer)
    guard let pointer = dataPointer else { return }
    let data = Data(bytes: pointer, count: length)

    // data now contains AVCC/HVCC formatted NAL units
    // Convert to Annex B format for network transport if needed
    let pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
    sendToNetwork(data: data, isKeyframe: isKeyframe, pts: pts)
}
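The AVCC/HVCC-to-Annex B conversion mentioned in that comment is mechanical: each NAL unit arrives with a big-endian length prefix, and Annex B replaces that prefix with a `00 00 00 01` start code. A minimal sketch (the default `nalLengthSize` of 4 matches what VideoToolbox typically emits, but the authoritative value is the `nalUnitHeaderLengthOut` returned when extracting parameter sets):

```swift
import Foundation

// Sketch: rewrite length-prefixed (AVCC/HVCC) NAL units as Annex B.
func annexB(from avcc: Data, nalLengthSize: Int = 4) -> Data {
    var out = Data()
    var offset = 0
    let startCode: [UInt8] = [0x00, 0x00, 0x00, 0x01]
    while offset + nalLengthSize <= avcc.count {
        // Read the big-endian length prefix
        var nalLength = 0
        for i in 0..<nalLengthSize {
            nalLength = (nalLength << 8) | Int(avcc[offset + i])
        }
        offset += nalLengthSize
        guard nalLength > 0, offset + nalLength <= avcc.count else { break }
        // Emit start code followed by the NAL unit payload
        out.append(contentsOf: startCode)
        out.append(avcc.subdata(in: offset..<(offset + nalLength)))
        offset += nalLength
    }
    return out
}
```

Whether you need this depends on the transport: a custom protocol can carry AVCC framing as-is, while anything that feeds ffmpeg, RTP packetizers, or raw `.h265` streams expects Annex B.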
Step 5: Extract Parameter Sets
The decoder needs SPS/PPS (H.264) or VPS/SPS/PPS (HEVC) to initialize:
func extractParameterSets(from sampleBuffer: CMSampleBuffer) {
    guard let formatDesc = CMSampleBufferGetFormatDescription(sampleBuffer) else { return }

    // For HEVC: the first call queries only the parameter set count
    var parameterSetCount: Int = 0
    CMVideoFormatDescriptionGetHEVCParameterSetAtIndex(
        formatDesc, parameterSetIndex: 0, parameterSetPointerOut: nil,
        parameterSetSizeOut: nil, parameterSetCountOut: &parameterSetCount,
        nalUnitHeaderLengthOut: nil
    )

    var parameterSets: [Data] = []
    for i in 0..<parameterSetCount {
        var pointer: UnsafePointer<UInt8>?
        var size: Int = 0
        CMVideoFormatDescriptionGetHEVCParameterSetAtIndex(
            formatDesc, parameterSetIndex: i,
            parameterSetPointerOut: &pointer, parameterSetSizeOut: &size,
            parameterSetCountOut: nil, nalUnitHeaderLengthOut: nil
        )
        if let pointer = pointer {
            parameterSets.append(Data(bytes: pointer, count: size))
        }
    }

    // Send parameter sets to decoder via reliable control channel
    sendParameterSets(parameterSets)
}
HEVC vs H.264: Which to Choose?
| Aspect | H.264 | HEVC (H.265) |
|--------|-------|--------------|
| Encode speed (Apple Silicon) | ~2ms | ~2ms |
| Compression ratio | Baseline | 30-40% better |
| Hardware support | All Macs | Apple Silicon + some Intel |
| Decode compatibility | Universal | Apple Silicon preferred |
| Latency | Same | Same |
Recommendation: Use HEVC as primary, fall back to H.264 for Intel Macs. Astropad does exactly this — their binary shows h264_support and hevc_support capability flags exchanged during handshake.
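That fallback can be sketched in a few lines. The support flags here are hypothetical stand-ins for whatever capability exchange the handshake carries:

```swift
// Hypothetical sketch of HEVC-first codec selection.
enum StreamCodec: String {
    case hevc = "hvc1" // kCMVideoCodecType_HEVC
    case h264 = "avc1" // kCMVideoCodecType_H264
}

func negotiateCodec(peerSupportsHEVC: Bool, peerSupportsH264: Bool) -> StreamCodec? {
    if peerSupportsHEVC { return .hevc } // Prefer HEVC for the bitrate savings
    if peerSupportsH264 { return .h264 } // Fall back for Intel-era decoders
    return nil                           // No common codec; refuse the stream
}
```

The important detail is that the decision is driven by the *decoder's* capabilities, not the encoder's: an Apple Silicon host can always produce HEVC, but an Intel client may only be able to decode H.264 in hardware.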
Bitrate Control for Streaming
For streaming, you want Constant Bitrate (CBR) or constrained VBR:
// Average bitrate
VTSessionSetProperty(session, key: kVTCompressionPropertyKey_AverageBitRate,
value: targetBitrate as CFNumber)
// Data rate limits (for constrained VBR)
// Format: [bytes_per_second, time_window_in_seconds]
let limits: [Int] = [targetBitrate / 8, 1] // Allow burst within 1-second window
VTSessionSetProperty(session, key: kVTCompressionPropertyKey_DataRateLimits,
value: limits as CFArray)
Astropad uses adaptive bitrate via their liquid_rate_control module with a Kalman filter for bandwidth estimation. For a simpler approach, adjust bitrate based on round-trip time:
func adjustBitrate(currentRTT: TimeInterval) {
    let targetBitrate: Int
    switch currentRTT {
    case ..<0.01: targetBitrate = 30_000_000 // <10ms RTT: 30 Mbps
    case ..<0.03: targetBitrate = 20_000_000 // <30ms: 20 Mbps
    case ..<0.1:  targetBitrate = 10_000_000 // <100ms: 10 Mbps
    default:      targetBitrate = 5_000_000  // >100ms: 5 Mbps
    }
    VTSessionSetProperty(session, key: kVTCompressionPropertyKey_AverageBitRate,
                         value: targetBitrate as CFNumber)
}
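Raw RTT samples are noisy, and feeding them straight into a tiered switch like this makes the bitrate flap between levels. A small exponentially weighted moving average, a much simpler stand-in for the Kalman filter mentioned above, smooths the input first (the 0.125 gain is the classic TCP SRTT weight):

```swift
import Foundation

// Sketch: smooth RTT samples before they drive bitrate decisions.
struct RTTSmoother {
    private(set) var smoothed: TimeInterval?
    let alpha: Double = 0.125 // gain; smaller = smoother, slower to react

    mutating func add(_ sample: TimeInterval) -> TimeInterval {
        let next = (smoothed ?? sample) * (1 - alpha) + sample * alpha
        smoothed = next
        return next
    }
}
```

Then call `adjustBitrate(currentRTT: smoother.add(measuredRTT))` instead of passing raw samples.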
Requesting Keyframes on Demand
When a new client connects or packet loss is detected, force an immediate keyframe:
let frameProperties: [CFString: Any] = [
    kVTEncodeFrameOptionKey_ForceKeyFrame: true
]
VTCompressionSessionEncodeFrame(
    session, imageBuffer: pixelBuffer,
    presentationTimeStamp: pts, duration: duration,
    frameProperties: frameProperties as CFDictionary,
    sourceFrameRefcon: nil, infoFlagsOut: nil
)
Astropad tracks codec.keyframes.regular and codec.keyframes.recovery separately — recovery keyframes are requested when the decoder detects corruption.
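Keeping that split is a few lines of bookkeeping. A sketch (the struct itself is hypothetical; the counter names mirror the metrics above):

```swift
// Sketch: count regular (interval-driven) vs recovery (loss-driven) keyframes
// separately, mirroring codec.keyframes.regular / codec.keyframes.recovery.
struct KeyframeStats {
    private(set) var regular = 0
    private(set) var recovery = 0

    mutating func record(isRecovery: Bool) {
        if isRecovery { recovery += 1 } else { regular += 1 }
    }
}
```

A rising recovery count is itself a useful signal: if the decoder keeps requesting recovery keyframes, the bitrate is probably set too high for the link.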
Measuring Encode Latency
let startTime = mach_absolute_time()
VTCompressionSessionEncodeFrame(session, ...)

// In the output handler (in a real pipeline, match start times to frames by PTS):
let endTime = mach_absolute_time()
let elapsedNs = machTimeToNanoseconds(endTime - startTime)
print("Encode latency: \(elapsedNs / 1_000_000)ms")
On an M1 Pro, expect ~2ms for 1080p60 HEVC. On M4, under 1ms.
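The `machTimeToNanoseconds` helper above is not a system function; a typical implementation (a Darwin-only sketch) converts mach ticks through the timebase ratio:

```swift
import Darwin

// Sketch: convert a mach_absolute_time delta (in ticks) to nanoseconds.
// The numer/denom ratio happens to be 1/1 on Apple Silicon, but that
// must not be assumed; always query the timebase.
func machTimeToNanoseconds(_ deltaTicks: UInt64) -> UInt64 {
    var timebase = mach_timebase_info_data_t()
    mach_timebase_info(&timebase)
    return deltaTicks * UInt64(timebase.numer) / UInt64(timebase.denom)
}
```

For new code, `clock_gettime_nsec_np(CLOCK_UPTIME_RAW)` gives nanoseconds directly and avoids the conversion entirely.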
Part 3 of the "Building a Remote Desktop from Scratch" series. Based on reverse engineering analysis of Astropad Workbench 1.1.0.