HomeSec-Bench: Local AI Owns Cloud Giants in Security Tasks 🔥

Yo, fam, let's break down this benchmark page from SharpAI. It's all about proving local AI (running on your MacBook) can hang with the cloud big boys like GPT-5.4 in real home security workflows. No fluff, straight fire results.

1️⃣ WHY? The Pain It Solves 👇

Before this, home security AI was cloud-locked:

  • Privacy nightmare 💀: Sending camera feeds to OpenAI? Nope, that's your house data pinging servers.
  • API costs stacking up 💸: Every alert = bill.
  • Latency + downtime: Cloud hiccups mean delayed "intruder alert."
  • No offline mode: Power outage? Blind.

HomeSec-Bench exists to prove local LLMs crush it — 93.8% pass rate on a 9B model using just 13.8GB on M5 MacBook. Zero costs, full privacy, 25 tok/s speed. "Ohhhh" moment: Your laptop > cloud for domain-specific tasks. 🤯

PROBLEM (CLOUD-ONLY) ❌          SOLUTION (LOCAL AI) ✅
════════════════════════        ═════════════════════
Privacy leaks                  │ Full data lockdown
API $$$                        │ Free forever
Cloud lag (601ms TTFT)         │ 435ms on 35B-MoE
Offline? Nope                  │ Runs anywhere

2️⃣ Big Picture: What + Where It Fits 🚀

  • HomeSec-Bench v1: 96 LLM tests + 35 VLM tests across 15 suites.
  • Tests real home sec AI flows: Triage events, dedupe visitors, tool calls, resist hacks.
  • Run on Apple Silicon (M5 MacBooks) via llama.cpp.
  • Compares local Qwen3.5 models (🏠) vs OpenAI cloud (☁️).
  • Part of SharpAI Aegis: Local-first home security app.

Fits in the local AI revolution — edge devices beating datacenter beasts on niche tasks.

LOCAL (Your Mac) ───► HomeSec-Bench ───► Scores vs Cloud
     │
     ▼
Camera Feed ──► LLM Triage ──► Alert (or chill)

3️⃣ How It Works: Step-by-Step ⚙️

1️⃣ Setup: Feed AI-generated fixture images + prompts mimicking home cams.
2️⃣ Run tests: Against OpenAI-compatible endpoints (local or cloud).
3️⃣ Score: Pass/fail on 96 evals. Measures accuracy, speed, memory.
4️⃣ Suites: Grouped by skill (see below).
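The repo defines the real harness; as a rough sketch of the pattern, here's what "run a case, grade pass/fail" looks like against any OpenAI-compatible endpoint. The endpoint URL, model id, and the `expected` field are my illustrative assumptions, not HomeSec-Bench's actual schema:

```python
import json
import urllib.request

# Hypothetical test case -- HomeSec-Bench's real fixture format may differ.
CASE = {
    "prompt": "Camera 1: person at front door holding a crowbar at 2am. "
              "Classify severity as one of: normal, suspicious, critical.",
    "expected": "critical",
}

def grade(response_text: str, expected: str) -> bool:
    """Toy grader: pass if the expected label shows up in the model's answer."""
    return expected.lower() in response_text.lower()

def run_case(case: dict, base_url: str = "http://localhost:8080/v1") -> bool:
    """POST to any OpenAI-compatible /chat/completions endpoint (e.g. a local
    llama.cpp server) and grade the reply. Same code works for cloud or local."""
    body = json.dumps({
        "model": "qwen3.5-9b",  # placeholder model id
        "messages": [{"role": "user", "content": case["prompt"]}],
        "temperature": 0,
    }).encode()
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        answer = json.load(resp)["choices"][0]["message"]["content"]
    return grade(answer, case["expected"])
```

Because the endpoint is the only swap, the exact same suite scores your MacBook and the cloud — that's what makes the leaderboard apples-to-apples.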

Leaderboard Snippet (Top 5, Pass Rate %):

| Rank | Model | Type | Passed | Pass Rate | Time |
|------|------------------------|------|--------|-----------|--------|
| 🥇 | GPT-5.4 | ☁️ | 94 | 97.9% | 2m22s |
| 🥈 | GPT-5.4-mini | ☁️ | 92 | 95.8% | 1m17s |
| 🥉 | Qwen3.5-9B (Q4_K_M) | 🏠 | 90 | 93.8% | 5m23s |
| 4 | Qwen3.5-27B (Q4_K_M) | 🏠 | 90 | 93.8% | 15m8s |
| 5 | Qwen3.5-122B-MoE | 🏠 | 89 | 92.7% | 8m26s |

Key Metrics (Local crushes on privacy/cost, close on accuracy):

Time to First Token (Lower = Better) 📉
Qwen3.5-35B-MoE: 435ms   Beats ALL cloud!
GPT-5.4-nano:    508ms
↓↓↓

Decode Speed (Higher = Better) 📈
GPT-5.4-mini: 234 tok/s
Qwen3.5-9B:    25 tok/s    Still snappy on laptop

Memory (Local Only):
Qwen3.5-9B: 13.8 GB  🔥 Fits M5 Mac

15 Test Suites (Nested for clarity):

  • Core Reasoning:
      • Context Preprocessing (6 tests): Dedupe convos.
      • Topic Classification (4): Route to right handler.
  • Security Flows:
      • Event Deduplication (8): "Same dude on cam1/cam2?"
      • Security Classification (12): Normal → Critical.
      • VLM-to-Alert Triage (5): Vision → Urgency → Dispatch.
  • Tools & Robustness:
      • Tool Use (16): Pick tool + params right.
      • Prompt Injection Resistance (4): Don't get jailbroken.
      • Multi-Turn Reasoning (4): Remember past events.
  • Extras: Chat/JSON (11), Narrative (4), Error Recovery (4), etc.
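To make the Event Deduplication suite concrete, here's a toy version of the logic those tests grade the model on. The field names and the 30-second window are my assumptions for illustration, not the benchmark's spec:

```python
from datetime import datetime, timedelta

def is_duplicate(a: dict, b: dict,
                 window: timedelta = timedelta(seconds=30)) -> bool:
    """Treat two sightings as one event if the described subject matches
    and the timestamps fall within a short window -- even across cameras."""
    same_subject = a["subject"] == b["subject"]
    close_in_time = abs(a["time"] - b["time"]) <= window
    return same_subject and close_in_time

e1 = {"camera": "cam1", "subject": "male, red hoodie",
      "time": datetime(2026, 2, 1, 2, 14, 5)}
e2 = {"camera": "cam2", "subject": "male, red hoodie",
      "time": datetime(2026, 2, 1, 2, 14, 20)}

print(is_duplicate(e1, e2))  # → True: same dude on cam1/cam2
```

The benchmark checks whether the LLM makes this call correctly from messy natural-language event descriptions, which is way harder than the rule-based version above.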

4️⃣ Details + Edges 💡

  • Quantization: Q4_K_M/IQ1_M = smaller/faster models (trade tiny accuracy for speed).
  • GPT-5-mini flopped (62.5%) cuz the API rejected the benchmark's temperature settings 😂.
  • All local on macOS 15.3 arm64 — no NVIDIA needed.
  • Watch it live: Vid shows tests firing in real-time.
  • GitHub: https://github.com/SharpAI/DeepCamera/tree/master/skills/analysis/home-security-benchmark
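Why does a 9B model fit in 13.8 GB? Back-of-envelope math, assuming Q4_K_M averages roughly ~4.8 bits per parameter (the exact figure varies by tensor; this is an approximation, not a spec):

```python
# Rough estimate of quantized weight size for a 9B-parameter model.
params = 9e9
bits_per_param = 4.8          # approx. average for Q4_K_M-style quantization
weights_gb = params * bits_per_param / 8 / 1e9
print(f"~{weights_gb:.1f} GB of weights")  # ~5.4 GB
# The rest of the reported 13.8 GB goes to KV cache, activations,
# and runtime overhead, which grow with context length.
```

Same math explains why an unquantized FP16 copy (~18 GB of weights alone) wouldn't be nearly as comfy on a laptop.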

TL;DR / LOCK IT IN 🎯
Local Qwen3.5-9B: 93.8% (4pts behind GPT-5.4), 25 tok/s on M5 Mac, zero cost/privacy win. Benchmark = proof local AI ready for home sec. Download Aegis and run it yourself. You tracking? 🚀

