Blog

Writing about AI tooling, Apple Silicon ML, and developer infrastructure.

2026

Design test — Typography, components, and patterns

Visual reference for type scale, callouts, dense data tables, data cards, bento grids, forms, and all markdown elements.
Site Launch

What this site is and what's coming.
Hybrid UI Detection: Why We Split Vision and Intelligence

uitag combines Apple Vision's text detection with a fine-tuned YOLO model to hit 90.8% coverage on ScreenSpot-Pro — faster and cheaper than VLM-only approaches.
Markdown & Component Reference

Every visual element this site supports — typography, tables, callouts, code, cards, grids, and forms.
Benchmarking UI Detection on ScreenSpot-Pro

How we evaluated uitag against 1,581 annotations across 26 professional macOS applications — methodology, results, and what the numbers actually mean.
Why Detection and Intelligence Should Be Separate Layers

The architectural argument for splitting UI perception from UI reasoning — and what happens when you don't.
GUI-Specialized Apple Silicon VLM Matrix

Which vision-language models actually work for UI tasks on M-series chips — tested configurations, latency numbers, and the models worth your time.
A Failure Mode Watchlist for Multi-Agent Systems

Naming the drift patterns that cause multi-agent work to silently degrade — because you can't fix what you haven't named.
mlx-triage: Preflight Validation for MLX Models

A practitioner-facing CLI that answers the question every Apple Silicon ML developer asks before benchmarking: is this model structurally sound?
CoT Suppression in Spatial UI Tasks

Chain-of-thought reasoning helps most LLM tasks. For spatial UI interaction, it actively hurts — here's what we found and why.
Building the Operations Layer for a Multi-Agent Development System

44 projects, 11 agent entities, 3 runtimes — what it takes to keep them coordinated without losing your mind.
The Operator Orchestration Workstation

Every agentic framework assumes a shared runtime. None are designed for the case where the human operator is the coordination mechanism — and that's the case most practitioners are living in.
Trust Calibration Without Confidence Scores

VLMs don't reliably report their own confidence. Leith builds trust from behavioral signals instead — and it works better.
Multi-Signal Verification for VLM UI Agents

One detection method is a guess. Two that agree are evidence. How Leith uses signal redundancy to make UI interaction reliable.
Memory Architecture for a Multi-Agent Ecosystem

How 11 agent entities share memory across sessions and projects without it decaying into noise — drift detection, extraction normalization, and metacognitive reliability scoring.

2025

Apple Silicon VLM Benchmark Roundup

A short public narrative covering what we tested, what we found, and what you should run if you're doing local multimodal inference on M-series hardware.
Interagent: A Coordination Protocol for Multi-Agent LLM Systems

When the human operator is the routing authority — not a bottleneck to be automated away — you need a different kind of protocol.
Portable Agent Memory: The Problem That Started Everything

When you close a Claude session, everything the agent learned disappears. This is the problem that led to the Metacognitive Memory System, the Interagent Protocol, and the entire multi-agent ecosystem.

© 2026 Swaylen Hayes