Leith

What it does

Leith is the reasoning layer between detection and action for macOS UI automation. It takes structured UI element detections from uitag and orchestrates VLM-powered interactions that adapt to uncertainty, learn from failure, and degrade gracefully.

Architecture / Key capabilities

  • Prompt engineering for spatial tasks — CoT suppression for coordinate-level work where chain-of-thought reasoning introduces spatial drift, paired with structured prompting for higher-level planning tasks
  • Multi-signal verification — Cross-references multiple detection signals before committing to an action, reducing false positives from any single detection method
  • Tiered fallback chains — When a primary interaction path fails, cascades through progressively more conservative strategies rather than hard-failing
  • Adaptive trust calibration — Dynamically adjusts confidence thresholds based on recent interaction success rates and UI complexity
  • Episodic memory for UI agents — Maintains interaction history so agents can learn from their own successes and failures across sessions
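Multi-signal verification can be sketched as a simple agreement check: only commit to an element when enough independent detection sources report the same label above a confidence floor. This is an illustrative sketch, not Leith's actual API; the `Signal` type, field names, and thresholds here are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Signal:
    source: str       # hypothetical detection source, e.g. "accessibility", "ocr", "vlm"
    label: str        # element label this signal reports
    confidence: float # signal's own confidence in [0, 1]

def verify(signals, min_agreeing=2, min_confidence=0.6):
    """Return a label only when enough independent sources agree on it."""
    sources_by_label = {}
    for s in signals:
        if s.confidence >= min_confidence:
            sources_by_label.setdefault(s.label, set()).add(s.source)
    agreed = [label for label, sources in sources_by_label.items()
              if len(sources) >= min_agreeing]
    # Commit only on an unambiguous consensus; otherwise signal "don't act".
    return agreed[0] if len(agreed) == 1 else None
```

Requiring agreement across sources is what reduces false positives from any single detection method: a stray OCR match with no accessibility-tree counterpart never reaches the action stage.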
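A tiered fallback chain is, at its core, an ordered list of strategies tried from most capable to most conservative, stopping at the first success instead of hard-failing. A minimal sketch, with hypothetical strategy names standing in for whatever interaction paths Leith actually uses:

```python
def run_with_fallbacks(strategies, target):
    """Try each (name, strategy) pair in order; return the first success.

    Raises only after every tier in the chain has failed.
    """
    errors = []
    for name, strategy in strategies:
        try:
            return name, strategy(target)
        except Exception as exc:  # a failed tier cascades to the next one
            errors.append((name, exc))
    raise RuntimeError(f"all strategies failed: {errors}")

# Hypothetical chain, ordered from primary path to most conservative:
# run_with_fallbacks([
#     ("vlm_click", vlm_click),
#     ("accessibility_click", ax_click),
#     ("keyboard_navigation", keyboard_nav),
# ], target_element)
```

Keeping the per-tier errors around (rather than swallowing them) also gives the episodic-memory layer something concrete to record about *why* the primary path failed.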
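Adaptive trust calibration can be illustrated as a rolling success rate driving the confidence threshold: recent failures or complex UIs raise the bar before an action is taken. The class below is a sketch under assumed parameters (base threshold, window size, weighting), not Leith's implementation.

```python
class TrustCalibrator:
    """Derive a confidence threshold from recent interaction outcomes."""

    def __init__(self, base=0.7, window=20):
        self.base = base      # threshold when everything has been succeeding
        self.window = window  # how many recent interactions to consider
        self.history = []     # rolling record of success/failure booleans

    def record(self, success):
        self.history.append(bool(success))
        if len(self.history) > self.window:
            self.history.pop(0)

    def threshold(self, ui_complexity=0.0):
        # With no history yet, assume things are going well.
        rate = sum(self.history) / len(self.history) if self.history else 1.0
        # Lower success rate or higher UI complexity -> demand more confidence,
        # capped so the agent can still act at all.
        return min(0.95, self.base + 0.2 * (1 - rate) + 0.1 * ui_complexity)
```

The design choice worth noting is the cap: without it, a bad streak would ratchet the threshold past anything a detector can report, and the agent would deadlock instead of degrading gracefully.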

Key numbers

  • 570 tests passing
  • F5.1 + F5.2 fully implemented
  • All mechanical P3 work complete

Current phase

Phase P3 (Interaction Planning) in progress. Classification caching identified as hard requirement. Episodic memory (3-layer with rollback) and trust calibration UX blocked on T007.

Status

Active — next milestones: F5.3 episodic memory, F5.4 trust calibration lifecycle, F6.1 app settings indexer

MISSING — Repository URL