What it does
The reasoning layer between detection and action for macOS UI automation. Takes structured UI element detection from uitag and orchestrates VLM-powered interactions that adapt to uncertainty, learn from failure, and degrade gracefully.
Architecture / Key capabilities
- Prompt engineering for spatial tasks — chain-of-thought (CoT) suppression for coordinate-level work, where CoT reasoning introduces spatial drift, paired with structured prompting for higher-level planning tasks
- Multi-signal verification — Cross-references multiple detection signals before committing to an action, reducing false positives from any single detection method
- Tiered fallback chains — When a primary interaction path fails, cascades through progressively more conservative strategies rather than hard-failing
- Adaptive trust calibration — Dynamically adjusts confidence thresholds based on recent interaction success rates and UI complexity
- Episodic memory for UI agents — Maintains interaction history so agents can learn from their own successes and failures across sessions
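The multi-signal verification idea above can be sketched in a few lines. This is an illustrative example, not the project's actual API: the `Signal` shape, source names, and thresholds are assumptions. The core point is requiring agreement from more than one independent detection method before committing to an action.

```python
from dataclasses import dataclass

@dataclass
class Signal:
    """One detection signal for a candidate UI element (hypothetical shape)."""
    source: str        # e.g. "accessibility", "ocr", "vision"
    confidence: float  # 0.0 - 1.0

def verify(signals: list[Signal], min_sources: int = 2, min_conf: float = 0.6) -> bool:
    """Commit to an action only when enough independent sources agree."""
    agreeing = {s.source for s in signals if s.confidence >= min_conf}
    return len(agreeing) >= min_sources

# Two independent signals above threshold -> safe to act
print(verify([Signal("accessibility", 0.9), Signal("ocr", 0.7)]))  # True
# One confident signal alone is not enough -> rejected as a potential false positive
print(verify([Signal("vision", 0.95)]))  # False
```

Cross-referencing by distinct `source` (rather than just summing confidences) is what protects against a single detection method being systematically wrong.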
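A tiered fallback chain can likewise be shown as a minimal sketch. The strategy names below are hypothetical; the pattern is the point: on failure, cascade to progressively more conservative strategies instead of hard-failing.

```python
from typing import Callable, Optional

def run_with_fallbacks(strategies: list[tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Try each strategy in order; return the name of the first that succeeds."""
    for name, attempt in strategies:
        try:
            if attempt():
                return name
        except Exception:
            pass  # an exception is just a failed tier, not a hard failure
    return None  # every tier failed; the caller decides how to surface this

# Hypothetical tiers: most precise first, most conservative last
chain = [
    ("click_by_accessibility_id", lambda: False),  # primary path fails
    ("click_by_ocr_label",        lambda: False),  # secondary path fails
    ("click_by_coordinates",      lambda: True),   # conservative fallback lands
]
print(run_with_fallbacks(chain))  # click_by_coordinates
```

Returning the winning tier's name (rather than a bare bool) lets the caller log which rung of the ladder actually worked, which feeds the trust-calibration signal.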
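Adaptive trust calibration could take many forms; one simple scheme, shown here purely as an assumption-laden sketch, tracks recent success with an exponentially weighted moving average and raises the confidence threshold when interactions start failing or the UI gets complex. The class name, weights, and cap are all illustrative.

```python
class TrustCalibrator:
    """Illustrative scheme: EWMA of recent success drives the confidence threshold."""

    def __init__(self, base: float = 0.7, alpha: float = 0.2):
        self.base = base            # threshold when everything is going well
        self.alpha = alpha          # EWMA weight for the newest outcome
        self.success_rate = 1.0     # optimistic prior

    def record(self, succeeded: bool) -> None:
        """Fold one interaction outcome into the running success rate."""
        self.success_rate = (1 - self.alpha) * self.success_rate + self.alpha * float(succeeded)

    def threshold(self, ui_complexity: float = 0.0) -> float:
        """Raise the bar as failures accumulate or UI complexity (0..1) grows."""
        penalty = (1.0 - self.success_rate) * 0.2 + ui_complexity * 0.1
        return min(0.95, self.base + penalty)

cal = TrustCalibrator()
for ok in [True, False, False]:
    cal.record(ok)
print(cal.threshold() > cal.base)  # True: two recent failures raised the threshold
```

The asymmetry matters: a run of failures quickly makes the agent more cautious, while trust is rebuilt gradually as successes accumulate.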
Key numbers
- 570 tests passing
- F5.1 + F5.2 fully implemented
- All mechanical P3 work complete
Current phase
Phase P3 (Interaction Planning) in progress. Classification caching identified as a hard requirement. Episodic memory (3-layer, with rollback) and the trust calibration UX are blocked on T007.
Status
Active — next milestones: F5.3 episodic memory, F5.4 trust calibration lifecycle, F6.1 app settings indexer
Links
MISSING — Repository URL