
CoT Suppression in Spatial UI Tasks


[YOUR VOICE] The Claim

Chain-of-thought prompting is the default recommendation for complex LLM tasks. But performance on spatial UI tasks — clicking a specific button, reading a specific label, enumerating visible elements — degrades when the model is asked to reason step by step: the intermediate reasoning introduces spatial hallucinations.
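To make the comparison concrete, here is a minimal sketch of the two prompt variants being contrasted. The task wording and element name are hypothetical illustrations, not the actual prompts used in the experiments (which are not included in this draft):

```python
# Hypothetical prompt pair for one spatial UI task. Only the framing
# differs: the first invites step-by-step reasoning, the second forbids it.

COT_PROMPT = (
    "Look at the screenshot. Think step by step about the page layout, "
    "then give the (x, y) coordinates of the 'Submit' button."
)

SUPPRESSED_PROMPT = (
    "Look at the screenshot. Reply with ONLY the (x, y) coordinates "
    "of the 'Submit' button. Do not explain your reasoning."
)
```

The claim above is that, for tasks like this, the second variant is the one that keeps the model spatially grounded.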


The Mechanism

MISSING — Experimental setup: same UI tasks with and without CoT prompting across multiple VLMs

MISSING — Specific failure patterns observed (coordinate drift during reasoning, element hallucination in enumeration, spatial confusion in multi-step CoT)

MISSING — The suppression technique used in Leith and its effect on accuracy


The Evidence

MISSING — Comparative accuracy table: CoT-enabled vs CoT-suppressed across task types

MISSING — Example failure cases showing spatial hallucination during CoT


[YOUR VOICE] Implications

MISSING — When to use CoT and when to suppress it. The broader lesson about prompt engineering for spatial tasks.


Open Questions

  • Is this a VLM architecture limitation or a training data gap?
  • Do models fine-tuned on spatial tasks still exhibit this problem?
  • What’s the minimum reasoning the model needs to complete multi-step UI tasks without CoT?

Reference Documents

Document                          What it covers
Leith _docs/                      MISSING — CoT suppression implementation and results
Prompt engineering experiments    MISSING — Full experimental protocol