[YOUR VOICE] The Claim
Chain-of-thought (CoT) prompting is the default recommendation for complex LLM tasks. But spatial UI tasks — clicking a specific button, reading a specific label, enumerating visible elements — degrade when the model is asked to reason step-by-step. The intermediate reasoning itself introduces spatial hallucinations.
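To make the contrast concrete, here is a minimal sketch of the two prompt styles being compared. This is an illustrative assumption, not the actual Leith implementation: the task text, wording, and `build_prompt` helper are hypothetical, and no model API is called.

```python
# Hypothetical sketch of CoT-enabled vs CoT-suppressed prompting for a
# spatial UI task. Only assembles prompt text; does not call any model.

TASK = "Click the 'Save' button in the screenshot."

def build_prompt(task: str, suppress_cot: bool) -> str:
    """Assemble a UI-grounding prompt with or without CoT instructions."""
    if suppress_cot:
        # Suppressed variant: forbid intermediate reasoning and demand
        # the answer directly in a fixed format.
        return (
            f"{task}\n"
            "Respond with ONLY the target coordinates as (x, y). "
            "Do not explain your reasoning."
        )
    # CoT variant: the standard step-by-step instruction.
    return (
        f"{task}\n"
        "Think step by step about the layout, then give the coordinates."
    )

cot_prompt = build_prompt(TASK, suppress_cot=False)
direct_prompt = build_prompt(TASK, suppress_cot=True)
```

The claim, then, is that on spatial tasks `direct_prompt` tends to outperform `cot_prompt`, reversing the usual advice.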
The Mechanism
MISSING — Experimental setup: same UI tasks with and without CoT prompting across multiple VLMs
MISSING — Specific failure patterns observed (coordinate drift during reasoning, element hallucination in enumeration, spatial confusion in multi-step CoT)
MISSING — The suppression technique used in Leith and its effect on accuracy
The Evidence
MISSING — Comparative accuracy table: CoT-enabled vs CoT-suppressed across task types
MISSING — Example failure cases showing spatial hallucination during CoT
[YOUR VOICE] Implications
MISSING — When to use CoT and when to suppress it. The broader lesson about prompt engineering for spatial tasks.
Open Questions
- Is this a VLM architecture limitation or a training data gap?
- Do models fine-tuned on spatial tasks still exhibit this problem?
- What’s the minimum reasoning the model needs to complete multi-step UI tasks without CoT?
Reference Documents
| Document | What it covers |
|---|---|
| Leith _docs/ | MISSING — CoT suppression implementation and results |
| Prompt engineering experiments | MISSING — Full experimental protocol |