The Thought That Counts: Exploring Chain of Thought Prompting
We reproduce chain-of-thought (CoT) experiments across base, instruction-tuned, and reasoning models. The results suggest that reported CoT gains on modern models are mostly artifacts of suppression and do not necessarily reflect improvements in reasoning.
