login
Home / Papers / The Expressive Power of Transformers with Chain of Thought

The Expressive Power of Transformers with Chain of Thought

64 Citations2024
William Merrill, Ashish Sabharwal
ArXiv

This paper aims to demonstrate how transformers’ reasoning can be improved by allowing them to use a “chain of thought” or “scratchpad”, i.e., generate and condition on a sequence of intermediate tokens before answering.

Abstract

Recent theoretical work has identified surprisingly simple reasoning problems, such as checking if two nodes in a graph are connected or simulating finite-state machines, that are provably unsolvable by standard transformers that answer immediately after reading their input. However, in practice, transformers’ reasoning can be improved by allowing them to use a “chain of thought” or “scratchpad”, i.e., generate and condition on a sequence of intermediate tokens before answering. Motivated by this