The Expressive Power of Transformers with Chain of Thought

DOI: 10.48550/arXiv.2310.07923

64 Citations • 2024

William Merrill, Ashish Sabharwal

ArXiv

This paper aims to demonstrate how transformers’ reasoning can be improved by allowing them to use a “chain of thought” or “scratchpad”, i.e., generate and condition on a sequence of intermediate tokens before answering.

Abstract

Recent theoretical work has identified surprisingly simple reasoning problems, such as checking if two nodes in a graph are connected or simulating finite-state machines, that are provably unsolvable by standard transformers that answer immediately after reading their input. However, in practice, transformers’ reasoning can be improved by allowing them to use a “chain of thought” or “scratchpad”, i.e., generate and condition on a sequence of intermediate tokens before answering. Motivated by this