Essay: Prediction and Entropy of Printed English
Overview
Claude Shannon’s 1951 essay tackles a concrete question raised by his earlier information theory: how uncertain is the next symbol in ordinary English text? By estimating the entropy rate of printed English, Shannon connects the statistical structure of language to practical limits on prediction, compression, and cryptanalysis. He treats English as a stochastic source that emits characters from a simplified alphabet (typically the 26 letters plus a space) and asks how much information each character carries once context is taken into account.
Approach
Shannon pursues two complementary strategies. First, he constructs sequences using successively richer statistical models of English. Starting from a zero-order model with independent, equiprobable letters, he moves to models that match single-letter frequencies, digram and trigram statistics, and higher-order dependencies. As the order increases, the generated text shifts from nonsense to strings with plausible syllables, words, and short phrases, illustrating how constraints accumulate and reduce uncertainty.
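As a rough illustration of this staged construction (a sketch only, not Shannon's hand-compiled tables; the toy corpus and the model order below are placeholders), one can build an order-k character model from sample text and generate from it:

```python
import random
from collections import Counter, defaultdict

def build_char_model(text, order):
    """For each length-`order` context, count the characters that follow it."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model

def generate(model, order, length):
    """Sample characters one at a time, conditioning on the last `order` characters."""
    context = random.choice(list(model.keys()))
    out = list(context)
    for _ in range(length):
        counts = model.get("".join(out[-order:]))
        if not counts:  # unseen context: restart from a random known one
            counts = model[random.choice(list(model.keys()))]
        chars, weights = zip(*counts.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

# Toy corpus; Shannon worked from letter, digram, and trigram tables of real English.
corpus = "the quick brown fox jumps over the lazy dog " * 50
model = build_char_model(corpus, order=2)
print(generate(model, order=2, length=80))
```

Raising the order reproduces the qualitative progression Shannon describes: low-order output is noise with English letter frequencies, while third- and fourth-order output starts to contain plausible syllables and word-like fragments.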
Second, he runs a human prediction experiment, the now-classic guessing game. A subject is shown a passage and asked to guess the next character; if wrong, the subject keeps guessing until correct. Assuming the subject orders guesses by perceived likelihood, the distribution of guesses can be converted to an estimate of the conditional entropy of the next symbol given the preceding context. By varying the amount of context and averaging across passages and subjects, Shannon obtains empirical estimates that reflect real long-range dependencies, including spelling regularities, morphology, syntax, and semantics.
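A minimal sketch of that conversion, assuming the rank-frequency bounds given in the 1951 paper: if q[i] is the fraction of positions at which the correct character was the (i+1)-th guess, the per-character entropy is bounded above by the entropy of the rank distribution and below by a rank-weighted sum of differences between successive frequencies. The frequencies below are placeholders, not Shannon's data.

```python
import math

def entropy_bounds(q):
    """q[i] = fraction of predictions where the correct character was the (i+1)-th
    guess (sorted by rank, summing to 1). Returns (lower, upper) bounds in
    bits per character."""
    # Upper bound: entropy of the guess-rank distribution itself.
    upper = -sum(p * math.log2(p) for p in q if p > 0)
    # Lower bound: sum over ranks i of i * (q_i - q_{i+1}) * log2(i), with q_{n+1} = 0.
    q_ext = list(q) + [0.0]
    lower = sum((i + 1) * (q_ext[i] - q_ext[i + 1]) * math.log2(i + 1)
                for i in range(len(q)))
    return lower, upper

# Placeholder rank frequencies: first guess correct ~80% of the time, and so on.
q = [0.80, 0.08, 0.05, 0.03, 0.02, 0.01, 0.01]
lo, hi = entropy_bounds(q)
print(f"lower bound {lo:.2f} bits/char, upper bound {hi:.2f} bits/char")
```

With more context the first-guess frequency rises and both bounds fall, which is how averaging over passages, subjects, and context lengths yields the entropy estimates.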
Findings
The estimates show a dramatic drop in uncertainty as context grows. Short-range models (based on single-letter or digram statistics) still leave several bits of uncertainty per character, but human prediction with ample context drives the estimated entropy rate much lower. Shannon's experiments bound the entropy of printed English between roughly 0.6 and 1.3 bits per character for a 27-symbol alphabet that includes the space, with about one bit per character often quoted as a representative figure. Compared with the maximum of log2(27) ≈ 4.75 bits per character, this implies a high redundancy, roughly 75% or more, arising from the many constraints that tie characters together across long spans of text.
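The redundancy figure follows directly from the gap between the measured entropy and the 4.75-bit maximum; taking the 1.3-bit upper bound, for example:

$$
R \;\ge\; 1 - \frac{H}{H_{\max}} \;=\; 1 - \frac{1.3}{\log_2 27} \;\approx\; 1 - \frac{1.3}{4.75} \;\approx\; 0.73
$$

The 0.6-bit lower bound would push the redundancy toward 87%.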
He also observes that meaningful context extends far beyond local n-grams. Dependencies operate over words and phrases, allowing readers to anticipate upcoming letters with high accuracy across stretches of tens or even hundreds of characters. This long-range structure is what the guessing game captures and what low-order Markov models miss.
Interpretation
Entropy measures the irreducible unpredictability in text, not the richness of meaning. English is highly redundant because its symbols are organized to support error tolerance and comprehension: the same ideas can be conveyed with many overlapping cues, and readers can recover from noise or missing information. High redundancy also means that substantial compression is possible, and it aids cryptanalysis: a cipher applied to English inherits the statistical predictability of the plaintext, which helps an analyst recover the message from partial or noisy information.
Impact
The essay provided one of the first empirical anchors for language entropy, complementing Shannon’s theoretical framework with measurable numbers. It anticipated a core idea of modern language modeling, the use of context to reduce uncertainty, and it supplied benchmark targets for text compression. The staged text-generation examples became a staple illustration of how increasing-order models approach the “look and feel” of natural language, while the human-guessing paradigm foreshadowed cloze tests and later psycholinguistic methods. By quantifying the statistical structure of English, Shannon showed how deeply information-theoretic limits shape communication, storage, coding, and the practical predictability of language.
Prediction and Entropy of Printed English
Empirical and theoretical study estimating the entropy and predictability of English text using human prediction experiments and statistical methods, providing practical estimates of redundancy in natural language.
- Publication Year: 1951
- Type: Essay
- Genre: Information theory, Linguistics, Statistics
- Language: en
Author: Claude Shannon

More about Claude Shannon
- Occupation: Mathematician
- From: USA
- Other works:
- A Symbolic Analysis of Relay and Switching Circuits (1937 Non-fiction)
- An Algebra for Theoretical Genetics (1940 Non-fiction)
- A Mathematical Theory of Communication (1948 Essay)
- Communication Theory of Secrecy Systems (1949 Essay)
- The Mathematical Theory of Communication (1949 Book)
- Programming a Computer for Playing Chess (1950 Essay)
- The Bandwagon (1956 Essay)
- The Zero Error Capacity of a Noisy Channel (1956 Essay)