From chaos to cognition
“Nothing in biology makes sense except in the light of evolution.” — Theodosius Dobzhansky
The computational observation
We have exactly one confirmed example of general intelligence in the known universe. Not one architecture, not one algorithm: one entire process. It spans roughly 3.8 billion years of evolution, operating across trillions of organisms in parallel, each testing a slightly different strategy against the hard constraints of a physical world. Before we debate whether machines can think, we should ask the more elementary question: how much did the only known solution cost?
This chapter attempts an answer. Specifically, we want to measure the computational distance from a state of high cognitive entropy, where no organism reasons, plans, or models the world, to the low-entropy state where general intelligence emerges. Not the cost of building the machinery that makes cognition possible (the molecular substrates, the basic neural wiring), but the cost of the journey itself.
We will estimate this distance in learning instances: cycles of interaction between organisms and their environment from which adaptive information was extracted. The number we arrive at is large. It may also be wrong by several orders of magnitude in either direction. Let us be honest about that from the start: this estimate is impossible to get right. The organisms are dead, the environments are gone, and the quantities we need were never measured directly. What we can do is assemble the best numbers that biology, neuroscience, and ecology have produced, chain them together carefully, and see where they land.
Our estimate is that traversing this distance required between \(10^{25}\) and \(10^{40}\) learning instances, depending on how much of evolution’s work we classify as “building machinery” versus “making the journey.” As we will show, the largest artificial training runs, measured in comparable units, have performed roughly \(10^7\). The gap demands explanation.
The ground rules
Throughout this estimate, and throughout this book, we adopt a single methodological principle: when in doubt, choose the number that makes the gap smaller. Every uncertain parameter is resolved in favor of the optimist. Every ambiguous definition is read in the way most generous to current AI. If, after all this generosity, the remaining gap is still large, the argument is strengthened precisely because we did not inflate it.
This is a lower bound argument. We are not trying to prove that AGI is impossible. We are trying to find the floor: the smallest defensible estimate of what nature spent, using every reasonable discount and concession. If that floor is still far above what artificial systems have achieved, we have learned something important.
What are we counting?
The tempting metric is synaptic operations: count every time a neuron fires, across every organism, across all of evolutionary time. But this overcounts in a way that distorts the picture. A lizard basking on a rock for an hour has neurons firing continuously, maintaining homeostasis, regulating body temperature, keeping its heart beating. Very little of that activity constitutes learning. If we want to know what it would cost to replicate evolution’s achievement in silicon, raw neural firing is not the right unit. We do not need to simulate a lizard’s heartbeat.
The unit we want is a learning instance: a cycle of interaction between an organism and its environment from which adaptive information can be extracted. For an organism with a nervous system, this means a meaningful encounter: a predator detected, a food source located, a mate assessed, a threat survived or not survived. For a bacterium, which has no neurons at all, the learning instance is a generation in which a genuinely novel genetic variant is tested against the environment: a new mutation expressed, a new strategy tried, an outcome recorded by natural selection.
This is what we would actually need to replicate. Not the idle hum of a billion neural circuits, but the moments where organism meets world and something is at stake. The failed experiments count too: evolution has no foresight, and an organism that dies in infancy is as much a data point as one that reproduces. A genetic algorithm does not get to subtract the fitness evaluations of discarded candidates from its computational budget.
We split the estimate into three tiers: the long pre-neural era dominated by microbial life, the neural era beginning with the Cambrian explosion roughly 500 million years ago and dominated by invertebrates, and the relatively brief era of vertebrate sophistication.
Tier 1: The microbial foundation
Life originated approximately 3.8 billion years ago. For the first 3.3 billion years, until the Cambrian explosion roughly 500 million years ago, Earth was dominated by bacteria and archaea: organisms with no nervous systems, no neurons, no synapses. They could not learn within their lifetimes in any neural sense. They executed their genetic programs and either divided or died.
Yet this era was not computationally idle. It was, by any measure, the most prolific optimization process in Earth’s history. The molecular machinery that neurons would later use, the entire signaling and communication infrastructure of neural computation, was forged during this period through the brute trial and error of microbial evolution.
Timeline:
\[T_1 = 3.3 \times 10^9 \text{ years} \times 3.15 \times 10^7 \text{ s/year} \approx 1.04 \times 10^{17} \text{ s}\]
Population: Whitman, Coleman, and Wiebe estimated in their landmark 1998 census that the number of prokaryotic cells alive on Earth at any given time is approximately \(4\text{--}6 \times 10^{30}\). Bar-On and colleagues revisited global biomass in 2018 and broadly confirmed the order of magnitude. We use \(10^{30}\).
Learning rate: Here we apply our first optimistic discount. A bacterium divides, on average, every few hours. We use \(\tau \approx 10^4\) seconds (roughly three hours) as a reasonable mean across species and conditions. But most divisions produce offspring nearly identical to the parent. The “learning” happens only when a genuinely novel variant is tested: a new mutation expressed against the environment. Drake’s foundational work on mutation rates established that bacteria mutate at approximately \(\mu \approx 0.003\) mutations per genome per generation, a figure that has held up across decades of subsequent measurement. We count only these novel-variant generations:
\[R_1 = \frac{\mu}{\tau} = \frac{0.003}{10^4} = 3 \times 10^{-7} \text{ instances/s/organism}\]
Subtotal:
\[\Omega_1 = N_1 \times R_1 \times T_1 = 10^{30} \times 3 \times 10^{-7} \times 1.04 \times 10^{17}\]
\[\Omega_1 \approx 3 \times 10^{40}\]
Three followed by forty zeros. We will verify this number shortly.
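For readers who want to audit the chain, here is the Tier 1 arithmetic as a short Python sketch. The variable names are ours; every input is an estimate quoted above, not a measured constant.

```python
# Tier 1: pre-neural learning instances. All inputs are the estimates quoted above.
SECONDS_PER_YEAR = 3.15e7

T1 = 3.3e9 * SECONDS_PER_YEAR   # pre-neural era, seconds (~1.04e17)
N1 = 1e30                       # prokaryotes alive at any time (Whitman et al., rounded down)
mu = 0.003                      # mutations per genome per generation (Drake)
tau = 1e4                       # mean generation time, seconds (~3 hours)

R1 = mu / tau                   # novel-variant generations per organism per second (~3e-7)
omega1 = N1 * R1 * T1           # total Tier 1 learning instances

print(f"omega1 ~ {omega1:.1e}")  # ~3.1e40
```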
Tier 2: The invertebrate era
The Cambrian explosion, roughly 500 million years ago, marks the appearance of nervous systems in the fossil record. From this point forward, organisms could learn within their own lifetimes: adjusting behavior in response to experience, not merely across generations through genetic variation.
Timeline:
\[T_2 = 500 \times 10^6 \text{ years} \times 3.15 \times 10^7 \text{ s/year} \approx 1.58 \times 10^{16} \text{ s}\]
Population: The number of animals with nervous systems alive at any time is dominated by invertebrates. Soil nematodes alone number at least in the hundreds of trillions. The Entomological Society of America and the Smithsonian estimate roughly \(10^{19}\) individual insects alive at any given time, a figure consistent with Williams’s earlier census work. Fish, amphibians, reptiles, birds, and mammals are rounding errors against this population.
\[N_2 \approx 10^{19} \text{ organisms}\]
Learning rate: This is the hardest parameter in the estimate. How often does the average invertebrate encounter a genuinely novel stimulus, one that triggers actual learning rather than habituated routine? No one has measured this directly in the wild. But we can triangulate from the species whose learning has been studied most carefully, then ask what the population-weighted average should be.
Menzel and colleagues documented that foraging honeybees learn and retain flower colors, shapes, scents, locations, reward quality, time-of-day patterns, navigation landmarks, and routes. A forager makes 10 to 15 trips per day and accumulates this repertoire, hundreds of distinct learned items, over roughly three weeks of active foraging. That implies roughly 10 to 15 novel learned associations per day. The nematode C. elegans, with only 302 neurons, demonstrates habituation, sensitization, and associative learning across its two-to-three-week lifetime, as documented by Rankin and others; back-calculating from its known behavioral repertoire yields a comparable rate of roughly 7 to 14 learned items per day. Drosophila shows robust associative conditioning in the lab, with synaptic plasticity in the mushroom body operating on timescales of tens of seconds, though the ecological relevance of this rate in the wild remains debated.
These are the best-studied invertebrate learners, and they converge on roughly 10 novel learning events per day during active life. The \(10^{19}\) population is dominated by insects, so the population-weighted average is driven by insect learning rates. Ten per day sits at the conservative end of what the data from studied species suggests:
\[R_2 = \frac{10}{86400} \approx 10^{-4} \text{ instances/s/organism}\]
We can sanity-check this against structural data. Neurons connect to each other through synapses, and most excitatory synapses in the brain sit on tiny protrusions called dendritic spines. When the brain learns, spines grow, shrink, appear, and disappear; the rate at which this happens, spine turnover, is the most direct measurable proxy for learning-related synaptic change. Pfeiffer and colleagues measured roughly 10% spine turnover per day in the mouse hippocampus. Trachtenberg and colleagues found roughly 0.5 to 1% per day in the mouse cortex. For a typical invertebrate with \(10^4\) to \(10^7\) synapses and a turnover rate in this range, we would expect somewhere between roughly 50 and \(10^6\) individual spine changes per day. At ten learning instances per day, that implies roughly 5 to \(10^5\) spine changes per learning instance: a single episode of learning rewiring a handful of synapses at the low end, tens of thousands at the high end. The middle of that range is consistent with what neuroscience observes for associative learning in small nervous systems.
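The same check as a sketch. Pairing the extreme synapse counts with the extreme turnover rates is our assumption, which is why the range comes out wide.

```python
# Spine-turnover sanity check for the invertebrate learning rate.
synapse_counts = (1e4, 1e7)     # typical invertebrate range quoted above
turnover_rates = (0.005, 0.10)  # per day: mouse cortex ~0.5-1%, hippocampus ~10%
instances_per_day = 10

changes_lo = synapse_counts[0] * turnover_rates[0]  # ~50 spine changes/day
changes_hi = synapse_counts[1] * turnover_rates[1]  # ~1e6 spine changes/day

print(changes_lo / instances_per_day,   # ~5 spine changes per learning instance
      changes_hi / instances_per_day)   # ~1e5 spine changes per learning instance
```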
Subtotal:
\[\Omega_2 = N_2 \times R_2 \times T_2 = 10^{19} \times 10^{-4} \times 1.58 \times 10^{16}\]
\[\Omega_2 \approx 10^{31}\]
Tier 3: The vertebrate refinement
Vertebrates learn more per individual than any invertebrate. A crow solving a novel puzzle, a rat navigating a maze, a dolphin coordinating a hunt: these are learning-dense lives. But vertebrate populations are tiny compared to invertebrates, and they arrived late. The first vertebrates, small jawless filter-feeders, appear in the Cambrian fossil record roughly 525 million years ago, but they did not diversify substantially until the Devonian. We generously credit them with the full post-Cambrian timeline.
Timeline:
\[T_3 \approx 1.58 \times 10^{16} \text{ s}\]
Population: Callaghan, Nakagawa, and Cornwell estimated roughly 50 billion wild birds alive at any given time. Greenspoon and colleagues at the Weizmann Institute estimated approximately 130 billion wild mammals, dominated by bats (roughly 56 billion) and rodents (roughly 25 billion). Reptile and amphibian populations are less precisely known but on the order of \(10^{11}\). Fish dominate: estimates range from one to three trillion individuals, depending on assumptions about mesopelagic species. We use the conservative lower bound:
\[N_3 \approx 10^{12} \text{ organisms}\]
Fish outnumber all other vertebrates combined by roughly an order of magnitude. The population-weighted average vertebrate is a fish.
Learning rate: As with invertebrates, we triangulate from the species whose learning has been most carefully documented, then ask what the population-weighted average should be.
Fish are capable of rapid, flexible learning. Blank and colleagues showed that adult zebrafish form NMDA-dependent long-term memories from a single aversive experience: one-trial inhibitory avoidance. Rodriguez and others documented that goldfish learn spatial tasks in four-arm mazes, using both place-based and cue-based strategies, and can take spontaneous shortcuts suggesting map-like spatial representations. In the wild, a fish navigates its territory, locates food, avoids predators, and interacts with conspecifics. How many of these encounters constitute genuinely novel learning? For a schooling pelagic fish whose daily routine is largely repetitive, the number is modest: perhaps a few dozen per day. For an actively foraging territorial species, it may reach the low hundreds.
Among birds, the food-caching corvids and parids provide the most striking quantitative data. Balda and Kamil documented that a single Clark’s nutcracker caches 22,000 to 33,000 pine seeds across 5,000 to 6,000 distinct locations during autumn, recovering them with high accuracy up to 285 days later: roughly 100 to 200 novel spatial memories per day during the caching season. Applegate and Aronov showed that black-capped chickadees cache hundreds of food items daily, each in a unique site that generates a distinct hippocampal firing pattern. These are exceptional species, not typical vertebrates, but they demonstrate the ceiling of vertebrate individual learning capacity.
Among mammals, O’Keefe and Dostrovsky’s discovery of hippocampal place cells established that rodents form new spatial representations within seconds of entering a novel environment: a single pass through a new location is sufficient to generate a stable place field. Fear conditioning is reliably one-trial: a single aversive event produces long-term contextual memory. But the 130 billion wild mammals are dominated by bats and small rodents, whose daily learning budgets in their natural habitats are far below these laboratory demonstrations of capacity.
The population is dominated by fish, and the population-weighted average is driven by fish learning rates. We estimate roughly 100 novel learning events per day for the average vertebrate: an order of magnitude above the invertebrate rate, reflecting greater neural complexity, but discounting for the fact that most fish and small mammals spend much of their time in repetitive behavioral routines:
\[R_3 = \frac{100}{86400} \approx 10^{-3} \text{ instances/s/organism}\]
Subtotal:
\[\Omega_3 = N_3 \times R_3 \times T_3 = 10^{12} \times 10^{-3} \times 1.58 \times 10^{16}\]
\[\Omega_3 \approx 10^{25}\]
Even if we raise the learning rate by an order of magnitude, to a thousand novel events per day (consistent with what caching birds and actively exploring rodents achieve), the total reaches only \(10^{26}\). The population deficit is decisive: \(10^{12}\) vertebrates cannot overcome a \(10^{19}\)-strong invertebrate population, regardless of how much more each individual learns. Vertebrates matter enormously for the quality of intelligence that evolution produced, but they are a rounding error in the quantity of learning instances.
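Tiers 2 and 3 follow the same population-rate-duration pattern, so a single helper covers both, along with the sensitivity check just described. The function name and exact inputs are ours; the text rounds the rates to powers of ten before multiplying, so the printed values run slightly higher than the quoted subtotals.

```python
# Tiers 2 and 3: learning instances = population x per-organism rate x duration.
SECONDS_PER_YEAR = 3.15e7
SECONDS_PER_DAY = 86_400
T_NEURAL = 500e6 * SECONDS_PER_YEAR   # post-Cambrian era, seconds (~1.58e16)

def tier_total(population, instances_per_day):
    rate = instances_per_day / SECONDS_PER_DAY  # instances per organism per second
    return population * rate * T_NEURAL

print(f"{tier_total(1e19, 10):.1e}")    # invertebrates: ~1.8e31 (text rounds to ~1e31)
print(f"{tier_total(1e12, 100):.1e}")   # vertebrates:   ~1.8e25 (text: ~1e25)
print(f"{tier_total(1e12, 1000):.1e}")  # 10x the rate:  still only ~1.8e26
```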
The total
| Tier | Learning instances |
|---|---|
| Pre-neural (bacteria, mutation-discounted) | \(3 \times 10^{40}\) |
| Invertebrates (behavioral, optimistic) | \(\sim 10^{31}\) |
| Vertebrates | \(\sim 10^{25}\) |
| Full total | \(\sim 3 \times 10^{40}\) |
The pre-neural phase dominates by nine orders of magnitude. The microbial era, with its vast populations and relentless generational turnover, performed the overwhelming bulk of evolution’s optimization work. The neural era, for all its sophistication, is a refinement.
But this raises an important question.
From chaos to cognition
We have computed the total across all three tiers. But not all of that computation is equally relevant to the question we are asking.
Consider what we are actually trying to measure: the computational distance from a state of high cognitive entropy, where no organism reasons, plans, or models the world, to the low-entropy state where general intelligence emerges. The first two tiers did not traverse that distance. They built the machinery that makes the traversal possible. Bacteria assembled the molecular substrate of computation. Invertebrates developed the basic algorithms of learning: habituation, conditioning, sensory integration, spatial memory. These are the mechanics, the engine and the chassis. They are not the journey.
The journey, the actual transition from “learning exists” to “general intelligence exists,” happened during the vertebrate era. It happened in organisms that inherited a working nervous system with established learning algorithms, and then used those tools, across 500 million years and a trillion parallel lives, to develop abstract reasoning, social cognition, planning, and flexible problem-solving.
This distinction matters because it is substrate-independent. We are not asking whether silicon can replicate biology’s specific molecular machinery, or whether backpropagation is equivalent to synaptic plasticity. We are asking a simpler question: how much computation separates cognitive disorder from cognitive order? The vehicle, biological or artificial, provides the capacity to compute. The vertebrate learning instances are the computation itself.
Silicon provides its own mechanics. At the substrate level: transistors, memory hierarchies, GPU architectures, decades of computer science and engineering. At the algorithmic level: backpropagation, attention mechanisms, reinforcement learning, and the mathematical theory that underpins them. Whether these mechanics are equivalent to biology’s is genuinely debatable, and we will return to that question in later chapters. But even granting the equivalence, the distance remains. The question is how far, not how.
\[\Omega_{\text{vertebrate}} \approx 10^{25}\]
This is the maximally optimistic estimate: the computation that occurred after both foundations were in place, during the era when general intelligence actually emerged. But what is the right number on the AI side?
The conventional comparison uses floating-point operations: frontier language models consume roughly \(10^{25}\) FLOP during training, and \(10^{25}\) against \(10^{25}\) would suggest the gap is already closed. But this comparison is inconsistent with our own methodology.
We rejected total synaptic operations as the metric for biology because most neural activity is not learning: it is maintenance, homeostasis, routine processing. The same logic applies to FLOP. The vast majority of floating-point operations in a training run are spent on the forward and backward passes: computing gradients, not applying them. The moment the model actually learns, the moment its weights change, is the gradient update step. Everything else is computation in service of that step, just as a lizard’s routine neural firing is activity in service of staying alive, not learning. If we insist on counting only the moments that matter on the biology side, intellectual honesty demands we do the same on the silicon side.
A frontier training run processes roughly \(10^{13}\) tokens in batches of several million, yielding approximately \(10^6\) to \(10^7\) gradient update steps. Each step is one adjustment: the model sees a batch of data, computes how wrong it was, and updates its parameters. That is the atom of learning in stochastic gradient descent, just as a learning instance is the atom of learning in biology.
The honest comparison is adjustments to adjustments:
\[\frac{\Omega_{\text{vertebrate}}}{\Omega_{\text{SGD}}} \approx \frac{10^{25}}{10^7} = 10^{18}\]
Even at maximum generosity, the gap is eighteen orders of magnitude.
For now, we note the range. The full estimate, counting all three tiers, is \(\sim 10^{40}\). The vertebrate-only estimate, measuring only the distance from cognitive chaos to cognition, is \(\sim 10^{25}\). Against \(\sim 10^7\) gradient updates, the gap ranges from eighteen to thirty-three orders of magnitude.
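The same comparison in code. The token and batch counts are order-of-magnitude assumptions consistent with the figures above, not the specification of any particular training run.

```python
# Gradient updates in a frontier run, and the resulting gap in orders of magnitude.
import math

tokens = 1e13   # tokens processed (order of magnitude)
batch = 4e6     # tokens per batch; "several million" above
updates = tokens / batch
print(f"updates ~ {updates:.1e}")   # ~2.5e6, i.e. the 1e6-1e7 range quoted

for omega in (1e25, 3e40):          # vertebrate-only and full-total estimates
    # Against ~1e7 updates, the generous end of the range.
    print(f"gap ~ 10^{math.log10(omega / 1e7):.0f}")   # 10^18 and ~10^33
```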
Verification
We derived these numbers from populations, rates, and timelines. Can we check them against something independent? Two approaches.
Energy consistency
If our learning instance count is correct, the energy cost per instance should be physically plausible. We can estimate total neural energy expenditure independently and divide.
Attwell and Laughlin’s foundational work on the brain’s energy budget established that a single synaptic transmission event costs roughly \(8 \times 10^{-15}\) joules. Mink, Blumenschine, and Adams showed that the ratio of central nervous system metabolism to body metabolism is remarkably constant across vertebrate classes, at roughly 2 to 8 percent. Herculano-Houzel demonstrated that glucose consumption per neuron is nearly constant across rodent and primate species, varying by only 40 percent.
During the neural era, average biosphere power was roughly 70 TW (Hoehler and colleagues). Animals consume approximately 5% of biosphere energy. Neural tissue accounts for roughly 5% of animal energy on a population-weighted basis (lower than the vertebrate average, since invertebrate nervous systems are proportionally smaller). Total neural energy:
\[E_2 \approx 70 \times 10^{12} \times 0.05 \times 0.05 \times 1.58 \times 10^{16} \approx 3 \times 10^{27} \text{ J}\]
Energy per learning instance: \(3 \times 10^{27} / 10^{31} = 3 \times 10^{-4}\) J, or about 300 microjoules. A small insect brain consumes roughly \(10^{-5}\) joules per second. At that rate, 300 microjoules is about 30 seconds of full brain activity: an extended sensory-motor episode, on the scale of a single foraging decision or predator encounter. Physically plausible. ✓
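The energy check, as a sketch; the insect-brain power figure at the end is the rough value just quoted.

```python
# Energy consistency: joules per learning instance during the neural era.
biosphere_power = 70e12   # watts (Hoehler and colleagues)
animal_share = 0.05       # fraction of biosphere energy flowing through animals
neural_share = 0.05       # fraction of animal energy spent on neural tissue
T_neural = 1.58e16        # seconds in the post-Cambrian era

E_neural = biosphere_power * animal_share * neural_share * T_neural
per_instance = E_neural / 1e31   # divide by the Tier 2 learning-instance count

print(f"E ~ {E_neural:.1e} J")                 # ~2.8e27 J
print(f"per instance ~ {per_instance:.1e} J")  # ~2.8e-4 J, about 300 microjoules
print(f"~{per_instance / 1e-5:.0f} s of insect brain activity")  # ~28 s at 1e-5 J/s
```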
Behavioral plausibility
If our per-organism learning rate (\(10^{-4}\) instances per second) is correct, individual organisms should accumulate a reasonable number of learning instances over their lifetimes. We can check this against species whose learning has been studied in detail.
Honeybee. A forager bee lives roughly six weeks, with about three weeks of active foraging. At \(10^{-4}\) instances per second, it accumulates roughly 200 to 400 learning instances over its lifetime. The literature on honeybee cognition, particularly the work of Menzel and colleagues, documents that foraging bees learn and retain flower colors, shapes, scents, locations, reward quality, time-of-day patterns, panoramic landscape views, compass directions, and navigational routes. Several hundred distinct learned items over a lifetime is consistent with this. ✓
C. elegans. The nematode C. elegans lives two to three weeks. At \(10^{-4}\) instances per second, it accumulates roughly 100 to 200 learning instances. Despite having only 302 neurons (the complete connectome was mapped by White and colleagues in 1986), C. elegans demonstrates habituation to mechanical and chemical stimuli, sensitization, associative learning linking temperature, smell, and taste to food availability, and both short-term and long-term memory, as documented extensively by Rankin and others. One to two hundred learning events over a lifetime is plausible for this repertoire. ✓
Mouse. A mouse lives roughly two years. Mice are among the most learning-intensive vertebrates: place cells form within seconds of encountering a novel environment, and fear conditioning is reliably one-trial. In the wild, a mouse explores its territory, forages, avoids predators, and navigates social hierarchies, plausibly encountering several hundred novel learning situations per day. At a conservative 200 events per day, well below what laboratory studies of hippocampal plasticity suggest the brain can handle, a mouse accumulates roughly 150,000 learning instances over its lifetime. Mice in complex environments learn spatial maps, food cache locations, social hierarchies, predator avoidance strategies, and hundreds of contextual associations. Over a hundred thousand learning events across a two-year life in a rich environment is reasonable. ✓
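The three lifetime tallies, computed directly. The active-day counts are the assumptions stated above.

```python
# Lifetime learning counts implied by the per-organism rates above.
RATE = 1e-4 * 86_400   # 1e-4 instances/s expressed per day (~8.6/day)

print(f"honeybee:   {RATE * 42:.0f}")  # ~6 weeks of life      -> ~360 instances
print(f"C. elegans: {RATE * 17:.0f}")  # ~2.5 weeks            -> ~150 instances
print(f"mouse:      {200 * 730:,}")    # 200/day over ~2 years -> 146,000 instances
```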
These checks do not prove the estimate is correct. They demonstrate internal consistency: the per-organism rate produces lifetime learning counts that match what behavioral science has documented.
The human question
The vertebrate calculation treats all \(10^{12}\) organisms as contributing equally to the journey toward general intelligence. But this obscures an important detail: humans are qualitatively different. Language, abstract reasoning, cumulative culture, technology—these capabilities emerged very recently and in a very small population. How much of the \(10^{25}\) learning instances was specifically required for human-level cognition?
The genus Homo appeared roughly 2.5 million years ago. Anatomically modern humans (Homo sapiens) emerged approximately 300,000 years ago. Behavioral modernity—language, art, complex tools, symbolic thought—is evident in the archaeological record only within the last 100,000 years. For most of vertebrate history, the most sophisticated organisms were nowhere near human intelligence.
If we isolate the human lineage specifically, the calculation narrows dramatically. Human population over the last 100,000 years averaged perhaps \(10^6\) to \(10^7\) individuals (peaking only recently at \(10^{10}\)). At 100 learning instances per day over 100,000 years:
\[\Omega_{\text{human}} \approx 10^7 \text{ humans} \times 10^2 \text{ instances/day} \times 365 \text{ days/year} \times 10^5 \text{ years}\]
\[\Omega_{\text{human}} \approx 4 \times 10^{16}\]
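The human-lineage slice in the same form; population, rate, and window are the round numbers just stated.

```python
# Human lineage only: small population, short window, same arithmetic.
population = 1e7   # average humans alive over the last 100,000 years
per_day = 100      # learning instances per individual per day
years = 1e5

omega_human = population * per_day * 365 * years
print(f"{omega_human:.1e}")   # ~3.7e16, i.e. ~4e16
```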
This is nine orders of magnitude smaller than the full vertebrate estimate. Does this mean the gap to human-level intelligence is only \(10^{16} / 10^7 \approx 10^9\)—“merely” a billion-fold?
No, for two reasons. First, humans inherited the neural machinery that the previous 500 million years of vertebrate evolution built. The \(10^{16}\) human learning instances operated on top of a substrate that cost \(10^{25}\) instances to develop. You cannot train a human brain from random initialization; you need the machinery evolution built.
Second, and more fundamentally, we do not know how much of human intelligence emerges from individual learning versus evolutionary optimization. Language capacity, for instance, appears to have significant innate structure (Chomsky’s universal grammar, though debated in details, captures a real phenomenon: children acquire language with surprisingly little data). This innate structure was itself shaped by evolution, operating over millions of generations. The \(10^{16}\) human learning instances sit atop an evolutionary foundation that we cannot bypass.
The honest answer is that the human-specific portion of the journey is difficult to isolate. What we can say with confidence is that the full vertebrate estimate of \(10^{25}\) learning instances represents the cost to go from “no general intelligence” to “human-level general intelligence” starting from established neural learning mechanisms. If we had those mechanisms in silicon—the consolidation cycle, the co-located memory, the architectural substrate—perhaps \(10^{16}\) or even \(10^{18}\) learning instances would suffice. But we do not have those mechanisms, and building them is part of the problem, not a solved prerequisite.
Where current AI fails
If the learning instances gap is real, it should be visible in failure modes: tasks that reveal the boundaries of what \(10^7\) gradient updates over text can learn. The failures are indeed visible, and they cluster in predictable ways.
Novel physical reasoning. Ask GPT-4: “I have a cup of water. I turn the cup upside down. Where is the water?” The model answers correctly—it has seen this pattern in text. But ask: “I have a cup of water with a plate balanced on top. I turn the cup upside down, then remove my hand. Where is the water?” The model struggles. This is a trivial problem for a toddler who has spilled water hundreds of times, but text rarely describes this specific configuration. The model has learned the statistical regularities of water-related text, not the physics of water.
Causal inference beyond correlation. Show the model data: “Every day the rooster crows, then the sun rises.” Ask: “Does the rooster cause the sun to rise?” The model answers no, because it has seen text explicitly stating that correlation is not causation. But present a novel correlation without explicit annotation, and the model confuses the two. It has learned to parrot the distinction when prompted, but not to reliably apply it.
Compositional generalization. Train the model on sentences like “the red triangle is above the blue circle” and “the green square is to the left of the yellow star.” Then ask: “Draw a scene with the purple pentagon above the orange hexagon.” The model has seen all the component concepts, but combining them in a novel configuration often fails. Human children, by contrast, effortlessly generalize compositional structure after a handful of examples, because their learning is grounded in embodied interaction with objects in space.
Out-of-distribution robustness. Adversarial examples expose this starkly. Change a single pixel in an image, imperceptible to humans, and the classifier flips from “panda” to “gibbon” with high confidence. Add a small sticker to a stop sign, and an autonomous vehicle misclassifies it. These are not edge cases; they reveal that the model has learned statistical regularities in the training distribution, not robust concepts grounded in the structure of the world.
Common sense in unfamiliar contexts. Ask: “If I put my phone in the fridge, will it get cold?” The model answers yes. Ask: “If I put my phone in the fridge for three days, will it spoil?” The model may say no, because phones do not spoil. But ask: “If I put my phone in the fridge, then take it out into a hot humid room, what happens?” The model often misses that condensation will form and potentially damage the phone. This is trivial common sense for anyone who has experienced humidity, but text rarely describes this specific scenario. The \(10^9\) bits per second of lived experience is missing.
These failures have a common structure: they occur where the model must generalize beyond the statistical regularities it has seen in text to the underlying causal, physical, or compositional structure of the world. This is precisely what we would predict from the learning instances gap. Text captures correlations, frequent patterns, and explicit human descriptions of rules. It does not capture the full sensory bandwidth of embodied experience from which robust world models are built.
The model has learned the \(10^{-8}\) of reality that made it into text. The failures reveal the 99.999999% that did not.
The case against
Let us give the opposition its strongest possible arguments.
“Evolution is wasteful.” Natural selection is not gradient descent. It does not follow the steepest path to a solution. It wanders, gets stuck in local optima, spends millions of years on body plans that lead nowhere. Surely a more directed optimization process could find intelligence with far less computation.
This is plausible. Directed search is generally more efficient than random search. But how much more efficient? Evolution is not purely random; it is a sophisticated optimization algorithm that combines random mutation with strong selection pressure, sexual recombination, and developmental constraints that bias the search toward viable phenotypes. It is closer to a well-tuned evolutionary strategy than to brute-force enumeration. Claiming a billion-fold speedup over this already-sophisticated process is extraordinary and requires evidence, not assumption.
“Intelligence might have a shortcut.” Perhaps there exists a compact algorithm, a set of principles that, once discovered, allows intelligence to be instantiated with modest computation. Evolution could not find this shortcut because evolution optimizes for survival, not for elegant algorithms.
This is the strongest version of the objection, and we cannot rule it out. It is possible. But “possible” is not “probable,” and it is certainly not a basis for confident predictions about when AGI will arrive. The shortcut hypothesis is unfalsifiable in the absence of the shortcut itself. Until someone demonstrates such a shortcut, the only empirical evidence we have is the evolutionary record, and it says the problem is very, very hard.
“Moore’s Law and algorithmic improvements will close the gap.” Compute costs have fallen exponentially for decades and may continue to do so. Even if \(10^{25}\) learning instances is the target, perhaps we simply need to wait.
The difficulty depends on which estimate we use. The optimistic gap of \(10^{18}\) represents roughly 60 doublings. At two years per doubling, that is 120 years of Moore’s Law, and this assumes the trend does not slow further (it already has). The full gap of \(10^{33}\) would require over 200 years. Algorithmic improvements could compress the timeline, but they would need to close whatever gap remains after hardware gains, and no algorithmic improvement in the history of computer science has delivered a \(10^{18}\)-fold speedup on a problem of this generality.
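The doubling arithmetic behind those figures, assuming (generously) an unbroken two-year doubling cadence:

```python
# Doublings of compute needed to close each version of the gap.
import math

for gap in (1e18, 1e33):
    doublings = math.log2(gap)
    print(f"10^{math.log10(gap):.0f}: {doublings:.0f} doublings, "
          f"~{2 * doublings:.0f} years at 2 years per doubling")
# 10^18: 60 doublings, ~120 years; 10^33: 110 doublings, ~219 years
```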
“Evolution’s solution is not the only solution.” Birds fly but airplanes do not flap their wings. Perhaps artificial intelligence need not recapitulate biological evolution.
This analogy is frequently invoked and frequently misapplied. Airplanes do obey the same physics as birds. Lift requires an airfoil and forward motion through a fluid. The Wright brothers studied birds obsessively. What changed was the mechanism (fixed wings instead of flapping), not the underlying principle (Bernoulli’s equation, Newton’s third law). If artificial intelligence departs from biological intelligence, it must still solve the same underlying problem: extracting reliable generalizations from experience in a world governed by physics. The question is not whether the mechanism must be identical, but whether the computational cost of solving the problem can be radically reduced. Our estimate, built entirely from optimistic assumptions, suggests the cost is high.
“Current AI systems already show signs of general intelligence.” Large language models pass bar exams, write code, reason about novel problems, and exhibit capabilities that were not explicitly trained. Perhaps current training runs are already sufficient for a meaningful degree of general intelligence.
We address this argument fully in a later chapter, but the short response here is: there is a difference between impressive performance on specific benchmarks and the kind of robust, flexible, embodied intelligence that evolution produced. A system that can discuss the concept of heat but has never been burned, that can describe a sunset but has never seen one, has a qualitatively different relationship to knowledge than an organism that has lived through these experiences. Whether this difference matters for practical applications is debatable. Whether it constitutes general intelligence in any rigorous sense is the question this book exists to explore.
The invoice stands. By the most optimistic accounting we can construct, nature spent \(10^{25}\) learning instances to produce general intelligence. The largest artificial training runs have performed roughly \(10^7\) gradient updates. The gap is eighteen orders of magnitude. We have tilted every assumption in favor of the optimist, and the distance remains.
Chapter summary
- Evolution spent approximately \(10^{40}\) learning instances across all three tiers (microbial, invertebrate, vertebrate), or \(10^{25}\) if we count only the vertebrate era where general intelligence emerged
- A learning instance is a cycle of organism-environment interaction from which adaptive information can be extracted, not merely neural activity
- Current frontier AI training runs perform roughly \(10^7\) gradient updates, yielding an eighteen-order-of-magnitude gap
- This is a lower-bound estimate using every optimistic assumption; the true gap may be larger
- Human-specific intelligence may require “only” \(10^{16}\) learning instances, but this sits atop evolved neural machinery that cost \(10^{25}\) instances to build
- Current AI failures cluster predictably in areas requiring generalization beyond text: physical reasoning, causal inference, compositional generalization, out-of-distribution robustness
- The gap is not merely quantitative but reveals the difference between learning statistical regularities in text versus learning from embodied interaction with physics