Apparent Intelligence
What do you see?

This week, my partner and I were reading Bill Martin Jr. and Eric Carle's "Brown Bear, Brown Bear, What Do You See?" (perhaps the most visually enchanting children's book we've encountered) to our nearly one-year-old son. Some pages he swiftly flipped past, while others, like the vibrant purple cat, captured his attention, prompting delighted "ooo" sounds and prolonged gazes.
A fascinating cognitive puzzle has emerged in our household: Does he understand that this stylized image represents the word "cat," or is he merely associating some deeper, more ineffable quality with the illustration (not that there’s anything wrong with that)?
Our experimental journey began on the book's final spread, which features all the animals encountered throughout the narrative. "Where's the cat?" we asked. He paused for a second, looked at the spread, and pointed his finger at the purple cat. Excitement bubbled up. "Where's the dog?" His finger smoothly transitioned to the neighboring canine.
We followed with another query: "Where's the cat?" Again, he navigated precisely to the feline. We repeated this several times, each successful navigation increasing our wonder. The probability of such accurate pointing under purely random pointing seemed quite small: roughly 3%, which is to say a p-value of about 0.03.
That moment of watching our formerly passive potato transform into a responsive, seemingly sentient potato is nothing short of magical. How does this understanding emerge? How does a being transition from random interactions to purposeful communication?
The Fragility of Perceived Intelligence
Our scientific enthusiasm demanded further testing. "Where's the elephant?" we asked. His finger, without hesitation, returned to the purple cat. Our soaring hypotheses about linguistic comprehension came crashing back to earth. We followed up with, “That’s the cat, now where’s the dog?” He looked at us, then back to the book, with his finger remaining on the cat.
We swiftly recalibrated our model of his cognitive capabilities:
Comprehensive Understanding Model: Likelihood Function ≈ 0
Random Pointing Model: Likelihood Function ≈ 0.008
He Probabilistically Likes Cats Model (with a preference p for pointing at the cat, regardless of what we say): Likelihood function has a maximum value ≈ 0.015, with p≈0.7.
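For the quantitatively inclined, here is a rough sketch of how that comparison could be run. The tallies of cat-points versus other-points and the number of targets on the spread are placeholder assumptions of mine rather than a faithful record of the session, so the outputs are illustrative only.

```python
# Back-of-the-envelope model comparison. All counts are assumed for
# illustration; they are not an exact tally of our pointing game.
import numpy as np

n_cat = 5        # assumed: points that landed on the purple cat
n_other = 2      # assumed: points that landed anywhere else
n_points = n_cat + n_other
n_targets = 10   # assumed: pointable things on the final spread

# Random Pointing Model: every point lands uniformly on one of the targets.
# (This value is very sensitive to how many targets you decide to count.)
random_likelihood = (1 / n_targets) ** n_points

# "He Probabilistically Likes Cats" Model: he points at the cat with
# probability p, and elsewhere otherwise, regardless of what we ask.
p_grid = np.linspace(0.01, 0.99, 99)
likelihoods = p_grid**n_cat * (1 - p_grid)**n_other
p_hat = p_grid[np.argmax(likelihoods)]   # maximum-likelihood p; closed form is n_cat / n_points
max_likelihood = likelihoods.max()

# (The Comprehensive Understanding Model is omitted: one confident point at
# the cat when asked for the elephant already drives its likelihood to ~0.)
print(f"random pointing:        {random_likelihood:.1e}")
print(f"likes cats (p = {p_hat:.2f}): {max_likelihood:.3f}")   # about 0.015 with these counts
```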
Of course, there are more nuanced interpretations and potentially more likely models: perhaps he understands but simply chose not to engage. Perhaps he was testing our intelligence, and was confused about why we kept saying a different animal from the one he was pointing at! (Or, most likely, he saw that pointing at the cat got us to respond, so he was just trying that again, irrespective of what we said).1
Echoes of the Chinese Room
This incident reminded me of John Searle's famous Chinese Room thought experiment from 1980, a philosophical probe into the nature of understanding. Imagine a human who knows only English, following an algorithmic script to generate coherent Chinese responses to Chinese text input (in the original telling, passed in on slips of paper). Ostensibly, such a person could carry on an intelligent conversation in Chinese, but does following the algorithm (which a computer could also do) constitute true understanding of Chinese?
In our current age, large language models (LLMs) have only intensified this philosophical quandary. When an AI generates flawless, contextually appropriate text, is it truly comprehending?
The answer, most likely, lies not in binary distinctions but in a rich, multidimensional spectrum of understanding. Our own learning experiences reveal multiple layers of comprehension, from rote memorization to deep, intuitive mastery. And as we are all painfully aware by now through our own interactions with language models, John Searle's thought experiment is missing an important specification: it's not just the algorithm that matters, but also the context and input that are provided to the algorithm. What we think of as intelligent understanding involves appropriate responses to changing contexts.
What transforms mere information processing into genuine understanding? When does apparent intelligence become real intelligence? It also doesn’t escape me that in all this informal discussion, I haven’t defined intelligence yet, leaving it to be this nebulous concept free to shapeshift.
There's also a weird quirk of the thought experiment: what drives the intuition that the human following the algorithm doesn't understand Chinese is that they can't produce Chinese unassisted. However, can we say that the algorithm understands Chinese, or that it encodes a coherent understanding and representation of Chinese? (On a related note, with the breakthrough DeepSeek R1 model, it's going to be interesting to probe exactly what kinds of internal reasoning patterns it has assembled on its own via reinforcement learning, without the extensive supervision process of the OpenAI models.)
Intelligent or not, understanding or not, playing the pointing game with our son was nevertheless a key delight of the week.
1. One of the quirks of maximum-likelihood-style reasoning is that it promotes overfitting to the observed data, especially if one evaluates the likelihood of the exact observed sequence. There's also the problem that you can construct multiple models with the same likelihood on the observed data but very different behavior on future data. The answer to this underspecification and overfitting naturally lies in accounting for the goals of generalization and prediction, not only in fitting likelihoods.
For example, for a given sequence of N coin flips, the likelihood of observing that exact sequence under a random (fair) coin model is 0.5^N, whereas a model rigged to reproduce that exact sequence, a "fixed" coin, assigns it a likelihood of 1! However, the fixed-coin model is useless for making predictions, while the random-coin model performs quite well at predicting properties of the distribution, e.g. the average and standard deviation of coin flips beyond the observed sequence.
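A tiny sketch of that point with simulated flips (the particular numbers below are just whatever this random seed produces):

```python
# Likelihood on the observed sequence versus usefulness for prediction.
import numpy as np

rng = np.random.default_rng(0)
observed = rng.integers(0, 2, size=20)    # the N = 20 flips we happened to see
future = rng.integers(0, 2, size=10_000)  # flips we would like to predict

N = len(observed)
fair_coin_likelihood = 0.5 ** N   # ~1e-6: the random model's likelihood for the exact sequence
memorizing_likelihood = 1.0       # a model rigged to replay `observed` verbatim

print(f"fair coin, exact observed sequence:  {fair_coin_likelihood:.1e}")
print(f"memorizing model, observed sequence: {memorizing_likelihood:.1f}")

# Only the fair coin says anything useful about flips it has not seen:
print(f"fair coin predicted mean 0.50 vs. future sample mean {future.mean():.2f}")
# The memorizing model puts all its probability on re-seeing `observed`
# verbatim and has nothing to say about any other sequence.
```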



