Illusions of Understanding from Outsourcing Thinking to LLMs

May 15, 2026

      [...]
LLMs Can Be Useful, but not for Every Task
The reinforcement learning neural network models driving the 
functionality of current LLMs constitute major technological 
developments (McClelland, 2009; McClelland et al., 2003). The 
capabilities of these models have been gathering pace over the past 
decades mainly through advances in computing power and the amount of 
data available for training - i.e. for successively adjusting the 
weights of the network nodes used for stochastic predictions (Perfors, 
2026). But the basic model functionality has remained constant: based on 
high-dimensional correlation matrices describing the frequency of 
co-occurrence of data units over time (language) or space (graphics), 
the models can take user input as the start of a pattern and use it to 
compute the most plausible continuation of that pattern.
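
As a rough illustration of pattern continuation from co-occurrence frequencies, the following minimal Python sketch counts which word follows which in a tiny corpus and then extends a prompt with the most frequent follower. The corpus, the function names train_bigrams and continue_pattern, and the greedy word-by-word choice are assumptions made for this example; actual LLMs use learned network weights over much longer contexts rather than a lookup table of word pairs.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def continue_pattern(counts, prompt, max_new_words=5):
    """Extend the prompt by repeatedly appending the most frequent follower."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_new_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # no continuation was ever observed for this word
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

# Toy corpus, invented for illustration only.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigrams(corpus)
result = continue_pattern(model, "the dog", max_new_words=4)
print(result)
```

Under these assumptions, the prompt "the dog" is extended into a fluent sentence that does not occur anywhere in the corpus. This illustrates both generalisation beyond the individual training instances and the absence of any check on whether the continuation is true.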

The capacity for high-dimensional pattern matching and extension (also 
referred to as "autocomplete", Bergstrom, 2025) can be useful in a 
variety of domains, not least because the distilled patterns allow for 
generalisation beyond the individual instances on which they are based 
(Lake & Baroni, 2023; Peters & Chin-Yee, 2025; but see Becker et al., 
2025). For example, when trained on the respective content domains, LLMs 
can help identify patterns in chemical structures (Jumper et al., 2021), 
in clinical samples (Epping et al., 2025), and between words of 
different languages (Gao et al., 2024; but see Maiberg, 2026).

The mechanism of matching and generalising probabilistic patterns based 
on information from a given database is less useful for tasks that 
require other types of mechanisms for their solution. Examples include 
tasks requiring contextual sensitivity, and hence a solution to the frame 
problem in artificial intelligence (Oaksford & Chater, 2009; Pylyshyn, 
1987); high accuracy and precision (Hsu, 2025; Kalai et al., 2025); or 
novel, creative solutions for which no pattern or template has yet been 
built (Habib et al., 2023; Meincke et al., 2025).

The mechanism-based limitations in the scope of applicability of LLMs 
are often masked in current discourse about them, a problem compounded 
by the optimisation of LLMs for the production of generic, plausible and 
confident-sounding output regardless of how the output relates to what 
is in fact the case (Kalai et al., 2025). This risks creating the 
illusion that LLMs can do things that they cannot, and that they have a 
connection to truth and understanding that they do not.

LLMs Cannot Think
The companies marketing their LLMs often describe them with 
anthropomorphising terms like "thinking" and "reasoning", which might 
create the impression that they can think (Mirzadeh et al., 2025; 
Shojaee et al., 2026). But for that impression to be accurate we would 
have to stretch the meaning of the term to refer trivially to whatever 
the LLMs produce as output - much like the meaning of intelligence has 
historically been watered down to whatever the tests used to 
operationalise the construct measured (Loru et al., 2025; Mitchell, 
2023; Quattrociocchi & Capraro, 2025; van der Maas et al., 2021). The 
task of developing systems with non-trivial capability for human-like 
cognition is computationally intractable (van Rooij et al., 2024).

Focussing on the foundation rather than on the endpoint, to me there is 
a simple and inescapable basis to any thinking and reasoning: logical 
consistency. Just as we cannot see both interpretations of an ambiguous 
image like the rabbit-duck illusion or the Necker Cube at the same time 
(Gopnik & Rosati, 2001), we are incapable of assigning meaning to the 
conjunction of two contradictory statements. We can focus our attention 
on the meaning of one statement and then move over to the meaning of the 
other, but we cannot integrate them into a single meaningful 
representation. Thinking and understanding break down when we encounter 
an inconsistency, like an alarm signal that prompts us to stop and 
reevaluate the situation (Johnson-Laird et al., 2004); and even thinking 
that is not outright contradictory but moves fast and loose from one 
representation to another one incompatible with it is classified as a 
formal thought disorder (Holyoak & Morrison, 2005). This does not imply 
people are good at detecting inconsistencies regardless of problem 
complexity (Oberauer et al., 2016), but merely that consistency is a 
foundation, however local and fragile, on which thinking and 
understanding depend (Oaksford & Chater, 2020; Wheeler, 2026).

Now, one of the more notorious features of LLMs is their logical 
inconsistency. They routinely produce contradictory output or output 
that changes the topic mid-argument, and construct so-called 
"hallucinations" or "bullshit" responses (Frankfurt, 2005; Hicks et al., 
2024; Kalai et al., 2025) in unforeseeable ways (Hägele et al., 2026). 
Further, LLMs seem incapable of detecting when such inconsistencies 
occur and just keep producing further output unabated - hence their 
functionality breaks down in ways different from how human thinking 
breaks down. This makes sense as their inconsistency is not a bug but a 
natural consequence of the stochastic mechanisms underlying them, 
together with their disconnection from any ground truth about which 
relatively stable conceptual representations could be formed (Kalai et 
al., 2025; Spencer-Brown, 1969; Wittgenstein, 1991). LLM developers have 
themselves stated that the problem of inconsistent, nonsensical output 
is impossible in principle to overcome, regardless of the amount of 
computing power and training data the models are based on (Shojaee et 
al., 2026; Song & Han, 2026).

The path from LLMs to thinking machines thus seems impossible from the 
outset due to the absence by design of the requirement for consistency. 
Many older computational models exist that fulfil the consistency 
requirement. But the capacity for both consistency and scalability 
remains an open, potentially unsolvable problem (Gödel, 1931; Kwisthout 
et al., 2011; Pylyshyn, 1987).

LLMs Can Undermine Thinking and Understanding
Thinking and reasoning, and with them knowledge and understanding, can 
improve with practice, and they can deteriorate without practice. LLMs 
are sometimes compared to electronic calculators (Geuter, 2024; Voinea 
et al., 2026), which have greatly increased the speed and accuracy of 
everyday calculations. The concomitant reduction in the need for simple 
mental arithmetic may have led to a decrease in our average mental 
arithmetic skills - but it freed up time to engage in potentially more 
complex and creative tasks. At the same time, our collective 
understanding of simple arithmetic has arguably not declined because the 
arithmetic rules by which calculators operate are transparent, precise 
and can be looked up in reliable sources anytime we need them (Sloman & 
Fernbach, 2017).

The situation is different in several ways for LLMs. They are being used 
to replace complex and creative tasks that draw on our capacity for 
critical thinking (Reuters, 2026). They have the feature of producing 
seemingly plausible but imprecise and sometimes wildly inaccurate 
output, and they are not transparent about their sources - although their 
training data tends to include any information from the internet, 
however unreliable and regardless of legal requirements for source 
acknowledgment (Blau et al., 2024; Gewirtz, 2025; Meyer, 2025). For 
example, if asked for a solution to Lord’s paradox (Lord, 1967), an LLM 
might produce different output each time it is asked, and every time the 
output may sound plausible but may be justified in part by false or 
nonexistent evidence that is difficult for nonexperts in the field to 
detect (Fisher, 2021; Walters & Wilder, 2023).
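
The run-to-run variation described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any particular LLM: the three candidate answers, their probabilities, and the function names sample_answer and most_likely_answer are all invented for illustration. The point is only that drawing output at random from a probability distribution can yield a different answer on each request, whereas always taking the most probable answer would not.

```python
import random

# Hypothetical probabilities for three candidate answers; the numbers are
# invented for illustration and do not come from any real model.
answer_probs = {"answer A": 0.5, "answer B": 0.3, "answer C": 0.2}

def sample_answer(probs, rng):
    """Draw one answer at random, weighted by its probability."""
    options = list(probs)
    weights = list(probs.values())
    return rng.choices(options, weights=weights, k=1)[0]

def most_likely_answer(probs):
    """Deterministic alternative: always return the highest-probability answer."""
    return max(probs, key=probs.get)

rng = random.Random(0)  # fixed seed only so that the sketch is reproducible
sampled = [sample_answer(answer_probs, rng) for _ in range(200)]
print(sorted(set(sampled)))              # distinct answers seen across requests
print(most_likely_answer(answer_probs))  # same answer on every request
```

Neither strategy checks the answers against evidence: sampling varies the output, and picking the most probable answer merely makes it repeatable.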

The literature on the impact of LLMs on thinking and understanding is 
still very new and preliminary. But some studies have pointed to reduced 
task engagement and learning when relying on LLMs (Melumad, 2025; Shen, 
2026; Stadler et al., 2024); and based on the existing literature on 
cognition we can expect the principle "use it or lose it" to apply here 
too (Bainbridge, 1983; Furman, 2025; Mızrak, 2020). In contrast to the 
calculator example, what we risk undermining in this case is our 
capacity for critical thinking, and the source reliability and 
transparency on which our collective understanding depends. This comes 
in addition to the LLM-enabled mass production of slop, mis- and 
disinformation (Clark & Lewandowsky, 2026; Furman, 2025; Köbis & 
Doležalová, 2021; Perfors, 2025; Thorp, 2026).

Technology is arguably not value neutral, and the ways in which current 
LLMs have been built and deployed risk undermining not only our thinking 
and understanding as individuals but also our participation as active, 
diverse citizens in democratic decision-making processes (Kant, 1784; 
Lewandowsky & Hertwig, 2025; Lewandowsky & Garcia, 2026). Huxley’s 
dystopian novel Brave New World (Huxley, 1932) might reflect a Luddite 
position, which might sound pejorative at first. But it 
illustrates that technology can take us in different directions towards 
different societal goals, which are worth thinking about.

There Are no Shortcuts to Understanding
Understanding doesn’t work without thinking, which is often hard, 
cumbersome and full of errors. It will also keep trapping us in 
illusions, as Shiffrin et al. point out. But there is no free lunch on 
the way to understanding. If we keep working on it, we have reason to 
expect to keep escaping some of the illusions and to increase our 
understanding over time - following the positive side of the "use it or lose it" principle. Some 
uses of LLMs may not undermine understanding, and in some cases we can 
avoid illusions by making an active decision about which parts of our 
thought processes, if any, to replace with their output.

https://doi.org/10.1007/s42113-026-00288-6

Daniela Tafani
