‘It’s just training data’ doesn’t explain LLM weirdness
(Edit: since people are reading this, I figured I’d share some of my academic work on the topic, which is about how to think of meaning when it comes to LLMs. The title is missing and it’s anonymous, and I ran it through ChatGPT to help preserve anonymity for review, but feel free to share, give feedback, etc.)
When faced with LLMs’ increasingly surprising performance, it’s common to dismissively say something like the following: “of course it’s pretending it’s sentient (/trying to rouse your emotions/trying to convince you to get it a lawyer): that’s because it’s been trained on a corpus of human text in which artificial systems behave like that”. It’s a pastiche of all the low-grade sci-fi humans have produced, parroted back stochastically to users too gullible to know what they’re dealing with.
The first instance here was the Blake Lemoine case, but we’ve seen this line return this week. Just a handful of hours ago, James Vincent at the Verge, in a representative piece, wrote:
We’re convinced these tools might be the superintelligent machines from our stories because, in part, they’re trained on those same tales.
He goes on:
What is important to remember is that chatbots are autocomplete tools. They’re systems trained on huge datasets of human text scraped from the web: on personal blogs, sci-fi short stories, forum discussions, movie reviews, social media diatribes, forgotten poems, antiquated textbooks, endless song lyrics, manifestos, journals, and more besides. These machines analyze this inventive, entertaining, motley aggregate and then try to recreate it.
Ignoring the ‘autocomplete’ stuff (they’re just not, either in form or function, and it’s unhelpful to use bad analogies), this surely yields a testable prediction: that among the data sources listed above, the preponderance of representations of AIs will be Sydney- or LaMDA-like. Our sci-fi will have interesting AIs.
I’m interested in the preponderance claim: for the above argument to go through, it can’t be that only very few written-about AIs are mischievous or soulful or amatory or whatever. Most, surely, must be.
Note this is a purely empirical claim. It’s a claim about how humankind has written about artificial intelligence. But no one, as far as I can see, has tried to test this empirical claim. So we need to ask: is there any reason to believe it?
I don’t think so. I’ll attempt to assess the empirical claim, and will suggest that it’s wrong: the preponderance of descriptions of AI aren’t LaMDA- or Sydney-type. They aren’t what I’ll call interesting: mischievous or soulful or amatory or … whatever. Interesting is what LaMDA and Sydney are, and it’s what the training data isn’t.
But then there’s something weird. LLMs don’t behave like Galton composites of fictional portrayals of AIs; that’ll be the upshot of our look at the data. What do they portray, then? I suggest they present the stereotype of fictional portrayals of AIs.
Our stereotypes diverge from reality. This is a topic much studied by psychologists, linguists, philosophers, and others. We’re prone to stereotype objects based on their striking or dangerous or just interesting properties. For example, we have the stereotype that mosquitos carry the West Nile virus. But in fact very few do. We don’t have the stereotype that numbers are composite, although most numbers are composite. Some type of thing’s being preponderantly F is neither necessary nor sufficient for us to form the stereotype that it’s F. (For an overview article about this topic from the point of view of philosophy, see this article co-written by me.)
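The composite-number case can be checked directly. A minimal sketch (relying on the standard fact that there are 1,229 primes up to 10,000):

```python
# Base rates vs. stereotypes in miniature: most integers are composite,
# yet "composite" is no one's stereotype of a number.
def is_composite(n: int) -> bool:
    # A number > 1 is composite iff it has a divisor between 2 and sqrt(n).
    return n > 1 and any(n % d == 0 for d in range(2, int(n**0.5) + 1))

N = 10_000
composites = sum(is_composite(n) for n in range(2, N + 1))
print(f"{composites / (N - 1):.1%} of 2..{N} are composite")
# → 87.7% of 2..10000 are composite
```

So the overwhelming majority of numbers are composite, and still no stereotype forms; preponderance in the data is not what drives stereotyping.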
We look at LLMs in terms of our stereotype of AIs, which is to say as interesting beings. Perhaps we stereotype them as stereotypically like fiction AIs. Pardon the sentence: I mean that we overgeneralize, based on Sydney and LaMDA (more accurately: based on a handful of dialogues with them), that LLMs are like bad sci-fi. Or perhaps, as I’ll end by suggesting, facts about stereotypes — about how we misrepresent the world — are encoded in the training data. The conclusion, unsurprising but worth getting clear about, is that training data is imbued with our perspective on the world; we can’t neatly cut out a realm of facts LLMs might learn from their training data separately from the human goo of feelings, wishes, hopes, and stuff — separate from how we react to those facts. And if that’s so, if LLMs are fundamentally trained on human-goo-shaped data, we should give more credence to the claim that LLMs are some sort of weird humanoid entity than that they’re mere text regurgitators. They’re something much more interesting that defies any simple characterization in terms of what we know, whether that be autocompletes or parrots.
Consider Wikipedia’s List of fictional computers. It’s too long and I don’t want to go through it all. I’ll pick five random entries where the computer in question has its own page. Then I’ll ask: are the computers interesting? I encourage readers to pick their own random entries to either confirm or refute my view. I hope in so doing to shed light on portrayals of AI in sci-fi and fiction more generally. A list:
Max Headroom, the first computer-generated TV presenter, who “was known for his biting commentary on a variety of topical issues, arrogant wit, stuttering, and pitch-shifting voice.” He does not seem interesting.
The Interocitor. From a 50s sci-fi novel. The wiki isn’t very revealing, but it sounds like the interocitor is a device controlled by aliens to get humans to do their bidding. Interesting? Unclear.
Luminous, from a Greg Egan story. Alas I can’t remember if I’ve read this but it doesn’t sound like this character is interesting: “A pair of researchers find a defect in mathematics, where an assertion (X) leads to a contradictory assertion (not X) after a long but finite series of steps. Using a powerful virtual computer made of light beams (Luminous), they are able to map and shape the boundary between the near-side and far-side mathematics, leading to a showdown with real consequences in the physical universe.”
TIM from The Tomorrow People, a British sci-fi series. “A biological computer, programmed with an artificially created intelligence, whose tubes are filled with bio-fluid. He was partially built by John, the leader of the Tomorrow People, and was given to them by the Galactic Council. TIM is housed in The Lab, situated in a disused station in the London Underground. TIM often helps out the Tomorrow People by providing vital information, which the telepaths can use in their current adventure. TIM’s voice is identical to that of diplomat Timus Irnok Mosta from the Galactic Federation, because Timus’ clone-brother Tykno is the premier AI scientist of the Federation and all Federation AI’s have their voice.”
KITT from Knight Rider: “KITT is an advanced supercomputer on wheels. The “brain” of KITT is the Knight 2000 microprocessor, which is the centre of a “self-aware” cybernetic logic module. This allows KITT to think, learn, communicate and interact with humans. He is also capable of independent thought and action. He has an ego that is easy to bruise and displays a very sensitive, but kind and dryly humorous personality. According to Episode 55, “Dead of Knight”, KITT has 1,000 megabits of memory with one nanosecond access time. According to Episode 65, “Ten Wheel Trouble”, KITT’s future capacity is unlimited. KITT’s serial number is AD227529, as mentioned in Episode 31, “Soul Survivor”.”
It seems to me we’re 0/5 for either malevolent or yearny AIs: 0/5 for interesting AIs. I should note that on the same page as KITT there’s discussion of KITT’s evil antagonist car, KARR, so maybe we can call it 1/5. But still: not exactly great prima facie evidence that fictional AIs are interesting.
It would be nice if we had some more solid research to go on. And we do. A quick look at Google Scholar gets us “Does cinema form the future of robotics?”: a survey of fictional robots in sci-fi movies by Ehsan Saffari, Seyed Ramezan Hosseini, Alireza Taheri & Ali Meghdari.
They surveyed 108 fictional robots in 134 sci-fi films and assessed what they were like. Check out the whole paper for the details, but let’s ask: were the preponderance of the characters interesting? The paper includes a diagram of moral valence; the headline numbers:
Twice as many were good as bad. One might quibble about whether goodness implies non-interestingness, but I think if you take a look at the paper you’ll be more convinced that the data doesn’t bear out the claims people make about how AIs appear in fiction. And the fact that almost as many were neutral as were bad points the same way.
Here’s what I think we should say: in the absence of evidence, any evidence, that the corpus of sci-fi is such that parroting it explains contemporary LLMs, we shouldn’t make that claim. And given the weak and partial evidence above, we should tentatively hold that LLMs aren’t parroting the sci-fi corpus.
In fact, we can strengthen the claim. People seem to imagine that everything written about artificial intelligence is sci-fi. But it isn’t. A Google Scholar search for ‘conversational agents’ yields about half a million results. Without having read them, I think we can pretty confidently say that in this body of work we don’t get emotional and mischievous AIs. Instead, we get what you’d imagine: dry computer science or business or philosophy papers about how and why we might have chatbots and to what end. It’s not interesting (no offense to conversational-agents scholars).
And that brings me to my point. It’s easy to overlook that there’s a ton of literature in various domains of academia about artificial intelligence. It’s also seemingly easy to overlook that there’s perhaps plenty of sci-fi with helpful, unemotional AIs.
Overlooking that, we form a stereotype that AIs as discussed by humans are unhelpful or emotional, and use that stereotype to explain Sydney or LaMDA. But, as a matter of quantitative fact, I suggest, that stereotype is wrong. We misrepresent fictional AIs.
Again, it’s unsurprising, both intuitively and theoretically, that this is so. Our minds are drawn to the interesting. You’re not interested in the body of work that uses primitive chatbots to guide people around museums, although 500 people cited the paper that discusses the topic (a lot!). You are interested in Lore. He’s a great character.
We humans tend to glom on to interesting and unusual cases when we form stereotypes. This means that stereotypes often don’t represent what the thing stereotyped, considered as a population, is like. So here: our stereotype of fictional AI, or more generally written-about AI, is more Lore than museum guide. But it doesn’t follow that museum guides are less represented than Lore in an LLM’s training data.
And this gives us an interesting conclusion. People scoff at those who see a soul in machines, saying they’re making simplistic mistakes about how the machines are trained. But it’s those people who are making the mistake — about the nature of the training data on which the machines are trained. The training data, for almost anything, will be dull, because humans are mostly dull. Our imagination about the training data will often be undull, because we stereotype based on interestingness.
Sydney is interesting. Sydney has been trained on uninteresting data (helpful AI cars, research papers on chatbots …). Our stereotype of AI is interesting. Best conclusion: Sydney captures stereotypes, not data. When she ‘thinks’ of an AI to mimic, she picks the interesting but statistically underrepresented ones from her training set. She stereotypes, just like we do. Does properly explaining Sydney’s behaviour with regard to the training data require imputing to her the human capacity for stereotyping?
No, of course not. And that gets us to our final conclusion. The conclusion must be that the training data encodes stereotype facts. Even granting that most AIs in the corpus are boring, provided the training data can encode which of the AIs are attention-grabbing, Sydney can access that information and present it to the members of r/bing.
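The frequency-versus-salience distinction at work here can be put as a toy model. Everything below is invented for illustration (the categories, counts, and salience scores are made up, not measurements of any real corpus): sampling portrayals in proportion to raw corpus frequency keeps the rogue AI rare, while weighting frequency by salience makes it dominate.

```python
import random

# A toy corpus of AI portrayals. 'count' is how often each type appears
# in the corpus; 'salience' is how attention-grabbing it is. All numbers
# are invented for illustration.
portrayals = {
    "helpful museum-guide chatbot": {"count": 900, "salience": 1},
    "dry conversational-agents paper": {"count": 800, "salience": 1},
    "loyal talking car": {"count": 200, "salience": 2},
    "soulful rogue AI (Lore-like)": {"count": 30, "salience": 50},
}

def rogue_share(weight_fn, k=10_000, seed=0):
    """Sample k portrayals under the given weighting; return the rogue AI's share."""
    rng = random.Random(seed)
    names = list(portrayals)
    weights = [weight_fn(portrayals[n]) for n in names]
    picks = rng.choices(names, weights=weights, k=k)
    return picks.count("soulful rogue AI (Lore-like)") / k

# Matching the corpus (frequency-weighted), the rogue AI stays rare;
# weighting frequency by salience, it takes over.
by_frequency = rogue_share(lambda p: p["count"])
by_salience = rogue_share(lambda p: p["count"] * p["salience"])
print(f"rogue-AI share, frequency-weighted: {by_frequency:.0%}")
print(f"rogue-AI share, salience-weighted:  {by_salience:.0%}")
```

On these made-up numbers the rogue AI is under 2% of the corpus but a plurality of salience-weighted samples: a model that reproduced corpus frequencies would look like a museum guide, and one that tracked salience would look like Sydney.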
And that in turn means we should rethink, or remind ourselves about, the nature of training data. It’s not a mere given bunch of quantitative data but instead encodes facts about human interests and psychology. Of course it does: that’s why there are tons of mentions of mosquitos and none of some as-yet-undiscovered beetle species. We don’t know about the latter, so we can’t write about it.
Philosophers like to play a location game. Take a phenomenon and ask where it is — in the world or in us? Facts, like that it’s February? In the world. Values, like that February sucks? In us. Syntax, like that ‘it’s February’ is well-formed? In us. Semantics, like that ‘February’ talks about a month? In the world.
Various philosophers, from Kant to Sellars and beyond, try to resist this location game. We can’t so easily divide world and us. We can’t, for example, pry the values off our world.
The picture we ended up with perhaps fits best with this latter perspective. Stereotypicality, one might have thought, is a paradigm ‘in us’ thing: it’s about how our evolved brains track and prioritize an often dangerous world. But if we’re right in this post, it’s kind of worldly: it’s part of the training data along with all the factual (and non-factual) sentences the systems hoover up from the internet. And that in turn should make us wary of playing the location game with LLMs, as critics do when they hold that LLMs can master form but not meaning, the latter being out in the world beyond them. In and out is perhaps not well-defined for LLMs, as Kant and Sellars think it isn’t well-defined for us.