LLMs are often seen from two diametrically opposed perspectives: as stupid parrots or as revolutionary tools. I think both views reveal a misunderstanding of what this new technology really is.
In this article, we will examine both sides of the debate, and see which is right, which is wrong, and which is open-minded enough to consider both assertions correct…
LLMs, which are the foundation of generative AIs like ChatGPT, have been trained on huge corpora of text by learning to predict the end of a sentence given its beginning. The only thing they have been taught is to faithfully repeat human-written text (of varying quality, by the way…). AI practitioners will point out that the RLHF phase also matters, but I prefer to see it as a filtering fine-tuning step (one that removes bad answers more than it teaches new ones) rather than real training, so we can safely ignore it in this analysis. Given that, it is perfectly valid to call ChatGPT a stupid parrot, because that is what it is.
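To make the "parrot" objective concrete, here is a deliberately minimal sketch: a bigram model that counts which word follows which in its tiny training corpus, then "predicts" a continuation by replaying the most frequent follower. Real LLMs are of course vastly more sophisticated, but the training goal is the same in spirit.

```python
from collections import defaultdict, Counter

# Toy "parrot": for every word in the corpus, count which word follows it,
# then continue a prompt by replaying the most frequently seen follower.
corpus = "the cat sat on the mat and the cat slept on the sofa".split()

followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def predict_next(word):
    """Return the continuation seen most often during 'training'."""
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat": it follows "the" more often than "mat" or "sofa"
```

Everything such a model outputs is, by construction, a rearrangement of what it has already seen, which is exactly the "stupid parrot" claim.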
However, there is a counterargument: how do we explain that it can literally "explain the theory of relativity in the words of Shakespeare", for example? It has never seen such a text before, so what is it repeating? From my point of view, it is definitely repeating something.
It is just that this something does not exist yet. It is as if you asked "can you answer my previous question, but in French?": the LLM will output the most probable answer, and that answer is easy to guess when you speak perfect English and French and have the input sentences. The Shakespeare version is a bit harder, but fundamentally equivalent.
That is why, however evolved they are, LLMs are nothing more than stupid parrots. But parrots that can repeat a text … for the first time.
When I talk or think about machine learning, I like to evaluate the knowledge capacity of the model I am working on from an information-theory point of view. Let's forget for this article *what* the LLM knows, and focus on *how much* it knows.
It is usually at that moment that my physics teacher looks over my shoulder, sees something surprising, and gently asks me "have you checked the orders of magnitude of your numbers?". If my bullet goes faster than the speed of light and the topic at hand is not general relativity, it probably means I made a mistake in the calculation…
Let's obey her, then. First thing: the training corpora.
OpenAI has not disclosed what it trained its models on, so we will take the most commonly used datasets. Exact numbers are hard to find, but this gives an idea:
Wikipedia: 100 GB
Stackoverflow: 400 GB
Reddit: unknown, probably between 100 GB and 1 TB
For a total probably between 1 and 2 TB of data.
On the other hand, in 2023 good LLMs have around 100 billion parameters, so roughly a few hundred GB of information (I can hear AI engineers screaming, but my physics teacher is happy, since she asked for orders of magnitude, not exact numbers).
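The back-of-the-envelope comparison can be written down explicitly. All numbers here are the rough assumptions from the text (not official figures), and I take 2 bytes per parameter, as in fp16 weights:

```python
# Order-of-magnitude check: model capacity vs. training corpus size.
# All values are rough assumptions, not official OpenAI figures.
corpus_bytes = 1.5e12           # ~1-2 TB of raw training text
n_parameters = 100e9            # ~100 billion parameters
bytes_per_param = 2             # fp16 storage
model_bytes = n_parameters * bytes_per_param  # ~200 GB

print(f"corpus ≈ {corpus_bytes / 1e12:.1f} TB")
print(f"model  ≈ {model_bytes / 1e9:.0f} GB")
print(f"ratio  ≈ {corpus_bytes / model_bytes:.1f}x")
```

The raw corpus is about an order of magnitude bigger than the model, which is why the comparison needs the compression argument below it to work out.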
But the LLM has a great advantage here: once it has learned the language itself, it "only" needs to remember the ideas behind the facts. Whether this is possible depends on the architecture. A plain stack of dense layers fails at it; recurrent models can, to some extent, learn grammar and link facts within short sentences; Transformers and modern attention layers are able to combine long sentences with abstract knowledge (think of attention layers as the ability to link facts, and dense layers as the "memory" of facts), pushing knowledge capacity closer to the optimum for an equivalent number of weights.
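To give an intuition for "attention as the ability to link facts", here is a single scaled dot-product attention head in pure Python, on toy hand-picked vectors (a minimal sketch, far from a real Transformer layer): each output is a weighted mix of the values, with weights given by how similar the query is to each key.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Single-head scaled dot-product attention on plain lists of floats."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output: mix of the values, weighted by query/key similarity.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

# Toy example: the query matches the first key, so the output leans
# toward the first value — the query gets "linked" to the related fact.
keys   = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out = attention([1.0, 0.0], keys, values)
print(out)  # closer to [10, 0] than to [0, 10]
```

In a real model, queries, keys, and values are learned projections of token embeddings, but the linking mechanism is the same.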
We can see this ability as a form of compression of information: the redundancy of grammatical knowledge is "compressed", stored only once, and shared between all sentences. So if we want to compare scales, we should compare the model size to a compressed version of the input corpora: this way, the Wikipedia text becomes 20-30 GB, Stack Overflow around 100 GB, and so on. The magnitudes match and the teacher is happy: she got me this time; she just wanted me to validate my hypothesis…
So, in addition to their parroting ability, we have seen that LLMs have the theoretical capacity to read and learn the whole of Wikipedia and Stack Overflow, for example, and predicting the end of all those articles and questions is exactly what they are good at.
So what makes them revolutionary tools? For me, it is the joint ability to understand questions expressed naturally, look things up in their database of knowledge, and answer in an understandable way. Think of it as a search engine where you don't need to find the right keywords, only to give an idea of what you are looking for, and where you don't need to scroll through pages of results.
One last useful trick: use ChatGPT to find the words behind an acronym. For example, I asked for "an English word that conveys the idea of something huge, starts with an L, but is not the word Large". And it answered with exactly the word I was looking for: it understood my question globally, as a concept, searched its database of words and their meanings, and gave me the answer.
By using tricks like this, you can leverage the full power of LLMs in general, and ChatGPT in particular: use natural language to ask questions of the database of knowledge that they are. Furthermore… when I asked for a word for my acronym, something very interesting happened that let me make sure I was using ChatGPT properly…
We have not talked about hallucinations at all. And I know that when you write an article even slightly positive about LLMs, this is the first counterargument raised. Indeed, the main drawback of these models is that they are very good at stating facts they purely invented, because they cannot properly evaluate the "correctness" of the text they produce.
But I also mentioned that there is a way to use LLMs correctly. As a reminder, I asked for "an English word that conveys the idea of something huge, starts with an L, but is not the word Large". This example is very interesting because it queries the database in a natural way, and the answer is dramatically easy to validate: even if you couldn't find the word yourself (did you?), you know the meaning and can check the first letter: you leave no room for hallucinations.
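That validation step can even be written down. The formal constraints of my question are trivial to check programmatically, whatever model produced the answer (the function below is purely illustrative, not an actual API; note that it only checks the formal constraints, while the meaning "huge" is the part only a human can validate):

```python
# Illustrative sketch: checking a candidate answer against the formal
# constraints of "an English word meaning huge, starting with L, not Large".
def validate_answer(word, forbidden=("large",)):
    """Return True if the word passes the mechanically checkable constraints."""
    word = word.strip().lower()
    return word.startswith("l") and word not in forbidden

print(validate_answer("Large"))   # False: explicitly excluded by the question
print(validate_answer("Lofty"))   # True for the formal checks; whether it
                                  # really means "huge" is up to the human
```

The point is not the code itself, but the pattern: constrain the question so that the answer is cheap to verify.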
For me, this is the key to using LLMs efficiently: query them for things that you can validate. That is why I find them very useful as code assistants, among many other things. It is also why LLMs are not (yet?) ready to help you learn something new. They are, however, dramatically efficient at helping you go faster at something you can already do, or cannot do yourself but are able to validate.
I hope you enjoyed this article, and do not hesitate to reach out to us if you want more! In the next article, we will try to take you with us on a (probably short) journey into the "consciousness of AI"…