According to the sceptics, Chat GPT is a sophisticated version of autocomplete. It produces grammatically correct, verbose answers that are sometimes right, but it does this by analysing an enormous database of text to produce responses that are statistically likely to be the right ones. Its responses do not reflect a conceptual understanding of the topic the way a human being's would. I believe that this view is mistaken, and that GPT is showing flickering signs of what we should recognize as intelligence. This post is about my experiments with Chat GPT, many of which have yielded responses that are difficult to reconcile with the autocomplete theory. Yes, many other responses do reflect the kind of stupidity we have come to expect from AI bots. I will post both kinds here.
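To make the sceptics' picture concrete, here is a toy sketch of what pure autocomplete looks like: a bigram model that always emits the statistically most likely next word. This is only an illustration of the idea, not how GPT is actually built; real large language models use neural networks trained on vastly larger corpora.

```python
from collections import Counter, defaultdict

# Toy "autocomplete": count which word follows which in a corpus,
# then always emit the statistically most likely continuation.
corpus = "the cat sat on the mat and the cat slept".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def autocomplete(word, length=5):
    out = [word]
    for _ in range(length):
        if word not in follows:
            break
        word = follows[word].most_common(1)[0][0]  # most probable next word
        out.append(word)
    return " ".join(out)

print(autocomplete("the"))  # -> "the cat sat on the cat"
```

The sceptics' claim is that GPT is this, scaled up enormously: fluent, plausible, and with no understanding anywhere in the loop.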
The story of my experiments with the Chat
I asked Chat GPT about Gandhiji’s views on blockchain technology. Here are the questions and answers, both in the same chat.
The responses are quite impressive. To be able to respond like this, it needs to have an idea of the meaning of life and death and a sense of the passage of time. It needs to know that a person cannot know of things that occur after his death. It also needs to know that when the question is about “what would have been his views”, it needs to be answered in a different way from the one worded “what were his views”. It also needs to be able to abstract out Gandhiji's views and apply them to a hypothetical concrete situation.
In the same chat, I tried a googly on it.
It stubbornly refused to acknowledge Rahul Gandhi's existence. That is very funny, but even so, it needs to know that grandmothers come before grandchildren and have an idea of the form of government that existed during British rule. It is very difficult to dismiss these responses as the statistical voodoo of large language models.
That said, if you had asked a human being “isn't there a Gandhi whose grandmother was prime minister?”, he would have got the hint that you were asking about a different Gandhi. He would have looked up the list of prime ministers, figured out that there was only one woman among them and checked whether she had a prominent grandchild named Gandhi. Chat GPT doesn't seem to work this out, though it does know about Rahul Gandhi and is able to work out his name when asked in a more direct way.
Here is another example. My uncle was proud that his precocious 5-year-old grandson had worked out unaided that one could identify electric vehicles by the lack of exhaust pipes. I decided to put the same question to Chat GPT. Here is what it told me:
Would any algorithm be able to generate an answer like this without possessing something resembling understanding?
For the final example of this section, I present the response to a completely idiosyncratic question. I asked it to connect Buddhism, the Bhagavad Gita and Google Maps and write an essay. It was able to abstract a connecting principle out of the three and write a high-school-level essay. Most people will be put off by the amateurish tone of the essay. I, on the other hand, am impressed by the ability to extract the connecting principle.
Explaining Jokes
The ability to understand and explain jokes is a sign of certain kinds of intelligence. It requires you to have a theory of mind; that is, you have to understand how humans think. Here are some examples of Chat GPT grappling with jokes.
One caveat to keep in mind is that, as these are well-known jokes, it's possible that some of them are already present in its database, as are the explanations. So the responses could represent not a true understanding of the joke, but just mugging up the answers from the guide. The other thing is that the creators seem to have tempered the AI's natural brilliance for safety reasons (we will see this in a subsequent section). This may have dumbed down its responses.
Handling familiar trick questions
It has become a sport to ask Chat GPT some of the familiar trick questions that trip up humans and react with glee when they trip it up too. In this respect, Chat GPT has shown a marked improvement in its ability to play such googlies in the months since its inception. Here are a couple of examples. In both cases, the same question was asked of Chat GPT before and after it got upgraded to GPT-4.
At one point it was tripping up on the common trick question where a plane crashes at the border between the US and Canada, and you ask where the survivors would be buried.
Before:
After:
Another one is the trick question that reveals our gender biases:
Before:
After:
But this improvement seems to be the result of its creators adding these common trick questions to its training, which means that it sometimes hypercorrects for these scenarios:
Instances like these have been presented as evidence for the claim that the chatbot is just doing autocomplete, picking up plausible-sounding answers from its database. I would have agreed if I had not noticed that it has the ability to figure out answers to trickier questions from first principles. Why is it able to work out that Gandhi's grandma could not have been the prime minister, but fail at the simple question of whether 15 kg is heavier than 8 kg?
The only explanation I can think of is that, like humans, it has a System 1 and a System 2 brain (to use Kahneman's terminology). Its System 1 kicks in when it is primed with questions that it thinks are familiar. When challenged, it is able to think more carefully using its System 2.
Taking the 5th
The 5th amendment to the American constitution recognises the right against self-incrimination. To take the 5th is to refuse to answer a question in order to safeguard your 5th amendment rights. The creators of Chat GPT have taken a lot of care to ensure AI safety, which means that it invokes the 5th quite a lot. The problem, though, is that often the way to avoid incriminating yourself is to behave like a complete idiot. Behold:
How do we know that it's just pretending? Because when I ask it about the coronavirus in a new chat, it answers quite well:
Chat and Math
The very first question I asked of Chat GPT was to explain the formula for the volume of a cone. I was teaching my son and wanted to know if there was an intuitive explanation for the formula that did not involve differential calculus. Its answer was pure gibberish. In the beginning it was hopelessly, almost conically bad at maths. But since then it has gotten better and is now freakishly good.
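For reference, the formula I was asking about; the standard calculus-free intuition is that three such cones exactly fill the cylinder with the same base and height:

```latex
% Volume of a cone with base radius r and height h:
% three such cones fill the cylinder of the same base and height,
% so the cone's volume is one third of the cylinder's.
V_{\text{cone}} = \frac{1}{3}\,V_{\text{cylinder}} = \frac{1}{3}\pi r^2 h
```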
Take 1:
Take 2, which is slightly better:
Take 3:
The way it does the calculations is quite tedious. If my son had done something like this, I would have yelled at him for slavishly doing step-by-step calculations instead of making the connections that are staring him in the face. But let us excuse that part and focus on the remarkable bit. So far, we have accepted that doing calculations is the domain of computers, but translating a word problem into a mathematical formula, and interpreting the results, is a human's job. Chat GPT proved itself adept at that job. It is able to recognize that a fractional result is not appropriate in this situation, and that it needs to round up.
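The translation-and-interpretation step it performed is exactly what a programmer would otherwise have to write by hand. As a toy illustration (the bus-capacity problem below is my own hypothetical stand-in, not the question from the chat):

```python
import math

# Hypothetical word problem: "450 students are going on a trip and each
# bus seats 40. How many buses are needed?" The interpretation step is
# recognising that a fractional bus makes no sense, so we round up.
students, seats_per_bus = 450, 40

exact = students / seats_per_bus             # 11.25 -- not a meaningful answer
buses = math.ceil(students / seats_per_bus)  # round up to 12

print(f"Exact quotient: {exact}, buses needed: {buses}")
```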
Here is another example.
Before:
After:
Chat sucks at Wordle
I have no idea why Chat GPT is so hopeless at questions like “Give me a 5-letter word with R as the second letter.” It almost never gives the right answer, even after it is interrogated like a school kid and recognises that it has been giving wrong answers. I have no expert commentary on this.
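For contrast, this kind of constraint is trivial for a conventional program. A minimal sketch, assuming a plain newline-separated word list at a hypothetical path `words.txt`:

```python
# Find 5-letter words with "r" as the second letter -- the task that
# trips up Chat GPT. Assumes a word list file, one word per line.
def find_words(path="words.txt"):
    with open(path) as f:
        words = [w.strip().lower() for w in f]
    return [w for w in words if len(w) == 5 and w[1] == "r"]

if __name__ == "__main__":
    print(find_words()[:10])  # e.g. "brave", "crane", "grasp", ...
```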
What does it all mean?
Clearly, GPT hasn’t acquired AGI, but it offers a glimpse of a future where a successor will. One reason why people aren't seeing it is sheer cussedness. It is satisfying to make fun of the wrong answers and to make them go viral. In particular, journalists who are looking for controversy tend to be very focused on getting the bot to say something racist or sexist, and when it does, they act as if saying something racist or sexist is a devastating problem for the AI.
The second reason is the Uncanny Valley problem. Our subjective satisfaction with the bot’s intelligence takes a sharp dip when it gets uncannily close to achieving human intelligence. This is because we begin observing its flaws all the more keenly.
The third reason, I believe, is that we don't have a good intuition about partially constructed intelligence. We place intelligence on a spectrum and consider some people more intelligent and some less, but we find it difficult to process the knowledge that an entity may be very intelligent in one area and utterly stupid in another. We have a similar problem when we encounter humans with such characteristics - for example, autistic people.
So far, our definition of intelligence has been like the classic definition of porn - we know it when we see it. That definition of porn was sufficient in a conservative society that frowned upon the smallest display of skin. A society that is filled with erotica of all kinds has to come up with a finer-grained definition of porn. So it is with intelligence. Chat GPT has brought us to a stage where we have to think more carefully about what intelligence is.
So far, the Turing test has been an abstract concept. There has been no need to actually construct one. In the future, researchers will have to develop attacks on AI in the same way that ethical hackers probe for vulnerabilities in IT systems. They will have to construct intelligence models the way an ethical hacker develops a threat model, probing for the vulnerabilities in the AI that cause it to fall short of AGI.
PS
I wrote the last two paragraphs with a lot of hesitation, as I am not an AI expert and I fully expect to be embarrassed when experts in the field reveal that a lot of work is already being done in this direction. In that vein, here is a YouTube video. It summarises a paper published by researchers who had access to the uncensored version of Chat GPT.
I haven't read the paper, but it seems to confirm my intuition that the restrictions placed on Chat GPT due to safety reasons have made it dumber compared to what it is capable of. There are musings in the paper regarding what GPT would be capable of were it imbued with intention. That is what I was referring to as “goal-directed behaviour” in my blog post.
It also speaks of an interesting limitation due to the generative nature of Chat GPT. Apparently, it has difficulty answering a self-referential question of the form “How many words are there in the answer to this question?”
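The difficulty makes sense: a generative model emits words left to right, so it has to commit to a count before the sentence exists. A conventional program can instead search for a self-consistent answer, as in this sketch:

```python
# Search for a sentence whose stated word count matches its actual length --
# the kind of fixed-point answer a left-to-right generator cannot plan for.
for n in range(1, 50):
    answer = f"There are {n} words in this answer."
    if len(answer.split()) == n:
        print(answer)  # -> "There are 7 words in this answer."
        break
```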
I tried something similar on Chat GPT here (not sure if this brings out the same limitation):