Researchers question AI’s ‘reasoning’ ability as models stumble on math problems with trivial changes

October 12, 2024



How do machine learning models do what they do? And are they really ‘thinking’ or ‘reasoning’ the way we understand those things? This is a philosophical question as much as a practical one, but a new paper making the rounds Friday suggests that the answer is, at least for now, a pretty clear ‘no.’

A group of AI research scientists at Apple released their paper, ‘Understanding the limitations of mathematical reasoning in large language models,’ to general commentary Thursday. While the deeper concepts of symbolic learning and pattern reproduction are a bit in the weeds, the basic concept of their research is very easy to grasp.

Let’s say I asked you to solve a simple math problem like this one:

Obviously, the answer is 44 + 58 + (44 * 2) = 190. Though large language models are actually spotty on arithmetic, they can pretty reliably solve something like this. But what if I threw in a little random extra info, like this:

It’s the same math problem, right? And of course even a grade-schooler would know that even a small kiwi is still a kiwi. But as it turns out, this extra data point confuses even state-of-the-art LLMs. Here’s GPT-o1-mini’s take:

This is just one simple example out of hundreds of questions that the researchers lightly modified, nearly all of which led to enormous drops in success rates for the models attempting them.

Image Credits: Mirzadeh et al.
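To make the arithmetic concrete, here is a minimal Python sketch of the kiwi example. It is only an illustration, not the researchers' code: the function name, the parameter names and the count of five small kiwis are all assumptions made up for this sketch. The only figures taken from the example above are 44, 58 and the doubling. The point is that the irrelevant detail is accepted as input and then ignored, so the total does not move.

```python
# Illustrative sketch only. The function and parameter names are hypothetical,
# and the number of "small" kiwis (5) is an assumed value, not from the article.

def total_kiwis(first_day: int, second_day: int, small_kiwis: int = 0) -> int:
    """Return the total picked over three days.

    The third day's haul is double the first day's. `small_kiwis` is a
    distractor: a small kiwi is still a kiwi, so it is deliberately unused.
    """
    third_day = 2 * first_day
    return first_day + second_day + third_day


# Baseline problem: 44 + 58 + (44 * 2)
baseline = total_kiwis(44, 58)

# Same problem with an irrelevant detail thrown in (assumed count of 5).
with_distractor = total_kiwis(44, 58, small_kiwis=5)

print(baseline, with_distractor)  # both are 190
```

A model that actually reasons about the problem should behave like this function and return the same answer in both cases.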

Now, why should this be? Why would a model that understands the problem be thrown off so easily by a random, irrelevant detail? The researchers propose that this reliable mode of failure means the models don’t really understand the problem at all. Their training data does allow them to respond with the correct answer in some situations, but as soon as the slightest actual ‘reasoning’ is required, such as whether to count small kiwis, they start producing weird, unintuitive results.

As the researchers put it in their paper:

This observation is consistent with the other qualities often attributed to LLMs due to their facility with language. When, statistically, the phrase ‘I love you’ is followed by ‘I love you, too,’ the LLM can easily repeat that — but it doesn’t mean it loves you. And although it can follow complex chains of reasoning it has been exposed to before, the fact that this chain can be broken by even superficial deviations suggests that it doesn’t actually reason so much as replicate patterns it has observed in its training data.

Mehrdad Farajtabar, one of the co-authors, breaks down the paper very nicely in this thread on X.

An OpenAI researcher, while commending Mirzadeh et al’s work, objected to their conclusions, saying that correct results could likely be achieved in all these failure cases with a bit of prompt engineering. Farajtabar (responding with the typical yet admirable friendliness researchers tend to employ) noted that while better prompting may work for simple deviations, the model may require exponentially more contextual data in order to counter complex distractions — ones that, again, a child could trivially point out.

Does this mean that LLMs don’t reason? Maybe. That they can’t reason? No one knows. These are not well-defined concepts, and the questions tend to appear at the bleeding edge of AI research, where the state of the art changes on a daily basis. Perhaps LLMs ‘reason,’ but in a way we don’t yet recognize or know how to control.

It makes for a fascinating frontier in research, but it’s also a cautionary tale when it comes to how AI is being sold. Can it really do the things its sellers claim, and if it does, how? As AI becomes an everyday software tool, this kind of question is no longer academic.

Ref: TechCrunch
