Why is AI so bad at spelling? Because image generators aren’t actually reading text
April 6, 2024

AIs are easily acing the SAT, defeating chess grandmasters and debugging code like it’s nothing. But put an AI up against some middle schoolers at the spelling bee, and it’ll get knocked out faster than you can say diffusion.

For all the advancements we’ve seen in AI, it still can’t spell. If you ask text-to-image generators like DALL-E to create a menu for a Mexican restaurant, you might spot some appetizing items like ‘taao,’ ‘burto,’ and ‘enchida’ amid a sea of other gibberish.

‘Image generators tend to perform much better on artifacts like cars and people’s faces, and less so on smaller things like fingers and handwriting,’ said Asmelash Teka Hadgu, co-founder of Lesan and a fellow at the DAIR Institute.

The underlying technologies behind image and text generators are different, yet both kinds of models have similar struggles with details like spelling. Image generators generally use diffusion models, which reconstruct an image from noise. When it comes to text generators, large language models (LLMs) might seem like they’re reading and responding to your prompts like a human brain – but they’re actually using complex math to match the prompt’s pattern with one in their latent space, letting them continue the pattern with an answer.

The algorithms are incentivized to recreate something that looks like what they’ve seen in their training data, but they don’t natively know the rules that we take for granted – that ‘hello’ is not spelled ‘heeelllooo,’ and that human hands usually have five fingers.

‘Even just last year, all these models were really bad at fingers, and that’s exactly the same problem as text,’ said Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta. ‘They’re getting really good at it locally, so if you look at a hand with six or seven fingers on it, you could say, ‘Oh wow, that looks like a finger.’ Similarly, with the generated text, you could say, that looks like an ‘H,’ and that looks like a ‘P,’ but they’re really bad at structuring these whole things together.’

Engineers can ameliorate these issues by augmenting their data sets with training models specifically designed to teach the AI what hands should look like. But experts don’t foresee these spelling issues resolving as quickly.

Some models, like Adobe Firefly, are taught to just not generate text at all. If you input something simple like ‘menu at a restaurant,’ or ‘billboard with an advertisement,’ you’ll get an image of a blank paper on a dinner table, or a white billboard on the highway. But if you put enough detail in your prompt, these guardrails are easy to bypass.

‘You can think about it almost like they’re playing Whac-A-Mole, like, ‘Okay a lot of people are complaining about our hands — we’ll add a new thing just addressing hands to the next model,’ and so on and so forth,’ Guzdial said. ‘But text is a lot harder. Because of this, even ChatGPT can’t really spell.’

On Reddit, YouTube and X, a few people have uploaded videos showing how ChatGPT fails at spelling in ASCII art, an early internet art form that uses text characters to create images. In one recent video, which was called a ‘prompt engineering hero’s journey,’ someone painstakingly tries to guide ChatGPT through creating ASCII art that says ‘Honda.’ They succeed in the end, but not without Odyssean trials and tribulations.

‘One hypothesis I have there is that they didn’t have a lot of ASCII art in their training,’ said Hadgu. ‘That’s the simplest explanation.’

But at the core, LLMs just don’t understand what letters are, even if they can write sonnets in seconds.

‘LLMs are based on this transformer architecture, which notably is not actually reading text. What happens when you input a prompt is that it’s translated into an encoding,’ Guzdial said. ‘When it sees the word ‘the,’ it has this one encoding of what ‘the’ means, but it does not know about ‘T,’ ‘H,’ ‘E.’’
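To make the idea of an encoding concrete, here is a minimal sketch in Python. It is a toy, not how any production model works: the vocabulary `TOY_VOCAB` and the `encode` function are hypothetical names made up for illustration, and real LLM tokenizers use learned subword vocabularies with tens of thousands of entries. The point it shows is only that the model receives integer IDs, and that nothing in an ID reveals which letters made up the word.

```python
# Toy illustration only: real LLM tokenizers use learned subword
# vocabularies, not a hand-written word list like this one.
TOY_VOCAB = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}  # hypothetical vocabulary


def encode(text: str) -> list[int]:
    """Map each whitespace-separated word to its integer ID."""
    unknown_id = TOY_VOCAB["<unk>"]
    return [TOY_VOCAB.get(word, unknown_id) for word in text.lower().split()]


ids = encode("The cat sat")
print(ids)  # [0, 1, 2]

# The model downstream only ever works with these numbers. The ID 0 stands
# for "the" as a whole; it carries no record of the letters T, H and E.
```

In this toy, a word outside the vocabulary collapses to the single `<unk>` ID, so even its length is lost before any model would see it.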

That’s why when you ask ChatGPT to produce a list of eight-letter words without an ‘O’ or an ‘S,’ it’s incorrect about half of the time. It doesn’t actually know what an ‘O’ or ‘S’ is (although it could probably quote you the Wikipedia history of the letter).
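By contrast, a program that works directly on characters finds this check trivial. The sketch below is an illustration, not something from the researchers quoted here: the function name `satisfies_constraint` and the sample word list are assumptions chosen for the example.

```python
def satisfies_constraint(word: str, length: int = 8, banned: str = "os") -> bool:
    """Return True if the word has exactly `length` letters and none are banned."""
    letters = word.lower()
    return (
        len(letters) == length
        and letters.isalpha()
        and not any(ch in banned for ch in letters)
    )


# Hypothetical sample words, chosen only to exercise the check.
candidates = ["airplane", "computer", "mountain", "birthday", "elephant"]
valid = [word for word in candidates if satisfies_constraint(word)]
print(valid)  # ['airplane', 'birthday', 'elephant']
```

The check is easy here because the code sees each character individually, which is exactly the view of a word that an LLM’s encoding does not provide.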

Though these DALL-E images of bad restaurant menus are funny, the AI’s shortcomings are useful when it comes to identifying misinformation. When we’re trying to see if a dubious image is real or AI-generated, we can learn a lot by looking at street signs, t-shirts with text, book pages, or anything where a string of random letters might betray an image’s synthetic origins. And before these models got better at making hands, a sixth (or seventh, or eighth) finger could also be a giveaway.

But, Guzdial says, if we look closely enough, it’s not just fingers and spelling that AI gets wrong.

‘These models are making these small, local issues all of the time – it’s just that we’re particularly well tuned to recognize some of them,’ he said.

To an average person, for example, an AI-generated image of a music store could be easily believable. But someone who knows a bit about music might see the same image and notice that some of the guitars have seven strings, or that the black and white keys on a piano are spaced out incorrectly.

Though these AI models are improving at an alarming rate, these tools are still bound to encounter issues like this, which limits the capacity of the technology.

‘This is concrete progress, there’s no doubt about it,’ Hadgu said. ‘But the kind of hype that this technology is getting is just insane.’

Reference: https://techcrunch.com/2024/03/21/why-is-ai-so-bad-at-spelling/
