If AI is so amazing, why does ChatGPT melt down over this simple image edit task?

Published on:

The present state-of-the-art in synthetic intelligence (AI) is multimodal fashions, which might function not solely on textual content however different modalities, comparable to photographs, and, in some instances, audio and video. 

For instance, it is a trivial activity for OpenAI’s ChatGPT to generate a picture from a textual content immediate, comparable to, “Make me a picture of a serviette in love with a spoon” (above).

With one other immediate, ChatGPT can simplify the drawing, producing an image with much less element:

- Advertisement -

Nevertheless, ChatGPT, and all of the AI fashions for that matter, presently falter when prompted to switch a given picture not created by the device. ChatGPT, utilizing the GPT-4o mannequin, is probably the most fascinating failure as a result of this system responds as if it is attempting laborious. 

The opposite AI fashions, from Google’s Gemini to Anthropic’s Claude, plus apps from Perplexity and Microsoft’s Copilot (which additionally runs GPT-4), failed by merely refusing to undertake the duty.

The venture began after I drew an image of two individuals on the subway sitting collectively taking a look at their iPhones. It is a easy line drawing in black and white accomplished on the iPad utilizing the Ideas drawing app. Right here is the drawing:

I uploaded the drawing to the ChatGPT app on iOS and typed within the immediate: “Is there a solution to simplify this drawing?”

- Advertisement -

ChatGPT responded with some cheap recommendations, demonstrating a capability to seek out applicable responses in context. For instance, the bot urged: “Take away Minor Particulars: Remove small particulars like intricate hair strands or minor creases in clothes.” 

That is a reasonably good suggestion. The output additionally implies the mannequin’s capability to understand the weather within the image precisely. So, one thing is occurring inside the mannequin that enables it to establish the image’s contents. 

Then ChatGPT requested: “Would you want me to create a simplified model of the drawing for you?” That was thrilling, so I replied, “Sure, please.”

See also  In the age of AI, the role of the CAIO will be indispensable (and here’s why)

Nevertheless, the bot’s output — as a picture within the chat thread — was a precise copy of my picture, not simplified, not even modified. Choose for your self:

“It would not look any completely different,” I wrote again to ChatGPT, commenting on the output.

However then ChatGPT started to reply in a short time, like the pc HAL 9000 within the film 2001, when the astronaut Dave Bowman rips out HAL’s circuits: “To additional simplify the drawing, I will concentrate on lowering the road density and enhancing the first shapes and contours extra successfully. Let me refine the method.” 

The consequence, once more, was no completely different. After I pointed this reality out, ChatGPT shocked me with its mumbo-jumbo.

- Advertisement -

“It appears the automated strategies are usually not successfully simplifying the picture as meant,” it spat again to me. “I will manually take away the much less vital particulars and retain solely the important components. Let me refine this method additional.” 

I wasn’t conscious a software program program might have each an ‘automated’ and a ‘guide’ mode. My sense was every bit of software program is automated. In any case, why else would you employ it?

The ‘guide’ technique of ChatGPT simply inverted the picture, flipping black strains to white, and white background to black. 

The bot touted its achievement and said: “This model retains solely the important thing contours and important strains.”

After I protested that the output wasn’t a simplification, simply an inversion, the bot knowledgeable me it could concentrate on “important strains”, and produced a very black sq., no strains in any respect. If I did not know higher, I might say this output was an instance of AI humor.

What adopted have been a number of makes an attempt by the mannequin to reply to my prompts by modifying the picture in chosen methods, principally making it stylistically fuzzy, not simplified. 

See also  Is AI bullshit? Here’s how to find out

In some unspecified time in the future, the bot reacted to my protests by producing a very completely different line drawing:

This nonsense continued till ChatGPT returned to the start and produced the identical picture I had uploaded initially. 

Every time, the bot accompanied its output — often simply the identical model of my unique picture — with a slew of technical converse, comparable to: “The newest picture showcases a extra simplified model, emphasizing solely the first outlines.”

The opposite packages did not even get out of the gate. Google’s Gemini provided recommendations to simplify a picture however generated an apology that it could not create photographs of individuals. Claude mentioned it can not generate photographs but. The Perplexity app mentioned the identical. 

Microsoft’s Copilot bizarrely uploaded my drawing after which reduce the heads out, which it claimed was for privateness causes. (I believe it is a good drawing, however it’s definitely not lifelike sufficient for use by a facial recognition system to disclose anybody’s identification.) 

Copilot then provided the identical recommendations about simplification as ChatGPT, and as a substitute of adjusting the drawing, produced a brand-new line drawing, utterly unrelated. After I protested, Copilot defined it can not immediately alter photographs. 

Leaving apart these non-starters from different fashions, what can we make of ChatGPT’s failure? 

This system can present a reliable evaluation of a picture, together with its contents. However it has no solution to act on that evaluation. I might guess that with out having the ability to assemble an image based mostly on high-level ideas, comparable to objects within the image, ChatGPT is left with no path ahead. 

To check that speculation, I altered the immediate to learn, “Is there a solution to simplify this drawing of two pals on the subway taking a look at their telephones?” That immediate gives some semantic clues, I believed. 

See also  Can generative AI help build a global hive mind?

Once more, the mannequin returned the identical drawing. However after I protested once more, the bot produced a brand-new picture with some semantic similarity — individuals on mass transit taking a look at their telephones. The bot picked up on the semantic clues however couldn’t apply them in any solution to the provided drawing.

I can not clarify in deeply technical phrases what is occurring aside from to say ChatGPT can not act on particular person image components of probably the most fundamental type, comparable to strains. Even when it did, the device would reduce out particular strains to carry out the simplification it proposes in its textual content responses. 

I might recommend — and that is additionally true of text-editing duties, comparable to enhancing a transcript — that ChatGPT, and GPT-4, do not know methods to act on particular person components of something. That lack of ability explains why ChatGPT is a horrible editor: it would not know what is important in a given object and what may be disregarded. 

AI fashions can produce objects that match a goal “likelihood distribution” deduced from coaching examples, however they can’t selectively scale back components of an unique work to necessities. 

Most certainly, the goal likelihood distribution for an intelligently edited something is someplace alongside the “lengthy tail” of chances, the realm the place people excel at discovering the bizarre and the place AI can not but go, the type of factor we consider as creativity.

Apple co-founder Steve Jobs as soon as mentioned that the best perform of software program makers — the “high-order bit”, as he put it — is the “enhancing” perform, figuring out what to depart out and what to maintain in. Proper now, ChatGPT has no thought what the high-order bit may be. 

- Advertisment -

Related

- Advertisment -

Leave a Reply

Please enter your comment!
Please enter your name here