10 Incredibly Creative Things You Can Do With OpenAI’s New GPT-4o

The moment AI stopped being the talk of the town was the moment we truly entered the AI era. It has become so naturalized in our society that it's now integrated into our education, work, and everyday life.

However, one thing still limiting our access to AI is the lack of natural human-computer interaction. Only a handful of LLMs offer multimodal support, and even fewer do it for free or accurately. OpenAI may have just solved that problem.

In this article, I'll briefly discuss what GPT-4o is and some of my favorite use cases for the model so far.


Disclaimer: All video links provided below are courtesy of OpenAI.

What Is GPT-4o?

GPT-4o ("o" stands for omni) is OpenAI's latest LLM. It's designed to enable more natural human-computer interactions by expanding the model's multimodal capabilities and sharpening its grasp of nuance. It has an average audio response time of 320 milliseconds, which is close to human conversational response time.

Here are a few nifty ways to use it:


Real-Time Translation

Ever find yourself lost in a foreign country without any way to communicate? OpenAI has you covered.

One of GPT-4o's most important features is its multilingual support. Combined with multimodal inputs, ChatGPT can translate from one language to another faster than, and almost as accurately as, a human translator. With a turnaround time of about 232 milliseconds for audio, ChatGPT with GPT-4o can be your best friend whenever you're traveling or speaking with someone who isn't fluent in your language.

Meeting AI Assistant

Meetings can be draining. You never know when you're dozing off or when your attention is drifting elsewhere.

With GPT-4o, you can always stay on top of things by using it as an AI meeting assistant. It can act as a guide whenever someone asks you a question, take minutes of the meeting for you to revisit later, or clear things up when the discussion gets confusing.


Create Music

This is one of the craziest things I've seen from an AI. We've all become accustomed to AI taking inputs in different forms, but I've never seen an AI so truly multimodal that it can create beats, alter tone, and actually harmonize to create music. What makes it better is that you can give it more context as you go along to nail the sound you're looking for.


Complete Math Assignments

Okay, I know an AI that can do assignments isn't out of the ordinary these days, but wait until you see what GPT-4o can do.

This new model can answer mathematics questions in real time. Using its new desktop app, GPT-4o can take questions in the form of text, images, or video, and act like a tutor by giving you the information you're looking for while explaining how it arrived at that answer. You can even narrow down your questions in real time by feeding it extra context.


From what I've seen, GPT-4o also has greatly improved mathematical reasoning compared to earlier GPT models.

Perfect Text Generation

Gone are the days of imperfect text generation from DALL-E 3. With GPT-4o, ChatGPT can now render paragraphs of text in an image with few to no errors. You can even tweak how you want the text to look. Let me tell you, as someone who uses AI image generators daily, this is insane.

Character Reference

Midjourney may have been the first AI image generator to launch a character reference feature, but I think OpenAI has them beat on accuracy. Using GPT-4o, DALL-E 3 can now create consistent characters based on an input or a previous generation.

Summarize Information From A Video

Similar to the meeting AI assistant, GPT-4o can also transcribe and summarize an entire video given as input. To do this, you must upload the video directly rather than provide a link. For example, here's a snippet from OpenAI's showcase of the model's input and output:


Certainly! The presentation focused on techniques for maximizing the performance of large language models (LLMs) and was delivered by Colin Jarvis and John Allard from OpenAI. The session was part of OpenAI's first developer conference and aimed to provide insights into optimizing LLMs for specific tasks.

**Colin Jarvis** is the head of OpenAI's solutions practice in Europe, and **John Allard** is an engineering lead on the fine-tuning product team.

The presentation highlighted the challenges of optimizing LLMs, emphasizing that there is no one-size-fits-all solution.

Challenges of Optimizing LLMs

**Separating Signal from Noise**: It is challenging to identify the exact problem.

**Abstract Performance Metrics**: Measuring performance can be difficult.

**Choosing the Right Optimization**: It is hard to know which approach to use.

Optimization Flow

The presenters introduced a framework for optimizing LLMs based on two axes:

**Context Optimization**: What the model needs to know.

**LLM Optimization**: How the model needs to act.

The framework consists of four quadrants:

**Prompt Engineering**: The starting point for optimization.

**Retrieval-Augmented Generation (RAG)**: For context optimization.

**Fine-Tuning**: For LLM optimization.

**All of the Above**: Combining all techniques.

Prompt Engineering

Write clear instructions.

Split complex tasks into simpler subtasks.

Give the model time to think.

Test changes systematically.

Good for:

Testing and learning early.

Setting a baseline.

Not good for:

Introducing new information.

Replicating complex styles.

Minimizing token usage.
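To make the tactics above concrete, here is a minimal sketch of what "clear instructions" and "split complex tasks into simpler subtasks" might look like as a reusable prompt template. The function and the example subtasks are my own illustration, not code from the presentation.

```python
# A toy prompt builder applying two tactics from the talk: state the
# goal clearly, then walk the model through explicit subtasks, and end
# with a "think first" nudge (giving the model time to think).

def build_prompt(task: str, subtasks: list[str]) -> str:
    """Compose a prompt from a goal and an ordered list of subtasks."""
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(subtasks, start=1))
    return (
        f"Goal: {task}\n"
        "Work through the following steps in order:\n"
        f"{steps}\n"
        "Think step by step before giving your final answer."
    )

prompt = build_prompt(
    "Summarize a customer complaint email",
    [
        "Identify the product involved",
        "Extract the specific issue",
        "Classify the sentiment",
        "Write a two-sentence summary",
    ],
)
print(prompt)
```

Keeping the template systematic like this also makes the "test changes systematically" tactic easier: you can vary one instruction at a time and compare outputs against a baseline.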

Retrieval-Augmented Generation (RAG)

RAG involves retrieving relevant documents and using them to generate responses.

Good for:

Introducing new information.

Reducing hallucinations.

Not good for:

Embedding broad domain knowledge.

Teaching new formats or styles.

Minimizing token usage.

Success Story:

The presenters shared a success story where they improved accuracy from 45% to 98% using RAG.
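The core RAG loop described above (retrieve relevant documents, then use them as context for generation) can be sketched in a few lines. Real systems use embedding models and a vector store; this self-contained toy version scores word overlap instead, and all the document strings are illustrative.

```python
# Minimal RAG sketch: pick the most relevant document by cosine
# similarity over word counts, then prepend it to the prompt as
# grounding context.
from collections import Counter
import math
import re

DOCS = [
    "GPT-4o accepts text, audio, and image inputs.",
    "Fine-tuning continues training on a domain-specific dataset.",
    "RAG retrieves relevant documents to ground model responses.",
]

def tokens(text: str) -> Counter:
    """Lowercase word counts, keeping hyphenated names like gpt-4o."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))

def score(query: str, doc: str) -> float:
    """Cosine similarity between bag-of-words vectors."""
    q, d = tokens(query), tokens(doc)
    dot = sum(q[w] * d[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * \
           math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str) -> str:
    """Return the highest-scoring document for the query."""
    return max(DOCS, key=lambda doc: score(query, doc))

def build_rag_prompt(query: str) -> str:
    """Ground the question in the retrieved context."""
    return f"Context: {retrieve(query)}\nQuestion: {query}"

print(build_rag_prompt("What inputs does GPT-4o accept?"))
```

Because the model answers from retrieved text rather than memory alone, this is why RAG helps with introducing new information and reducing hallucinations, at the cost of extra context tokens per request.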



Fine-Tuning

Fine-tuning involves continuing the training process on a smaller, domain-specific dataset.

Improves performance on specific tasks.

Improves efficiency.

Good for:

Emphasizing existing knowledge.

Customizing structure or tone.

Teaching complex instructions.

Not good for:

Adding new knowledge.

Quick iteration.

Success Story:

The presenters shared a success story from Canva, where fine-tuning significantly improved performance.
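Since fine-tuning means continuing training on a small domain-specific dataset, the practical starting point is preparing that dataset. OpenAI's fine-tuning API accepts chat-format examples as JSONL, one JSON object per line; the sketch below builds and sanity-checks a single made-up example in that shape.

```python
# Build a tiny chat-format JSONL training set for fine-tuning.
# The conversation content is invented purely for illustration.
import json

examples = [
    {"messages": [
        {"role": "system", "content": "You answer in a formal tone."},
        {"role": "user", "content": "Summarize our refund policy."},
        {"role": "assistant",
         "content": "Refunds are issued within 30 days of purchase."},
    ]},
]

# One JSON object per line, as the fine-tuning API expects.
jsonl = "\n".join(json.dumps(e) for e in examples)

# Sanity check: every line must parse back to a dict with a
# "messages" list before you upload the file.
for line in jsonl.splitlines():
    record = json.loads(line)
    assert isinstance(record["messages"], list)

print(f"{len(jsonl.splitlines())} training example(s) ready")
```

This also illustrates the "not good for quick iteration" point: every change to tone or structure means editing examples like these and training again, rather than just rewording a prompt.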

Best Practices

**Start with Prompt Engineering and Few-Shot Learning**.

**Establish a Baseline**.

**Start Small and Focus on Quality**.

Combining Fine-Tuning and RAG

The presenters highlighted the benefits of combining fine-tuning and RAG for optimal performance.

Application of Theory

The presenters applied the theory to a practical challenge, the Spider 1.0 benchmark, achieving high accuracy using both RAG and fine-tuning.

The presentation concluded with a summary of the optimization flow and emphasized the importance of iteratively improving LLM performance using the discussed techniques.

The presenters invited questions from the audience and were available for further discussion.

As someone who watched the video in its entirety, I can confirm that GPT-4o didn't miss any key information. This is a huge evolution compared to its previous iteration.


Transcribe Illegible Text

Have you ever unearthed an old piece of paper with text you can barely, if at all, read? Let OpenAI work its magic.

GPT-4o combines multimodal support with enhanced natural language processing to turn illegible handwriting into readable text using contextual understanding. Here's an example from Generative History on Twitter:

Create A Facebook Messenger Clone

I was browsing Twitter last night and found what might be the biggest case for GPT-4o's improved capabilities. Sawyer Hood on Twitter wanted to test the new model by asking it to create a Facebook Messenger clone.

The result? It worked. Not only that, but GPT-4o did all of this in under six seconds. Sure, it's only a single HTML file, but imagine the implications for front-end development in general.

Understand Intonation

And now we're down to what I consider GPT-4o's biggest accomplishment, though some might not agree. In the past, LLMs have always taken what we feed into them at face value. They rarely consider our tone or phrasing when processing our inputs.

That's why I've always considered models that can do sarcasm to be science fiction. Well, OpenAI just proved me wrong.

All Said And Done

There's a lot of talk about Gemini, Claude, and other LLMs potentially passing OpenAI in terms of nuance and features. Well, this is OpenAI's answer to them.

GPT-4o is the first model I've seen that feels truly multimodal. Not only that, but it has also solved some of the issues that plagued GPT-4 in the past, such as being lazy and lacking in nuance.

OpenAI is a company that has been far too familiar with controversy, but I have a gut feeling people are going to forget all that soon with GPT-4o. I can't wait to see where OpenAI takes LLMs from here. At this rate, GPT-5 might break the world.

Want to learn more about the recent OpenAI drama? You can read our article on Sam Altman here, or check out our other articles like this one.
