Google I/O 2024 - Here are the AI highlights Google revealed

Google’s I/O 2024 event kicked off on Tuesday with a number of new AI product developments announced.

OpenAI may have tried to upstage Google with the release of GPT-4o on Monday, but the Google I/O 2024 keynote was full of exciting announcements.

Here’s a look at the standout AI developments, new tools, and prototypes Google is experimenting with.

Ask Photos

Google Photos, Google’s photo storage and sharing service, will be searchable using natural language queries with Ask Photos. Users can already search for specific items or people in their photos, but Ask Photos takes this to the next level.

Google CEO Sundar Pichai showed how you could use Ask Photos to remind you what your car’s license plate number was or provide feedback on how a child’s swimming abilities had progressed.

Powered by Gemini, Ask Photos understands context across photos and can extract text, create highlight compilations, or answer queries about stored images.

With more than 6 billion photos uploaded to Google Photos each day, Ask Photos will need a huge context window to be useful.

What if your photos could answer your questions? 🤔 At #GoogleIO today, we announced Ask Photos, a new Google Photos feature that does just that. Ask Photos is the new way to search your photos with the help of Gemini. #AskPhotos https://t.co/KhPeCauFAf pic.twitter.com/3MZg55SgdD
— Google Photos (@googlephotos) May 14, 2024

Gemini 1.5 Pro

Pichai announced that Gemini 1.5 Pro with a 1M token context window will be available to Gemini Advanced users. This equates to around 1,500 pages of text, hours of audio, and a full hour of video.

Developers can join a waitlist to try Gemini 1.5 Pro with an impressive 2M context window, which will soon be generally available. Pichai says this is the next step in Google’s journey toward the ultimate goal of infinite context.

Gemini 1.5 Pro has also had a performance boost in translation, reasoning, and coding, and will be truly multimodal, with the ability to analyze uploaded video and audio.

“It nailed it.”
“This changes everything.”
“It’s a mindblowing experience.”
“I felt like I had a superpower.”
“This is going to be amazing.”
Hear from developers who’ve been trying out Gemini 1.5 Pro with a 1 million token context window. #GoogleIO pic.twitter.com/odOfI4lvOL
— Google (@Google) May 14, 2024

Google Workspace

The expanded context and multimodal capabilities enable Gemini to be extremely useful when integrated with Google Workspace.

Users can use natural language queries to ask Gemini questions related to their emails. The demo gave an example of a parent asking for a summary of recent emails from their child’s school.

Gemini will also be able to extract highlights from and answer questions about Google Meet meetings of up to an hour.

NotebookLM – Audio Overview

Google launched NotebookLM last year. It allows users to upload their own notes and documents, which NotebookLM then becomes an expert on.

This is extremely useful as a research guide or tutor, and Google demonstrated an experimental upgrade called Audio Overview.

Audio Overview uses the input source documents and generates an audio discussion based on the content. Users can join the conversation and use speech to query NotebookLM and steer the discussion.

NotebookLM! Love this project so much, the AI powered Arcades Project. With the multimodality of Gemini Pro 1.5, it can automatically create audio discussions of the source material you’ve added to your sources. pic.twitter.com/IhhSfj8AqR
— Dieter Bohn (@backlon) May 14, 2024

There’s no word on when Audio Overview will be rolled out, but it could be a huge help for anyone wanting a tutor or sounding board to work through a problem.

Google also announced LearnLM, a new family of models based on Gemini and fine-tuned for learning and education. LearnLM will power NotebookLM, YouTube, Search, and other educational tools to be more interactive.

The demo was very impressive, but it already seems that some of the mistakes Google made with its original Gemini launch videos crept into this event.

The notebooklm demo is not real-time. I wish they had set that expectation without burying it in a footnote in the tiniest possible font. pic.twitter.com/tGN5i3fsVD
— Delip Rao e/σ (@deliprao) May 14, 2024

AI agents and Project Astra

Pichai says that AI agents powered by Gemini will soon be able to handle our mundane day-to-day tasks. Google is prototyping agents that will be able to work across platforms and browsers.

The example Pichai gave was of a user instructing Gemini to return a pair of shoes and then having the agent work through multiple emails to find the relevant details, log the return with the online store, and book the collection with a courier.

Demis Hassabis introduced Project Astra, Google’s prototype conversational AI assistant. The demo of its multimodal capabilities gave a glimpse of a future where an AI answers questions in real time based on live video and remembers details from earlier video.

Hassabis said some of these features would roll out later this year.

For a long time, we’ve been working towards a universal AI agent that can be truly helpful in everyday life. Today at #GoogleIO we showed off our latest progress towards this: Project Astra. Here’s a video of our prototype, captured in real time. pic.twitter.com/TSGDJZVslg
— Demis Hassabis (@demishassabis) May 14, 2024

Generative AI

Google gave us a peek at the image, music, and video generative AI tools it’s been working on.

Google introduced Imagen 3, its most advanced image generator. It reportedly responds more accurately to details in nuanced prompts and delivers more photorealistic images.

Hassabis said Imagen 3 is Google’s “best model yet for rendering text, which has been a challenge for image generation models.”

Today we’re introducing Imagen 3, @GoogleDeepMind’s most capable image generation model yet. It understands prompts the way people write, creates more photorealistic images and is our best model for rendering text. #GoogleIO pic.twitter.com/6bjidsz6pJ
— Google (@Google) May 14, 2024

Music AI Sandbox is an AI music generator that’s designed as a professional collaborative music creation tool, rather than a full track generator.

Together with @YouTube, we’ve been building Music AI Sandbox, a suite of AI tools to transform how music can be created. 🎵
To help us design and test them, we’ve been working closely with musicians, songwriters and producers. ↓ #GoogleIO pic.twitter.com/pMLa3aCveu
— Google DeepMind (@GoogleDeepMind) May 14, 2024

Veo is Google’s video generator that turns text, image, or video prompts into minute-long clips at 1080p. It also allows video edits to be made with text prompts. Will Veo be as good as Sora?

Google will roll out its SynthID digital watermarking to text, audio, images, and video.

Trillium

All these new multimodal capabilities need a lot of processing power to train the models. Pichai unveiled Trillium, the sixth iteration of Google’s Tensor Processing Units (TPUs). Trillium delivers more than four times the compute of the previous TPU generation.

Trillium will be available to Google’s cloud computing customers later this year, and Google will make NVIDIA’s Blackwell GPUs available in early 2025.

AI Search

Google will integrate Gemini into its search platform as it moves toward using generative AI in answering queries.

With AI Overview, a search query results in a comprehensive answer collated from multiple online sources. This turns Google Search into more of a research assistant than a tool that simply finds a website that may contain the answer.

Gemini enables Google Search to use multistep reasoning to break down complex multipart questions and return the most relevant information from multiple sources.

Gemini’s video understanding will soon allow users to use a video to query Google Search.

This will be great for users of Google Search, but it’ll likely result in a lot less traffic for the sites from which Google gets the information.

This is Search in the Gemini era. #GoogleIO pic.twitter.com/JxldNjbqyn
— Google (@Google) May 14, 2024

And you’ll also be able to ask questions with video, right in Search. Coming soon. #GoogleIO pic.twitter.com/zFVu8yOWI1
— Google (@Google) May 14, 2024

Gemini 1.5 Flash

Google announced a lightweight, cheaper, fast model called Gemini 1.5 Flash. Google says the model is “optimized for narrower or high-frequency tasks where the speed of the model’s response time matters the most.”

Gemini 1.5 Flash will cost $0.35 per million tokens, a lot less than the $7 you’d have to pay to use Gemini 1.5 Pro.
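To put that price gap in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes both quoted prices are flat per-million-token rates, which is a simplification: actual pricing may differ for input and output tokens or by prompt length. The dictionary keys, the function name `estimate_cost`, and the example workload are hypothetical and chosen only for illustration.

```python
# Per-million-token prices in USD, as quoted above.
# Assumption: a single flat rate per model; real pricing may be tiered.
PRICE_PER_MILLION_TOKENS = {
    "gemini-1.5-flash": 0.35,  # hypothetical label for Gemini 1.5 Flash
    "gemini-1.5-pro": 7.00,    # hypothetical label for Gemini 1.5 Pro
}


def estimate_cost(model: str, tokens: int) -> float:
    """Return the estimated USD cost of processing `tokens` tokens."""
    return PRICE_PER_MILLION_TOKENS[model] * tokens / 1_000_000


if __name__ == "__main__":
    tokens = 1_000_000  # hypothetical workload: one full 1M-token context
    flash = estimate_cost("gemini-1.5-flash", tokens)
    pro = estimate_cost("gemini-1.5-pro", tokens)
    print(f"Flash: ${flash:.2f}, Pro: ${pro:.2f}, ratio: {pro / flash:.0f}x")
```

Under these assumptions, Pro works out to roughly 20 times the cost of Flash for the same number of tokens.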

Each of these developments and new products deserves a post of its own. We’ll post updates as more information becomes available or when we get to try them out ourselves.