Google I/O 2024 – Here are the AI highlights Google revealed

Google’s I/O 2024 event kicked off on Tuesday with a number of new AI product announcements.

OpenAI may have tried to upstage Google with the release of GPT-4o on Monday, but the Google I/O 2024 keynote was full of exciting announcements.

Here’s a look at the standout AI advancements, new tools, and prototypes Google is experimenting with.

Ask Photos

Google Photos, Google’s photo storage and sharing service, will be searchable using natural language queries with Ask Photos. Users can already search for specific items or people in their photos, but Ask Photos takes this to the next level.

Google CEO Sundar Pichai showed how you could use Ask Photos to remind you what your car’s license plate number is or to give feedback on how a child’s swimming abilities had progressed.

Powered by Gemini, Ask Photos understands context across images and can extract text, create highlight compilations, or answer queries about stored images.

With more than 6 billion images uploaded to Google Photos every day, Ask Photos will need a huge context window to be useful.

Gemini 1.5 Pro

Pichai announced that Gemini 1.5 Pro with a 1M token context window will be available to Gemini Advanced users. This equates to around 1,500 pages of text, hours of audio, and a full hour of video.

Developers can join a waitlist to try Gemini 1.5 Pro with an impressive 2M token context window, which will soon be generally available. Pichai says this is the next step in Google’s journey toward the ultimate goal of infinite context.
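
As a rough sanity check on those figures, the sketch below turns the quoted equivalence (1M tokens to around 1,500 pages) into a tokens-per-page ratio and applies it to the 2M window. The linear scaling and the names `TOKENS_PER_PAGE` and `pages_for_context` are assumptions made for illustration, not figures or tools from Google.

```python
# Rough estimate only: assumes page count scales linearly with token count,
# using the article's figure of about 1,500 pages per 1M tokens.
QUOTED_TOKENS = 1_000_000
QUOTED_PAGES = 1_500

# Roughly 667 tokens per page under that assumption.
TOKENS_PER_PAGE = QUOTED_TOKENS / QUOTED_PAGES


def pages_for_context(context_tokens: int) -> float:
    """Estimate how many pages of text fit in a context window (hypothetical helper)."""
    return context_tokens / TOKENS_PER_PAGE


if __name__ == "__main__":
    for window in (1_000_000, 2_000_000):
        print(f"{window:,} tokens is about {pages_for_context(window):,.0f} pages")
```

Under that assumption, the 2M window would hold about twice as much text as the 1M window, or roughly 3,000 pages.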

Gemini 1.5 Pro has also had a performance boost in translation, reasoning, and coding, and will be truly multimodal with the ability to analyze uploaded video and audio.

Google Workspace

The expanded context and multimodal capabilities make Gemini extremely useful when integrated with Google Workspace.

Users can use natural language queries to ask Gemini questions related to their emails. The demo gave an example of a parent asking for a summary of recent emails from their child’s school.

Gemini will also be able to extract highlights from, and answer questions about, Google Meet meetings of up to an hour.

NotebookLM – Audio Overview

Google released NotebookLM last year. It allows users to upload their own notes and documents, which NotebookLM then becomes an expert on.

This is extremely useful as a research guide or tutor, and Google demonstrated an experimental upgrade called Audio Overview.

Audio Overview takes the input source documents and generates an audio discussion based on the content. Users can join the conversation and use speech to query NotebookLM and steer the discussion.

There’s no word on when Audio Overview will be rolled out, but it could be a huge help for anyone wanting a tutor or sounding board to work through a problem.

Google also announced LearnLM, a new family of models based on Gemini and fine-tuned for learning and education. LearnLM will power NotebookLM, YouTube, Search, and other educational tools to make them more interactive.

The demo was very impressive, but it already seems like some of the mistakes Google made with its original Gemini release videos crept into this event.

AI agents and Project Astra

Pichai says that AI agents powered by Gemini will soon be able to handle our mundane day-to-day tasks. Google is prototyping agents that will be able to work across platforms and browsers.

The example Pichai gave was of a user instructing Gemini to return a pair of shoes and then having the agent work through multiple emails to find the relevant details, log the return with the online store, and book the collection with a courier.

Demis Hassabis introduced Project Astra, Google’s prototype conversational AI assistant. The demo of its multimodal capabilities gave a glimpse of a future where an AI answers questions in real time based on live video and remembers details from earlier video.

Hassabis said some of these features would roll out later this year.

Generative AI

Google gave us a peek at the image, music, and video generative AI tools it’s been working on.

Google introduced Imagen 3, its most advanced image generator. It reportedly responds more accurately to details in nuanced prompts and delivers more photorealistic images.

Hassabis said Imagen 3 is Google’s “best model yet for rendering text, which has been a challenge for image generation models.”

Music AI Sandbox is an AI music generator that’s designed as a professional collaborative music creation tool, rather than a full track generator.

Veo is Google’s video generator that turns text, image, or video prompts into minute-long clips at 1080p. It also allows video edits to be made with text prompts. Will Veo be as good as Sora?

Google will roll out its SynthID digital watermarking to text, audio, images, and video.



All these new multimodal capabilities need a lot of processing power to train the models. Pichai unveiled Trillium, the sixth generation of Google’s Tensor Processing Units (TPUs). Trillium delivers more than 4 times the compute of the previous TPU generation.

Trillium will be available to Google’s cloud computing customers later this year, and Google will make NVIDIA’s Blackwell GPUs available in early 2025.

AI Search

Google will integrate Gemini into its search platform as it moves toward using generative AI to answer queries.

With AI Overviews, a search query results in a comprehensive answer collated from multiple online sources. This turns Google Search into more of a research assistant than a tool for simply finding a website that may contain the answer.

Gemini enables Google Search to use multistep reasoning to break down complex multipart questions and return the most relevant information from multiple sources.

Gemini’s video understanding will soon allow users to use a video to query Google Search.

This will be great for users of Google Search, but it’ll likely result in a lot less traffic for the sites from which Google gets the information.

Gemini 1.5 Flash

Google announced a lightweight, cheaper, fast model called Gemini 1.5 Flash. Google says the model is “optimized for narrower or high-frequency tasks where the speed of the model’s response time matters the most.”

Gemini 1.5 Flash will cost $0.35 per million tokens, a lot less than the $7 you’d have to pay to use Gemini 1.5 Pro.
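
To put those prices in perspective, here is a minimal sketch that compares the cost of a workload at the two per-million-token rates quoted above. It assumes a single flat rate per model, whereas real pricing may distinguish input from output tokens or vary with prompt length. The dictionary keys are plain labels rather than API identifiers, and the helper `cost_usd` and the 50M-token workload are hypothetical.

```python
# Per-million-token prices in USD, as quoted in the article.
# Keys are plain labels for this sketch, not official API model names.
PRICE_PER_MILLION = {
    "Gemini 1.5 Flash": 0.35,
    "Gemini 1.5 Pro": 7.00,
}


def cost_usd(model: str, tokens: int) -> float:
    """Flat-rate cost estimate for a given token count (hypothetical helper)."""
    return PRICE_PER_MILLION[model] * tokens / 1_000_000


if __name__ == "__main__":
    workload_tokens = 50_000_000  # illustrative workload, not from the article
    for model in PRICE_PER_MILLION:
        print(f"{model}: ${cost_usd(model, workload_tokens):,.2f}")

    ratio = PRICE_PER_MILLION["Gemini 1.5 Pro"] / PRICE_PER_MILLION["Gemini 1.5 Flash"]
    print(f"Pro costs about {ratio:.0f}x as much as Flash per token")
```

At these quoted rates, Gemini 1.5 Pro works out to roughly 20 times the per-token price of Gemini 1.5 Flash.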

Each of these advancements and new products deserves a post of its own. We’ll post updates as more information becomes available or when we get to try them out ourselves.
