Grok gets an impressive upgrade - and unchecked AI image generation, apparently

Elon Musk was an OpenAI investor when the company was founded in 2015, but he has since not only severed ties with the company but also criticized its approach to political correctness and safety. As a result, Musk created his own AI chatbot, Grok, which just got a pretty big upgrade.

On Tuesday, xAI, an AI company founded by Musk, announced the release of an early preview of Grok-2, its frontier large language model (LLM) with advanced chat, coding, and reasoning capabilities. The release also included Grok-2 mini, which, as the name implies, is a lightweight version of Grok-2.

Prior to this release, an early version of Grok-2 was tested in the Large Model Systems Organization (LMSYS) Chatbot Arena under the anonymous name “sus-column-r,” a practice many AI companies follow before launching a new model.


On this crowdsourced platform, users evaluate LLMs by chatting with two models side by side and comparing their responses without knowing the models’ names, so the results reflect how capable the models actually are. When pitted against industry-leading models such as OpenAI’s GPT-4o and Google’s Gemini 1.5 Pro, Grok-2 held its own, placing third in the “Overall” category and tying with GPT-4o, as seen below.

Woah, another exciting update from Chatbot Arena❤️‍🔥
The results for @xAI’s sus-column-r (Grok 2 early version) are now public!
With over 12,000 community votes, sus-column-r has secured the #3 spot on the overall leaderboard, even matching GPT-4o! It excels in Coding (#2),… https://t.co/gqSWSwYN0z pic.twitter.com/j9UYDBYNt4
— lmsys.org (@lmsysorg) August 14, 2024

If you, like me, visited the Chatbot Arena leaderboard and were surprised not to see the same results, LMSYS noted that it posts early results on Twitter (X) first, adding, “The official update for Grok 2 coming soon..!”

Other noteworthy Chatbot Arena results include Grok-2’s second-place finishes in both the Math and Coding categories and its fourth-place finish in Hard Prompts. If you want to test it in the Arena, visit the website, click Arena (side-by-side), and enter a sample prompt.


The company also evaluated Grok-2’s performance on popular LLM benchmarks, including the Massive Multitask Language Understanding (MMLU) and MATH benchmarks. The results were better than those of its predecessor, Grok-1.5, and competitive with industry-leading models, including GPT-4o, Claude 3 Opus, Llama 3, and more.

Beyond its advanced text performance, Grok-2 lets users generate high-quality images through a collaboration with Black Forest Labs’ FLUX.1 image-generation model.

Although many image generators on the market have strict restrictions against creating images of public figures such as celebrities and politicians, Grok-2 does not, and many beta testers have already gone wild on the platform, generating images of politicians in provocative situations. Below, I’m including one of the less provocative generations.

The rendered images are high-quality and realistic, yet there appears to be no disclosure on the platform making it clear that an image was AI-generated, another measure many social media platforms take to keep users safe.

Grok-2 and Grok-2 mini are rolling out in beta on X to X Premium and Premium+ users. These premium X plans cost $8 and $16 per month, respectively, and include other perks such as a blue checkmark, reduced or no ads, reply prioritization, ID verification, and more. Both models will be released to developers through a new enterprise API platform later this month.
