Baidu restricts Google and Bing from scraping content for AI training

Chinese internet search provider Baidu has updated its Wikipedia-like Baike service to prevent Google and Microsoft Bing from scraping its content.

This change was observed in the latest update to the Baidu Baike robots.txt file, which denies access to the Googlebot and Bingbot crawlers.
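For readers unfamiliar with the mechanism, a robots.txt file lists per-crawler rules that well-behaved bots consult before fetching pages. The following is a minimal sketch in Python, using the standard library's `urllib.robotparser`. The rules in `SAMPLE_ROBOTS_TXT`, the `ExampleBot` agent, the helper `crawler_permissions`, and the test URL are all hypothetical, written for illustration only; they are not the actual contents of Baidu Baike's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Baidu Baike's real robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_permissions(robots_txt, agents, url):
    """Return a dict mapping each user agent to whether it may fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Placeholder URL; only the path is matched against the rules.
permissions = crawler_permissions(
    SAMPLE_ROBOTS_TXT,
    ["Googlebot", "Bingbot", "ExampleBot"],
    "https://example.com/item/sample",
)
for agent, allowed in permissions.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Under these assumed rules, the two named crawlers are denied everything while any other agent falls through to the wildcard entry. Note that robots.txt is advisory: it depends on crawlers choosing to honour it.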

According to the Wayback Machine, the change took place on August 8. Previously, the Google and Bing search engines were allowed to index Baidu Baike’s central repository, which includes almost 30 million entries, although some target subdomains on the website were restricted.

This action by Baidu comes amid growing demand for large datasets used in training artificial intelligence models and applications. It follows similar moves by other companies to protect their online content. In July, Reddit blocked various search engines, except Google, from indexing its posts and discussions. Google has a financial agreement with Reddit for data access to train its AI services.

According to sources, in the past year Microsoft considered restricting access to internet-search data for rival search engine operators, particularly those that used the data for chatbots and generative AI services.

Meanwhile, the Chinese Wikipedia, with its 1.43 million entries, remains available to search engine crawlers. A survey conducted by the South China Morning Post found that entries from Baidu Baike still appear in both Bing and Google searches. The search engines may be continuing to use older cached content.

The move comes against a background in which developers of generative AI around the world are increasingly working with content publishers in a bid to access the highest-quality content for their projects. For instance, OpenAI recently signed an agreement with Time magazine to access its entire archive, dating back to the very first day of the magazine’s publication over a century ago. A similar partnership was inked with the Financial Times in April.

Baidu’s decision to restrict access to its Baidu Baike content for major search engines highlights the growing importance of data in the AI era. As companies invest heavily in AI development, the value of large, curated datasets has significantly increased. This has led to a shift in how online platforms manage access to their content, with many choosing to limit or monetise access to their data.

As the AI industry continues to evolve, it’s likely that more companies will reassess their data-sharing policies, potentially leading to further changes in how information is indexed and accessed across the internet.

(Photo by Kelli McClintock)
