OpenAI Messed With the Wrong Mega-Popular Parenting Forum | EUROtoday

On Sep 16, 2024

Think of any topic vaguely related to raising children imaginable, and there’s probably a post about it on Mumsnet, the long-running, enormously popular, controversy-spurring UK-based parenting forum for mothers. Over its more than two-decade-long history, Mumsnet has amassed an archive of more than six billion words written by its highly engaged user base, on topics such as dirty diapers and lazy husbands. (Not to mention a bonkers rant about dolphins.)

This spring, after Mumsnet discovered that AI companies were scraping its data, the company says it decided to try to strike licensing deals with some of the major players in the field, including OpenAI, which initially expressed willingness to explore an arrangement after Mumsnet first reached out. After talks with OpenAI fell apart, Mumsnet in July announced its intention to pursue legal action.

According to Mumsnet, during those early conversations, an OpenAI strategic partnership lead told the company that datasets over 1 billion words were of interest to the AI giant. Mumsnet’s leadership was excited. “We spent quite some time in a back-and-forth with them,” Mumsnet founder and CEO Justine Roberts tells WIRED. “We had to sign some NDAs, and they wanted a lot of information from us.”

However, over a month later, OpenAI told Mumsnet that the company was no longer interested in partnering at this time, according to an email exchange reviewed by WIRED. When asked why, the OpenAI staffer characterized Mumsnet’s 6-billion-word dataset as too small to warrant a licensing arrangement, Roberts says. The staffer also noted that OpenAI is primarily interested in large datasets that the public cannot already access online, and that it wanted datasets that captured broad human experience.

The company echoed this sentiment when asked for comment by WIRED. “We pursue partnerships for large-scale datasets that reflect human society and do not pursue partnerships solely for publicly available information,” says OpenAI spokesperson Kayla Wood. “We support publisher and creator choice, offering them ways to express their preferences about how their sites and content work with AI in search results and training generative AI foundation models.”

Roberts says she was “irritated” by this development. She recalls that OpenAI at first had seemed especially interested in Mumsnet because of the platform’s heavily female-written content. “It’s very high-quality conversational data,” she says. “It’s 90 percent female conversation, which is quite unusual.”

OpenAI has struck a variety of data-licensing deals with media outlets and platforms in the past year, entering into agreements with Vox Media, the Atlantic, Axel Springer, Time, and WIRED parent company Condé Nast, as well as platforms filled with user-generated content like Reddit. (Automattic, the owner of WordPress.com and Tumblr, was also said to be in licensing talks earlier this year.) As the details of these deals have not been revealed, it’s not clear how large their respective corpuses are.

When WIRED asked about the size of datasets it would consider for commercial licensing, OpenAI declined to share that information. But spokesperson Kayla Wood emphasizes that the company’s partnerships with publishers are “focused on displaying their content in our products and driving traffic to them.”

https://www.wired.com/story/mumsnet-openai-copyright-allegations/