Project Analyzing Human Language Usage Shuts Down Because ‘Generative AI Has Polluted the Data’

Ben Werdmuller

"The creator of an open source project that scraped the internet to determine the ever-changing popularity of different words in human language usage says that they are sunsetting the project because generative AI spam has poisoned the internet to a level where the project no longer has any utility."

Robyn Speer, who created the project, went so far as to say that she doesn't think "anyone has reliable information about post-2021 language used by humans." That's a big statement about the state of the web. Spam has always been present, but it used to be easier to identify and silo; generative AI has rendered it unfilterable.

She no longer wants to be part of the industry at all:

"“I don't want to work on anything that could be confused with generative AI, or that could benefit generative AI,” she wrote. “OpenAI and Google can collect their own damn data. I hope they have to pay a very high price for it, and I hope they're constantly cursing the mess that they made themselves.”"

It's a relatable sentiment.

#AI


September 19, 2024 · Links
