Fighting bots is fighting humans

Ben Werdmuller

25 Jun 2024 — 1 min read

"I fear that media outlets and other websites, in attempting to 'protect' their material from AI scrapers, will go too far in the anti-human direction."

I've been struggling with this.

I'm not in favor of the 404 Media approach, which is to stick an auth wall in front of your content, forcing everyone to register before they can load your article. That isn't a great experience for anyone, and I don't think it's sustainable for a publisher in the long run.

At the same time, I think it's fair to try to prevent some bot access at the moment. Listing AI agents as disallowed in your robots.txt seems like the right call to me, although, as recent news has shown, it's perhaps not as effective a move as it might be.
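For anyone who hasn't done this, here is a minimal sketch of what it looks like, using Python's standard `urllib.robotparser` to check the result. The crawler names (`GPTBot`, `CCBot`), the example URL, and the `is_allowed` helper are illustrative assumptions, not a complete or recommended list. Keep in mind that robots.txt is only a request: it works if the crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: the crawler names are examples only,
# not a complete list of AI agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration. It reports what the file asks for;
    it cannot tell you whether a crawler actually complies.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://example.com/some-article"  # placeholder URL
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        print(agent, is_allowed(agent, article))
```

The wildcard entry at the end keeps everything else, including ordinary browsers, unaffected, which is the point: the rules add no friction for human readers.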

Clearly an AI agent isn't a human. But for ad hoc queries, where an agent is retrieving content from a website in direct response to a user query, it is acting on behalf of a human. Is it a browser, then? Maybe? If it is, we should just let it through.

It's accessing articles as training data that I really take issue with (as well as the subterfuge of not always advertising what it is when it accesses a site). In these cases, content is copied into a corpus in a manner that's outside of its licensing, without the author's knowledge. That sucks - not because I'm in favor of DRM, but because often the people whose work is being taken are living on a shoestring, and the software is run by very large corporations who will make a fortune.

But yes: I don't think auth walls, CAPTCHAs, paywalls, or any added friction between content and audience are a good idea. These things make the web worse for everybody.

Molly's post is in response to an original by Manu Moreale, which is also worth reading.

#AI