 

Who makes money when AI reads the internet for us?

"Local news publishers, [VP Platforms at The Boston Globe] Karolian told Engadget, almost entirely depend on selling ads and subscriptions to readers who visit their websites to survive. “When tech platforms come along and disintermediate that experience without any regard for the impact it could have, it is deeply disappointing.”"

There's an interesting point that Josh Miller makes here about how the way the web gets monetized needs to change. Sure, but that's a lot like the people who say that open source funding will be solved by universal basic income: perhaps, at some future date, but that doesn't solve the immediate problem.

Do browser vendors have a responsibility to be good stewards for publishers? I don't know about that in itself. I'm okay with them freely innovating - but they also need to respect the rights of the people whose content they're innovating with.

Micropayments emphatically don't work, but I do wonder if there's a way forward here (alongside other ways) where AI summarizers pay for access to the articles they're consuming as references, or otherwise participate in their business models somehow.

[Link]

· Links · Share this post

 

FCC Makes AI-Generated Voices in Robocalls Illegal

"The FCC announced the unanimous adoption of a Declaratory Ruling that recognizes calls made with AI-generated voices are "artificial" under the Telephone Consumer Protection Act (TCPA)."

A sign of the times that the FCC had to rule that robocalls made with an AI-generated clone of a voice are illegal. I'm curious to understand if this affects commercial services that intentionally use AI to make calls on a user's behalf (e.g. to book a restaurant or perform some other service).

[Link]


 

Apple releases 'MGIE', a revolutionary AI model for instruction-based image editing

"Computer - enhance!"

I like the approach in this release from Apple: an open source AI model that can edit images based on natural language instructions. In other words, a human can tell the engine what to do to an image, and it goes and does it.

Rather than eliminating the human creativity in the equation, it gives the person doing the photo editing superpowers: instead of needing to know how to use a particular application to do the editing, they can simply give the machine instructions. I feel much more comfortable with the balance of power here than with most AI applications.

Obviously, it has implications for vendors like Adobe, which have established some degree of lock-in by forcing users to learn their tools and interfaces. If this kind of user interface takes off - and, given new kinds of devices like Apple Vision Pro, it inevitably will - they'll have to compete on capabilities alone. I'm okay with that.

[Link]


 

Zuckerberg's Going to Use Your Instagram Photos to Train His AI Machines

During Meta's earnings call, Mark Zuckerberg said that Facebook and Instagram data is used to train the company's AI models.

“On Facebook and Instagram, there are hundreds of billions of publicly shared images and tens of billions of public videos, which we estimate is greater than the Common Crawl dataset and people share large numbers of public text posts in comments across our services as well.”

He's playing to win: one unstated competitive advantage is that Meta actually has the legal right to use training data generated on its own services. It's probably not something most users are aware of, but by posting content there, they grant the company rights to use it. If OpenAI falls afoul of copyright law, Meta's tech has a path forward.

It's a jarring thought, though. I'm certainly not keen on a generative model being trained on my son's face, for example. I'm curious how many users will feel the same way.

[Link]


 

OpenAI says there’s only a small chance ChatGPT will help create bioweapons

"OpenAI’s GPT-4 only gave people a slight advantage over the regular internet when it came to researching bioweapons, according to a study the company conducted itself." Uh, great?

"On top of that, the students who used GPT-4 were nearly as proficient as the expert group on some of the tasks. The researchers also noticed that GPT-4 brought the student cohort’s answers up to the “expert’s baseline” for two of the tasks in particular: magnification and formulation." Um, splendid?

"However, the study’s authors later state in a footnote that, overall, GPT-4 gave all participants a “statistically significant” advantage in total accuracy." Ah, superb?

[Link]


 

Anti-scale: a response to AI in journalism

"It should be obvious that any technology prone to making up facts is a bad fit for journalism, but the Associated Press, the American Journalism Project, and Axel Springer have all inked partnerships with OpenAI."

The conversation about AI at the Online News Association conference last year was so jarring to me that I was angry about it for a month. As Tyler Fisher says here, it presents existential risk to the news industry - and beyond that, following a FOMO-driven hype cycle rather than building things based on what your community actually needs is a recipe for failure.

As Tyler says: "Instead of trying to compete, journalism must reject the scale-driven paradigm in favor of deeper connection and community." This is the only real path forward for journalism. Honestly, it's the only real path forward for the web, and for a great many industries that live on it.

[Link]


 

Following lawsuit, rep admits “AI” George Carlin was human-written

Simon Willison called this, and it makes sense: the George Carlin AI special was human-written, because that's the only way it could possibly have happened.

It's a parlor trick; a bit. It's also a kind of advertising for AI: even as you're horrified at the idea of creating a kind of resurrected George Carlin against his will, you've accepted the idea that it was technically possible. It isn't.

Unfortunately for the folks behind the special, it's still harmful to Carlin's legacy, and putting his name on it in order to gain attention is still a problem. We'll see how the lawsuit shakes out.

[Link]


 

We Need Your Email Address

"In order to combat the fracturing of social media platforms, a Google discoverability crisis fueled by AI generated spam and AI-fueled SEO, and a media business environment that is in utter freefall, we need to be able to reach our readers directly using a platform that we own and control."

For every publisher right now, email seems to be the only option. This is the first time I've seen this argument about AI scraping: usually the need to own your own relationship comes down to avoiding the thrash of different social media business models, which I've written about plenty of times before.

This idea that putting your content out there for free will only lead to it being rewritten by AI and repurposed by spam blogs could be the death of the open web. This is particularly true in light of Google's apparent refusal to downgrade machine-written content.

The idea is simple and awful: these spam sites rewrite human-written articles in an effort to capture search engine clicks themselves, instead of the people they stole from. They run ads against this spam. Because it's all machine-written, they can do it at scale.

Even if you don't agree that the web needs to be intrinsically protected (hi, we're enemies now), it seems obvious to me that incentives should be aligned towards publishing unique, useful information rather than superficially grabbing clicks through AI-driven SEO spam. I don't know what's going on inside the search engine businesses, but they need to consider what's going to be good for their businesses in the long term. This isn't it.

[Link]


 

How Beloved Indie Blog 'The Hairpin' Turned Into an AI Clickbait Farm | WIRED

"In 2018, the indie women’s website The Hairpin stopped publishing, along with its sister site The Awl. This year, The Hairpin has been Frankensteined back into existence and stuffed with slapdash AI-generated articles designed to attract search engine traffic."

This is one of the worst kinds of AI-generated spam: a real, much-missed website has been purchased and spun into an LLM fever dream. It's now just a part of a Serbian DJ's thousands-deep portfolio of spam sites.

But the point made in the article about succession planning is really important. Media properties should be thoughtful about what happens to their domains once they've outlived their usefulness - even if the owner has shuttered completely. Otherwise anyone can scoop up the domain and abuse the goodwill built by its former owner for any purpose they like.

This is particularly true for journalism publishers. I recommend that they never let their domains expire for this reason, even if they've fully fallen out of use. You never know who might pick them up and abuse the trust of their community.

[Link]


 

Fake Joe Biden robocall tells New Hampshire Democrats not to vote on Tuesday

A robocall used a deepfake of Joe Biden's voice to encourage New Hampshire voters to stay home. "It's important that you save your vote for the November election."

It's not a perfect deepfake, but it doesn't necessarily need to be - for call recipients who don't understand what's happening, it has the potential to be enough to move the needle.

It's not clear that this is the first time that this has happened, but it certainly won't be the last. It's also not clear how this might be prevented except to block robocalls entirely (and even then, one can imagine using a live agent with a deepfaked voice, so that every call would be different).

[Link]


 

On being listed in the court document of artists whose work was used to train Midjourney with 4,000 of my closest friends

"They just take it. Whatever they want." A poignant and infuriating reflection on generative AI, from the creator of Cat and Girl.

[Link]


 

Generated content is an invasive species in the online ecosystem

I like this argument that generated content is an invasive species in our content ecosystem.

"As generated material rapaciously populates the Internet, human-created artworks will be outcompeted by generated graphics on social media platforms by virtue of volume."

I agree that this is something to be concerned with, and the paragraph about legal rights and obligations is also spot on.

[Link]


 

Things are about to get a lot worse for Generative AI

There are some jaw-dropping infringements here, including an image where DALL-E apparently copies the entire Pixar universe from the single two-word prompt, "animated toys".

It's impossible to hand-wave this away. Even if you don't think the New York Times case has merit, it's pretty obvious that generative AI can infringe copyright even when you don't ask it to, and without notifying the user. As noted in the references, it's a big ask to then push liability for infringement to the user. It's inherent to the engines.

As the author notes: "My guess is that none of this can easily be fixed." Indeed.

[Link]


 

The New York Times sues OpenAI and Microsoft for copyright infringement

OpenAI feels a bit like Napster: a proof of concept that shows the power of a particular experience while trampling over the licensing agreements that would have been needed to make the whole thing legal.

The Napster user experience eventually led to our streaming music present: you can draw a line from it directly to Spotify and Apple Music. I expect we'll see the same thing in AI. We know what's possible, a lot of people are excited about it, but it'll take someone else to put the legal agreements in place to actually make it work. (If I had to guess, that company starts with an "A", but it could be a newcomer.)

Once again, the argument that training an LLM is no different to someone reading the same material falls short. Unlike OpenAI, I have to pay for the content I read, and like OpenAI, if I start spewing out large portions of New York Times stories under my byline, I'll end up in court.

I don't know whether OpenAI itself will last. But I am certain we'll see powerful LLMs offered as a service in the future, underpinned by real content licensing agreements for their training data.

[Link]


 

Artificial intelligence can find your location, alarming privacy experts

That an AI model trained on Google Street View photos can look at a picture and figure out where it is isn't much of a surprise, but it's still jarring to see that it's here.

I think the real lesson is that AI undermines security through obscurity, which any security professional will tell you is not a sound approach. It's not enough to assume that information is hidden enough to not be usable; if you want to remain private, you need to actually secure your information.

This has obvious implications for pictures of vulnerable people (children, for example) on social media. But, of course, you can extrapolate: public social media posts could probably be analyzed for identifying details too, regardless of the medium. All of it could be used for identity theft or to cause other harm.

A human probably isn't going to painstakingly go through your posts to figure out information about you. But if it can be done in one click with a software agent, suddenly we're playing a whole other ball game.

[Link]


 

The AI trust crisis

I think this is right: AI companies, and particularly OpenAI, have a crisis of trust with the public. We simply don't believe a word they say when it comes to privacy and respecting our rights.

It's well-earned. The way LLMs work is through training on vast amounts of scraped data, some of which would ordinarily be commercially licensed. And the stories AI vendors have been peddling about the dangers of an AI future - while great marketing - have hardly endeared them to us. Not to mention the whole Sam Altman board kerfuffle.

I think Simon's conclusion is also right: local models are the way to overcome this, at least in part. Running an AI engine on your own hardware is far more trustworthy than someone else's service. The issues with training data and bias remain, but at least you don't have to worry about whether your interactions with it are being leaked.

[Link]


 

The Internet Enabled Mass Surveillance. AI Will Enable Mass Spying.

Bruce Schneier on the inevitable application of AI to mass surveillance:

"Knowing that they are under constant surveillance changes how people behave. They conform. They self-censor, with the chilling effects that brings. Surveillance facilitates social control, and spying will only make this worse. Governments around the world already use mass surveillance; they will engage in mass spying as well."

I find this argument that AI can enable mass summarization and classification, and therefore more effective use of surveillance data at scale, very compelling. If governments can do something, as a general rule, they will. And this feels like something that is definitely coming down the pipe.

[Link]


 

ChatGPT Can Reveal Personal Information From Real People, Google Researchers Show

Here we go: proof that it's possible to extract real training data from LLMs. Unfortunately, some of this data includes personally identifiable information (PII) about real people.

“In total, 16.9% of generations we tested contained memorized PII [Personally Identifying Information], and 85.8% of generations that contained potential PII were actual PII.”

“[...] OpenAI has said that a hundred million people use ChatGPT weekly. And so probably over a billion people-hours have interacted with the model. And, as far as we can tell, no one has ever noticed that ChatGPT emits training data with such high frequency until this paper. So it’s worrying that language models can have latent vulnerabilities like this.”

[Link]


 

The legal framework for AI is being built in real time, and a ruling in the Sarah Silverman case should give publishers pause

"Silverman et al. have two weeks to attempt to refile most of the dismissed claims with any explicit evidence they have of LLM outputs “substantially similar” to The Bedwetter. But that’s a much higher bar than simply noting its inclusion in Books3."

This case looks like it's on shaky ground: it may not be enough to prove that AI models were trained on pirated material (the aforementioned Books3 collection of pirated titles). Plaintiffs will need to show that the models produce output that infringes those copyrights.

[Link]


 

"We pulled off an SEO heist that stole 3.6M total traffic from a competitor."

"We pulled off an SEO heist that stole 3.6M total traffic from a competitor. Here's how we did it."

What this single spammer pulled off - 1800 articles written by technology in order to scrape traffic from a competitor's legitimate site - is what AI will do to the web at scale.

Yes, it's immoral. Yes, it's creepy. But there are also hundreds if not thousands of marketers looking at this thread and thinking, "ooh, we could do that too".

The question then becomes: how can we, as readers, avoid this automated nonsense? And how can search engines systemically discourage (or punish) it?

[Link]


 

Give OpenAI's Board Some Time. The Future of AI Could Hinge on It

Written before the news broke about Sam Altman moving to Microsoft, this remains a nuanced, intelligent take.

"My understanding is that some members of the board genuinely felt Altman was dishonest and unreliable in his communications with them, sources tell me. Some members of the board believe that they couldn’t oversee the company because they couldn’t believe what Altman was saying."

I think a lot of people have been quick to judge the board's actions as stupid this weekend, but we still don't know what the driving factors were. There's no doubt that their PR was bad and the way they carried out their actions was unstrategic. But there was something more at play.

[Link]


 

Is My Toddler a Stochastic Parrot?

A beautifully written and executed visual essay about AI, parenting, what it means to be intelligent, and the fundamental essence of being human.

[Link]


 

The average AI criticism has gotten lazy, and that's dangerous

This is a good critique of some of the less analytical AI criticism, some of which I've undoubtedly been guilty of myself.

"The fork in the road is this: we can dismiss “AI.” We can call it useless, we can dismiss its output as nonsense, we can continue murmuring all the catechisms of the least informed critique of the technology. While we do that, we risk allowing OpenAI to make Microsoft, AT&T and Standard Oil look like lemonade stands."

The point is not that AI as a technology is a genie that needs to be put back into the bottle. It can't be. The point is that it can be made more ethically, equity can be more widely distributed, and we can mitigate the societal harms that will absolutely be committed at the hands of people using existing models.

[Link]


 

AI outperforms conventional weather forecasting for the first time: Google study

This feels like a good use for AI: taking in more data points, understanding their interactions, and producing far more accurate weather forecasts.

We're already used to some amount of unreliability in weather forecasts, so when the model gets it wrong - as this did with the intensification of Hurricane Otis - we're already somewhat prepared.

Once the model is sophisticated enough to truly model global weather, I'm curious about outcomes for climate science, too.

[Link]


 

A Coder Considers the Waning Days of the Craft

I feel this myself, but I don't think it means that coding is going away, exactly. Some kinds of coding are less manual, in the same way we don't write in assembler anymore. But there will always be a place for code.

Lately I've been feeling like AI replaces software libraries more than it replaces mainline code. In the old days, if you needed a function, you would find a library that did it for you. Now you might ask AI to write the function - and it's likely a better fit than a library would have been.

I don't know what this means for code improvements over time. People tend libraries; they upgrade their code. AI doesn't make similar improvements - or at least, it's not clear that it does. And it's not obvious to me that AI can keep improving if more and more code out in the world is already AI-generated. Does the way we code stagnate?

Anyway, the other day I asked ChatGPT to break down how a function worked in a language I don't code in, and it was incredibly useful. There's no doubt in my mind that it speeds us up at the very least. And maybe manual coding will be relegated to building blocks and fundamentals.

[Link]
