I’m focusing on the intersection of technology, media, and democracy. Subscribe by email to get every update.
By now, you’ve been exposed to Generative AI and Large Language Models (LLMs) like OpenAI’s ChatGPT, DALL-E 2, and GPT-4. It seems a lot like magic: a bot that seems to speak like a human being, provides confident-sounding answers, and can even write poetry if you ask it to. As an advance, it’s been compared in significance to the advent of the web: a complete paradigm shift of the kind that comes along very rarely.
I want to examine their development through the lens of a non-profit newsroom: specifically, I’d like to consider how newsrooms might think about LLMs like ChatGPT, both as a topic at the center of reporting, as well as a technology that presents dangers and opportunities internally.
I’ve picked newsrooms because that’s the area I’m particularly interested in, but also because they’re a useful analogue: technology-dependent organizations that need to move quickly but haven’t always turned technology into a first-class competence. In other words, if you’re not a member of a non-profit newsroom, you might still find this discussion useful.
What are generative AI and Large Language Models?
Generative AI is just an umbrella term for algorithms that have the ability to create new content. The ones receiving attention right now are mostly Large Language Models: probability engines that are trained to predict the next word in a sentence based on a very large corpus of written information that has often been scraped from the web.
That’s important to understand because when we think of artificial intelligence, we often think of android characters from science fiction movies: HAL 9000, perhaps, or the Terminator. Those stories have trained us to believe that artificial intelligence can reason like a human. But LLMs are much more like someone put the autocomplete function on your phone on steroids. Although their probabilistic models generate plausible answers that often look like real intelligence, the algorithms have no understanding of what they’re saying and are incapable of reasoning. Just as autocomplete on your phone sometimes gets it amazingly wrong, LLM agents will sometimes reply with information that sounds right but is entirely fictional. For example, the Guardian recently discovered that ChatGPT makes up entire news articles.
It’s also worth understanding because of the provenance of the datasets behind those models. My website — which at the time of writing does not license its content to be re-used — is among the sites scraped to join the corpus; if you have a site, it may well be too. There’s some informed conjecture that these scraped sites are joined by pirated books and more. Because LLMs make probabilistic decisions based on these corpuses, in many ways their apparent intelligence could be said to be derived from this unlicensed material. There’s no guarantee that an LLM’s outputs won’t contain sections that are directly identifiable as copyrighted material.
This data has often been labeled and processed by low-paid workers in emerging nations. For example, African content moderators just voted to unionize in Nairobi.
Finally, existing biases that are prevalent in the corpus will be reiterated by the agent. In a world where people of color are disproportionately targeted by police, it’s dangerous to use an advanced form of autocomplete to determine who might be guilty of a crime — particularly as a software agent might be more likely to be incorrectly assumed to be impartial. As any science fiction fan will tell you, robots are supposed to be logical entities who are free from bias; in reality they’re only as unbiased as their underlying data and algorithms.
In other words, content produced by generative AI may look great but is likely to be deeply, sometimes dangerously flawed.
Practically, the way one interacts with them is different to most software systems: whereas a standard system might have a user interface with defined controls, a command line argument structure, or an API, you interact with an LLM agent through a natural language prompt. Prompt engineering is an emergent field.
Should I use LLMs to generate content?
At the beginning of this year, it emerged that CNET had been using generative AI to write whole articles. It was a disaster: riddled with factual errors and plodding, mediocre writing.
WIRED has published a transparent primer on how it will be using the technology.
From the text:
The current AI tools are prone to both errors and bias, and often produce dull, unoriginal writing. In addition, we think someone who writes for a living needs to constantly be thinking about the best way to express complex ideas in their own words. Finally, an AI tool may inadvertently plagiarize someone else’s words. If a writer uses it to create text for publication without a disclosure, we’ll treat that as tantamount to plagiarism.
For all the reasons stated above, using AI to generate articles from scratch, or to write passages inside a published article otherwise written by a human, is not likely to be a good idea.
The people who will use AI to generate articles won’t surprise you: spammers will be all over it as a way to cheaply generate clickbait content without having to hire writers. The web will be saturated with this kind of low-quality, machine-written content — which means that it will be incumbent on search engines like Google to filter it out. Well-written, informative, high-quality writing will rise to the top.
There’s another danger, too, for people who are tempted to use LLMs to power chat-based experiences, or to use them to process user-generated content. Because LLM agents use natural language prompts with little distinction between the prompt and the data the LLM is acting on, prompt injection attacks are becoming a serious risk.
And they’re hard to mitigate. As Simon Willison points out in the above link:
To date, I have not yet seen a robust defense against this vulnerability which is guaranteed to work 100% of the time. If you’ve found one, congratulations: you’ve made an impressive breakthrough in the field of LLM research and you will be widely celebrated for it when you share it with the world!
Finally, let’s not forget that unless you’re running an LLM on your own infrastructure, all your prompts and outputs are being saved on a centralized service where your data almost certainly will be used for further training the model. There is little to no expectation of privacy here (although some models are beginning to offer enterprise subscriptions that promise but don’t demonstrate data privacy).
Then what can I use LLMs for?
Just as autocomplete can be really useful even if you’d never use it to write a whole essay that you’d show to anyone else, LLMs have lots of internal uses. You can think of them as software helpers that add to your process and potentially speed you up, rather than a robot that will take your job tomorrow. Because they’re helping you build human-written content rather than you publishing their machine-written output, you’re not at risk of violating someone’s copyright or putting a falsehood out into the world unchecked. Prompt injection attacks are less hazardous, assuming you trust your team and don’t expose agents to unchecked user-generated content.
Some suggestions for how LLMs can be used in journalism include:
- Suggesting headlines
- Speeding up transformations between media (for example, articles to short explainers, or to scripts for a video)
- Automatic transcription from audio or video into readable notes (arguably the most prevalent existing use of AI in newsrooms)
- Extracting topics (that can then be linked to topic archive pages)
- Discovering references to funders that must be declared
- Suggesting ideas for further reporting
- Uncovering patterns in data provided by a source
- Community sentiment analysis
- Summarizing large documents
All of these processes can sit within a content management system or toolset as just another editing tool. They don’t do away with the journalist or editor: they simply provide another tool to help them to do their work. In many cases they can be built as CMS add-ons like WordPress plugins.
Hosting is another matter. When newsrooms receive sensitive leaks or information from sources, interrogating that data with a commercial, centrally-hosted LLM may not be advisable: doing so would reveal that sensitive data to the service provider. Instead, newsrooms likely to receive this kind of information would be better placed to run their own internal service on their own infrastructure. This is potentially expensive, but it also carries another advantage: advanced newsrooms may also be able to build and train their own corpus of training data rather than using more generic models.
Will LLMs be a part of the newsroom?
Of course — but beware of the hype machine. This kind of AI is a step forward in computing, but it is not a replacement for what we already use. Nor is it going to be the job-destroyer or civilization-changer some have predicted it to be (including VCs, who currently have a lot to lose if AI doesn’t live up to its frothily declared potential).
It’s another creative ingredient. A building block; an accelerator. It’s just as if — imagine that — autocomplete was put on steroids. That’s not nothing, but it’s not everything, either. There will be plenty of really interesting tools designed to help newsrooms do more with scant resources, but I confidently predict that human journalists and editors will still be at the center of it all, doing what they do best. They’ll be reporting, with a human eye — only faster.