During Meta's earnings call, Mark Zuckerberg said that Facebook and Instagram data is used to train the company's AI models.
“On Facebook and Instagram, there are hundreds of billions of publicly shared images and tens of billions of public videos, which we estimate is greater than the Common Crawl dataset and people share large numbers of public text posts in comments across our services as well.”
He's playing to win: one unstated competitive advantage is that Meta actually has the legal right to use training data generated on its own services. It's probably not something most users are aware of, but by posting content on Facebook and Instagram, they grant the company rights to use it. If OpenAI runs afoul of copyright law, Meta's models will still have a path forward.
It's a jarring thought, though. I'm certainly not keen on a generative model being trained on my son's face, for example. I'm curious how many users will feel the same way. #AI