What data was used to train OpenAI's Sora?
A transcript of an interview between Joanna Stern of The Wall Street Journal and Mira Murati, the Chief Technology Officer at OpenAI.
WSJ’s Joanna Stern: Every time I watch a Sora clip, I wonder what videos did this AI model learn from? Did the model see any clips of Ferdinand to know what a bull in a china shop should look like? Was it a fan of SpongeBob?
"Wow... You look real good with a mustache, Mr. Krabs."
- From SpongeBob SquarePants / Nickelodeon
By the way, my prompt for this crab said nothing about a mustache.
Joanna Stern: What data was used to train Sora?
Mira Murati: We used publicly available data and licensed data.
Joanna Stern: So, videos on YouTube?
Mira Murati: I'm actually not sure about that.
Joanna Stern: Okay... Videos from Facebook, Instagram?
Mira Murati: You know... if they were publicly available, um, available, yeah, publicly available to use, um, there might be that data, but, um, I'm, I'm not sure. I'm not confident about it.
Joanna Stern: What about Shutterstock? I know you guys have a deal with them.
Mira Murati: I'm, I'm just not going to go into the details of, of the data that was, that was used, but it was publicly available or licensed data.
After the interview, Murati confirmed that the licensed data does include content from Shutterstock.
So, what do you think of this?
The question "What data was used to train Sora?" is a good reminder that the data we put online is being used by big tech companies, with or without our permission. Mira Murati's response made that obvious.
Now is a great time to dive into the Terms and Conditions and Privacy Policy of the platforms we use, especially those that incorporate AI features.
Here’s a thread of updates tracked by Steph Ango (CEO • Obsidian)