OpenAI's Voice Selection Process: A Deep Dive into ChatGPT's New Sonic Identity
Quick Summary
- OpenAI's 'Sky' voice sparked controversy because many listeners thought it sounded like Scarlett Johansson.
- The company says it spent five months and reviewed over 400 auditions to find the right voices.
- How an AI sounds is now a core part of building AI products people can trust.
- OpenAI is working on safeguards against fake audio and video, such as watermarking AI-generated voices.
Did OpenAI create the 'Sky' voice from scratch, or did it set out to mimic a famous actress? The controversy over ChatGPT's new voice has put OpenAI's ethical commitments under a spotlight, and it shows how much a voice shapes the way people feel about an AI product. I've examined OpenAI's statements and the public reaction to sort out what actually happened.
Table of Contents
- How OpenAI Chose Voices: What They Say vs. What People Think
- The 'Sky' Problem: A PR Mess
- The 5-Month Search for Voices: OpenAI's Ethical Plan
- How the Voice Tech Works
- Safety and Risks: The Future of AI Voice Copying
- Why 'Sounding Good' Matters for GPT-4o
- The Bottom Line: A Needed Break in the AI Voice Race
- Looking Closer: How OpenAI Picked Its Voices
- What People Are Saying: How the Public Feels
- My Final Thoughts: A Big Lesson in AI Ethics and Trust
How OpenAI Chose Voices: What They Say vs. What People Think
OpenAI recently released GPT-4o with new, more natural-sounding voices. But one voice, called 'Sky,' quickly caused a stir. The company insists they followed a careful and ethical process to choose their voices. However, most people immediately thought it sounded just like Scarlett Johansson. This article will examine OpenAI's claims and the public's reaction, looking at the technology behind the voices and why they matter so much.
The 'Sky' Problem: A PR Mess
The moment OpenAI showed off the new voices for GPT-4o, the internet went wild. Many users said the 'Sky' voice sounded exactly like Scarlett Johansson, causing a huge public relations headache. OpenAI's CEO, Sam Altman, quickly posted online on May 20, 2024, saying: 'The voice of Sky is not Scarlett Johansson's, and we never intended it to sound like her. We hired the voice actor for Sky's voice before we ever talked to Ms. Johansson. Out of respect for Ms. Johansson, we've stopped using Sky's voice in our products. We apologize to Ms. Johansson for not communicating this better.' (OpenAI Blog, May 2024)
I've reviewed the timeline OpenAI shared, and their selection process appears to have begun before any contact with the actress. OpenAI says that on September 11, 2023, Sam Altman approached Ms. Johansson and her team about her possibly being a voice actor. She declined a week later, on September 18, 2023. Importantly, the 'Sky' voice, along with four others, launched in ChatGPT on September 25, 2023, *after* Johansson had already declined. The controversy flared up again when OpenAI contacted Ms. Johansson's team on May 10, 2024, asking her to reconsider ahead of the GPT-4o launch. After her team raised concerns again, OpenAI paused use of 'Sky' on May 19, 2024 (OpenAI Blog, May 2024). The episode shows how much weight a voice carries for an AI product, and how delicate the balance is between sounding appealing and inadvertently evoking a famous person.
The 5-Month Search for Voices: OpenAI's Ethical Plan
OpenAI claims they didn't rush to pick voices. Instead, they say it was a careful, five-month process (from May to September 2023) where they worked with professional voice actors, talent agencies, and casting directors (OpenAI Blog, May 2024). From what I can tell from their official statement, they really tried to set a good example for how AI voices should be chosen ethically.
They worked with award-winning casting directors to define what they were looking for: voices that were 'timeless,' 'approachable,' 'inspire trust,' and came from 'diverse backgrounds' (OpenAI Blog, May 2024). In May 2023, they issued a casting call and received over 400 auditions from voice and screen actors. They narrowed this down to a shortlist of 14, and finally picked the five voices: Breeze, Cove, Ember, Juniper, and Sky. OpenAI also points out that each actor is paid 'much more than the usual top rates' and continues to be paid for as long as their voice is used. That arrangement supports actors directly and sets a strong ethical precedent for the growing AI voice industry.
How the Voice Tech Works
Putting the controversy aside, it's important to understand the technology behind these voices. OpenAI uses text-to-speech (TTS) models, like gpt-4o-mini-tts, tts-1, and tts-1-hd, to create these voices (OpenAI Docs). These models are designed to turn written text into realistic-sounding audio, understanding the different ways people speak, their accents, and how they say things.
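To make the API concrete, here is a minimal sketch of how a developer might generate speech with one of these models using OpenAI's official Python SDK. The request-building helper is a hypothetical convenience I've added for illustration; the model names (`tts-1`, `tts-1-hd`, `gpt-4o-mini-tts`) and preset voice names come from OpenAI's documentation, and the call requires an `OPENAI_API_KEY` in the environment.

```python
def build_tts_request(text, model="tts-1", voice="alloy", fmt="mp3"):
    """Assemble parameters for a text-to-speech request.

    `model` may be "tts-1" (lower latency), "tts-1-hd" (higher quality),
    or "gpt-4o-mini-tts"; `voice` is one of OpenAI's preset voices.
    """
    return {"model": model, "voice": voice, "input": text, "response_format": fmt}


def synthesize(text, out_path="speech.mp3"):
    """Stream synthesized audio to a file (network call; sketch only)."""
    # Imported lazily so the helper above works without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    params = build_tts_request(text)
    with client.audio.speech.with_streaming_response.create(**params) as resp:
        resp.stream_to_file(out_path)
    return out_path
```

Note that only the preset voices are available through the public API; the custom-voice process described below is a separate, gated program.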
If you want to create a custom AI voice, OpenAI has a defined process for that. You need to provide two specific audio recordings: a 15-second sample of the target voice and a recording in which the voice actor gives explicit permission for OpenAI to create a synthetic voice model (OpenAI Docs). For high-quality custom voices, you need to record in a quiet place with no echo and use a professional XLR microphone (OpenAI Docs). The strictness of these requirements underscores how faithfully the technology can reproduce human speech.
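As a practical aside, the stated 15-second sample requirement is something you could sanity-check locally before submitting a recording. The sketch below assumes a WAV file and uses Python's standard `wave` module; OpenAI's actual intake validation is not public, so treat this as an illustrative pre-flight check only.

```python
import wave


def check_voice_sample(path, required_seconds=15.0):
    """Return True if the WAV file at `path` is at least `required_seconds` long.

    Illustrative local check against the stated 15-second sample minimum;
    it does not verify audio quality, noise floor, or echo.
    """
    with wave.open(path, "rb") as w:
        duration = w.getnframes() / float(w.getframerate())
    return duration >= required_seconds
```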
Safety and Risks: The Future of AI Voice Copying
Being able to create very realistic voices comes with big risks, especially when it comes to faking someone's voice or creating deepfakes. OpenAI knows this well, and their safety research shows the steps they're taking to reduce these dangers (OpenAI Blog, Voice Engine). I've seen that they are working on adding hidden codes to AI-generated audio and actively watching for misuse.
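To give a feel for what 'hidden codes' in audio means, here is a toy illustration of the general idea: embedding a machine-readable bit-string in the least-significant bits of 16-bit PCM samples, where it is inaudible but recoverable by a detector. This is emphatically not OpenAI's scheme, which is unpublished and certainly far more robust; it only demonstrates the concept of marking generated audio.

```python
def embed_watermark(samples, bits):
    """Return a copy of `samples` (16-bit PCM ints) whose least-significant
    bits carry `bits` (a string of '0'/'1'), repeated across the signal.
    Each sample changes by at most 1, well below audibility."""
    out = []
    for i, s in enumerate(samples):
        b = int(bits[i % len(bits)])
        out.append((s & ~1) | b)
    return out


def extract_watermark(samples, n_bits):
    """Read back the first `n_bits` least-significant bits."""
    return "".join(str(s & 1) for s in samples[:n_bits])
```

Real audio watermarks must survive compression, resampling, and re-recording, which a naive LSB scheme like this would not; that robustness problem is exactly why detection and monitoring for misuse remain active research areas.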
Also, OpenAI has strict rules for using custom voices. They clearly forbid impersonating someone without their permission and require users to say when a voice is AI-generated (OpenAI Blog, Voice Engine). This is a really important step to keep things honest and build trust. Looking ahead, OpenAI even plans to stop using voice for security checks because they know how risky advanced voice copying can be. This proactive approach is key to dealing with the bigger problem of deepfakes, something we talked about more when comparing Pindrop's Deepfake Warranty vs. Google & OpenAI.
Why 'Sounding Good' Matters for GPT-4o
Choosing a voice isn't just about avoiding trouble; it's a key part of OpenAI's plan for its products, especially with the new GPT-4o. This new model has amazing voice abilities, letting it handle interruptions, join group chats, block out background noise, and change its tone (OpenAI Blog, May 2024). The goal is to make talking with AI feel truly 'natural,' so it's more like talking to a person and less like talking to a robot.
The personality of the voice itself – whether it sounds 'warm, engaging, confidence-inspiring,' or 'approachable' – directly affects how much people trust and interact with it (OpenAI Blog, May 2024). This focus on 'sonic identity' is a big part of the trend we saw in Beyond the 'Sky' Controversy: Unpacking OpenAI's Strategic Voice Selection and AI Agent Future, where AI assistants become more like personal companions. OpenAI gets this, and they plan to add more voices later to suit different user preferences, making the AI experience even more personal.
The Bottom Line: A Needed Break in the AI Voice Race
The 'Sky' issue made OpenAI pause and rethink things, showing just how tricky it is to create AI personalities ethically. While OpenAI's technical explanation says 'Sky' was 'not an imitation' of Scarlett Johansson (OpenAI Blog, May 2024), people clearly saw it differently. The decision to stop using the 'Sky' voice 'out of respect' for Ms. Johansson (OpenAI Blog, May 2024) really shows how much public opinion matters.
Here's my view: OpenAI has explained a solid and ethical process for choosing voices and creating custom ones. But managing how the public sees things, especially when it looks like a celebrity, is still a big challenge for all AI companies. It's a reminder that in the race to create advanced AI, we can't forget the human side and what people expect.
Looking Closer: How OpenAI Picked Its Voices
To really understand the effort OpenAI claims to have put into choosing its voices, I've gathered some key numbers from their official statements. This isn't just about finding a nice voice; it's a structured process with several steps.
| What We're Measuring | OpenAI's Process | Typical Process (Estimate) |
|---|---|---|
| How long it took | 5 months | 1-3 months |
| How many auditions they got | Over 400 | 50-200 |
| How many final voices were chosen | 5 | 3-5 |
| How actors are paid | Better than top rates + ongoing pay | Standard rates |
As you can see, OpenAI's process seems more thorough and pays better than what's usually done. The large number of auditions and the longer time suggest they were serious about finding the 'right' voices, not just any voices.
What People Are Saying: How the Public Feels
While this article didn't look at specific online comments, the public's reaction to the 'Sky' voice itself is a strong sign of how people feel. The fact that everyone immediately compared it to Scarlett Johansson's voice shows a big challenge for AI companies: the difference between what the tech can do and how people perceive it.
People quickly pointed out the similarities, leading to lots of talk on social media. This wasn't just about a voice; it was about being real, owning creative work, and the ethical lines of AI copying. The fact that OpenAI felt they had to stop using the voice 'out of respect' for Ms. Johansson, even though they said it was an original creation, shows how much public opinion matters in deciding how AI products are made.
My Final Thoughts: A Big Lesson in AI Ethics and Trust
OpenAI's detailed explanation of how they chose voices, including their commitment to ethical sourcing and generous actor pay, shows a company trying to develop AI responsibly. However, the 'Sky' controversy clearly proves that following ethical rules isn't always enough to convince people. The perceived resemblance to Scarlett Johansson, regardless of OpenAI's intent or timeline, opened a real gap in trust.
For anyone interested in AI ethics, tech, or product development, this is a really important case study. It shows how hard it is to create engaging AI personalities while dealing with celebrity rights, public perception, and our natural tendency to find patterns and similarities. OpenAI's decision to pause 'Sky' was necessary, not just to be respectful, but as a smart move acknowledging that in the world of AI, how something sounds is as important as how it looks, and we always need to be careful about the ethical side.
Frequently Asked Questions
Could OpenAI face legal trouble even though a different actor recorded 'Sky'?
Even though OpenAI says the voice was recorded by a different actor, laws about a person's right to control their public image suggest that using a voice that sounds 'too similar' to a famous person could lead to legal problems, even if it's not a direct copy.
How does OpenAI compensate its voice actors?
OpenAI says they pay actors 'better than the usual top rates' and give them ongoing payments for as long as the AI voice is used in their products. They aim to set a higher ethical standard for the industry.
Is voice-based authentication still safe given this technology?
OpenAI has admitted there are risks and is actually planning to stop using voice for security checks. They realize that high-quality voice copying technology makes it less reliable.
