OpenAI Unveils Cutting-Edge AI Voice Generator—Access Limited for Now
OpenAI revealed its Voice Engine tool Friday, an AI generator that can use text and an audio sample to create “natural-sounding speech that closely resembles the original speaker,” but the company said it will not be publicly available yet due to the “potential for synthetic voice misuse.”
OpenAI said in a blog post that it first created the tool in 2022 and has been testing it with a small group of partners who have identified potential uses for it such as giving reading assistance, translating content, supporting non-verbal people and helping people with speech conditions recover their voices.
Voice Engine needs only a 15-second audio sample and a text sample to “create emotive and realistic voices,” which can also be generated in languages other than the user’s own.
The company said people who have had access to the tool for testing have agreed to usage policies that ban impersonating a person or organization without consent and require “explicit and informed consent” from the original speakers.
OpenAI also said it is incorporating feedback from the government and others as it builds the product, in part because the company understands “generating speech that resembles people's voices has serious risks” and said those risks are “top of mind in an election year.”
The formal announcement of the voice generating tool follows a similar rollout of Sora—an OpenAI tool that allows people to create video from text—which was publicly previewed and announced last month, but only became available to a small group, not the general public.
It remains unclear when Voice Engine will be released to the public, if it is released at all. The OpenAI blog post said the company will make “a more informed decision about whether and how to deploy this technology at scale” after more tests and conversations about the “responsible deployment of synthetic voices.”
“We hope this preview of Voice Engine both underscores its potential and also motivates the need to bolster societal resilience against the challenges brought by ever more convincing generative models,” OpenAI said in the blog post Friday.
Other companies, ranging from smaller groups like ElevenLabs to tech giants like Google, have released tools that can mimic voices and are working on text-to-voice AI services, but concerns about how the tools are used have been high in recent months. Earlier this year in New Hampshire, robocalls mimicking President Joe Biden circulated, encouraging people not to vote in the state’s Democratic primary. The calls were eventually traced to a New Orleans-based magician who said he was hired by a Democratic political consultant to create them. After the calls, the Federal Communications Commission unanimously adopted a ruling that effectively made the use of AI-generated voices in robocalls illegal.