ChatGPT maker OpenAI said on Monday it would release a new AI model called GPT-4o, capable of realistic voice conversation and able to interact across text and images.
The new audio capabilities let users speak to ChatGPT and receive responses in real time, and to interrupt it while it is speaking, both hallmarks of natural conversation that AI voice assistants have struggled to handle, OpenAI researchers demonstrated at a livestream event.
At the event, the researchers showed off ChatGPT’s new voice assistant capabilities. In one demo, ChatGPT used its vision and voice capabilities to talk a researcher through solving a math equation written on a sheet of paper. In another, researchers showed GPT-4o performing real-time language translation.
OpenAI’s chief technology officer, Mira Murati, said during the livestream that the new GPT-4o model would be offered for free because it is more efficient than the company’s previous models. Paid users of GPT-4o will have greater capacity limits than free users, she said.
“It feels like AI from the movies … Talking to a computer has never felt really natural for me; now it does,” OpenAI CEO Sam Altman wrote in a blog post.