Stability AI debuts new Stable Audio Open for sound design

Stability AI is opening up its generative AI efforts for audio today with the release of Stable Audio Open 1.0.

Stability AI is perhaps best known for its Stable Diffusion text-to-image generation technology, but that’s only one part of the company’s broader portfolio, which includes multiple models for code, text and audio. In September 2023, Stability AI first publicly launched Stable Audio as a text-to-audio generative AI tool. Stable Audio 2.0 was released on April 3, bringing more clarity and length to the generated audio.

While the full Stable Audio tool is available for general commercial use and can generate audio of up to three minutes, the new Stable Audio Open is significantly more limited. The purpose of Stable Audio Open is not to create full songs; rather, it has a restricted focus on shorter pieces such as sound effects.

Stable Audio Open, as the name implies, is also an open model, though it’s not technically open source. Rather than using an Open Source Initiative (OSI) approved license, Stable Audio Open is available to users under the Stability AI non-commercial research community agreement license. That license provides open access to the model, but it limits what users can do with it.

“Our goal with Stable Audio Open is to provide audio researchers and producers with hands-on access to one of our generative audio models in order to accelerate research, adoption, and practical creative use of these incredible new tools,” Zach Evans, head of audio research at Stability AI, told VentureBeat.

What exactly is Stable Audio Open?

Stable Audio Open is a specialized model optimized for creating things like drum beats, instrument riffs, ambient sounds and other audio samples for music production and sound design. 

Unlike Stability AI’s commercial Stable Audio product, which produces longer, coherent musical tracks up to three minutes in length, Stable Audio Open is focused on generating high-quality audio data up to 47 seconds long using text prompts.

Stability AI has also taken a responsible approach to how the model was trained. The model was trained on audio data from FreeSound and the Free Music Archive, ensuring that no copyrighted or proprietary material was used without permission.

Unleashing creativity with fine-tuning on Stable Audio Open

One of the key benefits of the Stable Audio Open release is that users can fine-tune the model on their own custom audio data. For instance, a drummer could fine-tune the model on samples of their own drum recordings to generate new, unique beats.

Fine-tuning of Stable Audio Open is enabled via the Stable Audio Tools library, which is licensed under an actual open-source license. The Stable Audio Open model weights are now available on Hugging Face.

“The audio research team is constantly working on ways to improve the quality and controllability of our generative audio models,” Evans said. “We look forward to further commercial and open model releases that reflect the progress made by our research.”
