
Abstract
Procedural Audio involves real-time, controllable sound generation. It offers an alternative to sound design based on the use of pre-recorded samples. Recent research breakthroughs have resulted in huge advances in sound quality for procedural and generative audio. This approach can now dynamically generate all the sounds of a virtual world.
Description
Across the Creative Industries, sound effects are usually sourced from vast libraries of pre-recorded samples. Their use requires significant searching and editing, is susceptible to repetition and limited to only existing recordings. This is particularly problematic for video games, since sound generation should depend on game state, which may not be possible by just processing samples.
Procedural audio involves the real-time generation of sound, capable of dynamically adapting to changing controls. It allows sounds to change based on game state, with infinite variety and low storage requirements. Procedural audio is not a new concept, but it has not yet been widely adopted due to two challenges; the range and quality of the sounds that it produces. There are concerns that some sound effects may not be possible to generate, and that some generative sounds are lacking in perceived realism.
Nemisindo (the Zulu word for ‘sound effects’) aims to transform sound design through innovation in procedural audio. The approach taken by Nemisindo’s team showed that a single procedural audio engine can be made to produce a wide range of sounds in real-time. Using a common design technique, Nemisindo have demonstrated that a small number of versatile, lightweight procedural audio models can generate the sounds of an entire sound effects library.
In order to improve sound quality, Nemisindo showed that the sound models can be optimised using modern machine learning techniques, ensuring the closest possible match to high quality recordings that can be achieved by a given procedural model. They then went even further to provide high quality and controllable generative sound. Neural audio synthesis (use of neural networks to generate sound) has exceptional sound quality, indistinguishable from recordings, but lacks control. A recent research breakthrough by the team at Nemisindo allowed real-time neural synthesis to be guided by a procedural model. This delivered the best of both worlds; generation of high-quality sounds while inheriting the control interface of the procedural model.
These innovations have been taken out of the lab and delivered as plugins for game engines and audio middleware, online sound effect generators, and film and VR demonstrations. Further advances in procedural audio have the potential to render any auditory world, real or imagined, with real-time control based on events in the scene.
