
SmartDJ is a new AI-powered audio editing tool developed by researchers at the University of Pennsylvania that lets users reshape immersive audio experiences using simple natural language instructions.
Key Innovation
Instead of manually tweaking sounds or using rigid templates, users can give high-level prompts like “make this sound like a busy office.” SmartDJ interprets the request and breaks it down into a sequence of explicit editing steps (e.g., “Add the sound of a phone ringing on the right at +3dB”). It then applies those changes to the audio while preserving the spatial (stereo) cues that make the result sound realistically three-dimensional.
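To make the example step concrete, here is a minimal sketch of what "on the right at +3dB" means in plain signal terms. It is not how SmartDJ performs edits (the article says a diffusion model does that). The `EditStep` class, the constant-power pan law, and the reading of +3 dB as a gain on the added sound are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EditStep:
    """Hypothetical structured form of one editing step."""
    action: str     # e.g. "add"
    sound: str      # e.g. "phone ringing"
    pan: float      # -1.0 = full left, 0.0 = center, +1.0 = full right
    gain_db: float  # level change in decibels (assumed to apply to the added sound)


def db_to_amplitude(gain_db: float) -> float:
    """Convert a decibel change into a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def pan_gains(pan: float) -> Tuple[float, float]:
    """Constant-power pan law: return (left, right) channel gains."""
    angle = (pan + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)


def mix_step(bed: List[Tuple[float, float]], source: List[float],
             step: EditStep) -> List[Tuple[float, float]]:
    """Mix a mono source into a stereo bed given as (left, right) samples."""
    amp = db_to_amplitude(step.gain_db)
    left, right = pan_gains(step.pan)
    mixed = []
    for i, (l, r) in enumerate(bed):
        s = source[i] * amp if i < len(source) else 0.0
        mixed.append((l + s * left, r + s * right))
    return mixed


# "Add the sound of a phone ringing on the right at +3dB"
step = EditStep(action="add", sound="phone ringing", pan=1.0, gain_db=3.0)
bed = [(0.0, 0.0)] * 4            # silent stereo background
ring = [0.5, -0.5, 0.5, -0.5]     # toy mono "ring" signal
mixed = mix_step(bed, ring, step)
```

A gain of +3 dB corresponds to an amplitude factor of about 1.41, which is roughly a doubling of power. With `pan=1.0` the added sound lands almost entirely in the right channel.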
How It Works
The system combines two AI components:
- An audio language model (ALM) that understands both sound and text, analyzes the original audio and user prompt, and generates the step-by-step editing plan.
- A diffusion model that carries out the actual audio modifications one step at a time.
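The division of labor between the two components can be sketched as a plan-then-apply loop. Everything below is a toy stand-in: `run_edit`, `toy_planner`, and `toy_editor` are hypothetical names, the "audio" is just a list of layer descriptions, and the second step in the plan is invented for illustration. The real system uses an audio language model and a diffusion model in these two roles.

```python
from typing import Callable, List, Tuple

Scene = List[str]  # toy stand-in for audio: a list of sound-layer descriptions


def run_edit(scene: Scene, prompt: str,
             planner: Callable[[Scene, str], List[str]],
             editor: Callable[[Scene, str], Scene]) -> Tuple[Scene, List[str]]:
    """Ask the planner for a step list, then apply the steps one at a time."""
    steps = planner(scene, prompt)
    for step in steps:
        scene = editor(scene, step)
    return scene, steps


def toy_planner(scene: Scene, prompt: str) -> List[str]:
    """Placeholder for the audio language model: returns a canned plan."""
    return [
        "Add the sound of a phone ringing on the right at +3dB",
        "Add the sound of keyboard typing in the center",  # invented example step
    ]


def toy_editor(scene: Scene, step: str) -> Scene:
    """Placeholder for the diffusion editor: records the step as a new layer."""
    return scene + [step]


final_scene, plan = run_edit(
    ["room tone"], "make this sound like a busy office", toy_planner, toy_editor
)
```

Because the plan is an ordinary list of text steps, it can be inspected or changed before any audio is touched, which is the property the article describes as interpretability.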
The team built a specialized training dataset from public sound libraries. Large language models generated the prompts and editing steps, and audio processing rendered the edited audio, so that each example links a goal to its actions and their results.
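One way to picture a single training example is as a record that ties a goal to its actions and to the audio produced after each action. The article does not describe the actual schema, so the `EditExample` fields, the `build_example` helper, and the string clip identifiers below are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EditExample:
    """Hypothetical goal -> actions -> results record."""
    goal: str            # high-level prompt
    actions: List[str]   # step-by-step edits that realize the goal
    source: str          # identifier of the starting clip
    results: List[str]   # identifier of the rendered clip after each action


def build_example(goal: str, actions: List[str], source: str,
                  render: Callable[[str, str], str]) -> EditExample:
    """Render every action in order so each one has a matching result."""
    results = []
    current = source
    for action in actions:
        current = render(current, action)
        results.append(current)
    return EditExample(goal, actions, source, results)


def toy_render(clip_id: str, action: str) -> str:
    """Placeholder for real audio processing: derives a new clip identifier."""
    return f"{clip_id}+[{action}]"


example = build_example(
    "make this sound like a busy office",
    ["add phone ringing, right, +3dB"],
    "clip_000",
    toy_render,
)
```

Keeping one result per action means a model can be supervised on individual steps as well as on the final outcome, assuming the dataset is organized this way.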
Performance
In tests (quantitative metrics and human studies), SmartDJ outperformed previous AI audio-editing tools in:
- Audio quality
- How well the result matched the user’s instruction
- Accurate spatial sound placement
The editing steps are also interpretable: users can review, modify, or add steps manually if desired.
Significance
SmartDJ bridges the gap between intuitive text-based control (already common for images and text) and audio editing. It makes professional-level sound design much more accessible for applications like:
- Virtual/augmented reality
- Gaming
- Sound design
- Virtual conferencing
- Interactive media
Key quote from senior author Mingmin Zhao:
“With SmartDJ, users can describe the outcome they want in natural language, and the system figures out how to make it happen… This unlocks similar possibilities for audio, making it easier for more people to bring their ideas to life.”
Overall, SmartDJ moves audio editing toward the kind of conversational, text-driven control that users already know from AI assistants.