ISD Team
18 Mar 2026
Group of diverse office colleagues enjoying drinks and a casual conversation in a bright workspace.

MIT neuroscientists have provided a computational explanation for the longstanding “cocktail party problem” — how the brain isolates a single voice from a noisy, multi-voice environment. Using a modified neural network, the team found that amplifying the activity of processing units tuned to a target voice’s features — such as pitch — is sufficient to bring that voice to the foreground of attention, a mechanism known as “multiplicative gain.”

The model closely mirrors human behavior, including the kinds of errors people make, such as struggling to distinguish between two voices of the same gender due to similar pitch. Spatial location also plays a key role: both humans and the model perform significantly better at isolating voices separated horizontally (left vs. right) than vertically (up vs. down), a finding the model helped predict before being confirmed in human experiments. The researchers hope to apply this work to improve cochlear implants, helping users more effectively filter out background noise in crowded settings.

Key Findings:

Amplifying the Right Signal: Focusing on a particular voice causes the brain’s neurons — specifically those sensitive to that voice’s characteristics like pitch — to ramp up their activity, essentially turning up the volume on that signal.

Voice as a Reference Point: The model is primed with a short sample of the target voice, which tells it which neural units to prioritize. A deep voice, for instance, boosts the units sensitive to low frequencies while dialing back those tuned to higher ones.

Left-Right Beats Up-Down: MIT researchers found that separating voices side-to-side makes them far easier to distinguish than separating them above and below — a pattern that held true for both the model and human listeners.

Human-Like Blind Spots: The model replicates the same mistakes people make, such as having difficulty telling apart two male or two female voices, since similar pitch ranges make selective listening harder.

To the study

Share on