
This is an article by Iain McGregor.
Summary
Sound in factual media has never been neutral. Long before AI, audio was shaped through mic placement, editing, and emotional framing. But AI has accelerated and obscured these practices, making sound easier to manipulate, harder to detect, and even more powerful in shaping what we believe. This article explores how audiences already know that sound lies, how manipulation often breaks down when subject matter becomes familiar, and why the stakes are now too high to excuse silence. From nature documentaries to news coverage, the need for transparency has never been clearer. A modest proposal is offered: a voluntary metadata scale, verified by AI, that discloses the extent of audio construction. Because if factual media enters our homes as a guest, it owes us the courtesy of honesty.
When we switch on a news broadcast, a documentary, or live coverage of an event, we are not just watching. We are listening to something that has been invited into our homes. That invitation carries expectations. We want what we hear to be truthful, respectful, and clear. But we also expect it to be engaging, not confusing, not so disturbing that we switch it off. In this sense, factual media behaves like a guest: polite but interesting, dramatic but not overwhelming, just controversial enough to hold our attention but not so disruptive that it must be asked to leave.
To achieve that balance, sound is often shaped. It is equalised, compressed, smoothed, layered. Environmental noise may be suppressed; breath sounds shortened; dramatic ambiance added. These are not new practices. Directional microphones, careful mic placement, and editorial cuts have always influenced what audiences hear. What we experience as “reality” has long been constructed.
Take nature documentaries. Anyone familiar with the species portrayed can usually detect the artifice. Prey animals rarely cry out while being hunted. The wingbeats of owls do not thunder through the forest. Frogs do not croak on cue with the music. But audiences are willing to suspend disbelief, even in factual genres. These sonic exaggerations are not designed to fool experts; they are designed to help general audiences follow a narrative. Music, ambient sound, and reconstructed vocalisations guide our emotions.
The same applies to sports. While the gameplay may be live, the sound rarely is. Commentary is often rerecorded. Offensive chants are mixed out. Impact effects are added. These edits are not hidden, but neither are they openly acknowledged. They serve an implicit contract: to make the experience feel real enough to believe, but shaped enough to control.
Journalism carries a different weight. Audiences rely on it for evidence, for context, for truth. But news audio has never been a purely objective record either. Distant unrest may be softened by noise reduction. Editorial choices about where to place microphones shape which voices we hear. Even before AI, clips were trimmed and spliced to maintain clarity or structure.
One example from media history illustrates how blurred this line has always been. According to a long-running account, which historians dispute, some of Winston Churchill’s most famous wartime broadcasts were voiced not by the man himself, but by actor Norman Shelley. In this telling, Churchill gave the speeches in Parliament, but Shelley performed them into the microphone. The public heard what they believed was Churchill’s voice rallying the nation. And in a way, it was, but it had passed through performance, editorial need, and the limits of early recording technology. Whether or not the story is accurate, its persistence shows how readily we accept that a trusted voice may have been constructed.
Some might argue that documentaries are first and foremost storytelling, and should not be held to the same standards as journalism. But in the context of climate change and ecological crisis, this distinction grows less defensible. When documentaries shape how people understand the natural world, or trust scientific evidence, their impact is no longer merely aesthetic. They are not just entertainment; they are epistemological tools. If a work aspires to be considered documentary, that is, a representation of reality, then the obligation to be transparent about how that reality is shaped must follow. The line between information and persuasion is too thin, and the stakes too urgent, to excuse evasiveness.
AI has not invented any of this. Every technique now used to manipulate, enhance, or fabricate audio already existed. What has changed is that these techniques, once the domain of specialists with years of training and expensive equipment, are now available to anyone with a phone or laptop. AI removes friction. It makes what was once laborious immediate, what was once imperfect seamless, and what was once detectable all but invisible.
With generative tools, a producer can now clone a voice, synthesise background noise, or edit speech mid-sentence without any audible cuts. A technique known as frankenbiting, in which fragments from multiple sources are cut together to create a sentence that was never spoken, can now be done in seconds with voice cloning, and the results are all but undetectable.
Frankenbiting has long existed in reality television and promotional trailers. In documentaries, it is ethically fraught. The audience may hear a voice that sounds authentic, delivering lines that carry emotional weight, without realising that those words were never spoken in that order, or not in that tone, or not in that moment. This is no longer just a matter of editing; it is composition. We are not hearing evidence. We are hearing a performance.
The ethics of sound are no longer the sole concern of broadcasters or documentary filmmakers. With smartphones, editing apps, and powerful AI tools, millions of people now produce media: remixing interviews, creating sound-rich podcasts, narrating everyday life, or constructing fictional personas. As a result, audiences are no longer naive. Many listeners now understand how sound can be compressed, clipped, sweetened, or stitched together, not just because they have seen it, but because they have done it. This growing familiarity does not make manipulation more acceptable. If anything, it sharpens the demand for transparency. When listeners know how easily sound can lie, they have every reason to expect honesty about when, how, and why it has been shaped.
This awareness often sharpens at the moment of personal recognition. Listeners may accept the illusion until the subject matter overlaps with their own lived experience: a protest they attended, a sport they play, an animal they have studied. That is when the dissonance emerges: a sound effect that could not have occurred in that space, a crowd that did not chant that phrase, a silence that felt too clean. These moments break the spell. They reveal that the supposed realism was a crafted approximation all along. Audiences do not need to be told that sound lies. They notice when it fails to lie well.
These concerns are not confined to Western media. Around the world, many audiences are already highly attuned to media manipulation, particularly in environments where state broadcasters, partisan outlets, or corporate platforms control the narrative. In such contexts, trust is often provisional, and audiences may engage with news as something to be decoded, not simply received. But AI-generated sound adds a new layer of uncertainty. When even the voice of a public figure or the noise of a protest can be fabricated or erased with perfect fluency, cynicism may harden into fatalism. The challenge is not simply to avoid deception, but to rebuild trust where it has long been in question, and to ensure that new tools do not deepen old patterns of control.
Even so-called authentic recordings are not neutral. Someone still decides when to press record, where to place the microphone, what to include, and when to stop. Every frame, every mic angle, every moment of captured sound is a choice. AI does not introduce this subjectivity. It only makes it easier to bury. When manipulation becomes frictionless and invisible, the danger is not that audiences will believe too much, but that producers will believe their work is beyond question.
Solutions do not need to be heavy-handed. One possibility is a voluntary metadata layer, a piece of software or open protocol that attaches to any media file and displays a confidence rating or editing scale. This would not dictate what producers can or cannot do, but it would allow them to declare how the sound was shaped: whether voices were spliced, ambient noise replaced, synthetic speech introduced, or levels adjusted. Audiences could then choose whether or not to display this scale, depending on their own level of curiosity or scepticism. And it could be paired with an independent verification service, using AI, ironically, to audit AI, evaluating whether the declared confidence and edit levels match what is actually present. This model supports informed engagement rather than imposed restriction, encouraging a culture of openness rather than compliance.
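To make the idea concrete, here is a minimal sketch in Python of what such a declaration might look like. Everything in it is an assumption made for illustration: the five-step `EditLevel` scale, the `AudioDisclosure` record, and the `undeclared_edits` check are hypothetical names, not part of any existing standard, and the set of "detected" edits simply stands in for whatever an independent verification service might report.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from enum import IntEnum


class EditLevel(IntEnum):
    """Hypothetical 0-4 scale; the levels are illustrative assumptions."""
    UNEDITED = 0           # raw capture, at most trimmed
    LEVELS_ADJUSTED = 1    # equalisation, compression, noise reduction
    AMBIENCE_REPLACED = 2  # environmental sound added or swapped
    SPEECH_SPLICED = 3     # words joined or reordered across takes
    SYNTHETIC_SPEECH = 4   # cloned or generated voice present


@dataclass
class AudioDisclosure:
    """A producer's voluntary declaration of how the sound was shaped."""
    media_id: str
    declared_edits: set[EditLevel] = field(default_factory=set)

    @property
    def scale(self) -> EditLevel:
        # The headline rating is the most invasive edit that was declared.
        return max(self.declared_edits, default=EditLevel.UNEDITED)

    def to_json(self) -> str:
        # Serialise as a sidecar record that could travel with the file.
        return json.dumps({
            "media_id": self.media_id,
            "declared_edits": [e.name for e in sorted(self.declared_edits)],
            "scale": int(self.scale),
        }, indent=2)


def undeclared_edits(disclosure: AudioDisclosure,
                     detected: set[EditLevel]) -> set[EditLevel]:
    """Edits a verifier reports finding that the producer did not declare."""
    return detected - disclosure.declared_edits


# Example: the producer declares level changes and replaced ambience,
# while a (hypothetical) verifier also reports spliced speech.
declared = AudioDisclosure(
    "clip-001",
    {EditLevel.LEVELS_ADJUSTED, EditLevel.AMBIENCE_REPLACED},
)
detected = {EditLevel.LEVELS_ADJUSTED, EditLevel.SPEECH_SPLICED}

print(declared.to_json())
print([e.name for e in sorted(undeclared_edits(declared, detected))])
```

The design choice worth noting is that the declaration and the audit are kept separate. The producer's record says only what was done; the comparison against detected edits is a second, independent step, which mirrors the voluntary-plus-verified structure of the proposal.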
The question is not whether these tools are powerful. They are. The question is whether audiences know when they are being used. When sound is presented as truth, as documentary, as journalism, as witness, then its construction demands transparency.
At present, there are few standards. Media codes of practice have historically focused on visual manipulation, not audio. AI regulation is only beginning to consider sound. Provenance tools and watermarking systems exist in principle, but are not widely implemented. Meanwhile, our trust in what we hear remains largely unprotected.
Perhaps the most urgent ethical question is this: if someone has invited you into their home, how should you behave? Journalism, documentary, and factual broadcasting enter people’s lives as guests. They are welcomed in with a presumption of honesty. If the sounds they bring are stitched together, sweetened, or synthetic, audiences deserve to know. What they hear may still be moving, insightful, and meaningful, but it must not pretend to be something it is not.
Sound is powerful. It can persuade, provoke, soothe, or shock. And it is precisely because of this power that it must be handled with care. AI has not broken the ethics of factual audio. But it has revealed how urgently those ethics need to be made explicit.