Music, May 2026

Vox Artificialis


Auto-Tune in practice. Photo: Kevin Ng.

There’s something irreplaceably human about our voices. How we talk and sing is unique to each one of us, even though a voice can be broken down into its component elements of pitch, duration, volume, and timbre. Those details provide the perfect substrate for technological transformation—the threat of the inhuman.

Even opera, which jealously guards the primacy of the unamplified, unaltered human voice, has not been immune to the transforming power of technology. Opera singers sing to acoustically cut through a hundred-person orchestra; it’s not necessarily a pleasant sensation up close, with a police siren–like oscillating vibrato and belted low notes that sound like a foghorn. Nor is this sound amenable to recording in close quarters, and early attempts to produce opera in studios were challenging—documentary footage of Decca’s classic recording of Richard Wagner’s Ring cycle shows sound engineers endlessly fiddling with microphones to capture the enormity of soprano Birgit Nilsson’s sound.

Soon, technical constraints of microphones and the accompanying phenomenon of the recording star would favor the warm, silky tones of Kiri Te Kanawa or Renée Fleming. Contemporary opera disfavors singers with piercing, buzzy high notes or chesty low notes, even if those voices project more easily in an opera house. Instead, lighter voices with warm, covered sounds are the predominant vocal aesthetic.

But within the realm of popular music, technology has transformed the way we hear human voices. Cher’s 1998 “Believe,” with its futuristic vocal distortions, was the first commercial recording to use Auto-Tune. In the song, Cher’s voice fragments into a rapid succession of stepwise scales, as if someone were running a hand up and down a keyboard. It’s an uncanny effect that blurs the distinction between human and machine, and after T-Pain popularized its use in rap and R&B, it became ubiquitous across every genre of popular music. Such is its pervasiveness that the choice not to use Auto-Tune becomes an artistic statement: on Lady Gaga’s Artpop, the slurred, sliding vocals of “Dope” come as a moment of vulnerability among the glittering artifice of the rest of the album.

Digitization of the human voice, though, is a boon for a popular music industry that prioritizes commercial success over human eccentricity. Most discussions surrounding AI today refer to the subfield of generative AI, whereby massive quantities of data are remixed to produce something resembling the statistical average. Large language models like OpenAI’s ChatGPT rely on billions of published texts in order to “talk” in a seemingly logical and convincing way; this same process can be applied to music. And it’s already happening—an AI-generated song made it onto Billboard’s country music charts last year.

Ninety years ago, the German cultural critic Walter Benjamin argued that forms of mechanical reproduction, such as film and photography, divorce art from its cultural context. Take Leonardo da Vinci’s Mona Lisa (ca. 1503–19) or Vincent van Gogh’s Starry Night (1889), which have been reproduced so many times on postcards, T-shirts, and tote bags that their ubiquity renders them trivial. Though Benjamin was writing about film and photography, the parallels in the age of AI are undeniable, particularly when the chart-topping hits it produces sound so mediocre.

The problem is not one of predictability. Gioachino Rossini, for instance, freely reused music from his previous operas in The Barber of Seville. Rossini understood that he was subject to the whims of the market, and that formulaic arias were more digestible for patrons and made his own compositional process efficient. But AI music making offers something Rossini never had: scale. Unchallenging, predictable songs with widespread appeal can flood the market without the burden of human musicians. Such mechanical reproduction, Benjamin suggests, allows for cultural and financial monopoly—here, through the automation and scaling of an already existing pop-music machinery designed to maximize profit. And such a monopoly runs the risk of eliminating that which makes music making, particularly that of the voice, a recognizably human endeavor.

Run your voice through audio-processing software and you’ll see little peaks and valleys along the terrain of the spectrogram. These tics and eccentricities make our voices individual and distinguishable, and therefore human. For now, most of us can recognize that Siri and other text-to-speech algorithms don’t quite sound human. But what happens when the music that surrounds us in our daily lives becomes swamped with AI-generated singing?

Even if generative AI improves to such an extent that we are unable to distinguish an artificial voice from a human one, that only addresses the outcome rather than the process. A song, whether computer- or human-generated, is what it is, but the process of singing, of engaging your diaphragm to push air through your vocal cords, is innately human. It doesn’t matter if you don’t sound as good as Maria Callas; the act of singing itself, no matter how imperfect, is a powerful human experience. A computer can optimize a song, but it can’t optimize how it feels to sing one, or to hear one sung. The process of singing, with its imperfections and tics, is what makes a song worth listening to. Think of the voices that bring communities together: protest chants, national anthems, rallying cries. It’s hard to imagine those ever being replaced by AI.

The unencumbered incorporation of AI into every aspect of our daily lives poses myriad issues, from environmental cost to surveillance and privacy concerns. But there’s also the potential to erase our individual expression, our unique and unpredictable quirks, the eccentricities of our own voices that we all hate when we listen to them played back on a recording—that which makes us human.
