Art and Technology
November 2024

Taking Stock of Generative “AI”: Systematic Work of Michael Mandiberg, Penelope Umbrico, and Trevor Paglen

img1

Trevor Paglen, A Window Grid (Corpus: The Interpretation of Dreams), 2017– . Dye-sublimation print on aluminum, 41 3/5 x 53 inches. Print Edition: 5 + 2AP. Courtesy the artist.

Last month, I ended this column on an argument made by the philosopher Jennifer Corns that agency may be best served when we consider our various agentive forms. Thinking in multiples challenges us because it presents different perspectives, and so requires considering assorted contexts. Yet from the strangeness of quantum physics in the 1920s, to 1970s systems thinking, to globalization at the end of the twentieth century, much demands that the twenty-first century consider association, networks, and the uncertainties of potentialities. This moment around generative “AI” has some artists trying to express how “artificial intelligence” is not a single thing, and how it is a medium of quantities.

Though we all know about the internet scraping behind the data sets that power these public generators, it is hard to fathom the sheer quantity of data necessary for an output. Michael Mandiberg’s series “Taking Stock” visualizes the many that are always a part of any singular generated image, without ever using generative “AI” (more on the term at the end). I am looking at a slightly blurred image of a person, hands covering mouth, eyes wide. The print is about life size. As the figure looks back at me, we confront each other’s participation in this system.

Starting several years ago, Mandiberg sourced 130 million stock photographs from the internet. From that data set, they work with about 75 million, having eliminated any that present multiple people or do not include a whole face (thereby eliminating close-ups of eyes, hands, legs, etc.). These are grouped according to the position of the bodies or hand gestures, among other classifications. Machine learning is necessary here: given the quantity of source material, they have to use the statistical models that anchor it in order to sort and understand what is there. Such tools enable the production of subsets: grouping the data by keywords into what they call “thematic topics,” or finding the bodies and grouping those by pose. Faces looking directly at the camera, as used for “Taking Stock,” constitute 5 million photographs.

img2

Michael Mandiberg, Topic 32, Pose 13, Gesture 2 (shock, surprise, mouth, etc.), 2024. Pigment print made from 64 found photographs selected and sequenced by custom AI and Machine Learning software. Courtesy the artist.

Further refining, for the thematic topic related to “phone, mobile, communication, etc.,” Mandiberg derived 76,593 images from the 5 million. Wanting a model’s hands centrally placed at the torso narrowed it to 5,928 images. Filtering out subtle variations in hand gestures (prayer or interlocked fingers) established a cluster of 2,618 images. Working through this process revealed a potential 5,687 clusters that contain at least 100 stock photos—a minimum to make one of their images—and may include thousands. Each thematic topic has a data point that represents its statistical center: an ideal derived from the photographs of the cluster but a point at which no photograph exists.

A lot of writing on this topic will refer to that data point, and the way it is passed through the system, as a “neuron,” but both Mandiberg and I agree that language conflating brain and machine processes needlessly obfuscates—besides reinforcing a problematic ideology, instantiating a relationship that many scientists would dispute, and erasing distinctions that could help with the ethical questions arising from this software’s proliferation. The cluster produces that data point, but that data point also anchors those photographs for the thematic topic (and any new ones that may be introduced into the data set). Each photograph has a statistical relationship to the cluster’s ideal. This is how image generators work (somewhat simplified), and why Emily M. Bender and Alex Hanna have referred to them as “mathy math” instead of “AI”—precisely to disrupt the grandiloquence the more popular term implies.

If you are still following me, that cluster of photographs—with a degree of probabilistic proximity to the idealized data point—is what Mandiberg extracts as the material that goes into the works in the “Taking Stock” series. From the hundreds or thousands contained in the thematic topic, Mandiberg’s code selects the sixty-four images nearest the statistical center and merges them. That number was arrived at through trial and error over the project’s two-year development. In other words, this project stems not only from an extensive computational filtering process but also from a laborious engagement with this set of images to produce a “palette” (permit me to jump practices for the sake of a metaphor) that must then still be fine-tuned. Using more than 128 photographs lost the relationship of the layers, merging them into something “too smooth,” as Mandiberg said. The soft edges of the figures in each image and the subtle presence of the watermarks are necessary reminders that this is a compound image. And… the refinement continues.
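To make the centroid-and-nearest-neighbors idea concrete, here is a minimal sketch in plain Python. The toy vectors, the functions, and the Euclidean distance measure are my illustrative assumptions, not Mandiberg's actual code; real systems would use high-dimensional image embeddings.

```python
import math

def centroid(points):
    """The cluster's statistical center: the per-dimension mean of its
    feature vectors -- derived from every photograph, coinciding with none."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def nearest(points, center, k):
    """Rank photographs by distance to that ideal and keep the k closest."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], center))
    return order[:k]

# Four toy "photographs" as 2-D features; a real cluster would hold
# hundreds or thousands of images, with k = 64.
photos = [[0, 0], [2, 0], [0, 2], [10, 10]]
center = centroid(photos)            # [3.0, 3.0] -- no photograph sits there
closest = nearest(photos, center, 2)
```

Note how the center is a point at which no photograph exists, exactly as described above: it is produced by the cluster, yet every image relates to it only statistically.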

I could not take sufficient notes during a studio visit prior to Paris Photo, where these works will be on display for the first time, so I followed up afterward, and Mandiberg explained the process:

I select the median images based on the position of the body. People holding black phones are more likely to be photographed looking down at their phones in concentration, while people with pink phones are more likely to look directly at the camera and smile. To merge each image I wrote simple code that merges pairs of images, until all images have been merged.

This produces the translucent, impressionistic content of the final image.
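Mandiberg’s description of “simple code that merges pairs of images, until all images have been merged” suggests a tournament of 50/50 blends. A sketch under that assumption, with images reduced to flat lists of grayscale pixel values (real code would blend actual image files; the structure, not the detail, is the point):

```python
def blend(a, b):
    """Average two images, here represented as flat lists of pixel values."""
    return [(x + y) / 2 for x, y in zip(a, b)]

def merge_all(images):
    """Merge pairs of images, round by round, until one composite remains.
    With 64 inputs (a power of two), six rounds of 50/50 blends give every
    source photograph equal weight in the final image."""
    while len(images) > 1:
        merged = [blend(a, b) for a, b in zip(images[::2], images[1::2])]
        if len(images) % 2:            # carry an unpaired image to the next round
            merged.append(images[-1])
        images = merged
    return images[0]

# Four one-pixel "images": the composite is their overall average.
result = merge_all([[0], [0], [0], [255]])   # -> [63.75]
```

Pairwise merging like this is one plausible reading of why sixty-four works so well: at a power of two, every layer survives into the composite with equal weight, preserving the translucent, ghosted quality of the prints.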

img3

Michael Mandiberg, Topic 63, Pose 23, Gesture 21, Object 67 – Black Phone (phone, mobile, communication, etc.), 2024. Pigment print made from 64 found photographs selected and sequenced by custom AI and Machine Learning software. Courtesy the artist.


img4

Michael Mandiberg. Topic 63, Pose 23, Gesture 21, Object 67 – Pink Phone (phone, mobile, communication, etc.), 2024. Pigment print made from 64 found photographs selected and sequenced by custom AI and Machine Learning software. Courtesy the artist.

Without using “generative AI,” Mandiberg’s process laboriously emulates it to reveal the constructed nature of the data sets that undergird public generators. Looking carefully at the images reveals faint impressions of watermarks or symbols that remind us of these images’ origins. Stock photographs make up a significant part of the training data, but they have a complicated history, with unexpected geopolitics accompanying their better-known gender and race biases.

Mandiberg discovered that Ukraine and Russia have each authored more than twice as many images as the United States; Belarus and Serbia have each authored more than France, Germany, and the United Kingdom combined. The vast majority of those are tagged as white/caucasian/European. Before the EU even considered Ukraine, the image market was already situated there. The stock pic of the “girl next door” isn’t from some US suburb; rather, emulating that stereotype markets it to the millions who live in that cul-de-sac, though they may not conceive of that Ukrainian woman as a neighbor. This is the reorientation of our day.

The magic, excitement, and despair surrounding generators can be easily tempered when we begin to grasp how they work. Mandiberg has been making work about labor, commerce, and internet culture for over two decades. From AfterSherrieLevine.com (2001) that appropriated the appropriation artist for a right-click–save culture, to Print Wikipedia (2009–16) that manifested the density of the site’s content in 7,473 volumes, to Postmodern Times (2017), reimagining Chaplin’s Modern Times (1936) using Fiverr gig work, Mandiberg stages the machinations behind a culture of user design that emphasizes efficiency, functionalism, and utility over all other values. “Taking Stock” invites us to look between and through the pixels. They discern a poetics amidst the politics, illuminating the ethical quandaries produced by these probabilistic systems. That is what aesthetics shapes for our consideration.

img5

Penelope Umbrico, 5377183 Suns from Sunsets from Flickr (Partial), 2009. 104 x 288 inches. Installation view: SFMOMA, San Francisco, 2009. Courtesy the artist.

When Penelope Umbrico made Suns from Sunsets from Flickr (2006– ), the project revealed not only the sun as the most commonly tagged photographic subject on the site but also the similarity of what we all capture. The 541,795 suns she appropriated in 2006 became 2,303,057 suns a year later, and by 2016 the set included 30,240,577 images. Still the same pic, though. And, like Mandiberg, she also notes the copyright symbols, in 72 Copyrighted Suns / Screengrabs (2009–12). When the suns are displayed—necessarily as partial installations of the sum total—the effect is profound, for me heartbreakingly so. The riot of red, orange, and yellow across the consumer lab prints of each image reminds me of standing in stores waiting to pick up my rolls of film. I don’t do that now—I haven’t printed a photo in… twenty years? I appreciate the tenderness in Umbrico’s memorializing our similarity, “our collective practice” as she calls it, even as I wonder at the visual ideology that these pictures manifest. Everyone’s Photos Any License (2015–16) shows this occurrence within the specialized practice of photographing a full moon. Umbrico’s 2020 video work, Cloud/Paper/Screen (218 Photographs of Clouds in the George Eastman Museum Collection, 1850–2006), blends these images to show not only what clouds do but also our internet infrastructure (as Mia Stern described in the July column). This morphing brings me back to generative systems.

img6

Penelope Umbrico, Screenshot 2015-11-04 14.22.59, 2015 from Everyone’s Photos Any License. Archival pigment print 12 x 534 inches. Installation view: Bruce Silverstein Gallery, New York, 2016. Courtesy the artist.

In London, during Frieze Week, I got to see some of the works from Trevor Paglen’s series “Evolved Hallucinations” (2017– ), now showing at Paris Photo. This body of work dives into a generative adversarial network (GAN) to present the “primitives” in the latent space. The images of a trained data set have subcomponents (primitives) that relate to the focus of the set; for example “a banana is likely to have two arcs ranging from the top to the bottom of the fruit; it could have yellow color gradients, some brown spots, a stem at the bottom, and so on,” as he stated in the interview with Anthony Downey that is part of Paglen’s book, Adversarially Evolved Hallucinations (2024).

Paglen went spelunking into various data sets he created, like “OMENS AND PORTENTS,” which includes categories like “rainbows” and “black cats.” The discriminator gets trained on many such images, and then the generator aims to produce a rainbow or black cat that the discriminator will accept as such; the generator gets better and better at producing primitives that have not been expressed (tagged, described, identified) in the training data in order to generate a satisfactory image. Paglen, however, intervened by targeting a specific data point (the neuron, as many still call it) in the latent space and instructing the generator to do something with it. The project shows how the “algorithmic rationalization of data … can pick up on patterns in data that simply do not exist except, that is, within the preserve of a computational illusion,” as Downey clarifies in his essay for the book.
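As a cartoon of that adversarial feedback loop (not a neural network, no images, no latent space; every number here is invented for illustration), one can let a one-parameter “generator” chase a “discriminator” whose acceptance tolerance tightens each time it is fooled:

```python
import random

random.seed(0)
# "Real" data: samples clustered near 5.0, standing in for a training corpus.
real = [5.0 + random.uniform(-0.2, 0.2) for _ in range(100)]
real_mean = sum(real) / len(real)

g = 0.0    # the generator's single output parameter
tol = 2.0  # the discriminator's acceptance tolerance around the real data

for _ in range(10_000):
    if abs(g - real_mean) < tol:
        tol *= 0.9         # fooled: the discriminator tightens its test
    else:
        g += 0.01 if g < real_mean else -0.01  # rejected: the generator adjusts
    if tol < 0.01:
        break              # output now sits nearly indistinguishably close
```

The escalation is the lesson: neither side ever “sees” the world, only each other’s outputs, which is how patterns can emerge that, in Downey’s phrase, exist only within a computational illusion.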

img7

Trevor Paglen, Rainbow Grid (Corpus: Omens and Portents), 2017– . Dye-sublimation print on aluminum, 41 3/5 x 53 inches. Print Edition: 5 + 2AP. Courtesy the artist.

Some of the images Paglen captured from these places were grouped in grids and printed to be shown in the London Fellowship gallery. Their blurred, blobby, indistinct elements extrapolate the generative process. In a grid, the variations emerging from a single data point bring Paglen’s process and intention into focus. Though the works are sold individually, the display in grids (6 by 6 or 11 by 14) visualizes the associations, relations, statistical clusters, and probabilities that are significant to these machine learning processes—as Mandiberg was emphasizing too—and which I don’t think a single image can express.

As I have said before about generative art, seeing work in sets reinforces the material, or the medium. A single image lands us back in thinking about individual objects, but exploring these generative “AI” systems necessitates holding onto the associative and relational qualities of the calculus behind the extracted image. This multitude is inherent to photography, not only because of the medium’s reproducibility but as part of assessing the artist’s style. The philosopher Nigel Warburton argued in 1996 that a photographer’s style becomes apparent through looking at a series of photographs, not a single object as it was for painting (although it is worth noting that a painter’s style was articulated by discussing brushstrokes, so we have multiples again). Thinking in multitudes like sets is part of these practices’ aesthetic criteria; sets make sensible the formations driving the artist’s intent and context, as well as the logics of these science and technology systems, and highlight the context for subsequent ethical concerns.

I hate talking about “AI” because the term makes singular what are significantly different technologies and protocols, thereby obscuring distinctions. The term artificial intelligence was first used in 1955 as part of a grant application—its sexiness a hope to garner funds. Many, even on that grant team, loathed the term. Imagine instead “complex information processing,” as Herbert A. Simon and Allen Newell preferred. So why does “AI” stick around? Search engine optimization, ranking, clickbait… it perseveres, now as it did then, for attention and fiscal reasons. There is a bigger picture at stake, which the work of Mandiberg, Umbrico, and Paglen makes evident. In their works, we are invited to think about the systems we are a part of, what agentive form we bring to the different interactions we have with and within these systems, and what agentive forms these systems have too. We can take stock—of the language we use and the culture at play.

The research for this column was supported by Google's Artists + Machine Intelligence Research Awards, an unrestricted, annual fund for faculty pursuing cultural research related to machine learning and its impact on the arts. 
