Seeing, Naming, Knowing
The SVA/Crossed Purposes Foundation Critical Writing Grant is made possible by the generosity of the Crossed Purposes Foundation, a 501(c)(3) tax exempt organization.
In Detroit, driving at night north up Woodward Avenue, a long, wide boulevard, one's eye is caught by emerald green lights, perched on the topmost corners of gas station signs, laundromats, corner stores, peep shows, groceries, and churches. They blink quickly, three times in a row. Their green makes for strange beacons, at first eerie, then comforting, not a warning, but an invitation.
The green lights are part of Project Green Light Detroit, an initiative undertaken by the Detroit Police Department to create safer businesses through a "public-private community partnership."1 Business owners buy real-time cameras which generate feeds that run continuously to a real-time crime center. Though there are instances of single feeds being monitored by a department officer at the very moment a crime is about to happen or is in medias res, it is more often the case that there are not enough personnel to oversee them.2 The footage is stored digitally and can be tracked back and used to prosecute criminality. The intention, as the Department phrases it, is to make people feel safe. Business owners and customers alike, in the collapse of community and civic services, agree: they feel safer.3 They would rather go to a remote establishment in a food desert that has a green light and feel a sense of safety and protection, however misplaced, however unsettling the idea that being surveilled provides comfort at all.
The effect is strange. There is no sense of how one's image might be taken into account, or when in the future, or to what end. Standing on the sidewalk below the green light, I have an uneasy sense, but not just of being watched. Being watched is not new for a person of color, for an immigrant. My name might be common in Detroit, but is still marked differently. My sense of dread is not from an expectation of some common, to-be-expected bias or suspicion—say, my name cross-referencing with someone actually guilty of a crime. Here, instead, dread seeps in as I recognize my total lack of access to the logic behind the green light eye: what it will choose to name from what it sees, and then what it will do with this knowledge.
What kind of eye is this? Is it crude, registering general movement, unable to tell between a person and a car? Or is it, as is the case in many CCTV systems across the world, soon to be driven by artificial intelligence, a machine vision system that can differentiate between types of cars, years of make, that can discern between an aggressive movement versus a "soft" one? How is my movement understood? What does it see in recognizing my face, and what does it see in recognizing yours? If the eye is looking "for bad guys," what understanding does it have of "bad"? What does "aberrant" or "threatening" look like, on this block in Detroit, and how is it named, and how is it understood? What does this naming mean a few miles away in the plush suburbs, in Royal Oak, or Ferndale, where there are no blinking green lights?
I imagine the complex layers of possible seeing: mostly dumb, or quite smart, maybe hazy, maybe precise seeing that are embedded in this camera. I am not quite sure if the lights are on, but no one is home, though it seems unlikely anyone will peek in on me standing here at this moment. I also sense a growing awakening of an active analytical apparatus. These possibilities all contradict: there is an eye, there, and it could or could not be looking at me, and at our car; it could or could not be tracking my head or my face, coming and going. I am possibly a jerky shadow, nervously bobbing in the snow, a ghost on a screen between many others in a room. I am possibly rendered in awfully high resolution, five feet from this camera, all the sleepless lines of my face examined impassively by an officer on her break. Am I frowning? I rearrange my expression to approximate neutral. Do I look cowed and anxious? I stand up straight. Was I pacing just then, waiting for my friend to check out? I slow down my movements, try to be clear with my hand gestures. I smooth my hair. I am, I think, transparent. I mime normalcy.
Cameras like the Green Light are more and more common. The unease comes from the fact that many are at various points of transition. They are moving from passive to active seeing, a state of transition which citizens are less able than ever before to discern or track. The cameras are portals to a number of possible analytical processes. The cameras are watched by people, while some are being paired with facial recognition technologies, while others are uploading their feeds to a computer struggling, but learning, to understand video. If the cost to hire people to sit and watch two hundred live feeds is unreasonable, then computers certainly can do the job.
In comes machine learning, the most important branch of artificial intelligence studies, in which all those thousands of feeds of video can be analyzed, all their data separated out and categorized, and connections made according to variable theories. The system teaches itself to identify patterns in the data to make future decisions. These industrious AI took years to break down "people moving" as a category as distinct from cars driving and birds in flight, and now can identify types of people with rapidly improving granularity. But many can make crucial mistakes about context. Having a neural network training on thousands of live feeds for a year results in an ability to tell that, yes, now I am waving at the camera, but is it in distress or greeting? I am crouching, but is the crouch in fear or in aggression?
Over the past year, I wrote much of this essay in Detroit, and during that time, I began to see these cameras more, and think on how being watched was no longer worrisome, how inured we have become to the watching, tracking, and gathering of all the stuff of our lives. Instead, their warning is more dimensional, more offensive, and it generates through my understanding of an imprecise logic, a high probability of misalignment of what is seen, what is named, and what is then "known" by this eye, that connects to a digital brain that is trying valiantly to know just enough to maximize profit, say, by levying fees on expired license plates. I still worry about being seen or watched. But I, we, have a new worry to really grapple with: that the poor logic of naming that results from a partial or hazy seeing can infect one's life, along a much different scale of time—our whole lives—and in unpredictable ways. As I will discuss, machine learning further compounds, amplifies, and stretches this determination of "me" and "you" along an impossible time scale. The original determination of essence creates an over-essentialized self, which proliferates and becomes immovable, incredibly difficult to revise.
Being badly watched, then, becomes secondary to being badly named because of that watching. Machines can see us both very clearly and very incorrectly, such that the incorrect or incomplete tagging is frozen into an objective, lasting state of knowing who we are, the kind of people we are likely to be. Say you are on the way to a Christmas party, and Project Green Light captures you exiting a liquor store in the middle of the afternoon. You have no memory of this visit, the circumstances around it, or that you would need any potential alibi or witnesses. How would that image potentially influence a jury in which you've been found to be in the wrong place, at the wrong time? That finds your running to your car interpreted as evidence of guilt? How could the tag of your video—"adult mid-day liquor store"—affect your insurance rates as metrics like citizen scores that evaluate your trustworthiness for a loan become more the norm? What defense would one have against social forces that can choose to interpret five seconds of you on a recorded feed from five minutes of your life through any number of illogical or violent or poor readings?
Jay Stanley of the ACLU notes the undeniable effect of these cameras is that "people will begin," inevitably, "to monitor themselves constantly, worrying that everything they do will be misinterpreted and bring negative consequences on their life."4 In the above scenario, I would slowly learn to try to be discreet in what I buy, and then avoid buying anything that could be misinterpreted. Stanley describes nightmare scenarios of AI-driven machine vision screwing up, potentially causing lost lives through outsourcing of human interpretation, which is itself flawed, but still far better than AI in interpreting.
The green lights, linked in a necklace of eyes watching on every block, add up to one symbolic occluded eye above us, opaque, unmoored, fluid, like an orb floating over the grid of the city, its impossibly long blocks, and in a cold rush, dropping many levels to float by our car, through crowds, rubbing up against people, drone-like examining their bodies, clicking, and then retreating back up. The eye wanders over a map of happenings about-to-be, potential missteps, and unforetold crimes. We walk on this unsteady map, which is tenuous, always ever-becoming, its edges and alignment produced by the eye.
If we were to look into this eye, we'd see nothing of what is described above. We can't even see this "machine eye." We would see our own reflections, if anything at all. In New Dark Age: Technology, Knowledge, and the End of the Future (2018) the artist and writer James Bridle cogently, grimly, and relentlessly outlines the horror of this state of affairs, specifically the condition of invisibility.5 "We all live inside a version of the ENIAC," he writes, referring to one of the world's first computers, developed for the U.S. military, "a vast machinery of computation that encircles the entirety of the globe and extends into outer space on a network of satellites." He continues, "It is this machine, imagined by Lewis Fry Richardson and actualized by John von Neumann, that governs in one way or another every aspect of life today. And it is one of the most striking conditions of this computational regime that it has rendered itself almost invisible to us."6
The neutral, floating orb actively produces a constantly evolving simulation of society as it will function. In this simulation of future happening, people might well be empty models moving around elegantly (ideally) but choppily (in practice) from grid block to grid block. There is a reason for such reduction of human complexity to achieve statistical ends, and it is not some kind of a conspiracy. The use of the grid block in patterns is designed to homogenize humans as groups in order to make predictions and produce efficiency. Simulations are used in astrophysics, in climate change modeling, in disaster relief. However, computer scientists, who design simulations, have only been working for fifty or so years, and there is a growing field of research suggesting there needs to be greater allowance for revision, for iterations, to account for more rhetorical and scientific imagination and inventiveness.7 There needs to be a mandate to check the bounds established in simulations, especially when they erase or reduce us or compound inequities, and a porosity in the margins through which we can exist fully.
We currently might not have access to how any one specific camera in a mall, in a city square, in the bank foyer, embedded under store shelves, "sees" us. It will be years before we know how the eye is naming, and has named, us. We cannot even see the code, which might make some aspects of it at least sensible, or the hardware or the data center. Like so much else in our computationally immersed world, we have been carefully trained to learn to live with cameras and screens and devices with beautiful interfaces and all the serotonin-pinging one might dream of, embedded with complex surveillance tools that marry us to this invisible world of images and objects, that look back at us in ways we are blind to.
However, there's hope here, as people are clever, and can learn even in the most seemingly oppressive, strange, maddening psychic circumstances. We have language and interpretation, contextualization and theory. We can know where this comfort with design, with such seeing of us from above in these occluded ways, comes from, by tracing back through the imperialist history of such weird and violent seeing. Though we are denied the right to hear the machine articulate its process or reasoning, one way for us to see "into" this eye is to understand it, understand its occlusion, understand its history (because it is engineered by people), its ideology, its purpose. We have to understand how it produces reality, how its picturing of people as criminals and good citizens becomes a universal statement, how it sees us in poor and high resolution. We have to understand how its technical creation is intended to elide ethical issues. Scholars, journalists, and activists are long and well on the case of AI and automated vision, advocating for better laws and better engineering to counter and correct the unethical violence of these specific algorithmic systems. But the everyday person, who is forming the material for these systems' training, has a different set of tools for understanding. We might already have experience with resisting being typed, that type of us based on messy historical and cultural data that says little about who we are. We might also have a lot of experience with being in low resolution to some. In understanding how a machine sees us, we might be able to make strategies—visual, conceptual, or material—that we can use to intervene.
Why focus on machine vision when there are now computational analogues for every avenue of human sensing and expression? The giants—IBM, Google, MIT, Intel, Facebook—are all deeply invested in refining every avenue of artificial intelligence, activating it, embedding it, in order to both make sense of their unwieldy universes of user personal data gathered over the last decade, and create an artificial simulation of the human brain, through a careful layering of natural language analysis, neural networks, and machine learning. This essay could have been about the ethical knots of artificial language studies (language proving arguably far more difficult to master than vision), or artificial creativity (interactive robots learning to paint from what they see) or fascinating, evolving research in generative adversarial neural networks, in which two computational models of the brain "compete" to create more accurate images.
For one, in writing about new technology, especially systems and fields as misunderstood and feared as artificial intelligence, there is often a focus on the scientific and academic discoveries, the breakthroughs made on the way to creating AI that is more like a human being. This research is often remarkable, and given more radio play as companies hope to downplay the more legitimately worrisome aspects of AI, the dismaying rise of "artificial stupidity," as Hito Steyerl describes.8 Surely, what a simple sorting AI can accomplish in minutes, searching thousands of video feeds to label cars and color and animals, would take a person many lifetimes. During conflicts, machine vision can be mobilized to scan journalists' reported visual media to identify perpetrators, as in a riot clash with police.
More often in practice, however, computer vision tools are used to create an intensely boring and bureaucratic version of a predicted future, one that novelist William Gibson painted perfectly in Pattern Recognition (2003). Tagging processes sort geolocations, recognize landscapes, people, objects, situations from the billion or more personal images uploaded a day onto social media platforms. A virtual composite of your face becomes a focal point to tail you, create a long digital shadow of your activity, so you can be sold shoes and anxiety medication and gym memberships and home mortgages.
On a personal note that might help clarify my position: as an art critic, I have long advocated for the artistic, intellectual, critical, and philosophical potential of working with and through the discoveries of AI. Its development challenges much about how we think and process and act, how we center ourselves as a species along a vast spectrum of intelligence. In the last years, as a critic of technology, my more tempered position is that these investigations cannot remain in a solipsistic vacuum. Only this kind of critical, abstracted understanding of artificial intelligence—its design, its purpose, its drive of machines—will help us position ourselves in relation to automated mass surveillance tools.
How we see or unsee is the primary ethical question in a culture and computational regime that privileges vision. And how we see, name, and know the world is increasingly influenced and shaped by how machines see, name, and know; machines read images and then produce a matrix of knowledge that deeply shapes how humans read images on the same platforms. A powerful feedback loop between human and machine vision is established, making the question of how our thinking is shaped through this dance vital. The logic and illogic of how we are seen, named, and known, is the heart of contemporary political deadlocks: who is seen, considered innocent in intention by default, then protected? Who is unseen, then: considered unruly, uncivilized, better placed in solitary confinement, pending a trial for likely crime? And of all the developments in AI, machine vision most clearly helps create mapped determinations of who we are and what we are capable of: our employability, our loan eligibility, and our trustworthiness. It can affect the quality and safety of our lives, given facial maps are made from extant data to predict perceived sexuality, criminality, potential for harm.
I stress the logic and simultaneous illogic underlying machine vision because its values amount to a belief system. Like all beliefs, the "religion" of our dominant technology is provisional, subject to change. These beliefs can be reprogrammed to be democratic or humanistic—or whatever values are chosen by consensus—and re-inscribed into our active, intelligent tools. This can happen at the level of the algorithm, and even more profoundly, in the mind of the mapmaker—in computation, the simulation programmer—herself. I will try to link how this metaphorical eye creates a map of "known" but ultimately virtual knowledge of the world, where virtual does not mean "not real" but in fact is a description of the actual process of simulation creation. (Live simulations—imagine, as the artist Ian Cheng describes, a "video game that plays itself"—define much of the forecasting and predictive modeling of contemporary society).9 The simulation offers a perfect opportunity for intervention. This is a profoundly new and emergent field, that of the rhetorical or scientific imagination, in which we can have critical discussion around what kind of systems, and simulations, and intelligences, we even want as a society.
Throughout this essay, I use "machine eye" as a metaphor for the unmoored orb, a kind of truly omnidirectional camera (meaning, a camera that can look in every direction and vector that defines the dimensions of a sphere), and as a symbolic shorthand for the sum of four distinct realms in which automated vision is deployed as a service. (Vision as a Service, reads the selling tag for a new AI surveillance camera company).10 Those four general realms are:
1. Massive AI systems fueled by the public's flexible datasets of their personal images, creating a visual culture entirely out of digitized images.
2. Facial recognition technologies and neural networks improving atop their databases.
3. The advancement of predictive policing to sort people by types.
4. The combination of location-based tracking, license plate-reading, and heat sensors to render skein-like, live, evolving maps of people moving, marked as likely to do X.
Though we live the results of its seeing, and its interpretation of its seeing, for now I would hold on blaming ourselves for this situation. We are, after all, the living instantiations of a few thousand years of such violent seeing globally, enacted through imperialism, colonialism, caste stratification, nationalist purges, internal class struggle, and all the evolving theory to support and galvanize the above. Technology simply recasts, concentrates, and amplifies these "tendencies." They can be hard to see at first because the eye's seeing seems innocuous, and is designed to seem so. It is a direct expression of the ideology of software, which reflects its makers' desires. These makers are lauded as American pioneers, innovators, genius-heroes living in the Bay Area in the late 1970s, vibrating at a highly specific frequency, the generative nexus of failed communalism and an emerging Californian Ideology. That seductive ideology has been exported all over the world, and we are only now contending with its impact.
Because the workings of machine visual culture are so remote from our sense perception, and because it so acutely determines our material (economic, social), and affective futures, I invite you to see underneath the eye's outer glass shell, its holder, beyond it, to the grid that organizes its "mind." That mind simulates a strain of ideology about who exactly gets to gather data about those on that grid below, and how that data should be mobilized to predict the movements and desires of the grid dwellers. This mind, a vast computational regime we are embedded in, drives the machine eye. And this computational regime has specific values that determine what is seen, how it is seen, and what that seeing means.
City on a Hill
Last year at the Digital Life and Design conference in Munich, Germany—a gathering of tech CEOs and entrepreneurs from throughout the world—a former Google and Facebook executive gave a presentation about a ski-cap that can read your mind after analyzing your MRI scans, and perhaps present your subconscious desires back to you. I sat in an audience that was visibly uncomfortable, shifting, the moderator himself unable to tackle how nonchalant and unshocked the presenter was about her own claims. In this presentation and others, individuals in charge of massive machine learning and AI initiatives discussed this frontier of the brain, the neurological as the next land to conquer. Even the attempts at describing why—to enhance our understanding of creativity as a culture, to unlock time for us to pursue our dreams—seemed half-hearted. How could a hyper-elite possibly abuse its "mind-reading" technological potential, we joked with some nervous resignation, in the breaks. I was vaguely aware of being witness to a shift, in which no one was trying to pretend that technology is not being galvanized to widen the gap between rich and poor along the vector of hidden information and knowledge. It wasn't something to be ashamed of. The elite was describing its own intentions, its aspirations to shamanic priest status.
During one conference break, I got into a conversation with a government arts official from the Netherlands. Agog, we tried to deconstruct the wild presentations. To what end was all this information about the "mind" being acquired when no one seemed to be able to think any more clearly than before? She described her year living in San Francisco, unfamiliar with such levels of homelessness and addiction, and with the degree to which they are on display in the city. She found the whole stay appalling. We compared notes on the willful blindness practiced there. I told her about my afternoon passing block after block of homeless encampments on my way to speak at Gray Area Festival, full of lovely, ethical, earnest artists and technologists. How I paused on this long walk, feeling the dull twitch of something being seriously wrong. How could I be thinking and talking about the future of "robot rights" while passing the total abyss of the present, in which human rights are violated? There was a willful blindness I practiced there.
The Dutch arts official said she didn't understand how all the startups didn't see what was in front of their doors. How would you get someone rich in a city like that, with so many technological resources, to not see the poor or homeless as suffering some fate they brought on themselves? How would you get them to really understand their role in the city or the history and lost context of, say, city zoning, of gerrymandering, of redlining, so that they'd understand that opportunity and success are not a pure matter of willpower, of manifesting through hard work alone? In a short thought experiment, we started to plan a map together, an augmented reality (AR) map that showed, depending on which neighborhood was represented, layers upon layers: historical chronicling, visualizations of business investment dropping away over decades, white flight, divestment of public education funds, privatization of public services.
Over the next months, I thought a lot about this AR map, this layer of "real facts" and statistical records. Presenting all this information to my hypothetical, stubborn, socially conservative techno-capitalist should ostensibly change their feelings about the less fortunate, but it likely would not. As true as every fact in the map might be, it also holds that we are prone to frame the world as detached from us, mapping it with our own blind spots. More often than not the map is made to suit our desires for comfort, for security, for moral certitude, for power, and for control. Gaming and social media platforms atomize us and elevate our expression of our singular, individualist visions of the world to such a degree that we are encouraged to hold on to our own maps of the world, our own models of people in this world, and the bounds of our specific ideology.
But there are some identifiable trends in the ideology maps that dominate Silicon Valley. The first, as artist and theorist Sondra Perry describes in her artwork Graft and Ash for a Three Monitor Workstation (2016), is the Just World Theory—the conviction that people tend to get what they deserve.11 If you are successful, it is because you wanted to be and you deserve to be. If you have hard luck, it is because you deserve it, or due to a fault in your character or your desire. What the Just World Theory looks like, visually, is a 3D model of an empty body (think of this body model in a gridded environment, like Maya, and with your cursor, rotating your perspective around it).
In this environment, there is no indication history plays a role (where did the body come from other than a void?) or politics, or difference; the body is devoid of these cumbersome qualities. It is a so-called blank slate—a tenacious and persistent philosophical fallacy about the human mind, recently debunked exquisitely by Steven Pinker and the field of neuroscience. Its cutaway shows it is empty, without organs. This body exists easily on a flat plane with all the other bodies, all with essentially the same ability.
This fantasy body is one without history, without politics, and it is a model at the very core of game design, of simulation design, and of social engineering. This model of the body is an ideological creation that seeps into, unexamined, how we speak about people around us, like and unlike us. This model shapes the stories we tell about the possibility of others. If this body wants "success," it will simply move and work towards it. If it fails, again, it is only because of a flaw in its construction, a deficiency it can conquer to become a fully realized model.
Now imagine that these modeled bodies—whether old or young, healthy or not—all are animated to move through the world in exactly the same way, with the same levels of ability, without much friction at all. A young girl lives and moves on the same plane as a grown woman of one ethnicity, who is on the same plane as a woman from another ethnicity, who is on the same plane as a grown man of a different class. They are all expected to move in more or less the same way. And we all are positioned on the same plane, ready to go, loaded up with willpower and strength and a good attitude to pull ourselves up by our bootstraps. The plane around extends to an infinite horizon and these bodies fill the environment with positive action, towards one goal, of increased happiness and prosperity.
That scenario—of good, normal bodies moving according to set rules—would be a very rough simulation of America as it is imagined, and as some especially punitive economic policy might suggest the country should be. The hidden fallacies evident above are not factored into the bounds of the simulation, or the individual models of the bodies. These bodies do not factor in the complexity of being a person who is a container of differences, whether genetic, or neuroatypical, in the way these medical and historical facts of living and experience are carried in the flesh. Consider the great deal of work that has been done in the last five years on epigenetic trauma, how massive traumatic events are passed down through our DNA and affect our physical and mental health. There are new studies on mental illness being experienced completely differently across socioeconomic and racial groups; tests targeted to diagnose depression in white people, reads one, often misdiagnose the disease altogether in people of color.
The stakes of a game, however, are much lower than those of a computer simulation; I describe games because they are easily sensible, easily recalled. Though a simulation of large groups of diverse people could account for unseen histories and complex social dynamics, it is not always statistically efficient to do so. Instead, a simulation is often meant to give answers to extremely knotty mathematical problems about complex scenarios that can't be quickly solved by people: hurricanes and flooding, crowds panicking during hurricanes and floods. It represents "who we are" in the world, at least virtually, closely enough, but rarely takes into account any of the stuff of who we are in messy systems past our predictable behaviors. This isn't to say simulations or advanced computation seek to flatten human beings in an intentionally insidious way. To create simulations of behavior on large scales, to produce real-world policies, sometimes people need to be flattened or made less dimensional so their behavior can, for the most part, be predicted. They can then be engineered and moved around, and policies are made to catalyze that real-world movement.
However, there are comparisons to be made in the shared perspective across fields, and in how one perceives the representation. For decades, urban planners, game designers, and simulation scientists have collaborated and exchanged platforms, languages, and software. There are many instances in which one is hard pressed to tell between their default illustrations. This comparable framing and top-down, isometric perspective can cause their models to sometimes bleed one into another, such that the city model of a fictional Los Angeles in Grand Theft Auto V is so well-done that it can be reused to render city revitalization plans, or simulations of sea levels rising.
The viewpoint is often the same: society is seen from above, the perspective is isometric, and people are visible at every point of their movement. One can "see" everything. What does it mean and do to primarily sketch society from above? For one, this sky-down perspective gives a feeling of total mastery and control. One might begin to feel they can see what is best for everyone else. That they have the right to make interventions, to shape the world.
As game engines and their software have developed over the last fifteen years, the technical and aesthetic resources to model bodies realistically have evolved. In the Unreal game engine, and versions of Unity, visual precision is possible to the point of the hyperreal; viewers can barely tell the difference between a modeled person and a real one. Skin textures, tone maps, digital editing help create virtual avatars online, like Lil' Miquela, who bring in revenue for their creators by looking real enough. While a deep dive into what "true representation" in digital media might be is past the scope of this piece, let us note that the ability to model not only any kind of body, but to a degree of uncanny perfection, fused with our visual culture's hyper-valuation of that precision, can obscure a great deal.
That modeling capacity, to have no limit on represented diversity, is a subtle and seductive trick. This world can have wildly diverse-looking, designed bodies, with the maker choosing from skin swatches and hair textures and body templates, creating a whole brilliant palette of ages, ethnicities, genders, orientations, weights, and heights. Think of the loading screen of nearly any AAA social simulation game. All the possible 3D models represent difference in their presentation, but what that difference actually means (how it activates, how race affects access to opportunity and resources, how these wildly different "ethnic packs" influence social dynamics) is not factored into the game's procedural framework. Each body model has fluid mechanics, normative movements, and they slot easily into the game's own guiding mechanics. They eat, they walk, they speak and understand each other; they move from home, to work, to play, to work. We might consider what imbibing this perfect representation of difference, without activation of what that difference means as it is lived in the world, does to our understanding of the "real world" when we head back out to it. Clue: Gamergate.
But even before such rendering was possible in games there was an embedded, fixed perspective. Critic Jenn Frank writes about this perspective as it manifests in Diablo III, a dungeon crawler that has used the isometric view in all its titles, in which the player manipulates and guides characters through a dangerous world, from above. There's an important illusion at work. From this top-down vantage, every "sprite's location is readily apparent," and "every command of the cursor is the equivalent of stage direction." This perspective suggests a "space/time objectivity that is almost Godlike," a kind of "geographic omniscience," she writes, pointing out how she can see through walls.12 "But isometry is almost always distorted. Every angle and dimension is subtly tweaked: in lending you, the player, a Godlike visual objectivity, isometry has to lie to you."13
In this, we have the illusion of mastery over our avatars, an illusion effected through the perspective. She describes the figures moving below as "tiny miniatures," within a "toy theater," a small, crude simulation. There is great possibility allowed by this difference in scale; as a player we can see all the action, and "more easily estimate where [our] avatars are in time and space," which affords a comfort.14 Frank notes that she then senses a shift in empathy, between care and indifference; she watches the main sprites talk, debate, build little relationships. But her perspective makes them insignificant. Frank finds their attempts at meaningful lives incredibly funny, as "even the largest, most intimidating baddies are simply pygmy figures on a small stage." Their movement becomes "charming" and "twee." "If there actually is a Christian God, I imagine this is how He might feel when He peers in at us," Frank concludes. This is a position of "objective complacency," of "terrible ambivalence," in which one can imagine oneself as much "a god of destruction as of grace."
This "toy theater" aspect of simulating the social field is deeply embedded in engineering culture. The level of computation that's described in this essay—driving AI—hinges more on simulation than just modeling, on the activation and processing of gathered data from the world to create a prototype of how a system will unfold in the future. As Aimee Roundtree notes in her excellent book on simulations and rhetorical imagination, even though "simulation" and "model" are often used interchangeably, simulations are more than just models. The model—as of the body described above—has static features, but the simulation puts the model to work, running it through different hypothetical scenarios, with different driving variables and conditions. They apply and capture a model's behaviors in motion. Further, the simulation has the appearance of evidence.15
Simulations are the real-world activations of data to calculate and predict future action and movement: how a star will explode, how a hurricane will move. They are both literal products of equations that describe what actions can happen inside of a virtual world, and potent metaphors for future-casting. We simulate when we imagine ourselves and others in the future, and we base our current actions on that mental simulation.
Simulations have extremely useful qualities which also make them very difficult to "critique" in the way we might critique a camera or a piece of software: they are not static. They have no end state. And they aren't solely dependent on mathematical principles, but on a speculative theory about how people or things work or behave or think, which is applied to data. They produce a body of virtual knowledge, as Roundtree outlines in depth, that is both unreal and treated as real, as during the simulation of Hurricane Katrina.16 Their importance to governmental and social policy gives them, as with so much technology, the appearance of truth.17 That appearance can be a matter of life or death when simulations produce a prediction of reality to come. Once the simulation has the effective appearance of truth, then people tend to assume it is truth.18 Through that assumption, they begin to base a belief system on a "false" or approximate premise; seeing has, through this theoretical science, become equivalent to believing.
What parameters define what's seen inside of such simulations, and how people are seen, is then also a matter of life and death, or just quality of life. Roundtree emphasizes the hybridity of the process: to create a predictive virtual reality, programmers and scientists draw on software as well as their own "ad hoc reasoning," which gives false credence to the creator's assumptions and very recognizable, human parameters like beauty, or what's "natural."19 Further, the event that the simulation is meant to "represent hasn't happened yet; it represents events from the very distant past or a remote location, both of which preclude direct observation."20 The field of simulation science is so new that many errors can enter, often in the parameters of the simulation, embedded in singular perspectives.
According to Fred Turner, the seminal expert on Silicon Valley's founding and ethos, this perspective might be traced back to a fundamentally American, specifically Puritan, worldview. This worldview rotates around the fantasy of the restart, in which we re-establish society in the West, conquer the wild, and start over, leaving all difficulty behind. Earlier this year, I spoke with Turner on Silicon Valley's disavowal of politics, and his framing assessment of the Puritan ideology which underpins code (that generates the "machine logic" of seeing, naming, and knowing, that is in turn shaping our lives).21 He was thrillingly clear in tracing this fantasy's history:
We're supposed to be the country that left Europe. We're supposed to be the country that left the known. Why did we leave the known? Well, so we could become the unknown, the people without history, the people without a past. When you leave history behind, the realm that you enter is not the realm of nothingness. [And in American culture, this is] the realm of divine oversight. When the Pilgrims came to Massachusetts, they left the old world behind so as to be more visible to God. The landscape of New England would be an open stage and they would, under the eye of God, discover whether they were, in fact, the elect: chosen to go to Heaven after they died.22
The desire for the world to be a blank slate, to restart society from zero, to inscribe individualist striving upon a fresh page, was then married to a core belief in being God's chosen. Turner describes this generative crucible of a national ideology that would drive a powerful Protestant work ethic—and westward imperialism, to fly the banner of Manifest Destiny. A new outpost, a new colony, pushing the frontier ever further. Turner finds direct parallels between this impulse and that of technological and engineering frameworks:
No technologists today would say they're a Puritan, but that's a pattern that we still see. We see people sort of leaving behind the known world of everyday life, bodies, and all the messiness that we have with bodies of race and politics, all the troubles that we have in society, to enter a kind of ethereal realm of engineering achievement, in which they will be rewarded as the Puritans were once rewarded, if they were elect, by wealth.23
There remains a spiritual reward, as Turner goes on, for the "Puritans believed that if God loved you enough to plan to take you to heaven in the end, he wasn't going to leave you to suffer on this Earth before you came to him. Instead he would tend to make you wealthy. Puritans came to see that as a great reward. Puritans, and broad Protestant logic, deems that God rewards those whom he loves on Earth as in Heaven." American history, especially that of westward, frontier-seeking, embodies this, and "you can [still] see that in the West a lot now. Folks who leave behind the social world of politics and are rewarded with money are, in fact, living out a deep, New England, Puritan dream."24
Though some have pointed out that the Puritans had justifiable reasons to hold tenaciously to this idea (as they were suffering plague, deadly winters, general decimation), the type of seeing it required was violent. From the city on the hill, the early settlers looked down at the wilderness, planning to map out a new civilization. Their vision was to simulate the world in their own enhanced image, filled with better versions of themselves, perhaps, populated with easy-moving models with little complexity or strangeness.
What is the issue with engineering the world in this fashion? Looking down from the hill with only people of your own kind? Well, it seems pretty lonely. It's a bit boring to only look at things from your own perspective. It's also dangerous, a recipe for dying out; it's not how growth in social systems really happens. It's fundamentally American, yes, meaning, an embrace of an isolated, stoic hardness in which one needs no one else. Think of Emersonian self-reliance taken to a most punishing extreme. The position demands staying on top of the hill, because coming down might just involve looking at things as they are, at the people imagined as beasts, or half-human, as people.
I See What I Want To
We know what past "restarts" have meant: genocide, imperialist conquest, erasure of native languages and cultures. Through technological mediation, that process of violent erasure is slowed down, distributed, and covered, and the resulting heterogeneous monoculture is made to look like progress. Everyone is represented and given a voice, violent fascists alongside the progressives. Everyone has the illusion of complete access to unlimited information; how that information is controlled and surveilled and deployed by powerful actors is less important than this connectionalism.
The god's eye view asserts itself through simulations, literally, and ideologically, through most technology we use. When you are the worldbuilder, you can position yourself as neutral, as the origin. In software, this is an amoral, evasive point that can never actually be captured. It vanishes in the gospel of the tool; think of how Access to Tools was the tagline of the Whole Earth Catalog. Further, software feigns neutrality and a remove from politics while effecting social engineering, subtly shifting users from one desire to another over time, influencing opinion, life choices, which is political in every sense.
The very foundational design of Western technology hides its political imperatives by presenting as neutral, without values. Scholars and theorists, Wendy Hui Kyong Chun the lead among them, have long traced and articulated how ideology is embedded in software, and manifests in a technological determinism.25 This ideology seems immovable. For thirty years, immense design and financial resources have been invested in maintaining the technological tool as a neutral, and assumed good. Of course, many of the engineers and designers who develop these systems and interfaces are perfectly aware that they are persuading people to feel and think; the work of captology, or persuasive technology, is its own emerging academic field.
The fallacy of neutrality takes root precisely because of the dynamics at play. Chun, along with Alexander Galloway, has described software as a simulation of ideology, or as a "phenomenon that mimics or simulates ideology."26 I would further describe software as a simulation of neutrality, a shifting, soothing mask, a layer which is constantly adjusting itself through design, to hide the real ideology of neoliberal, techno-positivist capitalism in which only the most elite Übermensch are best suited to survive.
Having this mask of neutrality and objectivity is essential, given how incredibly tenuous a lot of the knowledge suggested as "fact" embedded in our machines—our tools, interfaces, and simulations—turns out to be. Here is where we can start to see the logic of machinic seeing peek through: an active seeing, driven by a need to name the world by a flattening paradigm, that is effected through technology in a seamless loop, that then produces reality back to us through real-world policies, laws, and behaviors that become institutional and social and cultural narratives. In examining the technics, and the ideological illusion, of this transition from seeing to naming, we can understand a new theory of naming, that is constantly shifting, unwieldy to analyze, resistant to critique, to direct analysis.
Take the purpose of a simulation, say, of a person moving. The goal is to capture the essence of that person moving, not to capture it perfectly, as Roundtree outlines. So the result is a virtual set of evidence, meant to make clear, to "bring before our eyes," as Roundtree writes (drawing comparison with Aristotle), how events might connect, how relationships might unfold.27 This evidence takes on the appearance of fact. Yet it is, by definition, not observed, not based in the world. What, then, is this "proof" and where does it reside?
Consider a famous example of the bumblebee flight simulation, in which the result is virtual evidence. "Abductive reasoning does not require that the premises have logical validity in order to be useful or to reflect the essence of the real thing," Roundtree writes. Even "when missing components vital to the actual object," the resulting simulation is virtual evidence which retains truth value: the simulation of the bumblebee's wings in flight allows it to "fly" in the simulation.28 Statistically speaking, the bumblebee should not be able to fly; it should need bigger wings to give sufficient lift to support its weight. Reality contradicts the prediction. And so in this simulation the observations are totally virtual and the data needn't be conclusive or absolute to lend the simulation meaning.29 Even through this provision, this held-nowhere "evidence" can "explain the thing itself because it does have the virtue of the thing—the worth and workings of it."30
That's all to say, the logic of the most important computational tool we have involves a sleight of hand, shifting us, moving us into taking what we see as the actual, as fact, as the way things are. A belief system has to be established that people trust unequivocally. (For Project Green Light to work, people have to believe in it.) Say I interact with you in a simple simulation that treats us both as empty models with the same mechanics and the parameter of "work hard to overcome barriers to have a good life." In the world, we're materially extremely different, from radically different backgrounds of class, education, and trauma. When I move in this simulation, and my path is slower, or off-path, or circuitous, and I don't yet have "a good life" with the associated markers, my movement seems a flaw in my mechanics. That my mechanics are flawed appears as fact; we don't have any political backgrounds (you and I are actually treated differently in the world), or the legacy of personal and intergenerational trauma rooted in a war, or depression, or much else factored in. These aren't accounted for, and so they take on the aspect of complaints and gripes that shouldn't matter, if again, I work hard.
"Technology is not mere tool making and tool use: it is the making of metaphors," Bridle wrote last year, and through its hidden metaphors, often, "a kind of transport or transference is achieved, but at the same time a kind of disassociation, an offloading of a particular thought or way of thinking into a tool, where it no longer needs thinking to activate." The solution, he says, "to think again or anew" is that "we need to re-enchant our tools."31 It would seem "re-enchanting" is the wrong word, unless it means not further mystifying our tools. Instead we need to make them very sensible to us, and make critical thinking essential to "activating" them.
The machine eye as we're growing accustomed to it needs to see people roughly, small, crudely, distantly, from atop the hill, much as a drone does. It seems like mathematical necessity. That we can ask for more rhetorical imagination from contemporary dwellers of the hill city is not really a widespread idea or movement, but it needs to enter the mainstream. Giving the keys to defining reality to a select group of engineering priests is cultural suicide.
When did technology, especially AI, get so boring, when there is bountiful opportunity for better imagination, for more multiplicity and range in creating a machine intelligence that does more than create worldwide bureaucracy by gathering information on us not even for interesting and productive uses, and lock us into debt slavery? What limited imagination this shows, when a simulation could represent "the essence" of complex, rare, and interesting phenomena about different people, towards a higher-dimensional narrative of individuals that actually rounds out the skeins, the texture maps of diversity. If the "evidence" produced is already virtual, as detailed above, then it is by definition subject to constant change.
If our simulations and machines were embedded with the values of an equitable and fair world that eradicated supremacy and xenophobic violence instead of mere efficiency and accumulation, then there would be space to imagine simulating complex social dynamics that would, suddenly, be statistically efficient. That we don't, as a culture, have much access to these discussions is tragic, but we have the power to change the narrative, the explanation. As Roundtree has noted, in theoretical sciences like computer and simulation science, "the ultimate goal isn't truth, so much as explanatory and narrative power. It can be argued that theoretical sciences are making theories for both the long haul and the next logical step toward better understanding."32 If machines are in the business of deploying and activating theory, change the theory.
My position here is constructivist; I see all these machines and simulations and technology, however bizarre and alien to our sensibility, as first, always, shaped by human experiences, desires, and decision-making. And the critical and philosophical challenge for anyone interested in technology, or affected by it, is learning to read machines, and the images they produce, of the 'reality' of things, of people, of society, with a flexible, but rigorous set of theoretical tools. Understanding the relational, virtual nature of simulated evidence (which we learn to see as fact) is a first step in that toolset. The second is knowing the process of naming, where the crude, weird "bad logic" I began this essay describing, steps in.
We must stop understanding the machine's seeing as anything like human seeing. This comparison is a fallacy, but it is also the effect of design. The confusion further obscures what is actually happening when we share images, uploading them online. A machine learning system naming the world operates differently than we do. It mines an image, sorts its contents, then matches them with types it has learned.
We also might move on from expecting bias to be eradicated totally from tools, as though there will be such a thing as a machine that shows no mark of a maker. There will be bias, but a collaborative, collectively decided upon "bias" (meaning, values, positions, and choices in naming) in our tools might be preferable to one in which we had no part as citizens at all. "Neutrality" is frequently discussed in relation to machine learning and algorithmic bias in a great deal of literature, investigative journalism, and conference talks; the revelation of ideology in our precious tools is usually presented as a shock. As a culture we have been trained, further, to expect machines to not just see well, but also to not have bias, to purify the oppressive views of their makers through pure math. There are competing histories around technology's origin; some thinkers like artist Jesse Darling point out that we have been using technology forever, from the condom to the weaving loom to the bicycle.33 If we start in at the industrial revolution, humans have been grappling with their relationship to machines and the machine's simultaneous separation from and expression of human need and desire. It can be argued that machines have always been "biased," the way anything made from our hand will carry the maker's mark. And when machinic tools moved from physical engineering to social engineering, from production of material to production of images and ideas, from workhorse machines to vision-machines, they became powerful ideological containers.
In "Invisible Images," an urgent essay on how machines see and how our images are "looking at us," Trevor Paglen writes that "machine-machine systems are extraordinary intimate instruments of power that operate through an aesthetics and ideology of objectivity, but the categories they employ are designed to reify the forms of power that those systems are set up to serve. As such, the machine-machine landscape forms a kind of hyper-ideology that is especially pernicious precisely because it makes claims to objectivity and equality."34 Whether hyper-ideology, or a simulation of an ideology of objectivity, the effect of these systems is erasure, violence.
For instance, the city upon a hill covenant that John Winthrop delivered to his Puritan followers promised prosperity in exchange for commitment to God, and a creation of a commonwealth that would signal to Europe a new kingdom. In that kingdom's map, a misconstrued one based on supremacy and colonial-imperialist genocide as an effective tool, the Puritans misnamed Native Americans as "Indians," and further misnamed them as savages, as less than human, as wild threats. This misnaming justified breaking treaties over several hundred years, massacre, and total decimation of the "Indians." Naming Native Americans as we know should have been done, in an ethical and restorative way, as owners of this land, as stewards, as the holders of a nation's trauma, is a first step in reparative relating.
What Paglen is crucially pointing to is that in machine-machine systems, the claim to objectivity makes a similar lossy, erasing, violent, stupid, shallow misnaming of people harder to even see. What's taking place may be comparable to what the Puritans did, and according to their map, their categories of typing and naming were objective and true. But with time, with cultural studies, with historical restoration, with scholarship, with national reckoning with past crimes and complicity, those predispositions of the seventeenth century can be well-questioned.
The machine-machine system's goal of efficiency, its seeing apparatus, driven by engineering's neutralizing mode, is so widely accepted and understood as our driving map that without some intervention it would take another four hundred years to undo its naming. It often, wittingly or not, reaffirms colonialist tendencies. The makers are homogenous, frequently libertarian: that friend who says they don't "see color" and "treat everyone the same." The "problems" of difficulty, of messy, "troublesome" aspects like gender or race or disability, qualities too hard to parse mathematically, all the unseen, immaterial phenomena that make a person all person-y, are factored out. Difficult people are then treated like bugs, glitches, like poor, bad runs. There is of course a long-standing social imperative to get rid of "troublesome" aspects as a matter of purity and normativity. Or, more confusingly, as described earlier, differences are represented but are treated categorically as all the same. The categories become modular add-on features, while offline we're robbed of communities in which to challenge systemic power differentials.
Numerous tech entrepreneurs have publicly called for solutions to SF's homelessness problem; the former mayor recently suggested putting all homeless on a Navy vessel.35 Another tech startup founder's suggestion that homeless be put on a cruise ship seemed to be floated briefly.36 A software developer and entrepreneur came under fire recently for writing to the mayor that he shouldn't "have to see the pain, struggle, and despair of homeless people" when he and others went and "got an education, worked hard."37 What allows one person to become an elite programmer or developer, and another to fall prey to addiction, seemed not to be of interest; that the social crisis is caused in part by kickbacks to tech companies that have divested funds from public services and exacerbated homelessness, is not of interest. Demanding public responsibility for human needs is not a parameter to include in the city's map. Put them on a ship.
And last April, in Irvine, conservative Asian-Americans gathered in droves to protest a homeless shelter, citing much the same fears of their place on the grid being tainted. "They need to put them somewhere, maybe somewhere else in California," one resident said, adding, "I really don't know where they can go. But Irvine is beautiful, and we don't want it to get destroyed."38 They came out in unprecedented numbers to protect their families from undesirables who were, in the campaign's language, "the way they are" because of their own personal failings. The homeless were not there because of generational poverty, or mental illness, or lack of economic opportunity, or past incarceration, or an already abusive attitude society has towards the homeless. They were in tents because they didn't deserve homes.
When you act like a God and build a world that doesn't take account of differences, but rather tries to, as Fred Turner described, "neutralize them in a single process, or a single code system, or under a single ethical rubric, what you end up doing is erasing precisely the kinds of differences that need to be negotiated."39 We are communicating on supposedly neutral interfaces in neutral worlds while feeling the physical, social, and psychological effects of oppression on our minds and bodies on wildly varying scales in the world. In the "neutral" and "benevolent" space we will of course encounter conflict after conflict over our identity, our positions, robbed of the ability to negotiate flux, and change and distribution of resources across differences.
Most distressingly common, we are typed, and the type is used to make predictive models about who we will be through time. That our qualities might change, that our preference for a song one day does not stretch out to define us over time, is not accounted for. Digital shadowing captures our preferences and then takes this to be a statement on who we are, everlastingly. That taste might reflect a phase, or a subject of research, is left out. Say you are in your twenties, and you like a Norwegian death metal band you started to listen to at 14, but you never came across the band's anti-Semitic streak or the story about its drummer burning a church. You didn't start listening to a band with nearly incomprehensible lyrics because you were a fascist; you were lazy and trying to be metal to your friends. Say, then, an algorithm based on data of demographics by past musical preferences is run through a platform. Suddenly, you are categorized in the same listening group as actual Nazis in rural Oregon. You're indistinguishable from them.
How to factor in change for each individual, the probability of our desires changing over time? What formula or algorithm will allow for these essential rewrites, making space for unstable, dynamic, and fluctuating negotiation that mimics how we make decisions? As Bridle writes, such "computation projects a future that is like the past, which makes it incapable of dealing with the reality of the present, which is never stable."40
Platform politics have been profoundly effective in giving space to marginalized groups to communicate and organize. They are also destructive, in that the demand for legibility increases vulnerability for groups already subject to appropriation, extraction, marginalization. And the political possibility on neoliberal platforms is limited because of their colonialist heritage.41 In engineering, there is an "explicit ethical choice, inside all parts of the field, to leave politics aside," as Turner writes.42
There is another outcome; in thinking and reflecting on how machines see and name, we gain insight on how we too, name the world, whether we are living and relating to others intelligently or stupidly through this naming. We can hold a mirror to our own seeing as it organizes power. Are we on the hill, too? Are we comfortable there? Do we ever go down into the wild? Are we comfortable at a remove, at not seeing difference up close? Do we simulate critically, in imagining others' life possibilities, their minds, who we think they are?
In traditional visual culture and art criticism, the role of aesthetics and ethics and moral debate served such a field of play. Reason was used to explore the world, through direct sensibility, but there was a philosophical flexibility. In rhetoric, how we construct knowledge of the world through naming objects (describing their qualities, creating narrative, generating metaphor) is an act that is linguistic, and moral, and aesthetic all at once, a matter both of belief and science. In machine-machine visual culture, the flexibility hardens as the map is misconstrued; openness to change and evolution in perspective and position is devalued when short-term needs are the goal.
We recall Roundtree's thesis that simulations represent "the virtue (or essence) of an event," which means they are "best understood as having relational meaning," so that "to understand them, we must make conspicuous the … conditions that determine their contextual value."43
Further, we need more of the very ad hoc, abductive reasoning embedded directly in computation, an ongoing process "of making a transparent relationship between evidence, decision making, and consensus building."44 We can practice abductive reasoning, living in relation, in which any kind of hard science or computation is always interpreted through a diverse set of social relationships, perspectives, and values.
Scaling Virtual Evidence
What happens when we see an image? Our brain processes light bouncing off of the screen or photograph. If I look at a picture of my friend whose face has a slight pink flush, I try to figure out if they are sick. My eye seeks out temperament, orientation, position. Their expression seems to be off a degree. I conclude they are in fact embarrassed. In trying to construct a meaning, I draw on the long arc of other "face images" I've gathered in memory, along with my senses. Even if unrelated, I may draw on other contextual frames: how I feel about the friend (perhaps I have seen them embarrassed before) and how I feel about myself (I feel unsure of myself, and project this onto others). I might have a sudden flashback of an experience that gives brilliant perspective on who they are. I combine all this situational context (instantly) with what I already know about living in the world through experience and study. Sorting through this detailed matrix of possible influences, I create a meaning for the image, in my mind.
What happens when a processing machine sees an image? When it creates an image based on a million photographs, and their attendant sets of information, metadata, tags? What memory is it creating out of that imagining of a new photo? Based on that memory, what will it then best recall in the future? What will it not be able to recall at all? It first discerns shapes, layers, and positions of objects in relation to each other. The image produces a mathematical abstraction whose "qualities," as Paglen notes, "are guided by the kind of metadata the algorithm is trying to read."45 Machine vision engineers create algorithms to discern the most interesting elements in photos, and create categories of features from pixels. Software deployed after creates patterns out of the discerned features. The digital or machine learning process searches back through this "database" of categories, to match what it sees, and then name it.
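The matching step described above can be sketched, very roughly, as a nearest-neighbor lookup. This is a toy illustration and not how any particular vision system works: the three-number "feature vectors," the category names, and the function names are all assumptions made up for the example. What it shows is that the machine can only answer with a name already in its "database."

```python
import math

# Hypothetical "database" of named categories. Each category is reduced to
# an invented prototype vector of three features (illustrative values only).
CATEGORY_PROTOTYPES = {
    "fox": [0.9, 0.2, 0.1],
    "field": [0.1, 0.8, 0.3],
    "mailbox": [0.2, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Similarity of two feature vectors, ignoring their overall magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def name_image(features, prototypes=CATEGORY_PROTOTYPES):
    """Return the stored label whose prototype best matches the features.

    There is no "none of the above": whatever is seen gets one of the
    names that were set ahead of time.
    """
    return max(
        prototypes,
        key=lambda label: cosine_similarity(features, prototypes[label]),
    )


if __name__ == "__main__":
    print(name_image([0.8, 0.3, 0.1]))
    print(name_image([0.0, 0.0, 1.0]))
```

Anything that falls outside the stored categories is still forced into the closest one, which is one way to picture why a person standing still might be read as a mailbox.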
So the image's qualities have to be set ahead of time so the computer even knows what to look for in its localized storage. In contrast, in the brain, our image "data" is not stored in any specific place in the brain, and it is retained with a lot of loss. The brain's image is lossy in the sense of moving from high to low resolution. In contrast, digital photos do not fade from your hard drive; they do not become fuzzy and lower-resolution over time.
For contrast, take how I see a self-driving car and how it sees me. When I see a self-driving car I make an abstract inference, and say "Investing in automated technology is a high priority in this place we are in now, but I know the risk and this is likely not yet safe." My action is to step out of the way of the self-driving car, not because it is a car, but because it is self-driving. These cars are still prone to hitting people and not stopping, unable to pause the way a person would, possibly causing less harm. My perception and circumspection are mediated through language, through personal memory, through reason, through my self-preservation instinct, through my reading of automated vehicle test-drive crash reports. In turn, the self-driving car "sees" me through its sensors as a moving thing that is possibly human, possibly not; if I am too still, or stand in shadow, it might not recognize my personhood at all. Perhaps I'm fortunate enough to be shaped like a mailbox, which happens to be the easiest and most recognizable shape for this car's ML system to avoid.
What theoretical concepts do we have for the scale of machine-machine visual culture, which ratchets and scaffolds up along with increasing computational complexity? How do we analyze active, alive systems that look at images for entirely different ontologies of information? How do we critique the knowledge produced? Much as there is space for rhetorical imagination in critiquing simulations, there must be space for human intervention in machine visual culture. Artificial "seeing" is a process of creating more virtual evidence, analyzing and drawing from databases that match features to speculate on class, gender, race, economic status, and feeling, habits, and inclinations. From this virtual evidence, psychographic profiles are rendered, swinging elections and galvanizing crowds into a frenzy over non-news.
Though we haven't spoken of robots in this essay, they are a useful touchstone as the first research base for machine vision, building literal eyes to "see." With advancements in the field, roboticists learned to create cognitive architectures to organize the behaviors of robots, intelligent systems that can infer the intention of a person from their actions and movement.46
These architectures traffic in symbolic systems, roughly mimicking our own, to create a "behavior base" from which to learn more. But a robot's cognitive architecture, broken down into its component processes, is based on algorithms, which extract information from scenes, make a determination statement of what reality is, infer a "meaning," a description of the objects in relation to it and to each other, to then make a decision.
Seeing to cognate, to act: robots sure are made to seem comparable to us. It is unsurprising that after thousands of years of making and producing images through painting and drawing, and most recently, photography, we would use these mediums' critical language to describe our machinic visual culture. As Paglen notes, the "theoretical concepts we use to analyze classical visual culture are robust: representation, meaning, spectacle, semiosis, mimesis," but we cannot use these terms as easily for machine-produced images, especially as the robot apparatus disappears and the seeing is beyond our eyes and senses.47 As this culture has become "detached from human eyes," and "largely invisible," human vision is now "an exception to the rule," in a world in which "the overwhelming majority of images are now made by machines for other machines, with humans rarely in the loop."48 So when my phone "recognizes" my friends, it is important to remember it is only recognizing images of them, matching the patterns of their faces with other face patterns it has already processed.
Then consider scale: today's powerful AI is trained on hundreds of millions of images that have to be sorted, tagged, described. It took 50,000 Mechanical Turk workers to build the ImageNet dataset of 14,000,000 images out of a pool of a billion.49 Machine learning demands human intelligence tag images and moderate content. About half a million people distributed around the world are paid to tag these images, and train the intelligent machines we use, the computer vision we need. They analyze sentiments, tag metadata, categorize, perform character recognition.
On platforms, we share billions of images each day, creating the datasets that feed ravenous AI. Two billion photographs a day are uploaded to Instagram and Facebook. A half billion people back up and manage their archives through Google Photos. Through the work of machine learning, we find our pictures sorted by places and things and people and combinations of categories: Detroit, June 2018. Providence, September 2018. As I edit this essay, I get a notification: New Movie: Check out 'Smiles of 2018.' I watch a movie of my selfies taken in Austin, Texas, Brooklyn, in Detroit, in Miami, on the Amtrak, outside my show, in a car with my colleagues, on New Year's Eve in Chicago, set to a bizarre upbeat jingle. I'm not bothered by it; I am interested to see what kinds of smiles I made over a year, because it's a strange aesthetic memory category I wouldn't think of organizing my photos through. My smartphone analyzes and parses my photographs to produce symbolic group abstractions that might be interesting to me, but more to form a kind of meaning for the AI—which learns to name the images to form a "sense" of what meanings click with people—and to produce a big database of knowledge. They identify patterns—some meaningful to us, some not—in endless combinations. Smiles in New York, laughing and crying in Los Angeles, and ambivalence in San Francisco. Our perspective and reading of our archived selves begins to shift through this elective seeing of our own images. I see myself through the happiest memories of the last year, and, on the most base emotional level, I do feel happy that Google Photos made me feel happy about my life. I share the video with my partner and we feel happy together.
These AI do not process and interpret in a vacuum; they still work, however strange their seeing, within the context of larger histories of image interpretation. And so their missteps and misreading are not free from criticism. I can understand that like affect, experience, and chance, "Smiles of 2018" is another powerful vector of influence in my naming of my own experience, in my creating a narrative for myself through my images.
A worker analyzing tweets on Amazon's Mechanical Turk systems can be asked, "Is this tweet happy, angry, excited, scared, annoyed, or upset?"50 This is a complex linguistic sentiment, describing a tweet as having a tone on its own, though boiled down to six feelings. "Does this person look trustworthy?" This question appears beneath an image of a man who wears a terrible Hawaiian shirt, an '80s flat top, a possible rat-tail.51 He is smirking. I click "no." He reminds me of a lot of untrustworthy people. How this data point will affect other men in Hawaiian shirts and high tops, I am unsure. I am also not sure that I care too terribly, because the crime of wearing a Hawaiian shirt and not knowing you look shady seems, to my bias, self-evident. This may be innocuous enough, but I then imagine the "untrustworthy" rubric easily extracted to thousands of people I've never met. I imagine a trustworthiness index placed on South Asian or Middle Eastern friends, such that people who wouldn't call them untrustworthy to their face can mark so anonymously, without fear of judgment.
AI's need for supervised learning is insatiable. People have to complete the job of cleaning up and normalizing the datasets. The variety of lighting, positions, and settings is so extreme within a dataset that, as Dr. Sarvapali Ramchurn, a professor of artificial intelligence at University of Southampton, notes, "even after classifying 50 million pictures, only very few items will be accurately classified in all possible contexts."52 Though there are datasets trained carefully on more sophisticated measures by experts to read context, to frame, to parse linguistic and semantic content, for the most part, the massive dataset tags for machine vision are crowdsourced rapidly and broadly. The demands of speed and complexity and cost, combined with a desperate need for neural networks to identify, and make sense of, structures and patterns within millions of datasets, make for a classification infrastructure that's both very raw and evolving, and quickly made intractable, very difficult to change, because of scale and compounding effects.
Unreadability and Being Read
How do we make sense of reading images that aren't even meant to be read by us? If a machine is reading a machine-produced image, what theoretical concepts can we use to describe what is being represented? What critical visual terms can we use to describe the algorithmically-generated image? As AI's evolution moves from supervised to unsupervised learning, the process of naming is becoming less sensible and intentionally less readable to people. It is hard to know what one is looking at, let alone subjecting it to loving and rigorous critique. How do we describe seeing that reads much of the digital evidence of our lives? How do we even critique an eye that can "recall the faces of billions of people," as Paglen points out?53 (He was then discussing Facebook's DeepFace, which in the ancient days of 2014 had an accuracy of "97.35% on the Labeled Faces in the Wild dataset," meaning it "closely approache[d] human-level performance.")54
The range of image datasets that AI now can train on is dizzying: all the world's plants, cars, faces, dogs, colors. In a famous machine learning training set, where networks once struggled to discern a fox from the field behind it, the same fox can now be separated and described by its age, weight, and species. The best machine learning system can tell what time of day it was in the field, describe its markings, and tell us what other companions are hiding in the field behind it. Neural network papers give a sense of the many painstaking iterations needed to refine a vision system. Each year, the ImageNet Large Scale Visual Recognition Challenge asks competitors to train a neural network to try and identify objects within an image—like separating foxes from a grassy knoll. Each year these competing models classify images into 1000 different typologies with more precision.55
The rubric for evaluating these images as "successful" has two parts. One is precision. Is the image high resolution and easily readable? Does it "sharply represent" what we see? The other is the level of accuracy of tagging, naming what is there in as direct and clear terms as possible. The result of all this computational power is a very basic level of clarity: the big man is on a field, the fox is in a field under the sun. The amount of complexity it takes to get here is staggering, and there is something elegant in the process, as scholar Peli Grietzer captures in depth, revealing how we also once learned the field-ness of a field, the triangular-ness of triangular objects, the fox-ness of fox-like creatures.56 The process necessitates that images are boiled down to receptacles of assorted qualities that are isolated and determined to be significant. So vast and global is this effort that the computational production of this named reality appears as a truth.
If anyone can technically train a neural network, who gets to train the ones that organize our lives? Machine learning skips the jerky sorting and matching process that earlier vision recognition systems (from eight to ten years ago) undertook. It is a system that learns as we do, modeled after the structure of animal brains, in which neurons are layered. A machine learning system creates its own algorithms, rewriting them to more accurately identify patterns, as it learns from seeing the environment. It distributes this learning along a network of other machine nodes, each learning and competing.
We may look at images with our eyes, but our lives are shaped by a different kind of partial, broken seeing that posits accuracy, that is made continuously through relational, active, and emerging algorithms. In much of the popular literature on neural networks, they are posited as dreaming, or as imagining images. But we don't solely "dream up" images in our mind from some thick, gooey subconscious—and neither do these networks. We actively generate images through our biases, our memories and histories, our styles of narrative, our traumas. And just as training sets also "reveal the historical, geographical, racial, and socio-economic positions of their trainers," so do neural networks, seeing from the hilltop over the entire known world.57
Artists are tackling the gaps with humor. In Us, Aggregated (2017) artist Mimi Onuoha points out the absurdities in many of a search engine's classifications by working backwards.58 She asks, "who has the agency to define who 'we' is?" She uses her personal family archives, and runs them through Google's reverse-image search algorithms, and then frames the resulting photographs according to their labels. In Us, Aggregated 2.0 (2018) she frames the many diverse intimate photos that have been tagged with the basic label "girl."59 In Machine Readable Hito (2017) Paglen worked with artist Hito Steyerl to make legible machine learning processes marking character and gender and personality.60 They performed facial analysis of Steyerl's many facial expressions. In many where she is frowning, she is labeled as a man; in neutral or confused expressions, she is some percentage of female. The projects suggest how the standard of a good, right face can reify extant politics of visibility, and suggest what the system sees as the norm for gender, the norm for emotional expression.
"Should we teach facial recognition technology about race?" reads a recent Wired headline.61 Every few months, a comparable strawman headline agonizes over how tenable a partial model of the world can be. Even in our most advanced technologies, the dumb fantasy of a world without race or difference or weird outliers persists. And the results are dumb and dumber: pictures of stoves with men at them are still labeled as "women." More and more, the values—or willing blindness—placed into machine-learning technologies exacerbate its shortcomings. Software is trained to categorize at scale "to a high level of accuracy." Note how that phrase, a high level of accuracy, becomes its own justification, despite the very best algorithms lacking the ability to use common sense, to form abstract concepts, or refine their interpretation of the world.
There are countless examples of flawed programmatic bias embedded in fallaciously-named "neutral" imaging processes. The most infamous might be Google's 2015 "gorilla" PR disaster, in which photos of African-American employees and friends of Google employees were labeled as gorillas. Google responded by erasing the word "gorilla" entirely from the library, such that its evolving image-recognition system, integrated increasingly across platforms, would not embarrass the corporation again.62 The underlying issue was simple: the training sets consisted mostly of white faces, as they were built by mostly white engineers.
We interpret images poorly or well in part because of political or cultural imperatives that are either open or closed. Visual recognition systems reinforce the violence of typing according to the same imperatives. There is a clear technological imperative to ignore through partial seeing, to support a social narrative, and a culture war. Every decision to name images becomes a profound ethical issue. While some engineers prefer a political agnosticism, and prefer that their codes be thought of as written in isolation from the outside world, their social impacts are too profound. The eye cannot just dispense its choices and float on.
Machine-learning engineers and designers deploying their vision systems must account for their blind spots instead of gesturing at the machine, offloading responsibility. That "we all bleed red," that "we're all members of the human race," that one feels they can be "blind to race and gender," should be called what they are: simulations of supremacy, in which everyone loses.
It's time to ask whether feel-good, individualist techno-libertarian sentiments that allow the eye to shut off to the effect of its own seeing, serve us as a culture. We must make a practice of actively naming the flaws embedded in bad seeing. We take seemingly innocuous computational interpretations of photographs and digital images to be political and ethical acts. There need to be collaborative paths to a machinic naming that restores dignity and complexity of the imaged and imagined, with encoded sensitivity to context and historical bias, and an understanding of traditionally bad readings.
In this massive machine symbolic system we must still try to read intelligently. The great literary critic N. Katherine Hayles calls for us to carefully consider nonvisual aspects along with the visual when examining how networked machines see. Hayles's penchant for a "medium specific criticism," as Wendy Chun interprets it, means that we need to understand how a machine reads to critique it.63 We see how technological design flattens our identities even as it gives the illusion of perfect self-expression; we have looked at the strange categorization and typing of ourselves along parameters of affect and trustworthiness. It is not a surprise that technology created through centralized power has watered a past promise down. What we have is a banal, distributed corporate information collection service running under the banner of intellectual inquiry. Its tendrils gather up our strong and weak desires to freeze us as consumers forever, progressive or not, Nazi or not.
Paul Christiano of OpenAI, one of the most distinguished thinkers on the future possibilities of artificial intelligence, has written recently that the question of "which AI is a good successor" is one "of highest impact in moral philosophy right now."64 Christiano does not shy away from what machines see, embracing their foreignness to our desires and needs, and their evolution into cognitive systems we understand less and less.
Companies will not open their black boxes any time soon, though ethicists, journalists, and activists vigorously advocate and shape the creation and deployment of AI towards more just and open frameworks, demanding accountability and transparency. Even if the black box stays closed, we do not need to willingly stay blind. We hold the responsibility of understanding an underlying ideology of a system that interprets images, and to fully grasp why it needs to pretend to be objective in order to function as a system.
The machine-machine seeing described in this essay demands we draw on all the critical faculties of seeing we have developed through history and have at our disposal, while also acknowledging the crucial lacks in our critical visual language.
On one hand, we must stay alert to automation bias, in which we begin to value information produced by machines over ambiguous human observation. If the world begins to affirm the vision of the simulation, faith in the machine eye overrides all. But we need ambiguous observation, doubts, backtracking, and revision. These are qualities of careful thinking, to not make a set conclusion without revisiting assumptions.
I suggest we practice asking the same questions we might in critically evaluating art:
Is what I'm seeing justifiably named this way?
What frame has it been given?
Who decided on this frame?
What reasons do they have to frame it this way?
Is their frame valid, and why?
What assumptions about this subject are they relying upon?
What interest does this naming serve?
This is one step towards intelligent naming. This is where we might best intervene, to shift predominant attitudes and perspectives that shape virtual evidence and generate machine-machine knowledge. For truly nuanced naming of images of people, places, and things, we must practice breaking the loop, to consider and describe the likely frame and ideology being effected. Looking at dozens of personal family photos labeled "girl," can we articulate everything that is lost in that tag? What happens if we do not give the narrative? Can this break for rhetorical imagination, consideration, and reevaluation be built into the machine learning process? For now, these systems are obsessed, understandably, with the empirical, but once the world is named, how will these systems evolve, as we have had to in the world?
If I see an image of a mugshot of a man of color online, and the tags "arrests," "larson," and "battery," I should take pause. Am I looking at a government site of images in arrest records? Is the image floating freely in a spam ad, the kind that populates less reputable sites, paired with a CLICK HERE TO SEE CRIME IN YOUR AREA, unmoored from context and narrative? Does the man look like an immigrant, like someone in my own family? Am I looking at an alt-right site filled with rabid xenophobic news on the border caravan and who is supposedly coming to get "us" up in remote, landlocked towns? How am I seeing this image? What thread did I follow to get here? How long do I linger on this image before moving on, and what did that lack of careful looking produce in my mind? What bias of my own was affirmed, and what was instantly dissonant? Could I resist the urge to click on easily, or did it feel hard?
When I have misread a representation—meaning, when I have hastily made a narrative about an image, a person, their presentation—I recognize that a mismatch has occurred, between reality and my false virtual evidence. I had instantly decided that specific visual cues mean something certain or likely true about the internal life of a person, about their possibility, though I know how foolish that is in practice—and how painful it is to experience. In the world, we do this constantly, in hurtful and unjust—but ultimately revisable—ways. If I walk into a job interview disheveled with holes in my clothes, the interviewer might assume both that I don't care about the job and that I am in some kind of distress. They may immediately assess me as not employable, no matter how fit I am for the job. I'm not fit for the mental work with holes in my clothes—this is a quick, dashed-off decision that we make an allowance for through a social understanding in which people who want jobs will dress the part.
Can we build machine vision to be critical of itself? Even as we learn to see alongside the machine, and understand its training sets, its classifications, its gestures, there must be more intervention points, in which corrections, adjustments, and refinements accounting for history, for context, for good reading of images, are made. There may be a fusion of the sensitivities and criticality we use for human visual image interpretation with the language specific to machine vision. Machine learning can be improved to be fair, with rigorous checks for statistical parity to reveal which groups or races are being classified incorrectly by the algorithmic eye.
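One such intervention point, the statistical parity check, can be sketched in a few lines. This is a minimal illustration under loud assumptions: the record format (group, predicted label, actual label), the group names "A" and "B," and the function names are hypothetical, and the numbers are invented. A real audit would involve far more than two rates per group.

```python
from collections import defaultdict


def group_rates(records):
    """Summarize a classifier's behavior per group.

    `records` is an iterable of (group, predicted, actual) tuples, where
    `predicted` and `actual` are booleans (e.g. "flagged" vs. not).
    Returns, for each group, how often it was flagged and how often the
    classifier was wrong about it.
    """
    counts = defaultdict(lambda: {"n": 0, "flagged": 0, "wrong": 0})
    for group, predicted, actual in records:
        c = counts[group]
        c["n"] += 1
        c["flagged"] += int(predicted)
        c["wrong"] += int(predicted != actual)
    return {
        group: {
            "flag_rate": c["flagged"] / c["n"],
            "error_rate": c["wrong"] / c["n"],
        }
        for group, c in counts.items()
    }


def parity_gap(rates, key="flag_rate"):
    """Largest difference between groups on one rate; 0.0 means parity."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


if __name__ == "__main__":
    # Invented data: both groups are the same size, but group "B" is
    # flagged far more often, and mostly wrongly.
    records = [
        ("A", True, True), ("A", False, False),
        ("A", False, False), ("A", False, False),
        ("B", True, False), ("B", True, True),
        ("B", False, False), ("B", True, False),
    ]
    rates = group_rates(records)
    print(rates)
    print(parity_gap(rates), parity_gap(rates, key="error_rate"))
```

A gap of zero would not prove the system sees well; it only says that, on this one measure, its mistakes are not falling more heavily on one group than another.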
But Paglen isn't convinced. "It's not just as simple as learning a different vocabulary," he notes. "Formal concepts contain epistemological assumptions, which in turn have ethical consequences. The theoretical concepts we use to analyze visual culture are profoundly misleading when applied to the machinic landscape, producing distortions, vast blind spots, and wild misinterpretations."65 To counter, some suggest that what we need is better-tagged training sets of images, more accurate ones "without bias," so we will be seen perfectly, and we will then be treated well.
The gesture to enforce "algorithmic violence," as Mimi Onuoha has written, is perhaps the most terrifying example of what we're up against.66 An AI paper from two years ago suggests that we could figure out who is a criminal based on their cheekbone height, eye size, and general facial structure. In other words, a criminal could be predicted, determined by a "type" of face—where eye size, nose structure, and other elements in a data set of convicted criminals are extrapolated to form a model for what a criminal type is—in effect, a self-enforcing loop in which the biases and limits of the dataset are not accounted for.
It seems a total fallacy that a computer vision algorithm would have no subjective weight or baggage. Even though we understand this claim is impossible, it remains the most prevalent idea in technological development. A neural network, as magical and strange as it can seem, is always produced by biases, desires, interests, bad readings, creators, and engineers with no regard for society who throw up their hands to say, "I only make the thing!" For a neural network to read the image "objectively," it would have to not be made by human hands or run on historical data of any kind.
But the desire for a "perfect" dataset in which people are seen perfectly is misguided; when are we ever seen perfectly? Why can't we demand this machine eye be better than our own occluded, hazy, partial, lazy seeing? Maybe it isn't perfect seeing, but critical seeing that we need. Critical seeing requires constant negotiation. We negotiate incorrect or imprecise naming through revision of our own beliefs. When we see, we take in the "data-points" of an image: color, form, subject, position. We organize the information into a frame that we can understand.
Some of the more doom and gloom accounts of modern AI and vision recognition suggest all is lost; that we are victims of addictive neurobiological targeting tools, slavishly trained to obey a high resolution display. Even as this new visual culture becomes more unwieldy, more insane, the sources of images more impossible to define, the ways they are marked unreachable, we are still supposed to evaluate our own judgments about the truth or reality of an image. In more humanist (and moralistic) veins of theory, seeing is always an ethical act: we have a deep responsibility for understanding how our interpretation of information before us, physical or digital, produces the world.
Without doubt our cognitive capacity is being outstripped, and precisely for that reason, there is no better moment to reassert the value of critical seeing. We have evolved cognitively to be able to negotiate visual meanings, holding them lightly until we have contemplated and thought through the questions above. It is imperative to do so when looking at any image passed through machines. As this is already incredibly hard to do, we might need more flexible frameworks through which to evaluate the construct of machine vision and its suggestion of value and truth. We have to be more critical visual readers, because we are ultimately the bodies and lives being read.
Recall how machine learning can be both supervised and unsupervised. Our own perception and meaning-making is similar to "unsupervised deep learning." We too learn to make patterns out of the "data" of what we see, noting differences and similarities, confluences and comparisons, from one image to the next. In our comparison of images, we create narrative representations, a sense of the world, and a corpus of representations that we carry out in our life. But we also are built to grow in response to resistance, and to the harm we cause. Training sets—which form beliefs—might be subject to this same provisional process, in which the choices of tags, simulation parameters, and mechanics across difference, are subject to revision. A final decision is made after a wider group of ethically minded stakeholders, literary scholars, and social scientists, hypothetically, compare and debate interpretations and frames.
In Benjamin Hale's short story "Don't Worry Baby," a woman, her child, and the child's father leave—possibly escape—an anarchist commune in the '70s.67 The story takes place on the plane ride back to the States. The woman accidentally takes a powerful hallucinogenic slipped into a piece of chocolate by the cultish father of her child. He tells her to just ride it out. As she holds their baby in her lap, she begins to feel her perception softly morph, and shift.
What follows is a viscerally awful sequence, as her synapses flood with the drug: the father's face disintegrates, the forms of other passengers in the claustrophobic, cigarette-smoke filled plane cabin fall away. She hears language as symbols, and sees faces as signs. She feels everything moving inside of her, from the cilia in her gut to how her veins move to help her pass milk into her child. Mid-flight, the child's eyes reveal themselves as dilated. This is a total loss of control: the mother suffers through a hellish, speechless meltdown as she can no longer read her child's face. It is locked far away, "in its own mind," turned completely inward.
The story's drama arises in part from the implied unraveling of the utopian order of the commune and its worldview, where each person had a sure role, a sure name, and a position in tightly prescribed bounds of the social order. Plummeting through this psychological horror, the reader feels how tenuous our hold on reality is, how deeply tied it is to facial recognition and cognitive faith, how quickly a sense of safety is lost without it. One screwy, distorted face unpins the fabric. We see how closely allied seeing is to naming and knowing. We get the sense that this unmooring is also an opportunity; a face that is only partly readable can be a challenge for better reading. A better visual reading can expand our sense of possibility. This is of course the power of surreal images, which confound, defamiliarize, shift the frame of what one assumes is true.
Settling in partial comfort with unknowing is endemic to our survival. We actually need to be able to create partial models of the world. Very rarely do we have all of the information of reality around us. The versioning of programming implies that constant revision and rewrites are essential, as in any language. It's unclear whether machine learning as it is being currently designed—at the scale it is seeking—even has space for such "unknowing," for provisional change of the dataset's vigorous naming. It would seem removing criticality is necessary for machine vision.
I return here to Detroit, a city that has been consistently abandoned, abused, and defunded. The most vulnerable who are hovering right at 35% unemployment are of course the demographic most affected by the green light eyes of T.J. Eckleburg over the ruined cityscape. Project Green Light, combined with facial recognition software, combined with license plate reading, means that a person with a suspended license can be arrested while walking into a pharmacy to get cough medicine.
PredPol is a company that sells software that uses a predictive policing algorithm, which is itself based on an earthquake prediction algorithm. To predict crime, the software uses the same statistical modeling used to predict earthquakes, a method that researchers have called too simple and too deeply flawed to be used. The company's data scientist compares crime modeling to "self-excitation points" and posits the forecast is made of "hard data," and is objective and fair, allowing police to offload their decisions to police a red-outlined area to "the machine."68 The software does not take into account the most deeply unethical issues involved in policing: what the police's predispositions to the red zone are, how the police already seek to penalize petty crime more in some neighborhoods than others ("broken windows" policing), how they target and harm people of color more than anyone else. PredPol masks its input data, which consists of flawed and deeply biased arrest records. Because it uses supervised machine learning to send police out to the same area, the model is, as Caroline Haskins reports, only predicting how an area will be policed, not how crime will occur.69
All this set aside, the police can now cite the software's heat map as what led them to where a crime might occur. The conceit of PredPol is almost beyond comprehension: that we can produce a predictive map of where crime is likely to occur by tracking "human excitation" or excited movement (defined loosely) along city streets. This heat map, combined with facial recognition software that tries to guess at criminal facial structures, opens up a nightmarish realm of possible abuse, where police are now shielded by the "lack of bias" of machine learning. This has been widely criticized as an example of technology used to wash away racially oppressive and violent tactics and mass surveillance.70
Earlier this year, PredPol went a step further. They were funded by the military to "automate the classification of gang-related crimes," using an old map of gang territory and previous criminal data, which is well known to be highly biased, anti-black, and in favor of the overstepping power of the police.71 The trained neural network "learned" to classify a gang affiliation, and a gang affiliation would add to sentencing time and fines, earning money for the police department or county, say, that decided to use it.72 At the conference presentation, the study's junior co-author, Hau Chan, was met with outrage from conference attendees. He stated "I'm just an engineer" in response to questions about the ethical implications of the research.73
Most disturbing here is that the one mitigating ethical pause, the human factor—an actual person who would read and evaluate the narrative text which police had to collect about the supposed gang arrest itself—was the most costly factor and so was eliminated. The neural network, according to Ingrid Burrington and Ali Winston, would instead generate its own description of the crime, without a single human being reading it, to then be turned "into a mathematical vector and incorporated into a final prediction."74
Not only would this AI-generated description be flawed and completely mismatched, the use of historical crime data means that future crimes could be described as gang involvement, making "algorithms of a false narrative that's been created for people … the state defining people according to what they believe."75 They'd then set the system to run without oversight, making a policing process that is already fraught with abuse as authoritarian as possible. Geographic bias encodes racial bias, and without talking to a single human being, a city is remapped and reformed. The god's eye view comes right around, AI enforcing exactly what its makers want to see in the world.
This is the likely future of AI seeing us at scale. Let's look back to the green lights in Detroit. Once this $4,000 surveillance camera is installed to channel data back to a Real Time Crime Center, the Detroit Police Department notes they hardly have the manpower to surveil all the cameras all day long. The partial seeing of street surveillance is much the same seeing that some police practice while looking at members of marginalized and high-risk, high-poverty communities. A former chief in litigation at the Department of Justice's Civil Rights Division has noted that Project Green Light is a "civil liberties nightmare," in which money is poured out of communities into these cameras, enforcing a further "hands-off" approach to neighborhoods already desperately underserved, without adequate education, employment, or housing opportunities.76 Nightmare it may be, but the green lights were still installed in food deserts, at the most trafficked areas for staples for miles.
Racial capitalism, weak machine learning, and algorithmic surveillance intersect to create a world that is not better seen, but less seen, less understood, more violent, and more occluded. In a nation where anti-blackness is and has been the institutional and cultural norm, and is an enormously lucrative position, hoping for the Green Light program to reprogram itself, to offer up a "provisional space" in which surveillance is somehow rethought in its methods and outcomes, seems facile. The system is working for them as is.
So in place of civic and human investment are machine vision cameras, promising security and peace of mind for owners, creating a self-affirming loop. This might work in some cases, but it is overall more disastrous for the vulnerable, as it opens overpoliced communities to the specter of punishment at any possible moment. A population desperate for services, for good governance, is forced to see this devastating possible surveillance as a net positive over nothing at all.77 A freeze frame of a camera feed in an area with a "predilection to crime" can be pulled, a subject in that frame can be used as evidence, their misdeeds imagined or maybe real (a suspended license, say) but named as a likely crime. The photo is held as a prompt for punishment along an endless scale of time. Determined by the freeze frame, they are given a new fingerprint of who they are, of what kind of person they are likely to be.
Abuses of machine vision are not hard to imagine. Think of immigration authorities with a camera feed on a wide street in a southern Californian city, seeking out a general description of a six-foot-tall individual in jeans, in a nighttime crowd. The reading of license plates forms the meat of databases, as the numbers are photographed, read, stored, and then sold to companies. Cameras sit in the foyers of banks, watching our expressions as we look at our bank accounts.
Looking up from the street to the camera, we begin to understand how our "individual realms of personal power," to use Stewart Brand's motto in the Whole Earth Catalog, have reflected a very narrow vision of the world back to us.78 Our knowing became channeled through violent, tired logics. But technological design has become so powerful that it can be used to persuade users to desire, and strongly suggests they should even want the world totally made in their image, reflecting those desires.
It's in the interest of this machine eye to create a plethora of life signatures for us. We become profiles—avatars—rich with recorded experiences, filling a demand to be legible for companies, municipal organizations, and bureaucracy to home in on. There's no break between the constructed model that's underneath the world and the reality that is produced.
We might ask, if AI is able to learn language on its own at levels of unprecedented mathematical complexity, then why shouldn't we have better models of people, with added layers embedded for history, context, and drags they place in simulations that account for trauma and oppression? Is it that we just can't yet imagine a simulation that isn't from a god's eye view? Can we imagine the machine eye can tumble from the top of the hill to the wild below, down to the ground and in it, that it can see beyond the flesh for each individual, unmoored, roving, seeing in every direction at once? What simulation of society would this eye produce, recognizing, seeing, and accounting for what is hard to model?
If you were to fill out a god's eye view of society, what bodies do you imagine in it? What do you look like in this simulation? What exactly is the model of your body moving through time? What does this simulation account for, or not account for? What hidden or not sensible qualities are erased? What are you able to name easily? What are your blind spots? What should the machine eye visualize that you cannot? What is the simulation of America in which a person of color lived a full and healthy life? In which the mentally ill were cared for? In which debt slavery was abolished? In which racialized capitalism was acknowledged as real and accounted for in all aspects of society? What could technology look like if it were not built around efficiency alone, if history and narrative context were not costly aspects to be erased, but in fact essential to a complete simulation? How would our seeing, naming, and knowing change, if the practice of technology was not framed so relentlessly as constituting objective observation of phenomena, but instead as an active creator of an illusion of an empirical, measurable, stable, and separate world?
Future ideology in technology might abolish the idea of a tabula rasa as a starting point, which has failed us over and over again. We might experiment with a worldview that does not look down at the world from the hill. Instead of starting over, we insist on not being empty models. If we are to be predicted, let us be seen and represented and activated and simulated as difficult, complex, contradictory, opaque, as able to change, as composed of centuries of social movement and production, personal history, and creative, spontaneous, wild self-invention. Let us see back into our machine eye as it sees us, to try and determine if it even imagines us living on in the future. If not, we must engineer worlds that produce a reality that is bearable, in which we are seen in full.
1. "Project Green Light Detroit." City of Detroit, detroitmi.gov/departments/police-department/project-green-light-detroit.
2. Gross, Allie. "Does Detroit's Project Green Light Really Make the City Safer?" Detroit Free Press, Detroit Free Press, 21 Apr. 2018, www.freep.com/story/news/local/michigan/detroit/2018/04/20/project-green-light-detroit/509139002/.
4. Vincent, James. "Artificial Intelligence Is Going to Supercharge Surveillance." The Verge, The Verge, 23 Jan. 2018, www.theverge.com/2018/1/23/16907238/artificial-intelligence-surveillance-cameras-security.
5. Bridle, James. New Dark Age: Technology and the End of the Future. Verso, 2019.
6. Ibid, 24.
7. This is the premise and guiding thesis of Aimee Roundtree's theories around rhetoric and scientific imagination. Roundtree, Aimee Kendall. Computer Simulation, Rhetoric, and the Scientific Imagination: How Virtual Evidence Shapes Science in the Making and in the News. Lexington Books, 2017.
8. Described by Steyerl in "Hito Steyerl and Kate Crawford on Stupid AI and the Value of Comradeship." e-Flux Conversations, 27 Jan. 2017, conversations.e-flux.com/t/hito-steyerl-and-kate-crawford-on-stupid-ai-and-the-value-of-comradeship/5957.
9. Scott, Andrea K. "Ian Cheng's Alternate Realities at MOMA PS1." The New Yorker, The New Yorker, 18 June 2017, www.newyorker.com/magazine/2017/05/15/ian-chengs-alternate-realities-at-moma-ps1.
10. Vincent, James. "Artificial Intelligence Is Going to Supercharge Surveillance."
11. Description of Sondra Perry's Graft and Ash for a Three Monitor Workstation can be found at https://www.serpentinegalleries.org/sites/default/files/press-releases/sondra_perry_-_full_press_pack.pdf
12. Frank, Jenn. "Diablo III Is Adorable." Unwinnable, unwinnable.com/2012/05/25/diablo-3/.
15. Aimee Kendall Roundtree, Computer Simulation, Rhetoric, and the Scientific Imagination, 3.
16. Roundtree, 97-101.
17. Roundtree, 34, 36, 37.
18. Ibid, 38.
19. Ibid, 4.
20. Ibid, 5.
21. Fred Turner, Interviewed by Nora Khan, "Fred Turner: Silicon Valley Thinks Politics Doesn't Exist." 032c, 032c.com/fred-turner-silicon-valley-thinks-politics-doesnt-exist.
25. Outlined in Chun, Wendy Hui Kyong. Programmed Visions: Software and Memory. MIT Press, 2013.
26. Chun describes this in the opening of Control and Freedom, describing her critique as dwelling "on the persistence of human reading, on the persistence of software as an ideological phenomenon, or to be more precise, as a phenomenon that mimics or simulates ideology." Alexander Galloway's notion of software as a simulation of ideology is put forth in: Galloway, Alexander R. "Language Wants To Be Overlooked: On Software and Ideology." Journal of Visual Culture, vol. 5, no. 3, 2006, pp. 315–331, doi:10.1177/1470412906070519.
27. Roundtree, 32, 37.
28. Roundtree, 36.
29. Roundtree, 36.
31. New Dark Age, 12.
32. Roundtree, 108.
33. From a talk I saw Jesse Darling give at NEW ROLES FOR THE ARTIST, a symposium hosted by UKK at Kunsthal Aarhus, Denmark, where we were both in conversation with Angela Dimitrakaki and Patricia Reed on November 29, 2018.
34. Paglen, Trevor. "Invisible Images (Your Pictures Are Looking at You)." The New Inquiry, 2 Oct. 2017, thenewinquiry.com/invisible-images-your-pictures-are-looking-at-you/.
35. Robinson, Melia. "A Former San Francisco Mayor Wants to Put the City's Homeless on a Navy Ship." Business Insider, Business Insider, 7 Sept. 2016, www.businessinsider.com/san-francisco-homeless-navy-ship-2016-9.
36. Robinson, Melia. "A Tech Worker Wants to Put up San Francisco's Homeless Population on a Cruise Ship." Business Insider, Business Insider, 23 June 2017, www.businessinsider.com/greg-gopman-san-francisco-homeless-cruise-ship-2017-6.
37. Miller, Michael E. "S.F. 'Tech Bro' Writes Open Letter to Mayor: 'I Shouldn't Have to See the Pain, Struggle, and Despair of Homeless People', 'Riff Raff'." The Washington Post, WP Company, 18 Feb. 2016, www.washingtonpost.com/news/morning-mix/wp/2016/02/18/s-f-tech-bro-writes-open-letter-to-mayor-i-shouldnt-have-to-see-the-pain-struggle-and-despair-of-homeless-people/1012581233/?noredirect=on&utm_term=.787f9ec6f1ca.
38. Do, Anh. "In Fighting Homeless Camp, Irvine's Asians Win, but at a Cost." Los Angeles Times, Los Angeles Times, 1 Apr. 2018, www.latimes.com/local/lanow/la-me-homeless-asians-20180401-story.html.
39. Turner, "Silicon Valley Thinks Politics Doesn't Exist."
40. New Dark Age, 44.
41. In "Linguistic History," Manuel DeLanda wrote that "the mere existence of 'virtual communities' will not guarantee social change in the direction of a fairer, less oppressive society."
42. Turner, "Silicon Valley ..."
43. Roundtree, 106.
44. Roundtree, 37.
45. Paglen, "Invisible Images."
46. Artificial perception and cognition can be best described through formal mathematical frameworks through which we might understand how machine learning experts constitute perception: first as identification of objects in the world, then, as a set of implications of those objects, and third, as an attempt to connect and make meaning between the first two. This can be formalized through "perception morphisms," which "describe structure preserving paths between perceptions." For a clear, fairly accessible overview, one might see Arzi-Gonczarowski, Z. Annals of Mathematics and Artificial Intelligence (1999) 26: 215. https://doi.org/10.1023/A:1018963029743.
47. Paglen, "Invisible Images."
49. Heath, Nick. "Inside Amazon's Clickworker Platform: How Half a Million People Are Being Paid Pennies to Train AI." TechRepublic, www.techrepublic.com/article/inside-amazons-clickworker-platform-how-half-a-million-people-are-training-ai-for-pennies-per-task/.
50. Amazon Mechanical Turk. "Tutorial: A Beginner's Guide to Crowdsourcing ML Training Data with Python and MTurk." Happenings at MTurk, Happenings at MTurk, 7 May 2017, blog.mturk.com/tutorial-a-beginners-guide-to-crowdsourcing-ml-training-data-with-python-and-mturk-d8df4bdf2977.
51. An image from: Matt Aldrich, Coco Krumme, Ernesto Martinez-Villalpando, Charlie DeTar. "Human Classification with Amazon Mechanical Turk," from "Using pattern recognition to analyze prosper.com," made for PATTERN RECOGNITION AND ANALYSIS, MIT Media Lab Course held in fall of 2008: courses.media.mit.edu/2008fall/mas622j/Projects/CharlieCocoErnestoMatt/turk/.
52. Quoted in: Heath, Nick. "Inside Amazon's Clickworker Platform."
53. Paglen, "Invisible Images."
54. DeepFace: Closing the Gap to Human-Level Performance in Face Verification. Found at:
55. "Large Scale Visual Recognition Challenge (ILSVRC)." ImageNet Large Scale Visual Recognition Competition (ILSVRC), www.image-net.org/challenges/LSVRC/. Currently, Convolutional Neural Network (CNN) models do very well on visual recognition. Researchers check their work against ImageNet, with iterations in models getting stronger and image datasets (Inception, on to Inception-v3) better each year. For a fantastic walkthrough of deep learning, see Colah on "Conv Nets: A Modular Perspective" (https://colah.github.io/posts/2014-07-Conv-Nets-Modular/), which is easily one of the most readable primers, or check out https://www.learnopencv.com/deep-learning-based-object-detection-and-instance-segmentation-using-mask-r-cnn-in-opencv-python-c/.
56. For a stunning tour-de-force work by a literary theorist on auto-encoding, cognitive mapping, the aesthetic complexity of machine learning, please see Ambient Meaning: Mood, Vibe, System, Peli Grietzer's dissertation written as a Harvard Comparative Literature student in 2017. The above is inspired by Grietzer's discussion of children's mental, geometric compressions: "We might think about a toddler who learns how to geometrically compress worldly things by learning to compress their geometrically idealized illustrations in a picture-book for children. Let m be the number of sunflowers, full moons, oranges, and apples that a toddler would need to contemplate in order to develop the cognitive schema of a circle, and n the number of geometrically idealized children-book illustrations of sunflowers, full moons, oranges, and apples that a toddler would need to contemplate in order to develop this same cognitive schema ..." Found at: http://marul.ffst.hr/~bwillems/fymob/ambient.pdf
57. Paglen, "Invisible Images."
58. Mimi Onuoha, http://mimionuoha.com/us-aggregated/.
59. Mimi Onuoha, http://mimionuoha.com/us-aggregated-20.
60. Hu, Caitlin. "The Secret Images That AI Use to Make Sense of Humans." Quartz, 1 Nov. 2017, qz.com/1103545/macarthur-genius-trevor-paglen-reveals-what-ai-sees-in-the-human-world/.
61. Chen, Sophia. "Should We Teach Facial Recognition Technology About Race?" Wired, Conde Nast, 15 Nov. 2017, www.wired.com/story/should-we-teach-facial-recognition-technology-about-race/.
62. Simonite, Tom. "When It Comes to Gorillas, Google Photos Remains Blind." Wired, Conde Nast, 20 Nov. 2018, www.wired.com/story/when-it-comes-to-gorillas-google-photos-remains-blind/.
63. Chun, Wendy Hui Kyong. Control and Freedom: Power and Paranoia in the Age of Fiber Optics. MIT Press, 2008, p. 17.
64. Paul Christiano, "When Is Unaligned AI Morally Valuable? – AI Alignment." AI Alignment, 3 May 2018, ai-alignment.com/sympathizing-with-ai-e11a4bf5ef6e?gi=f81396e3c39d.
65. Paglen, "Invisible Images."
66. Onuoha, Mimi, "Notes on Algorithmic Violence," found at: https://github.com/MimiOnuoha/On-Algorithmic-Violence.
67. Hale, Benjamin. "Don't Worry Baby." The Paris Review, 25 Oct. 2016, www.theparisreview.org/fiction/6434/dont-worry-baby-benjamin-hale.
68. Described in detail in: Haskins, Caroline. "Academics Confirm Major Predictive Policing Algorithm Is Fundamentally Flawed." Motherboard, VICE, 14 Feb. 2019, motherboard.vice.com/en_us/article/xwbag4/academics-confirm-major-predictive-policing-algorithm-is-fundamentally-flawed.
70. For a deep, intensive survey of algorithmic policing and the politics of PredPol, please see Jackie Wang's excellent book, Carceral Capitalism (MIT Press, 2018), a chapter of which is excerpted here:
71. Winston, Ali, and Ingrid Burrington. "A Pioneer in Predictive Policing Is Starting a Troubling New Project." The Verge, 26 Apr. 2018, www.theverge.com/2018/4/26/17285058/predictive-policing-predpol-pentagon-ai-racial-bias.
73. Hutson, Matthew, et al. "Artificial Intelligence Could Identify Gang Crimes-and Ignite an Ethical Firestorm." Science | AAAS, American Association for the Advancement of Science, 24 Jan. 2019, www.sciencemag.org/news/2018/02/artificial-intelligence-could-identify-gang-crimes-and-ignite-ethical-firestorm.
74. Winston, Ali, and Ingrid Burrington. "A Pioneer in Predictive Policing Is Starting a Troubling New Project."
76. Jonathan Smith, quoted in: Gross, Allie. "Does Detroit's Project Green Light Really Make the City Safer?"
78. A copy of the Whole Earth Catalog can be found at: http://www.wholeearth.com/issue/1010/article/196/the.purpose.of.the.whole.earth.catalog