I was recently talking with a friend about Midjourney (the AI image-generator), and she hadn’t encountered it yet. We were planning a book discussion at the time, and just to introduce her to the platform I suggested, “Let’s prompt Midjourney with the single word ‘protest’ and see what comes back.” The initial renderings are below:
I found the results both surprising and completely unsurprising (you know, that holding two contradictory ideas in your head thing). If I ask Midjourney to render “protest” and these are the returns, what are my assumptions about the assumptions Midjourney is mining?
- Protest is practiced by young people.
- Protest is practiced by white people.
- Protest is practiced by people with a “grungy” or “hipster” aesthetic.
- Protest happens in urban spaces.
- Protest happens on cold and/or wet overcast days.
- Protest involves signage (often mocked for spelling errors, but we require full forgiveness here).
- Protest, in the bottom left photo, includes what appears to be a peaceful police presence and a note of patriotism, signaled by a flag or two.
- Conversely, the bottom right photo hints at some violence, with the female figure’s face rendered with abrasions and some blood.
- Finally, protest is practiced by people who are often ridiculed for doing so, as the people depicted likely have smartphones or laptops, and for some media voices this is a totalizing hypocrisy that nullifies all their social concern.
Put another way, these might be the people of Occupy Wall Street, who were widely mocked and dismissed in media coverage as being uninformed, privileged, dirty, poorly dressed, and lazy. They are there protesting, but they don’t really know why.
In Midjourney, you can do variations of any of the returned images. I chose the bottom right image:
In the above images, I am curious as to what led to the foregrounding of a female figure, while all figures represented behind are closed-mouthed men; only the woman’s mouth is open.
Given what I’d seen so far, I altered the prompt to “violent protest.” Again, the renderings are both surprising and unsurprising:
As you can see above, the word “violent” completely eliminated the appearance of women. I cannot locate a single female figure. Also, in the bottom right photo, the foregrounded figure and all the background figures are open-mouthed yellers, which differs from the image with the female figure out front.
Again, what do we see? Screaming. Lots of screaming, and whatever word each mouth is in the process of producing, it is violent. One of the photos includes fire, likely in the form of a burning sign. Why is this interesting? Yelled words are violent in Midjourney renderings, while written words are peaceful. How do we know? Here are the Midjourney results for “peaceful protest.” They all include signs and imply a quiet setting (none of the “violent” images include signage):
So what do we see in “peaceful”? Women displace men. Also, not only is there no screaming; there are no mouths! Major figures stand masked, a school bus is added, and the top right image appears to include our first person of color. (It’s also finally warmer–short sleeves!)
Throughout this little experiment, I am struck by the consistent homogeneity: older people do not protest, or protests are never intergenerational. There is racial homogeneity, and when you add the adjectives “violent” or “peaceful” there is near gender homogeneity. There is also homogeneity in location: cities.
To be clear, I like Midjourney. I think it returns some beautiful, startling images. However, what is rendered in this batch of images is, to my eye, every reportorial stereotype deployed against protest to neutralize that protest’s particular goal or effect.
I’ll need to think about and tinker with this one more.