In a moment of perfect timing, The Washington Post just published an article titled “An AI that creates images from prompts worries researchers. And now anyone can use it.” An alternate headline reads, “AI can now create any image in seconds, bringing wonder and danger.” The article focuses on the three AI art generators I have been playing with: Dall-E, Stable Diffusion, and Midjourney.
Some of the major concerns are expressed as follows:
The technology is now spreading rapidly, faster than AI companies can shape norms around its use and prevent dangerous outcomes. Researchers worry that these systems produce images that can cause a range of harms, such as reinforcing racial and gender stereotypes or plagiarizing artists whose work was siphoned without their consent. Fake photos could be used to enable bullying and harassment — or create disinformation that looks real.
Nitasha Tiku, Washington Post
In the next paragraph, Wael Abd-Almageed offers, “Once the line between truth and fake is eroded, everything will become fake….We will not be able to believe anything.” Admittedly, in my limited experience, I struggle to see how these AI image generators erode truth significantly more than existing tools do, especially in the age of Photoshop and other editors, where even a simple lighting adjustment amounts to an “erosion” of the real. I share concerns about what we’ve dubbed “deep fakes,” but let’s hold the concerns quoted above against the images generated below.
I chose, as a prompt for all three interfaces, the term “diverse student body.” I selected it mostly because a rendering would almost certainly involve human faces and would also challenge the algorithms on race and gender representation, an area in which these algorithms have received significant criticism (and subsequent training). The prompt also plays into the meme of false representation, where largely homogeneous institutions present themselves as culturally diverse in glossy publicity materials. I would argue that those images, designed by humans, “erode” our sense of truth as much as any algorithm.
Anyway, here are the results:
You will see that a common trait, also noted in The Washington Post article, is a deformation of faces and eyes, some more severe than others. This is always the case with Dall-E and Stable Diffusion, and it leads me to question the value or usability of such images. What would someone use them for? Is the below a mock movie poster for a zombie film? The above image, if presented in a smaller format and limited to a quick view (say, in a slide deck), could pass as any stock image.
I will say that, given the amount of criticism these image generators (especially Dall-E) have received over representation, they capture the spirit of the prompt fairly well. Still, I think any real photograph would serve better, especially if you want humans with real eyes.
Midjourney continues to excel, largely because it commits to producing an artistic aesthetic. With Midjourney, I never feel as if I am generating an image so much as a stylized rendering of an idea or concept. The results, to my eye, are always excellent and interesting.
The Post article closes with Dall-E’s efforts to prevent “deep fakes” made from photos uploaded by users. I hadn’t uploaded a personal photo to any of these interfaces, so I wanted to give it a test. I used the below image of me playing ping pong:
After uploading this athletic photo, I immediately regretted it once I saw Dall-E’s response. Behold the full assault on Hawaiian shirts: