Poetry is maybe the most human of all human endeavors. One would think this is one of the very few fields that artificial intelligence could not take over. But according to a recent study, this is not so, as non-experts rate AI poetry higher than poetry written by humans.
And not just any humans. The researchers chose 10 poets who are very much part of the canon: Geoffrey Chaucer, William Shakespeare, Samuel Butler, Lord Byron, Walt Whitman, Emily Dickinson, TS Eliot, Allen Ginsberg, Sylvia Plath, and Dorothea Lasky. Quite a list, spanning more than 600 years.
They then asked ChatGPT 3.5 to write poems in the style of each poet. They did not select the best of these, just used the first five that ChatGPT 3.5 came up with. And they asked the participants, who were not experts, to rate the work on 14 scales, like profundity, rhythm, originality, imagery, beauty, emotion, and overall quality.
If the participants had been told that a poem was written by AI (whether or not it actually was), they rated it lower on all these counts. If they were told it was written by a human, they rated it higher. That’s not very surprising.
What was very surprising indeed is that most of these non-experts, when they were not told whether a poem was composed by AI or by a human, systematically rated the ones that were actually written by AI higher than the ones written by humans. For example, they rated the first poem ChatGPT 3.5 wrote ‘in the style of Shakespeare’ higher than an original Shakespeare poem.
Headline-grabbing stuff, no doubt about it. But what does this experiment show exactly? The researchers are very clear that these findings were limited to non-experts; the vast majority of the participants did not read poetry more often than a couple of times a year. And the research team’s own explanation has to do with participants’ lack of expertise, concluding that these non-experts preferred the AI poetry because it was less confusing, less opaque, and more straightforward.
This must be part of the story, but it’s not the whole story. Take a brief detour into a different but also deeply human endeavor, wine-tasting. It was first reported that non-experts are very bad at making the simplest distinctions about wine (for example, whether they were drinking red or white wine if they could not see the color), but shortly afterwards it was found that experts make the same kinds of mistakes (and sometimes even more of them). So watch out for studies of the same kind with poetry experts in the near future.
And there is something dissatisfying and elitist about saying that laypeople just don’t get the complexity of these poets, most of whom very explicitly wrote for the wide public and not just a select few. If the hoi polloi don’t get real human poetry, then it is not much of a consolation that English majors do.
But things are more complicated. It is notoriously tricky to do experiments on aesthetic appreciation. This experiment was about rating: how participants rated various poems on a scale from 1 to 7 on various counts. It relied on the participants’ self-reports about their experiences and, as we know from literally thousands of studies, self-reports can be highly misleading in most contexts. So this experiment was not about the aesthetic experience of poetry, which is what really matters. It was about what participants thought about their aesthetic experience.
Finally, aesthetic experiences in general, and aesthetic experiences of poetry in particular, are fragile and precarious. Often you may sit down to read a poem, but you’re just not in the mood, and instead of devoting all your attention to it you keep reading the same line over and over again without taking any of it in. The participants in this experiment did the test online, and there is no guarantee that they were all in a poetry-reading frame of mind when doing it.
The study is important and ingenious. But it doesn’t show that budding poets should abandon their passion. Not yet, anyway.