Data Analysis of Novels Reveals… Nothing New

Scientists have data mined 1,700 stories to tell us what we already know

Earlier this year, I wrote about one of my creative writing pet peeves: the constant attempts to reduce all forms of art into a few simple categories. You know the idea: “all stories” are one of ten or six or three or eight possible types.

These self-congratulatory attempts to reduce art to formula rarely tell us anything useful about stories. These formulas don’t tell us how stories function or how different narratives affect readers. They don’t tell us how great stories were written or what meanings the works can produce. Instead, these essentialist structures are parlor tricks that exploit the need for all mysteries to have simple explanations. But what the critic is invariably doing is generalizing to the point of nonsense.

When you lump the near-infinite number of stories into a few vague categories, you’ve ceased to say anything meaningful. This is exactly why there are so many of these formulations (“There are sixteen master plots!” “No, there are truly only seven types of stories!” “You’re both wrong, all stories are a stranger coming to town or a man going on a journey.” “Well I say five!”), and each is equally “right.”

Well, if you love pointless story structure analysis, especially when it is vaguely scientific, I’ve got a new study for you. According to the MIT Technology Review, “Scientists at the Computational Story Laboratory have analyzed novels to identify the building blocks of all stories.” The article also notes that there is no consensus about the number of story types but suggests this is because there hasn’t been scientific analysis. Now, Andrew Reagan and his team at the University of Vermont in Burlington have done a “sentiment analysis” to check out the “emotional arcs” of 1,700 stories.

The idea behind sentiment analysis is that words have a positive or negative emotional impact. So words can be a measure of the emotional valence of the text and how it changes from moment to moment. So measuring the shape of the story arc is simply a question of assessing the emotional polarity of a story at each instant and how it changes.

This is profoundly reductive. Human emotions are numerous and complex. The idea that all human emotion, even in the context of narrative storytelling, can just be reduced to “positive” or “negative” is silly. (It would be far more interesting to see an analysis of emotion in stories that wasn’t binary.) And can you really judge a story’s emotional arc by counting words? Are “moon” and “child” really inherently “happy” words, as these researchers say, in the context of storytelling? Tell that to the werewolf horror story I just read…
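To make the word-counting objection concrete, here is a minimal sketch of a lexicon-based scorer. It is not the researchers' method or data: the toy lexicon, its scores, the window size, and the function name `window_scores` are all my own assumptions, chosen only to show what "counting happy words" does to a horror sentence.

```python
import re

# Hypothetical toy lexicon (word -> valence). Invented for illustration,
# not taken from the study.
TOY_LEXICON = {
    "moon": 1.0, "child": 1.0, "love": 1.0, "smile": 1.0,
    "blood": -1.0, "scream": -1.0, "death": -1.0, "fear": -1.0,
}

def window_scores(text, window=10, lexicon=TOY_LEXICON):
    """Average lexicon valence over consecutive, non-overlapping word windows."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = []
    for start in range(0, len(words), window):
        chunk = words[start:start + window]
        # Only words found in the lexicon contribute; context is ignored.
        hits = [lexicon[w] for w in chunk if w in lexicon]
        scores.append(sum(hits) / len(hits) if hits else 0.0)
    return scores

horror = "Under the full moon the child grew claws and fur."
print(window_scores(horror))  # [1.0]
```

Under these assumptions the werewolf sentence gets the maximum "happy" score, because the scorer sees only “moon” and “child” and nothing about claws.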

Additionally, this study of “fiction” and “stories” includes works of philosophy by Schiller and Kant alongside numerous collections of non-fiction essays and letters. There are many interesting possibilities with data analysis of literature, but sadly, I don’t think this is one of them. Even beyond the aforementioned problems, the conclusions drawn are pretty unhelpful.

Quick, without doing any computer analysis, answer this question: if we declare there are only two directions (up and down) that X can move, and we say X can only switch directions up to two times, how many ways can X move?

Well… X can just go up (1). It can just go down (2). If X switches directions once, it can go up and then down (3) or down and then up (4). If it switches twice, it can go up-down-up (5) or down-up-down (6). Those six are literally every permutation possible. Guess how many types of emotional arcs this study claims to reveal? Yes, six.

A steady, ongoing rise in emotional valence, as in a rags-to-riches story such as Alice’s Adventures Under Ground by Lewis Carroll. A steady ongoing fall in emotional valence, as in a tragedy such as Romeo and Juliet. A fall then a rise, such as the man-in-a-hole story, discussed by Vonnegut. A rise then a fall, such as the Greek myth of Icarus. Rise-fall-rise, such as Cinderella. Fall-rise-fall, such as Oedipus.

Put aside the fact that these examples don’t really fit (Alice does not purely rise, and Romeo and Juliet rise before falling). The researchers didn’t even need to do any analysis, because they defined these as the six possibilities from the start. But wait, you say, couldn’t there be more complex plots that vacillate between rising and falling with more than two direction changes? Of course. These are just the six “building blocks,” and there are also “stories that follow more complex arcs that use the basic building blocks in sequence.” So… there aren’t only six types, there are just six “building blocks” that are actually just two building blocks: rising or falling.
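If you'd rather not count on your fingers, the tally is easy to check mechanically. This is a small sketch of my own, not anything from the study: the function name `arcs` and the `max_switches` parameter are made up for illustration. It lists every up/down arc with at most a given number of direction changes.

```python
def arcs(max_switches=2):
    """List every up/down arc with at most `max_switches` direction changes."""
    result = []
    for switches in range(max_switches + 1):
        for first in ("up", "down"):
            other = "down" if first == "up" else "up"
            # Segments must alternate, so the first direction fixes the whole arc.
            arc = [first if i % 2 == 0 else other for i in range(switches + 1)]
            result.append("-".join(arc))
    return result

print(arcs(2))
# ['up', 'down', 'up-down', 'down-up', 'up-down-up', 'down-up-down']
print(len(arcs(2)))  # 6
```

Allow k direction changes and you get 2 × (k + 1) arcs, so the number six follows from capping the switches at two, not from anything found in the 1,700 texts.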

So there you have it, folks: “science” has determined that all stories have characters who rise or fall or do both, perhaps multiple times, in some order.


