Computational Text Analyst Eder Presents Stylometry at Digital Humanities Initiative Event

December 13, 2016

Computational text analysis expert Maciej Eder, director of the Institute of Polish Language at the Polish Academy of Sciences and associate professor at the Institute of Polish Studies at the Pedagogical University of Krakow, was the Digital Humanities Initiative's second guest in the Conversations in the Digital Humanities series on November 3. In recent years Eder has received international attention for his analyses of the authorship of ancient Greek and Roman plays and 20th-century novels using Stylo, an R package he created with fellow members of the Computational Stylistics Group, Jan Rybicki and Mike Kestemont.

During the session, Eder introduced computer-based stylometry, its methodologies, and its possible outcomes through two recent projects. Stylometry, Eder explained, focuses on the basic building blocks of the language in which the examined works and corpora were written, whereas traditional text analysis looks for a text's extraordinary or unusual linguistic (i.e., lexical and syntactic) features. For English-language material, this means, for instance, identifying function words such as "of," "at," and "with" and quantifying their frequencies and relative patterns within one or more texts.
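To make the idea of counting function words concrete, the minimal Python sketch below turns a text into a profile of function-word frequencies per 1,000 words. It is an illustration only, not Stylo's implementation: the short word list and the function names `tokenize` and `function_word_profile` are hypothetical choices made for this example, and real studies usually take the most frequent words of the whole corpus rather than a hand-picked list.

```python
import re
from collections import Counter

# Hypothetical, hand-picked list of English function words (illustrative only).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "at", "with", "that", "it"]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (keeping contractions)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def function_word_profile(text: str, words=FUNCTION_WORDS) -> dict[str, float]:
    """Return the frequency of each function word per 1,000 tokens of the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)  # missing words simply count as 0
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {w: 1000 * counts[w] / total for w in words}


profile = function_word_profile("The cat sat at the door with the dog.")
print(round(profile["the"], 1))
```

Comparing such profiles across chapters or books, rather than reading any single unusual word, is what lets stylometry treat frequent, "invisible" words as a signal of style.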

Eder likened this approach to dactyloscopy, the analysis and classification of the patterns found in individual fingerprints. He demonstrated how an author's stylistic "fingerprint" can be identified through his stylometric analysis of Harper Lee's 1960 novel "To Kill a Mockingbird." When plotted on a diagram, the data show that the last chapter differs stylistically from the rest of the book, which may point to the influence of either the book's editor, Therese von Hohoff Torrey, or Lee's close friend, the novelist and playwright Truman Capote. Capote seems the more likely of the two to have had a hand in the manuscript, since a broader network analysis shows stylistic similarities between "To Kill a Mockingbird" and two of Capote's novels.
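The article does not say which distance measure was behind the "To Kill a Mockingbird" diagram. As an assumption for illustration, the Python sketch below uses Burrows's Delta, a widely used stylometric distance that is among the measures Stylo offers, to compare text segments such as chapters. Each of the most frequent words is z-scored across all segments, and Delta is the mean absolute difference of those z-scores, so lower values mean more similar styles. The function names and the `n_mfw` parameter are hypothetical.

```python
import itertools
import re
import statistics
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def burrows_delta(texts: dict[str, str], n_mfw: int = 100) -> dict[tuple[str, str], float]:
    """Pairwise Burrows's Delta between named texts (a sketch, not Stylo's code)."""
    tokenized = {name: tokenize(t) for name, t in texts.items()}
    # The n_mfw most frequent words of the whole corpus form the feature set.
    corpus = Counter(tok for toks in tokenized.values() for tok in toks)
    vocab = [w for w, _ in corpus.most_common(n_mfw)]
    if not vocab:
        return {}
    # Relative frequency of each feature word in each text.
    freqs = {}
    for name, toks in tokenized.items():
        counts, n = Counter(toks), len(toks) or 1
        freqs[name] = [counts[w] / n for w in vocab]
    # z-score each word's frequency across all texts.
    names = list(texts)
    z = {name: [] for name in names}
    for i in range(len(vocab)):
        column = [freqs[name][i] for name in names]
        mean, sd = statistics.mean(column), statistics.pstdev(column)
        for name in names:
            z[name].append(0.0 if sd == 0 else (freqs[name][i] - mean) / sd)
    # Delta: mean absolute difference of z-scores for each pair of texts.
    return {
        (a, b): statistics.mean(abs(x - y) for x, y in zip(z[a], z[b]))
        for a, b in itertools.combinations(names, 2)
    }
```

In an analysis like the one described, each chapter would be one entry in `texts`. A chapter whose Delta to every other chapter is unusually high would stand out in a plot or cluster diagram as stylistically different.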

In another example from his current work, Eder demonstrated that by using Stylo to re-examine authorship attributions reached through traditional methods, one can either challenge a text's presumed authorship or identify its as-yet-unknown author. His project analyzing the individual styles of ancient Greek and Roman dramatists, combined with network visualization of the resulting data, has revealed several instances of cross-fertilization between these dramatists' works and plays whose authors have not yet been identified.

The session's guest discussants, Margit Kiss of the Hungarian Academy of Sciences and Levente Selaf of Eötvös Loránd University, emphasized that the choice of material for analysis significantly affects the results. Selaf cautioned that although computer-based text analysis works with quantifiable data and could augment the work of literary critics and historians, its results should never be considered "exact" or unquestionable.