We are currently working on topic modeling the Slavic Review. As part of this project, and to encourage feedback and contributions from the community along the way, we have made our preliminary Slavic Review Topic Model Browser available for interactive viewing.
For a comparative angle, we have also just run preliminary topic modeling of the Slavic & East European Journal and the Russian Review. Both the SEEJ Topic Model Browser and the Russian Review Topic Model Browser are available online as well.
What is Topic Modeling?
In the current phase of the “Avant-Gardes and Émigrés: Slavic Studies and Digital Humanities” project, we are exploring Topic Modeling, an approach to textual analysis that algorithmically and iteratively examines a large corpus of texts to produce quantitative assertions about frequent word co-occurrences. The most widely used method of topic modeling in Digital Humanities relies on an algorithm called Latent Dirichlet Allocation (LDA). This algorithm sifts through a given corpus and tracks which words tend to appear together in the same texts. If certain words frequently appear alongside other words (for example, “poet” and “Pushkin”), they are grouped together, creating a “topic.” When implementing this algorithm, the programmer decides how loosely these topics are bound, which is to say, how closely their words co-occur in the text, by selecting the number of topics in advance. For example, an analysis containing two hundred topics will have more closely bound groups of words than one containing one hundred topics.
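To make the grouping step more concrete, here is a minimal sketch of LDA fitted with a collapsed Gibbs sampler in plain Python. It is a toy for illustration only, not the toolkit used to build our browsers: the function name `lda_gibbs`, the four tiny "documents," and the values of `alpha`, `beta`, `iters`, and `seed` are all assumptions chosen for readability.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (document, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Hypothetical mini-corpus; real input would be full articles.
docs = [
    ["poet", "pushkin", "poem", "verse"],
    ["pushkin", "poet", "onegin"],
    ["government", "regime", "power"],
    ["power", "government", "opposition", "regime"],
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [w for w, _ in counts.most_common(3)])
```

The `num_topics` argument is the choice described above: the model has no way of picking it, so the person running the analysis must set it in advance. Results on a corpus this small are unstable and depend on the random seed.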
Unlike traditional methods of categorizing text, the LDA algorithm employed in topic modeling categorizes words based on quantitative, rather than qualitative, factors. Take, for example, Topic 117 in our data set, which contains the words “political, power, government, country, leaders, regime, opposition” and so on. A person reading this list might automatically think of this topic as “words related to politics”; however, the computer has no sense of the semantic meaning of these words (thus the automatically generated title of “Topic 117”), because they are joined together only by their co-occurrence within the text. The benefit of this is that words we may not immediately associate with a particular topic will emerge in the results of topic modeling. To return to Topic 117, we also find here “future,” “period,” and “year,” which may imply that our understanding of historical time is rooted in political language.
To summarize the above, one of the most useful outcomes of topic modeling is the ability to see certain correlations and connections between elements of a corpus that one might not have initially recognized. Moreover, topic modeling can also confirm pre-existing notions concerning the corpus or dislodge some incorrect judgements about the collection of texts by showing quantified views of the corpus as a whole rather than through anecdotal evidence.
The final major benefit of analyzing text with topic modeling is the ability to track the presence of trends over time. In a single publication, such as a novel, one could follow the emergence of a theme chapter by chapter. In our example of the Slavic Review, we are able to observe scholarly trends as they present themselves in the form of various topics since the journal’s inception in 1941. If one traces Topic 181, which includes words such as “women,” “gender,” “sexuality,” and “feminist,” one would find that this topic first emerges in the Slavic Review in the late 1970s and appears with greater frequency from the early 1990s to the present.
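Tracing a topic over time comes down to aggregating per-article topic proportions by publication year. The sketch below shows one simple way to do that, an unweighted average across the articles of each year. The function name `topic_share_by_year` and the sample figures are hypothetical, and the browser itself may weight articles differently (for example, by length).

```python
from collections import defaultdict


def topic_share_by_year(articles, topic):
    """Average one topic's proportion across the articles of each year.

    articles: iterable of (year, {topic_id: proportion}) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for year, proportions in articles:
        totals[year] += proportions.get(topic, 0.0)  # absent topic counts as 0
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


# Invented numbers, for illustration only.
articles = [
    (1978, {181: 0.1}),
    (1978, {181: 0.3}),
    (1995, {181: 0.5, 117: 0.2}),
]
shares = topic_share_by_year(articles, topic=181)
print(shares)
```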
Thanks to Andrew Goldstone, we have a model-browser that allows the user to investigate the particulars of each topic, including a graph showing its frequency over time, a list of its words based on weight (how frequently they appear overall), and even the individual articles in which each topic appears (including what percentage of the article consists of that topic).
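Two of the views described above, a topic’s words ranked by weight and the percentage of an article made up by each topic, reduce to simple operations on count tables. The following is a rough sketch under that assumption; the helper names `top_words` and `topic_percentages` and the counts are hypothetical and are not taken from the browser’s code.

```python
def top_words(word_counts, n=5):
    """Return the n heaviest words of a topic, ties broken alphabetically."""
    return sorted(word_counts, key=lambda w: (-word_counts[w], w))[:n]


def topic_percentages(topic_counts):
    """Convert an article's per-topic token counts into percentages."""
    total = sum(topic_counts.values())
    return {topic: 100 * count / total for topic, count in topic_counts.items()}


# Invented counts, for illustration only.
topic_117 = {"political": 40, "power": 35, "future": 5}
article = {117: 30, 10: 10}

print(top_words(topic_117, n=2))
print(topic_percentages(article))
```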
Of course, we should remain cautious and critical in our conjectures, lest we mistake “dirty OCR” for proof that “fi rst offi review slavic infl public signifi confl diff aft defi cial” (topic 42) became a serious subject of study in 2011! We are, for now, “reading the tea leaves”: see the perceptive critique by Jonathan Chang, Jordan Boyd-Graber, Chong Wang, Sean Gerrish, and David M. Blei, “Reading Tea Leaves: How Humans Interpret Topic Models.”
Models and Open Source Tools
Our work is inspired and made possible by the models and tools made available by Andrew Goldstone and Ted Underwood, demonstrating the possibilities of topic modeling academic journals with their joint work on the PMLA and Goldstone’s work on Signs. Both Goldstone and Underwood generously offered us guidance and troubleshooting tips along the way.
For a summary of Goldstone and Underwood’s work, please see: http://tedunderwood.com/2012/12/14/what-can-topic-models-of-pmla-teach-us-about-the-history-of-literary-scholarship/.
We used Goldstone’s open source GitHub toolkit to build our model, available here: https://github.com/agoldst/dfrtopics.
For another application of Goldstone’s toolkit online, please see Signs at 40: http://andrewgoldstone.com/blog/2014/10/29/signsat40/
History of the Slavic Review
The history of the Slavic Review dates to 1941, when publication of The Slavonic Yearbook: American Series began. Following an arrangement with the School of Slavonic and East European Studies of the University of London, a committee of American scholars took over the management and publication of the Slavonic and East European Review. This committee represented the American Historical Association, the Modern Language Association, the Political Science Association, the Mediaeval Academy of America, and the Committee on Slavic Studies of the American Council of Learned Societies, the leading scholarly organizations in the United States at that time. The founding of this scholarly journal was spearheaded by Columbia University’s Professor John Hazard. Hazard also worked as an expert on the Soviet legal system for the federal government during World War II and afterward returned to academic life. With a diverse group of scholars at its helm, Slavic Review has been an interdisciplinary journal since its inception.
This forerunner to Slavic Review, The Slavonic Yearbook, even predates the founding of the leading scholarly organization in the United States in Slavic Studies, the American Association for the Advancement of Slavic Studies (AAASS), today known as the Association for Slavic, East European, and Eurasian Studies (ASEEES). The Joint Committee on Slavic Studies (JCSS), a joint committee of the American Council of Learned Societies (ACLS) and the Social Science Research Council (SSRC), also preceded the AAASS. The AAASS was founded in 1948, after World War II, as the Cold War was beginning in earnest.
Since its founding, the journal has appeared under a handful of different titles: Slavonic and East European Review. American Series (1943-44), American Slavic and East European Review (1945-1961), and, since 1961, its current name. Slavic Review remains to this day the leading journal in the discipline and the premier venue for new scholarly insights and directions in the field of Slavic Studies.
Patterns and Discoveries
Many of the patterns that we might first observe, looking at the topic modeling results for the Slavic Review, confirm expectations and general perceptions of the field or conform to common sense. (Conversely put: these are the patterns we can immediately “read.”) For example, work on film (as evidenced by topic 158) flourishes after the late 1980s, whereas the topic associated with Soviet workers’ wages peaks with a special issue or article block in 1955 and fades away after the 1960s. Less dramatic trends are also visible: the topic most associated with linguistics (topic 130) has seen some decline, especially after 2000. The distinct topic associated specifically with the Nicholas Marr school of linguistics (visible when working with 200 topics, as here; labeled topic 101) has a small peak in 1957, corresponding to one article, and another in 1996, corresponding to a review of a monograph on Marr. Since individual issues of the journal contain a relatively small number of articles, one unusual piece or an article block can make significant noise within a less popular topic.
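One way to check whether a peak reflects a broad trend or a single unusual piece is to ask how much of a topic’s yearly total comes from its largest contributor. The sketch below flags years in which one article supplies at least half of the topic’s weight. The function name `dominated_years`, the 0.5 threshold, and the sample figures are assumptions for illustration, not values from our model.

```python
def dominated_years(shares_by_year, threshold=0.5):
    """Flag years where one article supplies most of a topic's total weight.

    shares_by_year: {year: [topic proportion of each article that year]}.
    Returns {year: fraction of the yearly total owed to the top article}.
    """
    flagged = {}
    for year, shares in shares_by_year.items():
        total = sum(shares)
        if total > 0 and max(shares) / total >= threshold:
            flagged[year] = max(shares) / total
    return flagged


# Invented numbers: one heavy article in 1957, an even spread in 1990.
shares_by_year = {
    1957: [0.0, 0.6, 0.02],
    1990: [0.1, 0.1, 0.1, 0.1],
}
flagged = dominated_years(shares_by_year)
print(sorted(flagged))
```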
Other topics appear rather suddenly, potentially reflecting new theoretical or cultural developments in the field, academia, or in society at large. For example, one topic associated with discourse and cultural practices (topic 47) appears suddenly in the 1990s and has been increasing since the 2000s. Our tentative hypothesis, based on looking more closely at the most prominent articles and reviews associated with this topic, is that topic 47 may reflect the influence of Michel Foucault on the Slavic field, a general move towards anthropology in theory and research, and a growing interest in contemporary culture. Similarly, the sudden spike in a topic associated with the words “Bakhtin, Freud, new, theory…,” upon examination of the contributing articles and reviews, appears to be associated with seminal translations and commentaries on Mikhail Bakhtin’s works in the 1980s and with the impact of his theories on the field in the decades since (topic 29).
We conjecture that some topics reflect less particular academic trends than reverberating echoes of cultural events somehow related to the world of Slavic studies, such as the release of the film Borat (2006) and subsequent special issue. The intersection of this popular cultural phenomenon with a region of the world relevant to the Slavic Review — as well as the editorial choice to publish a special issue — explains the appearance of a distinct topic with the words Borat and Kazakhstan as most prominent (topic 99).
Another entire category of topics initially struck us as mysterious. For example, a topic associated with “paper” and “tables” appears suddenly in the 1980s and quickly disappears after 1990 (topic 46). After looking more closely at the articles, we found that this topic correlated with a feature called Books Received. Book reviews have been part of the Slavic Review essentially since its inception, but for some reason, in the 1980s, these features accounted for a greater percentage of the pages in the journal. One potential explanation is that there was a relative boom of books published in the field and/or reviewed in the journal at that time. Each of the book review entries in the Books Received section is filled with keywords correlated with topic 46. Having more books in these sections increases the number of these keywords relative to the rest of the journal; longer Books Received sections could help explain the sudden prominence of this topic.
Other topics still, which we chose to leave in at this stage, to share the noise we encounter along the way, seem to reflect “dirty OCR” more than intellectual content. (Again, see topic 42, “fi rst offi review slavic infl public signifi…,” suddenly appearing in 2011, exclusively in reviews, and very likely reflecting hyphenating practices.)
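Fragments such as “fi rst” and “offi cial” can sometimes be repaired before modeling by rejoining neighboring tokens whenever their concatenation is a known word. The following is a minimal sketch of that idea, not a step in our actual pipeline: the function name `rejoin_fragments` and the tiny vocabulary are hypothetical, and a real cleanup would need a full word list and more careful rules.

```python
def rejoin_fragments(tokens, vocabulary):
    """Merge adjacent tokens when the first is unknown and the pair forms a known word."""
    repaired = []
    i = 0
    while i < len(tokens):
        has_next = i + 1 < len(tokens)
        if (has_next
                and tokens[i] not in vocabulary
                and tokens[i] + tokens[i + 1] in vocabulary):
            repaired.append(tokens[i] + tokens[i + 1])
            i += 2  # both fragments consumed
        else:
            repaired.append(tokens[i])
            i += 1
    return repaired


# Hypothetical vocabulary; a real one would come from a dictionary or clean text.
vocabulary = {"first", "official", "review"}
tokens = ["fi", "rst", "offi", "cial", "review"]
print(rejoin_fragments(tokens, vocabulary))
```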
Topics, Trends, Conjectures
Below we list some preliminary observations of topics that caught our eyes, working with the 200 topics available for exploration on this site. We may be reading the tea leaves for now, but we invite you to join us and contribute your observations, insights, corrections, and explanations. The goal of our interactive model, and of the entire exercise, is to encourage critical self-reflection and to provide a pedagogical toy/tool for our community. What do we write about when we write Slavic Studies?
Topic 7 [communist communism communists party new international world parties comintern movement countries tito moscow ideological propaganda]
Peaks only around 1960, begins to significantly drop off by 1970; moreover, appears predominantly in reviews rather than in articles.
Topic 8 [socialist consumption consumer socialism life western…witchcraft magic]
This does not take off until 2002, but since then has remained popular. The antecedent articles in earlier generations are about either consumption or witchcraft. What is going on?
Topic 10 [example fact question problem point view evidence made argument make use different seems way issue]
Primarily consists of reviews or articles that focus on the critique of others’ work. Dominant throughout, after about 1952.
Topic 18 [travel tourism world exhibition national modern fair museum tourist]
This becomes a presence in 2003, possibly due to an article block on tourism in 2003 and another on art exhibitions in 2010. Becomes a legitimate trend and holds.
Topic 21 [public society popular social culture people private review life cultural readers]
Spikes in 1989, remains relatively consistent through 2010, then drops off again. Counterintuitively, most articles/reviews are not about contemporary Russia but cover these issues in the nineteenth century and the Soviet period.
Topic 24 [moscow space platonov ruins tolstaia center city future spatial utopian pelevin metro]
Mostly a sci-fi topic; suddenly emerges in the 2000s.
Topic 27 [media television russian history courses news slavic offered material propaganda radio course european political literature]
A great example of noise getting in the way of what we’re seeing. In two early years (1945 and 1946), the percentage is so high that it sets the scale for the rest of the graph.
Topic 28 [baltic latvian latvia estonia]
Baltic studies seems to have really fallen off since the early years of the journal into a small but stable category.
Topic 29 [bakhtin freud new theory psychology]
Early spikes and then nothing until the 1980s when seminal translations and works of and by Bakhtin come out. Has been a pretty strong topic since.
Topic 30 [german germany germans east berlin]
Big spikes, but also seems like there is a cyclicality to interest in German topics. Excitement, then steep drop, build up to another spike.
Topic 44 [tolstoi tolstoy peace love war karenina andrei lev work novel god]
Perhaps the most observable pattern associated with this popular topic is that it seems to move cyclically. Time your submissions accordingly?
Topic 47 [new people review time cultural practices world discourse … anthropology]
Seems to appear out of nowhere in the early 1990s and grow rapidly to dominate our “discourse” in the subsequent two decades. Our initial hypothesis is that this topic correlates to the influence of Foucault on our field.
Topic 60 [god christ icon holy divine christian wisdom]
Slight peak from about 1989 to 1993.
Topic 64 [pushkin onegin romantic petersburg bronze poet]
This topic is more prevalent after 1980 than before.
Topic 69 [mickiewicz polish poet great poem poetry]
This topic starts strong and vanishes in 1960. Compare this topic to the other Polish topic (topic 119), which appears to replace this one.
Topic 78 [income wage workers average rubles]
This is another topic with a very early spike that totally drops off. Perhaps the journal became less interested in economics generally. Articles during the spike year were about central planning and the Soviet economy. More recent ones look to issues like income inequality.
Topic 97 [marx philosophy theory]
Marxist theory appears to hit the Slavic field in real time in the mid-late 1960s. We see articles on Soviet Marxism, as well as a marked interest in cybernetics.
Topic 99 [kazakhstan borat baron cohen last www]
The result of a special issue in 2008, which focused on Borat and Kazakhstan; however, there is another peak in 2012, when articles appear to be either about Kazakhstan or the media. Since they are connected to some aspect of this topic, media words end up getting included in the topic. Meanwhile, strikingly, not a single popular topic included the words Internet, digital, or web in the Slavic Review. (Try the “word” search function to see what else is missing.) The presence of “www” does indicate some discussion of the Internet.
Topic 104 [ukraine ukrainian ukrainians kiev national russia]
This Ukrainian topic appears to be rather stable with certain moments causing spikes in its prevalence. The major jump in 1963 seems to be tied to a special issue on the topic.
Topic 110 [crime punishment criminal state social society hooliganism crimes corporal penal]
Often the topics involve bribery, but why the first two words? One hypothesis is that Slavists writing on “crime” cannot resist making the pun with “punishment”! This topic is most prevalent in the 1980s and after.
Topic 112 [jewish jews yiddish community israel jew religious jewry]
There is less of this topic before 1966 and then a significant rise.
Topic 115 [folklore tales songs legends oral literature tradition]
This topic has a highly cyclical existence, particularly since 1970. Each iteration seems to represent a change in critical scholarship (transition from medieval texts to early print to minority oral cultures and women in folklore).
Topic 119 [polish poland poles galicia warsaw catholic government… sienkiewicz]
This topic on Poland becomes more dominant after 1965. It should be compared to the other Polish topic (topic 69): why is Sienkiewicz the featured author in this one and Mickiewicz in the other? In an initial run of this program with only 100 topics, there was only one Polish topic. Over time, one Polish topic replaces the other in the journal.
Topic 124 [day love night people old eyes came man land white men still god saw heart]
This topic, initially mysterious, seems to correlate to romantic poetry and works of translated poetry in Russian, Polish, Serbo-Croat, Slovak, and others. Peaks through 1947, present throughout in much smaller quantities. Interesting that language of origin doesn’t seem to matter.
Topic 133 [self subject body new form reality experience language subjectivity]
Virtually non-existent before 1990. There is a massive spike in 2008. Begins with reviews of Svetlana Boym’s work and articles from Stephanie Sandler. Several works on Julia Kristeva in particular.
Topic 138 [genocide nazi holocaust germans collaboration occupation mass violence]
Not significantly present in scholarship until 1996 when there is a burst of scholarship on the Holocaust and other topics related to World War II. Peaks in 2002, gradually declining since.
Topic 140 [yugoslavia yugoslav serbian serbia croatian serbs croatia belgrade national bosnia zagreb serb croats slavic tito]
The selection of words in the topic is itself interesting, reflecting the most popular areas of study. The topic peaks in 1983, possibly due to Tito’s death and the resulting media attention.
Topic 158 [film films cinema new director sergei eizenshtein made]
Not surprising that this topic appears as a subject of study in 1986-88 during the years of perestroika as certain barriers are broken down. Since its appearance it has become a significant presence.
Topic 165 [poetry poem poet poems poetic poets verse lines words line derzhavin lyric love written lermontov]
This was a consistently strong topic, but has shown a recent drop off since 2008.
Topic 174 [byzantine slavic church greek monastery rome latin work medieval]
Very strong topic at the start of the print run, has since been followed by a consistent drop off.
Topic 183 [mandel shtam brodsky poet akhmatova mandelstam osip death kak poetry esenin loseff]
20th-century poetry topic, peaks in 1987. Not that dominant before 1986, and arguably starts fading as early as 1991, more so after 2005. Really a particular peak. The split in Mandelshtam’s name is most likely due to some issues with OCR.
Topic 198 [democracy democratic economic support market people political reform]
This topic makes a strong appearance in 1994. It seems to have taken a few years after the end of the Soviet Union to sink in, then became rather dominant. There are indications of a drop off after 2009.