The humanities tend to make things complicated. And that’s a privilege, because they have decided to describe and analyze their research topics as exactly as possible. Skimming over complex issues loses important details and, in doing so, leads to inaccurate arguments. As humanists, our medium for conveying insights and information is still language, and so, besides doing scientific research, we artfully string words together and produce rhetorically ornate sentences to convince our readers that our assumptions and arguments are unassailable. To describe details, we often need a lot of words (although sometimes we reduce a complex issue to a familiar technical term)!
Besides the textual reduction and argumentative expansion of information, digital humanists use another technique to convey insights: visualizing textual information (as an introduction, I would suggest CAO, Nan/CUI, Weiwei: Introduction to Text Visualization, Atlantis Press: 2016, and BUBENHOFER, Noah/KUPIETZ, Marc (Eds.): Visualisierung sprachlicher Daten: Visual Linguistics – Praxis – Tools, Heidelberg University Publishing: 2018). Data analysts use graphs, charts, and all kinds of visualizations to support their statements. As digital humanists, we transform text into data and analyze it, which is why we also work with these “visual windows” to illustrate and support our humanistic arguments.
During my last vacation, I had a silly idea: I was thinking about a network of words within a specific text or corpus. I was looking for a method to visualize the number of two-word connections (collocations) in a text. Of course, it is not a big challenge to count how often a specific word in a text is connected to another. So, I took a random sample text and ran TreeTagger over it. Fast, easy, simple. I got a lemmatized list of words (without quotation marks) in a column, in exactly the same order as in the text. In the next step, I copied that column and pasted its content next to the original column, but shifted one row down. For example, the text “A B C” became a two-column table in which every word sits next to its predecessor: A next to an empty cell, B next to A, C next to B, and a final row that holds only the shifted C.
Starting with the first word, you can now read the lemmatized text from top to bottom, but also count how often the collocation A-B (or any other) is used in the text. In fact, I had a corpus of 35,000 lemmas, so many collocations came up more than just a single time. To make it short, I produced a list of connections (edges) and a word list (nodes), and played with them in Gephi. My colleague Christopher Nunn already presented the next step on Twitter: a network of collocations in Augustine’s letters.
— Christopher Nunn (@ChNunn) January 18, 2018
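The shifted-column counting described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the workflow actually used for the corpus: the function names and the toy lemma list are hypothetical, pairs are counted as directed (A-B is not the same as B-A), and the column headers Source, Target, Weight, Id and Label are the ones I would expect Gephi's spreadsheet import to pick up.

```python
from collections import Counter
import csv
import io


def count_collocations(lemmas):
    """Count adjacent lemma pairs (each word together with its successor)."""
    # zip(lemmas, lemmas[1:]) is the "copy the column and shift it by one row" step.
    return Counter(zip(lemmas, lemmas[1:]))


def to_edge_csv(pair_counts):
    """Render the pair counts as an edge list: Source, Target, Weight."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(pair_counts.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


def to_node_csv(lemmas):
    """Render the distinct lemmas as a node list: Id, Label."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Id", "Label"])
    for lemma in sorted(set(lemmas)):
        writer.writerow([lemma, lemma])
    return buffer.getvalue()


# Hypothetical toy input standing in for the lemmatized column.
lemmas = ["A", "B", "C", "A", "B"]
pair_counts = count_collocations(lemmas)
edge_csv = to_edge_csv(pair_counts)
node_csv = to_node_csv(lemmas)
print(edge_csv)
```

In this toy text the pair A-B occurs twice, so it ends up with a weight of 2 in the edge list, while B-C and C-A each get a weight of 1. For an undirected network, one could sort the two words of each pair before counting.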
The most important result of this experiment is that some visualizations are definitely not suitable for publication. I’m totally bored by multidimensional networks, balls of wool, spaghetti graphs and so on. As an expert and the creator of the visualization, I clearly see fascinating and surprising things. In the example above, only I (and maybe some other experts) can see unexpected collocations at first glance, because I know the corpus and the data. But is this really the best way to publish it? Is it suitable for a public presentation? I’m sure it isn’t. The more information a graph includes, the less insight it brings to many readers.
The challenge of scientific data visualization is to pick out the most important information from a pile of useless and unimportant props. Many researchers don’t think about that, which makes me tired of heatmaps with millions of colorful spots but no clear message. Finally, I hate graphs that one can only understand with an explanation more than a page long. The deeper sense of graphs and visualizations must be that they are understandable by themselves, without an explanation from their creator. Digital humanists should question the sense and intention of a visualization if it needs written instructions to be interpreted (the really brilliant blog post ‘The 7 Kinds of Data Visualization People’ by Elijah Meeks still amuses me in this context).
Inspired by Cole Nussbaumer Knaflic (@storywithdata, Storytelling With Data, New Jersey 2015), I started to rethink the ways humanists present their data. I’m sure that all of us want to make an argumentative point in our thesis or paper and that we want to convince our readers of that specific idea. Often, the data lead us to ideas that would support the argument, but we spend less time shaping our visualizations. I do understand why. We are scholars working with texts and are used to writing properly. We are not used to designing graphs in an understandable and persuasive way, following the ground rules of graphic design and typography. Now, after thinking about visualizations, I’m firmly convinced that the aesthetics and design of a visualization are more important for the support of an argument than its actual content. Design must lead the readers’ eyes to the most important aspect of a visualized argument and accompany the written information. Finally, I wish somebody would write a guide called “Storytelling With Data for Academics” (till then, please read Knaflic’s manual).
I already shared some aspects of “Why and How Digital Humanist should visualize” in a position paper (in German) on heibox for our InFoDiTex. I would be glad to hear your opinion.