A critique of the storytelling body of research and the need for an end-user focused communication model
by Ross Waring, Ph.D., Vividdata Visualization Inc.
Increasingly, we live in a world of complex data. Being able to work with and discuss that complexity is vital for our collective success. This is especially true when tackling the big, difficult problems like structural racism or global pandemics.
Fortunately, the technologies available to process, manage, wrangle and understand complex data about those big problems are themselves complex and powerful. Notable among these are the various technologies developed for data visualization and visual analytics purposes.
As early as 1989, Kosslyn, a cognitive-psychology pioneer in the data visualization field, identified that graphics fundamentally needed to fulfill two basic functions: analytical and communicative.
The technologies’ analytical functions have certainly advanced to the point where today we can create highly complex visualizations that could not even have been imagined, let alone produced, a dozen years ago. Yet our understanding of how data visualizations communicate is still relatively thin. The currently prevailing de facto model, applicable at least to some visualization types, is known as the storytelling approach. Beyond that, there is relatively little else to support knowledge about visualization’s communications functions.
The main idea of this paper is that the data visualization and visual analytics industry, as it developed over the last 30+ years, has shaped and constrained how we imagine and then use data visualization technologies in support of human communications.
To explore this idea, this research paper:
- describes the data-visualization industrial context in terms of its main structural and functional segments.
- describes the visualization professional practice’s functional lore, including ongoing beliefs about storytelling’s uses and benefits.
- examines the academic R&D literature on data visualization storytelling plus its related concepts.
- critiques that literature in terms of its premises, theory and methodologies, plus raises other general causes, consequences and concerns about storytelling in visualization.
- suggests an alternative end-user centric approach that focuses not on what makes visualizations ostensibly unique but instead considers key commonalities across visualization types and instances.
Before going further, we should be clear about this concept: data visualization. Several terms and definitions have been put forward over the past 30 years for what might reasonably be thought of as data visualization. Most have tended to be defined specifically, perhaps too narrowly: scientific visualization vs. information visualization vs. infographic, and so on. Then within something like the information visualization category, further subgroups have been proposed, such as narrative visualizations or rhetorical visualizations, among others.
Rather than being too specific, let us cast a wider net and deliberately adopt a broad starting point. At a high level we could suggest:
Data visualization can be defined as the use of display and computer technologies, tools and techniques to represent and interconnect often widely varying data, in the form of one, a combination, or a sequence of visually sensed, meaningful, empirical, graphical images.
Obviously the word “meaningful” here is ambiguous and loaded: it needs greater elaboration. Nonetheless, in this view, data visualization is primarily a communication medium, rather than just an instance of visualized data. Thinking of it as a medium lets us consider a broader range of potential effects or processes that could result when two or more people use data visualization technologies to facilitate their communication.
The data visualization industry has, and has had, certain macro-level characteristics, structures and relationships. These constitute data visualization’s industrial-structural context. People in the industry formed and perpetuate the currently popular notions about how data visualizations effectively communicate, in alignment with, and in many respects due to, this industrial-structural context, which is discussed in the next section.
Data Visualization in an Industrial-Structural Context
First, note that visualization was made possible and impelled forward by the influences and interactions of several large, mostly technological developments. These include innovations in the following broad categories:
- Business intelligence software, enterprise management and decision support systems
- Advanced analytics, artificial intelligence, and data science generally
- Open-source software development
- Augmented reality, virtual reality
- Data scope/quantity, storage, capacity, data speeds, cloud computing, and IoT
- Data governance, data management, quality, individual privacy and protections
- Open-data, open government, access to information
These have been the technological forces that, together, helped data visualization to develop, both as a set of technologies and as a professional field.
Data visualization emerged and developed in a way that it formed its own somewhat insular industrial ecosystem. When we look at the key organizations and individuals in the visualization field over the past 20 years, we notice a few identifiable core segments. Structurally, the broader visualization industry appears to comprise three main groups. They are further supported by a few subsystems and subgroups. The main data visualization and visual analytics industry segments, which are also shown in Figure 1, are:
- Visualization academic research and development
- Visualization software & technology vendor and open-source sector
- Visualization global professional practice
1. Academic Visualization Research & Development Segment
The visualization field has been, without risk of hyperbole, dominated by computer science research and development (R&D), typically by those within an organization known as the IEEE (“I-triple-E”). This is a global association of computer science academics and professionals, a subsection of which has been singularly focused on visualization technologies, techniques and algorithms.
IEEE’s first annual conference on data visualization was held in 1990. As we track IEEE visualization conferences over time, we see that by 1992 a separate symposium on “information visualization” had emerged within the conference structure. Then by 2006 the conference, if not the entire visualization discipline as represented within the IEEE, had cleaved into subgroups of inquiry: SciVis (scientific visualization), InfoVis (information visualization) and VAST (visual analytics science and technology). The subdivision of the R&D segment into main visualization types and functions may be more a function of the IEEE’s organizational and conference-logistics needs than of any fundamental differences among visualizations (as a general category), or across visualization algorithms or software tools.
Nonetheless, the visualization academic research tradition is certainly situated squarely within the computer science discipline, and has very strong, specific roots in the general domain known as human-computer interaction (HCI) as well as deep connections to, and reliance on, fields of perceptual and cognitive psychologies. (see e.g., Ware, 2000)
Those who graduated from visualization-focused graduate degree programs in computer sciences were, for the most part, those who then went on to build and lead the visualization software industry, e.g., Tableau Software.
Because early computer scientists were and are unquestionably strong leaders in the visualization field, and because they were so specifically devoted to HCI and to two areas of psychology, we would reasonably expect this might, over time, produce an ideologically limited field of vision in terms of data visualization’s development trajectory and perspective.
2. Visualization Software Segment
Among the industry’s core global visualization-centred software vendors, Tibco was founded in the 1980s, Qlik in the 1990s and Tableau in the visualization-heydays of the 2000s. These companies were initially more focused on visualization capabilities.
But over time they have tended to shift more into the analytics-platform category of software vendors. In that sense they compete with larger firms such as IBM, SAP, SAS, Microsoft, Google and Amazon Web Services, all of which have been enhancing the visualization capabilities of their existing software and platforms.
Gartner, an industry analyst firm, when defining its “BI and Analytics Platforms” market, includes general enterprise analytics and business intelligence along with those core visualization firms. In this changing competitive space, the visualization capabilities may be taking a lower priority, compared with the analytical functions.
It is important to note that two initiatives early on in data visualization’s history likely had a strong positive effect on popularizing the visualization field and practice: Tableau Public and ManyEyes (from IBM, now discontinued). Microsoft also has an open-source public software product named SandDance, which is operable but seems to have a lower profile than, e.g., Tableau Public.
The direct users of visualization software – whether software paid for with license and maintenance revenues or software from an open-source project – are the people who constitute the visualization professional practice as well as the sales market for the data visualization industry. This is the third main industry segment.
3. Visualization Professional Practice (market) Segment
The third segment, the professional practice of data visualization, is of a different nature: it is far more heterogeneous than the other segments.
Those in this segment of the visualization industry range from data scientists to report writers, graphic designers, citizen-journalists and freelance visualization builders. These are the people who generally build data visualizations as their primary or secondary work functions. To the R&D community these are the “visualization authors”; to the software vendors they are the purchase decision-makers or influencers as well as product users.
The practice of visualization has notably been applied in such fields and industries as:
- Applied sciences such as healthcare, aerospace, environmental management, resource development and extraction, etc. These tend to be particularly scientific and data-rich, but often deal with highly distinctive data.
- Enterprise and corporate uses of visualization, mainly internal (finance, operations, network, infrastructure, security and information delivery) as well as external (corporate and industrial communications).
- Government national security, crime-prevention, audit, emergency management, as well as open-data initiatives.
- Data journalism as demonstrated in mainstream media (both online and print) such as the New York Times, Washington Post, UK Guardian and Economist, as well as citizen journalism. (The notion that journalists and reporters tell stories in their writing, and that data journalists use visualization, also makes it easier to connect data visualizations with storytelling.)
- Advocacy or activism and community groups using data visualizations to press a cause.
- Influence or promotions e.g., marketing, PR, associations and interest/ lobby groups.
- Data-as-art, graphic design, and animation entertainment.
For the most part these professionals and practitioners are found among the over 10,000 members of the Data Visualization Society, or among LinkedIn’s over 17,000 Visual Analytics group members, with about as many in its Data Visualization group.
In support of these various practitioner groups and use types, there are a few smaller but important subsystems within the data visualization industry.
Business and Professional Media
Some key business and technology media heralded the new visualization technologies in the 2000s and continue to cover them today. They include Harvard Business Review, Fast Company, Wired and Forbes. All of these media have had coverage, technology-editorial content or features on visualization. In addition, certain online media, such as datasciencecentral.com, provide information and otherwise support the profession.
Pundits and Opinion Leaders
The second group comprises an enthusiastic set of consulting pundits, authors, proponents, speakers, celebrity-practitioners, bloggers and leaders of data visualization. These include such notable figures as Cole Nussbaumer Knaflic, Bill Shander, Stephanie Evergreen, Stephen Few, Alberto Cairo, Andy Kirk, Elijah Meeks, Noah Iliinsky, Kaiser Fung, and David McCandless, among others. They are all champions of visualization. Some specifically advocate storytelling, directly or indirectly. These individuals may also already be predisposed to storytelling as a set of ideas because, as public speakers, they are practiced at it in the first place.
Within this visualization industrial-structural context, the concept of data visualization storytelling emerged and diffused into the visualization professional practice.
Origins and Evolution of Visualization Storytelling
Storytelling, explicitly associated with data visualization, is a notion that is part of a larger, predominant data visualization belief system. Getting to the core of what storytelling is about, requires casting a wide conceptual net.
First, what do we mean by storytelling as it relates to visualizations? Prima facie, it is unlikely to be a simple, singular concept; rather, it is a more-complex, multi-dimensional and socially-influenced construct. To understand storytelling in context, it’s important to consider the broader system of beliefs in which storytelling, as a set of principles, is situated.
Focusing on how people speak about the topic of data visualization, e.g., in their presentations or blogs on the subject, reveals an underlying popular lore about visualization. For instance, many of us working in the industry, earlier in our careers, had likely been in the audience of a presentation on data visualization. Reviewing such presentations on the Web reveals a pattern of recurring beliefs, the most prevalent of which are paraphrased in Figure 2.
These are the most commonly observed folklore-like points used in presentations and communicated among members of the visualization community. Some points are well based in rigorous science; others are not as sound. But that is a broader and different conversation. Veracities and verisimilitudes aside, all of these statements and beliefs are arguably socio-subcultural objects of the visualization industry. These are elements of contemporary data visualization’s popular vernacular, its conventional-wisdom, its lore, essentially part of its lingua franca.
It is notable that conversations about these points are often told as stories, and so they tend to align generally within or next to the storytelling discourse. Together they provide a common basis by which those who work in the broad data visualization field, share a professional understanding about how to regard, develop, build, use or work-with data visualizations. Obviously this is a functionalist perspective.
Among these beliefs, the three that pertain to storytelling are of particular interest to this paper and will be discussed further on. The key point here is that ideas around storytelling exist within a larger context of dominant ideas and beliefs, within the global visualization community.
A wide range of material has been written, both in print and online, about storytelling as it pertains to data visualizations, or applied to data in general. Much of it comes from the popular press, business media, blogs and software marketing materials alike. Those works tend to provide lists of points, ideas or ‘tips and tricks’ on how to achieve data storytelling. Those works are not used in this paper, because they generally lack a fully articulated logical structure formed on scientific evidence with anchoring theories and principles. On this basis, those pieces do not provide enough information to be included at this time.
The body of R&D research literature that remains for deeper analysis constitutes the basis on which we could reasonably substantiate the merits of storytelling for the purposes of data visualization. The next few sections describe its development.
Visualization Storytelling Academic R&D
The idea of storytelling related to data visualization first appeared in the research literature during the foreboding calm of August 2001.
Figure 3 shows the main works within this research literature on which storytelling in visualization is fundamentally based. Most of it originates from within the IEEE organization. This figure is not necessarily exhaustive; some additional articles or conference papers on visualization storytelling (and related concepts) may be absent from this figure (cf. Tong et al., 2018), but those shown are the main, particularly relevant pieces for the present analysis.
Even though storytelling emerged in 2001, we need to consider its roots, which extend back into the last third of the 20th century.
Visualization Storytelling Roots, 1967 to 2000
In 1967 Bertin, a French cartographer, with colleagues wrote Sémiologie Graphique, which exhaustively codified a framework for representing quantitative data on Cartesian maps. Their underlying analysis was ostensibly based in semiology (or semiotics, or sémiologie, as in the work’s French title), which roughly is the study of communication through signs and symbols.
One issue with Sémiologie is that, as a semiological analysis, it is incomplete. The work explicitly takes data as a given, which means any potential impacts or effects of a visualization due to differences in source data, or to individual differences in how people understand data (e.g., individual subjectivities toward data), are out of its scope.
Then the 1970s provided a few key influences. The data visualization industry and especially its professional practice (not necessarily the academic quarter) often uses Anscombe’s quartet (Anscombe,1973) as visualizable evidence to support a notion that graphics are preferable or superior to tables of numbers, presumably in terms of readily noticing differences or anomalies. Around the same time, statistician Tukey (1977) was making an argument for exploratory data analysis, i.e., looking at the data graphically without necessarily beginning with theory-informed hypotheses.
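Anscombe’s point is easy to verify computationally: the quartet’s four small datasets share nearly identical summary statistics yet look radically different when plotted. A minimal sketch, using only the Python standard library (the data values are Anscombe’s published quartet):

```python
# Anscombe's quartet (Anscombe, 1973): four datasets with nearly identical
# summary statistics that reveal very different patterns when plotted.
from statistics import mean, variance

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared x for sets I-III
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8]*7 + [19] + [8]*3,
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Each set reports essentially the same mean and variance of y...
for name, (xs, ys) in quartet.items():
    print(f"{name}: mean_y={mean(ys):.2f}  var_y={variance(ys):.2f}")
```

All four sets yield a mean y of 7.50 and a variance of roughly 4.12, yet plotted they show, respectively, a noisy line, a smooth curve, a line with one outlier, and a vertical cluster with one outlier: differences a table of summary numbers hides and a graphic makes obvious.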
While these two works do not bear on the concept of storytelling directly, they certainly influenced the methodologies used to research visualization storytelling. First, they fed the belief that graphics are necessarily superior to numbers because they can show things – in an exploratory manner – that numbers supposedly could not. Second, they further endorsed exploratory analysis generally, leading many of the studies that would follow to treat it as a sufficient methodological approach for understanding data-visualization storytelling.
Then in the 1980s, another statistician, Tufte began (1983) and continued (1990, 1997) to raise interest in the elegance and efficiencies of quantitative, explanatory and informational graphics. In his 1983 monograph, specifically in the context of Minard’s well known multi-dimensional map-time-series composite graphic of Napoleon’s militaristic march to and from Moscow, Tufte comments that Minard’s graphic
“… tells a rich, coherent story with its multivariate data far more enlightening than just a single number bouncing along over time [implying a more typical time-series graph].” (1983, p.40)
In the same monograph, Tufte adds that, in general, attractive displays of quantitative information (among other qualities) “… often have a narrative quality, a story to tell about the data.”(p.177)
In Tufte’s first use, the idea of story appears to be metaphorical and descriptive, but in the second instance the story, or at least a sense of narrative, is conceptualized as if it is a quality intrinsic to a visualization. Note that this idea is initiated and substantiated with just the Minard example (n=1). It may be the case that some of the narrative impact of that graphic may be due, to some extent, to the viewer’s prior knowledge and beliefs about Napoleon and his vaulting aspirations.
Also in the 1980s, cognitive psychologist Treisman (1985) published groundbreaking work that identified the presence of pre-attentive processing in vision. This was a pivotal perceptual-cognitive research finding for the visualization field, because it confirmed that cognitive efficiency, via information search speeds, could be achieved through specific intrinsic qualities of visual stimuli. Subsequently, the visualization storytelling research seemed implicitly hopeful of discovering a similar storytelling-specific cognitive process that might be perceptually triggered by some storytelling quality incorporated into a data visualization.
Then, to round out the foundation on which storytelling developed, in 2000 Wilkinson published The Grammar of Graphics (2005, 2nd ed.). It intricately and extensively imagines a framework to describe or even codify graphics in all forms. Wilkinson was explicit that this was not a typology of graphs (2005, p.2). Other works around the same time were more typology oriented, such as Kosslyn (1989) and Johnson & Hansen (Eds., 2005). These works, whether precise typologies or not, suggest a desire in the field at this time to understand the scope and parameters of the emergent visualization domain, through specifying a presumably exhaustive set of categories of, and considerations for, visualizations. For storytelling, the term ‘grammar’ in Wilkinson’s work may seem related, or at least relatable, to the idea of story. But its usefulness, metaphorical or literal, for explaining storytelling is not evident.
Also in 2000, perceptual psychologist Ware published the definitive textbook Information Visualization, which arguably established the perceptual/cognitive psychology curriculum for information visualizations. This may have locked in storytelling research, such that the research’s dependent variables (i.e., where one looks for the ultimate outcome or effect) would usually default to information retrieval speed and accuracy, or information memorability.
This background offers a general sense of the lead-up and context in which visualization storytelling then appeared.
Visualization Storytelling Emerged, 2001 and 2002
The ideas of information visualization storytelling and visual metaphor came from defence-sector computer scientists Gershon & Page (2001). The Gershon & Page piece is undeniably speculative and exploratory. It is not compellingly based in any established theory. The piece introduces the notion of visual metaphor applied to visualization without explicitly connecting it to storytelling. Most of the piece relates visualization to film or television scripts. The article then concludes with (1) a justification of storytelling because it is ‘ancient’, (2) an implied comparison of storytelling to prostitution (as humour, perhaps) and (3) a final discussion about technology and genres (not storytelling), which was mainly inconclusive anyway.
The next year, the computer-science professorial team of Wojtkowski & Wojtkowski (2002) delivered a conference paper, also on storytelling in data visualization. While the piece states “In this paper we take a brief look at story telling and posit that storytelling allows visualization to convey information efficiently”, the paper’s conclusion makes no reference to storytelling’s information efficiency.
These two pieces represent the initial articulations about storytelling related to information visualizations, i.e., still noting that storytelling may or may not be relevant to visualizations other than the “information” ones.
Reinforced Need to Focus on Communications, 2005
A curiously connected sidebar to the storytelling history is that the 9/11 attacks in the United States eventually led the US Department of Homeland Security, a few years later, to fund a research initiative through the Pacific Northwest National Laboratory in Washington state. This initiative was undertaken explicitly in response to potential or presumed intelligence deficiencies. Speculatively, those deficiencies may have involved intelligence data visualizations, or information failures that occurred during the 9/11 attacks.
This research initiative was to collect the best perspective from within the visual analytics field (i.e., mainly the IEEE professionals who had been working on data visualization) to identify and articulate the optimal R&D research agenda for the visual analytics industry. This objective is explicitly stated in the study’s final report, Illuminating the Path: The Research and Development Agenda for Visual Analytics (Thomas & Cook, Eds., 2005). This report is also an IEEE publication.
In this edited volume, an analysis of the body of text shows that the word communication appears more frequently than the word computer. This is telling, and certainly indicative of the importance placed at this time on the communications function in visual analytics and visualization generally.
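As an aside, this kind of word-frequency comparison is straightforward to reproduce. A minimal sketch, assuming the report’s body text were available as plain text (the sample string here is purely illustrative, not drawn from the report):

```python
# Count case-insensitive whole-word occurrences in a body of text,
# the simplest form of the frequency comparison described above.
import re
from collections import Counter

def word_counts(text: str) -> Counter:
    """Return counts of alphabetic words, case-insensitively."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

# Illustrative sample only; in practice, load the report's extracted text.
sample = "Communication models matter; the computer mediates communication."
counts = word_counts(sample)
print(counts["communication"], counts["computer"])  # 2 1
```

A real comparison would also need decisions about stemming (e.g., whether “communications” and “communication” count together), which this sketch ignores.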
The report considers and discusses a few communication ideas. It is mainly based in a set of ideas referred to as “grounding theory”, along with other issues and principles related to crisis communications and to communications involved in sense-making processes. The report states: “The effect of mediating technology can be better understood through the use of communication models, such as Clark & Brennan’s theory of ‘grounding’.” Yet on reading Clark & Brennan (1991), grounding is not presented or built out as a theory, nor even as a model of communication; rather, it is established as a subprocess within human communications, i.e., the grounding of a message’s meaning through additional contextual dialogue between people.
In spite of those recommendations, subsequent research did not adopt grounding as its core communicative concept. Instead, it appeared to follow deeper explorations, mostly of specific micro-processes that potentially underlie data visualization storytelling.
New Focus on Micro-Processes, 2006 to 2012
An early notable piece on visualizations and communication appeared at this time. Communication-minded visualization (CMV) was an idea proposed by Viégas & Wattenberg (2006) in response to their perceived lack of attention to communications in visualization research. A difficulty with this piece is that it does not actually develop a tangible problem for which CMV is the optimal solution, which in that context might then be a scientifically testable and verifiable approach.
Mainly, the works during this period focused on micro-processes, referred to here as visual-metaphor, visual-narrative and visual-rhetoric. These are slightly standardized wordings to reinforce that these constructs had been explicitly developed in terms of data visualization, as opposed to human communications more generally. Some of the articles behind these concepts do not always make a direct connection to storytelling, but they obviously fall within the same intellectual vein.
Visual-metaphor was the first such micro-process examined. This notion ties back to the Gershon & Page article, but this foundation is not consistently explained across visual-metaphor studies. Central to this is the research by Ziemkiewicz & Kosara (2008) which explores further the nature of visual metaphor, but without reference to Gershon & Page’s earlier thinking.
Their idea was that if a verbal metaphor (used in an experimental question put to a subject) was consistent, versus inconsistent, with the visual metaphor implicitly used in the test visualization (the experimental stimulus), then cognitive responses would be faster. Consistent metaphor structures did have an effect, but only when the experimental question was particularly difficult. Subsequently, Ziemkiewicz & Kosara (2009) explored and found some consistency effects when controlling for certain individual differences (i.e., gender and metaphor preferences). Yet these findings are not sufficient evidence to argue for a general cognitive efficiency resulting from activation of a supposed visual-metaphor cognitive process.
Visual-narrative was another of the proposed micro-process ideas explored during this period. Segel & Heer (2010) were among the first to follow this notion; they base their work directly on Gershon & Page, even though the latter held that narrative was only a subset of the larger category of “story-like visual presentations.” Segel & Heer assembled a collection of visualizations that seemed narrative in character, and then used their own judgements to analyze the implicit narrative structures, arriving at a typology of so-called genres of narrative visualizations. Due to the selection and judgement processes, this methodology risks selection bias, which in turn creates external-validity issues when generalizing the results. Many will recognize this piece as the source of the so-called “martini-glass” narrative genre. Hullman et al. (2013) considered the impact of sequencing in narrative visualizations. This is reminiscent of message order effects in the context of attitude-change or compliance-gaining research in social psychology.
Visual-rhetoric was the third of these related micro-processes studied. Notably, Hullman & Diakopoulos (2011) looked only at narrative visualizations, and only those that seemed to have visual-rhetorical qualities, i.e., a further subset of narrative visualizations. The study’s team coded and classified the rhetorical components and approaches. Given that the 10 most frequent [rhetorical] techniques were as basic as color, fonts and source citations, this arguably extends beyond core aspects of rhetoric, suggesting construct-validity issues with the approach.
During this period, another notable research finding was causing a stir. Bateman et al. (2010) found that, contrary to Tuftian expectations, embellished graphics (i.e., with extraneous design elements beyond the empirically essential) could indeed be more memorable than their unembellished counterparts. Recall that memorability is a key default outcome or dependent variable in the HCI research tradition. The Bateman et al. article may have induced some cognitive dissonance within the visualization R&D community, because its findings were at odds with the intuitive logic of Tufte’s highly regarded guidance.
Return to Storytelling Phase, 2013 to 2018
The next phase in the storytelling research tradition was more one of reflection. Many visualization researchers during this period had returned to the core notion of story or storytelling (e.g., Figueiras 2014a&b; Boy et al. 2015; Lee et al. 2015; Rodríguez et al. 2015; Thudt et al. 2017).
Importantly, Kosara & Mackinlay (2013) took on the challenge of clarifying and perhaps reinvigorating the storytelling concept with a thought piece. Recognizing that storytelling in visualization had been somewhat vague, they offered a renewed definition: “We define a story as an ordered sequence of steps, with a clearly defined path through it.” (Kosara & Mackinlay, 2013) In doing this, storytelling transitioned from having concepts that were too vague to ones that were now overly broad. Kosara & Mackinlay also contributed the idea of storytelling “affordances”, the various qualities within a visualization that enable a story to be realized.
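To see how broad this definition is, consider a minimal formalization. The names and types below are this sketch’s own illustrative invention, not Kosara & Mackinlay’s; the point is simply that, under a definition of story as an ordered sequence of steps with a defined path, any ordinary linear slide deck already qualifies:

```python
# Illustrative-only formalization of "a story is an ordered sequence of
# steps, with a clearly defined path through it" (Kosara & Mackinlay, 2013).
from dataclasses import dataclass, field

@dataclass
class Step:
    label: str     # e.g., a chart title or annotation
    content: str   # e.g., a visualization state or caption

@dataclass
class Story:
    steps: list[Step] = field(default_factory=list)

    def path(self) -> list[str]:
        # The "clearly defined path" is simply the linear order of steps.
        return [s.label for s in self.steps]

# Under this definition, an ordinary three-slide deck counts as a story:
deck = Story([Step("Intro", "title slide"),
              Step("Trend", "line chart"),
              Step("Takeaway", "summary")])
print(deck.path())  # ['Intro', 'Trend', 'Takeaway']
```

That a generic slide deck satisfies the definition with no visualization-specific content illustrates the over-breadth noted above.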
While research into data visualization storytelling continued beyond 2018, we can reasonably conclude this discussion with the 2018 survey article on visualization storytelling by Tong and colleagues (Tong et al., 2018). This review examines the many, mainly IEEE, studies on visualization storytelling. Not surprisingly, the authors’ approach is consistent with the methods used in those studies: Tong et al. grouped the storytelling studies into themes, then made observations about the research developments within each theme across studies. It is nicely descriptive.
However, the piece is disappointingly uncritical. It does not challenge any precepts or logic. For instance, within the first few paragraphs the review states, “Throughout history, storytelling has been an effective way of conveying information and knowledge” [citing Lidal et al., 2013]; but the Lidal et al. article gives no sources or justifications for this point. It was, and remains, an unchallenged and dubious premise.
The Tong et al. review includes discussion of visual-narrative, but does not mention the related ideas of visual-metaphor or visual-rhetoric. The article seems to understate Gershon & Page as the progenitive authors of visualization storytelling, yet overstates the extent to which Gershon & Page actually explain “…the usage of storytelling in information visualization.” A bit too much is taken on faith. Still, for what it does cover, Tong et al. is a useful source.
Granted, the notion of storytelling has high face-validity in that storytelling is very broadly understood across cultures, which facilitates the idea’s diffusion through the visualization field as a popular approach. Most people would intuitively understand the idea of story. Nonetheless, storytelling is a highly abstracted and somewhat convoluted construct, specifically in the context of data visualization.
The construct is convoluted because, for instance, story also evokes a subjective need, felt by some individuals: the ability and opportunity to “tell their own story.” In this day and age a person’s ability to tell their story is undeniably important. But this is not in question for this paper, which concerns the scientific merits of the storytelling approach to data visualization effectiveness or efficiency.
If we consider the research literature holistically, a few types of fundamental weaknesses appear. These can be considered in three general areas:
Premises are logically the first area in which to address any concerns. A premise is an assumed statement of fact, and premises are typically used, explicitly or implicitly, to underpin the start of an argument (e.g., each of the first three sentences of this paper). It is often taken on faith that a premise is correct, like a mathematical axiom. But unlike axioms, premises carry a higher risk of being inaccurate.
The following appear to be the main premises underlying or affecting storytelling. They are expressed in the literature either explicitly or implicitly:
- Storytelling is able to convey information efficiently. While some hold this premise explicitly (Lidal et al. 2013; Wojtkowski & Wojtkowski 2002), it is underspecified: more efficiently compared to what, exactly? Arguably there is no research or evidence to support this premise.
- There is potentially a ‘storytelling’-related cognitive process that, if perceptually activated, would increase the informational effectiveness of a visualization. This second premise is an extension of the first. It is more tacit in the research literature. It represents the expectation that there is a yet-to-be-discovered storytelling equivalent or analogue to Treisman’s pre-attentive processing.
- A data visualization communicates by virtue of qualities that are intrinsically designed into, and captured within, a visualization. The third premise is generally expressed in the research (e.g., Kosara & Mackinlay 2013). It is problematic because it fixes the locus of control (agency) within the data visualization instance. This means outcomes or effects can only be the result of what is manifestly designed into a visualization. This represents a limitation of scope.
- A visualization begins with data, and ends with a visualization being perceived by a person’s cerebral cortex. This fourth premise is generally and implicitly made within the literature to contextualize the scope of visualization development processes (e.g., Card & Mackinlay 1997). This creates another limitation of scope.
If any of the premises underlying data visualization storytelling is inaccurate or overly restrictive, then storytelling as an idea, model or approach would be less effective than anticipated.
The second area concerns the application of theory, or arguably the overall lack of it. Visualization storytelling, as a line of research, is effectively atheoretical in basis and foundation. In spite of efforts and references to semiotics or semiology (e.g., Purchase et al. 2008; Vickers et al. 2012, applied to information visualization more generally), semiology as a theory is underpowered in this context and not sufficient to meet the requirements of data visualization.
Semiology, as inspired by Bertin, is occasionally used in connection to data visualization storytelling (qua narrative in Hullman & Diakopoulos 2011) or to information visualization generally (Vickers et al. 2012).
Lacking a sufficient theory, the research reasonably defaulted to using exploratory techniques instead. This would seem to make sense as data exploration using visual analytics is certainly well aligned with Tukey’s notions of exploratory data analysis using graphs. Yet at some point a theory, or two, about how visualizations effectively communicate would need to emerge in order to make further progress.
The current primary definition of a story (Kosara & Mackinlay, 2013) is likely too broad. Given its breadth, it arguably would also include any explanations, instructions, directions and even recipes for cooking. If anything has two or more parts plus a sequence through it, then it is a potential story for the purposes of data visualization. With a concept so broad any intuitive understanding people may have about storytelling then becomes less relevant, effectively a trade-off between construct validity and face validity.
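The breadth problem can be made concrete with a minimal sketch. The function below is a hypothetical illustration of the Kosara & Mackinlay (2013) definition, not code from any cited study; it treats a “story” as any ordered sequence of two or more steps with a defined path through it:

```python
# Hypothetical sketch of the Kosara & Mackinlay (2013) definition:
# a "story" is an ordered sequence of steps with a clearly defined path.

def is_story(steps):
    """Under this broad definition, any ordered sequence of two or more
    steps (the list order supplies the 'clearly defined path') qualifies."""
    return len(steps) >= 2

recipe = ["boil water", "add pasta", "drain", "serve"]
dashboard_walkthrough = ["open overview", "drill into Q3 sales"]
single_chart = ["one static scatterplot"]

print(is_story(recipe))                 # True: a cooking recipe qualifies
print(is_story(dashboard_walkthrough))  # True
print(is_story(single_chart))           # False
```

That a recipe and a dashboard walkthrough both qualify, despite being communicatively very different, illustrates the trade-off between face validity and construct validity described above.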
Semiology, for its part, cannot on its own provide a sufficient scope of understanding of how data visualizations communicate. Without a more-effective theory, which would identify subconcepts and specify causes and effects, the research seemed to default to seeking effects among the core dependent variables used in the HCI tradition. Those were speed and accuracy of information recall from a visualization, as well as its memorability.
While long-term memory support may have been the implicit purpose of storytelling in preliterate societies, millennia ago, this seems unlikely to be a relevant problem in the context of contemporary data visualizations or data use.
The third general area has to do with the methodologies used in some of the more empirical research studies.
Wilkinson’s Grammar of Graphics is inductive in its approach, developing a framework for understanding the potential range of graphics; but it is a conceptual framework, not an empirical test of its own validity. Further, its “grammar” was not one that was intended, or able, to be built out, in an inductive sense, into any notion of visualization story.
Instead, the empirical research in the visualization-storytelling area has tended to be deductive in its methodological approach. Examples include studies that draw their test samples from the presumably naturalistic visualization-sharing environments of the Tableau Public or IBM Many Eyes platforms, or from existing news media (Segel & Heer, 2010).
The issue with research based on sampling from such sources is that it can only model forms of visualization that have already been thought of and accomplished. This type of methodology cannot point to or uncover any fundamentally new approaches, which would not necessarily appear on public visualization platforms. Ongoing reliance on deductive reasoning will, over time, add bias to research perspectives, such that we notice only those examples already available to us. It does not typically bring to light fundamentally new and different ways of imagining or seeing things.
Without robust theory to guide it, visualization-storytelling research methods tended to be highly exploratory with some studies using experimental designs, and others classificational in nature.
Generally there is nothing egregiously wrong with these methodological approaches, but the experimental studies would often draw on conveniently existing visualization examples, selectively pick specific types, use researcher judgement to classify those into sub-types, and then test those sub-types using typical HCI dependent variables. Given the lack of theory, this type of methodology will over time contribute to selection bias and confirmation bias.
Then, to complete the cycle as represented in Figure 4, the profession, via its members, perpetuates the ideas and beliefs of the visualization sub-culture, including the value and merits of storytelling. Members of the professional practice may or may not have been aware of the specific research literature on visualization storytelling. But those ideas, and the belief that storytelling was somehow advantageous, diffused through to the profession, which seems to have adopted visualization storytelling as its main industry leading practice or approach.
As an approach, it usually requires those who build visualizations to select from various “tips ’n’ tricks”-style guidance, or to design storytelling ‘affordances’ into a visualization.
Even so, storytelling’s practical, beneficial outcomes are arguably underspecified. This underspecification, in turn, makes it difficult to demonstrate whether or how large-scale use of visualization software tools is effective in an organization, or to confirm the expected return on investment to an enterprise.
Storytelling and the Visualization Industry’s Culture of Uniqueness
Storytelling arose as, and remains mainly, the product of two related forces within the visualization industry: author-centricity and the industry’s essential culture of uniqueness.
Visualization storytelling is an author-centric idea. It is a solution specifically for the visualization builder. It solves their problems, e.g., “How do I start designing this complex visualization?” It does not necessarily solve the problem of the visualization end-user, and likely not by a long shot. How many CEOs really want to hear that they’re being told a story?
Within the visualization industrial ecosystem, visualization authors are an important group to the R&D community because those authors need to use the algorithms or tools to build visualizations. Those authors are of interest to the software development segment because they are the purchase decision makers or influencers for the software purchase. It is clear why the industry is author-centric.
Author-centricity allows and encourages us to imagine and position the most important communication process as something done by the visualization author and contained, or containable, within any given visualization instance. In this author-centric way of thinking, the data scientist has a single or narrow range of objectives for a visualization, and then builds the one best visualization to meet that purpose. This is one-to-one thinking: a single purpose through to a single visualization outcome. It is evident in the heuristics and decision trees shared and used among visualization professionals.
At a more general level, there appears to be a culture of uniqueness in the data visualization industry and field. This culture is reinforced generally by the idea of intellectual property and patents, which incentivizes the development of expectedly new and unique (and therefore patentable) data visualization algorithms, types, processes or approaches.
Obviously the open-source aspects of data visualization encourage reuse of code components, creating efficiencies. Yet the entire industry is focused on allowing an author to build a perfectly suited visualization for a given purpose, unique to that extent.
Visualization storytelling is consistent with, and cultivated within, this culture of uniqueness: yet storytelling is a one-to-one solution in an otherwise many-to-many world of problems. Storytelling alone is not sufficient for the purpose of communicating fully and effectively with data visualizations.
Further, in terms of consequences, storytelling might cause information silos. The idea of a ‘story’ most likely denotes to some, and connotes to others, a sense of completeness. This sense may also reasonably be presumed to apply to the set of data or information contained in a data visualization story.
A story typically has a beginning, middle and an end, with narrative arcs, final outcomes and resolutions of conflict etc. A simple story (except for stories in a series) is typically thought of by the readers or listeners as a complete work. Arguably, if a visualization specialist builds a visualization story this may then give the viewer the potentially unrealistic expectation that the data, analysis and information in that one visualization are all complete.
Consequently, a storytelling style could suggest to a viewer that there is no need to look elsewhere for confirming or possibly contradictory information. In this sense, data visualization storytelling might contribute to a siloing effect in information, a potential risk of confirmation bias, where not enough beyond-this-story connections are being made to other data, facts or visualizations. This obviously would not be a good outcome.
The last section of this paper suggests a more end-user centric model, which data visualization authors, developers, builders, data scientists might consider. It should help ensure that visualization end-users’ needs are better met.
How to communicate with a data visualization
In comparison to visualization authors, visualization end-users are a particularly crafty lot. They will use, reuse and repurpose visualizations (or presentations, reports, strategic projections, etc.) to try to meet the tasks at hand. In an end-user centric view, it is vital to consider and manage these many-to-many relationships. In practical terms, a single point (of an argument) could be made using one of several available visualizations or types. Also, realistically any one visualization could meet numerous objectives, directly or indirectly. This is in part why a storytelling approach may be overly simple given the potential complexity of uses and expectations of data visualizations.
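These many-to-many relationships can be sketched concretely. The argument points and chart names below are hypothetical examples only, chosen to illustrate the structure:

```python
# An illustrative sketch of many-to-many relationships between points
# to be made and the visualizations that could make them.
point_to_vis = {
    "sales are seasonal":     ["line_chart", "calendar_heatmap"],
    "region A underperforms": ["bar_chart", "choropleth_map"],
    "outliers need review":   ["box_plot", "bar_chart"],
}

# Invert the mapping: any one visualization may serve several points,
# just as any one point may be made with several visualizations.
vis_to_points = {}
for point, vis_list in point_to_vis.items():
    for vis in vis_list:
        vis_to_points.setdefault(vis, []).append(point)

# The bar chart serves two different points; a one-to-one (storytelling)
# framing models neither direction of this relationship.
print(vis_to_points["bar_chart"])
```

An end-user centric view would manage both directions of this mapping, rather than assuming one purpose per visualization.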
Our understanding of data visualization and how it best communicates would benefit from broader thinking: rather than asking “How do data visualizations communicate?” ask instead “How could people (two or more of them) possibly communicate using a data visualization?” This shift in perspective allows an end-user centric model.
What follows is not a full model, by a long shot, but rather an introduction to it. A more detailed paper will eventually follow. In the meantime this text is directional.
First is a high-level point. Assuming that you, the reader, are a builder of data visualizations, no offence is intended, but your prized visualization is not particularly important to its end-user. For that matter, you data scientists, the data are not all that important either. What is most important to end-users is being able to focus on the actual people, objects and events that are measured and recorded as data and then represented in a visualization. This may seem like a fuzzy point, but whatever you can do as a visualization designer or builder to keep a viewer’s attention focused on those actual people, objects or events, the better.
Because of the many-to-many relationships between needs and visualization options, an end-user will need to understand certain qualities that actually are common across data visualizations, as opposed to what makes any given visualization unique, e.g., its story.
Assuming end-users primarily need to understand and speak to commonalities across data visualizations (as well as integrate this with other empirical evidence), they will more specifically need to know if, and to what extent, any given data visualization at-hand enables them to speak to each of the following dimensions:
- Data. This means the raw data. How visible and/or accessible are the actual raw data used in, or underlying, the visualization? Are they available for peer review or reanalysis? Data here means only the data, and does not include any meanings of codes, symbols or explanations; those are metadata.
- Metadata. This means all the structure and rules behind the raw data. How visible and documented are all metadata? Those are all the related data types, names, symbols, labels, explanations, methods, rules, definitions, qualifications, caveats and governance that apply and are attached to the raw data, giving it meaning or context. This dimension can become very complex, but is hierarchically manageable.
- Analytical Dimensionality and Structure. This means all the relevant facts about the analytical dimensionality (e.g., univariate, bivariate, multivariate, etc.) and statistical-structural characteristics (dependent vs. independent variables, associations, specific aggregations or statistical transformations) that are fundamentally inherent in any particular data visualization, as designed and built. This dimension essentially reflects how the visualization was put together from the data and metadata, and by extension how to decipher it. This dimension is also necessarily applicable to all other valid uses of quantitative empirical data, which means we can use it to compare one visualization to another, as well as to compare visualizations with other uses of empirical quantitative data, such as statistics in reports, quantitative presentations, etc.
- Magnitude – Given the data, metadata and how they were put together to form the visualization, does the visualization express any aspect of any of the data’s empirical quantitative magnitude? How and to what extent is magnitude clearly shown, and for which variables, but not for others? If the visualization shows sums, totals, cumulative values, counts, differences, then these are some indicators that it expresses magnitude.
- Centrality – Given the data, metadata and how they were put together to form the visualization, does the visualization express any aspect of any of the data’s empirical quantitative central tendency? How and to what extent are any measures or indicators of central tendency shown, and for which variables, but not for others? If the visualization shows means, medians, modes, trends, correlations, measures of association, R2s, then these are some indicators that it expresses centrality.
- Variability – Given the data, metadata and how they were put together to form the visualization, does the visualization express any aspect of any of the data’s empirical quantitative variabilities, its distributions, exceptions or outliers? How and to what extent are any indicators of variability shown, and for which variables, but not for others? If the visualization shows standard-deviations, variances, skewnesses, kurtoses, outliers, exceptions, or marginalized cases, then these are some indicators that it expresses variability.
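As an illustrative sketch only, the six dimensions above could be captured as a simple end-user checklist. The class, field names and 0–2 scoring below are hypothetical constructions of ours, not a published instrument:

```python
# A hypothetical end-user checklist for the six common dimensions;
# the schema and scoring scale are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class VisualizationAssessment:
    """Scores how fully a visualization at hand lets its end-user speak
    to each common dimension: 0 = absent, 1 = partial, 2 = fully visible."""
    data: int = 0         # raw data accessible for peer review or reanalysis?
    metadata: int = 0     # definitions, rules, caveats documented?
    structure: int = 0    # dimensionality and statistical structure clear?
    magnitude: int = 0    # sums, totals, counts, differences expressed?
    centrality: int = 0   # means, medians, trends, correlations expressed?
    variability: int = 0  # distributions, outliers, exceptions expressed?

    def balance_gap(self) -> int:
        """Gap between the strongest and weakest of the three empirical
        qualities; a large gap may flag an unbalanced design."""
        scores = (self.magnitude, self.centrality, self.variability)
        return max(scores) - min(scores)

# A typical bar chart of totals: strong on magnitude, silent on variability.
totals_chart = VisualizationAssessment(data=1, metadata=1, structure=2,
                                       magnitude=2, centrality=0, variability=0)
print(totals_chart.balance_gap())  # prints 2
```

Such a checklist makes visualizations comparable on what they have in common, rather than on what makes each one unique.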
The key point is that, we contend, every piece of information that is empirical, quantitative and functionally useful to an end-user, and that could be contained within any data visualization, necessarily falls within these six dimensional categories. This makes useful, comparable information available for extra analysis, consideration or query, as needed. Further, a visualization builder’s attention to providing a thoughtful balance between magnitude, centrality and variability will typically help visualization end-users make better-informed decisions.
As a proposition, modern data visualization and visual analytics tools and technologies are uniquely powerful in that they allow visualizations to show more of all six of these qualities, not only concurrently but in greater detail, resolution and clarity, when compared to traditional, static, lower-resolution graphics, reminiscent of previous decades. And the tools potentially allow us to connect information and insights across visualizations, and allow us to jump from an executive dashboard to a strategic plan to corporate regulatory filings, by clicking on the qualities that are common across them.
However, the curious problem remains that the new visualization tools are being used often just to replicate old-style graphic approaches. Research that is based on existing visualizations, which often replicate the old techniques or traditional ways of seeing things, is not particularly valid in fully understanding how data visualizations could communicate. Consequently data visualization technologies’ potential for supporting communications is still most certainly under-developed.
Obviously this paper raises a number of questions that require further elaboration, exploration, measurement, testing and confirmation. Also note that this approach is intuitively connected to and has implications for data literacy and analytical literacy.
In the final analysis, across their diffusion and throughout the industry, data visualization and visual analytics innovations are not yet systematically enabling end-users to get everything they need to support communications. Instead, visualization end-users seem constrained and limited, in the several critical ways explained in this paper, by the otherwise ancient experience of having been told a story.
Anscombe (1973). “Graphs in statistical analysis” in The American Statistician, 27(1), pp. 17-21.
Bateman, Mandryk, Gutwin, Genest, McDine & Brooks (2010). “Useful junk? The effects of visual embellishment on comprehension and memorability of charts” in Proceedings of the SIGCHI conference on human factors in computing systems, pp. 2573-2582.
Bertin, avec Barbut, Bonin, Arbellot, Guermont, LaPeyre, Recurat, Salamon, Vergneault, Abedi-Miran, Bertrand, Letarte, Bonin, Dufrene, François, Mako & Pottier (1967). Sémiologie graphique: Les diagrammes, les réseaux, les cartes. Paris: Des Éditions de l’École des Hautes Études en Sciences Sociales.
Bostock, Ogievetsky & Heer (2011). “D3 Data-driven documents” in IEEE transactions on visualization and computer graphics, 17(12), pp. 2301-2309.
Boy, Detienne & Fekete (2015). “Storytelling in Information Visualizations: Does it Engage Users to Explore Data?” in Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pp. 1449-1458.
Card & Mackinlay (1997). “The structure of the information visualization design space” in Proceedings of VIZ’97: Visualization Conference, Information Visualization Symposium and Parallel Rendering Symposium, pp. 92-99.
Clark & Brennan (1991). “Grounding in Communication” in Perspectives on Socially Shared Cognition, pp. 222-233.
Figueiras (2014 a). “How to tell stories using visualization” paper at IEEE International Conference on Information Visualisation.
Figueiras (2014 b). “Narrative visualization: A case study of how to incorporate narrative elements in existing visualizations” paper at IEEE International Conference on Information Visualisation.
Gershon & Page (2001). “What storytelling can do for information visualization” in Communications of the ACM, 44(8), pp. 31-37.
Hullman & Diakopoulos (2011). “Visualization rhetoric: Framing effects in narrative visualization” in IEEE transactions on visualization and computer graphics, 17(12), pp. 2231-2240.
Hullman, Drucker, Riche, Lee, Fisher & Adar (2013). “A deeper understanding of sequence in narrative visualization” in IEEE Transactions on Visualization and Computer Graphics, 19(12).
Johnson & Hansen, Editors, (2005). The Visualization Handbook. Burlington, MA: Elsevier.
Kosara & Mackinlay (2013). “Storytelling: The next step for visualization” in Computer, 46(5), pp. 44-50.
Kosslyn (1989). “Understanding charts and graphs” in Applied Cognitive Psychology, 3(3), pp. 185-225.
Lee, Riche, Isenberg & Carpendale (2015). “More than telling a story: Transforming data into visually shared stories” in IEEE computer graphics and applications, 5, pp. 84-90.
Lidal, Natali, Patel, Hauser & Viola (2013). “Geological storytelling” in Computers & Graphics, 37(5), pp. 445-459.
Purchase, Andrienko, Jankun-Kelly & Ward (2008). “Theoretical foundations of information visualization” in Information Visualization, pp. 46-64. Springer.
Rodríguez, Nunes & Devezas (2015). “Telling Stories with Data Visualization” in Proceedings of the 2015 Workshop on Narrative & Hypertext. ACM.
Segel & Heer (2010). “Narrative visualization: Telling stories with data” in IEEE transactions on visualization and computer graphics, 16(6), pp. 1139-1148.
Thomas & Cook, Editors, (2005). Illuminating the path: The research and development agenda for visual analytics. National Visualization and Analytics Center (Pacific Northwest Labs).
Thudt, Perin, Willett & Carpendale (2017). “Subjectivity in personal storytelling with visualization” in Information Design Journal, 23(1), pp. 48-64.
Tong, Roberts, Laramee, Wegba, Lu, Wang, Qu, Luo & Ma (2018). “Storytelling and Visualization: A Survey” in VISIGRAPP 3(IVAPP), pp. 212-224.
Treisman (1985). “Preattentive processing in vision” in Computer Vision, Graphics, and Image Processing, 31(2), pp. 156-177.
Tufte (1983). The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
Tufte (1990). Envisioning Information. Cheshire, CT: Graphics Press.
Tufte (1997). Visual Explanations. Cheshire, CT: Graphics Press.
Tukey (1977). Exploratory Data Analysis. Addison-Wesley.
Vickers, Faith & Rossiter (2012). “Understanding Visualization: A Formal Approach Using Category Theory and Semiotics” in IEEE Transactions on Visualization and Computer Graphics, 19(6).
Viégas & Wattenberg (2006). “Communication-minded visualization: A call to action” in IBM Systems Journal, 45(4), p. 801.
Ware (2000). Information visualization: Perception for Design. New York: Morgan Kaufmann.
Wilkinson (2005). The Grammar of Graphics, Second edition. Springer.
Wojtkowski & Wojtkowski (2002). “Storytelling: Its role in information visualization” paper at European Systems Science Congress, 5.
Ziemkiewicz & Kosara (2008). “The shaping of information by visual metaphors” in IEEE Transactions on Visualization and Computer Graphics, 14(6), pp. 1269-1276.
Ziemkiewicz & Kosara (2009). “Preconceptions and individual differences in understanding visual metaphors” in Computer Graphics Forum 28(3), pp. 911-918.
Much thanks to Dr. G. Power and Dr. R. Rice for their kind and positively critical feedback on earlier versions.