(Data Visualization as a Communications Technology)
Let me try to change your mind about data visualization…
Human communications often entail an annoyingly complex set of requirements, processes and effects. You might think you’ve got an interesting idea, but without a way to communicate it to others effectively, does the idea really even exist? Perhaps it does, but it is not fully formed until it can be, and is, conveyed. Maybe its conveyance is part of its formation. Or this might just be a ‘chicken or the egg, near a tree falling in the forest’ kind of mixed-metaphor conundrum.
Nonetheless, in my professional opinion, communications continues to be a problem for the data visualization industry, not to mention for me in developing ideas about what these technologies do to, or with, human communications. To that end, please indulge me in a brief story:
In 2015 I bowed out of a corporate position to pursue a career in data visualization. I didn’t have a game plan other than to take a self-funded sabbatical, allowing me to study visualization as if it were my second dissertation.
Like many of those who now lead the visualization industry, I became keenly aware in graduate school of (academic statistician) Edward Tufte’s visually rich and eloquent monographs on visualized, quantitative information. But at the time I didn’t quite appreciate how to work with his ideas.
A few years later, as a newly minted ‘doctor’, I chose not to pursue an academic career and instead happily launched a professional journey that took me, over 20 years, through numerous analytical and research roles in the North American and EU corporate and technology milieux. Vary as my work certainly did, the one common thread was that data visualizations, of one sort or another, allowed me to help colleagues be successful with quantitative ideas or empirical positions.
Yet the full power of data visualization became evident only when I was in the business intelligence industry. I had produced a high-resolution visualization for my Chief Marketing Officer that revealed patterns and correlations of product ownership among tens of thousands of corporate clients. Apparently the visualization was so effective at proving a point among key decision makers that, in appreciation, I was given an expenses-paid vacation. That CMO also encouraged us to follow our passions, which contributed to my sabbatical and career shift toward data visualization.
My visualization project has been expansive and exhilarating. I cast a broad net and studied data visualization from different angles:
- industries, corporations, associations and government
- academic R&D
- software and technologies
- professional services
- uses and users
- perceptual and cognitive psychologies
- human-computer interaction
- decision making and sense making
- visualization history and development
- key players, opinion leaders and industry analysts
- supporting media and business media
- influencing factors (e.g., machine learning/AI, analytics, data, distributed processing, HTML5, data governance and privacy)
- strengths, weaknesses and opportunities.
The aha moment came only while I was reading a book about D3. The author had just explained some code that renders a line graph (with time along the X axis and with data point markers). The text continued by explaining how you could modify the code so that the line wouldn’t appear but the data markers would still be visible. It concluded with something like: and now you have a scatter plot. To this I blurted out: “No you don’t. You still have a time series; it’s just missing its lines!” This moment led to two basic realizations:
(1) Some people (such as myself), when seeing a visualization, will ‘counter-argue’: they will challenge its interpretations or premises, or question its motivations. If the topic has high issue-importance to the individual, they will use central cognitive processing when engaging with the visualization. A visualization may also affect viewers in different ways (attitudinally, behaviorally or cognitively), as if they were receiving and responding to a communicated message.
(2) A fundamental difference in perspective between computer scientists and social scientists seems at play here:
Computer scientists have tended to focus on ensuring that visualizations are true to their data, and that visualization users are able to retrieve information quickly and accurately.
Social scientists tend to be more broadly concerned with the context and consequences of visualizations. By this I mean a tendency to focus on ensuring that visualizations are true to the original empirical things or concepts that the data were and are intended to measure, and on the effects/impacts/outcomes that a visualization may have on the people who view and engage with it.
The second realization, generalization though it certainly is, suggests a subtle but quite important distinction in perspectives.
Computer scientists working in the visualization field, including those in the closely related area known as visual analytics, continue to be exceptional leaders and make great advances in the tools and capabilities. But, while they lead us in this field, it feels as if they’re on one side of a conceptual canyon, building a technology bridge over to our side. All the while, current and potential users of data visualization (business, journalism, government, NGOs, activists, lobby groups, community associations, etc.) are on this side wondering if and where the bridge will finally connect, not to mention whether its footings at that place will be solid and optimally fit for purpose.
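The line-graph episode above can be made concrete with a minimal sketch. It is written in Python rather than D3, and the data, the `render_chart` function and its `show_line` flag are all hypothetical names invented for illustration; this is not the code from the book. It only shows that switching off the line changes which marks are drawn, not what the X axis measures.

```python
from datetime import date

# Hypothetical data: (observation date, value) pairs, i.e. a time series.
observations = [
    (date(2015, 1, 1), 12.0),
    (date(2015, 2, 1), 15.5),
    (date(2015, 3, 1), 14.0),
    (date(2015, 4, 1), 18.2),
]


def render_chart(points, show_line=True):
    """Return a plain description of the marks a chart would draw."""
    ordered = sorted(points)  # time order is what makes this a series
    marks = {
        "x_axis": "time",  # what the X axis measures
        "markers": [(d.isoformat(), v) for d, v in ordered],
        "line_segments": [],
    }
    if show_line:
        # One segment between each pair of consecutive observations.
        marks["line_segments"] = [
            (a[0].isoformat(), b[0].isoformat())
            for a, b in zip(ordered, ordered[1:])
        ]
    return marks


with_line = render_chart(observations, show_line=True)
without_line = render_chart(observations, show_line=False)

print(without_line["x_axis"])                           # time
print(with_line["markers"] == without_line["markers"])  # True
print(len(with_line["line_segments"]),
      len(without_line["line_segments"]))               # 3 0
```

Seen from the data, the two charts differ only in their line segments. Seen from what the data measure, both are still observations ordered in time.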
Certainly by the late 1990s, computer scientists developing data visualization tools had understood and articulated a principle that at its core the visualization technologies needed to enable two things:
(1) First, it had to enable a fundamentally visualized form of analytics that engaged its users based on core perceptual and cognitive mechanisms.
(2) Second, it had to enable people to communicate and share their visualization results and findings with others.
The computer science R&D community continues to do exceptional work on the first. But I believe, with a quantitatively oriented Ph.D. in communication theory and research and with the greatest respect, that their extant research literature on how data visualizations communicate could be much stronger.
One attempt I made at communicating this message to the IEEE computer science community that works on data visualization wasn’t a huge success. That paper (full disclosure: the paper was rejected) tried to argue that the implicit model of human communications, which underlies data visualization R&D, appeared, to a trained outsider, to be ineffectively and unnecessarily simple in substance and organization. The paper also noted the curious lack of cross-over between the computer scientists’ communication research and the social science discipline and sub-disciplines of communications (a notable exception being data journalism). But to be fair, perhaps the academic Communication Studies research community could have been insular in its own way, with attentions focused elsewhere (e.g., US political communications). Ultimately, that paper may have been the wrong message, through the wrong channel, at the wrong time.
Since then, the ideas have evolved further and are discussed in the remainder of this article.
Current visualization technologies appear to be based on the notion that a data visualization needs only to communicate (with validity and reliability) the data or information that was intended to be visualized. Communication processes or effects such as storytelling, presentation, narrative, rhetoric, metaphor and persuasion, which have all been research topics of numerous IEEE data visualization research papers over the past dozen years, have been thought to operate at this data or information level, or expected to operate by virtue of cognitive or perceptual mechanisms and processes. But it’s not that simple.
This is a curious situation. Certainly data visualization technology is particularly effective at wrangling complex data and helping us to see and understand complexity in those data. But despite its inherent complexity-management capabilities, the visualization technology and tools seem to adopt a rather uncomplicated view of how the visualization-related communication processes and effects fully play out.
Human communication is remarkably complex and multi-dimensional in most instances. It operates concurrently on various levels. When engaging with a data visualization, a person may be receiving or interpreting a message from it. People may also feel a need to comment on or interpret the visualization, and then they may agree with its message or resist it by counter-arguing against what they believe the visualization is suggesting. This is precisely how people respond to communicated persuasive messages.
Professional users of data visualizations (notably large corporations, data journalists, activists, community leaders, boutique visualization design firms and visualization pundits) generally appreciate and understand that visualizations communicate at levels beyond just data or information. They design visualizations to convey messages, arguments or points of view. But the general approach to optimizing their effectiveness seems more qualitative than quantitative, and based in professional wisdom, leading practices or tips-’n-tricks for creating better visualizations, depending on how the word ‘better’ is being defined.
This points to a potential opportunity. Data visualization technologies have considerable power to support and help us understand and improve human communication processes and effects arising from people viewing visualizations. This can be accomplished with empirical, quantified, measured context and response, which is visually integrated into the visualizations.
Simply put, we can build visualizations with the necessary metadata context and feedback mechanisms that would satisfy the communications requirements of their users, not to mention support the development of the professional visualization industry and practice overall.
Let’s think about designing data visualizations so they are optimal from a message-effects perspective. To do that, visualizations need to anticipate the communication needs around receiving and processing a message that might be presented through a visualization. Consider these next six core ideas.
(1) First, let’s posit that not only do data visualization messages exist in an empirical sense, but that each and every data visualization necessarily conveys at least one message. This holds whether or not the person who made the visualization intended it to convey a message. Its message may not necessarily be profound or interesting to some who see the visualization; so let’s accept that subjective interpretation in this process is unavoidable and that different people viewing a visualization will understand it differently based on their background, experience, attitudes etc.
(2) Second, from a message perspective, any message in an empirically quantitative data visualization must necessarily follow and be constrained by certain mathematical and statistical principles. We should ask: What is it that we could possibly say with data in a visualization? So based on a simplification of descriptive univariate statistics (the moment system) with bivariate, multivariate and inferential analogues, let’s assert that a data visualization’s message can only be built on or comprised of five basic elements:
That’s it. A visualization message cannot be about anything else, except if it happens to be a composite or sequence of those five elements. Composite messages (arguably analogous to an argument) are subject to order effects. This means that the permutations of and interactions among these basic elements certainly matter, and not coincidentally, such variability can be visualized.
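To get a feel for how quickly order multiplies the possible composite messages, here is a small counting sketch. The labels `E1` to `E5` are placeholders standing in for the five basic elements, and the rule that a composite uses each element at most once is my simplifying assumption for illustration only.

```python
from itertools import permutations

# Placeholder labels for the five basic elements (hypothetical names).
elements = ["E1", "E2", "E3", "E4", "E5"]


def ordered_sequences(items, length):
    """All ordered sequences of `length` distinct items."""
    return list(permutations(items, length))


# Number of distinct ordered composites for each sequence length.
counts = {
    k: len(ordered_sequences(elements, k))
    for k in range(1, len(elements) + 1)
}
total = sum(counts.values())

print(counts)  # {1: 5, 2: 20, 3: 60, 4: 120, 5: 120}
print(total)   # 325
```

Even under this restrictive assumption there are 325 distinct ordered messages, which is why order effects are worth measuring rather than assuming away.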
(3) Third, because we have intentionally simplified the message-dimension, we should be careful of new complexities that emerge in other areas. In this case we need to understand the new complexity within the metadata.
When we make decisions based on visualizations, the comprehensiveness and quality of the metadata have to be confirmable and unimpeachable. The actual data can be of lower quality; as long as we correctly understand their quality level, that’s what matters in decision making. And that kind of information is in the metadata.
A social scientist might think of metadata as the ‘measurement model’; a data governance professional may think of it in terms of Master Data Management for Analytical Metadata. A combination of these perspectives, plus the current MDM and visualization technologies, gives us ways to use visualized metadata quickly to confirm the validity and reliability underlying any visualization, or to alert to potential analytical risks.
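As a rough sketch of what such a metadata check might look like, consider the following. The `AnalyticalMetadata` fields, the `analytical_risks` function and every threshold are hypothetical choices made for illustration; they are not drawn from any MDM product or standard.

```python
from dataclasses import dataclass


@dataclass
class AnalyticalMetadata:
    """Hypothetical metadata record attached to one visualized dataset."""
    source: str
    collected_on: str
    sample_size: int
    missing_share: float  # share of missing values, 0.0 to 1.0
    reliability: float    # a measurement reliability coefficient, 0.0 to 1.0


def analytical_risks(meta, min_sample=100, max_missing=0.10,
                     min_reliability=0.70):
    """Return plain-language alerts; the thresholds are illustrative only."""
    risks = []
    if meta.sample_size < min_sample:
        risks.append("small sample")
    if meta.missing_share > max_missing:
        risks.append("many missing values")
    if meta.reliability < min_reliability:
        risks.append("low measurement reliability")
    return risks


survey = AnalyticalMetadata(
    source="client survey (hypothetical)",
    collected_on="2015-06-30",
    sample_size=80,
    missing_share=0.04,
    reliability=0.65,
)

print(analytical_risks(survey))
# ['small sample', 'low measurement reliability']
```

An empty list would mean no alert under these assumed thresholds. A viewer could see the same record rendered next to the chart rather than printed.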
(4) Fourth, people who engage in communications, such as those viewing, interacting with or working with a data visualization, are often in the role or mode of information seeker. The viewer’s response to a data visualization message, after their initial cognitive assay of its topic and content, would typically develop in one of two basic ways:
If the visualization’s topic is of high/higher issue importance to the person viewing it, they will attend to the visualization more directly, using central cognitive processing to assess and interpret its message. These visualizations are expected to have mainly cognitive or behavioral effects on viewers, but they may also have attitudinal effects.
If the visualization’s topic is of low/lower issue importance to people viewing it, they will likely engage the visualization using peripheral cognitive processing and will tend to attend to the more animated, comical, artistic or otherwise eye-catching aspects of the visualization (attributes that are referred to, by some, as ‘junk charts’). These visualizations are expected to result in more attitudinal or dispositional effects on their viewers.
(5) Fifth, it’s often useful to think of communication as a negotiated exchange of information. Think of it this way: on the one hand, those people who build or present visualizations have the information and ability to specify and display the metadata more rigorously, but what they don’t have is an understanding of how the visualization is being interpreted or responded to. On the other hand, those viewing visualizations could provide feedback in real time as to their particular interpretation of or response to the visualization, but they may not feel fully confident in basing decisions on what they are seeing, at least not without being able to confirm the relevant metadata.
In exchange for their feedback, the visualization viewers could be given the opportunity to see how their responses compare to others in general. An organization’s customers, clients, stakeholders or members of the public are more likely to provide candid feedback if they know that they could immediately see how they compare with others (social comparison being a broadly motivating desire). But note this process will more likely work and be sustainable in an environment of fair information practices and identifiable participants who are accountable for their information and responses in the exchange.
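The exchange described above could be sketched as follows. The 1-to-5 agreement scale, the sample responses and the `compare_with_others` function are assumptions invented for this illustration, not part of any existing tool.

```python
def compare_with_others(response, previous_responses):
    """Share of earlier viewers who answered lower, the same, or higher."""
    n = len(previous_responses)
    if n == 0:
        return None  # nobody to compare with yet
    lower = sum(1 for r in previous_responses if r < response)
    same = sum(1 for r in previous_responses if r == response)
    return {
        "lower": lower / n,
        "same": same / n,
        "higher": (n - lower - same) / n,
    }


# Hypothetical earlier responses: agreement with the visualization's
# message on a 1 (disagree) to 5 (agree) scale.
previous = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]

feedback = compare_with_others(4, previous)
print(feedback)  # {'lower': 0.6, 'same': 0.2, 'higher': 0.2}
```

A viewer who answers 4 learns at once that 60% of earlier viewers agreed less than they did, while the visualization’s author gains one more measured response.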
(6) Sixth, visualization messages can and do impact people’s attitudes, behaviors and cognitions. Similarly, the viewers’ prior attitudes, behaviors or cognitions affect/influence how they respond to a visualization. This includes people’s attitudes or beliefs about the topic of the particular data visualization, the organization behind it, the person who developed it, or the organization that collected the data.
Since this is a communication exchange, the viewer’s beliefs about the credibility and trustworthiness of all upstream parties are unavoidable factors in the “equation”. The visualization viewer’s analytical skill and quantitative aptitude also contribute to their interpretation of the message. Further, the fact that different people will interpret a given visualization’s message differently is a reality that is probabilistic (stochastic), but this too can be quantified and visualized with the current tools.
Now even if you’ve only skimmed that last section, which could have been of high or low issue-importance to you, I do appreciate you following me this far. To wrap up the personal narrative, my sabbatical ended this past fall. I’m still learning new technologies and developing visualizations as proofs of concept. But now my efforts are focused on a new business that delivers strategy and research to support a message-effects understanding of data visualization.
Data visualization as a communication technology now brings us (government, business and civil society) to the point where this technology’s ability to convey visually honest and effective messages is indeed its medium.
For more information check us out.
Code for the D3.js visualization examples in this article is based in part on various works of Mike Bostock and Harry Stevens.
Apologies to McLuhan proponents for the playful title.