Vividdata Visualization

Data Visualization is Out of Balance, A Follow Up.

Hi and welcome. If you’re here because you just watched my YouTube video presentation Data Visualization is Out of Balance, thank you for your continued interest.

If you haven’t seen the video, you would benefit from viewing it before reading this article.

Here is the link: https://youtu.be/OnG1SkzL2GA

This article follows up on that presentation and focuses specifically on three things:

1st, as promised, this article provides some high-level questions that you can use to stimulate conversations on this topic among your colleagues, should you wish to explore the ideas further with them.

2nd, I wanted to articulate in writing a few further points about the DDM-Viz model, which would have been too much information to include in the presentation.

3rd, in anticipation of some direct and critical questions from people in the profession, I imagined some of those questions and provide at least some initial answers for consideration.

1) Discussion Questions for you and your colleagues on Data Visualization Effectiveness

For many years I worked in the marketing and public opinion research industry. I loved that I not only did many heavily quantitative studies, but I also had a knack for running focus groups. Qualitative research became a grounding counterpoint, showing me context and opportunities within the otherwise vast statistical results.

So, if I were moderating a small focus group right now with you and your key colleagues, here are the top questions that I’d put to you as a group. 

Take a read and feel free to use any or all of these to stimulate a good conversation among your colleagues. The objective is for you to get a strong sense of whether there’s an opportunity to enhance your role and the value you provide to your organization by leveraging these ideas.

  1. Over your career, was there a time when you knew or felt that data were not being represented or used correctly? How did that play out, and what were the outcomes?
  2. Have you experienced a professional situation where one set of numbers pointed a decision in one direction but a different set suggested another direction? How did their original data differ? How was the discrepancy resolved?
  3. Was there ever a situation when you observed that numerical evidence for a decision was not presented visually as effectively as needed? Why? What were the various concerns?
  4. Have you ever felt you were following up excessively on one or more data questions with colleagues? How was it resolved, in general terms?
  5. What would be your biggest worries if you were to give a data visualization about your key work to your boss, who then needed to interpret it (without you) to your Board of Directors or Shareholders? Do you ever lose sleep over that?

As you can see, these tend to be about context and consequences for data visualizations. Remember to ask questions of your colleagues in a way that lets them know they have the option not to respond.

2) DDM Visualization Model, a few more points

Another way to think about the model is that it aligns with three points in time, relative to the building of a data visualization. At each of these points, things can occur that could seriously jeopardize the success of a visualization.

  • what happens before the visualization is built: i.e., the data and our correct understanding of it
  • what we have to decide during the process of building a visualization: the choices and approach
  • what happens after the visualization is built and deployed: the results, consequences and benefits 

For each we could imagine a dichotomy between the Industry Dominant position on one side, and the DDM-Viz Model alternative on the other.

Industry | DDM-Viz Model
Data is simple | Data are complex
Multiplicity of viz types and options without clear functional hierarchy | Analytical Structure in the viz: analytical structure is the better place to start before selecting a visualization type. Structure shows how the data need to be or have been analytically interconnected.
Visualization Storytelling | Aspects of data ‘form’ the message: aspects are like the principal components of any message that could be based on analytical, empirical data. Perhaps multiple messages constitute a story.

Arguably, on one level, the three on the left have a lower degree of interconnection and interdependency across them. Of course, on both sides one would select visualization types based on data types, but there is arguably more room for error in the Industry Dominant view on the left. The three on the right are relatively more interconnected, as each must be congruent with and more highly dependent on the others.

Anticipating that some people reading this might feel dubious, I decided to come up with a range of direct questions that I could imagine people having after watching the presentation. In the next section I show the most direct questions, each with an as-direct-as-possible answer. The answer might not completely satisfy your interest, but at least it will provide a general notion of the rationale.

Apart from the following Q&A, that’s it for now.

Get in touch if you’re interested in the DDM-Viz model approach and hearing about how it can benefit your organization.

– Ross Waring.

3) DDM-Viz Model Qs&As

What’s the background/origin for the view of data as being simple?

This was a trend that was underway at least by the early to mid 1990s. Business Intelligence enterprise software vendors (e.g., BusinessObjects) were simplifying usage, with streamlined ETL as well as user interfaces with drag-and-drop icons, standardized data libraries and metadata. Data Visualization software vendors over the past decade may have adopted similar conceptual approaches, as it may have been apparent that those BI and analytical tool vendors would eventually move into the competitive market for visualization and visual analytics.

What’s an example where data complexity or fluidity, when harnessed in the context of a visualization, provided value or benefit?

Data complexity refers not just to being absolutely correct and precise about the type of data (and measurement) but also about other metadata, including exact specifications of any data that may have been systematically excluded from the analysis with required executive approvals. In one particular case this enabled streamlined year-over-year regulatory reporting. Any churn or error in core reporting, even prior to filing, would and did cause distraction to the Executives plus higher total costs of compliance.

Why is analytical structure necessary? Does it sufficiently capture all things that could go wrong in a visualization?

Much of analytical structure involves concepts of dimensionality, associations, and dependencies between data variables in statistical modelling. One classical approach in multivariate analysis is to build a composite analysis beginning with univariate analyses, then progressing to relevant bivariate analyses, before continuing to multivariate stages with specified dependent variables.

The structure is necessary, but not sufficient, to establish the validity of the end-to-end analysis that may be explicitly or implicitly contained in a visualization. Knowing the structure alone is not enough, because it works in conjunction with a deeper understanding of the data and metadata, and of the aspects of data that are revealed in a visualization or any other analytical form.

What’s so bad about too many choices in visualization options, etc., and why is analytical structure better/preferable?

There is a one-to-many relationship between structures and visualization types. A bar graph and a pie chart have the same analytical structure; the difference is how each represents the same data or information. In fact, each of those two types is a modified version of the other.
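
The bar-and-pie point can be sketched in code. This is a minimal illustration only, not part of the DDM-Viz model itself: the sample data and the function names `bar_lengths` and `pie_angles` are hypothetical. It assumes the shared analytical structure is simply one categorical variable mapped to one numeric measure, and shows both chart types as different encodings of that same mapping.

```python
# One analytical structure: a category mapped to a numeric measure.
# (Hypothetical sample data, for illustration only.)
sales_by_region = {"North": 50.0, "South": 30.0, "West": 20.0}


def bar_lengths(data, max_length=100.0):
    """Bar chart encoding: bar length proportional to each value.

    The largest value is drawn at max_length (an arbitrary unit).
    """
    peak = max(data.values())
    return {key: max_length * value / peak for key, value in data.items()}


def pie_angles(data):
    """Pie chart encoding: slice angle proportional to each value's share."""
    total = sum(data.values())
    return {key: 360.0 * value / total for key, value in data.items()}


bars = bar_lengths(sales_by_region)
slices = pie_angles(sales_by_region)

# The ratio between any two categories is the same in both encodings,
# because both are rescalings of the same underlying values.
print(bars["North"] / bars["South"], slices["North"] / slices["South"])
```

Under these assumptions the only difference between the two functions is the scaling constant, which is one way to read the claim that each chart type is a modified version of the other.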

Many other sets of visualization types have identical analytical structures. Too many choices is a bad thing only when the choices are offered prima facie without sound hierarchical and functional selection criteria.

Analytical structure is a better starting point because it forces the visualization developer to decide upon the optimal or preferred way of putting the data together, combining them visually, to have the strongest intended analytical effect. Think of it as the visual analytics equivalent to statistical power (1-β) of a test – the visual power perhaps.
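
For readers less familiar with the statistical term, power (1-β) is the probability that a test detects an effect that is really there. As a rough sketch only, assuming a one-sided z-test with a known standard deviation (the function name `z_test_power` and the example numbers are illustrative assumptions, not drawn from the presentation):

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power (1 - beta) of a one-sided z-test with known sigma.

    effect: true difference from the null value
    sigma:  known population standard deviation
    n:      sample size
    alpha:  significance level (probability of a false positive)
    """
    if sigma <= 0 or n <= 0:
        raise ValueError("sigma and n must be positive")
    std_normal = NormalDist()
    # Critical value: the test rejects when the z statistic exceeds this.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # How far the true effect shifts the z statistic, in standard errors.
    shift = effect * sqrt(n) / sigma
    # Probability that the shifted statistic lands beyond the critical value.
    return 1 - std_normal.cdf(z_crit - shift)


power = z_test_power(effect=0.5, sigma=1.0, n=25)
beta = 1 - power  # probability of missing the effect
print(round(power, 3), round(beta, 3))
```

The analogy in the text is that a well-structured visualization, like a well-powered test, is more likely to make a real pattern in the data apparent to the viewer.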

Why is storytelling lacking in validity?

Storytelling, as it applies to information- or data-visualization and as developed within a body of academic literature, roughly from 2001 (its inception with Gershon and Page) through 2018 (the review article by Tong et al.), is lacking in both external and internal validity.

It is lacking in external validity insomuch as its definitions and criteria do not map well or relate to the actual world of people and visualizations. In Kosara and Mackinlay (2013), the definition of story is so broad as to include, by implication, any instance of a visualization that has any manifestation of sequence in it. Also, the notion of “storytelling affordances,” from the same authors, clearly still places the locus of storytelling inside the visualization, imagined as an attribute or quality that has been built into a visualization instance.

This does not align with the notion that storytelling might be something that a person also does with a visualization, in which case the locus of control for storytelling needs to be external to the visualization. The existing research has overly or exclusively conceptualized storytelling as internal to visualizations. That this does not comport with the full range of actual uses of visualizations is therefore a threat to external validity.

It is duly noted that more recent research has begun to look at subjectivities in the visualization-building processes (Lee et al., 2015).

Storytelling is lacking in internal validity insomuch as it does not articulate any specific processes or effects that are scientifically testable. The dependent variables in its experimental studies have tended to align with cognitive and perceptual psychology, e.g., speed or accuracy of information recall or retrieval.

Then, more recently, the dependent variables also considered information memorability. That putative outcome may have been thrown off as the ideal outcome as a result of findings by Bateman et al. (2010). They observed that so-called “chart junk” qualities in a visualization resulted in significantly greater longer-term recall of topic and details (memorability). This was anathema to a Tufte-esque view of gratuitous visual elements as bad form.

The storytelling-in-visualization research seems motivated by the implied notion that people may have internal cognitive-level processes for receiving story-told information, which seems highly reminiscent of Treisman’s (1985) research on pre-attentive cognitive visual processing.

This storytelling cognitive process is occasionally justified by a dubious premise: that because storytelling predates recorded history, there must be some cognitive-level process that is activated when people listen or attend to a story. This putative story-processing cognitive function is only a premise. Even if it is granted, it does not necessarily follow that visualizations following specific storytelling conventions will activate that cognitive function, nor that doing so results in the conveyance of ‘vast amounts of information’.

These various dubious premises constitute threats to the validity of storytelling-with-visualization. But note that similar validity issues challenge the concepts of narrative, visual-metaphor, and rhetoric as each has been developed in the context of data- or information-visualizations.

Are the aspects (Magnitude/Centrality/Variability) valid and exhaustive, and where do they come from?

These are derived and extended from the moment system of statistics. Its first moment, the mean, is a measure of central tendency, or more simply centrality in this model. Then variance, skewness, kurtosis and further articulations of dispersion are generalized as variability. Finally, recognizing a more fundamental analytical process based on addition or subtraction, we added magnitude to underpin the three.
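
As a minimal sketch of how the three aspects relate to the moments for a single variable, the following computes each from a small sample. The function name `describe_aspects`, the grouping of results, and the sample values are illustrative assumptions rather than a formal part of the model; magnitude is represented here simply as the sum.

```python
from math import fsum


def describe_aspects(values):
    """Summarize one numeric variable under the three aspects.

    magnitude:   the plain total (addition)
    centrality:  the mean (first moment)
    variability: variance, skewness and excess kurtosis, computed
                 from the 2nd, 3rd and 4th central moments
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    total = fsum(values)
    mean = total / n
    deviations = [v - mean for v in values]
    # Population (divide-by-n) central moments.
    m2 = fsum(d ** 2 for d in deviations) / n
    m3 = fsum(d ** 3 for d in deviations) / n
    m4 = fsum(d ** 4 for d in deviations) / n
    if m2 == 0:
        # All values identical: shape measures are undefined.
        skewness = excess_kurtosis = None
    else:
        skewness = m3 / m2 ** 1.5
        excess_kurtosis = m4 / m2 ** 2 - 3.0
    return {
        "magnitude": total,
        "centrality": mean,
        "variability": {
            "variance": m2,
            "skewness": skewness,
            "excess_kurtosis": excess_kurtosis,
        },
    }


aspects = describe_aspects([2, 4, 4, 4, 5, 5, 7, 9])
print(aspects)
```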

This applies to univariate, descriptive statistics. The model generalizes this notion to extend further into inferential and/or higher-dimensionality statistical visualizations. For higher-dimensional analyses in visualizations, these three aspects still apply, as there are bivariate (etc.) conceptual equivalents of magnitudes, centralities and variabilities that can be spoken to, at potentially different levels of detail and certainty. Obviously I have a lot more work to do to fill out this argument.

Visualizations showing high resolution often show all three aspects with greater precision. Other visualizations may show the same aspects in lower resolution, or not at all, depending on how a visualization was specifically designed and developed. Some aspects may appear to be expressed but could contain implicit bias. These are flagged in the model as limited or risky.

I assert that these three aspects, when combined as a model with a full articulation of the data and the analytical structure, are exhaustive: the model covers all the possible things that any two or more people could rationally speak to (vis-à-vis human communication) and that are validly based on empirical quantitative data. My further research needs to explore, test and validate this.

Why is aspects-of-data better than storytelling?

It’s not necessarily better; it’s just more precise. It’s a more precise approach to understanding how visualizations communicate, and it is highly descriptive of the content of what is or could be said, as one or more message components, based on or otherwise using any given data visualization.

Or perhaps another way of looking at it is taking story and extending it further into the idea of analytical messages. Then the question is on what basis we can understand and articulate a visualized analytical message. The DDM-Viz model defines this in terms of these three aspects of data, given parameters in the rest of the model.

How do you determine the levels of any one aspect, and is it reliable and valid?

This area is, at this time, based on subjective assessments of the strength of a visualization to express the aspect. Subjective error and biases are mitigated by initial use of a lower-level ordinal, evaluative scale.
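
To make the idea of an ordinal, evaluative scale concrete, here is one possible sketch. The level names, the class `AspectLevel` and the function `consensus_level` are hypothetical and are not the model's actual scale. The sketch assumes several assessors each rate how strongly a visualization expresses one aspect, and it combines their ratings with a median, since an ordinal scale has order but not equal intervals.

```python
from enum import IntEnum
from statistics import median_low


class AspectLevel(IntEnum):
    """Hypothetical ordinal levels for how well an aspect is expressed."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def consensus_level(ratings):
    """Combine several assessors' ratings into one ordinal level.

    median_low always returns one of the submitted ratings, so the
    result stays on the scale (no in-between values are invented).
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    return AspectLevel(median_low(ratings))


# Four assessors rate how well a chart expresses variability.
variability_ratings = [
    AspectLevel.LOW,
    AspectLevel.HIGH,
    AspectLevel.MEDIUM,
    AspectLevel.LOW,
]
print(consensus_level(variability_ratings).name)
```

A coarse scale like this trades precision for agreement: assessors are more likely to agree on a few broad levels than on a fine-grained score.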

Granted, this subjectivity is a potential weakness of the model. But this would only apply in the initial stages of an enterprise deployment of the model. Over time, the model allows this component to be assisted with longitudinal modelling. So the subjectivity would be augmented or informed by machine-learning models.

Why does low or no information level matter in terms of any particular aspect of any given data variable; why is it necessarily bad?

Data visualizations are sometimes mesmerizing. I suspect that people using them can become overly focused on what the visualizations show as opposed to what they happen not to show, for whatever reason. People often need to be shown what they are looking at in a different way, and thereby also shown what they are not seeing or not being allowed to see.

Operational mistakes happen when we forget to consider all the information available to us or look at it in only a limited way. Equally, when the information on an aspect is low, absent, faulty, flawed or potentially biased, that needs to be expressed to the people who are basing vital decisions on the data.

References

Bateman, S., Mandryk, R.L., Gutwin, C., Genest, A., McDine, D., & Brooks, C. (2010). Useful Junk? The Effects of Visual Embellishment on Comprehension and Memorability of Charts. ACM Conference on Human Factors in Computing Systems, 2573-2582.

Gershon, N., & Page, W. (2001). What storytelling can do for information visualization. Communications of the ACM, 44(8), 31-37.

Kosara, R., & Mackinlay, J. (2013). Storytelling: The next step for visualization. Computer, 46(5), 44-50.

Lee, B., Riche, N.H., Isenberg, P., & Carpendale, S. (2015). More Than Telling a Story: Transforming Data into Visually Shared Stories. IEEE Computer Graphics and Applications, 35(5), 84-90.

Tong, C., Roberts, R.C., Laramee, R.S., Wegba, K., Lu, A., Wang, Y., Qu, H., Luo, Q., & Ma, X. (2018). Storytelling and Visualization: A Survey. VISIGRAPP 3, 212-224.

Treisman, A. (1985). Preattentive Processing in Vision. Computer Vision, Graphics, and Image Processing, 31(2), 157-177.