[Tamara Munzner responded in her own blog, and I (Stephen Few) have reproduced her comments here. I have also responded to Tamara’s points directly. My responses appear in brackets and red italics, beginning with my initials: SCF.]
I’m writing in response to a still-unfolding debate and conversation within the visualization community that was catalyzed by two newsletter/blog posts from Stephen Few. He wrote two strongly negative critiques of two papers on memorability from a group of researchers at Harvard and MIT; Michelle Borkin was first author on both of these papers. He also critiqued the InfoVis conference itself, where these papers were published.
The first paper was published at InfoVis13: What Makes a Visualization Memorable? by Borkin, Vo, Bylinskii, Isola, Sunkavalli, Oliva, and Pfister. In my posts, I’ll call it [Mem13] for short. The second was published at InfoVis15: Beyond Memorability: Visualization Recognition and Recall by Borkin, Bylinskii, Kim, Bainbridge, Yeh, Borkin, Pfister, and Oliva; I’ll call it [Mem15]. Few’s critique of Mem13 is Chart Junk: A Magnet for Misguided Research; I’ll call it [Few13]. His critique of Mem15 was called Information Visualization Research as Pseudo-Science; I’ll call it [Few15]. The discussion about that article is on a separate set of pages.
I note two roles of my own, for full disclosure and context.
I was Michelle Borkin’s postdoc supervisor from mid-summer 2014 through mid-summer 2015. I was not personally involved with any of the memorability research, which was done while she was a PhD student at Harvard with Hanspeter Pfister.
I’ve been heavily involved with InfoVis for quite a while now. I’ve attended every single one since it started in 1995, and first published there in 1996. My first organizational role was being webmaster in 1999, I started the posters program in 2001, I was papers chair in 2003 and 2004, and I’ve been a member of the steering committee since 2011.
All of which is to say yes, I do have some skin in the game on both of these fronts.
On Conventions Between Fields in Experimental Design and Analysis
I think of science as a conversation that is carried out through paper-sized units. Any single paper can only do so much – it must have finite scope, so that the work behind it can be done in finite time and described in a finite number of pages. There is a limit on how much framing and explanation can fit into any paper. Supplemental materials can expand that scope somewhat, but even without explicit length limits for them there must still be a boundary.
In the particular case of InfoVis as a venue, the restriction on length is 9 pages of text (plus one more for references). That’s shorter than venues such as cognitive psychology journals, where authors might have dozens of pages. In those papers, it’s the common case that a single paper covers a series of multiple experiments that hit on different facets of the same fundamental research question. The InfoVis length is longer than venues such as some bioinformatics journals, where the main paper is sometimes only a few pages, with the bulk of the heavy lifting done in supplemental materials.
[SCF: Science is not “a conversation that is carried out through paper-sized units.” It is much more fluid and ongoing—or should be. Only the publication of scientific findings is confined to documents consisting of a few pages. Some of the problems that plague science are a result of thinking of it as paper-sized units. Too often, publication of a paper, whatever it takes, becomes the goal, rather than the production of good science.]
This inescapable fact of finite scope means that fields develop conventions of standard practice: what’s normally done, the level of detail that’s used to describe it, and the amount of justification that’s reasonable to expect for each decision. These conventions can diverge dramatically between fields. The interdisciplinarity of InfoVis can lead to very different points of view on what’s reasonable and what’s valid.
[SCF: While it is true that the interdisciplinary approach to infovis “can lead to very different points of view of what’s reasonable and what’s valid,” this situation creates problems that must be resolved. Conventions must be developed that specifically support information visualization, despite the many disciplines that inform it. We visualize information for particular purposes. This should never be forgotten when infovis research is conducted, regardless of the background and training of the researchers. The fact that visual science researchers participated in the two memorability studies done by Michelle Borkin and her colleagues does not mean that the conventions of visual science apply. Attempting to discover what catches someone’s attention or remains in memory after brief exposure to an image might be of interest in and of itself to visual scientists, but it should only be of interest to infovis researchers if it pertains to the use of visual representations of data to make sense of or communicate data. These studies were not properly designed with this objective in mind.]
We discussed both of the memorability papers in our visualization reading group at UBC. The difference in initial opinions based on backgrounds was remarkable.
A person with a vision science background initially thought the methods were completely straightforward: they were closely in line with decades of work in her specific field of vision science in particular, and aligned with the larger field of experimental psychology in general. Although the vision scientist could identify some minor quibbles, she was fully satisfied with the rigor. She was intrigued to see that the methods of vision science, which are typically directed to experiments with extremely simple stimuli, were successfully being applied to the more complex stimuli that are of interest in visualization.
In contrast, a person with a biomedical statistics background initially thought the methods were completely indefensible, with far too many variables under study to make any of the statistical inferences meaningful, and most importantly no discussion of confidence intervals or odds ratios. (I was well aware of confidence intervals, but I hadn’t heard of odds ratios. For a concise introduction to these ideas, see Explaining Odds Ratios by Szumilas.)
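For readers who, like me, had not previously encountered odds ratios, here is a small self-contained sketch of how one is computed from a 2x2 table, along with the standard normal-approximation confidence interval on its logarithm. The numbers are my own toy illustration, not data from either paper:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio for a 2x2 table, with an approximate 95% CI.

    a = exposed cases,   b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    or_ = (a * d) / (b * c)
    # The log odds ratio is approximately normal; its standard error
    # is the square root of the sum of reciprocals of the cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Toy data: 20 of 100 exposed and 10 of 100 unexposed show the outcome.
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
# OR = 2.25, 95% CI [0.99, 5.09]
```

The point of reporting the interval rather than just the ratio is visible here: the odds look more than doubled, but the interval straddles 1, so this made-up effect is not clearly established by this made-up sample size.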
The biostatistician had had this highly negative reaction to many of the papers she’d been reading in the visualization literature, and had been thinking long and hard for the past year about how to understand her misgivings at a deeper level than a first knee-jerk reaction of “they’re just ignorant of the methods of science”. She articulated several crucial points that have helped me think much more crisply about these questions.
There are several fundamental differences between the experimental methods used in vision science and the methods considered the gold standard in medical experiments that test the effectiveness of a particular drug for treating a disease: randomized controlled trials.
Two of the most crucial points are the ability to manipulate the experimental variables/factors, and effect sizes.
First, in many medical contexts, some kinds of manipulation of experimental variables are off the table. Repeated-measures designs are impossible because of carryover effects: you can’t just give the same cancer patient 100 different cancer drugs, one after the other, because the effects will linger instead of stopping when the treatment stops. With great care, it’s sometimes possible to design “case-crossover” experiments for just two conditions, where for example two drugs are tested on the same person, but it’s certainly not possible to test many conditions on the same person. That’s why the common case is to design experiments with between-subjects comparison, not within-subjects comparison. Moreover, the trial lasts a long time: months or even years. Thus, the number of trials is typically equal to the number of participants.
Second, when manipulating variables that affect human subjects, you also have to consider harm to the participant. In medicine, there are many situations where you either cannot manipulate a variable (you can’t retroactively expose somebody to asbestos 20 years ago in order to see how sick they are today, and you can’t just divide a set of people into two groups and give one of these groups brain cancer), or you should not manipulate it for ethical reasons (you shouldn’t deliberately expose somebody to a massive dose of radiation today to see how sick they get tomorrow). One response to this situation is to develop methods for “observational” (aka “correlational”) studies, rather than “experimental” studies where the experimenter has full control of the independent variable. For example, in one kind of retrospective observational study, a “cohort” is identified (a group that has been identified as having some property, such as exposure to an environmental toxin) and then it is compared to a similar group that hasn’t been exposed. Selecting appropriate participants for each of these groups is an extremely tricky problem, because of the possibility that the cohort also varies from the control group according to some confounding variable that has a stronger effect than the intended target of study.
What I used to think of as “experimental” studies turn out to be more properly called “quasi-experimental” methods, because the experimenter doesn’t have full control of the independent variable: they can’t tell people to smoke or not to smoke, but they can ask the people who already smoke to do something else. There is still the extreme hazard of confounds, though. What if you divided people so that one group happens to have more heavy smokers than the other? Or what if an underlying reason that people smoke is stress, so you’re really measuring stress rather than the effects of smoking per se? The randomized controlled trials that are the gold standard of medicine are in this category. You can divide cancer patients into two groups, one that gets the experimental treatment and a control group that gets the placebo, and then analyze the differences in outcomes to try to uncover their linkage to the intervention. But you can’t control for how virulent a strain of cancer they have, because you didn’t give them cancer. And, as above, you can’t give the same patient both the experimental drug and the placebo.
(One good reference for all of this is the book “How to Design and Report Experiments” by Andy Field and Graham Hole, especially Section 3.2 on “Different Methods for Doing Research”.)
Above, I’ve been alluding to the other crucial aspect, effect size. The typical goal in medicine is to detect quite subtle effects, and thus experiments need to be designed for large statistical power in order to have a hope of detecting these effects.
In contrast, in vision science, life is very different: experimental trials are fast, independent, and harmless; frequently, effect sizes are big. First, trials are very short: just a few seconds in total for the full thing, and the actual exposure to the visual stimulus is often much shorter than one second! Moreover, it’s straightforward to design experiments that preclude carryover effects when you’re testing a perceptual reaction to a visual stimulus instead of a physiological reaction to an experimental drug. Thus, it’s the extremely common case to run many trials with each participant: dozens, hundreds, or even thousands of trials per participant. When considering the statistical power of an experiment, the designer is concerned with the total number of trials, which is in the realm of hundreds or thousands. The number of participants is typically far, far smaller than in medical experiments, where in order to have thousands of trials you need thousands of participants. Also, in this domain, it’s not just feasible to design within-subjects experiments, it’s actively preferable whenever possible – because these designs provide greater statistical power for the same number of trials compared to between-subjects designs, since you can control for intersubject variability.
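That power advantage of within-subjects designs can be made concrete with a small simulation. All numbers below are my own toy assumptions, not values from either paper: each simulated participant has a stable personal baseline (intersubject variability) plus per-trial noise, and condition B adds a true effect. With the same total number of trials, the paired within-subjects estimate of the effect is much less variable, because the personal baseline cancels in the paired difference:

```python
import random
import statistics

random.seed(0)

SIGMA_SUBJECT = 2.0   # spread of personal baselines (toy value)
SIGMA_NOISE = 1.0     # per-trial measurement noise (toy value)
DELTA = 0.5           # true effect of condition B (toy value)
N_SIMS = 2000

def within_estimate(n_participants=100):
    # Each participant does both conditions (200 trials total);
    # the baseline cancels in the paired difference.
    diffs = []
    for _ in range(n_participants):
        base = random.gauss(0, SIGMA_SUBJECT)
        a = base + random.gauss(0, SIGMA_NOISE)
        b = base + DELTA + random.gauss(0, SIGMA_NOISE)
        diffs.append(b - a)
    return statistics.mean(diffs)

def between_estimate(n_per_group=100):
    # Each participant does one condition (also 200 trials total);
    # baselines do not cancel, so intersubject variability adds noise.
    group_a = [random.gauss(0, SIGMA_SUBJECT) + random.gauss(0, SIGMA_NOISE)
               for _ in range(n_per_group)]
    group_b = [random.gauss(0, SIGMA_SUBJECT) + DELTA + random.gauss(0, SIGMA_NOISE)
               for _ in range(n_per_group)]
    return statistics.mean(group_b) - statistics.mean(group_a)

# Empirical standard error of the effect estimate under each design.
within_se = statistics.stdev(within_estimate() for _ in range(N_SIMS))
between_se = statistics.stdev(between_estimate() for _ in range(N_SIMS))
print(f"within-subjects SE:  {within_se:.3f}")   # theory predicts ~0.14
print(f"between-subjects SE: {between_se:.3f}")  # theory predicts ~0.32
```

With these toy numbers the between-subjects standard error is roughly double the within-subjects one, which is the formal content of the claim that within-subjects designs give greater power for the same number of trials.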
The combination of these two things — the ability to control for intersubject variability through within-subjects designs, and the ability to run many trials — means that there is not nearly so much concern for confounding variables based on splitting your subjects into groups improperly. One implication is that in this experimental paradigm, multi-factor / “factorial” designs are entirely practical and reasonable. That is, a single experiment can test more than one experimental variable, and each variable might be set to several values. For example, the visual stimuli shown to the participant might systematically vary according to multiple properties, resulting in many possibilities. Another implication is that “convenience sampling” is extremely common and does not require special justification, for example undergrads on campus or workers on Mechanical Turk.
Moreover, it’s even possible to design between-subjects experiments with multi-factor designs, given a crucial assumption: that individual differences have a smaller effect size than the effect size that we’re trying to study. This assumption is reasonable because there’s a huge amount of evidence from decades of work in vision science that it’s true – and moreover you can test that assumption in your statistical analysis of the results. And this point brings me back to the concept of effect sizes as the second key difference between the methods of medical research and vision science. In medical research, individual difference effects (how virulent is your cancer) are usually enormous compared to the variable under study (does the drug help). In vision science, individual differences in low-level visual perception are typically very small compared to the variable under study (does the size of the dot on the screen affect your speed of detecting its color).
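The dependence of required sample size on effect size can also be made concrete with the textbook normal-approximation formula for comparing two group means; the values below are standard ones and are not specific to these papers:

```python
import math

def n_per_group(effect_size, z_alpha=1.960, z_power=0.842):
    """Approximate participants needed per group to compare two means:
        n = 2 * (z_alpha + z_power)^2 / d^2
    where d is Cohen's d (the effect size in standard-deviation units),
    z_alpha is the two-sided 5% critical value, and z_power gives 80% power."""
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

# A subtle effect (d = 0.2, typical of many medical contexts) needs far
# more participants per group than a large one (d = 1.2, plausible for
# low-level perception), because n scales with 1/d^2.
print(n_per_group(0.2))  # 393
print(n_per_group(1.2))  # 11
```

The inverse-square relationship is the key point: halving the effect size quadruples the required sample, which is why the small participant counts that are routine in perception research would be hopeless for detecting the subtle effects that drug trials target.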
All of these points are part of the reason that work in vision science is scientifically valid, because the methods are appropriate to the context – even though multi-factor testing with a small number of participants would be ridiculous in the very different context of medical drug trials.
Coming back to visualization, we’re in a context that’s very close to HCI (human-computer interaction) – and controlled laboratory experiments in HCI are a lot closer to vision science than to medicine. It’s common to use multi-factor designs and we run many trials on each participant. There is significant trickiness with carryover effects, typically more so than in vision science, and we often consider “learning effects” in particular as something that must be carefully controlled for in our designs. Our trial times are typically longer than in vision science, ranging from a minute to many minutes – but still far shorter than in medicine. There’s more to say here, but I’ll leave that discussion to another post because I have more ground to cover in this one.
Coming all the way back to the memorability papers and Few’s response to them, this analysis allowed me to interpret a comment from Few somewhat more charitably: his complaint, in his response to the paper, about the demographics of Mechanical Turk not matching up with the population of the US. In the context of HCI research, it seems extremely naive, because there has been enough previous work establishing how to use MTurk in a way that replicates in-person lab experiments that most of us in the field consider it a settled issue. By considering it in the context of randomized drug trials, as I describe above, I can better understand why Few might have thought along these lines – and my discussion above also covers why his criticism is not valid in this context.
[SCF: My critique of the experimental methods that were used in Borkin’s paper was not influenced by a background in a different research discipline (e.g., medicine). Instead, I was addressing the specific ways in which experimental research should be designed to produce meaningful and valid findings regarding information visualization. Nothing useful can be said about information visualization based on Borkin’s paper.]
(Two of the most relevant papers are from Heer’s group: Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design by Heer and Bostock, from CHI 2010; Strategies for Crowdsourcing Social Data Analysis by Willett, Heer, and Agrawala, from CHI 2012.)
Again coming back to these papers, a contentious point in this whole debate is whether these experiments had sufficient statistical power to draw valid conclusions. Few has contended that the Mem15 paper can’t possibly be valid because there are too few participants. As above, I think this argument is missing the point that in this kind of experiment the power is more appropriately analyzed in terms of the number of trials.
[SCF: Even based on this assumption about the number of trials, did Borkin’s research demonstrate appropriate statistical power? We have no reason to believe that it did.]
I would certainly be happier with the Mem13 paper if it explicitly discussed confidence intervals and/or effect sizes, but it does not. That’s the common case right now: most papers in HCI and vis don’t, although a few do. I note that Stephen Few did specifically state that he’s critiquing the whole field through this paper as an exemplar, so saying “everybody does it” isn’t a good defense – that’s exactly his point!
Pierre Dragicevic has written extensively and eloquently about how HCI and Visualization as a community might achieve culture change on the question of how to do statistical analysis by emphasizing confidence intervals rather than just doing t-tests: that is, null-hypothesis significance testing (NHST). I do highly recommend his site http://www.aviz.fr/badstats. I also note that he gave a keynote on this very topic at the BELIV14 workshop, a sister event to InfoVis 2014, which sparked extensive discussion. This kind of attention and activity is one of the many reasons I don’t agree with Few’s characterization of the vis research community as being “complacent”.
[SCF: You are skirting the point of my critique. My belief that the infovis community is suffering from complacency is primarily based on the fact that papers such as Borkin’s are accepted and promoted as good when they are in fact poorly done and invalid. When I begin to see the overall quality of infovis research improve and a greater openness to thoughtful critiques, I will have a reason to believe that complacency is diminishing.]
(Dragicevic also contributed to the online discussion on Few15, with posts 6, 19, 40, and 45.)
The biostatistician in my group argued that even this culture change might not be the best end goal; she sees confidence intervals as just one mechanism towards a larger goal of using methods that treat effect sizes as a central concern, and report on them explicitly in the analysis. She points out that in the medical community there is the concept of levels of evidence: while randomized controlled trials are the gold standard in terms of being the highest level of evidence, they’re absolutely not the only way to do science. In fact, it’s well understood that studies yielding lower levels of evidence are exactly what is required as steps along the way towards such a gold standard. They’re not invalid, and not pseudo-science; they use different methods to achieve different goals. (For a concise introduction to these ideas, see The Levels of Evidence and Their Role in Evidence-Based Medicine by Burns, Rohrich, and Chung.)
[SCF: If Borkin’s study were properly designed to produce valid results, I would not have called it pseudo-science. I have not judged this paper as pseudo-science because it used methods that are different from mine, but because it used methods that were not designed to produce valid findings about information visualization. I was quite specific in my critique of the study’s flaws. You have not addressed any of those flaws specifically and directly. If you wish to argue that this study qualifies as legitimate science, then show how the specific flaws that I addressed are in fact examples of legitimate scientific design.]
The upshot is that I do think this question of statistical validity is complex and subtle, and that Few’s approach of just asserting “you’re not following the scientific method” is dramatically oversimplifying a complex reality in a way that’s not very productive.
[SCF: You are misrepresenting my argument. I did not say that this paper was flawed solely because of statistical problems. My lengthy critique is not guilty of oversimplification. To the contrary, you are oversimplifying this matter by misrepresenting my position and failing to address the many flaws that I identified.]
I hope that my analysis above starts to give some sense of the nuance here: the methods of science depend very much on the specific context of what is being studied. Yes, it’s true that we talk about “the” scientific method: observe, hypothesize, predict, test, analyze, model. But when we operationalize this very general idea, the much more interesting point is that there are many, many methods used in science. There is no single answer, and a lot of the training of a scientist involves learning when to use which method; and within every method are many smaller methods that require judgement, and so on – arguably it’s methods all the way down. Methods appropriate for medical drug trials aren’t even the same as those for epidemiology, much less for low-level perception as in vision science, or human behavior as in social science, or the complicated mix of low-level perception, mid-level cognition, and high-level decision making that is visualization.
Moreover, all of this discussion has just been about the relatively narrow question of controlled experiments featuring quantitative measurement! There’s an enormous field of qualitative research methods that are also extremely useful in the context of visualization.
On the InfoVis Review Process
The process of reviewing papers is relevant in this memorability discussion, since the Few15 critique specifically called into question whether the peer review process at InfoVis yields appropriate quality.
Papers as a Mix of Strengths and Weaknesses
No paper is perfect; every paper has a mix of strengths and weaknesses. The job of the reviewers is to decide whether the strengths outweigh the weaknesses, and it is valid for two reasonable scientists to disagree on this, given that it is an individual judgement call. That is, all papers have flaws; the job of the reviewer is to decide whether those flaws are fatal. Few argues that the Mem13 paper and the Mem15 paper have fatal flaws. I disagree with this assessment, and I explain why at length below.
[SCF: You and I understand the job of reviewers differently. You believe that a paper should be accepted for publication if the “strengths outweigh the weaknesses.” That’s a rather low bar. I believe that the review process should determine if the paper is scientifically valid and worthwhile. In your final sentence above, you disagree with my assessment that Borkin’s paper has fatal flaws and promise to “explain why at length below,” but you never do. At no point do you address the specific flaws that I identified.]
Peer Review and the Conversation of Science
I’ll echo and expand on the words of two other InfoVis steering committee members that a conference is a conversation (Fekete), and that science is a conversation (Heer). The review process is an intrinsic part of that conversation, even though much of it is not visible to the readers of the final draft of the paper.
Papers are the major units of speech in the scientific conversation. Papers cite and discuss past work, and frame their new contributions with respect to the limitations of that past work. Typically the way somebody argues against the conclusions drawn in a paper is to write another paper that carefully shows why that original one didn’t get the story right. The strength and validity of that argument is judged in the peer review process, where frequently the reviewers are the authors of the very papers that the new paper is characterizing as having limitations. It’s not usually quite so simplistic as just saying the old work is flat-out wrong (although that sometimes does occur). It’s often a matter of noting situations where it falls short, or extending to new situations not previously considered, or proposing the existence of new confounding factors that serve to illuminate a previously murky assumption or explanation.
Like most practitioners, Few doesn’t take part in that academic conversation as an author. That’s not surprising – if he did, we’d normally call him an academic, since that choice to engage in publishing research is exactly the dividing line between those categories.
Unlike many practitioners, Few has engaged with scientific papers at a sufficiently detailed level that he has been asked to take part in that conversation as a reviewer. He has chosen to decline the most recent invitation because of his belief that anonymous peer review is implicitly unethical.
[SCF: I decline to participate in the infovis paper review process because I believe that anonymity invites bad behavior and that I have no right to pass judgment on someone’s work while remaining anonymous. Unlike the review processes for other events, which allow reviewers the choice of remaining anonymous or revealing their identities, the infovis process forbids reviewers from revealing their identities, which is absurd. In a court of law, the accused have the right to know the identity of their accusers. There’s a reason for this, which I believe applies to the paper review process as well, even though the consequences are not as dire.]
While of course Few is free to make his own choices in this situation, since they affect only himself, I strongly disagree with the assertion that anonymous reviews are fatally flawed. Anonymous reviews provide the opportunity for honest assessment without fear of future retribution or retaliation. It’s a structural check against the problem that papers could be rejected because of grudges rather than judged on merit. It’s also a protection for junior people, who can honestly assess the work of senior people without the fear of such retaliation as unenthusiastic letters when tenure time rolls around. Neither of these situations is a problem for Few personally, since he doesn’t submit papers or want tenure, but they are very real concerns for academics.
[SCF: You are ignoring the fact that anonymity gives reviewers the right to reject papers due to grudges that they have against paper authors. In your effort to protect reviewers you are putting authors at risk.]
In the comment thread, Few expresses concern that anonymity supports irresponsible or incompetent behavior “in the shadows”. What he isn’t acknowledging is that there is indeed considerable and significant oversight in the review process, happening at multiple levels. Reviewer identity is only anonymous *to the authors*. It is not at all anonymous to the other members of the program committee or the papers chairs!
First, there’s a two-tier reviewing system, where the (primary and secondary) reviewers who are on the program committee have positions of higher responsibility than the external reviewers whom they invite. These program committee members are carefully chosen based on the quality of the reviews they have written in the past.
The primary reviewer exercises judgement about the competence and thoughtfulness of the other reviewers when writing up the meta-review. As Jeff Heer alluded to in his first and second comments, all four reviewers read what the others wrote, and then discuss – sometimes at length. I consider it a sign of strength, not a process problem, that reviewers can and do regularly disagree on the merits of a particular paper. Usually these discussions end with some level of agreement, where either an initially positive person gets convinced by arguments about flaws from a more negative reviewer that there is a problem, or vice versa – that a reviewer who champions the worth of a paper (despite inevitable imperfections) convinces the others that it should see the light of day. As a PC member, I most certainly notice if an external on that team does a poor or incoherent job of reviewing, and I make it a point to not invite them again (and would sound an alarm if I saw that another PC member tried to do that in the future for a paper where I was on board).
Second, there’s oversight from the three papers chairs, who read every single review. They explicitly note cases where there is a review quality problem. Program committee members whose review quality is too low — or who consistently invite unqualified externals who write low quality reviews — are not invited to participate in subsequent years. At this point the pool has been sufficiently carefully vetted that there are only a few per year who are disinvited, and some years there’s no need to eliminate anybody. Moreover, if the papers chairs are concerned that they don’t have enough information to judge a particular paper, they may call in a “crash reviewer” to do an additional review with just a few days of turnaround time. I’ve asked for these a few times when I was papers chair, and I’ve done a few of these myself in later years.
[SCF: These processes and safeguards are not effectively addressing the problems that I’ve identified. Invalid research papers are getting through the review process.]
It’s true that the strengths and weaknesses of anonymous review are an active issue of debate across many scientific communities, and visualization is no exception. While I think that it’s reasonable to discuss whether InfoVis should change the process, I believe that the stance that anonymity necessarily begets irresponsibility is overly simplistic. The strength of a single-blind reviewing system very much depends on process questions of how it is run, and I think InfoVis has an extremely robust and careful process. It yields higher quality results than most other communities that I’m aware of.
I may well write further about this question in some later blog post, but that’s enough for now.
Quality of Evaluation Papers at InfoVis
The bar for ‘publishable’ and ‘strong’ typically moves over time at most venues. I’m confident that it’s gone in the right direction at InfoVis for evaluation papers: quality has increased. In the early years of InfoVis, there were no controlled experiments at all. Then there were a few, and they were fairly weak. As there came to be more and more, the bar was gradually raised, where they needed to be stronger and stronger to get in. I believe we’re now in a place where most are strong, and a few are great. I don’t believe we’ll ever be in a place where everybody thinks every single paper that gets in is great, because there is so much variation in the judgement about what it means to be great. That’s true for any venue at all.
Punching Up vs Punching Down
Few clearly sees himself as punching up: he’s the David, the lone voice in the wilderness, the underdog. The Goliath that he’s fighting against is the slowly turning wheels of entrenched academia in general, of which the academics who dominate conferences like InfoVis are an instance in particular. All of his language frames himself as somebody who is fighting the good fight.
[SCF: This characterization of my position is “punchy,” but ill chosen. I do not see myself as David facing down Goliath. I merely see myself as someone who knows and cares a great deal about data visualization and is concerned about the quality of data visualization research. That’s it. Goliath was the champion of the Philistines, who were exercising oppressive dominion over the Israelites. The infovis research community exercises dominion over nothing but its own members. The infovis research community has little effect on the world. I’ve been trying to change that by helping you become more useful and relevant to the world. If you’re searching for a biblical analogy, perhaps the Good Samaritan would be a better fit. I’ve taken the time to notice your wounds and give a damn. Few others in the world of data visualization practice have bothered. Given the reception that I’ve received, their disinterest is easy to understand.]
In contrast, nearly every academic I’ve heard from who has seen his newsletter has reacted in shock, and there’s a palpable sense that Few crossed a line. I think that’s because we see it as punching down: he’s a senior person who is publicly attacking a junior person, and there’s a strong convention against doing that in academia.
[SCF: I find it revealing that “nearly every academic” feels a “palpable sense that Few crossed a line,” but apparently no one can explain where that line is drawn, who drew it, and why I should respect it. I’ve requested an explanation, but no one has responded.]
I need to think more about exactly why that social convention exists. My first speculation is that it’s a reaction to the strong hierarchical system of academia. Senior people have direct power over more junior ones in so many ways (hiring, reviews, tenure) that there’s a sense of noblesse oblige – that those with power and privilege have a duty to those who lack that power. (Or, if you like the pop culture superhero version better than the snooty French version – “with great power comes great responsibility”.)
[SCF: As a non-academic, I don’t subscribe to your sense of hierarchy. I don’t think of Michelle Borkin as junior. She is an assistant professor at a university. She has students of her own. She publishes research papers for the world to see (and I thought, for the world to critique). We’re all adults. When we put our work out there in the world, we must accept responsibility for it. It’s that simple. I would expect the world of academia, perhaps above all others, to be open to critique. Alas, to my great dismay, I’ve found that this is far less the case than in the world of business where I’ve spent most of my career.]
I might in the future write a longer post just on this subject, but there’s a lot of ground that I want to cover so I’ll move on.
Pseudo-Science as Fighting Words
It’s disingenuous at best for Few to accuse somebody of doing ‘pseudo-science’ and then express surprise that people are getting upset. That’s like complaining that I can’t believe that person over there hit me in the face — when all I did was kick him in the stomach!
[SCF: I was not surprised that people were upset. I was surprised and disappointed by the unreasonable ways that many in the academic community have responded (i.e., in the form of unwarranted personal attacks rather than by addressing the content of my critique). My claims were rational, accurate, and supported by evidence. Only two academics have responded in the thoughtful manner that I would expect from scientists: Jeff Heer and Pierre Dragicevic.]
His later comments said that the academics are not open to feedback and are slamming the door in the faces of people who don’t have PhDs following their names. I don’t agree that the irritation expressed by many academics at his remarks is fair to interpret as a sign of disrespect toward all non-academics; it’s a sign that his rhetorical choices have made people angry at him in particular.
[SCF: Let’s get straight what I’m saying. Many (perhaps most) infovis researchers are out of touch with the real world of data visualization practice and are responding to my critique in ways that demonstrate no concern for getting in touch. I’m tired of the claim that my so-called “rhetorical choices” excuse the academic community from responding thoughtfully. This is nothing but a diversion from a very real set of concerns that few in the academic community are willing to acknowledge, let alone address.]
‘Pseudo-science’ is fighting words: that label is a direct personal insult to the intelligence and integrity of a scientist. Of course there will be an emotional response. It’s implausible to me that Few does not understand that this word choice would be a red flag. He made the deliberate choice to frame this debate as a fight rather than a discussion. He even admits in a later comment to being “intentionally provocative”.
[SCF: Please don’t falsely assign inflammatory intentions to me. I have not insulted anyone’s intelligence or integrity. If you believe otherwise, you are welcome to provide specific examples. If my behavior were really at fault, you would not need to exaggerate it as you have. I went out of my way in my article to point out that the failures of Borkin’s paper were not failures of intelligence or integrity.]
I suggest that the label of ‘pseudo-science’ should be reserved for things like Intelligent Design, where there is a deliberate attempt to cloak a non-scientific practice in the garb of science to deceive.
[SCF: I intentionally used the term “pseudo-science” to emphasize the harmful nature of the problem—a problem that is being propagated by the infovis research community, including several of its leaders. Rather than worrying about my use of the term “pseudo-science,” I suggest that you worry about fixing the problems that I have taken great pains to describe. Put your energy where it’s most needed. Opposing me is helping no one.]
If his goal is to make a useful contribution to the extensive and ongoing debate about the methods of science, he should not start things out by slinging personal insults. That choice makes it a lot harder to find a way to work with him. Given his choice of rhetoric, I’m not sympathetic to his position that the people who protest his tone are missing the point of his scientific critique. He made that bed, he gets to lie in it.
[SCF: No examples of “slinging personal insults” have come from me, although a few have been directed at me. I have appropriately assigned responsibility for a flawed research paper to the people who wrote it. When people find problems in my work, they assign responsibility to me. That’s how this works. This is a responsibility that we must all accept when we put our work out there in the world. A fitting example of “slinging personal insults” actually comes from you in the “On Rhetoric” section below.]
Few frequently uses a family of rhetorical devices that I find very irritating.
[SCF: What I find irritating are false accusations, especially when they suggest that I am guilty of deception.]
One of these I’ve seen called by many different names, including the loaded question, begging the question, circular reasoning, or presupposed guilt. Perhaps the best-known example of this device is “Have you stopped beating your wife?”, where either a yes or a no answer implies guilt because of the false presupposition.
[SCF: As a longtime student of rhetoric, I can assure you that I am not guilty here of the loaded question, begging the question, circular reasoning, or presupposed guilt. I’ll respond to each of your specific claims below.]
Here’s one of many examples from the comments:
“If you disagree, you should defend the review process, not by quoting statistics about the number of papers, etc., but by explaining why poor research papers are accepted.” (Few to Fekete, post 27)
No. Fekete doesn’t have to explain *why* poor research papers are accepted because he did not agree with your assertion *that* poor papers are accepted.
[SCF: In my article, I pointed out several specific problems in infovis research. At no point did I assume or suggest that Fekete accepted my claims that these problems exist. At no point did I ask a question of Fekete that he could not answer without admitting fault. At no point did I lay a deceptive trap for him. This is not a court of law where someone is confined to a yes or no answer. I made the case that poor research papers are being accepted. He was welcome to respond by countering my argument that they are. Instead, Fekete made a speech about the glories of infovis research. In a debate, when you make an argument, your opponent is obligated to respond to your argument with reason and evidence. Fekete chose to ignore my arguments entirely. In other words, he chose to treat it like a televised debate among political candidates rather than a serious debate among peers.]
Here’s another example that’s even more blatant:
“My statement that professors who produce research papers such as this one will encourage their students to produce pseudo-science is not speculative, assuming that you accept my premise that this paper qualifies as pseudo-science.” (Few to Heer, post 20)
No. Heer explicitly *rejected* the premise that this paper qualified as pseudo-science, in the directly preceding paragraph. He most certainly did not accept the premise.
[SCF: I neither said nor implied that Jeff Heer accepted my premise. Furthermore, you should reread Jeff’s comments in the paragraph preceding my comments. He did not explicitly reject my premise as you claim.]
A related device is the insinuation of things that people did not mean, for example:
“What do you suggest that the infovis research community should do to prevent the kinds of flaws that you and I have both identified in this paper?” (Few to Heer)
Misleading. This phrasing strongly implies that Heer agreed with all of Few’s assertions – but he did not. Heer’s answer deftly sidesteps the attempted trap: “… I would have raised the issues I noted above (which only partially intersect with yours)”. (Heer to Few, post 33)
[SCF: No, the phrasing of my question does not imply that Jeff agreed with all of my assertions. Your argument here is an example of the flaw that you are accusing me of committing. You are insinuating something that I neither said nor meant. I laid no trap for Jeff. Also, the response from Jeff that you quoted as a deft sidestepping of my “attempted trap” is his answer to an entirely different question. The question to which Jeff was responding was, “If you had reviewed the ‘Beyond Memorability’ paper, would you have recommended it for publication in its current form?” I would not describe his sidestepping as deft, but as fearful. I believe that Jeff would not have accepted this paper, but that he fears the recrimination that would result from this admission.]
A third rhetorical device is the continual interweaving between facts that are well substantiated and agreed on by others, and his own opinions – without clearly distinguishing between the two – to present the misleading impression that everything he says is a faithful reflection of the conventional wisdom.
[SCF: When you make an accusation such as this, you should provide an example. I have no idea what you’re referring to.]
Even as I’m irritated by Few’s choices of tone and rhetorical style, the silver lining is that I appreciate his passion for the cause of improving the work that we all do in the field of visualization. I’m delighted that the field is vibrant enough that we both care enough to argue about it – and that a bunch of other people care enough to follow that argument as well through tweets, blogs, and other social media avenues. That’s much better than apathy or indifference!
[SCF: I’ve contributed a bit more than passion. Suggesting that I can only be valued for my passion is an example of the attitude that makes many leaders among data visualization practitioners dismiss the infovis research community as insular and irrelevant. Why should they get involved if this is how you respond to someone who has contributed as much as I have to the field of data visualization?]