We talked about defamation: the extent to which it is okay for journalists or other researchers to publicly attack a researcher's work, and whether there is a point at which it is okay for that researcher to sue for defamation. We used the Michael Mann case as an example.
We talked about reporting of fMRI data. I confess that fMRI methods are out of my comfort zone, but we read a paper by Vul, Harris, Winkielman, and Pashler (2009) to spur the discussion. Fortunately, one of my students works with fMRI data, and it was great to have her perspective.
And we discussed some issues to consider in responsible reporting, such as:
Correcting for the number of tests to control Type I error (the first sketch after this list illustrates the problem and one correction).
Statistical significance vs. practical importance/meaningfulness. Studies with enormous sample sizes can produce significant results that, when expressed as effect sizes, are essentially meaningless (the second sketch after this list shows this with simulated data). On the flip side, there are sometimes meaningful differences that do not reach the magical .05.
p-hacking/fishing. Gelman has discussed this issue repeatedly on his blog, as have many others. Gelman has also talked about what he refers to as researcher degrees of freedom (decisions that don't involve outright statistical fishing but may still be questionable). I think that fishing is a very common issue in research, particularly with secondary data analysis. It is really useful for students to think about it early on, and to learn to formulate hypotheses and research questions before running analyses; the first sketch after this list also shows how quickly fishing turns up chance "findings."
Comparing two analyses without statistical comparisons. I have railed about this issue for many years. Gelman and Stern (2006) wrote a great paper called “The Difference Between ‘Significant’ and ‘Not Significant’ is not Itself Statistically Significant” (yes, I always tell my students not to use article titles in their writing; I make an exception here). The issue is that researchers often report that two things were correlated and two other things were not, and conclude that the two relationships are therefore meaningfully different. Or, that two things were correlated for one group but not the other. For instance, maybe parent-child conflict correlated with substance use at .25, p < .05, whereas parent-child closeness correlated with substance use at .22, p > .05. The researcher/author might then conclude that conflict matters for substance use, but closeness does not. Not okay! In a paper of mine, I once ran regressions predicting sexual behavior from a set of gendered attitudes, and wanted to know whether the gendered attitudes mattered more for men's or women's sexual behavior. So I included interactions between each gendered attitude and biological sex (the third sketch after this list illustrates this interaction approach). A reviewer then said that I needed to instead run the regressions separately for men and women to see what was significant. That was at least 5 years ago, and I clearly still haven't gotten over it.
Drawing causal conclusions when they are not warranted.
I learned the term HARKing (hypothesizing after the results are known).
Preregistration: We discussed arguments for and against preregistration of hypotheses. And, taking preregistration a step further, the idea of preregistering hypotheses AND using simulated data to test your planned analyses before running them on the actual data (the final sketch below shows the idea).
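To make the multiple-testing and fishing points concrete, here is a minimal simulation sketch. Python is my choice for illustration here, not anything we used in class. It runs many "studies" in which every null hypothesis is true, and shows how often at least one test comes up "significant" anyway, with and without a Bonferroni correction. With 20 tests, the uncorrected rate should land near 1 - .95^20, about .64.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2015)

n_sims = 5_000   # simulated "studies"
n_tests = 20     # independent tests per study, all with a true null
n = 50           # observations per test
alpha = 0.05

any_hit = 0
any_hit_bonf = 0
for _ in range(n_sims):
    # Every test is a one-sample t-test on pure noise, so any
    # "significant" result is a false positive by construction.
    pvals = np.array([
        stats.ttest_1samp(rng.normal(size=n), 0.0).pvalue
        for _ in range(n_tests)
    ])
    any_hit += (pvals < alpha).any()
    any_hit_bonf += (pvals < alpha / n_tests).any()  # Bonferroni-corrected

print(f"P(at least one false positive), uncorrected: {any_hit / n_sims:.3f}")
print(f"P(at least one false positive), Bonferroni:  {any_hit_bonf / n_sims:.3f}")
```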
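And a companion sketch for the significance-versus-meaningfulness point. The true effect here (Cohen's d = 0.02) and the sample size are values I picked for illustration: with a big enough sample, even a trivially small difference produces a tiny p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# A tiny true difference between groups, with an enormous sample.
n = 200_000
group_a = rng.normal(loc=0.00, scale=1.0, size=n)
group_b = rng.normal(loc=0.02, scale=1.0, size=n)

t, p = stats.ttest_ind(group_a, group_b)

# Cohen's d: mean difference in pooled-standard-deviation units.
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
d = (group_b.mean() - group_a.mean()) / pooled_sd

print(f"p = {p:.2g}  (tiny, hence 'statistically significant')")
print(f"Cohen's d = {d:.3f}  (practically negligible)")
```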
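For the comparing-two-analyses problem, the fix I argued for is to test the difference directly, for example with an interaction term. Here is a sketch on simulated data; the variable names, effect sizes, and use of statsmodels are all hypothetical, not from the original paper.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)

# Hypothetical data: an attitude predicting a behavior score,
# with the slope allowed to differ by sex (0/1 coding is arbitrary).
n = 400
sex = rng.integers(0, 2, size=n)
attitude = rng.normal(size=n)
behavior = 0.25 * attitude + 0.15 * sex * attitude + rng.normal(size=n)
df = pd.DataFrame({"behavior": behavior, "attitude": attitude, "sex": sex})

# 'attitude * sex' expands to attitude + sex + attitude:sex.
# The attitude:sex coefficient is the direct test of whether the
# slope differs between groups -- the comparison that running
# separate models for men and women never actually makes.
model = smf.ols("behavior ~ attitude * sex", data=df).fit()
print(model.summary().tables[1])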
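Finally, a sketch of the simulate-before-you-analyze idea: generate fake data with the structure of the planned study, run the exact preregistered model on it, and only then point the same script at the real data. The variable names echo the substance-use example above but are otherwise made up.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Step 1: simulate data with the planned study's structure, building
# in the hypothesized effect (a small positive conflict effect).
n = 300
sim = pd.DataFrame({
    "conflict": rng.normal(size=n),
    "closeness": rng.normal(size=n),
})
sim["substance_use"] = 0.2 * sim["conflict"] + rng.normal(size=n)

# Step 2: run the exact preregistered analysis on the fake data and
# check that the code runs and recovers the built-in effect.
planned_model = smf.ols("substance_use ~ conflict + closeness", data=sim).fit()
print(planned_model.params)

# Step 3: only then run the same script on the real data, with no new
# analytic decisions after seeing real results.
# real = pd.read_csv("real_data.csv")  # hypothetical file name
```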
Students also generated these additional issues to consider:
Treating p < .05 as a magical/meaningful cutoff.
How and when to report marginal significance/trend-level findings.