Meta-analyses - 'Beware the Man of Many Studies'

Low power is the norm in almost every field: neuroscience, political science, environmental science, and medicine, including breast cancer, glaucoma, rheumatoid arthritis, Alzheimer’s, epilepsy, multiple sclerosis, and Parkinson’s research. When performing a meta-analysis, you are almost certainly working with underpowered research, and meta-analytic results will reflect this. Meta-analysis and corrections for publication bias can only go as far as the provided data allows, and if the quality is low enough, all that can be obtained is a biased and unrealistic result.
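To make that concrete, here is a minimal simulation sketch (my own illustration; every parameter is invented, not drawn from any real literature): many underpowered studies of a small true effect, filtered so that only significant results get “published”, yield an inflated pooled estimate that no bias correction is guaranteed to undo.

```python
# Minimal simulation sketch (all parameters invented): underpowered studies
# of a small true effect, with only significant results "published", yield
# an inflated inverse-variance pooled estimate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d = 0.2       # small true standardized mean difference
n = 20             # per-arm sample size: roughly 10% power for d = 0.2
pub_d, pub_se = [], []

for _ in range(2000):
    treat = rng.normal(true_d, 1, n)
    ctrl = rng.normal(0.0, 1, n)
    res = stats.ttest_ind(treat, ctrl)
    if res.pvalue < 0.05:                         # publication filter: significance only
        pub_d.append(treat.mean() - ctrl.mean())  # unit SDs, so this approximates d
        pub_se.append(np.sqrt(2 / n))             # approximate SE of d

w = 1 / np.square(pub_se)                         # fixed-effect (inverse-variance) weights
pooled = np.sum(w * np.array(pub_d)) / np.sum(w)
print(f"true effect: {true_d}; pooled estimate from published studies: {pooled:.2f}")
```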

As noted above, no amount of correction solves these problems. “Garbage in, garbage out” is a problem that meta-analysis cannot solve; getting around it requires new studies, not the tired reanalysis of garbage. One might think that checking study quality as a moderator of meta-analytic effects handles quality issues, but how well that works depends on how well study quality has been coded, and on how the coded dimensions of quality actually relate to the estimates entering the meta-analysis.
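For readers unfamiliar with what such a moderator check typically looks like, here is a hypothetical sketch (effect sizes, standard errors, and quality scores all invented for illustration): a meta-regression of effect size on a coded quality score, weighted by inverse variance.

```python
# Hypothetical sketch of a quality-moderator check; every number is invented.
# Meta-regression of effect size on a coded quality score, weighted by
# inverse variance.
import numpy as np
import statsmodels.api as sm

effect  = np.array([0.55, 0.48, 0.40, 0.22, 0.15, 0.10])  # per-study estimates
se      = np.array([0.20, 0.18, 0.15, 0.10, 0.09, 0.08])  # per-study standard errors
quality = np.array([1, 1, 2, 4, 5, 5])                    # coded 1 (low) to 5 (high)

X = sm.add_constant(quality)                # intercept + quality moderator
fit = sm.WLS(effect, X, weights=1 / se**2).fit()
print(fit.params)  # negative quality slope: lower-quality studies show larger effects
```

A slope like this only speaks to the quality dimension that was actually coded; any uncoded dimension stays invisible to the moderator analysis.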

Peer review is not magical. If you’ve ever participated in it or been the subject of it, you’re probably aware of how bad it can get. As many have recently learned from the preprint revolution, it also doesn’t seem to matter much for publication quality. The studies I mentioned in the previous section on fraud all passed peer review, and it’s almost certain that every bad study or meta-analysis you’ve ever read did too.

The cachet earned by peer review is undeserved. It does not protect against these problems, and it’s not clear it has any benefits whatsoever when it comes to keeping research credible. Because peer review affects individual studies heterogeneously, it can also scarcely make a dent in keeping meta-analyses credible. The meta-analyst has to trust that peer review benefitted every study in their analysis, but if, say, a reviewer preference for significant results affected the literature, that preference could have been the source of publication bias. A preference for any feature by any reviewer of any of the published or unpublished studies in a literature could be similarly harmful; significance is merely one feature for which there is a common preference.

When it comes to reviewing meta-analyses, peer reviewers could in principle read through every study cited, suggest how to code study quality, and flag which studies should be kept or removed. Ideally, they would; realistically, when there are many studies, that’s far too much to ask. And you usually won’t know whether review helped in any individual case, or for meta-analyses in general, because most peer reviews are not publicly reported. Peer review is a black box. If you don’t take experts’ words for granted, why would you trust it?

Peer review is simply not something that helps the man of many studies. At best, it protects him when the meta-analysis is done poorly enough that reviewers notice and intervene, say by telling the authors to change their estimator. If reviewers instead tell them to seek publication elsewhere, the authors can keep shopping around until they find reviewers credulous enough to let their garbage through.

Because of how little evidence there is that peer review matters, I doubt it helps the man of one or many studies often enough to be given any thought.


I agree with the view that most medical research and many other forms of behavioral research admit structural and/or functional challenges to validity. I also agree that some studies could have been improved within the existing study design, but I would note that these kinds of “unforced errors” are not as common as some might think. Additionally, peer review processes have flaws, replications are a challenge, and potentially useful findings go unpublished because the results were negative. On the extreme margin, outright fraud exists.

With all of this acknowledged, I disagree with any broad interpretation that scientists could have simply eliminated flaws had they chosen to do so. The behavioral research context, especially as it is adapted for medical research, is complex and permeated with value and ethical issues. Humans cannot be forced to behave in a way that might tighten a research design, and researchers must often settle for captured proxy data, hoping they can link it to more tightly controlled data they are producing. Money plays a significant role as well. Controlling for error costs money, often large sums of it: it is not unusual for a decision to control for one additional source of variance to double the cost of a project, and funding for multi-year projects with large human study populations can run into the tens of millions of dollars. Unpacking all these issues would take many thousands of words that have already been written elsewhere.

Techniques for meta-analysis (developed in a different field and now applied to medical research) introduce additional threats to validity, which can stand alone or compound the primary threats contained in the source studies. Do these primary and secondary threats render the generalizations of a meta-analysis less valid or less useful than any one of its source studies? Generally, they do not, but the scope of generalizability is frequently narrowed. Quite often a meta-analysis will validly reveal or strengthen a generalization that is not clear or compelling in any of the source studies. Can the technique be misapplied or abused? Absolutely, but these are outlier issues.
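As a toy illustration of that last point (numbers invented): five small studies, none individually significant, can pool into an estimate that is.

```python
# Toy illustration (numbers invented): five small studies, none individually
# significant at the 5% level, pool into an estimate that is.
import numpy as np

est = np.array([0.30, 0.25, 0.35, 0.20, 0.28])   # per-study effect estimates
se  = np.array([0.18, 0.17, 0.20, 0.16, 0.19])   # each study's 95% CI crosses zero

w = 1 / se**2                                    # inverse-variance weights
pooled = np.sum(w * est) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))
print(pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)  # pooled CI excludes zero
```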

Research is doing one’s damnedest to find out what is going on. It is rarely easy and almost never as easy as it looks from the outside.
