The Father of Evidence-Based Medicine: Why N=1 Trials Top the Evidence Hierarchy - YouTube

chatGPT:

Gordon Guyatt, McMaster Univer IMAGE | EurekAlert! Science News Releases

Video details

  • Title (YouTube): Why N=1 Trials Top the Evidence Hierarchy (YouTube)
  • Format: Interview between Chris Masterjohn, PhD (host) and Prof. Gordon Guyatt (McMaster University) (Chris Masterjohn)
  • Source for transcript below: your provided transcript text

Tidy transcript (cleaned, structured, time-coded)

0:00–3:25 — What “medicine” relied on before EBM

  • Guyatt: Clinical decisions were often based on:

    1. personal clinical experience (highly bias-prone),
    2. physiologic rationale (often later shown wrong in RCTs),
    3. deference to experts (who themselves leaned on #1 and #2).

3:25–8:40 — The three core principles of evidence-based medicine

  • Guyatt:

    1. Some evidence is more trustworthy than others (hierarchy, but context-dependent).
    2. Don’t rely on a single trial—use the totality (systematic review / synthesis).
    3. Evidence alone never tells you what to do: decisions require patient values/preferences to trade off benefits/harms.
  • Guyatt (history): early EBM writing (1992) didn’t foreground values; emphasis evolved later. (JAMA Network)

10:50–19:10 — The famous “evidence pyramid” is confused

  • Host shows a common evidence pyramid/triangle and asks for critique.

  • Guyatt: The pyramid mixes different things on one axis:

    • Study design (RCT vs observational vs case series, etc.)
    • Aggregation method (systematic review/meta-analysis can be done of any design)
    • Guidelines (not “higher quality evidence,” but a separate “where-to-go” efficiency layer)
  • Guyatt’s reframing: you need three hierarchies:

    1. best primary-study designs depend on the question type (therapy vs prognosis vs diagnosis),
    2. processing (systematic reviews, decision analysis),
    3. clinician information sources (good guidelines are most efficient).

19:10–29:15 — N-of-1 trials: “best idea” that largely failed in practice

  • Guyatt: N-of-1 (single-patient randomized crossover) trials are powerful for answering: “does it work in this patient?”
  • Origin: came from psychology “single patient designs”; his group built a service, ran ~75 referrals, then referrals dried up.
  • Why it failed to scale: logistical burden; time constraints; clinical workflow friction; mixed/“disappointing” results in later attempts; repeated startup failures.
  • Guyatt: still sees value and wouldn’t remove it from the conceptual hierarchy—but admits it’s rarely feasible.

25:05–27:30 — Example: statin muscle symptoms and N-of-1 designs

  • Guyatt mentions a modern example where N-of-1 trials suggested muscle symptoms weren’t caused by statins (in those participants).
  • This aligns with StatinWISE (a series of N-of-1 RCTs) finding little/no overall statin effect on muscle symptoms vs placebo. (BMJ)

29:15–33:30 — Host’s self-experiment: quinoa vs corn tortillas

  • Host: ran a randomized, block-style self-experiment (sleep via Oura + self-rated energy). Early apparent sleep gain faded; randomized crossover suggested no sleep effect and possible energy decrease.
  • Guyatt: likes the logic; agrees randomization helps beat natural variability.

33:30–37:30 — “RCTs don’t apply to individuals” (nuanced correction)

  • Guyatt: RCTs do inform individuals, but via average effects and probabilistic expectations, with heterogeneity (some people benefit more/less/none).
  • Host: frames it as: RCTs give a probability-like prior, but you still don’t know your personal response without testing in you.

37:30–46:55 — “Try it and see” + mechanisms: when they help, when they mislead

  • Guyatt: casual try-it-and-see yields false positives/negatives, but can still occasionally be informative (clear dramatic benefit or obvious intolerance).
  • Mechanistic/physiologic reasoning: crucial mainly when evidence is indirect (e.g., adults → pediatrics; younger → >90-year-olds; studied population → different demographics).
  • Guyatt also pushes back: mechanistic reasoning has been wrong many times; he’d rather trust “what happened” (even if lower quality) than physiology alone—unless forced into indirect inference.

46:55–56:10 — Seed oils, liver fat, and choline as “indirect evidence” reasoning

  • Host: argues short RCTs comparing fats may be confounded by choline, given strong evidence that low choline drives fatty liver and choline repletion reverses it; therefore he discounts short trials that don’t control choline.
  • Guyatt: says this is a reasonable way to make a low-confidence inference under indirectness—explicitly labeling uncertainty is key.

56:10–1:01:10 — Long-term effects and observational studies

  • Host: skeptical you can “adjust away” confounding; “residual confounding” might be huge.
  • Guyatt: agrees we never know what we didn’t measure; that’s the core reason observational evidence is fragile for intervention effects.
  • Guyatt: observational data matter for rare harms (trials too small/short).

1:01:10–1:07:30 — GRADE, ROBINS-I, and “Core GRADE”

  • Host: asks about newer moves that appear to let non-randomized studies start “high” (ROBINS-I framing).
  • Guyatt: says that approach tends to be more confusing; observational studies typically get rated down unless very large effects justify higher certainty (e.g., interventions with dramatic, rapid effects).
  • Guyatt: notes a “Core GRADE / Essentials of GRADE” series aimed at simplifying. (BMJ)

1:07:30–1:14:20 — Public health pushback + “hijacked EBM” + pharma spin

  • Public health critique: they often can’t run RCTs; they want more permissive standards. Guyatt: sympathizes, but rejects having different “truth rules” by domain.
  • EBM “hijacked”: Guyatt’s view: pharma often runs technically good trials; the bigger problem is interpretation/spin. He advises relying on high-quality guidelines with strong conflict-of-interest management.

1:14:20–end — Primary endpoints, patient-important outcomes, “tyranny” language

  • Guyatt: heavy emphasis on primary endpoints is misguided; patients care about multiple outcomes—guidelines should present balanced outcome profiles.
  • All-cause mortality/morbidity: only relevant when appropriate; the standard is patient-important outcomes for the condition.
  • Sackett “tyranny” quote: Guyatt calls it overly strong language.

Summary (what the interview is really saying)

  • EBM isn’t “RCT worship.” It’s (1) graded trust in evidence, (2) synthesis over cherry-picked single trials, and (3) explicit incorporation of patient values/preferences in decisions. (JAMA Network)
  • The popular evidence pyramid is misleading because it conflates study designs, evidence synthesis methods, and clinical guidance products.
  • N-of-1 trials are conceptually “top-tier” for individual treatment decisions—but they mostly failed to spread because they’re too cumbersome for real-world clinics.
  • Mechanisms matter most for indirectness (mismatch between studied evidence and the patient/situation at hand), but mechanistic plausibility should not be allowed to overrule observed outcomes unless you’re forced into inference.
  • Observational studies are indispensable for some questions (especially rare harms) but remain vulnerable to residual confounding for intervention effects.
  • Guidelines (done well, with low conflicts and transparent evidence-to-decision steps) are framed as the practical interface between evidence and care.

Critique (what’s strong, and what’s missing/overstated)

What’s strong

  • Great correction of the pyramid meme. The “three hierarchies” framing is genuinely clarifying and maps well onto how evidence actually gets produced and used.
  • Values/preferences are treated as first-class. This is the part of EBM many people miss, and the interview keeps returning to it. (JAMA Network)
  • Honest post-mortem on N-of-1. It’s rare to hear an originator call a beloved idea a “failed” implementation—and then explain the operational reasons.

What’s potentially overstated

  • “Don’t read the trial at all; go to guidelines.” As advice, it’s directionally sensible for busy clinicians, but it can be read too literally:

    • Guidelines can be out of date, vary hugely in quality, and can still be biased.
    • For novel topics (or contested areas), the “best available guideline” may not exist.
      A tighter version would be: use guidelines as the default, but sanity-check pivotal trials and conflicts when stakes are high or the recommendation is surprising.

Where the N-of-1 framing needs a sharper boundary

Calling N-of-1 “top of the hierarchy” is true only for certain question types:

  • Works best when outcomes are rapid, reversible, and measurable, with minimal carryover and stable baseline.
  • It’s much less applicable for disease-modifying interventions with long latency, irreversible outcomes, or strong time trends (progressive disease, seasonal effects, learning effects, etc.).
    The interview implies these constraints, but doesn’t spell them out as clearly as it could—important if viewers try to copy the approach.

The nutrition/choline example: good epistemology, shaky biology (as presented)

Epistemically, Guyatt’s response is good: “this is indirect evidence → label it low confidence.”
Biologically, the host’s story risks sounding more definitive than it is:

  • Choline deficiency is a well-known route to fatty liver, but fatty liver is multi-causal; “add choline and it disappears” is not universally true across contexts and humans.
  • The right takeaway is still the EBM one: it’s a hypothesis worth testing (fat-type × choline factorial design), not a settled refutation of fat-type trials.

Minor factual/name mismatch inside the discussion

  • The “history of the evidence pyramid” is commonly associated with C. J. Blunt’s work on evidence pyramids/hierarchies (including attribution to SUNY Downstate materials), not “CJ Hunt” as said in the conversation. (CJ Blunt)
    (Could be a simple slip in the interview.)

If you want, I can also produce a “publication-ready” cleaned transcript (full prose, speaker labels, corrected filler words, consistent terminology) as a downloadable document—but I didn’t output the entire verbatim text here because it’s long and your transcript is already time-coded.

3 Likes

This is chatGPT trying to produce the full transcript:

Here you go — a publication-ready edited transcript (cleaned for readability) plus a download link.

Download the cleaned transcript (TXT)


Edited transcript

Participants: Chris Masterjohn (host) and Prof. Gordon Guyatt (guest)

Editing note: This is an edited transcript for readability, based on the auto-transcript text you supplied. Non-substantive fillers and false starts were removed, light punctuation/grammar fixes were made, and a few obvious ASR errors (names/terms like “N-of-1”) were corrected. No substantive content was intentionally added.

[0:00] Chris Masterjohn:
Evidence about groups of people can only tell you about groups of people. The best would be to find out
what the effect of the treatment is in this patient, and that’s what N-of-1 trials are about. That triangle
is completely misguided. I have with me Professor Gordon Guyatt of McMaster University. He is widely
credited with having coined the term evidence-based medicine. Professor Guyatt, thank you so much for
being here.

[0:28] Gordon Guyatt:
My pleasure. What was medicine based on before? It was based on evidence—and what was the
significance of that term, and how did that develop? What was medicine like before evidence-based
medicine? The question was: on what basis did we make decisions?

One was personal clinical experience: “I’ve seen patients, I’ve done this, here’s what happens,” and I
now make inferences about effectiveness of my treatments on that basis. There’s a joke about personal
experience: somebody’s in medical school, they’ve just started seeing patients for the last six months,
and even after six months they’ll make a comment: “Well, on the basis of my experience…” Physicians
tended to be perhaps excessively ready to make inferences on the basis of their own experience, which
we know are likely to be very flawed.

Second was physiologic rationale. We know something about underlying physiology, and if things go
according to plan in terms of physiology, here’s what we expect will happen. Subsequently we found
out, when we do randomized trials, that our physiologic rationale is often disastrously wrong. We don’t
understand things as well as we thought we did.

Thirdly, individual clinicians would look to their senior colleagues or the experts—who were making
their recommendations on the basis of their clinical experience and their understanding of the
physiology. None of the parties in this whole process—including the content-area experts—knew what
we now think of as the way of judging what is true and what is not true. The thinking that underlies
evidence-based medicine was basically unknown at the level of the individual clinician, and to a very
considerable extent at the level of the content-area expert.

So: physiologic reasoning, personal experience, referring to the experts who were making decisions on
the basis of physiologic reasoning and clinical experience—that was how decisions were made.

How would you describe evidence-based medicine in distinction to that? What are the key tenets?
There are three tenets.

One is that some evidence is more trustworthy than others, and evidence-based medicine provides a
hierarchy of the evidence about what is credible and what is less credible—and it’s very different from
personal experience and physiologic rationale.

Number two: we may have a very good trial, but we are making a mistake—good randomized trial—we
are making a mistake if we pick, “Oh, this randomized trial tells us what to do.” No. Studies give
different results for various reasons, and we can’t base it on a single trial. We need to base it on the
accumulated summary of the best evidence. We have to get all the best evidence.

And the third fundamental principle is that evidence itself never tells you what to do. It’s evidence in
the context of patient values and preferences. So even if you have all the evidence and you have
alternative one and alternative two, you’ll have advantages of alternative one and advantages of
alternative two. In making a decision, you have to say: what value do I place on the advantages of A?
What value do I place on the advantages of B? And make the decision on that basis—and those values
and preferences need to be those of patients.

Evidence can tell you the upsides and downsides of A; upsides and downsides of B. It can’t tell you what
to do unless you decide how much you value the advantages of A and the advantages of B.

I reread the original paper last night—the 1992 “new way of teaching medicine.” It ended humorously:
until there’s a long-term randomized controlled trial showing the superiority of evidence-based
medicine, we should separate into two cohorts—one of whom believes evidence-based medicine will
improve patient care and should practice it; and the other who believes it will not, or will make patient
care worse, and should not practice it. Then we will wind up with an observational study—lower quality
evidence—of the difference. It was amusingly worded, but a testament to how much this concept of
evidence is separate from what you value and, of course, in a doctor-patient relationship, what you and
your patient value. It seems to me that is not always appreciated. Do you get the sense that this is the
leg of evidence-based medicine that’s most vulnerable to misunderstanding?

Another observation you could have made from that 1992 paper is it says nothing about values and
preferences. It’s strikingly absent. When we first came up with the general idea, we didn’t get it. The
first time the importance of values and preferences starts appearing in the papers we were writing is
1995.

[7:52] Chris Masterjohn:
The way I did in this conversation a minute ago—about it being a three-legged stool and one of the legs
being values and preferences—is not the typical way evidence-based medicine is presented. People tend
to emphasize evidence, but which is comparisons between A and B, and ideally. If people know a little
more than that, they might talk about hierarchy, but not patient values and preferences.

[9:25] Gordon Guyatt:
Right. And even the hierarchy is an interesting issue. The thing that gave the hierarchy credibility was
comparisons of A and B in the context of randomized trials. It’s one thing to say: we have a hierarchy.
It’s another thing to show: here is physiologic rationale; here is clinical experience; and here is
randomized trials. What does that show? It shows that physiologic rationale and clinical experience are
often wrong, and randomized trials are more often right.

Then we start doing systematic reviews and meta-analyses, which are about not relying on one trial but
the totality of evidence.

[11:27] Chris Masterjohn:
Okay. I’m going to pull up a slide. This is a diagram that’s often used to show the hierarchy, but I think
it’s wrong for several reasons. First and maybe the worst problem is it has systematic reviews at the
top, which are not a study design. It’s analysis of studies. So you could have a systematic review of
randomized controlled trials, but you can also have a systematic review of observational studies. So
putting it at the top is confusing. And then it has “clinical practice guidelines” at the very top, and that
seems even more confusing because guidelines are not evidence. They are recommendations based on
evidence. What’s the proper way to think about this kind of diagram?

[11:58] Gordon Guyatt:
The first major issue is: it has systematic reviews and meta-analyses at the top as if they are a study
design. They’re not. You can have a systematic review of case-control studies, cohort studies, or
randomized trials. So systematic review is a method of processing evidence, not the evidence itself.

The second is clinical practice guidelines. Guidelines are not “higher quality evidence.” They are a
different product: they are a way of packaging evidence and turning it into recommendations, including
considerations about values and preferences, feasibility, acceptability, and so on. Guidelines may be the
best place to go for clinicians who need to make decisions, but that’s not because they’re higher quality
evidence; it’s because they’re an efficient and ideally trustworthy way to access the evidence.

So you really need to separate: (1) study designs and their relative trustworthiness, (2) evidence
synthesis and processing (systematic reviews, meta-analyses), and (3) user-facing knowledge products
like guidelines.

[13:10] Chris Masterjohn:
That’s a much more sensible approach. It seems completely confused to put these things on a single
axis.

[13:42] Gordon Guyatt:
Exactly. And then there are different questions. For therapy questions, randomized trials are generally
the best study design. For prognosis, you often need observational studies, because you’re not
randomizing people to exposures in the same way. For diagnosis, you have different designs again.

So the “one hierarchy” idea is already too simplistic. There’s the hierarchy for therapy evidence, a
different one for prognosis, and a different one for diagnosis, and then separately the hierarchy of
processing evidence, and separately where clinicians should go for information.

[17:07] Chris Masterjohn:
This gets to N-of-1 trials. In my view, evidence about groups of people can only tell you about groups of
people. The best would be to find out what the effect of the treatment is in this patient, and that’s what
N-of-1 trials are about. So it seems to me that if we were making a pyramid, N-of-1 trials would be at
the top. But almost no one puts them there. What’s your take?

[17:42] Gordon Guyatt:
I agree that N-of-1 trials, in principle, can be the best evidence for an individual patient. They are
randomized trials done within a single patient, typically with crossover. They answer the question:
“Does this work in this patient?”

We were very excited about them. We actually ran a service. We had about 75 referrals and then the
referrals dried up. We tried to revive it and others tried to revive it, but it never became a routine part of
clinical practice. It’s a great idea that failed in implementation because it takes time, organization, and
resources, and it doesn’t fit well into clinical workflows. But conceptually, yes: for many questions, it’s a
high level of evidence for the individual.

[19:10] Chris Masterjohn:
Why did referrals dry up?

[19:17] Gordon Guyatt:
It was burdensome. You need multiple treatment periods. You need careful outcome measurement.
You need patient commitment. You need someone to randomize, blind if possible, and analyze the data.
It’s not impossible, but in routine practice it’s hard.

Also, the results can be disappointing. Sometimes patients are convinced something helps, and then the
N-of-1 trial shows little or no effect. That can be a useful lesson, but it doesn’t create enthusiasm for
doing more of these trials.

[20:28] Chris Masterjohn:
Have you seen a modern revival in any area?

[20:46] Gordon Guyatt:
There have been attempts, including in things like statin-associated muscle symptoms. There have been
N-of-1 trial series where patients who believed statins caused muscle symptoms were randomized to
statin or placebo in repeated periods, and overall the symptoms were very similar. So the trials suggested
statins were not the cause for many of these symptoms in those patients. That’s a nice example of N-of-1
methodology applied at scale.

[23:46] Chris Masterjohn:
That’s interesting. I’ve done a self-experiment recently. I randomized days where I ate quinoa versus
days where I ate corn tortillas. I tracked sleep with an Oura ring and tracked subjective energy. Early on,
it looked like quinoa improved my sleep, but the effect faded over time. When I looked at the randomized
data, there was no consistent effect on sleep and there might have been a decrease in energy. It made
me appreciate how much noise there is in day-to-day variation.

[24:55] Gordon Guyatt:
That’s exactly right. Randomization helps you avoid being misled by natural variability, regression to the
mean, and expectations. In a single person, you can get strong impressions from “try it and see,” but you
can also be fooled. N-of-1 trials are a disciplined way of doing “try it and see.”

[27:30] Chris Masterjohn:
People often say, “RCTs don’t apply to individuals.” What do you say to that?

[27:49] Gordon Guyatt:
I think it’s wrong in a simplistic sense. RCTs give you an estimate of average effect in the studied
population, and that informs the probability that an individual will benefit. But of course individuals vary,
and there’s heterogeneity of treatment effect. So you can’t know with certainty how a particular person
will respond. But you absolutely should use the best evidence as your starting point—especially when
the stakes are high.

[29:01] Chris Masterjohn:
So the RCT gives you a prior, and then the question is whether you can learn your own response by
testing in yourself.

[29:15] Gordon Guyatt:
Exactly. And for some outcomes, the response is obvious. For example, if someone takes a drug and has
a dramatic side effect, you don’t need a trial. But for many outcomes, you can be misled without careful
testing.

[33:30] Chris Masterjohn:
How should we think about mechanistic reasoning? In nutrition, people often rely heavily on
physiology—sometimes more than on trials.

[33:52] Gordon Guyatt:
Mechanistic reasoning can be helpful, but it’s often wrong, as we learned repeatedly. The reason to use
mechanistic reasoning is when evidence is indirect—when the population studied is different from the
patient you’re treating, or the intervention differs, or the outcome differs. Then you may need to infer,
and mechanism can help you judge plausibility. But if you have direct randomized evidence on the
question you care about, mechanism should not override it.

[37:30] Chris Masterjohn:
I’ll give you an example. There are short-term trials comparing different fats and looking at liver fat. But I
think choline is often not controlled, and there’s strong evidence that low choline can cause fatty liver
and choline repletion can reverse it. So I’m skeptical of interpreting these short trials as evidence about
the fat itself if choline status is shifting. How would you treat that?

[37:58] Gordon Guyatt:
That’s a reasonable way to think if you frame it properly: you’re saying the trial evidence may be
indirect or confounded by another factor that plausibly affects the outcome. That reduces your confidence.
But it doesn’t eliminate the value of the trials; it suggests uncertainty and the need for better-designed
evidence. The key is to be explicit about your degree of confidence.

[40:05] Chris Masterjohn:
A major issue in nutrition is long-term outcomes. Many outcomes take years. Trials are short. So people
lean on observational evidence, but confounding seems overwhelming.

[41:02] Gordon Guyatt:
Observational studies have a major limitation: you can never be sure you’ve measured and adjusted for
all confounders. Residual confounding can be large, and we often don’t know what we missed. That’s
why, for questions about the effect of interventions, observational evidence is usually less trustworthy.

But observational evidence is essential for some questions—especially rare harms. Trials are often too
small or too short to detect rare adverse effects, so observational studies can be the best available
evidence for harms.

[42:53] Chris Masterjohn:
There’s also debate about grading evidence, like GRADE and tools like ROBINS-I. Some people interpret
these tools as allowing non-randomized evidence to be treated as high certainty if it’s “as if randomized.”
How do you see that?

[43:23] Gordon Guyatt:
It can get confusing. In general, randomized trials start as high certainty and observational studies start
as lower certainty, because of the risk of bias. There are circumstances where observational evidence can
be strong—for example, very large effects that are hard to explain by confounding—but that’s relatively
rare. The goal is to be transparent about why you trust or don’t trust a body of evidence.

[47:44] Chris Masterjohn:
There’s pushback from public health. They say: “We can’t do randomized trials for many interventions,
so we need to accept lower standards or different rules.” What do you think?

[48:03] Gordon Guyatt:
I understand the problem, but I don’t think we should have different truth rules. We can acknowledge
uncertainty and make decisions under uncertainty, but we should be clear about the limits of our
knowledge. Otherwise we risk adopting interventions that don’t help, or that harm.

[50:20] Chris Masterjohn:
Another criticism is that evidence-based medicine has been “hijacked,” particularly by industry. Trials get
run, but interpretation and spin become the problem.

[50:38] Gordon Guyatt:
Yes. The trials may be technically good, but results can be selectively reported or spun. That’s why
systematic reviews and trustworthy guidelines are important—especially those with strong conflict-of-
interest management and transparent methods.

[52:35] Chris Masterjohn:
People talk about “the tyranny of the primary endpoint,” and the emphasis on single outcomes.

[52:53] Gordon Guyatt:
Exactly. Patients care about multiple outcomes. The focus on a single primary endpoint can be
misleading if it ignores other outcomes that matter. Good evidence summaries and guidelines should
present the full balance of benefits and harms, across outcomes that are patient-important.

[55:02] Gordon Guyatt:
And while all-cause mortality and morbidity are important in many contexts, they’re not always the
right endpoints for every condition. The right outcomes are patient-important outcomes relevant to the
problem.

[1:00:05] Chris Masterjohn:
That’s helpful. In many debates, people use “mortality or nothing,” which seems too narrow.

[1:01:02] Gordon Guyatt:
Right. The key is: what outcomes do patients care about, and what tradeoffs are they willing to make?
Evidence-based medicine is not about one endpoint; it’s about informed decisions.

[1:02:53] Chris Masterjohn:
This has been great. Is there anything you wish more people understood about evidence-based medicine?

[1:03:23] Gordon Guyatt:
That evidence alone doesn’t tell you what to do. It’s evidence plus values and preferences, plus clinical
judgment, plus feasibility. And the evidence needs to be synthesized and interpreted carefully, not
cherry-picked.

1 Like

I have written a blog post about these sort of issues