Whole Genome Sequencing

A press release today… seems like there must be a “catch”:

May 11, 2026 /PRNewswire/ – Human Longevity, Inc. today announced the launch of its new Clinical-Grade $599 Whole Genome Sequencing Report, designed to bring advanced genomic medicine and AI-driven disease prevention to the general population.

1 Like

I looked at their website. The price is good… that’s it. Depth: 30x… I’m looking for 100x. Doesn’t seem like you get a file - instead, you get an app and they provide analysis which is regularly updated. Not interested in this dynamic - I want the file I can download, in an open format that is not lossy, and I’ll do my own analysis and updates. PASS. YMMV.

1 Like

IMO: We’re getting down into the weeds here. Unless you have some compelling special reason or have money to burn to satisfy your curiosity, 30x is more than adequate. You will be paying an additional ~$600+ for information that is probably not very relevant. Your bucks, your choice.

“30x” refers to the coverage or depth, meaning each part of the genome is read an average of 30 times for accuracy.
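To make the "average" concrete, here is a minimal sketch of the usual back-of-the-envelope arithmetic: mean depth is total sequenced bases divided by genome size. The genome size and read length below are assumed round numbers for illustration, not figures from any vendor.

```python
import math

# Mean depth (coverage) = total sequenced bases / genome size.
# Both constants are illustrative assumptions, not vendor specifications.
GENOME_SIZE_BP = 3_100_000_000   # approximate human haploid genome length
READ_LENGTH_BP = 150             # a common short-read length (assumed)


def mean_depth(num_reads: int, read_length: int = READ_LENGTH_BP,
               genome_size: int = GENOME_SIZE_BP) -> float:
    """Average number of times each base is read."""
    return num_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int = READ_LENGTH_BP,
                 genome_size: int = GENOME_SIZE_BP) -> int:
    """Reads required to reach a target mean depth (rounded up)."""
    return math.ceil(target_depth * genome_size / read_length)


for depth in (3, 30, 100):
    millions = reads_needed(depth) / 1e6
    print(f"{depth:>3}x needs about {millions:,.0f} million reads")
```

The figure is a genome-wide mean only; it says nothing about how evenly those reads land.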

2 Likes

Emphasis on average, meaning that some regions are read much less than 30x (maybe 2-10x), which I didn’t know until now.

According to AI, here are the points for doing 100x sequencing:

  • By bumping to 100x, you ensure that even the “low-coverage” areas likely receive 20x or 30x hits, providing enough data to distinguish a real mutation from a machine error.
  • Rare Variant and Mosaicism Detection: If you are looking for mosaicism (mutations present in only a small percentage of cells) or circulating tumor DNA (ctDNA), 30x will miss them entirely. 100x allows you to detect variants that exist in only 5% to 10% of the sample with high statistical confidence.
  • Certain areas, like GC-rich regions (where Guanine and Cytosine are dense), are notoriously difficult for sequencers to read.
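A rough way to sanity-check the low-frequency point is a binomial model: at a site with a given depth and variant allele fraction (VAF), how likely is it that at least a few reads carry the variant? This is only a sketch. The three-read threshold is a hypothetical cutoff chosen for illustration, and the model ignores sequencing error and mapping problems.

```python
from math import comb


def prob_at_least(k_min: int, depth: int, vaf: float) -> float:
    """P(at least k_min variant-supporting reads) under a binomial model."""
    p_fewer = sum(comb(depth, k) * vaf**k * (1 - vaf)**(depth - k)
                  for k in range(k_min))
    return 1.0 - p_fewer


MIN_ALT_READS = 3  # hypothetical caller threshold, chosen for illustration

for depth in (30, 100):
    for vaf in (0.05, 0.10, 0.50):
        p = prob_at_least(MIN_ALT_READS, depth, vaf)
        print(f"depth {depth:>3}x, VAF {vaf:.0%}: "
              f"P(>= {MIN_ALT_READS} alt reads) = {p:.2f}")
```

Under this toy model, a low-VAF variant at 30x is not strictly invisible, but it is often supported by too few reads to separate from machine error, whereas a heterozygous germline variant (VAF near 50%) is comfortably covered at either depth.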

Still, the recommendation for 30x is a good option:

  • it’s more cost-effective
  • good enough if you are doing a general health screening, or looking for inherited traits
4 Likes

Why do you think there is a catch? What is so much better about this than the equivalently priced product from Nebula Genomics?

I was surprised because last year I think they were priced at over $10K, and they are the original company that brought WGS to the market, so I didn’t think they’d be working to equal the prices of the low end competitors. Most of their clientele sign up for the $25K a year program, I think. See this story: Longevity Clinics: Human Longevity Inc., Dr. David Karrow

1 Like

I was intrigued with the notion that 30x or 100x represents some kind of average depth and asked Claude if these analyses were in any way adaptive or otherwise intelligent.

Response
On the variance question: standard short-read WGS is emphatically not adaptive. The depth distribution emerges entirely from the stochastic shotgun process plus systematic biases, with no feedback loop that says “this base is now called with sufficient confidence, stop sampling here.” If everything were unbiased, you’d see roughly Poisson-distributed coverage with variance equal to the mean. In practice, real WGS is meaningfully over-dispersed relative to Poisson because of several structural problems:

  • GC content bias, especially in PCR-amplified libraries, where extreme-GC regions are under-amplified. Even PCR-free preps retain some GC effect from the sequencing chemistry itself.
  • Mappability: repetitive sequence, segmental duplications, and low-complexity regions accumulate reads that map ambiguously; MAPQ filters during variant calling then drop these, so the effective depth there is much lower than the raw read pile-up suggests.
  • Reference bias: reads carrying alleles different from the reference align less well, which under-represents certain variants — a larger problem for indels and SVs than for SNVs.
  • Library prep artifacts: tagmentation has sequence bias at insertion sites, homopolymers sequence poorly, certain motifs cause synthesis stalls.
  • Structurally inaccessible regions: centromeres, telomeres, rDNA arrays, large segmental duplications — effectively unmappable with short reads regardless of how deep you go.

The practical consequence is that at 30x mean depth, roughly 5–10% of callable autosomal positions sit below 10x, and a smaller fraction below 5x — with a heavy left tail in problem regions and a heavy right tail in copy-number-variable or biased sequence. So when people argue 30x vs 100x, they’re really arguing about two things: shrinking that left tail enough to trust calls in difficult regions, and gaining statistical power to discriminate low-frequency events (mosaicism, somatic variation, low-VAF artifacts) from background. For germline calling of well-mapped regions 30x is fine; for confidence in hard regions, SV detection, or anything sub-clonal, 100x is meaningfully better.
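The gap between the idealized Poisson picture and over-dispersed real coverage can be sketched numerically. The snippet below compares the fraction of positions below 10x under a pure Poisson model and under a negative binomial, a common stand-in for over-dispersed counts. The dispersion value is a hypothetical number picked for illustration, not one fitted to real sequencing data, so the outputs should be read as qualitative only.

```python
from math import exp, lgamma, log


def poisson_pmf(k: int, mean: float) -> float:
    """Poisson probability of exactly k reads at a position."""
    return exp(k * log(mean) - mean - lgamma(k + 1))


def negbin_pmf(k: int, mean: float, size: float) -> float:
    """Negative binomial with variance = mean + mean**2 / size."""
    p = size / (size + mean)
    log_pmf = (lgamma(k + size) - lgamma(size) - lgamma(k + 1)
               + size * log(p) + k * log(1 - p))
    return exp(log_pmf)


def frac_below(threshold: int, pmf, **params) -> float:
    """Fraction of positions with depth strictly below the threshold."""
    return sum(pmf(k, **params) for k in range(threshold))


DISPERSION = 5.0  # hypothetical over-dispersion parameter (assumption)

for mean in (30, 100):
    pois = frac_below(10, poisson_pmf, mean=mean)
    nb = frac_below(10, negbin_pmf, mean=mean, size=DISPERSION)
    print(f"mean {mean:>3}x: below 10x  Poisson {pois:.2e}  over-dispersed {nb:.2e}")
```

With pure Poisson sampling almost nothing falls below 10x at a 30x mean; the left tail only becomes noticeable once over-dispersion is added, and it shrinks sharply when the mean is raised to 100x.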

Adaptive sequencing does exist, just not in the Illumina world you’re likely working in. Oxford Nanopore’s ReadUntil / adaptive sampling can reject reads in real-time based on early signal classification, enriching specific regions during a run. That’s region targeting, though, not a per-base confidence loop. No current commercial short-read platform watches per-position confidence and decides when to stop.

[Speculation, flagged as such: I’d guess this hasn’t been engineered because the economics fight against it. Short-read sequencing is dominated by fixed costs — library prep, flowcell, run time — not marginal per-base costs, and the bias structure is well-characterized enough that “just run another flowcell” is cheaper than retrofitting real-time adaptive feedback into a throughput-optimized platform. The genuine gains live in long-read + adaptive sampling, which is where ONT (and to a lesser extent PacBio) is pushing.]

2 Likes

My understanding is that CRAM is lossless by design but can be made lossy by configuration. Loss, if any, depends on what you tell the encoder to keep; i.e., which axes of “loss” matter and how much. The compression strategy is reference-based: instead of storing each base, CRAM stores the differences from a reference genome, plus enough metadata to reconstruct the read. To be lossless, you must have the reference file at decode time. (I have heard that you can embed the reference data in the file but this can present size problems.)

If you choose loss, it can happen in three places:

Quality score compression. Q-scores can be kept verbatim or reduced by several levels. Since Q-scores are the single largest contributor to BAM/CRAM size, they are typically shrunk.

Read names. Can be preserved, replaced with auto-generated tokens (preserving pairing but losing the original instrument-level name), or dropped entirely. I’m not quite sure how this works, but it is claimed not to matter much for downstream analysis, though it might for forensics.

Auxiliary tags. A smaller contributor; you can select which BAM tags to retain. Common practice keeps the alignment-relevant tags (MD, NM, RG) and drops vendor-specific or large optional tags.
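To illustrate what "reducing Q-scores by several levels" means in general, here is a small sketch of quality-score binning: many distinct Phred values are collapsed onto a few representative ones, which compresses better but cannot be undone. The bin edges are made up for illustration and are not claimed to match what any particular CRAM encoder or sequencer does.

```python
def decode_phred33(qual_string: str) -> list[int]:
    """Convert a FASTQ/SAM quality string (Phred+33) to integer scores."""
    return [ord(ch) - 33 for ch in qual_string]


# (inclusive upper bound, representative value); hypothetical bin edges
BINS = [(9, 6), (19, 15), (24, 22), (29, 27), (34, 33), (39, 37)]
TOP_VALUE = 40  # everything above the last bin maps here


def bin_quality(q: int) -> int:
    """Map one Phred score onto its bin's representative value (lossy)."""
    for upper, representative in BINS:
        if q <= upper:
            return representative
    return TOP_VALUE


original = decode_phred33("IIIHGF@;5+#")
binned = [bin_quality(q) for q in original]

print("original:", original)
print("binned:  ", binned)
print("distinct values:", len(set(original)), "->", len(set(binned)))
```

Fewer distinct symbols means less entropy for the compressor to store; the cost is that the original per-base scores are gone for good.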

After listening to the Attia episode on genetic testing, I’ll have to wait to get my own. I don’t have a question where the answer couldn’t be obtained via biomarker testing, or a condition that genetics can only explain. And I definitely don’t need more data, just more insight.

He says:

Test with intention. Know what you’re looking for. Know what you’ll do when you find it, and know what you’ll do if you don’t.

2 Likes

I found this podcast interesting because he only really noted polygenic risk scores in passing. Compare this to Eric Topol’s Super Agers book, where he extols the possibilities of polygenic risk scores. Attia dedicated almost all the show to a discussion of single/double genetic variants. As I seek to understand my own data, I do see the limitations of single/double genetic variants, though I note they seem helpful for pharmaceutical interactions. My guess is that only time will tell the true benefits of WGS and polygenic risk scores.

2 Likes

I’ve met Emily. This looks interesting…

2 Likes

Source: https://x.com/VamsiMootha/status/2059688512765452488?s=20

I wonder if it might be helpful to track the level of mtDNA mutations that a person has?

From Gemini:

Next-Generation Sequencing (NGS) architectures, including Whole Genome Sequencing (WGS) and targeted mitochondrial sequencing, can identify both the specific sequence variants in mitochondrial DNA (mtDNA) and quantify their heteroplasmy levels—the precise ratio of mutated mtDNA molecules to wild-type (normal) mtDNA molecules within a given sample.

Because cells contain hundreds to thousands of copies of the circular mitochondrial genome, quantifying this ratio is critical; the clinical or physiological impact of an mtDNA mutation is directly dependent on the percentage of shifted genomes.

Sequencing Methodologies and Detection Thresholds

The capacity to accurately quantify mtDNA mutation levels depends heavily on the specific sequencing approach and the depth of coverage achieved.

1. Standard Whole Genome Sequencing (WGS)

  • Mechanism: Standard WGS targets the entire cellular DNA extraction. Because mtDNA is highly abundant relative to nuclear DNA, a standard 30x nuclear genome sequencing run naturally yields an “off-target” mitochondrial coverage depth ranging from 100x to over 1,000x.
  • Sensitivity: This depth allows for the reliable detection and quantification of heteroplasmy levels down to approximately 1% to 5%. Any mutation existing below this frequency threshold generally falls into the baseline sequencing noise of standard WGS pipelines.

2. Targeted deep mtDNA Sequencing

  • Mechanism: This approach isolates or selectively amplifies the 16,569 base-pair mitochondrial genome using long-range Polymerase Chain Reaction (LR-PCR) prior to sequencing.
  • Sensitivity: By concentrating sequencing power exclusively on the mitochondrial genome, coverage depth frequently exceeds 10,000x to 100,000x. This extreme depth allows bioinformatic pipelines to confidently identify ultra-low frequency somatic mutations (micro-heteroplasmy) down to 0.1% or lower.
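The link between depth and the smallest heteroplasmy level you can trust can be sketched with simple counting statistics. Heteroplasmy at a site is estimated as variant reads over total reads, and a Wilson score interval gives a rough idea of the sampling uncertainty at a given depth. This is an illustrative sketch only: the read counts are invented, and the interval covers sampling noise alone, not sequencing error or misaligned reads.

```python
from math import sqrt


def heteroplasmy(alt_reads: int, total_reads: int) -> float:
    """Point estimate: fraction of reads carrying the variant."""
    return alt_reads / total_reads


def wilson_interval(alt_reads: int, total_reads: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = alt_reads / total_reads
    denom = 1 + z**2 / total_reads
    centre = (p + z**2 / (2 * total_reads)) / denom
    half = z * sqrt(p * (1 - p) / total_reads
                    + z**2 / (4 * total_reads**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# The same 1% heteroplasmy observed at three invented depths
for alt, total in ((5, 500), (50, 5_000), (500, 50_000)):
    low, high = wilson_interval(alt, total)
    print(f"depth {total:>6}x: estimate {heteroplasmy(alt, total):.2%}, "
          f"interval {low:.2%} to {high:.2%}")
```

The interval tightens as depth grows, which is the statistical reason deeper mitochondrial coverage supports lower detection thresholds.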

Technical Challenges and Confounding Factors

While technically feasible, accurate quantification of mtDNA mutations via sequencing must overcome two primary biological and methodological hurdles:

Nuclear Mitochondrial Segments (NUMTs)

Over evolutionary timescales, fragments of mtDNA have migrated and integrated into the nuclear genome, becoming pseudogenes known as NUMTs (Nuclear Mitochondrial DNA segments). Standard sequencing read-alignment tools can mistake these ancient, mutated nuclear fragments for true mitochondrial variants. Advanced bioinformatic filtering is mandatory to separate true mitochondrial reads from background NUMT sequences to prevent false-positive heteroplasmy readings.

Tissue Specificity and Mosaicism

The level of mtDNA mutations is not uniform throughout the human body. Somatic mtDNA mutations accumulate unevenly across different organs.

  • Blood (Liquid Biopsy): Easiest to sample, but rapidly dividing hematopoietic cells actively select against highly deleterious mtDNA mutations over time.
  • Post-Mitotic Tissues: High-energy, non-dividing tissues such as skeletal muscle, cardiac muscle, and cerebral cortex typically accumulate significantly higher levels of somatic mtDNA mutations with age.
  • Implication: A standard blood-derived WGS report may show 0% heteroplasmy for a specific mutation that sits at 40% heteroplasmy in the individual’s muscle tissue.

Relevance to Longevity and Geroscience

In the context of healthspan extension, mapping the accumulation of somatic mtDNA mutations provides a direct readout of mitochondrial decay.

Unlike nuclear DNA, mtDNA lacks protective histones and features less redundant repair mechanisms, leaving it highly susceptible to oxidative damage. While inherited mitochondrial diseases typically require a “biochemical threshold” of 60% to 90% heteroplasmy to manifest as clinical pathology, low-level age-related micro-heteroplasmy (sub-5% shifts spread across multiple loci) degrades electron transport chain efficiency, drives cellular senescence, and accelerates the energetic decline characteristic of biological aging.

While I agree with the general consensus, I do have a slightly different perspective. If an anti-aging intervention reduces Alzheimer’s risk at the expense of increasing cardiovascular mortality, the vast majority of people would rule it out immediately. However, for individuals who have undergone whole-genome sequencing and know they are predisposed to Alzheimer’s, this intervention becomes a compelling, albeit difficult, consideration.

1 Like

The problem is a question of which tissues to track it in. White Blood Cells are easy to access, but if you want to track it in any other type of tissue you need to start with a sample.

My personal view is to look at organ biomarkers and use those as a proxy for mtDNA mutations as they control protein production (and therefore function) via acetylation.

Hence for example you can assume if your kidney function is good that enough of the mtDNA in the kidney are in a good state.

Although the body shares mitochondria and therefore mtDNA this is a bit of a stochastic system of distributing the development clock.

That’s a good point. Also if people have very limited ability for prevention (limited time or resources) then knowing what to prioritize can be useful. This is in contrast to most of us that are super interested in longevity and try preventing everything.

I think all sorts of things have trade offs and hormones come to mind most obviously.

Estrogen is neuroprotective (and bone protective) but potentially a cancer risk factor. Say you have a strong family history of breast cancer and AD - my sister.

Testosterone if you have a very low risk of CVD or higher risk of osteo or sarcopenia.

Tadalafil has good evidence for CVD protection but is potentially harmful for LBD or maybe AD.

Some of these are debatable, but lots of things would seem to be harmful in one category and beneficial in others.

And never discount that some people are okay to die of CVD or cancer but don’t let them have AD. Or whatever preference for 1 disease over another that might not always be rational.

1 Like

$100 off…

1 Like

This is my new go-to recommendation for whole genome sequencing for two reasons:

First, they put a high value on privacy:

Second, their new .genome data format should greatly improve LLM token efficiency when analyzing a genome:

2 Likes

What are the differences between 3x, 30x, and 100x?

1 Like