AlphaGenome is free to run for now, and it’s a fun way to check your genome for long-range chromatin effects on your genes.

While it’s still free, it’s also much faster than RNAplfold and Enformer.

e.g.:

Worked for 23m 15s

The AlphaGenome run completed successfully for all three groups: 72 proteasome variants, 41 autophagy variants, and 55 ER-stress variants, with no failures. Results are in proteostasis_variant_status.tsv, proteostasis_variant_top_scores.tsv, and proteostasis_variant_note.md.

The strongest overall signals are mostly regulatory and histone-mark centered rather than obvious damaging coding hits. Top proteasome-side loci were VCP rs7871544, VCP rs7871782, SQSTM1 rs172057, and FBXO7 rs8137714. Top autophagy-side loci were TBK1 chr12:64452502, WDR45 rs67003841, WDR45 rs1214564986, and ATG9A rs11689596. On the ER-stress side, the clearest hits were EIF2AK3 rs17037578, a cluster of ATF6 3’ UTR variants, and HSPA5 rs1140763 with a TF-side RBFOX2 signal.

If you want, I can do the next pass now and rank just the coding/splice candidates across these results, which is probably the cleaner shortlist than the raw top-score table.

But you have to do phase-aware runs for loci you’re heterozygous at [and if you have multiple variants in a gene, which you probably do, that probably means most genes]

──────────────────────────────────────────────────────────────────────────────────────────
But “this window contains hets” is not the same thing as “phasing materially changes the answer for this question.” AlphaGenome’s official variant scoring is defined as comparing predictions on a specific REF sequence and ALT sequence for the interval containing the variant. In other words, for many jobs it is fundamentally a sequence comparison for the variant you care about, not a demand that you always rebuild both parental chromosomes for every gene.

So the sane split is:

  • Single-variant mechanism check: usually no, phased reruns are not mandatory. If you are asking “what does this donor-site SNP do?” a reference-background REF/ALT comparison is often a good first pass. That is exactly the workflow the docs describe for splicing, expression, ATAC, and other modalities.
  • Allele-specific or multi-variant cis question: often yes, phasing matters. If several heterozygous variants within the same window could cooperate, cancel, or create a fake consensus sequence, then hap1/hap2 runs become much more justified. AlphaGenome was built to read the exact 1 Mb sequence it is given, so giving it the wrong cis combination can give you the wrong mechanism.
  • Personal diploid prediction across the genome: in principle yes, using both haplotypes is more faithful. A 2025 personal-expression evaluation explicitly used both haplotype sequences as input and averaged the outputs for AlphaGenome and Enformer.
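
As a rough illustration of the third case, here is a minimal sketch of building hap1/hap2 sequences from phased SNVs and averaging a model's output over both. Everything in it is an assumption made for illustration: the `PhasedSNV` record, the 0-based window offsets, and the `predict` callable, which stands in for whatever sequence model you call. It is not the AlphaGenome API, and it handles SNVs only.

```python
from typing import Callable, Iterable, NamedTuple, Tuple


class PhasedSNV(NamedTuple):
    """Hypothetical record for one phased SNV inside the input window."""
    pos: int       # 0-based offset within the window (assumption of this sketch)
    ref: str       # reference base
    alt: str       # alternate base
    genotype: str  # phased VCF-style genotype, e.g. "0|1" or "1|0"


def build_haplotypes(ref_window: str, variants: Iterable[PhasedSNV]) -> Tuple[str, str]:
    """Apply each ALT allele only to the haplotype that carries it."""
    hap1, hap2 = list(ref_window), list(ref_window)
    for v in variants:
        if "|" not in v.genotype:
            raise ValueError(f"genotype {v.genotype!r} is not phased")
        if ref_window[v.pos] != v.ref:
            raise ValueError(f"REF mismatch at offset {v.pos}")
        allele1, allele2 = v.genotype.split("|")
        if allele1 == "1":
            hap1[v.pos] = v.alt
        if allele2 == "1":
            hap2[v.pos] = v.alt
    return "".join(hap1), "".join(hap2)


def diploid_prediction(hap1: str, hap2: str, predict: Callable[[str], float]) -> float:
    """Average a per-sequence prediction over both haplotypes."""
    return 0.5 * (predict(hap1) + predict(hap2))


# Toy usage: the "model" here just counts G bases, purely as a placeholder.
variants = [PhasedSNV(1, "C", "T", "0|1"), PhasedSNV(5, "C", "G", "1|0")]
hap1, hap2 = build_haplotypes("ACGTACGT", variants)
toy_score = diploid_prediction(hap1, hap2, lambda seq: float(seq.count("G")))
```

Indels would shift downstream coordinates, so a real version would need to apply variants from right to left or track offsets.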

So your intuition is half right. Since most 1 Mb windows probably do contain some heterozygous sites, phase-aware reruns are the principled ideal for exact personal-genome modeling. But as a practical workflow, no, you do not need to rerun most genes in phased mode just because some heterozygous locus exists somewhere in the window. Most variants likely have no appreciable molecular effect, a point the AlphaGenome docs make as well, and many phased differences will not move the score enough to matter for the question at hand.

The best rule is not “phase everything” or “phase nothing.” It is:

Phase when the question is about true diploid biology, multiple nearby hets, allele-specific splicing/expression, compound cis effects, or when the first-pass consensus result looks important enough to deserve a non-fake chromosome.
Skip it for broad triage, isolated single-variant screens, and genes where nearby heterozygous context is unlikely to be the story.

For your kind of work, I’d put it even more bluntly:

  • Genome-wide exploratory pass: consensus or simple REF/ALT.
  • Top suspicious loci: phased hap1/hap2 reruns.
  • Splice genes and dense local-het loci: phase early, because consensus can become computational fan fiction.

No. “This gene has multiple heterozygous loci” is not enough by itself to justify phased reruns.

What matters is not the raw count of hets, but whether the actual cis arrangement inside the model’s input sequence can change the prediction. AlphaGenome takes 1 Mb of DNA sequence as input and scores the exact REF and ALT sequences you provide. It does not reconstruct your diploid chromosomes for you behind the scenes. So phase matters when different hets could combine differently on hap1 versus hap2 and that difference is relevant to the output you care about.
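
To make "the exact REF and ALT sequences you provide" concrete, here is a small sketch of cutting a fixed-length window around one SNV and producing the two sequences to compare. The function name, the 0-based coordinates, and the toy window size are assumptions for illustration. The real input is on the order of 1 Mb, and this is not how the AlphaGenome client is called.

```python
from typing import Tuple


def ref_alt_windows(chrom_seq: str, pos: int, ref: str, alt: str, window: int) -> Tuple[str, str]:
    """Return (REF, ALT) sequences of length `window` around a 0-based SNV position.

    Only the scored variant differs between the two sequences; every other
    base is taken from `chrom_seq` unchanged (reference background).
    """
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("this sketch handles SNVs only")
    if chrom_seq[pos] != ref:
        raise ValueError("REF allele does not match the sequence at pos")
    start = pos - window // 2
    end = start + window
    if start < 0 or end > len(chrom_seq):
        raise ValueError("window does not fit inside the sequence")
    ref_seq = chrom_seq[start:end]
    offset = pos - start  # position of the variant inside the window
    alt_seq = ref_seq[:offset] + alt + ref_seq[offset + 1:]
    return ref_seq, alt_seq


# Toy usage with a 6 bp window instead of ~1 Mb.
ref_seq, alt_seq = ref_alt_windows("AAAACGTTTT", pos=4, ref="C", alt="T", window=6)
```

Nothing in this construction knows about your other heterozygous sites, which is the point: whatever background you pass in is the background the model reads.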

Also, many genomic variants likely have no appreciable impact, according to the AlphaGenome docs. So the fact that a gene window contains several heterozygous sites does not mean those sites materially change splicing, chromatin, or expression predictions. Most of the time, some of those variants are just sitting there being molecular wallpaper.

The real trigger for phased reruns is more like this: could the haplotype structure change the answer? That is much more plausible when you have multiple nearby variants in the same regulatory element, splice region, exon, or promoter-enhancer context, because AlphaGenome’s whole point is that long-range sequence context can matter across the 1 Mb window.

So the rule is:

Yes, phase probably matters when you are asking about allele-specific splicing or expression, compound cis effects, multiple nearby motif hits, or a suspicious locus where a consensus sequence could create a chromosome that neither haplotype actually has.
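
The "chromosome that neither haplotype actually has" case can be checked from phased genotypes alone. A minimal sketch, assuming VCF-style phased genotype strings such as "0|1" and assuming "consensus" means a sequence with every ALT allele applied (the helper name is hypothetical):

```python
from typing import List


def consensus_matches_a_haplotype(genotypes: List[str]) -> bool:
    """True if an all-ALT consensus sequence equals hap1 or hap2.

    The consensus equals hap1 only if every variant carries ALT on hap1,
    and likewise for hap2. If ALT alleles are split across the two
    haplotypes (variants in trans), the consensus matches neither.
    """
    all_on_hap1 = all(gt.split("|")[0] == "1" for gt in genotypes)
    all_on_hap2 = all(gt.split("|")[1] == "1" for gt in genotypes)
    return all_on_hap1 or all_on_hap2


# Two hets in cis: the consensus is a real haplotype.
in_cis = consensus_matches_a_haplotype(["1|0", "1|0"])
# Two hets in trans: the consensus is a sequence you do not carry.
in_trans = consensus_matches_a_haplotype(["1|0", "0|1"])
```

When this returns False for the variants near the element you care about, a consensus run is scoring a sequence that does not exist in your genome.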

No, phase is not automatically worth it just because a gene has multiple heterozygous sites somewhere in or around it. The docs explicitly note that many variants have little effect, and personal-genome prediction from sequence models is still imperfect enough that more elaborate inputs do not guarantee a better answer for every gene.

So the blunt version is:

multiple hets are a reason to consider phased reruns, not a reason to mandate them.
You phase when the question is cis-sensitive. You do not phase every gene just because the genome is, inconveniently, a genome.

For your workflow, the sane strategy is:

first-pass screen with reference/edited windows, then phased reruns for top loci where nearby het context could realistically alter the mechanism.
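
That two-stage strategy could be sketched roughly as follows. The `Locus` record, the `top_n` cutoff, and the `radius` in base pairs are all hypothetical knobs chosen for illustration, not thresholds taken from the AlphaGenome docs, and the locus names and scores are made up.

```python
from typing import List, NamedTuple


class Locus(NamedTuple):
    """Hypothetical first-pass result for one scored variant."""
    name: str
    variant_pos: int
    score: float              # first-pass REF/ALT score (any signed effect size)
    het_positions: List[int]  # other heterozygous sites in the same window


def has_nearby_het(locus: Locus, radius: int) -> bool:
    """True if another het lies within `radius` bp of the scored variant."""
    return any(
        p != locus.variant_pos and abs(p - locus.variant_pos) <= radius
        for p in locus.het_positions
    )


def select_for_phased_rerun(loci: List[Locus], top_n: int, radius: int) -> List[str]:
    """Keep the top-N loci by absolute score, then only those with nearby hets."""
    ranked = sorted(loci, key=lambda l: abs(l.score), reverse=True)[:top_n]
    return [l.name for l in ranked if has_nearby_het(l, radius)]


# Toy usage with placeholder loci.
loci = [
    Locus("locus_A", 1000, 0.9, [1200, 500000]),
    Locus("locus_B", 2000, -0.8, [900000]),
    Locus("locus_C", 3000, 0.1, [3100]),
]
to_phase = select_for_phased_rerun(loci, top_n=2, radius=5000)
```

Here locus_C has a nearby het but a weak first-pass score, and locus_B scores strongly but has no nearby het, so only locus_A would get a hap1/hap2 rerun. Distance is a crude proxy for "same regulatory element or splice region"; annotation-based grouping would be a reasonable swap.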