I have my WGS - what next?

My whole genome is sequenced and the files are ready to download.

I am reasonably proficient with R/RStudio. With some review, I could probably resurrect some Python skills. I use Claude, but only the $20/month version.

The reports that came with the sequencing are useless. What projects should I start with to analyze my data?

  • Alignment to T2T-CHM13 (via minimap2 or pbmm2, skipping BQSR).
  • Deep-Learning SNV/Indel Calling (via DeepVariant or Clair3).
  • Haplotype Phasing (via WhatsHap for cis/trans structural context).
  • Complex SV & STR Profiling (via Sniffles2 for structural variants + TRGT for tandem repeats).
  • Full-Spectrum Annotation (gnomAD v4, ClinVar, AlphaMissense, plus AnnotSV for long-read structures).
  • Custom Phenotype Filtering (via Slivar with specific biohacker logic/JavaScript).
  • Manual IGV Verification (Focusing on long-read soft-clipping at critical loci).
  • RAG-LLM Literature Association (Feeding phased VCF data to generate optimized, weighted lifestyle reports).

This is the overall rough workflow.

Here is what @cl-user is doing… I recommend you review all his posts on the topic:

@Cole many thanks for the reply. Let me take some time to digest so I can respond intelligently.

@RapAdmin thank you for your reply. I am very impressed with cl-user’s work. I think I have some way to go in understanding the science of genetic pathways. Little by little I hope to get there.

I finally finished calculating my height polygenic risk score with Claude Opus and R. Frankly, it was a real grind. In theory, there was a tutorial for it using R’s bigsnpr package, but it took me five weeks to get through it, working two to three hours every couple of days. Some lessons learned. One, you really need a very intelligent model like Opus. Sonnet was just not up to the task. I used Opus with medium effort to balance token usage with intelligence. High would be better, but the sessions would have been much shorter. Two, whoever brought up usegalaxy.org / usegalaxy.eu was very wise. You need a cloud computer. We lost a lot of time trying to do it on my laptop, but there were just too many computations. Three, I am very skeptical of anyone or any service that would give you all the major PRS scores. There seem to be lots of places in the pipeline where things can go sideways. The next one I will do with Claude is coronary artery disease. I will update again after that.
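For anyone wondering what the calculation boils down to, here is a minimal sketch (in Python rather than R) of the core of a polygenic score: a weighted sum of effect-allele counts. It is not the bigsnpr tutorial workflow. The variant IDs, alleles, weights, and genotypes are all made up for illustration, and the function names are my own. A real pipeline has many more steps around this sum, which is probably where things go sideways.

```python
# Toy polygenic score: sum over variants of (effect size x copies of the effect allele).
# All variant IDs, alleles, weights, and genotypes below are made up for illustration.

def effect_allele_dosage(genotype, effect_allele):
    """Count copies (0, 1, or 2) of the effect allele in a genotype like ('A', 'G')."""
    return sum(1 for allele in genotype if allele == effect_allele)

def polygenic_score(weights, genotypes):
    """Return (score, variants_used). Variants missing from genotypes are skipped."""
    score = 0.0
    used = 0
    for variant_id, (effect_allele, beta) in weights.items():
        genotype = genotypes.get(variant_id)
        if genotype is None:
            continue  # not genotyped, so it cannot contribute
        score += beta * effect_allele_dosage(genotype, effect_allele)
        used += 1
    return score, used

# Hypothetical weights: variant ID -> (effect allele, effect size)
weights = {
    "var1": ("A", 0.5),
    "var2": ("T", -0.125),
    "var3": ("G", 0.25),
}

# Hypothetical genotypes for one person
genotypes = {
    "var1": ("A", "G"),  # one copy of the effect allele A
    "var2": ("T", "T"),  # two copies of the effect allele T
    # var3 is missing on purpose
}

score, used = polygenic_score(weights, genotypes)
print(f"score={score} from {used} of {len(weights)} variants")
```

The raw number means little on its own; it only becomes interpretable when compared against scores computed the same way for a reference population. Reporting how many variants were actually used is a cheap sanity check.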

Thanks for the update. Great to hear the progress. I’m curious what you think you’ve spent on this effort so far; it would seem that the major budget items are the WGS (did you go for 1X, 30X or 100X?), the tokens for Claude Opus, the usegalaxy.org cloud computing fee… and of course the time.

I’m sure you have seen it already, but for others just stumbling upon this thread, we have a starter thread on Whole Genome Sequencing here: Whole Genome Sequencing

I paid $389 to sequencing.com for a 30x read.

The reports were disappointing: they basically just check single variants. That is frankly trivial to do yourself with Claude helping you code a loop through the variants that matter.
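To give an idea of what such a loop looks like, here is a minimal sketch in Python. It assumes a single-sample VCF with a GT field, and the watch list (chromosome, position, ref, alt) is entirely hypothetical, as are the function name and the tiny inline VCF used to make the example runnable.

```python
import io
import re

# Hypothetical watch list: (chrom, pos, ref, alt) -> label. Coordinates are made up.
WATCH_LIST = {
    ("chr1", 1000, "A", "G"): "example variant 1",
    ("chr2", 2000, "C", "T"): "example variant 2",
}

def check_variants(vcf_lines, watch_list):
    """Return {label: copies of the alt allele} for watched variants in a single-sample VCF."""
    found = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        alts = fields[4].split(",")  # a site can list several alt alleles
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        gt = sample_values[format_keys.index("GT")]
        # Genotype looks like 0/1 (unphased) or 0|1 (phased); "." means no call
        allele_indexes = [a for a in re.split(r"[/|]", gt) if a != "."]
        for alt_number, alt in enumerate(alts, start=1):
            label = watch_list.get((chrom, pos, ref, alt))
            if label is not None:
                found[label] = allele_indexes.count(str(alt_number))
    return found

# Tiny made-up VCF so the sketch runs without any files
rows = [
    ["chr1", "1000", ".", "A", "G", "50", "PASS", ".", "GT", "0/1"],
    ["chr2", "2000", ".", "C", "T", "50", "PASS", ".", "GT:DP", "1|1:30"],
    ["chr3", "3000", ".", "G", "A", "50", "PASS", ".", "GT", "0/1"],
]
header = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n"
vcf_text = header + "".join("\t".join(row) + "\n" for row in rows)

result = check_variants(io.StringIO(vcf_text), WATCH_LIST)
for label, copies in result.items():
    print(f"{label}: {copies} copy/copies of the alt allele")
```

For a real file you would pass an open file handle instead of the `io.StringIO` (for example `gzip.open(path, "rt")` for a `.vcf.gz`). Two caveats: the coordinates in the watch list have to be on the same genome build as the VCF, and a watched variant that does not show up may be homozygous reference or simply not called, so absence is not proof.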

For Claude, I am using the low-end $20/month version. I am not using Claude Code, though it might have been less frustrating. I was just a copy-paste monkey, and even when I got Claude to explain things in simple terms, it was often over my head.

Galaxy is free. I am using the European server, usegalaxy.eu. I had trouble getting into the queue to run jobs on the US version.

Perhaps of interest: Ronjon Nag, Just Announced Superbio.ai - lets you work with large-scale data — genomics, proteomics, etc. 100GB+ — by asking questions in plain English
