Which LLM is best for asking health/medical/biochemistry-of-aging/experimental questions? And how would you go about training one?

Especially questions about calorie restriction, transgender transition, and preventative medicine, AND experimental therapies or dosage questions (especially if one doses phytochemicals far above normal levels; I often steep extreme quantities of tea, like 65 g of green tea…). Bonus points if it can meaningfully predict the effects of drug interactions, or optimize a self-testing protocol for determining the ideal dose of a drug.

Future locally trained LLMs will have YOUR ENTIRE MEDICAL CONTEXT and encourage you to collect ALL THE DATA (infinite context window). They may also train on voluntarily collected data from all users and provide more fine-grained info on interventions or diet changes (LIKE CHANGES IN SEED OIL CONSUMPTION). If you photograph all your food and add your Fitbit data, it makes your local LLM way easier to train. AND all your screen-recording data (cf. Richard Ngo and now Karpathy), so that you can more finely track the effects of food consumption on working memory, productivity, overall functioning, etc. ESPECIALLY if you include brainwave data. Screen recordings also make illegible people way more legible.
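The "collect ALL THE DATA" idea can be sketched concretely. Here is a minimal, hypothetical logging format (the filename, the `kind` labels, and the example payloads are all assumptions for illustration, not an existing tool) that keeps every observation in one append-only JSONL file, which a local model could later be fine-tuned on or fed as context:

```python
import datetime
import json
import pathlib

def log_event(log_path, kind, payload):
    """Append one timestamped observation (meal photo path, Fitbit
    export row, screen-recording note, EEG session, ...) as a JSON
    line. An append-only JSONL file is trivial to re-parse later when
    assembling fine-tuning or long-context data."""
    entry = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "kind": kind,
        **payload,
    }
    with pathlib.Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

# hypothetical usage (paths and field names are made up)
log_event("health_log.jsonl", "meal",
          {"photo": "photos/lunch.jpg", "note": "65 g green tea steep"})
log_event("health_log.jsonl", "fitbit",
          {"steps": 11204, "resting_hr": 52})
```

One file per person keeps the later "train on my whole history" step simple: every record already carries a timestamp and a type tag.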

Ideally you should also collect ALL your data like Connor Parish does; local LLMs are better at using it than lobotomized public LLMs like Claude… And you can get quantified-self/longevity people to do the proper RLHF.

And interaction effects

AND especially models that don’t act as stochastic parrots and that can really explain the MEK pathway.

In particular, a GOOD shared LLM would have power users upload all their data, do the distillation and post-training, and even tokenize rewards from it. The Prime Intellect people might be fun to talk to on this front, though some classes of computation may be harder to do with decentralized compute (maybe less so for a locally trained health LLM).

The plot of NGE (Neon Genesis Evangelion) might be relevant for anyone who wants to pool all their data together, particularly to control hunger or find the cleanest, least polluted food sources.
https://connorparish.com/


For experimental questions, I found most of those I tried (ChatGPT o1 pro, Perplexity Pro Academic, DeepSeek v3, Claude, Vera Health) quite bad. Open Evidence might be a bit better, but as a non-healthcare practitioner you can only run a few queries.


You can always use AI for age estimation from your biomarkers and radiology images…

@adssx… Question: can every individual put a few queries into Open Evidence? And if so, could we work together as a group to formulate the questions that will move us forward in the quest for affordable strategies for longer healthspan and lifespan?

A few queries won’t do the job. You need a lot of back and forth to make progress; it’s like having an intellectual sparring partner. I’m quite hopeful about ChatGPT o3 given its performance on ARC-AGI and FrontierMath: it shows strong reasoning capabilities, not just the regurgitation of information that sounds smart but might be incorrect. Today’s ChatGPT is an Intellectual Yet Idiot, as Taleb says.


I’ve tried uploading spreadsheets or PDFs of lab data, and for some reason these LLMs only see part of the data. The analysis is mostly generic stuff, with no real insights.
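One common reason a model "only sees part" of an uploaded spreadsheet is silent truncation of a long table. A workaround is to convert the export to plain text yourself and split it into small chunks, repeating the header in each chunk so every row stays interpretable. A stdlib-only sketch (the analyte names and chunk size are made up for illustration):

```python
import csv
import io

def labs_to_chunks(csv_text, chunk_rows=50):
    """Split a lab-results CSV into plain-text chunks, repeating the
    header row in each so no rows get silently dropped or orphaned."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    chunks = []
    for start in range(0, len(body), chunk_rows):
        part = [header] + body[start:start + chunk_rows]
        chunks.append("\n".join(",".join(r) for r in part))
    return chunks

# toy stand-in for a real lab export
labs = ("analyte,value,unit\n"
        "LDL-C,92,mg/dL\n"
        "HDL-C,61,mg/dL\n"
        "ApoB,78,mg/dL\n"
        "hs-CRP,0.4,mg/L\n")
chunks = labs_to_chunks(labs, chunk_rows=2)
print(len(chunks))  # → 2
```

Pasting each chunk in turn (or numbering them "part 1 of N") makes it much easier to verify the model actually ingested everything before asking for analysis.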

What you need is a “reasoning” AI, which is apparently the current bleeding edge. You might see whether one of these new AIs helps, or just wait some months for the new releases (e.g., OpenAI already has o3, not yet released, which is said to be significantly better than o1). See Prophecies of the Flood by Ethan Mollick. As always, for best performance, make sure you are using the latest versions (many are not part of the various companies’ free offerings).

Some may also find this useful: for identifying relevant papers, Consensus (Sign Up - Consensus: AI Search Engine for Research) claims to be good (I haven’t used it much yet, but look forward to it). They say:

"What is Consensus?
Consensus is an academic search engine, powered by AI, but grounded in scientific research. We use language models (LLMs) and purpose-built search technology (Vector search) to surface the most relevant papers. We synthesize both topic-level and paper-level insights. Everything is connected to real research papers.

The current source material used in Consensus comes from the Semantic Scholar database, which includes over 200M papers across all domains of science. We continue to add more data over time and our dataset is updated monthly.

Best Practices for Search Queries
When you input a question or a phrase (i.e. the benefits of mindfulness), we then run a custom fine-tuned language model that generates a question-relevant conclusion based on the user query and the abstract of the given paper. For every other type of search, we use the Key Takeaway that we extracted already."

Searching only abstracts isn’t ideal for certain queries, but the service still sounds good for what it provides.

Derya suggested chatgpt5-pro (even better than o3).

Given how much I use it (and POST THE RESULTS HERE, which makes it extra important that things are as right as possible), I finally got chatgpt5-pro, so I will rerun all my queries through it. Important for me to note here.

Hi Alex. I really think the LLM space is changing far too quickly to stay exclusive to any one LLM. I tend to change my suggestions in this area from week to week. As of now, I mostly use the three G’s (GPT, Grok, Gemini), but I’m sure that will change in the future.


I used chatgpt5-thinking to find an issue in my latest eye exam (one that could be connected with my increased genetic risk for AMD), and Pro to debug the issue at slightly higher resolution. I also used it to debug my Iollo reports at slightly higher resolution.

Maybe I could feed it better prompts to target the issue more precisely [even in specific units/“units of leverage”] if possible; I don’t know, but it can at least prompt me to ask better questions first!

I think it’s useful to have one folder where you put ALL your Iollo + blood tests + eye exams + MRI DICOM + RBC-distribution raw imaging + cognitive test results, and then use chatgpt5-pro to analyze ALL OF THEM in ONE CONTEXT.
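The assembly step of this "one folder, one context" workflow can be sketched with the standard library alone. The folder layout and file extensions below are assumptions; binary formats (MRI DICOM, raw imaging) would first need a text-extraction step or a model that accepts them directly:

```python
import pathlib

def gather_context(root, extensions=(".txt", ".csv", ".md")):
    """Concatenate every readable text file under `root` into one
    labelled block, ready to paste into a single long-context chat.
    Each file is prefixed with its name so the model can cite which
    report a given number came from."""
    parts = []
    for path in sorted(pathlib.Path(root).rglob("*")):
        if path.suffix.lower() in extensions:
            parts.append(f"=== {path.name} ===\n"
                         f"{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```

Labelling each section with its source filename matters: when everything is in one context, it is the only way to check the model's claims against the right report.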

Epigenome data might be harder, because the IDAT files are awkward to work with and the processing is more distally connected, but WE ARE GETTING THERE [and even o3 might get the distal connection]. At some point we’re going to regularly get ATAC-seq of our tissues, and Bryan Johnson is going to be the lead man.

I know this is power/GPU-hungry, but the total integrity of my output, and valuing my time, is important for me to be the most effective person I can be, especially because my uniqueness makes fat-tailed phenomena more likely. o3-thinking DEFINITELY supplements me most where I’m weak and gives me more opportunity to FULLY express my strengths [high throughput + extreme gutsiness + the ability to abstract and form SOME ontologies, to express things in really unique ways, even if I don’t have the highest working memory]. This might be the thing that finally helps me show enough of my value to the world.

Ultimately, all alignment is contingent on valuing one’s time to the max and slowing down ALL future loss in lifetime-integrated compute [which these tools motivate you to finally do]; reductions in that loss more than make up for themselves in other ways. Even if many things are harder for you than for other people, it’s not wise to take “dumb chances” on “dumb things”, and the biggest “dumb thing” of all is losing brain cells because you weren’t supplementing your brain with the right things, or injecting enough retatrutide to reduce calories and slow down brain-cell loss [because brain-cell loss makes all future actions/neuroplasticity harder].

(Really, I should be injecting retatrutide more and don’t do it enough, because I still want to see how far I can go without it. But these tools also motivate me to take krill oil/plasmalogens at higher doses.)

Plus, chatgpt5-pro maximizes the value of your output to other people, so you reduce the human compute loss you impose on others [and, realistically, chatgpt5-pro will be better and more accurate than you or most people on the vast majority of things; the really important thing is to get enough people used to its presence that we all collectively become better prompt/context engineers].

Some say they’re impressed that I value truth so much, and this is the least I can do to value truth, despite all limitations (most of us on this forum also have limitations, and chatgpt5-pro seems to almost always lean toward truth-seeking rather than sycophancy). In the interim, understanding our own biology is the best we can do toward “Finally I am becoming stupider no more” (Paul Erdős). At the very least, it also makes me do things for dumb reasons less often (LLMs make you a bit more of a producer and less of a consumer, especially if you just don’t have AI-style engineering talent, which MANY people don’t!).

For many of those who weren’t endowed with certain gifts, or who felt unseen, LLMs (especially the AIs of the future) offer a new opportunity to figure out how they can finally be relevant (note how the continent of Africa consistently has the most positive feelings toward AI).
#recurringrelevance
#important

What’s your impression of Pro so far? Is it significantly better than 5 Thinking?

Derya is impressed, but I don’t find it to be SUPER-SUPER-better yet (my queries are also not as complex as his)

I would say it’s useful for one month (use it to reanalyze all your most important queries), and then use ChatGPT Plus most of the rest of the time.


I did that last time with o3 pro and wasn’t impressed either. I’ll stick to the Plus plan for now. I got good results by doing back-and-forths between ChatGPT, Gemini and Grok and letting them challenge each other.