Which LLM is best for asking health/medical/biochemistry-of-aging/experimental questions? How would you go about training one?

Especially questions about calorie restriction, transitioning, and preventative medicine, AND experimental therapies or dosage questions (especially if one doses phytochemicals far higher than normal - I often steep extreme quantities of tea, like 65 g of green tea…). Bonus points if it can meaningfully predict the effects of drug interactions or optimize a self-testing protocol for determining the ideal dose of a drug.

Future locally trained LLMs will have YOUR ENTIRE MEDICAL CONTEXT and encourage you to collect ALL THE DATA (effectively an infinite context window). They may also train on voluntarily contributed data from all users and provide more fine-grained information on interventions or diet changes (LIKE CHANGES IN SEED OIL CONSUMPTION). If you photograph all your food and feed in your Fitbit data, your local LLM becomes much easier to train. Add all your screen-recording data (cf. Richard Ngo and now Karpathy) so that you can more finely track the effects of food consumption on working memory, productivity, overall functioning, etc., ESPECIALLY if you include brainwave data. Screen recordings also make illegible people far more legible.

Ideally you should also collect ALL your data like Connor Parish does; local LLMs are better at using it than lobotomized public LLMs like Claude… And you can get quantified-self/longevity people to do the proper RLHF.
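
To make that concrete, here is a rough sketch of what the data-collection step could look like: merge your daily exports into one instruction-tuning file per day. The file names, field layouts, and prompt template are all made up for illustration, not something any particular tool requires.

```python
import json
from pathlib import Path

# Hypothetical local exports; adjust paths and formats to whatever you actually collect.
FITBIT_EXPORT = Path("fitbit_daily.json")  # e.g. [{"date": "2025-01-02", "steps": 9000, "sleep_hours": 7.2}, ...]
FOOD_LOG = Path("food_log.json")           # e.g. [{"date": "2025-01-02", "items": ["65 g green tea", "lentils"]}, ...]
OUTPUT = Path("personal_health_finetune.jsonl")

def load(path: Path) -> list[dict]:
    """Read a JSON export if it exists, otherwise return an empty list."""
    return json.loads(path.read_text()) if path.exists() else []

def main() -> None:
    fitbit = {d["date"]: d for d in load(FITBIT_EXPORT)}
    food = {d["date"]: d for d in load(FOOD_LOG)}

    with OUTPUT.open("w") as out:
        # One instruction-tuning-style record per day; the prompt template is arbitrary.
        for date in sorted(set(fitbit) | set(food)):
            context = {
                "date": date,
                "fitbit": fitbit.get(date, {}),
                "food": food.get(date, {}).get("items", []),
            }
            record = {
                "prompt": f"Here is my health data for {date}: {json.dumps(context)}. "
                          "What patterns or interaction effects should I watch for?",
                # Left blank: to be written and rated by quantified-self reviewers (the RLHF step).
                "completion": "",
            }
            out.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    main()
```

The empty completions are where the quantified-self/longevity reviewers would come in, writing and ranking answers.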

And interaction effects

AND especially ones that don’t act as stochastic parrots and that can really explain the MEK pathway.

In particular, a GOOD shared LLM would let power users upload all their data, handle the distillation and post-training, and even tokenize rewards from it. The Prime Intellect people might be fun to talk to on this front, though some classes of computation may be harder to run on decentralized compute (maybe less so for a locally trained health LLM).
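
For the post-training step itself, LoRA fine-tuning of an open-weights model on a JSONL like the one sketched above is the obvious local route. This is only a sketch under those assumptions (the base model name and hyperparameters are placeholders), not the pipeline Prime Intellect or anyone else actually runs:

```python
# Sketch of local post-training with LoRA on the JSONL above. Base model and
# hyperparameters are placeholders; swap in whatever open-weights model you can run.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE = "meta-llama/Llama-3.1-8B"  # placeholder open-weights base model

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # many causal LMs ship without a pad token
model = get_peft_model(
    AutoModelForCausalLM.from_pretrained(BASE),
    LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)

def tokenize(batch):
    # Concatenate prompt and (reviewer-written) completion into one training string.
    text = [p + c for p, c in zip(batch["prompt"], batch["completion"])]
    return tokenizer(text, truncation=True, max_length=1024)

train = load_dataset("json", data_files="personal_health_finetune.jsonl")["train"].map(
    tokenize, batched=True, remove_columns=["prompt", "completion"]
)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="health-lora",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```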

The plot of NGE might be relevant for those who want to pool all their data together, particularly to control hunger or to find the cleanest, least polluted food sources.
https://connorparish.com/


For experimental questions I found most of those I tried (ChatGPT o1 pro, Perplexity Pro Academic, DeepSeek v3, Claude, Vera Health) quite bad. Open Evidence might be a bit better, but you can only run a few queries as a non-healthcare practitioner.


You can always use AI for age estimation from your biomarkers and radiology images…

@adssx… Question: can every individual put a few queries into “Open Evidence”? And if so, could we work together as a group to formulate which questions will move the group forward in the quest for affordable strategies for longer health and lifespan?

A few queries won’t do the job. You need a lot of back and forth to make progress; it’s like having an intellectual sparring partner. I’m quite hopeful about ChatGPT o3 given its performance on ARC-AGI and FrontierMath: it shows strong reasoning capabilities rather than just regurgitating information that sounds smart but might be incorrect. Today’s ChatGPT is an Intellectual Yet Idiot, as Taleb says.


I’ve tried uploading spreadsheets or PDFs of lab data, and for some reason these LLMs only see part of the data. The analysis is mostly generic stuff - no real insights.
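
One workaround if the file upload seems to be getting truncated: convert the spreadsheet to plain text yourself, so you can see exactly what the model receives, and paste it in chunks. A minimal sketch, assuming a simple CSV export (the file name and column names are hypothetical):

```python
# Convert the lab spreadsheet to plain text locally, so nothing gets silently dropped,
# then paste it into the chat in chunks. File name and columns are hypothetical.
import pandas as pd

labs = pd.read_csv("lab_results.csv")        # e.g. columns: date, marker, value, unit, ref_range
labs = labs.sort_values(["marker", "date"])

text = labs.to_csv(index=False)              # the full table as plain text
CHUNK = 6000                                 # rough character budget per message

chunks = [text[i:i + CHUNK] for i in range(0, len(text), CHUNK)]
for n, chunk in enumerate(chunks, 1):
    print(f"--- paste as message {n} of {len(chunks)} ---")
    print(chunk)
```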

What you need is a “reasoning” AI, which is apparently the current bleeding edge. You might see if one of these new models helps, or just wait a few months for the next releases (e.g., OpenAI has already announced o3, not yet released, which is said to be significantly better than o1). See Prophecies of the Flood - by Ethan Mollick. As always, for best performance, make sure you are using the latest versions (many are not part of the free offerings of the various companies).

Some may also find this useful: for identifying relevant papers, Consensus (Sign Up - Consensus: AI Search Engine for Research) claims to be good (I haven’t used it much yet, but I look forward to it). They say:

"What is Consensus?
Consensus is an academic search engine, powered by AI, but grounded in scientific research. We use language models (LLMs) and purpose-built search technology (Vector search) to surface the most relevant papers. We synthesize both topic-level and paper-level insights. Everything is connected to real research papers.

The current source material used in Consensus comes from the Semantic Scholar database, which includes over 200M papers across all domains of science. We continue to add more data over time and our dataset is updated monthly.

Best Practices for Search Queries
When you input a question or a phrase (i.e. the benefits of mindfulness), we then run a custom fine-tuned language model that generates a question-relevant conclusion based on the user query and the abstract of the given paper. For every other type of search, we use the Key Takeaway that we extracted already."

Searching only abstracts isn’t ideal for certain queries, but the service still sounds useful for what it provides.
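
For the curious, the “Vector search” they describe boils down to embedding the query and the abstracts and ranking by similarity. A toy sketch, not Consensus’s actual pipeline; the embedding model and example abstracts are placeholders:

```python
# Toy vector search over abstracts: embed everything, rank by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedding model

abstracts = [
    "Caloric restriction extends lifespan in multiple model organisms...",
    "Green tea catechins modulate intracellular signalling in vitro...",
    "Randomized trial of mindfulness-based stress reduction for anxiety...",
]
query = "effects of calorie restriction on lifespan"

doc_emb = model.encode(abstracts, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_emb, doc_emb)[0]
for score, abstract in sorted(zip(scores.tolist(), abstracts), reverse=True):
    print(f"{score:.3f}  {abstract}")
```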