The best sentences I have read all year, on the last day of 2025.
This is extremely important to understand:
Everyone’s hyped about “AI for Science” in 2025! At the end of the year, allow me to share my unease and my optimism, specifically about AI & biology.
I spent another year deep in biological foundation models, healthcare AI, and drug discovery. Here are three lessons I learned in 2025.
1. Biology is not “just another modality.”
The biggest misconception I still see:
“Biology is text + images + graphs. Just scale transformers.”
No. Biology is causal, hierarchical, stochastic, and incomplete in ways that language and vision are not.
Tokens don’t correspond cleanly to reality.
Labels are sparse, biased, and often wrong.
Ground truth is conditional, context-dependent, and sometimes unknowable.
We’ve made real progress—single-cell data, imaging, genomics, and EHRs are finally being modeled jointly—but the hard truth is this:
Most biological signals are not supervised problems waiting for better loss functions.
They are intervention-driven problems. They demand perturbations, counterfactuals, and mechanisms, not just prediction.
Scaling obviously helps. But without causal structure, scaling mostly gives you sharper correlations.
2025 reinforced my belief that biological foundation models must be built around perturbation, uncertainty, and actionability, not just representation learning.
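To make the correlation-vs-intervention point concrete, here is a minimal sketch (synthetic data, all names illustrative) contrasting a random train/test split with a split that holds out entire perturbations. A model can look excellent under the first and collapse under the second:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupShuffleSplit, train_test_split

rng = np.random.default_rng(0)
n_cells, n_genes, n_perts = 2000, 50, 20

# Toy screen: each cell received one perturbation; expression is a
# perturbation-specific shift plus noise.
pert_id = rng.integers(0, n_perts, size=n_cells)
true_effect = rng.normal(0, 1, size=(n_perts, n_genes))
X = np.eye(n_perts)[pert_id]                       # one-hot perturbation label
y = true_effect[pert_id] + rng.normal(0, 0.5, size=(n_cells, n_genes))

# Random split: test cells share perturbations with training cells,
# so memorizing per-perturbation means scores well.
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)
print("random split R^2:", Ridge().fit(Xtr, ytr).score(Xte, yte))

# Intervention split: entire perturbations are unseen at test time,
# as they would be when proposing a genuinely new experiment.
gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
tr, te = next(gss.split(X, y, groups=pert_id))
print("held-out perturbation R^2:", Ridge().fit(X[tr], y[tr]).score(X[te], y[te]))
```

The first number is high and the second is near zero, not because the model got worse, but because nothing about unseen interventions was ever learned. Real perturbation models try to close exactly that gap.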
2. Benchmarks are holding biology back more than compute is.
Let’s be honest: Benchmarking in AI & biology is still broken.
Everyone reports SOTA. Everyone picks a different dataset slice.
Everyone tunes for a different metric. Everyone avoids prospective validation.
We’ve imported the worst habits of ML benchmarking into a domain where stakes are much higher. In biology and healthcare, a 1% gain that doesn’t transfer is worse than useless—it’s misleading.
What’s missing isn’t more benchmarks. It’s hard benchmarks:
• Prospective, not retrospective
• Perturbation-based, not static
• Multi-site, not single-lab
• Failure-aware, not leaderboard-optimized
If your model only works on the dataset that created it, it’s not a foundation model—it’s a dataset artifact.
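As one concrete shape a harder benchmark could take, here is a hedged sketch (synthetic data, hypothetical site labels) of a leave-one-site-out harness that reports per-site performance and the worst site, not just a leaderboard average:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(1)
n, d, n_sites = 1500, 10, 5
site = rng.integers(0, n_sites, size=n)

# Shared biology, but each site measures it on its own scale
# (a crude stand-in for batch effects).
w = rng.normal(size=d)
X_signal = rng.normal(size=(n, d))
y = (X_signal @ w + rng.logistic(size=n) > 0).astype(int)
scale = rng.lognormal(0, 0.8, size=(n_sites, d))   # site-specific distortion
X = X_signal * scale[site]                         # what the model actually sees

# Leave one site out entirely, train on the rest, score the held-out site.
scores = {}
for tr, te in LeaveOneGroupOut().split(X, y, groups=site):
    model = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    s = int(site[te][0])
    scores[s] = roc_auc_score(y[te], model.predict_proba(X[te])[:, 1])

print("per-site AUROC:", {k: round(v, 3) for k, v in sorted(scores.items())})
print("mean:", round(float(np.mean(list(scores.values()))), 3),
      "| worst site:", round(min(scores.values()), 3))
```

Reporting the worst site is the failure-aware part: a mean can hide a site where the model is effectively guessing.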
In 2026, we need fewer flashy plots and more humility, rigor, and negative results.
3. “Reasoning” in biology is not chain-of-thought.
There’s a growing tendency to apply the word “reasoning” to biological LLMs.
Let’s be careful.
Biological reasoning isn’t verbal fluency, longer context windows, or prettier explanations. Those are surface-level improvements. Real reasoning in biology shows up elsewhere: in forming hypotheses, deciding which experiments to run, updating beliefs when perturbations fail, and constantly trading off cost, risk, and uncertainty.
A model that explains a pathway beautifully but can’t decide which experiment to run next is not reasoning; it’s narrating.
2025 convinced me that the future lies in agentic biological AI:
systems that couple foundation models with experimentation, simulation, and decision-making loops.
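A toy version of such a loop, with every number illustrative: keep a belief over each perturbation’s effect, run the experiment the model is most uncertain about, observe a noisy result, and update the belief. The scientific work is in the choose/observe/update cycle, not in the prediction.

```python
import numpy as np

rng = np.random.default_rng(2)
n_perts = 10
true_effect = rng.normal(0, 1, size=n_perts)   # unknown ground truth

# Gaussian belief N(mu, var) over each perturbation's effect.
mu = np.zeros(n_perts)
var = np.full(n_perts, 4.0)                    # broad priors: we know little
noise_var = 0.5                                # assumed assay noise

for step in range(15):                         # experiment budget
    k = int(np.argmax(var))                    # decide: most uncertain perturbation
    obs = true_effect[k] + rng.normal(0, np.sqrt(noise_var))  # run the experiment
    # Update beliefs (conjugate Gaussian posterior).
    post_precision = 1.0 / var[k] + 1.0 / noise_var
    mu[k] = (mu[k] / var[k] + obs / noise_var) / post_precision
    var[k] = 1.0 / post_precision

print("posterior means:", np.round(mu, 2))
print("posterior variances:", np.round(var, 2))
```

Real agentic systems would replace the uncertainty heuristic with expected information gain or cost-aware utilities, but the loop structure is the point.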
Closing thought:
AI & biology is not lagging behind AI for code or language. It’s just playing a harder game.
The constraints are real. The data is messy. The feedback loops are slow. The consequences matter.
If 2025 clarified anything for me, it’s this:
We won’t make progress by treating biology like text. We’ll make progress by building AI that behaves more like a scientist: skeptical, iterative, and willing to be wrong.