The Biotech Revolution: How AI is Transforming Drug Discovery

In November 2020, DeepMind's AlphaFold 2 achieved something that structural biologists had spent fifty years trying to accomplish: it predicted the three-dimensional folded structures of proteins from their amino acid sequences alone, in many cases with accuracy approaching that of experimental methods. The achievement was so definitive - outperforming all prior methods by margins that suggested a qualitative rather than quantitative advance - that many senior researchers described it as the most significant scientific development they had witnessed in their careers.

The downstream consequences of AlphaFold are still unfolding. Structure predictions are now available for virtually the entire human proteome, and for the proteomes of most commercially significant pathogens and model organisms, and this has transformed the starting point for rational drug design. Where drug hunters once spent months and millions of dollars generating a crystal structure or cryo-EM dataset for a target of interest, they can now begin their work with a free prediction that is often of high quality, although confidence varies between proteins and between regions of the same protein. The question now facing the field is whether the same transformation can be achieved for the downstream steps of the drug discovery process.

The Drug Discovery Bottleneck

The dominant model of pharmaceutical drug discovery has remained remarkably stable for the past thirty years. A research team identifies a biological target - usually a protein whose dysfunction is associated with a disease state. They screen large chemical libraries, often containing millions of compounds, against that target to identify "hits" - compounds that bind the target with measurable affinity. They then embark on a lengthy process of medicinal chemistry optimisation, progressively modifying the chemical structure of the hit compound to improve its potency, selectivity, metabolic stability, and other drug-like properties, while ensuring it can be synthesised reliably and safely. This lead optimisation process typically takes two to five years and consumes a substantial fraction of a drug programme's total research cost.

The inefficiency of this process is well-documented. Despite enormous investments in automation, computational chemistry, and combinatorial synthesis, the attrition rate of drug candidates in clinical development has remained stubbornly high. Approximately 90% of compounds that enter Phase I clinical trials fail to reach the market. The average cost of developing a new approved medicine, including the cost of failures, is estimated at $2 billion or more. The pipeline of genuinely novel mechanisms of action has been contracting, as pharmaceutical companies retreat toward established target classes where the risks are better understood.
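The arithmetic behind the attrition figure is worth making explicit. The sketch below is illustrative only: it assumes every candidate costs the same amount regardless of where it fails, which overstates the cost of early failures, and the function names and the per-candidate cost are hypothetical placeholders rather than figures from any dataset.

```python
def candidates_per_approval(success_rate: float) -> float:
    """Expected number of Phase I entrants needed to yield one approval."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


def cost_per_approval(cost_per_candidate: float, success_rate: float) -> float:
    """Expected spend per approved medicine, with failures included.

    Simplifying assumption: each candidate costs the same whether or not
    it fails. Real programmes stop failures early, so this is an upper bound
    for a given per-candidate cost.
    """
    return cost_per_candidate * candidates_per_approval(success_rate)


if __name__ == "__main__":
    success_rate = 0.10  # roughly 90% attrition from Phase I, as cited above
    print(candidates_per_approval(success_rate))

    # Hypothetical per-candidate cost, chosen only to show the calculation.
    hypothetical_cost = 100e6
    print(cost_per_approval(hypothetical_cost, success_rate))
```

The point of the calculation is that at a 10% success rate the cost of each approval is dominated by the programmes that did not succeed, which is why even modest improvements in early prediction matter commercially.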

The fundamental problem is not a shortage of compounds to test or targets to pursue. It is that current predictive methods cannot model the interactions between candidate drug molecules and biological systems with enough fidelity to guide the optimisation process efficiently. The most powerful predictive tool currently available - the medicinal chemist's intuition, honed by years of experience - does not scale, is expensive, and varies widely in quality.

What AI Changes About Drug Discovery

The application of machine learning to drug discovery is not new - computational approaches have been part of the medicinal chemistry toolkit for decades. What is new is the dramatic increase in scale and capability enabled by large language models, diffusion models, and other foundation model architectures, combined with the accumulation of training datasets of unprecedented size from pharmaceutical databases, genomics research, and high-throughput experimental programmes.

The most commercially advanced applications fall into several categories. Virtual screening - using machine learning to predict which compounds in a large library are likely to have activity against a target of interest - has been substantially improved by graph neural networks trained on large experimental binding datasets. Companies such as Schrödinger have demonstrated meaningful improvements over physics-based approaches for certain target classes. Lead optimisation has been transformed by generative models that can explore the chemical space around a promising lead compound more systematically and intelligently than traditional analogue synthesis programmes.
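For readers less familiar with virtual screening, the basic shape of the task is to score every compound in a library and keep the top of the ranking for experimental follow-up. The sketch below uses Tanimoto similarity to known active compounds as the score, which is a classical baseline and not the graph neural network approach described above; in a learned system the scoring function is the part a trained model would replace. Fingerprints are represented as plain sets of integer bit positions, and all compound names and function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint here is simply the set of "on" bit positions (an assumption
# made for illustration; real toolkits use dedicated bit-vector types).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits divided by total distinct bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_library(
    known_actives: List[Fingerprint],
    library: Dict[str, Fingerprint],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score each library compound by its best similarity to any known
    active, then return the top_k compounds, highest score first."""
    scored = []
    for name, fingerprint in library.items():
        score = max(
            (tanimoto(fingerprint, active) for active in known_actives),
            default=0.0,
        )
        scored.append((name, score))
    # Sort by descending score, breaking ties by name for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    actives = [frozenset({1, 2, 3, 4})]
    library = {
        "cmpd_a": frozenset({1, 2, 3, 4}),
        "cmpd_b": frozenset({1, 2, 9}),
        "cmpd_c": frozenset({7, 8}),
    }
    for name, score in rank_library(actives, library, top_k=2):
        print(name, round(score, 2))
```

A similarity baseline like this can only find compounds that resemble what is already known, which is the limitation that learned models aim to overcome.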

The most exciting development, in our view, is the emergence of foundation models for molecular design that can generate novel chemical structures with specified activity profiles from scratch, rather than simply searching within known chemical space. These models, trained on datasets of 100 million or more experimental data points combining chemical structure with biological activity measurements, are beginning to demonstrate the ability to identify leads in chemical regions that traditional methods would never have explored. Our portfolio company MolPath AI is building in this space, and the early results from pharmaceutical partnership programmes are genuinely encouraging.

The Biological Complexity Caveat

Any responsible assessment of AI in drug discovery must grapple honestly with the biological complexity problem. Drug development fails not primarily because we cannot design molecules that bind their targets, but because we cannot predict whether binding the target will actually produce the desired therapeutic effect in a living organism, and will not produce unacceptable toxicity. These are problems of biological system complexity that are, at present, far beyond the ability of any AI system to fully model.

The cascade of events from drug-target binding to clinical outcome involves thousands of interacting proteins, feedback loops, compensation mechanisms, and patient-specific variables that collectively defy reduction to a tractable computational model. Many of the drug candidates that fail in Phase II - and most candidates do fail there - are cases where early molecular optimisation was successful but the biological system behaved differently from what the preclinical models predicted.

This is not an argument against AI-assisted drug discovery. It is an argument for calibrated expectations about where in the drug development process AI creates genuine value. The areas where the impact is clearest and most near-term are in the early stages - target identification, structure prediction, hit generation, and lead optimisation - where the computational models are most tractable and the experimental feedback loops are fastest. The further downstream you go, into the clinic and toward human biology at full complexity, the less current AI approaches can help.

The Platform Business Model Question

From a venture investment perspective, one of the most important questions in AI drug discovery is whether the sustainable business model is to build a drug development platform and license it to pharmaceutical companies, or to use the platform to build a wholly-owned drug pipeline. The answer has significant implications for how to evaluate and value companies in this space.

Platform models offer faster revenue and lower risk, since they allow the company to earn service fees and milestone payments from pharmaceutical partners without bearing the full cost and risk of clinical development. The challenge is that platform competitive advantages are difficult to sustain - pharmaceutical companies are building their own AI capabilities, and the barriers to entry in computational chemistry software are lower than in proprietary chemistry or biology.

Pipeline models offer the potential for much larger ultimate returns, since successful drug approvals are worth billions of dollars. They also create more defensible competitive positions, since the intellectual property in a novel therapeutic compound is much harder to replicate than the intellectual property in a software platform. The challenge is the capital intensity and the long timeline to value realisation.

The companies we find most compelling are those that are using their AI platform to build a proprietary pipeline, but are structuring their operations to generate near-term platform revenue that covers a meaningful portion of their operating costs while the pipeline matures. MolPath AI, for example, has built a partnership business that funds a substantial portion of its own R&D spend, while retaining co-development rights on compounds identified from its platform. This is exactly the right approach for a capital-efficient biotech at the early stage.

The RNA Therapeutics Parallel

I want to draw a parallel that I think is genuinely instructive for understanding where AI drug discovery is in its development cycle. The mRNA therapeutics field, which delivered highly effective COVID-19 vaccines that were first authorised in December 2020 and deployed at scale in 2021, had been in development for over thirty years before it achieved commercial success at scale. The underlying science - that you could encode instructions for protein synthesis in a single-stranded RNA molecule and deliver it to cells - was established in the 1980s and 1990s. The challenges of chemical modification to improve stability, lipid nanoparticle delivery systems, and manufacturing scale-up took decades to solve, and required sustained investment through multiple cycles of disappointment and renewal.

AI drug discovery is, I believe, at a comparable stage of its development. The fundamental science - that large neural networks trained on chemical and biological data can make useful predictions about molecular properties - has been established. The translation into commercial drug development pipelines is underway. The full transformative potential of the technology, across all stages of the drug development process, will not be realised for many years. But the investment case at the early stage is strong for the same reason the mRNA investment case was strong in the early 2000s: the science works, the commercial pathway is clear, and the companies building in this space have the potential to transform an industry that is ready to be transformed.

We are active investors across multiple AI drug discovery modalities - small molecules, RNA therapeutics, and diagnostic applications - and we expect this to remain a major focus of our investment activity for the foreseeable future.