From AlphaFold to the Clinic: How Protein Structure Prediction Is Transforming Therapeutic Design


In November 2020, DeepMind's AlphaFold2 achieved what the structural biology community had considered one of the field's grand challenge problems: predicting the three-dimensional structure of a protein from its amino acid sequence with accuracy comparable to experimental determination. The Critical Assessment of Protein Structure Prediction (CASP14) competition results were unambiguous — AlphaFold2 outperformed all competing methods by a margin that made the result feel less like an incremental advance and more like a phase transition.

The subsequent open-access release of AlphaFold predictions for essentially the entire human proteome — approximately 20,000 protein structures — and then the expanded database covering over 200 million proteins across the tree of life, represented one of the most significant scientific data releases in the history of biology. For drug discovery, the implications are profound and still unfolding. But the relationship between structural prediction and therapeutic development is more complex than the initial headlines suggested. Understanding the opportunities and the limitations is essential for anyone seeking to invest in, or build, the next generation of computational biology companies.

What AlphaFold Solved and What It Did Not

AlphaFold2 solved the problem of static protein structure prediction with remarkable accuracy: a median TM-score above 0.9 for most globular proteins, on a scale where 1.0 is a perfect match and values above roughly 0.5 generally indicate the correct fold. This achievement unlocks several capabilities that were previously bottlenecked by the time and cost of experimental structure determination. X-ray crystallography, cryo-electron microscopy, and NMR spectroscopy are powerful but expensive, slow, and not applicable to all proteins. For the large fraction of the human proteome that had no experimentally determined structure, AlphaFold2 provides a structural reference that is good enough for many downstream applications.
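
For readers less familiar with the metric, the sketch below shows how a TM-score is computed for a single, already superimposed pair of structures, given the distance in angstroms between each aligned residue pair. Production tools also search for the superposition that maximises the score, which this sketch does not attempt. The function name and the example distances are our own and purely illustrative.

```python
from typing import Sequence


def tm_score_fixed_superposition(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition (no search over alignments).

    distances: distance in angstroms between each aligned residue pair.
    target_length: number of residues in the reference (target) structure.
    """
    if target_length < 22:
        # The distance scale d0 approaches zero for very short chains;
        # this sketch simply declines to handle them.
        raise ValueError("sketch only supports targets of 22 residues or more")
    # Length-dependent distance scale from the standard TM-score definition.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    # Each aligned pair contributes between 0 and 1; unaligned residues contribute 0.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


length = 100
d0_for_length = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8

# Two illustrative extremes: a perfect match, and every residue off by exactly d0.
perfect = tm_score_fixed_superposition([0.0] * length, length)
displaced = tm_score_fixed_superposition([d0_for_length] * length, length)

print(f"identical structures: {perfect:.2f}")
print(f"every residue displaced by d0: {displaced:.2f}")
```

Because the distance scale grows with chain length, the score is designed to be comparable between small and large proteins, which is what makes a single median figure meaningful across a diverse test set.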

However, several limitations are worth understanding carefully. First, AlphaFold2 predicts equilibrium structures of isolated proteins. Most proteins in biological systems exist not in isolation but in complexes with other proteins, nucleic acids, and small molecules. While AlphaFold-Multimer and subsequent developments have extended the approach to protein complexes, prediction of large, dynamic, heterogeneous assemblies remains substantially more challenging. Second, proteins are not static objects. They fluctuate, undergo conformational changes, and adopt different states depending on context. The AlphaFold2 structure represents the most probable conformational state, so cryptic binding sites (pockets that are accessible only in alternative conformations) are invisible to a static prediction. Third, the model cannot assign a meaningful structure to intrinsically disordered regions; its confidence estimates (pLDDT scores) flag them as low confidence rather than resolving them. These regions constitute approximately 30 percent of the human proteome and are particularly important as therapeutic targets because they mediate many protein-protein interactions involved in disease.
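
The third point can be checked directly, because AlphaFold model files store the per-residue pLDDT score in the B-factor column of each atom record. The sketch below assumes a PDB-format file and uses function names of our own choosing. It reads those scores and lists stretches of consecutive low-confidence residues. The confidence bands follow the thresholds used by the AlphaFold Protein Structure Database (90, 70, and 50). The toy chain is generated in code rather than taken from a real prediction, and low pLDDT should be read as a proxy for disorder, not proof of it.

```python
from typing import Dict, List, Tuple

Residue = Tuple[str, int]  # (chain ID, residue number)

# pLDDT runs from 0 to 100; anything below the last threshold is "very low".
CONFIDENCE_BANDS = [(90.0, "very high"), (70.0, "confident"), (50.0, "low")]


def read_plddt(pdb_text: str) -> Dict[Residue, float]:
    """Read one pLDDT value per residue from the B-factor column of CA atoms."""
    scores: Dict[Residue, float] = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, chain 22,
        # residue number 23-26, B-factor 61-66 (1-indexed).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[(line[21], int(line[22:26]))] = float(line[60:66])
    return scores


def confidence_band(plddt: float) -> str:
    for threshold, label in CONFIDENCE_BANDS:
        if plddt >= threshold:
            return label
    return "very low"


def low_confidence_segments(
    scores: Dict[Residue, float], cutoff: float = 50.0, min_length: int = 3
) -> List[Tuple[Residue, Residue]]:
    """Return (first, last) residues of each run of consecutive residues below cutoff."""
    segments: List[Tuple[Residue, Residue]] = []
    run: List[Residue] = []
    for (chain, resnum), value in sorted(scores.items()):
        contiguous = bool(run) and run[-1] == (chain, resnum - 1)
        if value < cutoff and (not run or contiguous):
            run.append((chain, resnum))
            continue
        # The current run has ended: keep it if long enough, then start over.
        if len(run) >= min_length:
            segments.append((run[0], run[-1]))
        run = [(chain, resnum)] if value < cutoff else []
    if len(run) >= min_length:
        segments.append((run[0], run[-1]))
    return segments


def toy_ca_record(serial: int, resnum: int, plddt: float) -> str:
    """Build a PDB-format CA atom line for chain A (illustrative data only)."""
    return (f"ATOM  {serial:5d}  CA  ALA A{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}           C")


# Hypothetical pLDDT values for a seven-residue chain.
toy_plddt = [95.0, 92.0, 45.0, 40.0, 38.0, 72.0, 30.0]
toy_pdb = "\n".join(toy_ca_record(i, i, p) for i, p in enumerate(toy_plddt, start=1))

scores = read_plddt(toy_pdb)
segments = low_confidence_segments(scores)

for residue, value in scores.items():
    print(residue, value, confidence_band(value))
print("low-confidence segments:", segments)
```

In practice, a team would run a check like this before docking or pocket detection and treat any binding site that overlaps a low-confidence segment as needing experimental support.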

The practical consequence is that AlphaFold structures are an excellent starting point for structure-based drug design, but they are not a replacement for careful experimental characterisation of target biology. The companies best positioned to create value from this technology are those that understand where the model is reliable, where it requires experimental validation, and how to integrate structural predictions with the broader biochemical and cellular context.

The Structure-Based Drug Design Revolution

Structure-based drug design (SBDD) is not a new concept. The HIV protease inhibitors developed in the late 1980s and 1990s were among the first drugs explicitly designed using three-dimensional structural information. Since then, SBDD has become a standard component of pharmaceutical industry drug discovery, with structural data guiding lead optimisation across most major therapeutic programmes.

What AlphaFold has changed is the economics and the reach of this approach. Previously, obtaining a high-quality crystal structure for a target of interest required months of work by specialist structural biologists and, frequently, significant investment in expression, purification, and crystallisation screening that did not always succeed. For a company with finite resources and many potential targets, the structural biology bottleneck constrained which targets could be prosecuted computationally. AlphaFold largely removes this bottleneck for well-folded targets that lack an experimentally determined structure.

The downstream impact is clearest in fragment-based drug discovery and virtual screening. Fragment screens — which test thousands of low-molecular-weight compounds for weak binding to a target protein — have historically required experimental structures to interpret binding modes and drive fragment elaboration. With AlphaFold structures, fragments can now be docked computationally against predicted binding sites with a fidelity that, for well-folded domains, is sufficient to prioritise experimental follow-up. Several pharmaceutical companies have reported 2-4x improvements in the efficiency of their virtual screening cascades since incorporating AlphaFold structures, though the ultimate driver of progress remains experimental binding data.

The more transformative opportunity, however, lies not in traditional SBDD but in the application of generative AI models that learn from structural data. This is where companies like our portfolio company MolPath AI are building capabilities that go substantially beyond what was possible before the structural proteomics revolution.

Generative Models for Molecular Design: The New Frontier

Traditional computational drug design is fundamentally a search problem. Given a protein binding site, the goal is to identify molecules that bind tightly and selectively. Virtual screening libraries, even large ones, sample only a tiny fraction of the approximately 10^60 molecules in chemical space. Docking algorithms are approximate, and their scoring functions have well-documented limitations in predicting affinity and selectivity reliably.
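
The scale mismatch is easy to quantify. The short calculation below takes the 10^60 estimate quoted above at face value and compares it with three library sizes that are our own illustrative assumptions, not figures from any specific vendor.

```python
from fractions import Fraction

CHEMICAL_SPACE_SIZE = 10**60  # order-of-magnitude estimate quoted in the text

# Hypothetical library sizes, chosen only to span a plausible range.
library_sizes = {
    "focused set": 10**6,
    "large catalogue": 10**9,
    "ultra-large virtual library": 10**12,
}

# Exact fraction of chemical space that each library could sample at most.
coverage = {
    name: Fraction(size, CHEMICAL_SPACE_SIZE) for name, size in library_sizes.items()
}

for name, fraction in coverage.items():
    print(f"{name}: at most {float(fraction):.0e} of chemical space")
```

Even the largest library in this example leaves the sampled fraction dozens of orders of magnitude below one percent, which is why exhaustive search is not a realistic strategy.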

Generative models approach this problem differently. Rather than searching a fixed library, they learn a probability distribution over chemical space — or, in more sophisticated implementations, over the joint space of molecular structures and protein binding sites — and sample novel molecules from that distribution with desired properties. The training data for these models combines chemical structure databases (ChEMBL contains over 2 million bioactive molecules with experimental activity data), protein structure databases (the PDB contains over 200,000 experimental structures, now complemented by the 200 million AlphaFold predictions), and increasingly, high-throughput screening datasets that capture structure-activity relationships at unprecedented scale.

The current generation of generative models for drug design falls broadly into two paradigms. Structure-based generative models (represented by approaches like DiffSBDD, RFdiffusion for protein design, and various equivariant graph neural network architectures) learn to generate ligands conditioned on the three-dimensional structure of a binding site. These models can in principle generate novel chemical scaffolds that are geometrically and chemically complementary to a target, without being constrained by existing bioactive chemical matter. Their practical utility depends critically on the quality of the structural input: for precisely determined binding sites with well-understood binding modes, they can generate genuinely useful design hypotheses. For predicted structures with uncertain binding site geometries, the output requires careful experimental validation.

Property-based generative models (represented by approaches such as REINVENT, AstraZeneca's open-source tool, as well as graph variational autoencoders and diffusion models on molecular graphs) learn to generate molecules with desired physicochemical and pharmacological properties. These models are more target-agnostic and are particularly powerful when combined with predictive models for ADMET properties (absorption, distribution, metabolism, excretion, and toxicity), the pharmacological profile that determines whether a compound can become a drug in the clinic.

MolPath AI, in which Lumino Capital invested in late 2023, has built a platform that integrates both paradigms. Its architecture combines a structure-based generator trained on AlphaFold2 structures and experimental complex structures with a multi-task ADMET predictor trained on over 200 million data points from public and proprietary datasets. The system generates novel molecular candidates conditioned simultaneously on target binding site complementarity and predicted ADMET profile, producing a multi-objective Pareto front (the set of candidates that cannot be improved on one objective without giving ground on another) that drug hunters can explore interactively. In internal validation studies against historically prosecuted targets, the system identified lead-quality molecules in an average of 14 days of computational screening, compared to a typical industry range of 6-18 months for traditional high-throughput screening followed by lead optimisation.
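
To make the idea of a Pareto front concrete, the sketch below filters a handful of hypothetical candidates scored on two objectives, a predicted binding score and a predicted ADMET score, both scaled so that higher is better. This is a generic illustration of Pareto filtering and not MolPath AI's implementation. The molecule names, scores, and function names are invented for the example.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, ...]  # one value per objective, higher is better


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical candidates: (predicted binding score, predicted ADMET score).
candidates = {
    "mol_A": (0.90, 0.40),
    "mol_B": (0.70, 0.80),
    "mol_C": (0.60, 0.70),  # worse than mol_B on both objectives
    "mol_D": (0.50, 0.95),
    "mol_E": (0.85, 0.35),  # worse than mol_A on both objectives
}

front = pareto_front(candidates)
print("Pareto-optimal candidates:", front)
```

The pairwise comparison used here scales quadratically with the number of candidates, which is fine for an illustration but would need a more efficient algorithm for libraries of realistic size.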

Targeting the Undruggable Proteome

Perhaps the most significant scientific opportunity opened by the structural proteomics revolution is the possibility of addressing protein targets that have historically been considered undruggable. Approximately 80 percent of human proteins associated with disease have no approved drug targeting them. Not all of these are intrinsically undruggable, but a large fraction falls into categories that have resisted traditional small molecule drug discovery: transcription factors, intrinsically disordered proteins, and large protein-protein interaction surfaces.

Transcription factors control gene expression by binding to specific DNA sequences and recruiting co-activator or co-repressor complexes. They are among the most important drivers of oncogenesis and many other diseases. But their DNA-binding domains are typically small and highly charged, which makes them difficult to target with small molecules, and their activation domains are often intrinsically disordered. AlphaFold structures suggest, in some cases, previously unrecognised structural elements in transcription factor activation domains that may represent allosteric binding sites accessible to small molecules. Several recent academic publications have described the use of AlphaFold structures to propose candidate cryptic pockets in transcription factors like MYC and p53 that were not apparent from previous structural analysis. Because these regions are often the disordered, low-confidence segments where static predictions are least reliable, such pockets remain hypotheses until confirmed experimentally.

Protein-protein interactions (PPIs) are another historically intractable target class. PPI interfaces are typically large and flat, presenting few of the deep hydrophobic pockets that small molecules bind well. But not all PPI interfaces are equal: "hot spot" residues contribute disproportionately to binding energy, and targeting hot spot regions, often with fragment-based and structure-guided approaches, has produced PPI inhibitors for a growing number of targets including MDM2/p53 (the nutlin class, such as idasanutlin), BCL-2 family proteins (navitoclax and venetoclax), and BET bromodomains. AlphaFold protein complex predictions now make it possible to computationally propose PPI hot spots across the entire interactome, opening a systematic approach to PPI inhibitor discovery that was previously bottlenecked by the difficulty of experimentally determining complex structures.
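
A small example shows what "disproportionately" means in practice. Hot spots are commonly defined through alanine scanning: a residue counts as a hot spot if mutating it to alanine costs roughly 2 kcal/mol or more of binding free energy. The sketch below applies that convention to invented values for a single interface. The residue labels and energies are hypothetical, and summing per-residue values is a simplification, since contributions are not strictly additive.

```python
from typing import Dict

# Commonly used convention, treated here as an adjustable assumption.
HOT_SPOT_THRESHOLD_KCAL_PER_MOL = 2.0

# Hypothetical alanine-scanning results: residue label -> loss of binding
# free energy on mutation to alanine, in kcal/mol.
ddg_by_residue: Dict[str, float] = {
    "L10": 0.3,
    "F14": 2.8,
    "S17": 0.1,
    "W18": 3.4,
    "K21": 0.6,
    "L22": 2.1,
    "E25": 0.4,
    "N28": 0.2,
}


def find_hot_spots(
    ddg: Dict[str, float], threshold: float = HOT_SPOT_THRESHOLD_KCAL_PER_MOL
) -> Dict[str, float]:
    """Return the residues whose alanine mutation costs at least `threshold`."""
    return {residue: value for residue, value in ddg.items() if value >= threshold}


hot_spots = find_hot_spots(ddg_by_residue)

# Share of the summed energetic cost carried by the hot spot residues.
hot_spot_share = sum(hot_spots.values()) / sum(ddg_by_residue.values())

print("hot spots:", sorted(hot_spots))
print(f"{len(hot_spots)} of {len(ddg_by_residue)} residues carry "
      f"{hot_spot_share:.0%} of the summed energetic cost")
```

In this invented interface, three of eight residues account for most of the measured effect, which is the pattern that makes a small molecule covering only part of a large, flat surface a plausible inhibitor.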

Targeted protein degradation — exemplified by PROTAC and molecular glue technologies — provides an orthogonal approach to the undruggable proteome by exploiting the cellular degradation machinery rather than trying to directly modulate protein activity. Structural prediction is becoming central to PROTAC design: ternary complex models (predicting the structure of the target-PROTAC-E3 ligase complex) are increasingly being used to guide linker length and geometry, improving degradation efficiency. Several academic groups have published promising results using AlphaFold-Multimer ternary complex models, though the accuracy of these predictions for PROTAC complexes is still lower than for globular protein targets.

From Target to Clinic: The Pipeline Reality

Despite the genuine excitement about computational drug design, it is important to maintain a clear-eyed view of where the field stands in relation to clinical impact. As of early 2026, no drug designed de novo using generative AI methods has yet reached regulatory approval, though several candidates generated with AI assistance are in clinical trials. Insilico Medicine's INS018_055, a fibrosis candidate designed using generative AI, entered Phase II trials in 2023 and represents the most advanced clinical-stage example of AI-driven drug design.

The pipeline reality is that computational methods — including AlphaFold-based SBDD and generative molecular design — are most powerful at the earlier stages of drug discovery: target identification, hit generation, and lead optimisation. These are precisely the stages where traditional drug discovery is most expensive and time-consuming. The FDA approval process itself — Phase I, II, and III clinical trials — remains a biological and regulatory process that computational tools cannot shortcut. A company that reduces lead identification from 18 months to 3 months still faces the same 10-15 year clinical development timeline as its traditional competitors.
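
The arithmetic behind that last point is worth spelling out. Using only the figures quoted above (18 versus 3 months of lead identification, followed by 10 to 15 years of clinical development) and ignoring every other preclinical stage, which is a simplifying assumption of ours, the sketch below estimates how much of the overall timeline the computational speed-up removes.

```python
LEAD_ID_TRADITIONAL_MONTHS = 18
LEAD_ID_ACCELERATED_MONTHS = 3
CLINICAL_DEVELOPMENT_YEARS = (10, 15)  # lower and upper figures from the text


def total_months(lead_identification_months: int, clinical_years: int) -> int:
    """Lead identification plus clinical development, in months."""
    return lead_identification_months + 12 * clinical_years


# Fractional reduction in the total timeline for each clinical duration.
fraction_saved = {}
for years in CLINICAL_DEVELOPMENT_YEARS:
    before = total_months(LEAD_ID_TRADITIONAL_MONTHS, years)
    after = total_months(LEAD_ID_ACCELERATED_MONTHS, years)
    fraction_saved[years] = (before - after) / before
    print(f"{years}-year clinical phase: {before} -> {after} months "
          f"({fraction_saved[years]:.1%} shorter)")
```

Under these assumptions the saving is real but modest relative to the whole journey, which is why the investment case rests on platform effects rather than on speed alone.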

This has important implications for investment. The companies that will create the most value from computational drug design are not necessarily those that discover the most drugs, but those that build the most defensible data-generating assets and the most scalable platforms for applying those assets across multiple therapeutic areas. Platform value is cumulative: each programme generates data that trains better models and validates more biological hypotheses, creating a widening competitive moat over time.

The European Computational Biology Ecosystem

Europe has several structural advantages in computational biology that are worth acknowledging. The European Bioinformatics Institute in Cambridge, the Pasteur Institute in Paris, the Max Planck Institute for Multidisciplinary Sciences in Göttingen (formed in 2022 from the former Max Planck Institute for Biophysical Chemistry), and the Wellcome Sanger Institute are among the world's most productive computational biology research institutions. DeepMind itself, while now part of Alphabet, was founded in London and maintains significant research operations here.

The regulatory environment in the EU, while more conservative than the FDA's in some respects, is becoming more sophisticated in its treatment of AI-assisted drug discovery. The EMA's PRIME designation scheme for breakthrough therapies is applicable to AI-discovered compounds, and the agency has published increasingly detailed guidance on the use of computational predictions in regulatory submissions. For companies building products targeting European markets, understanding the regulatory pathway for computationally derived data is a strategic advantage.

At Lumino Capital, we are actively building our pipeline of investments at the intersection of structural biology, computational chemistry, and therapeutic development. The combination of a strong academic base, more affordable computational talent relative to San Francisco, and access to first-class hospital systems for translational biology makes Europe a compelling location for the next generation of companies in this space. We believe the companies that will transform drug discovery over the next decade will include several built in London, Munich, Amsterdam, and Stockholm.

Conclusion

The protein folding revolution has genuinely transformed the landscape for therapeutic design. AlphaFold2 has removed a long-standing structural biology bottleneck and enabled applications of structure-based drug design across the entire human proteome for the first time. The generative AI models that are being trained on this structural data are beginning to demonstrate genuine capability in novel molecular design, as evidenced by compounds entering clinical trials. The undruggable proteome, the large share of disease-relevant proteins that have resisted traditional drug discovery, is becoming more tractable.

These developments create a compelling opportunity for investors who understand both the biology and the technology. The companies that will succeed are those that combine deep scientific expertise in structural biology and medicinal chemistry with genuine machine learning capability, and that build the data generation infrastructure needed to train models that improve over time. Those companies exist, they are being founded by some of the best-trained scientists in Europe, and they represent — in our view — one of the most important investment categories of the coming decade.