Designing tomorrow’s biologics: how AI is enabling de novo protein discovery

Molecular recognition – proteins binding with specificity to other molecules – lies at the heart of biology. From immune defence to cellular signalling, life is orchestrated by protein interactions that have emerged over billions of years through evolutionary sequence searching. Yet nature’s reach, while extensive, has only scratched the surface of the immense protein sequence space. Artificial intelligence (AI)-powered methods are now offering a means to go beyond this natural repertoire, fundamentally changing our approach to biologics discovery.

Industrial biologics discovery has, by convention, been tethered to the antibody paradigm. Antibodies’ natural role as binders has made them central to drug development, diagnostics and research tools. Traditional methods like animal immunisation and phage display dominate hit discovery pipelines. While effective, these approaches have intrinsic limitations: long timelines, high costs and reliance on sampling a narrow subset of potential sequences. Phage display libraries typically contain up to 10⁹ sequences, and even the immune repertoire of a single animal may explore around 10⁸ sequences. In contrast, the number of theoretical sequences for a modest 100-amino-acid protein exceeds 10¹³⁰.¹ The result is that vast swathes of the protein universe remain unexplored. Additionally, conventional discovery methods have failed for many validated therapeutic targets. According to the Therapeutic Targets Database, only a minority of known disease-involved targets have associated therapeutic protein binders.²
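To put these orders of magnitude side by side, the short Python sketch below reproduces the arithmetic. It assumes the standard 20-letter amino acid alphabet and takes the library sizes quoted above at face value; the constant and function names are illustrative only.

```python
import math

AMINO_ACIDS = 20             # standard amino acid alphabet (assumption)
PROTEIN_LENGTH = 100         # residues, as in the example above
PHAGE_LIBRARY = 10**9        # upper figure quoted for phage display
IMMUNE_REPERTOIRE = 10**8    # figure quoted for a single animal


def log10_sequence_space(length: int, alphabet: int = AMINO_ACIDS) -> float:
    """Return log10 of the number of possible sequences of the given length."""
    return length * math.log10(alphabet)


log_space = log10_sequence_space(PROTEIN_LENGTH)
print(f"Theoretical sequences: ~10^{log_space:.1f}")

# Fraction of the space that each method could sample, on a log10 scale.
for name, size in [("phage display", PHAGE_LIBRARY),
                   ("immune repertoire", IMMUNE_REPERTOIRE)]:
    log_fraction = math.log10(size) - log_space
    print(f"{name}: ~10^{log_fraction:.1f} of the space")
```

Even the largest library samples a fraction of the space on the order of one part in 10¹²¹, which is why exhaustive experimental screening is not an option.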

Generative AI (genAI) methods, adapted from fields like language modelling and image generation, are making it possible to reach beyond the natural repertoire. Large language models such as ProtGPT2 have repurposed natural language architectures to learn the syntax and semantics of protein sequences, generating novel proteins with unprecedented diversity.⁴ Meanwhile, diffusion-based generative models, first developed for producing realistic images, have been reimagined for proteins. Tools like RFdiffusion and hallucination-based approaches embed structure prediction into iterative design frameworks, allowing AI to sculpt sequences that fold into highly specific 3D forms.⁵,⁶

Binder design is rapidly progressing across multiple modalities: miniproteins, peptides and even de novo antibodies. Hit rates for miniprotein binders have improved from <0.01% to over 1% in just a few years.⁶,⁷ Some of these hits now exhibit nanomolar or sub-nanomolar affinity, competitive with, or even superior to, traditional antibodies. De novo antibody design remains more challenging, particularly in modelling the flexible complementarity-determining region loops. Nonetheless, recent studies have reported micromolar-affinity designs, showing steady progress.⁸ While genAI is also applied in lead optimisation, refining known binders to improve pharmacological traits, the de novo design of binders represents a fundamentally new paradigm. It opens the door to discovering leads for targets previously considered undruggable and makes the early-stage biologics pipeline vastly more scalable.
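A rough way to see what this jump in hit rate means for experimental workload is to ask how many designs must be tested to expect at least one binder. The Python sketch below is a back-of-envelope illustration, not a method from the cited studies: it assumes each design succeeds independently with the same probability, and the 95% confidence level and the function name are arbitrary choices. The quoted hit rates are bounds (<0.01% and over 1%), so the outputs are indicative only.

```python
import math


def designs_needed(hit_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of designs n with P(at least one hit) >= confidence.

    Assumes independent designs with an identical hit rate, so that
    P(at least one hit) = 1 - (1 - hit_rate) ** n.
    """
    if not 0.0 < hit_rate < 1.0:
        raise ValueError("hit_rate must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hit_rate))


# Hit rates taken from the figures quoted above (0.01% and 1%).
for rate in (0.0001, 0.01):
    print(f"hit rate {rate:.2%}: test about {designs_needed(rate)} designs")
```

Under these assumptions, the screening burden falls from tens of thousands of designs to a few hundred, which is the practical sense in which the pipeline becomes more scalable.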

References

1. Dill K A et al (2011), ‘The protein folding problem: when will it be solved?’, Curr Opin Struct Biol, 21(2), 187-193
2. Visit: db.idrblab.net/ttd
3. Jumper J et al (2021), ‘Highly accurate protein structure prediction with AlphaFold’, Nature, 596(7873), 583-589
4. Ferruz N et al (2022), ‘ProtGPT2 is a deep unsupervised language model for protein design’, Nat Commun, 13, 4347
5. Visit: nature.com/articles/s41586-023-06415-8
6. Jendrusch M A et al (2025), ‘AlphaDesign: A de novo protein design framework based on AlphaFold’, Molecular Systems Biology, 1-24
7. Cao L et al (2022), ‘Design of protein-binding proteins from the target structure alone’, Nature, 605, 551-560
8. Visit: pubmed.ncbi.nlm.nih.gov/38562682/
9. Ford Versypt A N (2021), ‘Multiscale modeling in disease’, Curr Opin Syst Biol, 27, 100340
10. Coveney P V et al (2016), ‘Big data need big theory too’, Phil Trans R Soc A, 374(2080), 20160153
