Digital: AI & ML

Designing tomorrow’s biologics: how AI is enabling de novo protein discovery

Artificial intelligence-powered de novo protein design is opening new pathways in biologics discovery, allowing researchers to create synthetic proteins beyond nature’s existing repertoire

Kashif Sadiq at DenovAI Biotech

Molecular recognition – proteins binding with specificity to other molecules – lies at the heart of biology. From immune defence to cellular signalling, life is orchestrated by protein interactions that have emerged over billions of years through evolutionary sequence searching. Yet nature’s reach, while extensive, has only scratched the surface of the immense protein sequence space. Artificial intelligence (AI)-powered methods are now offering a means to go beyond this natural repertoire, fundamentally changing our approach to biologics discovery.

For over a century, natural proteins like insulin and, later, antibodies, have served as the basis for biological therapeutics. But until recently, protein engineering has largely relied on nature’s templates, discovering or modifying proteins already available via natural processes.

Industrial biologics discovery has, by convention, been tethered to the antibody paradigm. Antibodies’ natural role as binders has made them central to drug development, diagnostics and research tools. Traditional methods like animal immunisation and phage display dominate hit discovery pipelines. While effective, these approaches have intrinsic limitations: long timelines, high costs and reliance on sampling a narrow subset of potential sequences. Phage display libraries typically contain up to 10^9 sequences, and even the immune repertoire of a single animal may explore around 10^8 sequences. In contrast, the number of theoretical sequences for a modest 100-amino-acid protein exceeds 10^130.¹ The result is that vast swathes of the protein universe remain unexplored. Additionally, conventional discovery methods have failed for many validated therapeutic targets. According to the Therapeutic Targets Database, only a minority of known disease-involved targets have associated therapeutic protein binders.²
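To make the scale gap concrete, the arithmetic can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard 20-amino-acid alphabet (not stated explicitly in the text) and working in logarithms to keep the numbers manageable; the function names are illustrative only.

```python
import math

AMINO_ACIDS = 20   # assumption: the 20 standard amino acids
LENGTH = 100       # protein length used in the text


def log10_sequence_space(length: int, alphabet: int = AMINO_ACIDS) -> float:
    """Return log10 of the number of possible sequences of a given length."""
    return length * math.log10(alphabet)


def log10_fraction_sampled(library_size: float, length: int) -> float:
    """Return log10 of the fraction of sequence space a library could cover."""
    return math.log10(library_size) - log10_sequence_space(length)


space = log10_sequence_space(LENGTH)
print(f"Sequence space for {LENGTH} residues: ~10^{space:.1f}")
print(f"A 10^9 phage library covers at most ~10^{log10_fraction_sampled(1e9, LENGTH):.1f} of it")
```

Under these assumptions, even the largest library mentioned above samples a vanishingly small fraction of the possible sequences, which is the gap that in silico design aims to address.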

While structural biology has enabled rational engineering of known proteins, this process is complex, costly and slow, often limiting innovation to safer, more tractable targets.

The result is a risk calculus that discourages tackling difficult diseases and stymies progress against unmet medical needs.

The AI renaissance: predicting and designing proteins

AI is rapidly dismantling the constraints of traditional protein discovery. The landmark moment was the emergence of DeepMind’s AlphaFold, a deep learning model that achieved unprecedented accuracy in predicting protein structures from sequence alone.³ This breakthrough catalysed progress across structural biology, enabling researchers to model proteins that had resisted crystallographic methods for decades.

But protein design is the inverse of structure prediction: instead of asking ‘what shape will this sequence take?’, it asks ‘what sequence will produce a desired shape and function?’ This is the essence of de novo protein design – creating proteins entirely from scratch to fulfil a set of engineered features.

Generative AI (genAI) methods, adapted from fields like language modelling and image generation, are making this possible. Large language models such as ProtGPT2 have repurposed natural language architectures to learn the syntax and semantics of protein sequences, generating novel proteins with unprecedented diversity.⁴ Meanwhile, diffusion-based generative models, first developed for producing realistic images, have been reimagined for proteins. Tools like RFdiffusion and hallucination-based approaches embed structure prediction into iterative design frameworks, allowing AI to sculpt sequences that fold into highly specific 3D forms.⁵,⁶

These innovations enable not just the creation of enzymes or protein self-assemblies, but synthetic binder proteins – novel molecules capable of targeting disease-relevant proteins beyond the reach of antibodies.

A paradigm shift in biologics discovery

AI-driven de novo binder design represents a fundamental shift in process. Traditional methods rely on massive experimental screening; in contrast, genAI transfers the bulk of that search into in silico space. This approach breaks free from nature’s limited sampling, exploring a region of protein sequence space orders of magnitude larger than any lab library could cover.

Binder design is rapidly progressing across multiple modalities: miniproteins, peptides and even de novo antibodies. Hit rates for miniprotein binders have improved from <0.01% to over 1% in just a few years.⁶,⁷ Some of these hits now exhibit nanomolar or sub-nanomolar affinity, competitive with, or even superior to, traditional antibodies. De novo antibody design remains more challenging, particularly in modelling the flexible complementarity-determining region loops. Nonetheless, recent studies have reported micromolar-affinity designs, showing steady progress.⁸ While genAI is also applied in lead optimisation, refining known binders to improve pharmacological traits, the de novo design of binders represents a fundamentally new paradigm. It opens the door to discovering leads for targets previously considered undruggable and makes the early-stage biologics pipeline vastly more scalable.
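A rough sense of what this hit-rate improvement means at the bench can be worked out with a short Python sketch. It assumes, as a simplification not made in the text, that every design is an independent trial with the same probability of being a hit, and it uses an arbitrary 95% target for finding at least one binder; the function name is hypothetical.

```python
import math


def designs_needed(hit_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of designs giving at least `confidence` probability
    of one or more hits, assuming independent trials with equal hit rate."""
    if not 0.0 < hit_rate < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("hit_rate and confidence must lie strictly between 0 and 1")
    # P(no hit in n designs) = (1 - hit_rate)^n, which must be <= 1 - confidence
    n = math.log(1.0 - confidence) / math.log1p(-hit_rate)
    return math.ceil(n)


# Hit rates quoted in the text: 0.01% and 1%
for label, rate in [("0.01%", 1e-4), ("1%", 1e-2)]:
    print(f"hit rate {label}: ~{designs_needed(rate):,} designs to test")
```

Under these simplifying assumptions, the shift from a 0.01% to a 1% hit rate reduces the number of designs that must be tested experimentally from tens of thousands to a few hundred.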

The challenges ahead: toward AI-designed drugs

Despite its promise, de novo hit discovery is only the beginning. A binder is not a drug. Beyond affinity, therapeutic proteins must possess a suite of optimised properties – stability, solubility, specificity, low immunogenicity, favourable pharmacokinetics and more.

Currently, these properties are addressed during iterative lead optimisation, often aided by AI. But an emerging vision is that of single-shot design, where all necessary drug-like features are engineered simultaneously during initial binder design. Such a leap could compress development timelines, reduce costs and accelerate translation from design to clinic.

However, this vision faces significant hurdles. High-quality data for model training remains sparse, particularly for complex properties like immunogenicity or cell-type specificity. While deep learning excels in pattern recognition, it struggles with limited or noisy data. Moreover, protein function is shaped not just by structure but by dynamics, interactions and cellular context – all challenging to capture with static models.

The systemic view: physics, complexity and integration

Proteins are dynamic, atomically detailed systems governed by the laws of physics. They exist not in isolation, but in crowded, multi-scale biological environments. Capturing this complexity requires more than sequence-to-structure mappings. It calls for integrative modelling: blending data-driven AI with mechanistic, physics-based simulations.

Multi-scale modelling has already been used to simulate protein interactions, cellular pathways and tissue-level responses.⁹ A future biologics platform might integrate sequence design, structural prediction, functional modelling and systemic impact into a unified pipeline. This would enable not just accurate design of binders, but reliable prediction of their effects within the human body, without exhaustive experimental screening.

The road ahead still presents obstacles. Achieving a single-shot design of drug-ready biologics will require tackling challenges in data quality, model generalisability and systemic understanding. Progress will depend on merging deep learning with physics-based modelling and embracing the full complexity of biological systems.¹⁰ In doing so, we may eventually reach a future where we no longer need to search blindly for molecules that work, but can simply design them to do exactly what we want.

This approach represents a convergence of AI and fundamental science. AlphaFold, despite its transformative impact, tackled a clean problem, predicting structure from data on solved proteins. Biologics design in the real world is messier. It requires integrating heterogeneous data sets across diverse biological scales, incorporating uncertainty and reasoning beyond current empirical boundaries.

The convergence of AI and protein science is significantly advancing biologics discovery – moving from modifying nature’s proteins to designing our own. GenAI methods have transformed de novo binder design from an academic curiosity into a viable alternative to traditional screening, opening new doors for therapeutic innovation.

References

  1. Dill K A et al (2011), ‘The protein folding problem: when will it be solved?’, Curr Opin Struct Biol, 21(2), 187-193
  2. Visit: db.idrblab.net/ttd
  3. Jumper J et al (2021), ‘Highly accurate protein structure prediction with AlphaFold’, Nature, 596(7873), 583-589
  4. Ferruz N et al (2022), ‘ProtGPT2 is a deep unsupervised language model for protein design’, Nat Commun, 13, 4347
  5. Visit: nature.com/articles/s41586-023-06415-8
  6. Jendrusch M A et al (2025), ‘AlphaDesign: A de novo protein design framework based on AlphaFold’, Molecular Systems Biology, 1-24
  7. Cao L et al (2022), ‘Design of protein-binding proteins from the target structure alone’, Nature, 605, 551-560
  8. Visit: pubmed.ncbi.nlm.nih.gov/38562682/
  9. Ford Versypt A N (2021), ‘Multiscale modeling in disease’, Curr Opin Syst Biol, 27, 100340
  10. Coveney P V et al (2016), ‘Big data need big theory too’, Phil Trans R Soc A, 374(2080), 20160153


Dr Kashif Sadiq is the founder and CEO of DenovAI Biotech, a techbio company pioneering AI-driven de novo protein design for therapeutic and biotech applications. With a 20-year academic career spanning molecular physics, protein dynamics, multi-scale modelling and AI-based biologics discovery, Dr Sadiq has held leading research positions across Europe and is recognised for his work at the intersection of computational biology and pharmaceutical innovation.