Discovery & Development: Proteomics

How proteomics and digital transformation can redefine therapeutic discovery

How are recent advances in proteomics, combined with artificial intelligence and machine learning, transforming drug discovery and development?

Veronica L DeFelice at Sapio Sciences

The human proteome represents an extraordinary reservoir of untapped therapeutic potential, offering unprecedented opportunities for transformative discovery. Recent advances in genomics and structural biology have opened new pathways to previously inaccessible protein targets, including those with dynamic binding sites and complex regulatory networks. These molecular advances hold promise for developing new treatments targeting diseases that lack conventional solutions.

The fusion of advanced proteomics, artificial intelligence (AI)-driven analytics and robust laboratory informatics is transforming drug discovery. This convergence offers a direct, dynamic view of functional biology – illuminating proteins not just as passive outputs of genetic expression but as active participants in disease pathology. Critically, it enables researchers to systematically identify, validate and prioritise novel therapeutic targets, including those involving intrinsically disordered regions long considered intractable.

These insights are also fuelling the development of next-generation therapeutics, including proteolysis-targeting chimeras (PROTACs), antibody-drug conjugates (ADCs) and RNA-based medicines that can engage biological mechanisms with unprecedented precision.

How advanced proteomics redefines target discovery

With more than 100,000 known 3D protein structures, tools like PyMOL are helping researchers to visualise and analyse the structure-function relationships of biologics.1 Transcriptomics and proteomics are both powerful research approaches, but they focus on different aspects of cellular activity. Transcriptomics studies the full set of RNA molecules (the transcriptome) to reveal gene expression patterns. Proteomics, in contrast, examines the complete set of proteins (the proteome), including their modifications and interactions, offering insight into functional molecular activity.2

Instead of relying on assumptions, researchers now have direct access to the protein layer. This includes not just presence or absence, but also quantity, condition and behaviour. That distinction matters because many diseases are caused not by missing proteins but by proteins that are misfolded, overactive or altered by post-translational modifications.

Quantitative proteomics techniques, such as stable isotope labelling by amino acids in cell culture (SILAC), tandem mass tag (TMT) labelling and isobaric tags for relative and absolute quantitation (iTRAQ), let scientists measure changes across samples and time points with high precision. Top-down proteomics, which analyses intact proteins instead of fragments, allows researchers to observe proteoforms – distinct versions of proteins shaped by modifications, splicing or truncation – that may behave differently in health and disease.
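As a rough illustration of what measuring changes across samples can look like in practice, the sketch below computes a log2 fold change per protein from reporter-ion intensities of the kind a TMT experiment produces. The protein names, channel names and intensity values are hypothetical, and the sketch assumes the channels have already been normalised, which real workflows would need to handle first.

```python
import math

# Hypothetical reporter-ion intensities per protein (channel -> intensity).
# Names and values are illustrative only, not measured data.
intensities = {
    "PROT_A": {"ctrl_1": 1000.0, "ctrl_2": 1200.0, "treated_1": 250.0, "treated_2": 300.0},
    "PROT_B": {"ctrl_1": 800.0, "ctrl_2": 820.0, "treated_1": 790.0, "treated_2": 830.0},
}
control_channels = ["ctrl_1", "ctrl_2"]
treated_channels = ["treated_1", "treated_2"]


def log2_fold_change(channels, numerator, denominator):
    """Return log2(mean numerator intensity / mean denominator intensity)."""
    num = sum(channels[c] for c in numerator) / len(numerator)
    den = sum(channels[c] for c in denominator) / len(denominator)
    return math.log2(num / den)


# Negative values mean the protein is less abundant after treatment.
fold_changes = {
    protein: log2_fold_change(channels, treated_channels, control_channels)
    for protein, channels in intensities.items()
}

for protein, lfc in fold_changes.items():
    print(f"{protein}: log2 fold change = {lfc:+.2f}")
```

In this toy data set PROT_A drops to a quarter of its control abundance (log2 fold change of -2), while PROT_B is unchanged. A real analysis would add replicate-aware statistics before calling any change significant.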

Some proteins are harder to characterise than others, but modern proteomic techniques can still gather the data needed to analyse them, enabling comprehensive target profiling.

Expanding targets with novel therapeutic modalities

While the disease landscape may remain largely unchanged, the definition of target identification is evolving. Traditional therapies focus on well-structured proteins, such as enzymes and receptors. However, many of the most critical disease drivers, particularly in oncology, neurology and immunology, lack conventional binding sites.

PROTACs can degrade proteins without requiring a deep active site.3 This makes them ideal for targets like KRAS or transcription factors such as STAT proteins.4 Rather than inhibiting a protein’s function, PROTACs recruit the cellular degradation machinery (E3 ubiquitin ligases) to tag and remove proteins entirely. As of 2024, three PROTACs, ARV-471 (phase 3), BMS-986365 and BGB-16673 (phase 3), are in late-stage trials.4 About six others are in phase 1 or 2, and development has accelerated in recent years, with ARV-110 and ARV-471 being among the most advanced.5 Designing effective PROTACs still requires insight into how a degrader behaves in cells. Proteomics provides it by using tools like TMT labelling to measure selectivity, track off-target effects and map downstream signalling. Workflows such as target-guided degradation optimisation and DEL-PROTAC test candidates in parallel across the proteome.

ADCs combine precision targeting with cytotoxic payloads. Their success depends on identifying tumour-specific surface proteins. Proteomics provides this level of specificity by mapping the ‘surfaceome’ of cancer cells. In Alzheimer’s research, for instance, ADCs targeting amyloid are tracked in cerebrospinal fluid to monitor both plaque clearance and inflammatory biomarkers.6

RNA-based therapies, including mRNA vaccines, siRNAs and antisense oligos, also benefit. Their effectiveness depends on delivery and correct protein expression. Single-cell proteomics reveals uptake patterns and confirms that encoded proteins are properly folded and functional. Likewise, it helps identify early biomarkers of therapeutic response and adverse effects.

Proteomics unites these modalities, enabling discovery and validation across therapeutic strategies. It’s the system that enables discovery produced by a biological system in the context of therapeutic proteins.

Proteomics is not just expanding the target landscape; it is redefining drug design itself.

Managing the complexity of modern discovery

Modern proteomics generates vast amounts of data. One experiment can capture thousands of proteins across time points, conditions and even individual cells. That is an incredible advance, but the challenge it brings lies in data interpretation rather than data generation.

A single study may produce terabytes of proteomic data. These data sets are complex, and annotations may be incomplete. Even well-designed experiments can have missing values or batch effects. Often, the number of protein variables far exceeds the number of samples, limiting the use of traditional statistical methods. Meaningful insight depends not on more data, but on more connected data.
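To make the missing-value problem concrete, the sketch below drops proteins that are missing in too many samples and fills the remaining gaps with each protein's median. The abundance values, the protein names and the 50% threshold are hypothetical, and median imputation is used here only because it is simple; it is one of many possible strategies and not a recommendation from the article.

```python
from statistics import median

# Hypothetical abundance matrix: protein -> one value per sample (None = missing).
abundance = {
    "PROT_A": [10.0, 12.0, None, 11.0],
    "PROT_B": [None, None, None, 5.0],
    "PROT_C": [7.0, 8.0, 9.0, 10.0],
}


def filter_and_impute(matrix, max_missing_fraction=0.5):
    """Drop sparsely observed proteins, then median-impute remaining gaps."""
    cleaned = {}
    for protein, values in matrix.items():
        observed = [v for v in values if v is not None]
        missing_fraction = 1 - len(observed) / len(values)
        if not observed or missing_fraction > max_missing_fraction:
            continue  # too sparse to impute responsibly
        fill = median(observed)
        cleaned[protein] = [fill if v is None else v for v in values]
    return cleaned


cleaned = filter_and_impute(abundance)
for protein, values in cleaned.items():
    print(protein, values)
```

Here PROT_B is discarded because three of its four values are missing, while the single gap in PROT_A is filled with 11.0. Batch effects and the imbalance between variables and samples need separate treatment that this sketch does not attempt.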

Without structure, insights remain hidden. That’s why standardised, reproducible workflows are essential. Tools like laboratory information management systems (LIMS) and electronic laboratory notebooks (ELNs) help bring order. They connect samples to methods and researchers to results. When used well, they ensure that data can be shared, repeated and trusted.

To understand complex disease biology, researchers must look across data types. Proteomic data must be viewed alongside genomics, transcriptomics, metabolomics and clinical outcomes. This is the promise of multi-omics integration: deeper insight from diverse sources. The UK Biobank offers a model. By linking proteomic, genomic and clinical data, researchers have uncovered causal drivers, such as spondin-1 in cardiac disease.7 This reduces the risk of pursuing ineffective targets and guides rational design.
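A minimal sketch of the integration idea, assuming two data sets keyed by the same gene identifier: it joins transcript-level and protein-level changes and flags genes where the two layers disagree. The gene names, values and the discordance threshold are hypothetical, and real multi-omics integration involves identifier mapping and statistical modelling well beyond this join.

```python
# Hypothetical per-gene log2 fold changes from two omics layers.
transcript_log2fc = {"GENE1": 1.5, "GENE2": 0.1, "GENE3": -2.0}
protein_log2fc = {"GENE1": 1.2, "GENE3": 0.0, "GENE4": 0.9}


def integrate(rna, protein):
    """Join the two layers on genes measured in both."""
    shared = sorted(rna.keys() & protein.keys())
    return [
        {"gene": g, "rna": rna[g], "protein": protein[g], "gap": protein[g] - rna[g]}
        for g in shared
    ]


def discordant(records, threshold=1.0):
    """Return genes whose protein change differs from the RNA change."""
    return [r["gene"] for r in records if abs(r["gap"]) >= threshold]


records = integrate(transcript_log2fc, protein_log2fc)
flagged = discordant(records)
print(flagged)
```

Only GENE1 and GENE3 appear in both layers, and GENE3 is flagged because its transcript falls while its protein level does not. Cases like this are where the protein layer adds information that transcript data alone would miss.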

Achieving this clarity requires infrastructure: machine learning (ML) systems that can model high-dimensional data; deep learning models that detect hidden features; and knowledge graphs that map relationships across data sets. Better connections between data are what unlock precise discovery.

Rational drug design powered by digital transformation

For years, the bottleneck in drug discovery was data generation. Today, the challenge lies in converting complex, high-dimensional data into actionable insight.8 Digital transformation plays a pivotal role – it is a fundamental shift in how scientific research is conducted. This evolution involves connecting people, instruments and processes through intelligent, interoperable systems.9

At the core of this transition are LIMS and ELNs. These systems form the backbone of data capture, regulatory compliance and traceability. A well-configured LIMS can track every sample, often using barcodes or radio-frequency identification to reduce errors. ELNs document methods and observations in searchable, version-controlled formats. When integrated, LIMS and ELNs streamline workflows and ensure data reliability for downstream analysis, including ML applications.
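The traceability described above can be pictured as a small data structure that ties each barcoded sample to the methods applied and the results produced. The sketch below is a toy model only: the class names, method names and identifiers are hypothetical and do not represent the API of any LIMS or ELN product.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    barcode: str
    description: str
    # Each run pairs a method identifier with the result it produced.
    runs: list = field(default_factory=list)


class SampleRegistry:
    """Toy registry linking barcodes to methods and results."""

    def __init__(self):
        self._samples = {}

    def register(self, barcode, description):
        if barcode in self._samples:
            raise ValueError(f"duplicate barcode: {barcode}")
        self._samples[barcode] = Sample(barcode, description)

    def link(self, barcode, method_id, result_id):
        # Raises KeyError if the barcode was never registered.
        self._samples[barcode].runs.append((method_id, result_id))

    def trace(self, barcode):
        """Return every (method, result) pair recorded for a sample."""
        return list(self._samples[barcode].runs)


registry = SampleRegistry()
registry.register("BC-0001", "plasma aliquot")
registry.link("BC-0001", "METHOD-TMT-01", "RESULT-0001")
history = registry.trace("BC-0001")
print(history)
```

Rejecting duplicate barcodes and unknown samples at the point of entry is the design choice that matters here, since downstream analysis can only be as reliable as the links between sample, method and result.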

However, data organisation is only the beginning. The critical value lies in interpretation.

AI and ML assist researchers in identifying patterns beyond human perception.10 AI models can integrate structural biology, gene expression and proteomic data sets to prioritise drug candidates. Deep learning algorithms simulate protein-ligand binding, even in intrinsically disordered proteins that lack defined binding pockets. Knowledge graphs further map complex biological interactions into frameworks that are both searchable and testable.
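One way to picture a knowledge graph that is both searchable and testable is as a set of subject-relation-object triples that can be queried for a chain of evidence. The sketch below uses placeholder entities and relations, which are hypothetical and assert no real biology, and a plain breadth-first search rather than any particular graph database.

```python
from collections import defaultdict, deque

# Hypothetical triples (subject, relation, object); placeholders only.
triples = [
    ("COMPOUND_X", "degrades", "PROTEIN_A"),
    ("PROTEIN_A", "activates", "PROTEIN_B"),
    ("PROTEIN_B", "associated_with", "DISEASE_Y"),
    ("PROTEIN_C", "associated_with", "DISEASE_Y"),
]


def build_graph(edges):
    """Index triples by subject for fast traversal."""
    graph = defaultdict(list)
    for subject, relation, obj in edges:
        graph[subject].append((relation, obj))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; return the shortest chain of triples, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, path + [(node, relation, neighbour)]))
    return None


graph = build_graph(triples)
path = find_path(graph, "COMPOUND_X", "DISEASE_Y")
for step in path:
    print(" -> ".join(step))
```

The returned path is a hypothesis that can be checked experimentally one link at a time, which is what makes such a framework testable as well as searchable.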

The outcomes are tangible. Drug candidates can be screened earlier, off-target risks flagged sooner and attrition rates reduced. AI-driven platforms, such as Atomwise and Insilico Medicine, already use these capabilities to model molecular docking and rank compound libraries. These technologies have shortened development timelines while improving candidate quality.

In this context, rational design does not require reinventing pharmacology. Instead, it calls for data-driven decision-making – from initial target selection to molecular optimisation. It represents a move toward smarter, faster and more predictive therapeutic development.

Conclusion

Biopharma R&D is no longer constrained by the ability to observe biology but by the capacity to interpret it clearly, consistently and at scale. The convergence of proteomics, digital infrastructure and innovative therapeutics marks a paradigm shift in biomedicine. It is already transforming how targets are identified, how biomolecules are designed and how therapies are brought to patients. By combining experimental proteomics with machine-learned predictions and target safety profiles, researchers are beginning to study more of the human proteome previously labelled as deeply complex.

References

  1. Visit: nature.com/articles/s41586-021-03819-2
  2. Tzec-Interián J A et al (2025), ‘Bioinformatics perspectives on transcriptomics: A comprehensive review of bulk and single-cell RNA sequencing analyses’, Quant Biol, 13(2), e78
  3. Crews C M et al (2024), ‘PROTACs: Current and Future Potential as a Revolutionary Therapeutic Modality’, Molecular Cancer Therapeutics, 23(4), 454-463
  4. Bai L et al (2022), ‘A potent and selective small-molecule degrader of STAT3 achieves complete tumor regression in vivo’, Signal Transduct Target Ther, 7, 173
  5. Zhou Q et al (2024), ‘Antibody–Drug Conjugates for Cancer Therapy’, in Innovative Drug Delivery Systems, Singapore: Springer Nature Singapore, 239-267
  6. Li X et al (2024), ‘Development of an anti pGlu3 Aβ monoclonal antibody to target Aβ aggregates for immunotherapy of Alzheimer’s disease’, Molecular Neurodegeneration, 19(1), 55
  7. Sun B B et al (2023), ‘Integrative proteomic analyses across common cardiac diseases yield mechanistic insights and enhanced prediction’, Nature Cardiovascular Research, 2, 826-837
  8. Pino L K et al (2025), ‘A data ecosystem for quantitative proteomics’, Journal of Proteome Research, 24(5), 1533-1544
  9. Lau E et al (2025), ‘Chemical proteomics for target discovery’, Cell Chemical Biology, 32(6), 309-323
  10. Vora L K et al (2023), ‘Artificial intelligence in pharmaceutical technology and drug delivery design’, Pharmaceutics, 15(7), 1916


Veronica L DeFelice is director of Biologic Products at Sapio Sciences. She began her career in biologics discovery, identification and optimisation before expanding into end-to-end product development, from early research through commercialisation. Her expertise spans multiple domains, including bench science, intellectual property and product strategy, with a strong focus on antibody discovery and cell and gene therapy. Today, she specialises in integrating biologics with AI, combining technical innovation with strategic product vision. Veronica’s experience enables her to address complex challenges and drive cross-disciplinary progress in biologic sciences. Her accomplishments include a patent from Technion University on cardiac genetic patterns, therapeutic development roles at Senti Biosciences and Nurix Therapeutics, and a publication from the National Institute of Standards and Technology on 3D collagen scaffolding for cell communication.