Leveraging ligand-based and structure-based approaches for advancing drug discovery

Structure-based drug design (SBDD) approaches are applicable only when the 3D structure of the target (typically a protein) is available, either obtained experimentally through X-ray crystallography or cryo-electron microscopy, or predicted using an artificial intelligence (AI) method such as AlphaFold or more conventional homology modelling. SBDD approaches predict how ligands interact with their targets and estimate binding affinities. However, caution must be exercised with predicted structures, as inaccuracies can impact the reliability of SBDD methods. In contrast, ligand-based drug design (LBDD) strategies can be applied even when target structures are unavailable, which is common during early-stage drug discovery. Instead of relying on direct structural information, LBDD infers binding characteristics from known active molecules that bind to the target and modulate or inhibit its function. LBDD often serves as a starting point when structural information is sparse, and its speed and scalability make it attractive in the early phases of hit identification.

Both SBDD and LBDD approaches have proven effective for virtual screening, improving the likelihood of selecting active compounds from large virtual libraries. This improvement is often measured as enrichment: the increase in hit rate over a random selection. Accurate affinity prediction can differentiate between active and inactive compounds, or rank-order compounds to inform prioritisation before synthesis. These methods also provide critical insights into protein-ligand interactions, especially by accurately predicting binding poses, which can guide the design of compounds to improve binding affinity, or to improve other properties without compromising it. Leveraging the complementary strengths of these methods can reduce the number of compounds that need to be synthesised and tested in the identification, optimisation and prioritisation of novel active compounds, saving time and cost [1].
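To make the notion of enrichment concrete, the sketch below computes an enrichment factor: the hit rate in the top-scoring fraction of a ranked library divided by the hit rate of the whole library. The function name, the toy ranking and the 20% cut-off are illustrative assumptions, not values from any study cited here.

```python
def enrichment_factor(ranked_labels, top_fraction):
    """Hit rate in the top fraction divided by the hit rate of the whole library.

    ranked_labels: 1 (active) or 0 (inactive), ordered best-scored first.
    top_fraction:  fraction of the library selected, in (0, 1].
    """
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")

    n_top = max(1, round(n_total * top_fraction))
    hits_in_top = sum(ranked_labels[:n_top])

    top_hit_rate = hits_in_top / n_top
    random_hit_rate = n_actives / n_total  # expected hit rate of a random pick
    return top_hit_rate / random_hit_rate


# Hypothetical ranked library of 20 compounds containing 4 actives.
ranked = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ef_20 = enrichment_factor(ranked, 0.20)
print(f"Enrichment factor at 20%: {ef_20:.2f}")
```

In this toy case three of the four selected compounds are active, against a library-wide hit rate of 20%, so the selection is several times better than random. A value of 1 would mean the method performs no better than chance.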

A core technique in SBDD is molecular docking, which predicts the bound poses (ie, the orientation and conformation) of ligand molecules within the binding pocket of the target and ranks them by binding potential. This ranking is based on a docking score, which combines terms such as hydrophobic interactions, hydrogen bonds, Coulombic interactions and ligand strain. Docking is a valuable tool in both virtual screening and lead optimisation. Once a high-quality target structure becomes available, libraries of compounds can be docked, and the predicted bound poses scored and ranked to identify potential hits. During lead optimisation, docking helps rationalise structural modifications to improve the lead compound’s binding affinity, potency or other desirable properties [2,3]. Most docking tools perform flexible ligand docking while typically treating the protein as rigid: a simplification that ignores binding pocket flexibility but allows for high-throughput screening.

Docking algorithms face challenges with large, flexible molecules such as macrocycles and peptides, primarily because it is difficult to explore all of their potential 3D conformations effectively. Two key elements for successful docking are efficient and rigorous conformational sampling of ligands, and scoring functions that can accurately rank correct poses. Thorough conformational searches for macrocycles are critical (Figure 1), and algorithms that can generate high-quality starting conformations are more likely to succeed in predicting the correct bound poses [3]. Molecular dynamics (MD) simulations are often used to further refine docking predictions by exploring the dynamic behaviour of protein-ligand complexes [5]. MD accounts for flexibility in both the ligand and the target protein, and provides insights into binding stability.

One of the most widely used LBDD techniques is similarity-based virtual screening. Its underlying assumption is that structurally similar molecules tend to exhibit similar activities. New hits can be identified from large libraries by comparing candidate molecules against known actives, using either 2D (eg, molecular fingerprints) or 3D (eg, shape, H-bond donor/acceptor geometries, electrostatic) descriptors. Successful 3D similarity-based virtual screening requires accurate alignment of each candidate ligand with the known active molecules [7]. Additionally, alignments of multiple known active compounds can help generate a meaningful binding hypothesis, which can then be used to screen large compound libraries.
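As a minimal sketch of 2D similarity screening, the example below ranks a small library against one known active using the Tanimoto coefficient, a common similarity measure for molecular fingerprints. The fingerprints here are hypothetical sets of "on" bit positions invented for illustration. In practice they would be generated by a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treat as dissimilar
    return len(a & b) / union


def rank_by_similarity(query_fp, library):
    """Return (name, similarity) pairs, most similar to the query first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints, each given as the set of bit positions that are on.
known_active = {1, 4, 7, 9, 12}
library = {
    "cmpd_A": {1, 4, 7, 9, 13},
    "cmpd_B": {2, 3, 5},
    "cmpd_C": {1, 4, 7, 9, 12},
    "cmpd_D": {4, 9, 20, 21},
}

ranking = rank_by_similarity(known_active, library)
for name, similarity in ranking:
    print(f"{name}: {similarity:.2f}")
```

The top of the ranked list would then be taken forward as candidate hits. The similarity threshold or list length to use is a project-specific choice.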

Another fundamental LBDD technique is quantitative structure-activity relationship (QSAR) modelling, which uses statistical and machine learning (ML) methods to relate molecular descriptors (physicochemical properties, 2D fingerprints, substructure patterns, 3D shape, etc) to biological activity. Both 2D and 3D QSAR models are commonly used for virtual screening and to prioritise compounds before synthesis, enabling better use of available time and resources. However, traditional 2D QSAR models often require large data sets of active compounds and may struggle to extrapolate to novel chemical space.
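The idea behind QSAR can be illustrated with a deliberately simple sketch: a least-squares line relating a single descriptor to activity, plus a check on whether a new compound falls inside the descriptor range of the training set. The descriptor (logP), the activity values (pIC50) and all numbers are hypothetical. Real QSAR models use many descriptors and more capable statistical or ML methods.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def in_training_range(x, xs):
    """Crude applicability check: is x within the descriptor range seen in training?"""
    return min(xs) <= x <= max(xs)


# Hypothetical training set: one descriptor (logP) and measured activity (pIC50).
logp = [1.0, 2.0, 3.0, 4.0]
pic50 = [5.5, 6.0, 6.5, 7.0]

slope, intercept = fit_line(logp, pic50)

for query in (2.5, 8.0):
    predicted = slope * query + intercept
    trusted = in_training_range(query, logp)
    print(f"logP={query}: predicted pIC50={predicted:.2f}, within training range={trusted}")
```

The second query lies far outside the training data, so its prediction is an extrapolation and should be treated with caution, which is the limitation noted above for novel chemical space.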

Sequential integration
In one common workflow (Figure 2), large compound libraries are rapidly filtered with ligand-based screening, based on 2D/3D similarity to known actives or on QSAR models. The most promising subset of compounds then undergoes structure-based techniques such as docking and/or binding affinity prediction [10]. Ligand-based screening narrows the chemical space, allowing the more expensive structure-guided approaches to be more focused. The initial ligand-based screen can also identify novel scaffolds (scaffold hopping) early, offering chemically diverse starting points that can later be analysed through docking to optimise binding.
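The sequential workflow can be sketched as a two-stage funnel: a cheap ligand-based filter followed by a costlier structure-based scoring step applied only to the survivors. Everything here is an assumption made for illustration: the similarity values and docking scores are invented, `toy_dock` stands in for a real docking run, and more negative scores are taken to mean better predicted binding.

```python
def sequential_screen(library, min_similarity, dock, n_keep):
    """Ligand-based filter first, then structure-based scoring of the shortlist."""
    # Stage 1: keep only compounds sufficiently similar to known actives.
    shortlist = [c for c in library if c["similarity"] >= min_similarity]

    # Stage 2: run the expensive scoring step on the shortlist only.
    scored = [(c["name"], dock(c)) for c in shortlist]
    scored.sort(key=lambda pair: pair[1])  # assumed: more negative = better
    return scored[:n_keep]


# Hypothetical library with precomputed similarity to a known active.
library = [
    {"name": "cmpd_A", "similarity": 0.82},
    {"name": "cmpd_B", "similarity": 0.15},
    {"name": "cmpd_C", "similarity": 0.64},
    {"name": "cmpd_D", "similarity": 0.71},
    {"name": "cmpd_E", "similarity": 0.30},
]

# Stand-in for a docking program: looks up an invented score and records the call.
toy_scores = {"cmpd_A": -7.1, "cmpd_B": -9.5, "cmpd_C": -8.4,
              "cmpd_D": -6.2, "cmpd_E": -5.0}
docked_names = []


def toy_dock(compound):
    docked_names.append(compound["name"])
    return toy_scores[compound["name"]]


hits = sequential_screen(library, min_similarity=0.6, dock=toy_dock, n_keep=2)
print("Docked:", docked_names)
print("Hits:", hits)
```

Only three of the five compounds reach the docking stage, which is the saving this workflow is designed to achieve. The toy data also show the trade-off: cmpd_B has the best invented docking score but is never docked because the ligand-based filter removes it first.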

One example of a parallel approach selects the top n% of compounds from each of the ligand-based similarity rankings (or predicted binding affinities) and the structure-based docking scores (Figure 2), without requiring a consensus between them [11]. While this may result in a broader set of candidates, it increases the likelihood of recovering potential actives. Running the two methods in parallel helps mitigate the limitations inherent in each approach. For instance, when docking is hindered by inaccurate pose prediction or scoring functions, similarity-based methods may still recover actives based on the features of known active ligands.
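A minimal sketch of this parallel selection takes the top n% from each ranking independently and keeps the union, so a compound needs to score well by only one method. The scores are invented for illustration, and the conventions that higher similarity is better and more negative docking scores are better are assumptions of this example.

```python
import math


def top_percent(scores, percent, higher_is_better):
    """Names of the best-scoring `percent` of compounds (at least one)."""
    n_keep = max(1, math.ceil(len(scores) * percent / 100))
    ranked = sorted(scores, key=scores.get, reverse=higher_is_better)
    return set(ranked[:n_keep])


def parallel_select(similarity, docking, percent):
    """Union of the top slices from both rankings; no consensus is required."""
    by_similarity = top_percent(similarity, percent, higher_is_better=True)
    by_docking = top_percent(docking, percent, higher_is_better=False)
    return by_similarity | by_docking


# Hypothetical scores for a ten-compound library.
similarity = {"c1": 0.90, "c2": 0.80, "c3": 0.40, "c4": 0.30, "c5": 0.20,
              "c6": 0.60, "c7": 0.10, "c8": 0.50, "c9": 0.35, "c10": 0.25}
docking = {"c1": -6.0, "c2": -9.0, "c3": -9.5, "c4": -5.0, "c5": -4.0,
           "c6": -7.0, "c7": -3.0, "c8": -6.5, "c9": -5.5, "c10": -4.5}

selected = parallel_select(similarity, docking, percent=20)
print(sorted(selected))
```

Here c1 is selected on similarity alone and c3 on docking alone, while c2 appears in both lists. Replacing the union with an intersection would give a stricter consensus selection that keeps only c2.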

Such ensembles, often derived from experimental co-crystal structures, are frequently accompanied by a diverse set of ligands with well-resolved binding poses that provide complementary insights and are a rich source of information for ligand-based similarity screening. Even in the absence of full structural characterisation for novel targets, the chemical features of these co-crystallised ligands can identify new actives through 2D or 3D similarity metrics or QSAR-based models. Similarly, 3D QSAR-based binding affinity predictions in combination with free energy perturbation (FEP)-based affinity predictions have

Ultimately, the strength of combining SBDD and LBDD lies in their ability to capture complementary views of the drug-target interaction landscape. Structure-based methods provide atomic-level information about specific protein-ligand interactions (eg, hydrogen bonds and hydrophobic contacts), while ligand-based methods infer critical binding features from known active molecules and excel at pattern recognition and generalisation. In practical applications, integrating both approaches helps to prioritise structurally promising and chemically diverse compounds.

Challenges and conclusions

References

1. Sadybekov A V et al (2023), ‘Computational approaches streamlining drug discovery’, Nature, 616(7958), 673-685
2. Temml V et al (2021), ‘Structure-based molecular modeling in SAR analysis and lead optimization’, Computational and Structural Biotechnology Journal, 19, 1431-1444
3. Jain A N et al (2023), ‘Complex peptide macrocycle optimization: combining NMR restraints with conformational analysis to guide structure-based and ligand-based design’, Journal of Computer-Aided Molecular Design, 37, 519-535
4. Cleves A E et al (2024), ‘Structure-based pose prediction: Non-cognate docking extended to macrocyclic ligands’, Journal of Computer-Aided Molecular Design, 38, 33
5. Kapla J et al (2021), ‘Can molecular dynamics simulations improve the structural accuracy and virtual screening performance of GPCR models?’, PLoS Computational Biology, 17(5), e1008936
6. Ross G A et al (2023), ‘The maximal and current accuracy of rigorous protein-ligand binding free energy calculations’, Communications Chemistry, 6, 222
7. Cleves A E et al (2019), ‘Electrostatic-field and surface-shape similarity for virtual screening and pose prediction’, Journal of Computer-Aided Molecular Design, 33, 865-886
8. Cleves A E et al (2018), ‘Quantitative surface field analysis: learning causal models to predict ligand binding affinity and pose’, Journal of Computer-Aided Molecular Design, 32, 731-757
9. Cleves A E et al (2021), ‘Synergy and Complementarity between Focused Machine Learning and Physics-Based Simulation in Affinity Prediction’, Journal of Chemical Information and Modeling, 61(12), 5948-5966
10. Khan S U et al (2019), ‘Sequential ligand- and structure-based virtual screening approach for the identification of potential G protein-coupled estrogen receptor-1 (GPER-1) modulators’, RSC Advances, 9, 2525-2538
11. Costa G et al (2019), ‘Novel natural non-nucleoside inhibitors of HIV-1 reverse transcriptase identified by shape- and structure-based virtual screening techniques’, European Journal of Medicinal Chemistry, 161, 1-10
12. Cleves A E et al (2020), ‘Structure- and Ligand-Based Virtual Screening on DUD-E+: Performance Dependence on Approximations to the Binding Pocket’, Journal of Chemical Information and Modeling, 60(9), 4296-4310

Matthew Segall is CEO of Optibrium. He has an MSc in Computation from the University of Oxford and a PhD in theoretical physics from the University of Cambridge, both in the UK. Since 2001, Matthew has led teams developing predictive models and intuitive decision-support and visualisation tools for drug discovery. He has published over 40 peer-reviewed papers and book chapters on computational chemistry, cheminformatics and drug discovery. In 2009, he led a management buyout of the StarDrop business to found Optibrium.

Himani Tandon PhD is a senior scientist in the research division at Optibrium. Her work focuses on applying both 3D structure-based and ligand-based design strategies for lead discovery and optimisation. She also applies AI and ML techniques to model the properties of potential drug candidates, including both small molecules and macrocyclic peptides. Himani holds a PhD in Computational Structural Biology and Bioinformatics from the Indian Institute of Science, Bangalore, India, and completed her postdoctoral research at the MRC Laboratory of Molecular Biology in Cambridge, UK.

Innovations in Pharmaceutical Technology (IPT)

IPT provides a platform for cutting-edge ideas, concepts, and developments shaping the future of pharmaceutical R&D.


info@samedanltd.com

www.samedanltd.com
