Discovery and Development: Drug Design Differences
Computational drug discovery relies on two major approaches: ligand-based drug design and structure-based drug design. This article discusses the principles, techniques and applications of both methodologies, as well as a combined approach, offering guidance on when to use each method and highlighting the complementary insights the two approaches provide.
Matthew Segall and Himani Tandon at Optibrium
Computational approaches, including structure-based drug design (SBDD) and ligand-based drug design (LBDD), have long been crucial in the identification of promising candidates during the early stages of drug discovery. In recent years, advancements in computational power, algorithms and data availability have significantly enhanced the speed, accuracy and scalability of these methods, furthering their impact.
SBDD approaches are applicable only when the 3D structure of the target (typically a protein) is available, either determined experimentally by X-ray crystallography or cryo-electron microscopy, or predicted using an artificial intelligence (AI) method such as AlphaFold or more conventional homology modelling. SBDD approaches predict how ligands interact with their targets and estimate binding affinities. However, caution must be exercised with predicted structures, as inaccuracies can impact the reliability of SBDD methods. In contrast, LBDD strategies can be applied even when target structures are unavailable, which is common during early-stage drug discovery. Instead of relying on direct structural information, LBDD infers binding characteristics from known active molecules that bind and modulate or inhibit the function of the target. It often serves as a starting point when structural information is sparse, and its speed and scalability make it attractive in the early phases of hit identification.
Figure 1: An example of conformational sampling for a macrocyclic peptide – Aureobasidin A. As the size and flexibility of a macrocycle increase, the number of accessible conformers grows exponentially due to the increased degrees of freedom. This makes exhaustive conformational sampling not only challenging, but also critical for accurate docking.
Both SBDD and LBDD approaches have proven effective for virtual screening, improving the likelihood of selecting active compounds from large virtual libraries – often measured as enrichment, or the improvement in hit rate over a random selection. Accurate affinity prediction can differentiate between active and inactive compounds, or rank-order compounds to inform prioritisation before synthesis. These methods also provide critical insights into protein-ligand interactions, especially by accurately predicting binding poses, which can guide the design of compounds to improve binding affinity or to improve other properties without compromising binding affinity. Leveraging the complementary strengths of these methods can reduce the number of compounds that need to be synthesised and tested in the identification, optimisation and prioritisation of novel active compounds, saving time and costs.1
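Enrichment, as described above, can be made concrete with a small calculation. The following is a minimal sketch only: the function name, the toy ranked list and the choice of a 20% cut-off are illustrative assumptions, not part of any specific screening tool.

```python
def enrichment_factor(ranked_is_active, top_fraction):
    """Hit rate in the top-ranked fraction divided by the hit rate of the whole library."""
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library containing at least one active")
    # Number of compounds selected from the top of the ranking (at least one)
    n_top = max(1, round(n_total * top_fraction))
    hits_in_top = sum(ranked_is_active[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)


# Hypothetical library of 20 compounds, best-scored first; True marks a known active.
ranked = [True, True, False, True] + [False] * 15 + [True]

# Top 20% = 4 compounds with 3 actives (75% hit rate) versus 4/20 = 20% at random.
ef_20 = enrichment_factor(ranked, 0.20)
print(f"Enrichment factor at 20%: {ef_20:.2f}")
```

A value above 1 means the ranking selects actives more often than random picking would; a value of 1 means no improvement.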
Examples of structure-based and ligand-based approaches
Structure-based drug design: molecular docking and free-energy perturbation (FEP)
A core technique in SBDD is molecular docking, which predicts the bound poses (ie, the orientation and conformation) of ligand molecules within the binding pocket of the target and provides a ranking of their binding potential. This ranking is based on a docking score, which incorporates various interaction energies such as hydrophobic interactions, hydrogen bonds, Coulombic interactions and ligand strain. It is a valuable tool both in virtual screening and lead optimisation. Once a high-quality target structure becomes available, libraries of compounds can be subjected to docking, and the predicted bound poses can be scored and ranked to identify potential hits. During lead optimisation, docking helps rationalise structural modifications to improve the lead compound’s binding affinity, potency or other desirable properties.2,3 Most docking tools perform flexible ligand docking while often treating proteins as rigid – a simplification that does not account for binding pocket flexibility but allows for high-throughput screening.
Despite docking being a mature field, validation strategies are still commonly limited to re-docking ligands into their cognate protein pocket. However, in real-world scenarios, the application of docking is always non-cognate, ie, it is used to predict the binding mode of compounds that differ structurally from those determined experimentally. Hence, docking protocols should be validated with non-cognate ligands to improve the accuracy and reliability of pose prediction.4
Docking algorithms face challenges with large, flexible molecules such as macrocycles and peptides, primarily due to difficulties in effectively exploring all potential 3D conformations. Two key elements for successful docking are efficient and rigorous conformational sampling of ligands, and scoring functions that can accurately rank correct poses. Thorough conformational searches for macrocycles are critical (Figure 1), and algorithms that can generate high-quality starting conformations are more likely to succeed in predicting the correct bound poses.3 Molecular dynamics (MD) simulations are often used to further refine docking predictions by exploring the dynamic behaviour of protein-ligand complexes.5 This accounts for flexibility in both the ligand and the target protein, and provides insights into binding stability.
Another highly accurate but computationally expensive method is FEP, which estimates binding free energies using thermodynamic cycles.6 FEP is largely used during lead optimisation, where small changes in chemical structure need to be quantitatively evaluated for their impact on binding affinity. However, it should be noted that FEP is limited to small perturbations to a reference structure, and predicting affinities for more structurally diverse compounds remains a significant challenge.
By leveraging structural data, these techniques allow for rational design, wherein a detailed understanding of protein-ligand interactions guides modifications to a molecule.
Ligand-based drug design: similarity-based virtual screening and quantitative structure-activity relationships
One of the most widely used LBDD techniques is similarity-based virtual screening. Its underlying assumption is that structurally similar molecules exhibit similar activities. New hits can be identified from large libraries by comparing candidate molecules against known actives, either using 2D (eg, molecular fingerprints) or 3D (eg, shape, H-bond donor/acceptor geometries, electrostatic) descriptors. Successful 3D similarity-based virtual screening requires accurate ligand structure alignment with known active molecules.7 Additionally, alignments of multiple known active compounds can help generate a meaningful binding hypothesis, which can then be used to screen large compound libraries.
Another fundamental LBDD technique is quantitative structure-activity relationship (QSAR) modelling, which uses statistical and machine learning (ML) methods to relate molecular descriptors (physicochemical properties, 2D fingerprints, substructure patterns, 3D shape, etc) to biological activity. Both 2D and 3D QSAR models are commonly used for virtual screening and to prioritise compounds before synthesis, enabling better use of available time and resources. However, traditional 2D QSAR models often require large data sets of active compounds and may struggle to extrapolate to novel chemical space.
In contrast, recent advances in 3D QSAR methods, particularly those grounded in causal, physics-based representations of molecular interactions, have improved their ability to predict activity even in the absence of structural data.8 While SBDD methods like FEP are often limited to small structural changes around a known reference compound, 3D QSAR models can generalise well across chemically diverse ligands for a given target, despite being trained using limited structure-activity data.9
Combining structure-based and ligand-based approaches
While traditionally used independently, SBDD and LBDD offer a powerful combination, particularly in early-stage drug discovery where data may be incomplete or evolving. An integrated approach maximises the utility of both target-specific information and known ligand activity data, resulting in improved prediction of binding poses, better prioritisation of compounds and improved prediction of biological activity.4
Sequential integration
In one common workflow (Figure 2), large compound libraries are rapidly filtered with ligand-based screening based on 2D/3D similarity to known actives or via QSAR models. The most promising subset of compounds then undergoes structure-based techniques like docking and/or binding affinity predictions.10 Ligand-based screening narrows chemical space, enabling structure-guided approaches to be more focused. The initial ligand-based screen can identify novel scaffolds (scaffold hopping) early, offering chemically diverse starting points that can later be analysed through docking to optimise binding.
Figure 2: Workflow illustrating the integration of structure-based and ligand-based approaches to enhance hit identification and prioritisation during early-stage drug discovery
Since structure-based methods are generally more computationally intensive than ligand-based approaches, this two-stage process improves the overall efficiency by applying the more resource-intensive methods only to a narrowed set of candidates. This approach works particularly well when time and resources are constrained, or when protein structural information emerges progressively.
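The two-stage funnel can be sketched in a few lines. This is an illustrative outline under stated assumptions: fingerprints are represented as plain sets of "on" bit indices, similarity is the Tanimoto coefficient, and `mock_docking_score` is a hypothetical stand-in for a real docking or affinity calculation, where lower scores are taken to be better.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sequential_screen(library, known_actives, expensive_score, keep_fraction):
    """Stage 1: cheap similarity filter. Stage 2: costly scoring of the shortlist only."""
    # Stage 1: rank every compound by its best similarity to any known active
    similarity = {
        name: max(tanimoto(fp, ref) for ref in known_actives)
        for name, fp in library.items()
    }
    n_keep = max(1, round(len(library) * keep_fraction))
    shortlist = sorted(similarity, key=similarity.get, reverse=True)[:n_keep]
    # Stage 2: apply the expensive structure-based score to the shortlist (lower = better)
    scored = [(name, expensive_score(name)) for name in shortlist]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical toy data: fingerprints as sets of bit indices
known_actives = [frozenset({1, 2, 3, 4})]
library = {
    "cpd_a": frozenset({1, 2, 3, 4, 5}),
    "cpd_b": frozenset({1, 2, 9}),
    "cpd_c": frozenset({7, 8}),
    "cpd_d": frozenset({1, 2, 3}),
}

# Hypothetical stand-in for docking; records which compounds were actually scored
mock_scores = {"cpd_a": -7.1, "cpd_b": -9.0, "cpd_c": -5.0, "cpd_d": -8.3}
calls = []

def mock_docking_score(name):
    calls.append(name)
    return mock_scores[name]

prioritised = sequential_screen(library, known_actives, mock_docking_score, keep_fraction=0.5)
print(prioritised)
print(f"Expensive scoring was run on {len(calls)} of {len(library)} compounds")
```

In this toy run, only half the library reaches the expensive stage. It also shows the trade-off of the funnel: `cpd_b` has the best mock docking score but is never docked, because the similarity filter removed it first.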
Parallel or hybrid screening approaches
Advanced pipelines now employ parallel screening, running both structure-based and ligand-based methods independently but simultaneously on the same compound library. Each method generates its own ranking or scoring of compounds, and results are compared or combined in a consensus scoring framework.
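Two ways of combining parallel rankings discussed in this section, taking the top n% from each method and multiplying ranks, can be compared on a toy example. The sketch below is illustrative only: the compound names, scores and the 40% cut-off are assumptions, and ties in the rank product are broken alphabetically for reproducibility.

```python
def to_ranks(scores, higher_is_better):
    """Convert raw scores to ranks, where rank 1 is the best compound."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {name: position for position, name in enumerate(ordered, start=1)}


def top_fraction_union(rank_a, rank_b, fraction):
    """Keep any compound in the top fraction of either ranking (no consensus required)."""
    n_top = max(1, round(len(rank_a) * fraction))
    top_a = {name for name, rank in rank_a.items() if rank <= n_top}
    top_b = {name for name, rank in rank_b.items() if rank <= n_top}
    return top_a | top_b


def rank_product_order(rank_a, rank_b):
    """Order compounds by the product of their two ranks (smaller product = better)."""
    product = {name: rank_a[name] * rank_b[name] for name in rank_a}
    return sorted(product, key=lambda name: (product[name], name))


# Hypothetical scores for the same five compounds from each method
similarity = {"c1": 0.91, "c2": 0.40, "c3": 0.75, "c4": 0.20, "c5": 0.55}  # higher = better
docking = {"c1": -7.5, "c2": -8.8, "c3": -9.5, "c4": -5.1, "c5": -7.0}     # lower = better

ligand_ranks = to_ranks(similarity, higher_is_better=True)
docking_ranks = to_ranks(docking, higher_is_better=False)

union_selection = top_fraction_union(ligand_ranks, docking_ranks, fraction=0.4)
hybrid_order = rank_product_order(ligand_ranks, docking_ranks)
print(sorted(union_selection))
print(hybrid_order)
```

The union keeps `c2`, which only docking ranks highly, whereas the rank product places `c3` first because both methods rank it well.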
One example selects the top n% of compounds from both ligand-based similarity rankings (or predicted binding affinities) and structure-based docking scores (Figure 2) without requiring a consensus between them.11 While this may result in a broader set of candidates, it increases the likelihood of recovering potential actives. This parallelism helps mitigate the limitations inherent in each approach. For instance, when docking scores are hindered by inaccurate pose prediction or scoring functions, similarity-based methods may still recover actives based on known active ligand features.
Alternatively, hybrid scoring multiplies the compound ranks from each method to yield a unified rank order (Figure 2).12 This favours compounds ranked highly by both methods, thus prioritising specificity. By doing so, it reduces the number of candidates, potentially lowering sensitivity but increasing the confidence in selecting true positives.
Capturing complementary information
When employing docking for virtual screening or lead optimisation, using ensembles of protein pocket conformations offers a way to capture binding site flexibility, improving the robustness of pose prediction and subsequently enriching hit rates.12 A single structure may not fully capture the accessible binding space.
Such ensembles, often derived from experimental co-crystal structures, are frequently accompanied by a diverse set of ligands with well-resolved binding poses that provide complementary insights and are a rich source of information for ligand-based similarity screening. Even in the absence of full structural characterisation for novel targets, the chemical features of these co-crystallised ligands can identify new actives through 2D or 3D similarity metrics or QSAR-based models. Similarly, 3D QSAR-based binding affinity predictions in combination with FEP-based affinity predictions have demonstrated complementarity in both prediction error and applicability domains.9
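Docking against an ensemble of pocket conformations, as discussed in this section, requires some rule for merging the per-conformation scores. The sketch below assumes one common and simple choice, keeping each compound's best score across the ensemble; the conformation labels and scores are hypothetical, and lower scores are taken to be better.

```python
def ensemble_best_scores(scores_by_conformation):
    """For each compound, keep the best (lowest) docking score and the pocket that gave it."""
    best = {}
    for conformation, scores in scores_by_conformation.items():
        for compound, score in scores.items():
            if compound not in best or score < best[compound][0]:
                best[compound] = (score, conformation)
    return best


# Hypothetical docking scores of two compounds against three pocket conformations
scores_by_conformation = {
    "apo": {"c1": -6.2, "c2": -7.9},
    "holo_1": {"c1": -8.4, "c2": -7.1},
    "holo_2": {"c1": -7.0, "c2": -8.0},
}

best = ensemble_best_scores(scores_by_conformation)
for compound, (score, conformation) in sorted(best.items()):
    print(f"{compound}: best score {score} in pocket conformation '{conformation}'")
```

Here `c1` would look unremarkable against the apo structure alone but scores well against one holo conformation, which illustrates why a single structure may miss part of the accessible binding space.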
Ultimately, the strength of combining SBDD and LBDD lies in their ability to capture complementary views of the drug-target interaction landscape. Structure-based methods provide atomic-level information about specific protein-ligand interactions (eg, hydrogen bonds and hydrophobic contacts), while ligand-based methods infer critical binding features from known active molecules and excel at pattern recognition and generalisation. In practical applications, integrating both approaches helps to prioritise structurally promising and chemically diverse compounds.
Challenges and conclusions
Despite their power, both SBDD and LBDD approaches come with inherent limitations. Structure-based methods are heavily dependent on the availability and quality of target structures. Ligand-based methods, in contrast, rely on the availability of sufficient examples of active compounds, which can introduce bias and limit generalisability. Both approaches can be computationally intensive, particularly when using large chemical libraries or ensemble docking strategies.
Nevertheless, the complementary nature of these methods offers a compelling case for integration. By thoughtfully combining SBDD and LBDD, researchers can accelerate hit identification, improve prediction accuracy and, ultimately, enhance the efficiency of early-stage drug discovery. Looking ahead, further integration of ML with SBDD and LBDD is expected to dramatically accelerate virtual screening, enabling efficient exploration of chemical libraries containing billions of compounds.
References
Matthew Segall is CEO of Optibrium. He has an MSc in Computation from the University of Oxford and a PhD in theoretical physics from the University of Cambridge, both UK. Since 2001, Matthew has led teams developing predictive models and intuitive decision-support and visualisation tools for drug discovery. Matt has published over 40 peer-reviewed papers and book chapters on computational chemistry, cheminformatics and drug discovery. In 2009, he led a management buyout of the StarDrop business to found Optibrium.
Himani Tandon PhD is a senior scientist in the research division at Optibrium. Her work focuses on applying both 3D structure-based and ligand-based design strategies for lead discovery and optimisation. She also applies AI and ML techniques to model the properties of potential drug candidates, including both small molecules and macrocyclic peptides. Himani holds a PhD in Computational Structural Biology and Bioinformatics from the Indian Institute of Science, Bangalore, India, and completed her postdoctoral research at the MRC Laboratory of Molecular Biology in Cambridge, UK.