Accelerating Life Sciences Research and Delivery with Knowledge Graphs

From aiding early drug discovery to better understanding the connections between genes, proteins, cells and tissues, life sciences researchers are applying the power of graph databases to what were previously intractably hard problems.1 The context here is the sector’s adoption of new standards. SDTM (Study Data Tabulation Model), a new way of organising human clinical and nonclinical study data tabulations, and one of the required standards for data submission to the FDA (US) and PMDA (Japan), is of huge importance. Another important new standard is ADaM (Analysis Data Model), which defines data set and metadata standards for the efficient generation, replication and review of clinical trial statistical analyses. Finally, there’s CDISC 360, an initiative aimed at implementing standards as linked metadata, which provides the additional semantics needed to support metadata-driven automation across the end-to-end clinical research data life cycle.

Knowledge graphs are multidimensional and work on the basis that every data set is a connected element. Unlike traditional SQL databases, which store data in tables with fixed columns and rows, knowledge graphs store data as nodes (or entities) connected by edges (or relationships). It is in the power of those interconnections that the breakthrough insights lie. For example, in the Panama Papers the use of a knowledge graph made it possible to represent the complex network of offshore accounts, shell companies and individuals involved in the scandal.2
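
To make the node-and-edge idea concrete, here is a minimal sketch in Python of how such a store might look in memory. It is an illustration only, not how any particular graph database is implemented. The class name, the relationship types and the placeholder entities (GENE_A, PROTEIN_A, DISEASE_X) are all hypothetical.

```python
class KnowledgeGraph:
    """Tiny in-memory graph: nodes carry a label, edges carry a relationship type."""

    def __init__(self):
        self.nodes = {}   # node id -> label, e.g. "Gene"
        self.edges = {}   # node id -> list of (relationship, target id)

    def add_node(self, node_id, label):
        self.nodes[node_id] = label

    def add_edge(self, source, relationship, target):
        # Both endpoints must already exist as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.setdefault(source, []).append((relationship, target))

    def neighbours(self, node_id, relationship=None):
        # Follow outgoing edges, optionally filtered by relationship type.
        return [
            target
            for rel, target in self.edges.get(node_id, [])
            if relationship is None or rel == relationship
        ]


# Hypothetical placeholder data, not real biology.
kg = KnowledgeGraph()
kg.add_node("GENE_A", "Gene")
kg.add_node("PROTEIN_A", "Protein")
kg.add_node("DISEASE_X", "Disease")
kg.add_edge("GENE_A", "ENCODES", "PROTEIN_A")
kg.add_edge("PROTEIN_A", "ASSOCIATED_WITH", "DISEASE_X")

print(kg.neighbours("GENE_A"))                        # ['PROTEIN_A']
print(kg.neighbours("PROTEIN_A", "ASSOCIATED_WITH"))  # ['DISEASE_X']
```

In a table-based design the same two links would typically require join tables; here each relationship is stored directly alongside the node it starts from.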

Because knowledge graphs are designed to represent complex data, they can be used in a wide range of applications beyond just financial investigations. For example, they can be used in biological science to represent the complex interrelationships and correlations between information about diseases, genes, the environment, diet, behaviour and other factors. The more these interrelationships and correlations can be analysed, the richer the knowledge and the faster important deductions can be made. Modern native graph databases have made it possible to perform mass-scale cross-comparisons involving billions of connections, which can help researchers identify patterns and connections that might not be immediately obvious; this has the potential to transform fields such as medicine.

Kannas explains that graphs help in drug discovery because chemical reactions naturally form networks. The product of one reaction can enable other reactions, which is already a kind of graph structure. A data scientist can run path queries between two molecules to understand how the reactions are connected. The information the scientist can glean from linked molecules helps train new lead prediction algorithms.
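
A path query of the kind described can be sketched as a breadth-first search over a reaction network. This is a simplified illustration, assuming each reaction is reduced to a single (reactant, product) pair. The function name reaction_path and the molecule names A to E are hypothetical placeholders, and a production graph database would run such queries through its own query language rather than hand-written code.

```python
from collections import deque


def reaction_path(reactions, start, goal):
    """Return a shortest chain of molecules from start to goal, or None.

    reactions is a list of (reactant, product) pairs, a deliberate
    simplification of real reaction data.
    """
    # Build a directed adjacency map: reactant -> products.
    graph = {}
    for reactant, product in reactions:
        graph.setdefault(reactant, []).append(product)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        molecule = path[-1]
        if molecule == goal:
            return path
        for product in graph.get(molecule, []):
            if product not in visited:
                visited.add(product)
                queue.append(path + [product])
    return None  # no chain of reactions links the two molecules


# Hypothetical reaction network with placeholder molecule names.
reactions = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "E"), ("C", "E")]

print(reaction_path(reactions, "A", "E"))  # ['A', 'D', 'E']
print(reaction_path(reactions, "E", "A"))  # None
```

Breadth-first search is used here because it finds the route with the fewest reaction steps first; weighting edges by yield or cost would call for a different algorithm.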

Reduce the Impacts of Restandardisation of Data

Other ambitions for knowledge graphs at the firm include making risk-based monitoring proactive rather than reactive. There is a Google-like question-and-answer system that allows users to quickly get answers from their clinical trial data. Powerful AI algorithms developed for preclinical data sets can be applied to patient-level clinical data. The chosen tool for managing the data set is a clinical knowledge graph offering a patient-centric data model that integrates all domain silos and makes the clinical data understandable to everyone.

Knowledge Graphs’ R&D Potential

Knowledge graphs can be especially useful in the context of clinical trials, particularly for rare conditions where small patient populations can make it difficult to achieve statistical significance. As some of the growing body of work in diabetes research shows, knowledge graphs can help in phenotype mapping, where researchers are trying to understand the relationship between different phenotypes (observable characteristics or traits) in humans and animals. This can be particularly challenging when the clinical parameters and observations used to measure these phenotypes are not directly comparable between species.

As the clinical opportunity for pharma grows ever more rewarding, yet more demanding and complex in scope, knowledge graphs are potentially transformative. Understanding the value of relationships between data is every bit as important as understanding what those individual data points tell us in their own right. Without the ability to mine those correlations for new insights, companies will lack vital context and find themselves compromised in their ability to make accurate advanced predictions.

Dr Alexander Jarasch is the technical consultant for Pharma and Life Sciences at native graph database leader Neo4j. He was previously head of Data Management and Knowledge Management at Germany’s National Center for Diabetes Research (DZD). He is a visionary speaker on the future of clinical investigation, plus AI and data management in the pharma and healthcare space, in particular the potential to advance pharmaceutical analytics and unlock terabytes of hard-to-parse research/trial data by revealing data relationships for better predictive accuracy.