The Path to Smarter Digital Health

The increasing complexity and diversity of the information management needed for drug development and digital health require smarter and more flexible automation. Data collection in contract research organisations (CROs) has grown exponentially in size and diversity since the onset of digital data collection in the early 2000s. In addition to traditional types of data, the drug development value chain now routinely incorporates rich sources of information, such as genomic, proteomic, and metabolomic data, as well as diverse sources, such as electronic health records (EHR), real-world evidence (RWE), and data collected from wearables and other personal devices, all of which add to an ever-growing, complex ecosystem. The well-established and deepening culture of data-driven operational decision-making also demands seamless and intelligent integration across disparate operational systems to achieve higher levels of transparency and accuracy. Traditional documentation can no longer sustain the transparency that regulatory standards demand or the accuracy that trial participant safety requires. In response to this increase in complexity and scale, drug development and digital health require smarter, more flexible solutions based on artificial intelligence (AI) to ensure that quality data can be accessed and interpreted with ease, alleviating unnecessary burdens on clinical investigators and clinical trial participants and improving their engagement with clinical trials.

Conventional digital storage and retrieval methods continue to provide fundamental stability in managing trial data. Extract, transform, load (ETL) integration and data warehousing provide the backbone architecture for many operational needs. However, the increasing depth and diversity of healthcare information, together with the expectation of greater speed, flexibility, and insight, require new approaches that benefit from a more layered, AI-enabled information ecosystem. Novel AI approaches are already underway across a range of service areas, but there are two main challenges to overcome for AI innovation to mature in our industry. Firstly, how can we enable seamless and flexible data flow in real time, and do it securely in full compliance with data privacy regulations? Secondly, and perhaps more critically, how can we build or find the necessary connections of meaning between disparate data sources in a truly flexible and business-responsive way? We need both solutions to unleash the full potential of AI.

Biometrics and clinical data management offer a focal point where many data issues and opportunities are ripe for innovation. The primacy of the clinical trial protocol, the increasing scope and sophistication in the design of clinical trials, and the breadth and depth of data collected all mean that the biometrics service area faces significant challenges, but also holds fantastic opportunities. The widespread adoption of electronic data capture (EDC) systems in the mid-2000s transformed clinical data management by centralising the collection of biometric data. The parallel growth in other electronic, but non-EDC sources, such as interactive response technologies, lab systems, specialised clinical measurements, clinical outcome assessments, and personal devices, has created new complexities and opportunities. We need to process all these data types as efficiently as possible, but, more importantly, get maximum insight. To realise those opportunities, we need to be able to relate meaning (semantics) to achieve smarter data integration that yields critical analytics beyond data silos in real time.

AI is already playing an increasing role in many clinical data applications. We need to clean the data on an ongoing basis and reconcile values across clinical data domains and across EDC and non-EDC sources. We need to reduce the noise generated by those activities, and not overwhelm busy clinical investigators with unnecessary queries. We need to focus on patients and ensure that we are doing all we can to better enable their participation in clinical trials and also protect them by detecting, anticipating, and mitigating risks, whether those relate to safety or protocol compliance. We need innovative solutions for delivering decentralised trials, creating synthetic control arms where appropriate, conducting virtual trials, developing digital endpoints, predicting placebo effects, and combining historical information from EHRs with imaging, genetic, and molecular test data to achieve highly targeted oncology treatments. All of these advances can reduce the burden on patients participating in clinical trials.
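To make one of these needs concrete, the reconciliation of EDC values against a non-EDC source, while keeping query noise down, might look something like the sketch below. It is a minimal illustration under stated assumptions rather than a description of any production system: the record layout, the field names (`subject`, `visit`, `test`, `value`), the sample values, and the 1% tolerance are all hypothetical.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One measured value; the field names are illustrative assumptions."""
    subject: str
    visit: str
    test: str
    value: float


def reconcile(edc, lab, rel_tol=0.01):
    """Compare EDC values with lab-system values recorded under the same key.

    Returns a list of (key, reason) pairs. Values whose relative difference
    is within rel_tol are treated as matching, so rounding-level noise does
    not turn into queries for investigators.
    """
    lab_by_key = {(r.subject, r.visit, r.test): r.value for r in lab}
    queries = []
    for r in edc:
        key = (r.subject, r.visit, r.test)
        if key not in lab_by_key:
            queries.append((key, "missing in lab source"))
            continue
        other = lab_by_key[key]
        scale = max(abs(r.value), abs(other), 1e-12)  # avoid dividing by zero
        if abs(r.value - other) / scale > rel_tol:
            queries.append((key, f"EDC={r.value} vs lab={other}"))
    return queries


# Hypothetical sample data.
edc = [
    Result("S001", "V1", "ALT", 32.0),
    Result("S001", "V2", "ALT", 35.0),
    Result("S002", "V1", "ALT", 41.0),
]
lab = [
    Result("S001", "V1", "ALT", 32.1),  # rounding-level difference, no query
    Result("S001", "V2", "ALT", 53.0),  # large difference, raises a query
]

for key, reason in reconcile(edc, lab):
    print(key, reason)
```

In this toy data set only two of the three EDC records would raise a query; the tolerance is the lever that trades sensitivity against investigator burden.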

Dealing with the physical interoperability of data first, we have to make the flow of data independent of semantic constraints. Using a layered and scalable architecture that combines serverless cloud services, service-oriented architecture, representational state transfer (REST) application programming interfaces (APIs), and NoSQL (not only structured query language) storage, we can ingest any data source, from highly structured tabular data to completely unstructured content, stage ingested data in an exchange hub for onward distribution, consolidate data for quality management and analysis, make data available for consumption by analytics and other informatics applications, and expose data to robotic process automation (RPA) and AI processes. The principle is that data flows need to be automated, real-time, centralised, accessible, discoverable, explorable, and scalable.
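The idea of moving data without first agreeing on its meaning can be sketched in a few lines. The example below is an in-memory stand-in for an exchange hub, not a cloud implementation: the class names `Envelope` and `ExchangeHub`, the content-type labels, and the sample payloads are all hypothetical. The point it illustrates is that the hub records only transport metadata and never inspects the payload.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class Envelope:
    """Wraps a payload with transport metadata; the payload is opaque."""
    source: str
    content_type: str  # e.g. "tabular" or "document" (assumed labels)
    payload: Any
    received_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class ExchangeHub:
    """Hypothetical staging hub: ingest, stage, distribute, discover."""

    def __init__(self):
        self._staged = []
        self._subscribers = defaultdict(list)

    def subscribe(self, content_type, handler):
        """Register a consumer (analytics, RPA, AI process) for a content type."""
        self._subscribers[content_type].append(handler)

    def ingest(self, source, content_type, payload):
        """Stage any payload, structured or not, and push it to consumers."""
        envelope = Envelope(source, content_type, payload)
        self._staged.append(envelope)
        for handler in self._subscribers[content_type]:
            handler(envelope)
        return envelope

    def discover(self, source=None):
        """List staged envelopes, optionally filtered by source."""
        return [e for e in self._staged if source is None or e.source == source]


hub = ExchangeHub()
warehouse = []  # stands in for a consolidated store
hub.subscribe("tabular", warehouse.append)

hub.ingest("edc", "tabular", [{"subject": "S001", "ALT": 32.0}])
hub.ingest("site-notes", "document", "Participant reported mild headache.")

print(len(hub.discover()), "staged;", len(warehouse), "delivered to warehouse")
```

In a real deployment the `ingest` call would typically sit behind a REST endpoint and the staged list would be a document store, but the separation of transport from semantics is the same.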

This serverless functionality set can drive the unbounded rapid roll-out of almost any point-to-point data delivery need across the full range of good practice (GxP) systems, such as clinical trial management systems, electronic trial master file systems, clinical data management systems, regulatory information management systems, operational data warehouses, study start-up (SSU) systems, supplier and vendor management systems, biomarker and ‘omics’ data repositories, master data management systems, and integrated healthcare delivery networks. The latter is just one way to connect to the hundreds of remote devices increasingly used to provide patient solutions in a clinical research setting. This matters because clinical trials are becoming more decentralised and more patient centric, and there is a growing use of RWE and EHR data arriving in real time for processing and decision-making.

The semantic challenge is more difficult. To connect meaning across an ever-expanding ecosystem of clinical trial data and utilise AI to its fullest potential, you could try to standardise everything into a unified semantic layer, but that approach would be too restrictive and labour intensive. Instead, we can turn to AI techniques, such as knowledge graphs, which use concept schemas, or ontologies, to interrelate information points, or semantic meaning, in graph networks. The connections don’t have to be exact, and can be fuzzy. Semantic resilience is achieved by the web-like nature of the graph. This is how popular search engines and social media platforms work, and semantic and ontological models are already taking shape in the latest wave of AI innovation in the life science industry.
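As a small illustration of connections that are fuzzy rather than exact, the sketch below links a source term to candidate concepts with a similarity score instead of demanding an identical label. It uses string similarity from Python's standard `difflib` module as a stand-in for whatever matching technique a real knowledge graph would use; the concept labels, synonyms, and the 0.6 threshold are assumptions made for the example and are not drawn from any published ontology.

```python
from difflib import SequenceMatcher

# Hypothetical concepts with known synonyms (illustrative only).
CONCEPTS = {
    "alanine aminotransferase": ["alt", "sgpt", "alanine transaminase"],
    "systolic blood pressure": ["sbp", "systolic bp"],
    "body weight": ["weight", "wt"],
}


def similarity(a, b):
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(term, threshold=0.6):
    """Return weighted (concept, score) edges for a source term, best first.

    The score is the best similarity against the concept label or any of
    its synonyms. Edges scoring below the threshold are dropped.
    """
    edges = []
    for concept, synonyms in CONCEPTS.items():
        score = max(similarity(term, label) for label in [concept, *synonyms])
        if score >= threshold:
            edges.append((concept, round(score, 2)))
    return sorted(edges, key=lambda edge: edge[1], reverse=True)


print(link("SGPT"))    # an exact synonym
print(link("Weigth"))  # a misspelt column header still finds its concept
print(link("zzzz"))    # nothing plausible, so no edge is created
```

Keeping the score on the edge, rather than forcing a yes-or-no match, is what lets downstream processes or human reviewers decide how much to trust each connection.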

Knowledge graphs can be combined with other AI approaches, such as transfer learning, reinforcement learning, natural language processing, and neural nets to automate the discovery of semantic and ontological layers. Knowledge graphs can find natural connections that span domains and build semantic and ontological links between data that would otherwise remain siloed, unlocking connections across superordinate and subordinate concepts. In essence, a collection of descriptions can be stored and connected through adjoining entities, or nodes, to form a conceptual framework that is flexible and organic. This framework can provide a type of map that allows you to identify where you are and how to get to less, or more, detail. Each time a new description is created, it can be joined with the rest of the connected concepts, building up improved recommendations or mapping systems. These methods become even more powerful when combined with RPA, optical character recognition, and advanced data hubs.
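The map-like quality described above, moving to less or more detail and joining each new description to what is already connected, can be shown with a very small graph. This is a sketch under simplifying assumptions: the class `ConceptGraph`, its method names, and the example concepts are hypothetical, and a real knowledge graph would carry typed relations and far richer metadata.

```python
from collections import defaultdict


class ConceptGraph:
    """Tiny directed graph of concepts linked by broader/narrower edges."""

    def __init__(self):
        self.broader = defaultdict(set)   # concept -> more general concepts
        self.narrower = defaultdict(set)  # concept -> more specific concepts

    def add(self, concept, broader_concept):
        """Join a concept to the graph beneath a more general concept."""
        self.broader[concept].add(broader_concept)
        self.narrower[broader_concept].add(concept)

    @staticmethod
    def _walk(start, edges):
        """Collect every node reachable from start along the given edges."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def less_detail(self, concept):
        """All superordinate concepts, however many levels up."""
        return self._walk(concept, self.broader)

    def more_detail(self, concept):
        """All subordinate concepts, however many levels down."""
        return self._walk(concept, self.narrower)


graph = ConceptGraph()
graph.add("liver function test", "laboratory test")
graph.add("ALT", "liver function test")
graph.add("AST", "liver function test")
graph.add("hepatotoxicity signal", "adverse event")
graph.add("ALT", "hepatotoxicity signal")  # one node now spans two domains

print(sorted(graph.less_detail("ALT")))
print(sorted(graph.more_detail("laboratory test")))
```

The last `add` call is the interesting one: a single new edge connects the laboratory and safety domains, which is the kind of cross-silo link the text describes.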

There is, however, more to answering the problem than just more technology. Currently, no AI can provide a complete, error-free replacement for the manual judgement process in use today. However, as we mentioned earlier, the aim is not to create a perfectly connected and standardised output. Rather, the aim in implementing these technologies is to combine an AI-powered semantic layer with a human-in-the-loop approach that allows experts to freely and intuitively interject their knowledge straight into the system. The result is a balance between the connectivity provided by the model and the adjusted recommendation provided by the experts.
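One way to picture this balance is a review loop in which confident model suggestions are accepted automatically, uncertain ones are queued for an expert, and expert decisions take precedence from then on. The sketch below is illustrative only: `MappingReview`, the 0.9 auto-accept threshold, and the `toy_model` lookup table standing in for a real AI model are all assumptions.

```python
class MappingReview:
    """Human-in-the-loop wrapper around a model that suggests concept mappings."""

    def __init__(self, suggest, auto_accept=0.9):
        self.suggest = suggest        # callable: term -> (concept, confidence)
        self.auto_accept = auto_accept
        self.expert_decisions = {}    # term -> concept; always wins
        self.pending = []             # (term, suggestion, confidence) for review

    def resolve(self, term):
        """Return (concept, origin) for a term, queueing uncertain cases."""
        if term in self.expert_decisions:
            return self.expert_decisions[term], "expert"
        concept, confidence = self.suggest(term)
        if confidence >= self.auto_accept:
            return concept, "model"
        self.pending.append((term, concept, confidence))
        return None, "pending review"

    def record_decision(self, term, concept):
        """Store an expert's decision and clear the term from the queue."""
        self.expert_decisions[term] = concept
        self.pending = [item for item in self.pending if item[0] != term]


def toy_model(term):
    """Stand-in for a real model: a fixed table of suggestions (hypothetical)."""
    table = {
        "SGPT": ("alanine aminotransferase", 0.97),
        "Wt": ("body weight", 0.62),
    }
    return table.get(term, (None, 0.0))


review = MappingReview(toy_model)
print(review.resolve("SGPT"))  # confident enough to accept automatically
print(review.resolve("Wt"))    # below threshold, so queued for an expert
review.record_decision("Wt", "body weight")
print(review.resolve("Wt"))    # the expert's decision now applies directly
```

The expert is only asked about the cases the model is unsure of, and every answer given becomes part of the system's knowledge rather than being lost in an email thread.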

With a system that is highly aware and interactive, a real-time approach towards collaboration across all members in the trial ecosystem can be achieved. By increasing collaboration across experts along a single, flexible platform, higher transparency and accuracy can be achieved at all levels of trial activity. More automated and intelligent data quality checks, including the assessment of drug-drug and drug-disease interactions, can enable clinical data scientists to develop algorithms for a host of use cases, from risk detection to digital endpoint creation, allowing experts across the trial landscape to collaborate in a higher impact environment.

Other critical trial areas, such as SSU, already leverage flexible data infrastructure with AI techniques, but offer promising opportunities for making better use of semantic and ontological methods to unlock insights for optimising site selection for each trial. Even before a protocol is finalised, SSU begins an intricate process of feasibility planning and site identification before moving on to the preparation of sites for first participant enrolment. During this time, you may want to combine multiple sources of information and apply AI to create a strong feasibility plan or site shortlist. You may want to expedite the collection of essential paperwork, while minimising the often tedious back-and-forth communication with sites and regulatory authorities. When you can collect, centralise, and distribute the data and content, and apply AI and ontological cross-referencing, you jump to a whole new level of efficiency and intelligent decision-making. You might not automate 100% of the process, but you will likely reach 70-80% automation and provide improved insight and transparency. Such improvements make a big difference to people, who can then use their brains in a more focused and satisfying way.

Although much of the technology mentioned might refer to a type of connected intelligence, the intention is not to stop at connection alone. The full intention is to provide seamlessly integrated clinical research as a standard of care option for patients and doctors, by way of interactive and intelligent platforms that freely allow medical assistance end to end in a timely manner. For many patients, the timing of treatment can mean the difference between a normal life and disability, or even death. For every patient, technology should hold itself to the highest aims of enhancing the best of traditions and innovations in medicine, making it accessible to everyone right at the time each patient needs it, and right at the time it is effective and safe.

With a framework for semantic automation, it is possible to imagine a near future where medical research is no longer a burden, but a natural integration of life and science that builds upon the course of everyday healthcare. The integrated interaction between experts and technology, particularly AI, can free doctors to provide greater personal care for their patients, and allow those of us who manage clinical research activities to efficiently provide the services needed to deliver the next vaccine or treatment just in time.

Michael Phillips has worked for ICON for around 10 years, and currently leads the Innovation Data Science Team. He has over 20 years’ experience in IT, business intelligence, data analytics, and eClinical innovation, accumulating strong experience in team leadership, business partnering, solution design, and customer engagement. He has broader experience in academic biomedical research, with a PhD in drug metabolism enzymology, and spent 10 years working in management roles in biomedical publishing. Michael is the author of TIBCO Spotfire – A Comprehensive Primer.
