AI & ML

How AI&ML are changing the landscape of pharma technology

Richard Lee at ACD/Labs

From target identification and molecular design to process optimisation and quality control, artificial intelligence and machine learning are being positioned as transformative enablers across the pharma landscape. However, as adoption accelerates, it is becoming clear that the impact of these technologies is not determined solely by model sophistication or computational power. Rather, their effectiveness is fundamentally constrained by the availability, quality and structure of the underlying data.

Artificial intelligence and machine learning (AI&ML) are no longer emerging concepts within the pharma industry; they are increasingly sought-after technologies across R&D and manufacturing operations. Advances in computational power, algorithmic sophistication and data availability have positioned AI&ML as powerful tools for augmenting the discovery process, improving operational efficiency and enhancing decision-making. Applications now extend from early-stage target identification and molecular design through clinical development, process optimisation and quality assurance. Yet, as enthusiasm for AI-enabled transformation continues to grow, it is becoming evident that the true determinants of success lie not only in models and algorithms, but also in the underlying data foundations.

Expanding AI&ML applications across the pharma life cycle

In drug discovery, AI&ML have gained significant traction in areas such as target identification, hit discovery and lead optimisation. ML models are increasingly applied to large biological and chemical data sets to uncover insights that would be difficult or impossible to detect through traditional analysis. These approaches can help prioritise targets, predict molecular properties and guide medicinal chemistry efforts, reducing both time and cost in early-stage research.

Beyond discovery, AI&ML are being deployed to improve experimental design and execution. Predictive models can inform experimental parameters, optimise assay conditions and support iterative experiment and process optimisation. In manufacturing, AI can support process control, predictive maintenance and real-time quality monitoring, aligning closely with broader initiatives such as quality by design and continuous manufacturing.

While these applications vary widely in scope and maturity, they share a common dependency: access to reliable, well-curated and context-rich data. Without this foundation, even the most advanced AI models risk producing outputs that are difficult to interpret, reproduce and, ultimately, trust.

The pharma sector is inherently data-rich, yet it is also highly data-fragmented. Experimental results, analytical measurements, process parameters and contextual metadata are generated continuously across laboratories, pilot plants and manufacturing facilities. While AI&ML promise to extract value from this information at scale, their effectiveness is fundamentally constrained by the availability, quality, structure and interoperability of the data they consume. As a result, the industry is at a critical inflection point: realising the benefits of AI&ML requires systematic streamlining of data assembly and standardisation enabled by automation.

The data reality behind AI ambitions

Despite decades of investment in lab automation, informatics platforms and digital transformation initiatives, pharma data ecosystems remain highly heterogeneous. Data is generated by a wide array of analytical instruments such as liquid chromatography, mass spectrometry, nuclear magnetic resonance and optical spectroscopy systems, often using proprietary formats and vendor-specific software. This data is further contextualised by experimental protocols, sample metadata, instrument configurations and operator inputs, which may be captured inconsistently or not at all.

In many organisations, critical experimental data remains siloed within individual systems, projects or departments. Manual data handling, file-based workflows and ad hoc data transformations are still common practice, particularly in research environments. While such processes may suffice for localised analysis, they present significant barriers to scalable AI&ML deployment. Models trained on incomplete, poorly contextualised or inconsistently formatted data are unlikely to generate reliable insights.


Moreover, AI&ML models are inherently sensitive to biases, noise and inconsistencies in their training data (eg, incomplete data). In pharma contexts, where decisions can have substantial scientific, financial and patient safety implications, these risks are particularly acute.

The challenge, therefore, is not merely one of data volume, but of data readiness.

Data assembly as a foundational capability

Digitalised data assembly – the process of bringing together diverse data sets into a coherent, standardised and machine-consumable form – is emerging as a critical enabler of AI&ML in pharma tech. This concept extends beyond simple data aggregation. In analytical workflows, for example, effective data assembly requires the integration of raw analytical data, processed results, metadata and experimental context into unified representations that preserve scientific meaning while supporting computational workflows.

At a practical level, data assembly involves automated data capture directly from instruments, normalisation across formats, alignment with standardised data models and enrichment with contextual information such as experimental conditions and parameters. Importantly, this process must be reproducible and scalable, minimising manual intervention and reducing the risk of transcription errors or data loss.
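To make this more concrete, the sketch below shows one way an assembled analytical record might be represented in code once capture, normalisation and enrichment have taken place. It is a minimal illustration only: the AssembledRecord class, its field names and the readiness check are hypothetical and do not correspond to any particular vendor or ACD/Labs schema.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical, minimal representation of an "assembled" analytical record:
# the raw signal, processed results and experimental context are kept
# together in a single machine-consumable object.

@dataclass
class AssembledRecord:
    sample_id: str                     # link back to the originating sample
    technique: str                     # eg "LC-MS", "NMR", "UV-Vis"
    instrument: str                    # instrument identifier / configuration
    acquired_at: str                   # ISO 8601 acquisition timestamp
    raw_data: list[float]              # normalised signal, converted from the vendor format
    processed_results: dict[str, Any]  # eg peak tables, assignments, purity values
    context: dict[str, Any] = field(default_factory=dict)  # conditions, protocol, operator

    def is_ai_ready(self) -> bool:
        """Minimal readiness check: raw signal present and core context captured."""
        return bool(self.raw_data) and {"method", "operator"}.issubset(self.context)
```

Keeping the acquisition context on the same object as the signal is what allows downstream ML pipelines to filter or stratify training data without returning to the source system.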

Historically, data assembly has been treated as a downstream or ancillary activity, often addressed late in the research or development process. In the context of AI&ML, however, it must be recognised as a primary concern. Without systematic approaches to data assembly, organisations risk building AI initiatives on unstable foundations, limiting their long-term impact and scalability.

Automation and informatics infrastructure

The growing emphasis on data assembly has significant implications for pharma informatics infrastructure. Traditional systems designed primarily for data storage, visualisation or reporting are often insufficient to support AI-ready workflows. Instead, there is an increasing need for platforms that can orchestrate automated data flows, enforce data standards and integrate seamlessly with both lab systems and computational tools.

Automation plays a central role in this evolution. Automated data ingestion from analytical instruments reduces latency and ensures that raw data is captured consistently and completely, including all metadata associated with the analytical conditions. Workflow automation enables standardised processing, transformation and validation steps to be applied uniformly across experiments and projects. When combined with robust metadata management, these capabilities create a foundation for generating high-quality data sets suitable for AI&ML training.
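As a rough illustration of what uniform, automated processing can look like, the sketch below passes every incoming record through the same ordered validation and transformation steps. The step functions check_metadata and normalise_units are illustrative placeholders, not real instrument-vendor APIs.

```python
# Illustrative ingestion pipeline: every record produced by an instrument
# passes through the same ordered steps, so no experiment is handled
# differently "by hand".

def check_metadata(record: dict) -> dict:
    """Reject records missing the metadata needed for later AI/ML use."""
    required = {"sample_id", "technique", "acquired_at", "method"}
    missing = required - record.keys()
    if missing:
        raise ValueError(f"Record rejected, missing metadata: {sorted(missing)}")
    return record

def normalise_units(record: dict) -> dict:
    """Example transformation: express retention time in a single unit."""
    if record.get("rt_unit") == "s":
        record["retention_time"] = record["retention_time"] / 60.0
        record["rt_unit"] = "min"
    return record

PIPELINE = [check_metadata, normalise_units]  # applied in order to every record

def ingest(record: dict) -> dict:
    for step in PIPELINE:
        record = step(record)
    return record
```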

Importantly, such infrastructure must be flexible enough to accommodate diverse experimental modalities and evolving analytical techniques. As new instruments, assays and data types are introduced, data assembly systems must adapt without requiring extensive custom development. This flexibility is particularly important in research environments, where innovation and methodological diversity are core strengths.

Interoperability and standardisation challenges

Standardisation is often cited as a prerequisite for effective data integration, yet implementing it in practice remains challenging. The pharma industry encompasses a wide range of data standards, ontologies, data formats and variations in data structure, many of which address specific domains (eg, a particular analytical technique, or low- versus high-resolution data). While these standards provide valuable frameworks within their respective systems, industry-wide adoption is unachievable and large gaps remain, particularly at the level of raw analytical data and experimental context.

Interoperability between systems, both within organisations and across external partners, is another persistent challenge. Collaborative research, contract manufacturing and regulatory submissions all require data to move seamlessly between disparate platforms. AI&ML further magnify this need, as models may be trained on data originating from multiple sources and deployed across different operational environments (different models may be generated based on the same data sources, depending on the need or application).

Addressing these challenges requires not only technical solutions, but also organisational alignment and governance. Decisions about data models, standards adoption and system integration must be informed by both scientific requirements and long-term strategic objectives. In this context, data assembly systems can serve as intermediaries, translating between heterogeneous sources while enforcing consistent internal representations.
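One small example of what that intermediary role can look like is sketched below: vendor-specific field names are mapped onto a single internal vocabulary before any downstream processing. The vendor names and mappings are entirely hypothetical.

```python
# Illustrative field-name translation: two hypothetical vendor exports are
# mapped onto one internal vocabulary before anything else touches the data.

VENDOR_FIELD_MAPS = {
    "vendor_a": {"SampleName": "sample_id", "AcqTime": "acquired_at", "Instrument": "instrument"},
    "vendor_b": {"sample": "sample_id", "timestamp": "acquired_at", "system_id": "instrument"},
}

def to_internal(vendor: str, record: dict) -> dict:
    """Rename vendor-specific keys to the internal schema, keeping unmapped keys as-is."""
    mapping = VENDOR_FIELD_MAPS[vendor]
    return {mapping.get(key, key): value for key, value in record.items()}
```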

Trust, traceability and reproducibility

Beyond enabling AI&ML initiatives, standardisation achieved through automation delivers broader and more enduring value to pharma organisations. Automated, standardised data workflows improve overall data governance, enhance experimental reproducibility and strengthen confidence in the reuse of data for secondary analyses. When data is consistently captured and contextualised at the point of generation, downstream users retain a clear understanding of the experimental conditions and intent under which the data was produced, enabling more appropriate interpretation and application.

Robust, automated data assembly pipelines play a central role in achieving these outcomes by establishing end-to-end traceability from raw analytical data through to derived results and model-ready data sets. Automation reduces variability introduced by manual processing, and ensures consistent application of data transformations across experiments and projects. When coupled with comprehensive audit trails and version control, these pipelines support both internal validation activities and external regulatory review.
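A lightweight way to implement that kind of traceability, assuming files on disk and a simple append-only log, is sketched below. The ProvenanceLog class is hypothetical and stands in for whatever audit-trail mechanism a given informatics platform actually provides.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical append-only provenance log: each processing step records what it
# consumed, what it produced, and a content hash, so the chain from raw file to
# model-ready data set can be verified later.

def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

class ProvenanceLog:
    def __init__(self, log_path: Path):
        self.log_path = log_path

    def record_step(self, step_name: str, inputs: list[Path], outputs: list[Path]) -> None:
        entry = {
            "step": step_name,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "inputs": [{"file": str(p), "sha256": sha256_of(p)} for p in inputs],
            "outputs": [{"file": str(p), "sha256": sha256_of(p)} for p in outputs],
        }
        with self.log_path.open("a") as fh:  # append-only: earlier entries are never rewritten
            fh.write(json.dumps(entry) + "\n")
```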

Although formal guidance on the use of AI&ML in pharma contexts continues to evolve, long-standing principles – such as data integrity, traceability, reproducibility and data provenance from source to final data destination – remain fundamental. Investments in data assembly and automation extend well beyond enabling advanced analytics; they underpin compliance, quality assurance, and long-term trust in digital and AI-enabled workflows.

Moving towards AI-enabled, data-centric operations

The pharma industry’s experience with AI&ML to date suggests that transformative impact is achievable, but not inevitable. Organisations that focus narrowly on algorithm development without addressing underlying data challenges are likely to encounter diminishing returns. Conversely, those that invest in data-centric infrastructure – emphasising automated data capture, standardised assembly and contextual integrity – are better positioned to scale AI&ML across the enterprise.

This shift has implications for how technology investments are prioritised. Rather than viewing data preparation as a preliminary or supporting activity, it must be recognised as a strategic capability. Informatics platforms that enable self-service automation, low-code workflow configuration, and seamless integration with analytical and computational tools can play a pivotal role in this transition.

Ultimately, AI&ML are catalysts rather than endpoints. Their ability to change the pharma technology landscape depends on the strength of the systems that support them and the robustness of the data they are built upon. By addressing the persistent challenges of data heterogeneity, assembly and standardisation, the industry can move beyond isolated, proof-of-concept AI use cases towards truly integrated, AI-augmented workflows that deliver reliable, reproducible and meaningful scientific and operational outcomes.


Richard Lee is director of Core Technology and Capabilities at ACD/Labs. He obtained his PhD from McMaster University, Canada, where he focused on strategies for metabolite identification and metabolomics studies. From McMaster, he moved on as a scientist at the Centre of Probe Development and Commercialization in Hamilton, Ontario, Canada, which developed radiopharmaceuticals as imaging agents and therapeutics for oncology. He has been with ACD/Labs since 2012, and during this time has held responsibility for the inception and development of software to support metabolite identification.