AI & ML

How AI&ML are changing the landscape of pharma technology

Richard Lee at ACD/Labs

From target identification and molecular design to process optimisation and quality control, artificial intelligence and machine learning are being positioned as transformative enablers across the pharma landscape. However, as adoption accelerates, it is becoming clear that the impact of these technologies is not determined solely by model sophistication or computational power. Rather, their effectiveness is fundamentally constrained by the availability, quality and structure of the underlying data.

Artificial intelligence and machine learning (AI&ML) are no longer emerging concepts within the pharma industry; they are increasingly sought-after technologies across R&D and manufacturing operations. Advances in computational power, algorithmic sophistication and data availability have positioned AI&ML as powerful tools for augmenting the discovery process, improving operational efficiency and enhancing decision-making. Applications now extend from early-stage target identification and molecular design through clinical development, process optimisation and quality assurance. Yet, as enthusiasm for AI-enabled transformation continues to grow, it is becoming evident that the true determinants of success lie not only in models and algorithms, but also in the underlying data foundations.

Expanding AI&ML applications across the pharma life cycle

In drug discovery, AI&ML have gained significant traction in areas such as target identification, hit discovery and lead optimisation. ML models are increasingly applied to large biological and chemical data sets to uncover insights that would be difficult or impossible to detect through traditional analysis. These approaches can help prioritise targets, predict molecular properties and guide medicinal chemistry efforts, reducing both time and cost in early-stage research.

Beyond discovery, AI&ML are being deployed to improve experimental design and execution. Predictive models can inform experimental parameters, optimise assay conditions and support iterative experiment and process optimisation. In manufacturing, AI can support process control, predictive maintenance and real-time quality monitoring, aligning closely with broader initiatives such as quality by design and continuous manufacturing.

While these applications vary widely in scope and maturity, they share a common dependency: access to reliable, well-curated and context-rich data. Without this foundation, even the most advanced AI models risk producing outputs that are difficult to interpret, reproduce and, ultimately, trust.

The pharma sector is inherently data-rich, yet it is also highly data-fragmented. Experimental results, analytical measurements, process parameters and contextual metadata are generated continuously across laboratories, pilot plants and manufacturing facilities. While AI&ML promise to extract value from this information at scale, their effectiveness is fundamentally constrained by the availability, quality, structure and interoperability of the data they consume. As a result, the industry is at a critical inflection point: realising the benefits of AI&ML requires systematic streamlining of data assembly and standardisation enabled by automation.

The data reality behind AI ambitions

Despite decades of investment in lab automation, informatics platforms and digital transformation initiatives, pharma data ecosystems remain highly heterogeneous. Data is generated by a wide array of analytical instruments such as liquid chromatography, mass spectrometry, nuclear magnetic resonance and optical spectroscopy systems, often using proprietary formats and vendor-specific software. This data is further contextualised by experimental protocols, sample metadata, instrument configurations and operator inputs, which may be captured inconsistently or not at all.

In many organisations, critical experimental data remains siloed within individual systems, projects or departments. Manual data handling, file-based workflows and ad hoc data transformations are still common practice, particularly in research environments. While such processes may suffice for localised analysis, they present significant barriers to scalable AI&ML deployment. Models trained on incomplete, poorly contextualised or inconsistently formatted data are unlikely to generate reliable insights.


Moreover, AI&ML models are inherently sensitive to biases, noise and inconsistencies in their training data (eg, incomplete data). In pharma contexts, where decisions can have substantial scientific, financial and patient safety implications, these risks are particularly acute.

The challenge, therefore, is not merely one of data volume, but of data readiness.

Data assembly as a foundational capability

Digitalised data assembly – the process of bringing together diverse data sets into a coherent, standardised and machine-consumable form – is emerging as a critical enabler of AI&ML in pharma tech. This concept extends beyond simple data aggregation. In analytical workflows, for example, effective data assembly requires the integration of raw analytical data, processed results, metadata and experimental context into unified representations that preserve scientific meaning while supporting computational workflows.

At a practical level, data assembly involves automated data capture directly from instruments, normalisation across formats, alignment with standardised data models and enrichment with contextual information such as experimental conditions and parameters. Importantly, this process must be reproducible and scalable, minimising manual intervention and reducing the risk of transcription errors or data loss.
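To make this more concrete, the sketch below shows one way an assembled analytical record might be represented in code once capture, normalisation and enrichment have taken place. It is a minimal illustration only: the AssembledRecord class, its field names and the readiness check are hypothetical and do not correspond to any particular vendor or ACD/Labs schema.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical, minimal representation of an "assembled" analytical record:
# the raw signal, processed results and experimental context are kept
# together in a single machine-consumable object.

@dataclass
class AssembledRecord:
    sample_id: str                     # link back to the originating sample
    technique: str                     # eg "LC-MS", "NMR", "UV-Vis"
    instrument: str                    # instrument identifier / configuration
    acquired_at: str                   # ISO 8601 acquisition timestamp
    raw_data: list[float]              # normalised signal, converted from the vendor format
    processed_results: dict[str, Any]  # eg peak tables, assignments, purity values
    context: dict[str, Any] = field(default_factory=dict)  # conditions, protocol, operator

    def is_ai_ready(self) -> bool:
        """Minimal readiness check: raw signal present and core context captured."""
        return bool(self.raw_data) and {"method", "operator"}.issubset(self.context)
```

Keeping the acquisition context on the same object as the signal is what allows downstream ML pipelines to filter or stratify training data without returning to the source system.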

Historically, data assembly has been treated as a downstream or ancillary activity, often addressed late in the research or development process. In the context of AI&ML, however, it must be recognised as a primary concern. Without systematic approaches to data assembly, organisations risk building AI initiatives on unstable foundations, limiting their long-term impact and scalability.

Automation and informatics infrastructure

The growing emphasis on data assembly has significant implications for pharma informatics infrastructure. Traditional systems designed primarily for data storage, visualisation or reporting are often insufficient to support AI-ready workflows. Instead, there is an increasing need for platforms that can orchestrate automated data flows, enforce data standards and integrate seamlessly with both lab systems and computational tools.

Automation plays a central role in this evolution. Automated data ingestion from analytical instruments reduces latency and ensures that raw data is captured consistently and completely, including all metadata associated with the analytical conditions. Workflow automation enables standardised processing, transformation and validation steps to be applied uniformly across experiments and projects. When combined with robust metadata management, these capabilities create a foundation for generating high-quality data sets suitable for AI&ML training.
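As a rough illustration of what uniform, automated processing can look like, the sketch below passes every incoming record through the same ordered validation and transformation steps. The step functions check_metadata and normalise_units are illustrative placeholders, not real instrument-vendor APIs.

```python
# Illustrative ingestion pipeline: every record produced by an instrument
# passes through the same ordered steps, so no experiment is handled
# differently "by hand".

def check_metadata(record: dict) -> dict:
    """Reject records missing the metadata needed for later AI/ML use."""
    required = {"sample_id", "technique", "acquired_at", "method"}
    missing = required - record.keys()
    if missing:
        raise ValueError(f"Record rejected, missing metadata: {sorted(missing)}")
    return record

def normalise_units(record: dict) -> dict:
    """Example transformation: express retention time in a single unit."""
    if record.get("rt_unit") == "s":
        record["retention_time"] = record["retention_time"] / 60.0
        record["rt_unit"] = "min"
    return record

PIPELINE = [check_metadata, normalise_units]  # applied in order to every record

def ingest(record: dict) -> dict:
    for step in PIPELINE:
        record = step(record)
    return record
```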

Importantly, such infrastructure must be flexible enough to accommodate diverse experimental modalities and evolving analytical techniques. As new instruments, assays and data types are introduced, data assembly systems must adapt without requiring extensive custom development. This flexibility is particularly important in research environments, where innovation and methodological diversity are core strengths.

Interoperability and standardisation challenges

Standardisation is often cited as a prerequisite for effective data integration, yet implementing it in practice remains challenging. The pharma industry encompasses a wide range of data standards, ontologies, data formats and variations in data structure, many of which address specific domains (eg, a particular analytical technique, or low- versus high-resolution data). While these standards provide valuable frameworks within their respective systems, industry-wide adoption is unachievable and large gaps remain, particularly at the level of raw analytical data and experimental context.

Interoperability between systems, both within organisations and across external partners, is another persistent challenge. Collaborative research, contract manufacturing and regulatory submissions all require data to move seamlessly between disparate platforms. AI&ML further magnify this need, as models may be trained on data originating from multiple sources and deployed across different operational environments (different models may be generated based on the same data sources, depending on the need or application).

Addressing these challenges requires not only technical solutions, but also organisational alignment and governance. Decisions about data models, standards adoption and system integration must be informed by both scientific requirements and long-term strategic objectives. In this context, data assembly systems can serve as intermediaries, translating between heterogeneous sources while enforcing consistent internal representations.
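One small example of what that intermediary role can look like is sketched below: vendor-specific field names are mapped onto a single internal vocabulary before any downstream processing. The vendor names and mappings are entirely hypothetical.

```python
# Illustrative field-name translation: two hypothetical vendor exports are
# mapped onto one internal vocabulary before anything else touches the data.

VENDOR_FIELD_MAPS = {
    "vendor_a": {"SampleName": "sample_id", "AcqTime": "acquired_at", "Instrument": "instrument"},
    "vendor_b": {"sample": "sample_id", "timestamp": "acquired_at", "system_id": "instrument"},
}

def to_internal(vendor: str, record: dict) -> dict:
    """Rename vendor-specific keys to the internal schema, keeping unmapped keys as-is."""
    mapping = VENDOR_FIELD_MAPS[vendor]
    return {mapping.get(key, key): value for key, value in record.items()}
```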

Trust, traceability and reproducibility

Beyond enabling AI&ML initiatives, standardisation achieved through automation delivers broader and more enduring value to pharma organisations. Automated, standardised data workflows improve overall data governance, enhance experimental reproducibility and strengthen confidence in the reuse of data for secondary analyses. When data is consistently captured and contextualised at the point of generation, downstream users retain a clear understanding of the experimental conditions and intent under which the data was produced, enabling more appropriate interpretation and application.

Robust, automated data assembly pipelines play a central role in achieving these outcomes by establishing end-to-end traceability from raw analytical data through to derived results and model-ready data sets. Automation reduces variability introduced by manual processing, and ensures consistent application of data transformations across experiments and projects. When coupled with comprehensive audit trails and version control, these pipelines support both internal validation activities and external regulatory review.
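A lightweight way to implement that kind of traceability, assuming files on disk and a simple append-only log, is sketched below. The ProvenanceLog class is hypothetical and stands in for whatever audit-trail mechanism a given informatics platform actually provides.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical append-only provenance log: each processing step records what it
# consumed, what it produced, and a content hash, so the chain from raw file to
# model-ready data set can be verified later.

def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

class ProvenanceLog:
    def __init__(self, log_path: Path):
        self.log_path = log_path

    def record_step(self, step_name: str, inputs: list[Path], outputs: list[Path]) -> None:
        entry = {
            "step": step_name,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "inputs": [{"file": str(p), "sha256": sha256_of(p)} for p in inputs],
            "outputs": [{"file": str(p), "sha256": sha256_of(p)} for p in outputs],
        }
        with self.log_path.open("a") as fh:  # append-only: earlier entries are never rewritten
            fh.write(json.dumps(entry) + "\n")
```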

Although formal guidance on the use of AI&ML in pharma contexts continues to evolve, long-standing principles – such as data integrity, traceability, reproducibility and data provenance from source to final data destination – remain fundamental. Investments in data assembly and automation extend well beyond enabling advanced analytics; they underpin compliance, quality assurance, and long-term trust in digital and AI-enabled workflows.

Moving towards AI-enabled, data-centric operations

The pharma industry’s experience with AI&ML to date suggests that transformative impact is achievable, but not inevitable. Organisations that focus narrowly on algorithm development without addressing underlying data challenges are likely to encounter diminishing returns. Conversely, those that invest in data-centric infrastructure – emphasising automated data capture, standardised assembly and contextual integrity – are better positioned to scale AI&ML across the enterprise.

This shift has implications for how technology investments are prioritised. Rather than viewing data preparation as a preliminary or supporting activity, it must be recognised as a strategic capability. Informatics platforms that enable self-service automation, low-code workflow configuration, and seamless integration with analytical and computational tools can play a pivotal role in this transition.

Ultimately, AI&ML are catalysts rather than endpoints. Their ability to change the pharma technology landscape depends on the strength of the systems that support them and the robustness of the data they are built upon. By addressing the persistent challenges of data heterogeneity, assembly and standardisation, the industry can move beyond isolated, proof-of-concept AI use cases towards truly integrated, AI-augmented workflows that deliver reliable, reproducible and meaningful scientific and operational outcomes.


Richard Lee is director of Core Technology and Capabilities at ACD/Labs. He obtained his PhD from McMaster University, Canada, where he focused on strategies for metabolite identification and metabolomics studies. From McMaster, he moved on as a scientist at the Centre of Probe Development and Commercialization in Hamilton, Ontario, Canada, which developed radiopharmaceuticals as imaging agents and therapeutics for oncology. He has been with ACD/Labs since 2012, and during this time has held responsibility for the inception and development of software to support metabolite identification.