The Medicine Maker
Predictive AI in Drug Discovery: Five Steps to Success

The use of AI in small-molecule drug discovery is driving the sector forward in big ways – but there are big challenges, too. Here are five steps to success.

By Mirit Eldor 07/10/2024 5 min read

The preclinical phase of drug discovery is the most time-intensive stage of the R&D lifecycle – taking up to six years and accounting for more than 40 percent of total drug development costs. To reduce the billions spent on preclinical drug development, faster, more efficient R&D workflows must be a priority across the industry. So it’s no surprise that pharmaceutical and biotechnology companies are looking to machine learning (ML) and AI to revolutionize R&D and to generate and validate small-molecule drug discovery pipelines.

Research organizations that successfully deploy AI are already gaining a competitive edge. There is emerging evidence that these organizations get through the preclinical stages more quickly and cheaply than those following the traditional approach, with savings of around 30 percent in time and cost. The approach is gaining traction; one study by the Boston Consulting Group found that biotech companies that have adopted an AI-first approach “…have more than 150 small molecule drugs in discovery and more than 15 already in clinical trials.”

Predictive AI is one AI approach that many pharmaceutical and biotech companies are exploring today. Here are five steps that research leaders should follow to realize success.

1. Identify the right use cases

Before investing in predictive AI, research leaders must define the problems, or use cases, that they want to tackle. Typically, the best applications for predictive AI are discrete tasks and processes where measurable, tangible gains can be achieved. In early drug discovery, examples of predictive AI use cases include predicting the 3D structure of a protein, relationships between molecules based on their chemical structure, and drug-target interactions. 

In small molecule discovery, predictive retrosynthesis combines high-quality reaction data with AI to find structural or chemical patterns that correlate with specific compound properties and accelerate synthesis planning of novel molecular entities. The potential benefits of predictive retrosynthesis over traditional approaches are significant; routes can be generated for novel compounds in minutes rather than weeks.
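To make one of the use cases above concrete, here is a minimal sketch of how "relationships between molecules based on their chemical structure" might be scored. It uses the Tanimoto (Jaccard) coefficient, a common similarity measure in cheminformatics, over sets of structural features. The compound names and feature sets are hypothetical and hand-written for illustration; a real workflow would derive fingerprints with a cheminformatics toolkit rather than list them by hand.

```python
from __future__ import annotations


def tanimoto(a: frozenset[str], b: frozenset[str]) -> float:
    """Tanimoto (Jaccard) similarity: shared features / all features."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty feature sets
    return len(a & b) / len(a | b)


# Hypothetical structural features; placeholders, not real fingerprints.
fingerprints = {
    "compound_A": frozenset({"aromatic_ring", "carboxylic_acid", "ester"}),
    "compound_B": frozenset({"aromatic_ring", "carboxylic_acid", "hydroxyl"}),
    "compound_C": frozenset({"amine", "alkyl_chain"}),
}


def rank_neighbours(query: str, library: dict[str, frozenset[str]]) -> list[tuple[str, float]]:
    """Rank every other compound in the library by similarity to the query."""
    scores = [
        (name, tanimoto(library[query], features))
        for name, features in library.items()
        if name != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranked = rank_neighbours("compound_A", fingerprints)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

A score like this is a discrete, measurable output, which is the kind of task the use-case guidance above favors.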

2. Source accurate and high-quality data

The nuance of research questions in drug discovery demands a level of precision that requires high-quality, verified training data. Without accurate and high-quality data, researchers will lack confidence in predictive AI outcomes. For predictive models to work, researchers will want to include data from multiple sources in addition to their internal data. This will typically include data from scientific literature, plus other databases containing patent data, regulatory data, clinical trials data, safety data, and data from patient records. 

For example, a predictive AI chemistry model requires a breadth of chemistry inputs that includes not only proprietary data and data on failed reactions, but also published literature. A predictive model that is fine-tuned using incomplete data will produce inferior results whose shortcomings may not be immediately identified, leading to expensive, incorrect decisions.
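As a rough illustration of combining internal and published reaction data, the sketch below merges records from two sources and reports simple coverage figures, such as how many failed reactions are present and how many records lack a yield. The `ReactionRecord` fields, the identifiers, and the "first source wins" merge rule are all assumptions made for this example, not a description of any particular product or dataset.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ReactionRecord:
    reaction_id: str                # hypothetical identifier
    source: str                     # e.g. "internal" or "literature"
    succeeded: bool                 # failed reactions are kept, not dropped
    yield_percent: Optional[float]  # None when the yield was not reported


def merge_sources(*sources: Iterable[ReactionRecord]) -> list:
    """Combine records, keeping the first record seen for each reaction_id."""
    merged = {}
    for records in sources:
        for record in records:
            merged.setdefault(record.reaction_id, record)
    return list(merged.values())


def coverage_report(records: list) -> dict:
    """Summarize gaps that could quietly degrade a model trained on the data."""
    return {
        "total": len(records),
        "failed": sum(not r.succeeded for r in records),
        "missing_yield": sum(r.yield_percent is None for r in records),
        "sources": sorted({r.source for r in records}),
    }


# Invented example data.
internal = [
    ReactionRecord("rxn-001", "internal", True, 82.0),
    ReactionRecord("rxn-002", "internal", False, None),
]
literature = [
    ReactionRecord("rxn-002", "literature", False, 0.0),
    ReactionRecord("rxn-003", "literature", True, None),
]

report = coverage_report(merge_sources(internal, literature))
print(report)
```

A report of this kind does not fix incomplete data, but it can surface gaps before a model is fine-tuned on them.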

3. Prepare and structure the data

Once data is acquired, it must be structured to power predictive AI successfully. Much of the data that R&D organizations source is not AI-ready; datasets are siloed and stored in myriad formats with insufficient metadata, making them difficult to retrieve and use in predictive AI models. Standardizing and structuring datasets via the application of ontologies is a critical step.

Ontologies are human-generated, machine-readable descriptions of categories. They standardize data against an agreed vocabulary, providing a shared language across an organization. Vocabularies can include terms specific to an organization – such as product names – alongside industry recognized concepts and terms. Ontologies define semantic relationships to other classes and capture synonyms, which is essential where there are multiple ways to describe the same entity in scientific literature and other datasets. For example, the gene PSEN1 can also be referred to as PSNL1 or Presenilin-1.
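The synonym handling described above can be sketched as a small lookup structure. This is a deliberately simplified, assumed design: the `Concept` and `Ontology` classes are hypothetical, and real ontologies also encode semantic relationships between classes, which this example omits. The PSEN1 entry uses the synonyms given in the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    preferred_label: str  # the agreed vocabulary term
    category: str         # e.g. "gene"
    synonyms: set[str] = field(default_factory=set)


class Ontology:
    """Toy vocabulary mapping any known label to its preferred term."""

    def __init__(self) -> None:
        self._index: dict[str, Concept] = {}

    def add(self, concept: Concept) -> None:
        # Index the preferred label and every synonym, case-insensitively.
        for label in {concept.preferred_label, *concept.synonyms}:
            self._index[label.casefold()] = concept

    def normalize(self, term: str) -> str | None:
        """Return the preferred label for a term, or None if it is unknown."""
        concept = self._index.get(term.strip().casefold())
        return concept.preferred_label if concept else None


ontology = Ontology()
ontology.add(Concept("PSEN1", "gene", {"PSNL1", "Presenilin-1"}))

print(ontology.normalize("presenilin-1"))  # resolves to the preferred label
print(ontology.normalize("BRCA1"))         # not in this toy vocabulary
```

Organization-specific terms, such as product names, could be added through the same `add` call alongside industry-recognized concepts.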

4. Semantic enrichment

To extract insights, datasets must be enriched and annotated. Semantic enrichment is a key step that unlocks the full potential of data in structured and unstructured, public and proprietary, datasets. It transforms text into clean, contextualized data, free from ambiguities and synonyms, through annotation, tagging and adding metadata. It works by employing text analytics to extract key words, concepts, and terms for predictive models, and harmonizes synonymous terms for better accuracy. 

Data harmonization is especially important when using databases from multiple sources, because the same concept is often expressed with different technical terms or abbreviations. For example, sophisticated semantic enrichment software can identify and extract relevant terms or patterns in text and harmonize synonyms, such as “heart attack” and “myocardial infarction”, so they are identified as the same entity by a predictive model. This eliminates “noise” and ensures predictive AI models are underpinned by high-quality, enriched data.
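The harmonization step can be illustrated with a very small dictionary-based annotator. This is only a sketch under simple assumptions: the `SYNONYMS` table is a hypothetical two-entry vocabulary, and matching is done with a plain regular expression, whereas production semantic enrichment software relies on far richer text analytics.

```python
import re

# Hypothetical vocabulary: surface form -> preferred term.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
}

# Longest phrases first, so a longer term wins over any shorter overlap.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(s) for s in sorted(SYNONYMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def annotate(text: str) -> list:
    """Tag each recognized term with its position and preferred concept."""
    return [
        {
            "start": m.start(),
            "end": m.end(),
            "text": m.group(0),
            "concept": SYNONYMS[m.group(0).lower()],
        }
        for m in _PATTERN.finditer(text)
    ]


def harmonize(text: str) -> str:
    """Rewrite every recognized synonym to its preferred term."""
    return _PATTERN.sub(lambda m: SYNONYMS[m.group(0).lower()], text)


note = "History of heart attack; prior myocardial infarction."
annotations = annotate(note)
print(annotations)
print(harmonize("Patient had a Heart Attack in 2019."))
```

Both mentions in `note` resolve to the same concept, so a downstream model would see one entity rather than two. Variants with irregular spacing or misspellings would not be caught by this simple pattern.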

5. Domain specificity

Structuring data for predictive AI through ontologies and applying semantic enrichment methods is highly specialized work that requires expert understanding of the domain under investigation. General purpose AI models developed by technology companies have utility in broad areas such as marketing and operations, but scientific research represents a set of niche challenges that necessitates domain expertise. 

Few biopharma companies today have the right mix of skills needed for tasks such as creating ontologies in-house. And though they are experts in their scientific field, researchers often lack the technological capabilities required. Best positioned to solve this challenge are data scientists who can couple technology skills with scientific domain expertise. Such data scientists can bring an understanding of the context of questions asked in relation to the data available. They can further ensure ontologies and vocabularies are built so that predictive AI models return relevant results, and no essential data is missed.

The world is in agreement: AI will be a game-changer for every industry. For those working in preclinical drug discovery, the opportunity is huge – but so is the challenge. To accelerate drug discovery to meet the medical needs of patients around the world, pharma and biotech organizations need to bring together data, technology, and expertise. When these elements converge, AI can serve as a valuable support tool for researchers to usher in a new era of drug discovery.    


About the Author(s)

Mirit Eldor

Managing Director, Life Sciences Solutions, Elsevier


Copyright © 2025 Texere Publishing Limited (trading as Conexiant), with registered number 08113419 whose registered office is at Booths No. 1, Booths Park, Chelford Road, Knutsford, England, WA16 8GS.