Graphs and graph data science in drug discovery

When tackling a complex problem, we need to choose efficiently what to explore and what to ignore.

This is true for all kinds of data analysis but especially so in drug discovery, where 90% of clinical drug development fails and only 6% of all the drugs in clinical trials make it onto the market.¹,²

Given how time-consuming and financially taxing that process is, any ability to focus on high-potential avenues during the early stages of drug discovery is a huge advantage.

However, because most drug candidates are relatively “small” molecules, it is challenging to discern which ones warrant further exploration.²

Small molecules, defined in chemistry as compounds with a molecular mass below 1000 atomic mass units (smaller than proteins and nucleic acids), exert their effects by binding to proteins and nucleic acids and altering their functions.

Given the small size and structure of these molecules, discriminating between promising and futile avenues represents a formidable obstacle in any drug discovery workflow.³

The challenges in early drug discovery are further compounded by the vast quantities of unstructured data we need to work with to find good candidates and the disparate sources of information that must be navigated.

Not only do research teams need to contend with the data generated from their own experimentation, but they must also sift through public third-party medical datasets.

Graphs and graph data science in drug discovery

Amidst these challenges in early drug discovery, a promising methodology has emerged. “Graph technology” and the concept of “knowledge graphs” first gained widespread prominence through their use in social networks by industry giants such as LinkedIn and Facebook.

Knowledge graphs provide a structured and interconnected representation of information, enabling the integration and organisation of diverse data streams into a unified framework.

By establishing relationships and connections between a very large set of data points, knowledge graphs facilitate the extraction of valuable insights and the identification of patterns that may be obscured within the vast expanse of unstructured information.

Used as the data engine underlying pharma research, this approach also offers a performance advantage. Unlike SQL (relational) database management systems, which store data in tables of rows with a fixed set of columns, knowledge graphs store data as nodes (or entities) connected by edges (or relationships).

By design, graph technology aims to model the real world and its connections more naturally than the more abstract relational model. The benefit is that breakthrough insights often lie within those interconnections.
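
To make the nodes-and-edges model concrete, here is a minimal sketch in plain Python. It is not the storage engine of any graph database, and every identifier in it (molecule_A, protein_X, the relationship names) is invented for illustration. It only shows that, once relationships are stored explicitly, following one is a direct lookup.

```python
from collections import defaultdict

# Hypothetical toy data: every identifier below is invented for illustration.
# Each edge is (source node, relationship type, target node).
edges = [
    ("molecule_A", "BINDS", "protein_X"),
    ("molecule_B", "BINDS", "protein_X"),
    ("protein_X", "PART_OF", "pathway_P"),
    ("molecule_A", "SIMILAR_TO", "molecule_C"),
]

# Index the edges by source node so that following a relationship
# is a dictionary lookup rather than a join across tables.
outgoing = defaultdict(list)
for source, relation, target in edges:
    outgoing[source].append((relation, target))


def neighbours(node, relation):
    """Return the nodes reachable from `node` via one edge of type `relation`."""
    return [target for rel, target in outgoing[node] if rel == relation]


# One hop: what does molecule_A bind to?
binds = neighbours("molecule_A", "BINDS")
# Two hops: which pathways do those proteins belong to?
pathways = [p for protein in binds for p in neighbours(protein, "PART_OF")]

print(binds)     # ['protein_X']
print(pathways)  # ['pathway_P']
```

In a relational design the same two-hop question would typically need a join per hop; here each hop is another call to `neighbours`.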

Obtaining a 1400% increase in hit rate

Prominent pharmaceutical organisations are actively exploring the potential of applying knowledge graphs to streamline and enhance early drug discovery. One such company is Servier, a multibillion-dollar French-headquartered international pharmaceutical company.

Institut de Recherches Servier, the company’s research and development (R&D) arm, has extensive expertise in working with small molecules.

Traditionally, its approach involved meticulously scanning through an expansive database housing millions of these minuscule chemical entities to identify promising candidates for potential drug applications.

Before using graph technologies, the team had to painstakingly comb through vast (and often very sparse) collections of data. Despite their best endeavours, they identified potential small molecule candidates with a success rate of only 1%.

However, by adopting a new approach centred around graph technologies, Servier has achieved exceptional results and is now consistently attaining a success rate of 15% when identifying promising small molecule candidates.

That’s a 1400% increase compared with the previous method. Moreover, this increased efficiency is being realised across a more focused dataset of approximately 1000 small molecules.

This is a significant improvement compared with the unwieldy dataset of one million molecules they previously contended with.
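
The 1400% figure follows directly from the two success rates quoted above. As a quick check, here is a minimal Python sketch; the function and variable names are illustrative and not taken from Servier.

```python
def percent_increase(old, new):
    """Relative change from `old` to `new`, expressed as a percentage of `old`."""
    return (new - old) / old * 100


old_rate_pct = 1    # success rate before graph technologies, in percent
new_rate_pct = 15   # success rate with the graph-based approach, in percent

increase = percent_increase(old_rate_pct, new_rate_pct)
fold_change = new_rate_pct / old_rate_pct

print(increase)     # 1400.0
print(fold_change)  # 15.0
```

In other words, a 1400% increase is the same thing as a 15-fold improvement: the new rate is the old rate plus 14 times the old rate.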

All this is thanks to Servier’s new graph-based decision support tool, Pegasus.⁴ By efficiently sifting therapeutic targets to identify the most relevant screening modalities, Pegasus is helping Servier’s lab team to design more appropriate trials.

Why relationships are key in drug discovery

Graphs are key to this approach. Thierry Dorval, the Institut’s Head of Data Sciences and Data Management, comments: “Graph is about relationships and really focused on how entities are linked together … rather than the entities themselves.”

“Typically, when we are using graphs for research, we’re talking about entities such as drugs, small molecules, proteins and pathways, so what really matters to us is not so much the protein itself as opposed to the way these proteins interact, the way the small molecules interact with those proteins and whether the small molecules are similar to each other.”

Dorval’s colleague, Jeremy Grignard, a Data and Research Scientist at the organisation, adds: “Pegasus represents the different data we’re handling as well as the various algorithms; as such, we can query the graph to keep finding answers to all the biological questions we want to ask.”
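
The article does not describe Pegasus’s schema or query language, so the following is only a hedged Python sketch of the kind of question such a graph could answer: “which molecules are similar to a known binder of this protein?” All node names, relationship types and functions are hypothetical. Each result keeps the chain of relationships that produced it, so the suggestion can be traced back through the graph.

```python
# Hypothetical toy graph: all names and relationships are invented for illustration.
# Each edge is (source node, relationship type, target node).
edges = [
    ("molecule_A", "BINDS", "protein_X"),
    ("molecule_C", "SIMILAR_TO", "molecule_A"),
    ("molecule_D", "SIMILAR_TO", "molecule_B"),
    ("molecule_B", "BINDS", "protein_Y"),
    ("protein_X", "PART_OF", "pathway_P"),
]


def sources(relation, target):
    """Nodes that have an edge of type `relation` pointing at `target`."""
    return [s for s, r, t in edges if r == relation and t == target]


def candidates_for(protein):
    """Find molecules similar to a known binder of `protein`.

    Each result carries the chain of relationships that led to it,
    so the reason for the suggestion can be inspected.
    """
    results = []
    for binder in sources("BINDS", protein):
        for candidate in sources("SIMILAR_TO", binder):
            path = [candidate, "SIMILAR_TO", binder, "BINDS", protein]
            results.append((candidate, path))
    return results


hits = candidates_for("protein_X")
for candidate, path in hits:
    print(candidate, "because", " -> ".join(path))
```

A production system would more likely express this as a pattern-matching query in a graph database than as hand-written loops; the loops here simply make the traversal explicit.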

Ultimately, the key value proposition that graphs offer researchers is a means to navigate sparse data landscapes in a manner that illuminates the valuable connections hidden within the information.

For their part, Dorval and Grignard are firmly convinced that graph technologies possess the transformative potential to revolutionise drug discovery and development by facilitating the organisation of complex data and generating actionable knowledge to support critical decision-making processes.

The approach also saves time by steering work away from unpromising search paths.

“Graphs mean that the compound has been selected rationally, so you know why your hit has been selected; but, just as importantly, you can ask why it didn’t,” Grignard observes. “And that gives us really useful information and knowledge about what worked and what didn’t.”

Has Servier discovered a truly groundbreaking approach to saving time and effort during drug discovery? The evidence certainly points in that direction.

References

1. www.ncbi.nlm.nih.gov/pmc/articles/PMC9293739/#:~:text=However%2C%20true%20validation%20of%20any,%2C%2010%2C%2011%2C%2012.
2. www.pghr.org/post/what-makes-drug-discovery-and-development-so-difficult.
3. www.scientificamerican.com/blog/the-curious-wavefunction/why-drug-discovery-is-hard-part-2-easter-island-pit-vipers-where-do-drugs-come-from/.
4. https://neo4j.com/event/pegasus-unlocking-the-power-of-knowledge-graphs-in-new-drug-research-at-servier/.

Companies:
Neo4j
