Making best use of big data

The volume of data generated during drug discovery is prodigious, and now data from vast new sources is to be added. So how is this information explosion being used and managed?

Big data is creating opportunities that can help the pharma industry to develop and manufacture drugs more effectively. However, the sector faces a number of challenges in the way it manages and analyses the increasingly broad range of data that is available, and collaboration is essential for the industry to capitalise on the potential of the big data explosion. That was the key message from a Big Data in Pharma conference, organised by business information company SMi in London in May.

A key focus of the conference was real world data (RWD), also referred to as real world evidence (RWE). The term generally describes data collected in a real-life setting, such as a healthcare environment, in contrast to data from a clinical trial.

Rob Thwaites, Vice President of Health Economics and Epidemiology at life science strategist Evidera and Chair of the Association of the British Pharmaceutical Industry’s pharma industry health information group, noted that a growing range of RWD is emerging that can be useful for the industry. He observed that sources of healthcare data are multiplying, among them electronic health records (EHR) – also called electronic medical records (EMR) – which health organisations are using ever more widely. These comprise electronic collections of health information about an individual patient or a country’s population, such as medical history, immunisation status, laboratory test results, radiology images and vital signs.

Thwaites said that growing amounts of genetic data are also becoming available, such as tumour mutation patterns and whole genome sequencing. There are a number of genetic data projects across the world, such as the UK Biobank, which between 2006 and 2010 recruited 500,000 people aged 40–69 from across Britain to provide blood, urine and saliva samples for future analysis, as well as detailed information about themselves. The UK Biobank is now monitoring these participants’ health, creating a powerful databank that is helping scientists discover why some people develop particular diseases and others do not.

Thwaites also observed that a growing volume of data is emerging from self-generated information and through digital engagement, with examples including healthcare professional and patient forums designed to allow the sharing of health-related experiences, plus similar information from social media such as Twitter. He added that such data is complemented by existing sources, such as government, academic and commercial data open to the public, including aggregated population health data, and summary clinical trial results. He noted, for example, that around 3,000 open datasets on the UK public data website www.data.gov.uk are health-related.

Life cycle approaches

Niklas Bergvall, Senior Director, global HE & OR neuroscience at Novartis, acknowledged that RWE is becoming increasingly important in the development of drugs, with demand growing for ‘continuous evidence generation’ of the effects of medicines. ‘Payers, healthcare decision-makers and regulatory authorities are increasingly asking for post-launch evidence of therapies in a real-world setting,’ he said.

He explained that this is being driven by a number of factors, including a greater awareness of the gap between the efficacy of a drug in a standard trial and in a real-world setting. He noted that regulators are using RWE to revise product labels to highlight safety concerns; medicine purchasers are re-evaluating products post-launch by using comparative RWE; and physicians are using RWE to inform guidelines that influence clinical practice.

Bergvall said these advances are having a significant impact on the drug development process: ‘The former sprint to achieve pricing and market access is now a marathon of evidence generation throughout the product life cycle,’ he stated. ‘Evidence is now required across the product life cycle, and new approaches to evidence generation are needed.’

He explained that, to adapt, Novartis has created different ‘franchise data marts’, which assemble data from a wide variety of sources for different areas of focus within the company. For example, US sources of RWE used in Novartis’ neuroscience data mart include prescription data from pharmacy records, representing around 50% of speciality prescriptions; insurance claims data from more than 800,000 physicians for patients in health plans; patient encounter information collected from more than 650 hospitals; and an EMR database that covers around 23 million patient records from more than 40,000 physicians.

This RWE is combined with clinical trial data, including randomised controlled trial (RCT) data; observational study data; and patient-reported outcome (PRO) study data. ‘This helps us develop not only a life cycle for the drug, but also for the patient, allowing us to understand all the interactions a patient has with the healthcare system,’ Bergvall said.

Dr Athula Herath, Statistical Director, R&D statistics, MedImmune, spoke about how statistical analysis of RWD can help develop new, targeted drugs or personalised medicines.

He said that MedImmune has developed a therapeutic landscape map using a multi-dimensional statistical model that creates a graphical display of outcomes experienced by patients following a particular therapy. The map is created using extensive data sources, including published RCT data, observational study data, and data from RWE studies from different countries, which Herath said are used to establish ‘meaningful clinical phenotypes that can describe the potential therapeutic landscape.’

He said that two further layers of information are added to the map: available molecular expression data, such as gene expression and protein expression data; and published summary data and meta-analysis output of relevant current therapies, including those being developed by MedImmune and others.

Herath said this helps to show what therapeutic effects different therapies achieve, or could potentially achieve, across different population segments. He explained that the map uses a linear combination of outcome measures to display the numbers and types of patients according to their outcomes, using different colours. Segments of patients with good outcomes are shown in red and those with poor outcomes in blue, which illustrates where there are unmet needs among patients for particular medicine types.
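
As a rough illustration of the idea, and not MedImmune’s actual model, a linear combination of outcome measures can be sketched in a few lines of Python. The measure names, weights, threshold and patient records below are all hypothetical, chosen only to show how a weighted score could be turned into the red and blue colouring described above.

```python
from collections import Counter

# Hypothetical weights for three illustrative outcome measures.
# These are assumptions for the sketch, not values from the article.
WEIGHTS = {
    "symptom_improvement": 0.5,
    "quality_of_life": 0.3,
    "adverse_events": -0.2,
}


def outcome_score(measures):
    """Linear combination: the weighted sum of a patient's outcome measures."""
    return sum(weight * measures[name] for name, weight in WEIGHTS.items())


def colour_for(score, threshold=0.0):
    """Red marks good outcomes and blue marks poor ones (threshold is assumed)."""
    return "red" if score >= threshold else "blue"


def summarise(patients):
    """Count patients per (segment, colour) pair."""
    return dict(
        Counter((p["segment"], colour_for(outcome_score(p))) for p in patients)
    )


# Made-up patient records in two hypothetical population segments.
patients = [
    {"segment": "A", "symptom_improvement": 1.0, "quality_of_life": 1.0, "adverse_events": 0.0},
    {"segment": "A", "symptom_improvement": 0.5, "quality_of_life": 0.5, "adverse_events": 1.0},
    {"segment": "B", "symptom_improvement": -1.0, "quality_of_life": -0.5, "adverse_events": 1.0},
    {"segment": "B", "symptom_improvement": 0.0, "quality_of_life": 0.0, "adverse_events": 2.0},
]

if __name__ == "__main__":
    for key, count in sorted(summarise(patients).items()):
        print(key, count)
```

In this toy data every patient in segment B scores below the threshold, so that segment would appear blue on such a map, which is the kind of pattern that would point to an unmet need.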

Mapping data

Herath said the map can provide ‘a powerful tool for pinpointing the unmet need in stratified medicine, [which] helps to design new clinical development programmes with high precision by fully characterising the strata – or clusters – in terms of their clinical characteristics, such as inclusion and exclusion criteria and diagnostics’.

He added that its ability to use data from different national populations is also proving increasingly valuable. ‘This is important because companies are trying to develop drugs for the global market, rather than individual countries,’ he said.

Rachel Uphill, Enterprise/Information Architect, R&D IT Solutions, delivery & integration, at GSK medicines research centre, agreed that the growing volume of RWD is presenting major new opportunities for pharma companies, but warned they also face challenges in ensuring they handle RWD effectively: ‘It’s the increasing complexity of the data that causes us [the industry] issues when linking it with other sets of information,’ she said, adding: ‘The data groups are growing, and we’ve got disparate data sets – we’ve got data on hard drives, on the Cloud, and in SAS data sets stored somewhere else.’

Uphill said that to make good use of all the data – both RWD and data from clinical trials – a pharma company must fully understand all its data so that it can link it together effectively. She said companies should adhere to strict data quality standards to ensure the data’s provenance is clear, and should also use master data management, so that all data is co-ordinated and linked, with changes made automatically wherever the data is being used.

As an example, Uphill pointed to Socrates, a search tool application that GSK implemented in 2012, which helps the user find archived scientific knowledge. In addition to standard text indexing, it uses sophisticated text analytics to identify chemical structure, gene, species and disease entities, helping provide the user with any relevant available information associated with a particular disease or drug. The system indexes information from a variety of sources, including electronic lab notebooks, Documentum archives, different databases and file shares.

Looking ahead, Thwaites outlined the potential of the big data explosion over the coming years as the sources of RWD are set to continue expanding. He noted that ‘smart devices will offer greater volumes of data in the future’ – for example, digital pills. One, being developed by US firm Proteus Digital Health, is designed to alert relatives and doctors that someone has taken a medicine by sending a text message from inside the body.

Thwaites also explained that there is a growing range of collaborative opportunities that can help pharma companies to capitalise on big data. He pointed to figures from the European Federation of Pharmaceutical Industries and Associations ranking pharma spending on R&D by country in Europe, which showed that the UK spends the most, followed by France, Germany and Switzerland.

He said that in the UK, the development of e-health-related capabilities has been identified as a key factor underpinning the strength of the UK’s R&D capabilities.

Thwaites said that among the most notable examples of UK government investment in this area is the Farr Institute of Health Informatics Research, which was set up in 2013 following government investment of around £30m in e-health informatics research centres. The Farr Institute is a network of four groups of academic and research organisations, centred in the north west of England, Scotland, Wales and London, and led by the Universities of Manchester, Dundee, Swansea and University College London.

The Farr Institute aims to deliver research linking electronic health data with other forms of research and routinely-collected data, while building capacity in health informatics research.

Thwaites stressed that the institute illustrates the value of collaboration. ‘The advantage and strength of some of these groups is not only their e-health capabilities but also that they are already strong in research,’ he said, noting that the north west of England group has both expertise and large volumes of RWD in oncology and in respiratory diseases, and the Wales group has access to the biggest registry of multiple sclerosis RWD in the world. ‘These groups have already got skills and capabilities and assets that hopefully we will be able to tap into the future,’ he said.
