Ensuring Data Integrity with the Allotrope Framework

Data integrity is a hot topic as we move into 2019, and for good reason. Pharma and biotech companies produce an enormous amount of experimental R&D data, and the sheer volume and complexity of that data make managing this critical asset challenging. Leading companies are therefore investing heavily in initiatives to manage experimental data more effectively across its full lifecycle, in the hope of driving innovation, staying competitive, and ensuring data integrity.
The FDA has been stepping up its efforts to evaluate data integrity within the pharmaceutical industry in response to a growing number of observed data integrity violations. While the Quality Control laboratory has historically been the main focus of inspections, R&D laboratories, clinical research, and production batch records are now also coming under regulatory scrutiny for data integrity.
In 2012, a group of pharmaceutical and biotechnology companies came together to form the Allotrope Foundation, pooling their resources to develop a solution to the data integrity and data management issues plaguing the industry. We will discuss how the Allotrope Framework, an advanced data architecture that harmonizes the collection, exchange, and management of experimental laboratory data over its complete lifecycle, can be used to improve both data management and data integrity. The overall methodology implemented in the Allotrope Framework also applies to other industries.
Data Integrity Points of Failure
In its recently released guidance on data integrity, the FDA states that, “For the purposes of this guidance, data integrity refers to the completeness, consistency, and accuracy of data. Complete, consistent, and accurate data should be attributable, legible, contemporaneously recorded, original or a true copy, and accurate (ALCOA).” In order to maintain data integrity and withstand FDA scrutiny, pharmaceutical companies must take steps to ensure that data records are accurate, complete, consistent and maintained within their original context.
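Some of the ALCOA attributes can be checked mechanically. The sketch below is a hypothetical illustration, not an FDA or Allotrope requirement: the `Record` layout, the `alcoa_findings` helper, and the five-minute "contemporaneous" window are all assumptions. It flags a record with no recording user, a value written too long after acquisition, or a value whose checksum no longer matches the one captured at creation. Legibility and accuracy are not checked here.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    # Hypothetical record layout, for illustration only
    value: str             # recorded result, e.g. "98.7 %"
    recorded_by: str       # user ID of whoever recorded it (Attributable)
    acquired_at: datetime  # when the measurement was made
    recorded_at: datetime  # when the result was written (Contemporaneous)
    sha256: str            # checksum taken when the original was created (Original)


def alcoa_findings(rec: Record, max_delay: timedelta = timedelta(minutes=5)) -> list[str]:
    """Return findings for a few ALCOA attributes; an empty list means none were detected."""
    findings = []
    if not rec.recorded_by.strip():
        findings.append("not attributable: no user recorded")
    if not rec.value.strip():
        findings.append("incomplete: empty value")
    delay = rec.recorded_at - rec.acquired_at
    if delay < timedelta(0) or delay > max_delay:
        findings.append("not contemporaneous: recorded outside the allowed window")
    if hashlib.sha256(rec.value.encode("utf-8")).hexdigest() != rec.sha256:
        findings.append("not original or true copy: checksum mismatch")
    return findings


t0 = datetime(2019, 1, 15, 9, 0)
good = Record("98.7 %", "jdoe", t0, t0 + timedelta(minutes=1),
              hashlib.sha256(b"98.7 %").hexdigest())
edited = Record("99.1 %", "", t0, t0 + timedelta(hours=2), good.sha256)
print(alcoa_findings(good))         # []
print(len(alcoa_findings(edited)))  # 3
```

Checks like these only work when the original electronic record, with its checksum and timestamps, travels with the data. That is exactly what the failure points described next tend to break.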
Experimental data goes through a well-defined lifecycle to deliver value to an organization:
- Data must first be acquired, processed and then analyzed.
- The results from the analysis are shared and used to generate reports.
- Next, data is stored so it can be reused and mined.
- Finally, the data is archived, and in some cases, destroyed.
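A "well-defined" lifecycle can be enforced in software. As a purely illustrative sketch (not part of the Allotrope Framework), the stages listed above are modeled here as an ordered enumeration, and a history that skips or reverses a stage is rejected. The strict one-step-at-a-time rule is an assumption. Real workflows may legitimately loop back, for example to reprocess data.

```python
from enum import IntEnum


class Stage(IntEnum):
    # Lifecycle stages in the order listed above (numbering is illustrative)
    ACQUIRED = 1
    PROCESSED = 2
    ANALYZED = 3
    REPORTED = 4   # results shared and used to generate reports
    STORED = 5     # available for reuse and mining
    ARCHIVED = 6
    DESTROYED = 7  # optional final stage


def valid_history(history: list[Stage]) -> bool:
    """True if the history starts at acquisition and advances exactly one stage at a time."""
    if not history or history[0] != Stage.ACQUIRED:
        return False
    return all(nxt == cur + 1 for cur, nxt in zip(history, history[1:]))


print(valid_history([Stage.ACQUIRED, Stage.PROCESSED, Stage.ANALYZED]))  # True
print(valid_history([Stage.ACQUIRED, Stage.ANALYZED]))                   # False
```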
In pharmaceutical companies, the systems that drive the experimental data lifecycle have three main points of failure that contribute to data integrity issues:
- Proprietary file formats.
- Inconsistent contextual metadata.
- Incompatible software.
The Allotrope Framework was designed to address these foundational data integrity issues so that organizations can extract the maximum value from their experimental data.
Proprietary File Formats
Over the years, an extensive array of instruments and software applications has been developed to support analytical chemistry in the laboratory. Because these technologies come from many different companies, they don't necessarily speak the same language.
Proprietary data formats create a significant problem for organizations that need to share data or electronic methods between business units or with partners like contract research organizations (CROs). The problem is compounded when partners or sites use different hardware or software in their workflows.
To work around this issue, data are often converted into static PDF documents and sent to the recipient, who must then transcribe the data by hand into the proprietary formats required by their own software. These manual conversions are time-consuming and costly, and they introduce the possibility of human error and misinterpretation into the data lifecycle.
Static PDF documents typically cannot represent the complete data and metadata package (e.g., audit trail information) that exists in the original electronic records. As a result, much of the information and potential value of experimental data is lost as the data progresses through its lifecycle, and the risk of data integrity issues increases dramatically.
Inconsistent Contextual Metadata
The FDA states that “data should be maintained throughout the record’s retention period with all associated metadata required to reconstruct the CGMP activity.” Metadata contains the information (who, what, where, when, why, and how the data was generated) needed to understand the data in its proper context. Most software systems do not capture this information in enough detail to provide a complete contextual understanding of the experimental data at a future date. Scientists involved in the experiment must therefore fill the metadata gaps with free-text entries, or with descriptions drawn from a limited, predefined vocabulary that is often inconsistent (because of proprietary file formats) and cannot be shared with other systems.
Relying on scientists to log metadata as free text creates a number of data integrity issues (inaccurate information, spelling errors, deviations from the predefined format, blank fields, and so on) that reduce the value of the metadata and, ultimately, of the data it is associated with. In practice, contextual metadata is often captured incompletely or inaccurately throughout the analytical workflow.
While humans can correctly interpret an abbreviation or a misspelling, software typically cannot. This makes software-based searching or aggregation of contextual metadata very difficult and usually requires some kind of human intervention. In addition, pieces of critical experimental context are spread across software applications that don’t communicate with each other, which makes any effort to reconstruct the experiment very time-consuming. Over time, people are more likely to lose track of where these pieces of information are stored and how the vocabulary should be interpreted, so the chance of accurately reconstructing the experiment diminishes.
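To see why free text defeats software, consider four entries a person would read as the same technique. The sketch below uses a small, hypothetical controlled vocabulary (not the Allotrope one) to map known variants to a single preferred term. A naive search treats the four strings as four different things, and the misspelling still falls through to `None`. This is why a controlled vocabulary works best when it is enforced at capture time rather than reconciled afterwards.

```python
from typing import Optional

# Hypothetical controlled vocabulary: preferred term -> accepted variants
VOCABULARY = {
    "high performance liquid chromatography": {
        "hplc", "high-performance liquid chromatography", "high perf. liquid chrom.",
    },
    "ultraviolet absorbance": {"uv", "uv abs", "uv absorbance"},
}


def normalize(free_text: str) -> Optional[str]:
    """Map a free-text entry to its preferred term, or None if it is unrecognized."""
    key = " ".join(free_text.lower().split())  # case- and whitespace-insensitive
    for preferred, variants in VOCABULARY.items():
        if key == preferred or key in variants:
            return preferred
    return None  # e.g. a misspelling: flag for human review


entries = ["HPLC", "hplc ", "High Perf. Liquid Chrom.", "HPCL"]
print(len(set(entries)))  # 4 distinct strings to a naive search
print([normalize(e) for e in entries])
```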
Incompatible Software
Connecting all the systems needed to drive the data lifecycle in a typical pharmaceutical company so that they all speak the same language is challenging indeed. Companies often need to hire an external consultant to integrate this patchwork of applications and instruments and allow data to move seamlessly across the product lifecycle. The result is a highly customized environment that makes software upgrades, version changes, and the addition of new software systems difficult and expensive. No matter how much time and effort companies invest in digital continuity initiatives, a fully integrated data-sharing environment often remains elusive.
Given the architectural complexity of the informatics systems in modern pharmaceutical companies, finding and correcting a mistake that leads to a data integrity issue can be quite challenging, particularly if some of the data is missing or contradictory. Any gap in data accuracy or accessibility (that is, in data integrity) at any point along this lifecycle undermines the trustworthiness of the final result or conclusion.
The Allotrope Framework Solution
The Allotrope Foundation’s ultimate goal is the development of an advanced data architecture that will harmonize the collection, exchange, and management of laboratory data over its complete lifecycle. Towards this end, the Foundation has created a suite of software tools called the Allotrope Framework that allow software developers to implement a consistent set of data standards into the software that laboratories use to manage their workflows and data. The Framework has three components:
The Allotrope Data Format (ADF) is a family of vendor- and platform-agnostic specifications designed to standardize the collection, exchange, and storage of the analytical data and metadata captured in laboratory workflows. Class libraries provide reusable software components that can be used to adapt existing applications or create new software solutions. The Foundation also provides a free ADF Explorer, an application that can open any ADF file and display the data (data description, data cubes, data package) stored within it. An ADF file can tell you:
- Why the data was gathered (sample, study, purpose)
- How the data was generated (instrument, method)
- How the data was processed (analysis method)
- The shape of the data (dimensions, measures, structure)
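The four kinds of information above can be pictured as one self-describing record. The sketch below is hypothetical and does not reproduce the ADF's actual structure or class libraries. `DataDescription`, `DataCube`, and the required key names are invented for illustration. It bundles the "why", "how", and "processed" context with a one-dimensional data cube and reports both missing context and measures whose length doesn't match their dimension.

```python
from dataclasses import dataclass, field

# Context each description must carry, mirroring the questions above (illustrative key names)
REQUIRED = {
    "why": {"sample", "study", "purpose"},
    "how": {"instrument", "method"},
    "processed": {"analysis_method"},
}


@dataclass
class DataCube:
    dimension: str                   # e.g. "retention time (min)"
    coordinates: list[float]         # points along that dimension
    measures: dict[str, list[float]] = field(default_factory=dict)


@dataclass
class DataDescription:
    why: dict[str, str]
    how: dict[str, str]
    processed: dict[str, str]
    cube: DataCube

    def problems(self) -> list[str]:
        """List missing context and measures whose length doesn't match the dimension."""
        out = []
        for section, keys in REQUIRED.items():
            missing = keys.difference(getattr(self, section))
            out += [f"{section}: missing '{k}'" for k in sorted(missing)]
        n = len(self.cube.coordinates)
        for name, values in self.cube.measures.items():
            if len(values) != n:
                out.append(f"shape: '{name}' has {len(values)} points, expected {n}")
        return out


desc = DataDescription(
    why={"sample": "S-001", "study": "stability", "purpose": "assay"},
    how={"instrument": "HPLC-07", "method": "M-12"},
    processed={},  # analysis method never recorded
    cube=DataCube("retention time (min)", [0.0, 0.5, 1.0], {"absorbance (mAU)": [1.2, 3.4]}),
)
print(desc.problems())
```

Here the report lists the missing analysis method and the truncated absorbance trace. Both are gaps that would otherwise surface only when someone tries to reconstruct the experiment.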
The ADF is intended to facilitate speedy real-time access to, and long-term stability of, archived analytical data. It has been designed to meet the performance requirements of modern instrumentation and to be extensible, allowing new techniques and technologies to be incorporated while maintaining backward compatibility with previous versions. The result is a portable data format, independent of the instrument that created the data, that can be transferred and used easily across operating systems and vendor platforms.
The Allotrope Taxonomies and Ontologies (AFO) provide a controlled vocabulary and semantic model for the contextual metadata needed to represent laboratory analytical processes (tests and measurements) and to interpret the data later. The modeled domains include Equipment, Material, Process, and Results. The standard language is being developed to cover a broad range of analytical techniques and instruments.
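A taxonomy pays off when metadata is queried. Ontologies like this are typically expressed in RDF/OWL and queried with SPARQL. The plain-Python sketch below only imitates that idea, using made-up `ex:` terms that are not real AFO identifiers. Each run records which piece of equipment it used, each piece of equipment has a class, and a query for "any detector" also finds runs that used a more specific subclass.

```python
from typing import Optional

# Hypothetical taxonomy (child class -> parent class); "ex:" terms are made up, not AFO IRIs
SUBCLASS_OF = {
    "ex:UVDetector": "ex:Detector",
    "ex:MassSpectrometer": "ex:Detector",
    "ex:Detector": "ex:Equipment",
    "ex:Balance": "ex:Equipment",
}

# Contextual metadata stored as (subject, predicate, object) triples
TRIPLES = [
    ("ex:run1", "ex:usedEquipment", "ex:det42"),
    ("ex:det42", "rdf:type", "ex:UVDetector"),
    ("ex:run2", "ex:usedEquipment", "ex:bal7"),
    ("ex:bal7", "rdf:type", "ex:Balance"),
]


def is_a(cls: Optional[str], ancestor: str) -> bool:
    """Walk up the taxonomy until the ancestor is found or the root is passed."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def runs_using(ancestor: str) -> list[str]:
    """Find runs whose equipment belongs to the given class or any of its subclasses."""
    types = {s: o for s, p, o in TRIPLES if p == "rdf:type"}
    return sorted(s for s, p, o in TRIPLES
                  if p == "ex:usedEquipment" and is_a(types.get(o), ancestor))


print(runs_using("ex:Detector"))   # ['ex:run1']
print(runs_using("ex:Equipment"))  # ['ex:run1', 'ex:run2']
```

With free-text entries such as "UV det." or "detector (UV)", no such query is possible without a human first reconciling the terms.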
Allotrope Data Models (ADM) use the Shapes Constraint Language (SHACL) to define data structures (schemas, templates) that describe how to use the ontologies in a standardized (i.e., reproducible, predictable, verifiable) manner.
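In practice, SHACL shapes are written in RDF and checked by a SHACL processor. To show what a shape-based check does without claiming anything about the actual ADM content, the toy validator below imitates four SHACL Core constraint components (`sh:minCount`, `sh:maxCount`, `sh:datatype`, `sh:in`) in plain Python. The property names and allowed units are hypothetical.

```python
from typing import Any

# Toy property shapes loosely modeled on SHACL Core constraint components
# (sh:minCount, sh:maxCount, sh:datatype, sh:in). Property names are hypothetical.
MEASUREMENT_SHAPE: dict[str, dict[str, Any]] = {
    "ex:value": {"minCount": 1, "maxCount": 1, "datatype": float},
    "ex:unit": {"minCount": 1, "maxCount": 1, "in": {"mAU", "mg", "mL"}},
    "ex:operator": {"minCount": 1},
}


def validate(node: dict[str, list[Any]], shape: dict[str, dict[str, Any]]) -> list[str]:
    """Return a validation report; an empty list means the node conforms."""
    report = []
    for prop, c in shape.items():
        values = node.get(prop, [])
        if len(values) < c.get("minCount", 0):
            report.append(f"{prop}: fewer than {c['minCount']} value(s)")
        if "maxCount" in c and len(values) > c["maxCount"]:
            report.append(f"{prop}: more than {c['maxCount']} value(s)")
        for v in values:
            if "datatype" in c and not isinstance(v, c["datatype"]):
                report.append(f"{prop}: {v!r} is not {c['datatype'].__name__}")
            if "in" in c and v not in c["in"]:
                report.append(f"{prop}: {v!r} is not an allowed value")
    return report


ok = {"ex:value": [12.5], "ex:unit": ["mAU"], "ex:operator": ["jdoe"]}
bad = {"ex:value": ["12,5"], "ex:unit": ["mau"]}
print(validate(ok, MEASUREMENT_SHAPE))        # []
print(len(validate(bad, MEASUREMENT_SHAPE)))  # 3
```

The `bad` node fails on exactly the free-text problems described earlier: a number typed as text with a comma, a unit in the wrong case, and a blank field.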
The Allotrope Framework addresses data integrity concerns at their source by providing a single common data format for any analytical technique, a controlled vocabulary and software components to adapt existing software. The Allotrope Framework is essentially a software development kit that allows manufacturers of analytical equipment to render their machine output in the ADF, and software developers to embed the Allotrope data format and terminologies into software and interact via standardized application programming interfaces (APIs).
With the Allotrope Framework applied, the metadata stored alongside the data naturally grows as the data moves through the product lifecycle. This allows a data map of the laboratory to be created seamlessly and searched as needed. By removing the need to convert between file formats and transcribe data manually, the Allotrope Framework eliminates many common sources of data integrity issues.
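Metadata that accumulates stage by stage is only trustworthy if earlier entries can't be silently rewritten. The sketch below uses a hash-chained, append-only log. This is a common technique for tamper-evident audit trails, and it is swapped in here for illustration. It is not how the ADF itself stores metadata. The stage names and metadata keys are hypothetical. On its own, a hash chain detects edits only if the latest hash is also kept somewhere the editor cannot change.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same content always hashes the same way
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], stage: str, metadata: dict) -> None:
    """Append metadata for a lifecycle stage, chained to the previous entry's hash."""
    prev = log[-1]["hash"] if log else "0" * 64
    body = {"stage": stage, "metadata": dict(metadata), "prev": prev}
    log.append({**body, "hash": _digest(body)})


def verify(log: list[dict]) -> bool:
    """Recompute every hash; any edit to an earlier entry breaks the chain."""
    prev = "0" * 64
    for entry in log:
        body = {k: entry[k] for k in ("stage", "metadata", "prev")}
        if entry["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append_entry(log, "acquired", {"instrument": "HPLC-07", "operator": "jdoe"})
append_entry(log, "processed", {"analysis_method": "M-12"})
append_entry(log, "reported", {"report_id": "R-2019-001"})
print(verify(log))  # True
log[1]["metadata"]["analysis_method"] = "M-99"  # silent after-the-fact edit
print(verify(log))  # False
```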
Why Astrix
To realize the benefits of the Allotrope Framework, organizations will need to:
- Understand what the ADF, AFO, and ADM are and how they are intended to be used
- Decide how the Allotrope Framework is going to fit into their project
- Work with their subject matter experts to define the desired shape of their data (data description, data format, raw data)
- Work with Allotrope to understand how the Allotrope ontologies map to their datasets
- Understand the current state/format of their data, regarding:
  - Instruments
  - Methods
  - Analyses
  - Software
- Craft a staged project plan to move from their current state to the desired state
- Define the processes and tools they will use to convert data from its present state into Allotrope-compliant files
- Train their in-house resources
- Support their in-house resources
- Plan for downstream uses of the data, including:
  - Regulatory compliance
  - Data archiving
- Maintain and evolve the system as needs change
As a member of the Allotrope Partner Network, Astrix Technology Group is uniquely positioned to assist your organization in developing and implementing an effective Allotrope Framework architecture and strategy. Laboratory automation and informatics are the central focus of our organization. Leveraging our scientific domain knowledge, technology expertise, industry experience, and extensive partner network, our team can help you with each of the steps described above.
In addition, Astrix provides nearshoring options to support competitive pricing strategies for all its services. Astrix nearshoring offices provide expert informatics consulting across the full range of scientific informatics professional services, such as project management, business analysis, managed services, software development, implementation, integration, QA, vendor selection, and more. Because these offices are located in time zones similar to those of the U.S. mainland, our nearshoring teams can easily attend conference calls during local business hours or travel on-site as needed to work with clients.
Conclusion
Through the Allotrope Framework, the Allotrope Foundation has created an effective solution to the pharmaceutical and biotech sector’s data management and data integrity challenges. These challenges are not unique to the life science industry, however, and the overall methodology implemented in the Allotrope Framework is equally applicable to other industries. As the world becomes increasingly connected through technology, companies in many industries are beginning to see standardization and interoperability as key factors in optimizing innovation and maintaining a competitive edge. The Allotrope Framework provides an effective model that promises to be part of the solution to data integrity issues now and into the future.