Ensuring Consistency Between Define Files and SDTM Datasets for FDA Submissions: A Metadata-driven Approach
Abstract: This article discusses the importance of maintaining consistency between define files and the corresponding Study Data Tabulation Model (SDTM) datasets for FDA submissions. It introduces a “metadata-driven” method that creates a controlled terminology library within the SDTM programming specifications, so that permissible values not present in the SDTM data are still captured. In addition, a SAS macro tool was developed to check automatically that controlled terminology is consistent between SDTM datasets and specifications, which in turn keeps SDTM datasets and define files consistent.
1. Introduction
In the pharmaceutical industry, data quality and consistency are crucial components for regulatory submissions, such as those made to the Food and Drug Administration (FDA). The FDA has identified a lack of consistency between define files and the corresponding SDTM datasets as a significant problem in many submissions. This issue can lead to delays in the review process and potential rejections, negatively impacting drug development timelines and costs. This article highlights the importance of ensuring consistency between define files and SDTM datasets for FDA submissions and introduces a metadata-driven approach to address this issue.
2. Background
2.1 SDTM Datasets and Define Files
The Study Data Tabulation Model (SDTM) is a standard data model for clinical trial data, developed by the Clinical Data Interchange Standards Consortium (CDISC). SDTM datasets are designed to facilitate the organization, standardization, and exchange of clinical trial data for regulatory submissions. Define files, typically in XML format (Define-XML), provide the metadata that describes the content, structure, and format of the SDTM datasets.
2.2 Challenges in Ensuring Consistency
Many pharmaceutical companies use a “data-driven” method to populate controlled terminology in Define-XML files, retrieving it directly from the SDTM data. However, this approach drops any permissible value that does not occur in the SDTM data. The CDISC SDTM Implementation Guide V3.2 requires that “all values in the permissible value set for the study should be included, whether they are represented in the submitted data or not” [3]. A different method is therefore needed to meet this requirement and keep define files and SDTM datasets consistent.
3. Metadata-driven Approach
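The difference between the two methods can be illustrated with a small sketch. It is written in Python purely for illustration; the variable, values, and function names are hypothetical and are not taken from any study or from the tool described in this article.

```python
# Hypothetical permissible values for SEX in one study's specification.
spec_permissible = {"SEX": ["F", "M", "U"]}

# Hypothetical DM records: no subject with SEX = "U" was enrolled.
dm_records = [
    {"USUBJID": "001", "SEX": "F"},
    {"USUBJID": "002", "SEX": "M"},
    {"USUBJID": "003", "SEX": "F"},
]


def data_driven_codelist(records, variable):
    """Collect only the values that actually occur in the data."""
    return sorted({rec[variable] for rec in records if rec.get(variable)})


def metadata_driven_codelist(spec, variable):
    """Take the full permissible value set from the specification."""
    return sorted(spec[variable])


from_data = data_driven_codelist(dm_records, "SEX")
from_spec = metadata_driven_codelist(spec_permissible, "SEX")

# Values a data-driven define file would silently omit.
missing_from_define = sorted(set(from_spec) - set(from_data))
print(from_data, from_spec, missing_from_define)
```

In this toy case the data-driven codelist omits "U" because no record carries it, while the metadata-driven codelist keeps it.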
To meet the CDISC requirement and maintain consistency, we propose a “metadata-driven” method for managing controlled terminology in clinical trials. This approach involves creating a controlled terminology library within the SDTM programming specifications, which serves as the authoritative source for metadata management and define file generation.
3.1 Controlled Terminology Library
The controlled terminology library is a comprehensive and standardized collection of terms and their permissible values, created and maintained within the SDTM programming specifications. This library provides a central repository for managing controlled terminology across all SDTM datasets, ensuring consistency and adherence to CDISC standards.
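One possible shape for such a library is sketched below in Python. The layout, class names, codelist identifiers, and variable assignments are assumptions made for illustration; the article does not prescribe a particular structure, and in practice the library would live in the specification document itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    submission_value: str  # value as it appears in the SDTM dataset
    decode: str            # human-readable meaning


@dataclass
class Codelist:
    codelist_id: str
    name: str
    terms: list = field(default_factory=list)

    def permissible_values(self):
        return [term.submission_value for term in self.terms]


# Hypothetical study-level library: one entry per codelist.
library = {
    "SEX": Codelist("SEX", "Sex", [
        Term("F", "Female"), Term("M", "Male"), Term("U", "Unknown"),
    ]),
    "NY": Codelist("NY", "No Yes Response", [
        Term("N", "No"), Term("Y", "Yes"),
    ]),
}

# Hypothetical link from (domain, variable) to the codelist it uses.
variable_to_codelist = {("DM", "SEX"): "SEX", ("AE", "AESER"): "NY"}


def permissible_for(domain, variable):
    """Look up the permissible values for one SDTM variable."""
    return library[variable_to_codelist[(domain, variable)]].permissible_values()


print(permissible_for("DM", "SEX"))
```

Keeping the codelists separate from the variable-to-codelist mapping lets several variables share one codelist without duplicating its terms.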
3.2 SDTM Programming Specifications
The SDTM programming specifications serve as the single source of truth for metadata management in the metadata-driven approach. These specifications include detailed information on variable names, labels, formats, and controlled terminology, ensuring that all SDTM datasets adhere to the same standards.
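Because the specifications hold the full permissible value set, codelist metadata for the define file can be produced from them rather than from the data. The Python sketch below builds a simplified CodeList fragment; it deliberately omits XML namespaces, language attributes, and the other elements a complete Define-XML document needs, and the OID, dictionary layout, and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical codelist as held in the programming specification.
spec_codelist = {
    "oid": "CL.SEX",
    "name": "Sex",
    "terms": [("F", "Female"), ("M", "Male"), ("U", "Unknown")],
}


def build_codelist(codelist):
    """Build a simplified CodeList element from specification metadata."""
    element = ET.Element(
        "CodeList",
        {"OID": codelist["oid"], "Name": codelist["name"], "DataType": "text"},
    )
    for coded_value, decode in codelist["terms"]:
        item = ET.SubElement(element, "CodeListItem", {"CodedValue": coded_value})
        decode_element = ET.SubElement(item, "Decode")
        ET.SubElement(decode_element, "TranslatedText").text = decode
    return element


codelist_xml = ET.tostring(build_codelist(spec_codelist), encoding="unicode")
print(codelist_xml)
```

Every term listed in the specification appears in the output, whether or not any record in the data uses it.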
4. SAS Macro Tool for Automatic Consistency Checking
To streamline the process and ensure that controlled terminology is consistent between SDTM datasets and programming specifications, a SAS macro tool was developed. The tool automates consistency checking, reducing the risk of human error and improving efficiency.
4.1 Tool Functionality
The SAS macro tool reads the SDTM datasets and the programming specifications and compares the controlled terminology in each to identify discrepancies. If inconsistencies are detected, the tool generates a report listing the issues and potential solutions. Inconsistencies between the SDTM datasets and the programming specifications are therefore identified and addressed quickly. Because the define files are generated from the same specifications, this also keeps the SDTM datasets and define files consistent.
4.2 Benefits of the SAS Macro Tool
The SAS macro tool offers several benefits in ensuring consistency between define files and SDTM datasets, including:
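The comparison logic can be sketched as follows. This is a Python illustration of the idea only, not the SAS macro itself; the data, the specification layout, and the function name are hypothetical.

```python
# Hypothetical specification: permissible values per (domain, variable).
spec_ct = {
    ("DM", "SEX"): {"F", "M", "U"},
    ("AE", "AESEV"): {"MILD", "MODERATE", "SEVERE"},
}

# Hypothetical SDTM data, one list of records per domain.
sdtm_data = {
    "DM": [{"USUBJID": "001", "SEX": "F"}, {"USUBJID": "002", "SEX": "M"}],
    "AE": [
        {"USUBJID": "001", "AESEV": "MILD"},
        {"USUBJID": "002", "AESEV": "Severe"},  # wrong case: not permissible
    ],
}


def check_consistency(spec, data):
    """Compare observed values with the specification, per variable."""
    findings = []
    for (domain, variable), permitted in sorted(spec.items()):
        observed = {
            rec[variable]
            for rec in data.get(domain, [])
            if rec.get(variable) not in (None, "")
        }
        for value in sorted(observed - permitted):
            findings.append((domain, variable, value, "in data, not in specification"))
        for value in sorted(permitted - observed):
            findings.append((domain, variable, value, "in specification, not in data"))
    return findings


for finding in check_consistency(spec_ct, sdtm_data):
    print(*finding, sep=" | ")
```

Under the metadata-driven approach, only the first kind of finding is a true inconsistency. A permissible value that is absent from the data is expected and is listed here for information only.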
a) Automation: The tool automates the consistency checking process, reducing manual intervention and the potential for human error.
b) Efficiency: Automated checking shortens the overall process, allowing inconsistencies to be identified and resolved sooner.
c) Quality Control: The tool serves as an additional layer of quality control, ensuring that SDTM datasets adhere to programming specifications and CDISC standards.
d) Streamlined Reporting: The tool generates a report outlining any inconsistencies and suggested resolutions, simplifying the process of addressing discrepancies between the SDTM datasets and programming specifications.
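As an illustration of the reporting step, the Python sketch below writes findings and a suggested resolution to CSV. The column names, issue labels, and suggestion wording are hypothetical; the article does not describe the actual report layout.

```python
import csv
import io

# Hypothetical findings as a consistency checker might produce them.
findings = [
    {"domain": "AE", "variable": "AESEV", "value": "Severe",
     "issue": "in data, not in specification"},
    {"domain": "DM", "variable": "SEX", "value": "U",
     "issue": "in specification, not in data"},
]

# Hypothetical mapping from issue type to a suggested resolution.
SUGGESTIONS = {
    "in data, not in specification":
        "Correct the mapping or add the value to the specification",
    "in specification, not in data":
        "No action needed; retain the value in the define file",
}


def write_report(rows, stream):
    """Write one CSV row per finding, with a suggested resolution."""
    fieldnames = ["domain", "variable", "value", "issue", "suggestion"]
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        suggestion = SUGGESTIONS.get(row["issue"], "Review manually")
        writer.writerow({**row, "suggestion": suggestion})


buffer = io.StringIO()
write_report(findings, buffer)
report_text = buffer.getvalue()
print(report_text)
```

Writing to an in-memory buffer keeps the sketch self-contained; a real report would go to a file or a listing.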
5. Implementation and Results
To validate the effectiveness of the metadata-driven approach and the SAS macro tool, we applied this method to several clinical trials. The results demonstrated a significant improvement in the consistency between define files and SDTM datasets, meeting the requirements set forth by CDISC and the FDA.
5.1 Application in Clinical Trials
We implemented the metadata-driven approach in multiple clinical trials, creating a controlled terminology library within the SDTM programming specifications and using the SAS macro tool to check automatically for consistency between the SDTM datasets and the specifications. This process was integrated into the standard workflow for SDTM dataset creation and define file generation.
5.2 Improved Consistency and Regulatory Compliance
The metadata-driven approach, together with the SAS macro tool, resulted in a marked improvement in the consistency between define files and SDTM datasets. By meeting the CDISC requirement to include all permissible values in the define files, our approach supported compliance with regulatory standards and reduced the risk of submission delays or rejections.
6. Conclusion
Ensuring consistency between define files and SDTM datasets is crucial for FDA submissions and regulatory compliance. The metadata-driven approach introduced in this article addresses the shortcomings of the data-driven method by creating a controlled terminology library within the SDTM programming specifications and using a SAS macro tool for automatic consistency checking. This method offers a comprehensive solution for maintaining consistency between define files and SDTM datasets, improving the quality of regulatory submissions and reducing the risk of delays or rejections.