Data Cut-Off in Oncology Studies: A Step-by-Step Guide

Christian Baghai
6 min read · Feb 10, 2023


Photo by Carlos Muza on Unsplash

In clinical trials, particularly in oncology, it is common practice to cut the data at a specified date or once a prespecified number of events has occurred. This process, referred to as Data Cut-Off (DCO), plays a crucial role in supporting interim analyses because it directly affects how the trial results are interpreted. In this article, we describe a methodology for performing DCO on collected RAW data and provide detailed instructions for each step of the process.

Step 1: Generating the DCO Specification

The DCO process begins with a SAS macro that automatically generates two outputs: an Excel DCO Specification file and a Word file containing the DCO rules for each RAW data set. The content of the Excel file is derived from the Medidata Rave Architect Loader Specification (ALS) file, an Excel-based file used to export and load a study's eCRF design (forms, fields, and data dictionaries) in Rave Architect.
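To make the idea of deriving a specification from the ALS more concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical extract of the ALS field definitions (form, field, format) and assumes that date fields can be recognised by a "yyyy" token in their format string; the class names, the "RECORD"/"MANUAL" rule labels, and the one-row-per-form layout are illustrative assumptions, not the author's SAS macro.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AlsField:
    """One row of a simplified, hypothetical extract of the ALS field definitions."""
    form: str         # form name, used here as the RAW data set name (e.g. "AE")
    field: str        # variable name (e.g. "AESTDAT")
    data_format: str  # field format string (e.g. "dd MMM yyyy" or "$200")


@dataclass
class DcoSpecRow:
    """One row of a hypothetical DCO Specification."""
    dataset: str
    cut_variable: Optional[str]  # date variable to cut on, or None
    rule: str                    # "RECORD" (date-based cut) or "MANUAL" (rule written by hand)


def is_date_format(data_format: str) -> bool:
    # Assumption: a field is a date if its format string contains a year token.
    return "yyyy" in data_format.lower()


def generate_dco_spec(fields: List[AlsField]) -> List[DcoSpecRow]:
    """Propose one spec row per form, using the first date field found on that form."""
    spec: Dict[str, DcoSpecRow] = {}
    for f in fields:
        current = spec.get(f.form)
        if current is not None and current.cut_variable is not None:
            continue  # this form already has a cut variable
        if is_date_format(f.data_format):
            spec[f.form] = DcoSpecRow(f.form, f.field, "RECORD")
        elif current is None:
            spec[f.form] = DcoSpecRow(f.form, None, "MANUAL")
    return list(spec.values())
```

In this sketch, forms without any date field are flagged "MANUAL" so that a programmer reviews them and writes their rule by hand rather than having the program guess.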

Step 2: Cutting the RAW Data

Once the DCO Specification has been generated, the next step is to develop a data cut macro that reads the Excel Specification, applies the cut rules it defines to the RAW data, and generates the post-DCO data sets.
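The general shape of a spec-driven cut program can be sketched in Python as follows. This is a minimal illustration, not the author's macro: a CSV string stands in for the Excel Specification, the column names dataset and cut_variable are hypothetical, and only complete ISO 8601 dates are handled (partial dates need the extra rules discussed later in the article).

```python
import csv
import io
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical spec layout (CSV stands in for the Excel file):
# one row per RAW data set, naming the date variable to cut on.
SPEC_CSV = """dataset,cut_variable
AE,AESTDAT
LB,LBDAT
"""

Record = Dict[str, str]


def read_spec(text: str) -> Dict[str, str]:
    """Map each data set name to its cut variable."""
    return {row["dataset"]: row["cut_variable"] for row in csv.DictReader(io.StringIO(text))}


def apply_dco(
    raw: Dict[str, List[Record]], spec: Dict[str, str], dco: date
) -> Dict[str, Tuple[List[Record], List[Record]]]:
    """Split every RAW data set into (kept, removed) according to the spec."""
    result = {}
    for name, records in raw.items():
        var = spec.get(name)
        kept, removed = [], []
        for rec in records:
            value = rec.get(var) if var else None
            # Only complete ISO dates (YYYY-MM-DD) are handled here; records with
            # no cut variable or no date are kept.
            if value and date.fromisoformat(value) > dco:
                removed.append(rec)
            else:
                kept.append(rec)
        result[name] = (kept, removed)
    return result
```

Keeping the rules in a separate specification, rather than hard-coding them, means the same program can be rerun for a new cut-off date or an updated spec without code changes.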

Step 3: Checking the Post-DCO Data

The final step in the DCO process is to check the post-DCO data sets and confirm that they conform to the Specification.
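A few checks of this kind can be automated. The Python sketch below is one possible set, assuming complete ISO dates and a single hypothetical cut variable per data set: no record is lost or duplicated in the split, no kept record is dated after the cut-off, and every removed record is.

```python
from datetime import date
from typing import Dict, List

Record = Dict[str, str]


def check_post_dco(original: List[Record], kept: List[Record], removed: List[Record],
                   cut_var: str, dco: date) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if len(kept) + len(removed) != len(original):
        problems.append("kept + removed does not match the original record count")
    for rec in kept:
        value = rec.get(cut_var)
        if value and date.fromisoformat(value) > dco:
            problems.append(f"kept record dated {value} is after the cut-off")
    for rec in removed:
        value = rec.get(cut_var)
        if not value or date.fromisoformat(value) <= dco:
            problems.append(f"removed record dated {value} is not after the cut-off")
    return problems
```

Returning a list of messages, rather than stopping at the first failure, lets a reviewer see every discrepancy in one run.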

The Importance of Data Cut-Off (DCO) in Oncology Clinical Trials

To support a formal interim analysis, the Data Management (DM) team typically has several weeks after the cut-off date to clean the data. During this period, the Electronic Data Capture (EDC) system remains open to investigators, who may enter records that start or occur after the cut-off date. As a result, when the cleaned data are extracted from the EDC, the RAW data may contain subjects and records that should not be included in the interim analysis.

The DCO process creates two subsets of data: one subset contains only data collected on or before the DCO date (data kept), and another subset contains data collected after the DCO date (data removed). The DCO date is decided by the study team for analysis, regulatory, or other purposes.

DCO is most commonly performed in oncology studies when a prespecified number of events has occurred, when a study milestone is reached, or when a specified duration of follow-up has been completed. Uses include, but are not limited to, supporting sample size recalculation in adaptive trial designs and preparing supporting Tables, Listings, and Figures (TLFs) for regulatory activities such as a Development Safety Update Report (DSUR), an Investigational New Drug (IND) application, a Breakthrough Therapy Designation (BTD) application, and a New Drug Application (NDA).
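For an event-driven cut, the cut-off date follows from the event dates themselves. The Python sketch below assumes, for illustration only, that the cut-off is set to the date on which the target number of events is reached; in a real study the DCO date is decided by the study team and may differ.

```python
from datetime import date
from typing import List, Optional


def event_driven_cutoff(event_dates: List[date], target_events: int) -> Optional[date]:
    """Return the date on which the target number of events is reached, or None if not yet reached."""
    if target_events < 1:
        raise ValueError("target_events must be at least 1")
    ordered = sorted(event_dates)
    if len(ordered) < target_events:
        return None  # the target number of events has not occurred yet
    return ordered[target_events - 1]
```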

DCO must be implemented carefully: an incorrect implementation can lead to differences in endpoint assessment, particularly in early-phase oncology studies, where only a limited number of patients are enrolled and a few misclassified records can change the results. It is therefore essential to understand the methodology of performing DCO on collected RAW data and to follow each step of the process carefully to ensure accurate and reliable results.

Considerations

DCO can be applied at different stages of data processing: either on RAW data (collected on the CRF and extracted from the EDC, or provided by external sources such as a central lab) or on SDTM data sets, which follow the CDISC Study Data Tabulation Model standard.

Performing the DCO on RAW data means that the SDTM and ADaM data sets are generated from the same post-DCO data, which greatly improves traceability. However, manipulating source data can cause confusion about original records that appear to be missing. Performing the DCO on SDTM data sets, on the other hand, avoids modifying the source data, but in practice it has proven more difficult.

Weighing these factors, we recommend performing the DCO on RAW data. To avoid confusion about missing source data, the DCO process splits the RAW data into two sets, the pre-DCO data set and the post-DCO data set, which are stored in separate libraries.
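Storing the two subsets in separate locations can be sketched in Python, with directories standing in for SAS libraries. The function name, the CSV output, and the directory names are hypothetical; the point is only that every original record lands in exactly one of the two locations.

```python
import csv
from pathlib import Path
from typing import Dict, List

Record = Dict[str, str]


def write_split(name: str, kept: List[Record], removed: List[Record],
                kept_dir: Path, removed_dir: Path) -> None:
    """Write one data set's kept and removed records to two separate directories."""
    # Use the same column order in both files so they can be stacked back together.
    fieldnames = sorted({key for row in kept + removed for key in row})
    for folder, rows in ((kept_dir, kept), (removed_dir, removed)):
        folder.mkdir(parents=True, exist_ok=True)
        with open(folder / f"{name}.csv", "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
```

Because the two files share a header, stacking them reproduces the original RAW data set, so records that seem to be missing from the analysis data can always be traced.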

Data Cleaning

In clinical studies, data cleaning is an essential step in ensuring the quality and validity of the data used in the analysis. A crucial aspect of it is the cut-off date, which defines the boundary of the data included in the analysis. Applying the cut-off date excludes data collected after that date, so that the analysis is based on a consistent, well-defined snapshot of the data.

Subject Level Cut

The subject level cut is the first step. It removes all data collected from subjects who provided informed consent after the cut-off date, ensuring that only subjects enrolled on or before the cut-off date are included in the analysis. The subject level cut is applied to all data sets before proceeding to the record level cut.
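A subject level cut reduces to building the set of excluded subjects once and filtering every data set against it. In the Python sketch below, the USUBJID key and the input layout are assumptions; subjects with no recorded consent date are kept, in line with the article's general principle of keeping records when dates are missing.

```python
from datetime import date
from typing import Dict, List


def subject_level_cut(consent_dates: Dict[str, date],
                      datasets: Dict[str, List[dict]], dco: date) -> Dict[str, List[dict]]:
    """Drop every record belonging to a subject who consented after the cut-off date.

    consent_dates maps a (hypothetical) USUBJID to the informed consent date.
    datasets maps a data set name to a list of records carrying a USUBJID key.
    """
    # Only subjects with a known consent date after the cut-off are excluded.
    excluded = {subj for subj, ic_date in consent_dates.items() if ic_date > dco}
    return {
        name: [rec for rec in records if rec["USUBJID"] not in excluded]
        for name, records in datasets.items()
    }
```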

Record Level Cut

After the subject level cut, the next step is the record level cut, which removes records whose assessment, event, or intervention date is after the cut-off date. The record level cut follows the rules listed below:

  • For partial dates, only the available date part is compared with the same part of the cut-off date. If the year and month are available, the record is removed if the year-month is later than the year-month of the cut-off date. If only the year is available, the record is removed if the year is later than the year of the cut-off date.
  • For completely missing dates, the general principle is to keep the record.

Date variables commonly have missing or unknown components. In such cases, the general principle is to keep as many records as possible.
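The partial-date rules can be expressed as a comparison of only the components that are present. The Python sketch below assumes dates are stored as ISO 8601 strings ("2023", "2023-01", "2023-01-15") and that unknown components are written as non-numeric placeholders such as "UN" or "UNK"; both conventions are assumptions about the RAW data.

```python
from datetime import date
from typing import Optional


def is_after_cutoff(value: Optional[str], dco: date) -> bool:
    """Return True if a (possibly partial) ISO 8601 date falls after the cut-off.

    Only the available parts are compared: a year alone is compared by year,
    a year-month by year and month, and a full date as a full date.
    Missing dates return False, so the record is kept.
    """
    if not value:
        return False
    parts = []
    for piece in value.split("-"):
        if not piece.isdigit():
            break  # stop at the first unknown component, e.g. "UN" or "UNK"
        parts.append(int(piece))
    if not parts:
        return False  # nothing usable: treat as completely missing and keep
    cut_parts = [dco.year, dco.month, dco.day][: len(parts)]
    # Python compares equal-length lists element by element (year, then month, then day).
    return parts > cut_parts
```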

Complex Cutting Logic

In some cases, such as the overall response from tumor assessments, a record level cut should not be applied to the RAW data, either because no date is available or because the scan date does not determine the response. For example, in Dizal Data Management practice, tumor assessments may be collected in multiple data sets, and a single visit may span several dates. If the overall response is progressive disease, the overall assessment is usually mapped to the earliest date; if the subject is a responder, it is mapped to the latest date. In such cases, the cut-off should be applied in the SDTM programs instead.
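The date-mapping rule described above can be sketched in Python. The response codes ("PD" for progressive disease, "CR"/"PR" for responders) and the function name are assumptions; because the text only covers progression and responders, any other response returns None here rather than guessing a rule.

```python
from datetime import date
from typing import List, Optional


def overall_response_date(response: str, scan_dates: List[date]) -> Optional[date]:
    """Pick the date assigned to a visit's overall response when scans span several dates."""
    if not scan_dates:
        raise ValueError("at least one scan date is required")
    if response == "PD":
        return min(scan_dates)  # progression: earliest scan date
    if response in ("CR", "PR"):
        return max(scan_dates)  # responder: latest scan date
    return None  # no rule stated for other responses
```

Once a visit has an assigned date, the usual record level comparison against the cut-off date can be made during SDTM development.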

Another example is an adverse event that starts before the cut-off date and whose toxicity grade changes after the cut-off date. In this case, all toxicity grade changes should be kept in the adverse event domain, regardless of whether they occur before or after the cut-off date, because removing a post-cut-off grade change would require revising the action taken or the outcome of the adverse event. This is explained in the clinical Study Data Reviewer's Guide (cSDRG), and the complex logic is handled during ADaM development.
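The keep-all-grade-changes rule amounts to cutting on the event's start date rather than on each record's own date. The Python sketch below assumes each RAW record is one grade episode identified by hypothetical keys USUBJID and AESPID, with a hypothetical episode date GRDAT, and takes the earliest episode date as the event start; the downstream ADaM logic is not shown.

```python
from datetime import date
from typing import Dict, List, Tuple


def cut_ae_with_grade_changes(ae_records: List[dict], dco: date) -> List[dict]:
    """Keep every grade episode of an adverse event that started on or before the cut-off."""
    # Event start = earliest grade-episode date per (subject, AE identifier).
    start: Dict[Tuple[str, str], date] = {}
    for rec in ae_records:
        key = (rec["USUBJID"], rec["AESPID"])
        start[key] = min(start.get(key, rec["GRDAT"]), rec["GRDAT"])
    # Episodes are kept or dropped as a whole event, never individually.
    return [rec for rec in ae_records if start[(rec["USUBJID"], rec["AESPID"])] <= dco]
```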

In conclusion, applying the cut-off date is a crucial part of data cleaning that safeguards the validity and quality of the data used in clinical studies. By applying the subject level and record level cuts and following the rules described above, the analysis will be based on accurate and consistent data.
