STEP-BY-STEP SDTM DATASET CREATION GUIDE FOR CLINICAL TRIAL PROGRAMMERS USING SAS | UNIQUE CYBERSECURITY-BASED DEMOGRAPHICS EXAMPLE | PROC DATASETS | PROC PRINT | PROC CONTENTS | PROC EXPORT | SAS MACRO AUTOMATION FOR SDTM COMPLIANCE
/*Let's walk step-by-step through creating a unique SDTM (Study Data Tabulation Model) dataset*/
Step 1: Understand the SDTM DM Dataset
The Demographics (DM) dataset captures subject-level data such as:
Study ID
Subject ID
Site ID
Age, Sex, Race
Country
Date of Birth
Randomization date (RFSTDTC)
Arm assignment
Unique Subject Identifier (USUBJID)
Step 2: Define Variables
We will use the following variables for the DM dataset:
|
Variable |
Label |
Description |
|
STUDYID |
Study
Identifier |
"CYBER001" |
|
DOMAIN |
Domain
Abbreviation |
"DM" |
|
USUBJID |
Unique
Subject Identifier |
"CYBER001-SITEID-SUBJID" |
|
SUBJID |
Subject
Identifier |
Subject number |
|
SITEID |
Study
Site Identifier |
e.g., "101",
"102" |
|
SEX |
Sex |
"M" or "F" |
|
AGE |
Age |
Integer |
|
AGEU |
Age
Units |
"YEARS" |
|
RACE |
Race |
Standard categories |
|
COUNTRY |
Country |
e.g., "INDIA",
"USA" |
|
ARM |
Description
of Planned Arm |
e.g., "CyberRelax",
"Placebo" |
|
RFSTDTC |
Subject
Reference Start Date/Time |
Randomization date in ISO8601 |
Step 3: Create the Raw Input Data
Placing 12 unique observations in the Raw Dataset. Let’s create an input dataset in SAS.
data raw_dm;
infile datalines dlm=',' dsd;
input SUBJID :$3. SITEID :$3. SEX :$1. AGE RACE :$20. COUNTRY :$10. ARM :$15. RFSTDTC :$10.;
datalines;
001,101,M,28,Asian,INDIA,CyberRelax,2024-06-01
002,101,F,34,White,USA,Placebo,2024-06-03
003,102,M,41,Black,USA,CyberRelax,2024-06-05
004,102,F,29,Asian,INDIA,CyberRelax,2024-06-06
005,101,M,36,White,USA,Placebo,2024-06-07
006,103,F,26,Other,GERMANY,CyberRelax,2024-06-08
007,103,M,33,Asian,INDIA,Placebo,2024-06-09
008,101,F,45,White,UK,CyberRelax,2024-06-10
009,102,M,38,Black,USA,Placebo,2024-06-11
010,103,F,30,Asian,INDIA,CyberRelax,2024-06-12
011,104,M,27,White,CANADA,Placebo,2024-06-13
012,104,F,31,Other,JAPAN,CyberRelax,2024-06-14
;
run;
proc print;run;
Output:
| Obs | SUBJID | SITEID | SEX | AGE | RACE | COUNTRY | ARM | RFSTDTC |
|---|---|---|---|---|---|---|---|---|
| 1 | 001 | 101 | M | 28 | Asian | INDIA | CyberRelax | 2024-06-01 |
| 2 | 002 | 101 | F | 34 | White | USA | Placebo | 2024-06-03 |
| 3 | 003 | 102 | M | 41 | Black | USA | CyberRelax | 2024-06-05 |
| 4 | 004 | 102 | F | 29 | Asian | INDIA | CyberRelax | 2024-06-06 |
| 5 | 005 | 101 | M | 36 | White | USA | Placebo | 2024-06-07 |
| 6 | 006 | 103 | F | 26 | Other | GERMANY | CyberRelax | 2024-06-08 |
| 7 | 007 | 103 | M | 33 | Asian | INDIA | Placebo | 2024-06-09 |
| 8 | 008 | 101 | F | 45 | White | UK | CyberRelax | 2024-06-10 |
| 9 | 009 | 102 | M | 38 | Black | USA | Placebo | 2024-06-11 |
| 10 | 010 | 103 | F | 30 | Asian | INDIA | CyberRelax | 2024-06-12 |
| 11 | 011 | 104 | M | 27 | White | CANADA | Placebo | 2024-06-13 |
| 12 | 012 | 104 | F | 31 | Other | JAPAN | CyberRelax | 2024-06-14 |
Step 4: Define a SAS Macro to Transform to SDTM Format
Add STUDYID as “CYBER001”
Add DOMAIN as “DM”
Create USUBJID = STUDYID || "-" || SITEID || "-" || SUBJID
Add AGEU as “YEARS”
%macro create_sdtm_dm(input=, output=);
data &output.;
set &input.;
length STUDYID $10 DOMAIN $2 USUBJID $25 AGEU $5;
STUDYID = "CYBER001";
DOMAIN = "DM";
USUBJID = catx("-", STUDYID, SITEID, SUBJID);
AGEU = "YEARS";
label
STUDYID = "Study Identifier"
DOMAIN = "Domain Abbreviation"
USUBJID = "Unique Subject Identifier"
SUBJID = "Subject Identifier for the Study"
SITEID = "Study Site Identifier"
SEX = "Sex"
AGE = "Age"
AGEU = "Age Units"
RACE = "Race"
COUNTRY = "Country"
ARM = "Description of Planned Arm"
RFSTDTC = "Subject Reference Start Date/Time";
run;
proc print;run;
%mend;
Step 5: Execute the Macro
%create_sdtm_dm(input=raw_dm, output=sdtm_dm);
Output:
| Obs | SUBJID | SITEID | SEX | AGE | RACE | COUNTRY | ARM | RFSTDTC | STUDYID | DOMAIN | USUBJID | AGEU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 001 | 101 | M | 28 | Asian | INDIA | CyberRelax | 2024-06-01 | CYBER001 | DM | CYBER001-101-001 | YEARS |
| 2 | 002 | 101 | F | 34 | White | USA | Placebo | 2024-06-03 | CYBER001 | DM | CYBER001-101-002 | YEARS |
| 3 | 003 | 102 | M | 41 | Black | USA | CyberRelax | 2024-06-05 | CYBER001 | DM | CYBER001-102-003 | YEARS |
| 4 | 004 | 102 | F | 29 | Asian | INDIA | CyberRelax | 2024-06-06 | CYBER001 | DM | CYBER001-102-004 | YEARS |
| 5 | 005 | 101 | M | 36 | White | USA | Placebo | 2024-06-07 | CYBER001 | DM | CYBER001-101-005 | YEARS |
| 6 | 006 | 103 | F | 26 | Other | GERMANY | CyberRelax | 2024-06-08 | CYBER001 | DM | CYBER001-103-006 | YEARS |
| 7 | 007 | 103 | M | 33 | Asian | INDIA | Placebo | 2024-06-09 | CYBER001 | DM | CYBER001-103-007 | YEARS |
| 8 | 008 | 101 | F | 45 | White | UK | CyberRelax | 2024-06-10 | CYBER001 | DM | CYBER001-101-008 | YEARS |
| 9 | 009 | 102 | M | 38 | Black | USA | Placebo | 2024-06-11 | CYBER001 | DM | CYBER001-102-009 | YEARS |
| 10 | 010 | 103 | F | 30 | Asian | INDIA | CyberRelax | 2024-06-12 | CYBER001 | DM | CYBER001-103-010 | YEARS |
| 11 | 011 | 104 | M | 27 | White | CANADA | Placebo | 2024-06-13 | CYBER001 | DM | CYBER001-104-011 | YEARS |
| 12 | 012 | 104 | F | 31 | Other | JAPAN | CyberRelax | 2024-06-14 | CYBER001 | DM | CYBER001-104-012 | YEARS |
After execution, the sdtm_dm dataset will contain transformed and SDTM-compliant data.
Step 6: Review the SDTM Dataset
proc print data=sdtm_dm label noobs;
run;
Output:
| Subject Identifier for the Study |
Study Site Identifier |
Sex | Age | Race | Country | Description of Planned Arm |
Subject Reference Start Date/Time |
Study Identifier | Domain Abbreviation |
Unique Subject Identifier | Age Units |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 001 | 101 | M | 28 | Asian | INDIA | CyberRelax | 2024-06-01 | CYBER001 | DM | CYBER001-101-001 | YEARS |
| 002 | 101 | F | 34 | White | USA | Placebo | 2024-06-03 | CYBER001 | DM | CYBER001-101-002 | YEARS |
| 003 | 102 | M | 41 | Black | USA | CyberRelax | 2024-06-05 | CYBER001 | DM | CYBER001-102-003 | YEARS |
| 004 | 102 | F | 29 | Asian | INDIA | CyberRelax | 2024-06-06 | CYBER001 | DM | CYBER001-102-004 | YEARS |
| 005 | 101 | M | 36 | White | USA | Placebo | 2024-06-07 | CYBER001 | DM | CYBER001-101-005 | YEARS |
| 006 | 103 | F | 26 | Other | GERMANY | CyberRelax | 2024-06-08 | CYBER001 | DM | CYBER001-103-006 | YEARS |
| 007 | 103 | M | 33 | Asian | INDIA | Placebo | 2024-06-09 | CYBER001 | DM | CYBER001-103-007 | YEARS |
| 008 | 101 | F | 45 | White | UK | CyberRelax | 2024-06-10 | CYBER001 | DM | CYBER001-101-008 | YEARS |
| 009 | 102 | M | 38 | Black | USA | Placebo | 2024-06-11 | CYBER001 | DM | CYBER001-102-009 | YEARS |
| 010 | 103 | F | 30 | Asian | INDIA | CyberRelax | 2024-06-12 | CYBER001 | DM | CYBER001-103-010 | YEARS |
| 011 | 104 | M | 27 | White | CANADA | Placebo | 2024-06-13 | CYBER001 | DM | CYBER001-104-011 | YEARS |
| 012 | 104 | F | 31 | Other | JAPAN | CyberRelax | 2024-06-14 | CYBER001 | DM | CYBER001-104-012 | YEARS |
Step 7: SDTM Compliance Checks
7.1 Variable Order
%macro reorder_dm(dataset=, out=);
data &out.;
retain STUDYID DOMAIN USUBJID SUBJID SITEID SEX AGE AGEU RACE COUNTRY ARM RFSTDTC;
set &dataset.;
run;
proc print;run;
%mend;
%reorder_dm(dataset=sdtm_dm, out=sdtm_dm_final);
Output:
| Obs | STUDYID | DOMAIN | USUBJID | SUBJID | SITEID | SEX | AGE | AGEU | RACE | COUNTRY | ARM | RFSTDTC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | CYBER001 | DM | CYBER001-101-001 | 001 | 101 | M | 28 | YEARS | Asian | INDIA | CyberRelax | 2024-06-01 |
| 2 | CYBER001 | DM | CYBER001-101-002 | 002 | 101 | F | 34 | YEARS | White | USA | Placebo | 2024-06-03 |
| 3 | CYBER001 | DM | CYBER001-102-003 | 003 | 102 | M | 41 | YEARS | Black | USA | CyberRelax | 2024-06-05 |
| 4 | CYBER001 | DM | CYBER001-102-004 | 004 | 102 | F | 29 | YEARS | Asian | INDIA | CyberRelax | 2024-06-06 |
| 5 | CYBER001 | DM | CYBER001-101-005 | 005 | 101 | M | 36 | YEARS | White | USA | Placebo | 2024-06-07 |
| 6 | CYBER001 | DM | CYBER001-103-006 | 006 | 103 | F | 26 | YEARS | Other | GERMANY | CyberRelax | 2024-06-08 |
| 7 | CYBER001 | DM | CYBER001-103-007 | 007 | 103 | M | 33 | YEARS | Asian | INDIA | Placebo | 2024-06-09 |
| 8 | CYBER001 | DM | CYBER001-101-008 | 008 | 101 | F | 45 | YEARS | White | UK | CyberRelax | 2024-06-10 |
| 9 | CYBER001 | DM | CYBER001-102-009 | 009 | 102 | M | 38 | YEARS | Black | USA | Placebo | 2024-06-11 |
| 10 | CYBER001 | DM | CYBER001-103-010 | 010 | 103 | F | 30 | YEARS | Asian | INDIA | CyberRelax | 2024-06-12 |
| 11 | CYBER001 | DM | CYBER001-104-011 | 011 | 104 | M | 27 | YEARS | White | CANADA | Placebo | 2024-06-13 |
| 12 | CYBER001 | DM | CYBER001-104-012 | 012 | 104 | F | 31 | YEARS | Other | JAPAN | CyberRelax | 2024-06-14 |
7.2 Length Check
proc contents data=sdtm_dm_final;
run;
Output:
| Data Set Name | WORK.SDTM_DM_FINAL | Observations | 12 |
|---|---|---|---|
| Member Type | DATA | Variables | 12 |
| Engine | V9 | Indexes | 0 |
| Created | 14/09/2015 00:24:22 | Observation Length | 112 |
| Last Modified | 14/09/2015 00:24:22 | Deleted Observations | 0 |
| Protection | Compressed | NO | |
| Data Set Type | Sorted | NO | |
| Label | |||
| Data Representation | WINDOWS_64 | ||
| Encoding | wlatin1 Western (Windows) |
| Engine/Host Dependent Information | |
|---|---|
| Data Set Page Size | 65536 |
| Number of Data Set Pages | 1 |
| First Data Page | 1 |
| Max Obs per Page | 584 |
| Obs in First Data Page | 12 |
| Number of Data Set Repairs | 0 |
| ExtendObsCounter | YES |
| Filename | C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD1524_DESKTOP-QFAA4KV_\sdtm_dm_final.sas7bdat |
| Release Created | 9.0401M2 |
| Host Created | X64_8HOME |
| Alphabetic List of Variables and Attributes | ||||
|---|---|---|---|---|
| # | Variable | Type | Len | Label |
| 7 | AGE | Num | 8 | Age |
| 8 | AGEU | Char | 5 | Age Units |
| 11 | ARM | Char | 15 | Description of Planned Arm |
| 10 | COUNTRY | Char | 10 | Country |
| 2 | DOMAIN | Char | 2 | Domain Abbreviation |
| 9 | RACE | Char | 20 | Race |
| 12 | RFSTDTC | Char | 10 | Subject Reference Start Date/Time |
| 6 | SEX | Char | 1 | Sex |
| 5 | SITEID | Char | 3 | Study Site Identifier |
| 1 | STUDYID | Char | 10 | Study Identifier |
| 4 | SUBJID | Char | 3 | Subject Identifier for the Study |
| 3 | USUBJID | Char | 25 | Unique Subject Identifier |
Step 8: Export the Dataset to XPORT Format
XPT is used for FDA submission.
libname outlib xport 'path/sdtm_dm_final.xpt';
data outlib.DM;
set sdtm_dm_final;
run;
Overall Summary:
1.Defined the goal: create a unique SDTM-compliant DM dataset.
2.Created raw input with realistic, cybersecurity-themed trial data.
3.Wrote a macro to automate transformation to SDTM format.
4.Executed and validated the macro.
5.Reordered variables for CDISC compliance.
6.Exported the dataset to an XPORT format.
No comments:
Post a Comment