From Raw Trial Records to ADaM Intelligence: Building High-Impact ADSL and ADAE Datasets for Clinical Analytics

Transforming DM, EX, and AE Domains into Submission-Ready ADaM Datasets

Introduction

In clinical trials, raw datasets collected from different domains are usually inconsistent, incomplete, and not directly suitable for statistical analysis. To perform accurate safety and efficacy analysis, the raw data must be transformed into standardized analysis-ready datasets following CDISC ADaM guidelines. In this project, the DM (Demographics), EX (Exposure), and AE (Adverse Events) datasets were used to create two important ADaM datasets known as ADSL and ADAE.

The ADSL dataset acts as the backbone of all analysis datasets because it contains one record per subject with important treatment and population-level information. Variables such as treatment start date, treatment end date, safety population flag, intent-to-treat flag, and death flag were derived using exposure and adverse event information. The ADAE dataset was developed to analyze adverse events occurring during the clinical trial. Important variables such as severity flags, serious adverse event flags, treatment emergent flags, and relationship flags were derived after cleaning and standardizing the AE domain.

This project demonstrates real-world clinical SAS programming concepts including data cleaning, date conversions, duplicate removal, merging datasets, derivation logic, standardization techniques, and analysis-ready dataset creation. It also reflects the practical workflow followed by SAS programmers in pharmaceutical and CRO industries during clinical trial submissions and regulatory reporting.

/*DM→ Demographics — Understanding DM Dataset*/

DATA dm;

LENGTH STUDYID $10 USUBJID $25 SEX $10 RACE $20

ARM $20 RFSTDTC $15 RFENDTC $15;

INFILE DATALINES DLM='|' DSD TRUNCOVER;

INPUT STUDYID :$10. USUBJID :$25. SITEID SEX :$10.

RACE :$20. ARM :$20. RFSTDTC :$15. RFENDTC :$15. AGE;

DATALINES;

ABC123|ABC123-101|1001|Male|Asian|Drug A|01JAN2025|31JAN2025|45

ABC123|ABC123-102|1001|FEMALE|White|PLACEBO|02/15/2025||52

ABC123|ABC123-103|1002|m|Black|drug b|01MAR2025|15MAR2025|38

ABC123|ABC123-104|1002|f|Asian|Drug A|04-01-2025||.

ABC123|ABC123-105|1003|MALE|White|Placebo|2025-05-01|2025-05-20|60

ABC123|ABC123-106|1003|Female|Asian|DRUG B|06/10/2025|06/25/2025|35

ABC123|ABC123-107|1004|M|Black|Drug A|07-01-2025||48

ABC123|ABC123-108|1004|F|Asian|placebo|2025-08-20|2025-08-30|29

ABC123|ABC123-109|1005|unknown|White|Drug B|09SEP2025|15SEP2025|55

ABC123|ABC123-110|1005|male|Asian|DRUG A|2025/10/01||42

ABC123|ABC123-111|1006| FEMALE |Black|Placebo|11NOV2025|15NOV2025|50

ABC123|ABC123-112|1006|m|White|Drug B|2025-12-01|2025-12-10|39

ABC123|ABC123-113|1007|Female|Asian|drug a|01/03/2025||33

ABC123|ABC123-114|1007|M|Black|PLACEBO|14FEB2025|18FEB2025|.

ABC123|ABC123-115|1008|f|White|Drug B|2025-03-15|2025-03-20|41

ABC123|ABC123-116|1008|Male|Asian|Drug A|04APR2025||58

ABC123|ABC123-117|1009|FEMALE|Black|Placebo|2025-05-20|2025-05-25|47

ABC123|ABC123-118|1009|m|Asian|drug b|06/06/2025||36

ABC123|ABC123-119|1010|Female|White|Drug A|07JUL2025|15JUL2025|44

ABC123|ABC123-120|1010|Male|Asian|PLACEBO|08AUG2025||49

ABC123|ABC123-101|1001|Male|Asian|Drug A|01JAN2025|31JAN2025|45

;

RUN;

PROC PRINT DATA=dm;

RUN;

OUTPUT:

ObsSTUDYIDUSUBJIDSEXRACEARMRFSTDTCRFENDTCSITEIDAGE
1ABC123ABC123-101MaleAsianDrug A01JAN202531JAN2025100145
2ABC123ABC123-102FEMALEWhitePLACEBO02/15/2025 100152
3ABC123ABC123-103mBlackdrug b01MAR202515MAR2025100238
4ABC123ABC123-104fAsianDrug A04-01-2025 1002.
5ABC123ABC123-105MALEWhitePlacebo2025-05-012025-05-20100360
6ABC123ABC123-106FemaleAsianDRUG B06/10/202506/25/2025100335
7ABC123ABC123-107MBlackDrug A07-01-2025 100448
8ABC123ABC123-108FAsianplacebo2025-08-202025-08-30100429
9ABC123ABC123-109unknownWhiteDrug B09SEP202515SEP2025100555
10ABC123ABC123-110maleAsianDRUG A2025/10/01 100542
11ABC123ABC123-111FEMALEBlackPlacebo11NOV202515NOV2025100650
12ABC123ABC123-112mWhiteDrug B2025-12-012025-12-10100639
13ABC123ABC123-113FemaleAsiandrug a01/03/2025 100733
14ABC123ABC123-114MBlackPLACEBO14FEB202518FEB20251007.
15ABC123ABC123-115fWhiteDrug B2025-03-152025-03-20100841
16ABC123ABC123-116MaleAsianDrug A04APR2025 100858
17ABC123ABC123-117FEMALEBlackPlacebo2025-05-202025-05-25100947
18ABC123ABC123-118mAsiandrug b06/06/2025 100936
19ABC123ABC123-119FemaleWhiteDrug A07JUL202515JUL2025101044
20ABC123ABC123-120MaleAsianPLACEBO08AUG2025 101049
21ABC123ABC123-101MaleAsianDrug A01JAN202531JAN2025100145

Explanation

This starts creation of the DM dataset.

DM stands for Demographics.

It contains:

  • Subject information
  • Sex
  • Race
  • Treatment Arm
  • Study start/end dates
  • Age

DM is considered the parent dataset because almost every analysis dataset links back to DM.

The LENGTH statement defines:

  • Variable type
  • Storage size

$ means character variable.

Example:

  • STUDYID $10 → Character variable with maximum length 10.

Why important?

  • Prevents truncation
  • Controls memory
  • Maintains consistency

This tells SAS how to read raw data.

Option

Purpose

DLM='

'

DSD

Handles missing values properly

TRUNCOVER

Prevents SAS from reading next line when values missing

Very common in raw clinical datasets.

Defines how variables are read.

: modifier allows flexible reading.

Example:

SEX :$10.

means:

  • Read character
  • Maximum width 10

About Important Variables

Variable

Meaning

STUDYID

Clinical study identifier

USUBJID

Unique subject ID

SITEID

Site number

SEX

Subject gender

RACE

Race category

ARM

Treatment arm

RFSTDTC

Reference Start Date

RFENDTC

Reference End Date

AGE

Subject age

Why Data Contains Issues

You intentionally included:

  • Duplicate records
  • Missing values
  • Different date formats
  • Inconsistent capitalization

This mimics real-world clinical data.

Examples:

Problem

Example

Duplicate subject

ABC123-101

Missing age

.

Different sex formats

Male, MALE, m

Different date formats

01JAN2025, 2025-01-01, 02/15/2025

This is excellent interview-level practice.

Displays dataset contents.

Used for:

  • QC checks
  • Validation
  • Debugging

/*EX→ Exposure — Understanding EX Dataset*/

DATA ex;

LENGTH USUBJID $25 EXSTDTC $15 EXENDTC $15;

INFILE DATALINES DLM='|' DSD TRUNCOVER;

INPUT USUBJID :$25. EXSTDTC :$15. EXENDTC :$15. EXDOSE;

DATALINES;

ABC123-101|01JAN2025|05JAN2025|100

ABC123-101|2025-01-10|2025-01-15|100

ABC123-102|02/15/2025||200

ABC123-102|2025-02-20|2025-02-25|200

ABC123-103|01MAR2025|05MAR2025|150

ABC123-103|2025/03/10|2025/03/15|150

ABC123-104|04-01-2025||120

ABC123-104|15APR2025|20APR2025|120

ABC123-105|2025-05-01|2025-05-10|100

ABC123-105|2025-05-15||100

ABC123-106|06/10/2025|06/15/2025|80

ABC123-106|06/20/2025|06/25/2025|80

ABC123-107|07-01-2025||90

ABC123-108|2025-08-20|2025-08-30|110

ABC123-109|09SEP2025|15SEP2025|130

ABC123-110|2025/10/01||140

ABC123-111|11NOV2025|15NOV2025|150

ABC123-112|2025-12-01|2025-12-10|160

ABC123-113|01/03/2025||170

ABC123-114|14FEB2025|18FEB2025|180

ABC123-115|2025-03-15|2025-03-20|190

ABC123-101|01JAN2025|05JAN2025|100

;

RUN;

PROC PRINT DATA=ex;

RUN;

OUTPUT:

ObsUSUBJIDEXSTDTCEXENDTCEXDOSE
1ABC123-10101JAN202505JAN2025100
2ABC123-1012025-01-102025-01-15100
3ABC123-10202/15/2025 200
4ABC123-1022025-02-202025-02-25200
5ABC123-10301MAR202505MAR2025150
6ABC123-1032025/03/102025/03/15150
7ABC123-10404-01-2025 120
8ABC123-10415APR202520APR2025120
9ABC123-1052025-05-012025-05-10100
10ABC123-1052025-05-15 100
11ABC123-10606/10/202506/15/202580
12ABC123-10606/20/202506/25/202580
13ABC123-10707-01-2025 90
14ABC123-1082025-08-202025-08-30110
15ABC123-10909SEP202515SEP2025130
16ABC123-1102025/10/01 140
17ABC123-11111NOV202515NOV2025150
18ABC123-1122025-12-012025-12-10160
19ABC123-11301/03/2025 170
20ABC123-11414FEB202518FEB2025180
21ABC123-1152025-03-152025-03-20190
22ABC123-10101JAN202505JAN2025100

Explanation

Creates Exposure dataset.

EX domain stores:

  • Drug administration information
  • Dose dates
  • Dose amount

Important Variables

Variable

Meaning

EXSTDTC

Exposure Start Date

EXENDTC

Exposure End Date

EXDOSE

Dose Amount

Why EX is Important

EX is used to derive:

Derived Variable

Meaning

TRTSDT

First treatment date

TRTEDT

Last treatment date

These are critical in:

  • Safety analysis
  • Efficacy analysis
  • Treatment emergent calculations

/*AE→ Adverse Events — Understanding AE Dataset*/

DATA ae;

LENGTH USUBJID $25 AETERM $30 AESTDTC $15 AEENDTC $15

AESEV $15 AESER $10 AEREL $20;

INFILE DATALINES DLM='|' DSD TRUNCOVER;

INPUT USUBJID :$25. AETERM :$30. AESTDTC :$15. AEENDTC :$15.

AESEV :$15. AESER :$10. AEREL :$20.;

DATALINES;

ABC123-101|Headache|01JAN2025|05JAN2025|Mild|Y|Related

ABC123-101|Fever|2025-01-20|2025-01-25|SEVERE|YES|RELATED

ABC123-102|Vomiting|02/10/2025||Moderate|N|Not Related

ABC123-102|Cold|2025-02-28|2025-03-02| mild |NO|Related

ABC123-103|Headache|01MAR2025|05MAR2025|Severe|Y|RELATED

ABC123-103|Fever|2025/03/20||MODERATE|N|Not Related

ABC123-104|Vomiting|03-25-2025|04-01-2025|Mild|No|Related

ABC123-104|Cold|15APR2025|| severe |YES|RELATED

ABC123-105|Headache|2025-04-20|2025-04-22|Moderate|N|Not Related

ABC123-105|Fever|2025-05-18||SEVERE|Y|Related

ABC123-106|Vomiting|06/05/2025|06/08/2025| mild |NO|RELATED

ABC123-106|Cold|06/25/2025||Moderate|YES|Not Related

ABC123-107|Headache|06-25-2025|07-02-2025|SEVERE|Y|Related

ABC123-108|Fever|2025-08-15||Mild|No|RELATED

ABC123-109|Vomiting|09SEP2025|12SEP2025| severe |YES|Related

ABC123-110|Cold|2025/09/25||Moderate|N|Not Related

ABC123-111|Headache|11NOV2025|15NOV2025|MILD|NO|RELATED

ABC123-112|Fever|2025-12-05||Severe|Y|Related

ABC123-113|Vomiting|01/02/2025|01/05/2025|Moderate|YES|Not Related

ABC123-114|Cold|10FEB2025|| mild |N|RELATED

ABC123-115|Headache|2025-03-10|2025-03-15|SEVERE|YES|Related

ABC123-101|Headache|01JAN2025|05JAN2025|Mild|Y|Related

;

RUN;

PROC PRINT DATA=ae;

RUN;

OUTPUT:

ObsUSUBJIDAETERMAESTDTCAEENDTCAESEVAESERAEREL
1ABC123-101Headache01JAN202505JAN2025MildYRelated
2ABC123-101Fever2025-01-202025-01-25SEVEREYESRELATED
3ABC123-102Vomiting02/10/2025 ModerateNNot Related
4ABC123-102Cold2025-02-282025-03-02mildNORelated
5ABC123-103Headache01MAR202505MAR2025SevereYRELATED
6ABC123-103Fever2025/03/20 MODERATENNot Related
7ABC123-104Vomiting03-25-202504-01-2025MildNoRelated
8ABC123-104Cold15APR2025 severeYESRELATED
9ABC123-105Headache2025-04-202025-04-22ModerateNNot Related
10ABC123-105Fever2025-05-18 SEVEREYRelated
11ABC123-106Vomiting06/05/202506/08/2025mildNORELATED
12ABC123-106Cold06/25/2025 ModerateYESNot Related
13ABC123-107Headache06-25-202507-02-2025SEVEREYRelated
14ABC123-108Fever2025-08-15 MildNoRELATED
15ABC123-109Vomiting09SEP202512SEP2025severeYESRelated
16ABC123-110Cold2025/09/25 ModerateNNot Related
17ABC123-111Headache11NOV202515NOV2025MILDNORELATED
18ABC123-112Fever2025-12-05 SevereYRelated
19ABC123-113Vomiting01/02/202501/05/2025ModerateYESNot Related
20ABC123-114Cold10FEB2025 mildNRELATED
21ABC123-115Headache2025-03-102025-03-15SEVEREYESRelated
22ABC123-101Headache01JAN202505JAN2025MildYRelated

Explanation

Creates Adverse Events dataset.

AE domain stores:

  • Medical events
  • Severity
  • Seriousness
  • Relationship to drug

Important Variables

Variable

Meaning

AETERM

Adverse Event Term

AESTDTC

AE Start Date

AEENDTC

AE End Date

AESEV

Severity

AESER

Serious Event Flag

AEREL

Relationship to treatment

Why AE is Critical

AE data is the most important safety domain.

Used in:

  • Regulatory submissions
  • Safety review
  • SAE reporting
  • CSR tables

Creating ADSL Dataset

ADSL = Subject-Level Analysis Dataset

ADSL is the backbone of all ADaM datasets.

/*Step 1: Prepare Exposure Dates*/

/*Derive first dose date (TRTSDT) and last dose date (TRTEDT) from EX.*/

PROC SORT DATA=EX;

BY USUBJID;

RUN;

PROC PRINT DATA=EX;

RUN;

OUTPUT:

ObsUSUBJIDEXSTDTCEXENDTCEXDOSE
1ABC123-10101JAN202505JAN2025100
2ABC123-1012025-01-102025-01-15100
3ABC123-10101JAN202505JAN2025100
4ABC123-10202/15/2025 200
5ABC123-1022025-02-202025-02-25200
6ABC123-10301MAR202505MAR2025150
7ABC123-1032025/03/102025/03/15150
8ABC123-10404-01-2025 120
9ABC123-10415APR202520APR2025120
10ABC123-1052025-05-012025-05-10100
11ABC123-1052025-05-15 100
12ABC123-10606/10/202506/15/202580
13ABC123-10606/20/202506/25/202580
14ABC123-10707-01-2025 90
15ABC123-1082025-08-202025-08-30110
16ABC123-10909SEP202515SEP2025130
17ABC123-1102025/10/01 140
18ABC123-11111NOV202515NOV2025150
19ABC123-1122025-12-012025-12-10160
20ABC123-11301/03/2025 170
21ABC123-11414FEB202518FEB2025180
22ABC123-1152025-03-152025-03-20190

Why?

Sorting is mandatory before:

  • BY-group processing
  • MERGE
  • FIRST./LAST. logic

Without sorting:

  • SAS gives errors
  • Merge becomes incorrect

/*Step 2: Convert Character Dates to SAS Dates*/

DATA EX_DATES;

SET EX;

EXSTDT = INPUT(EXSTDTC,ANYDTDTE15.);

EXENDT = INPUT(EXENDTC,ANYDTDTE15.);

FORMAT EXSTDT EXENDT DATE9.;

RUN;

PROC PRINT DATA=EX_DATES;

RUN;

OUTPUT:

ObsUSUBJIDEXSTDTCEXENDTCEXDOSEEXSTDTEXENDT
1ABC123-10101JAN202505JAN202510001JAN202505JAN2025
2ABC123-1012025-01-102025-01-1510010JAN202515JAN2025
3ABC123-10101JAN202505JAN202510001JAN202505JAN2025
4ABC123-10202/15/2025 20015FEB2025.
5ABC123-1022025-02-202025-02-2520020FEB202525FEB2025
6ABC123-10301MAR202505MAR202515001MAR202505MAR2025
7ABC123-1032025/03/102025/03/1515010MAR202515MAR2025
8ABC123-10404-01-2025 12001APR2025.
9ABC123-10415APR202520APR202512015APR202520APR2025
10ABC123-1052025-05-012025-05-1010001MAY202510MAY2025
11ABC123-1052025-05-15 10015MAY2025.
12ABC123-10606/10/202506/15/20258010JUN202515JUN2025
13ABC123-10606/20/202506/25/20258020JUN202525JUN2025
14ABC123-10707-01-2025 9001JUL2025.
15ABC123-1082025-08-202025-08-3011020AUG202530AUG2025
16ABC123-10909SEP202515SEP202513009SEP202515SEP2025
17ABC123-1102025/10/01 14001OCT2025.
18ABC123-11111NOV202515NOV202515011NOV202515NOV2025
19ABC123-1122025-12-012025-12-1016001DEC202510DEC2025
20ABC123-11301/03/2025 17003JAN2025.
21ABC123-11414FEB202518FEB202518014FEB202518FEB2025
22ABC123-1152025-03-152025-03-2019015MAR202520MAR2025

Explanation

Raw dates are character values.

Need conversion into numeric SAS dates.

INPUT Function

INPUT(variable, informat.)

Converts:

  • Character → Numeric

ANYDTDTE Informat

ANYDTDTE15.

Very powerful informat.

Reads multiple formats:

Raw Format

Reads Successfully

01JAN2025

Yes

2025-01-10

Yes

02/15/2025

Yes

2025/03/10

Yes

This is extremely common in clinical programming.

FORMAT Statement

FORMAT EXSTDT DATE9.;

Displays date like:

01JAN2025

instead of numeric values like:

23741

/*Step 3:Derive Treatment Dates*/

PROC SQL;

CREATE TABLE EX_SUMMARY AS

SELECT USUBJID,

                MIN(EXSTDT) AS TRTSDT FORMAT=DATE9.,

       MAX(EXENDT) AS TRTEDT FORMAT=DATE9.

FROM EX_DATES

GROUP BY USUBJID;

QUIT;

PROC PRINT DATA=EX_SUMMARY;

RUN;

OUTPUT:

ObsUSUBJIDTRTSDTTRTEDT
1ABC123-10101JAN202515JAN2025
2ABC123-10215FEB202525FEB2025
3ABC123-10301MAR202515MAR2025
4ABC123-10401APR202520APR2025
5ABC123-10501MAY202510MAY2025
6ABC123-10610JUN202525JUN2025
7ABC123-10701JUL2025.
8ABC123-10820AUG202530AUG2025
9ABC123-10909SEP202515SEP2025
10ABC123-11001OCT2025.
11ABC123-11111NOV202515NOV2025
12ABC123-11201DEC202510DEC2025
13ABC123-11303JAN2025.
14ABC123-11414FEB202518FEB2025
15ABC123-11515MAR202520MAR2025

Why MIN and MAX?

Variable

Logic

TRTSDT

First dose date

TRTEDT

Last dose date

GROUP BY

GROUP BY USUBJID

Performs subject-level summarization.

Without GROUP BY:

  • Whole dataset summarized together

Why PROC SQL?

PROC SQL is powerful for:

  • Aggregations
  • Joins
  • Summaries

Widely used in industry.

/*Step 4: Create Death Flag*/

/*Using AE dataset.*/

DATA AE_CLEAN;

SET AE;

AEOUT = UPCASE(STRIP(AETERM));

IF AEOUT IN ("DEATH","FATAL") THEN DTHFL = "Y";

ELSE DTHFL = "N";

RUN;

PROC PRINT DATA=AE_CLEAN;

RUN;

OUTPUT:

ObsUSUBJIDAETERMAESTDTCAEENDTCAESEVAESERAERELAEOUTDTHFL
1ABC123-101Headache01JAN202505JAN2025MildYRelatedHEADACHEN
2ABC123-101Fever2025-01-202025-01-25SEVEREYESRELATEDFEVERN
3ABC123-102Vomiting02/10/2025 ModerateNNot RelatedVOMITINGN
4ABC123-102Cold2025-02-282025-03-02mildNORelatedCOLDN
5ABC123-103Headache01MAR202505MAR2025SevereYRELATEDHEADACHEN
6ABC123-103Fever2025/03/20 MODERATENNot RelatedFEVERN
7ABC123-104Vomiting03-25-202504-01-2025MildNoRelatedVOMITINGN
8ABC123-104Cold15APR2025 severeYESRELATEDCOLDN
9ABC123-105Headache2025-04-202025-04-22ModerateNNot RelatedHEADACHEN
10ABC123-105Fever2025-05-18 SEVEREYRelatedFEVERN
11ABC123-106Vomiting06/05/202506/08/2025mildNORELATEDVOMITINGN
12ABC123-106Cold06/25/2025 ModerateYESNot RelatedCOLDN
13ABC123-107Headache06-25-202507-02-2025SEVEREYRelatedHEADACHEN
14ABC123-108Fever2025-08-15 MildNoRELATEDFEVERN
15ABC123-109Vomiting09SEP202512SEP2025severeYESRelatedVOMITINGN
16ABC123-110Cold2025/09/25 ModerateNNot RelatedCOLDN
17ABC123-111Headache11NOV202515NOV2025MILDNORELATEDHEADACHEN
18ABC123-112Fever2025-12-05 SevereYRelatedFEVERN
19ABC123-113Vomiting01/02/202501/05/2025ModerateYESNot RelatedVOMITINGN
20ABC123-114Cold10FEB2025 mildNRELATEDCOLDN
21ABC123-115Headache2025-03-102025-03-15SEVEREYESRelatedHEADACHEN
22ABC123-101Headache01JAN202505JAN2025MildYRelatedHEADACHEN

Explanation

STRIP Function

Removes leading/trailing blanks.

Example:

" FEMALE "

becomes

"FEMALE"

UPCASE Function

Converts to uppercase.

Important for standardization.

Why DTHFL?

DTHFL = Death Flag

Indicates:

  • Subject died during study

Regulatory importance:

  • Safety analysis
  • Survival analysis

/*Step 5:Subject-Level Death Flag*/

PROC SQL;

CREATE TABLE DEATH_FLAG AS

SELECT USUBJID,MAX(DTHFL) AS DTHFL

FROM AE_CLEAN

GROUP  BY USUBJID;

QUIT;

PROC PRINT DATA=DEATH_FLAG;

RUN;

OUTPUT:

ObsUSUBJIDDTHFL
1ABC123-101N
2ABC123-102N
3ABC123-103N
4ABC123-104N
5ABC123-105N
6ABC123-106N
7ABC123-107N
8ABC123-108N
9ABC123-109N
10ABC123-110N
11ABC123-111N
12ABC123-112N
13ABC123-113N
14ABC123-114N
15ABC123-115N

Why MAX(DTHFL)?

Character comparison:

  • "Y" > "N"

If any record has Y:

  • Subject gets Y

/*Step 6: Remove Duplicate Subjects from DM*/

PROC SORT DATA=DM NODUPKEY OUT=DM_CLEAN;

BY USUBJID;

RUN;

PROC PRINT DATA=DM_CLEAN;

RUN;

LOG:

NOTE: There were 21 observations read from the data set WORK.DM.
NOTE: 1 observations with duplicate key values were deleted.
NOTE: The data set WORK.DM_CLEAN has 20 observations and 9 variables.

OUTPUT:

ObsSTUDYIDUSUBJIDSEXRACEARMRFSTDTCRFENDTCSITEIDAGE
1ABC123ABC123-101MaleAsianDrug A01JAN202531JAN2025100145
2ABC123ABC123-102FEMALEWhitePLACEBO02/15/2025 100152
3ABC123ABC123-103mBlackdrug b01MAR202515MAR2025100238
4ABC123ABC123-104fAsianDrug A04-01-2025 1002.
5ABC123ABC123-105MALEWhitePlacebo2025-05-012025-05-20100360
6ABC123ABC123-106FemaleAsianDRUG B06/10/202506/25/2025100335
7ABC123ABC123-107MBlackDrug A07-01-2025 100448
8ABC123ABC123-108FAsianplacebo2025-08-202025-08-30100429
9ABC123ABC123-109unknownWhiteDrug B09SEP202515SEP2025100555
10ABC123ABC123-110maleAsianDRUG A2025/10/01 100542
11ABC123ABC123-111FEMALEBlackPlacebo11NOV202515NOV2025100650
12ABC123ABC123-112mWhiteDrug B2025-12-012025-12-10100639
13ABC123ABC123-113FemaleAsiandrug a01/03/2025 100733
14ABC123ABC123-114MBlackPLACEBO14FEB202518FEB20251007.
15ABC123ABC123-115fWhiteDrug B2025-03-152025-03-20100841
16ABC123ABC123-116MaleAsianDrug A04APR2025 100858
17ABC123ABC123-117FEMALEBlackPlacebo2025-05-202025-05-25100947
18ABC123ABC123-118mAsiandrug b06/06/2025 100936
19ABC123ABC123-119FemaleWhiteDrug A07JUL202515JUL2025101044
20ABC123ABC123-120MaleAsianPLACEBO08AUG2025 101049

Why NODUPKEY?

Removes duplicate keys.

Duplicate subject:

ABC123-101

appears twice.

Important because:

  • ADSL requires unique subject

Creating ADSL

PROC SORT DATA=EX_SUMMARY;

BY USUBJID;

RUN;

PROC PRINT DATA=EX_SUMMARY;

RUN;

OUTPUT:

ObsUSUBJIDTRTSDTTRTEDT
1ABC123-10101JAN202515JAN2025
2ABC123-10215FEB202525FEB2025
3ABC123-10301MAR202515MAR2025
4ABC123-10401APR202520APR2025
5ABC123-10501MAY202510MAY2025
6ABC123-10610JUN202525JUN2025
7ABC123-10701JUL2025.
8ABC123-10820AUG202530AUG2025
9ABC123-10909SEP202515SEP2025
10ABC123-11001OCT2025.
11ABC123-11111NOV202515NOV2025
12ABC123-11201DEC202510DEC2025
13ABC123-11303JAN2025.
14ABC123-11414FEB202518FEB2025
15ABC123-11515MAR202520MAR2025

PROC SORT DATA=DEATH_FLAG;

BY USUBJID;

RUN;

PROC PRINT DATA=DEATH_FLAG;

RUN;

OUTPUT:

ObsUSUBJIDDTHFL
1ABC123-101N
2ABC123-102N
3ABC123-103N
4ABC123-104N
5ABC123-105N
6ABC123-106N
7ABC123-107N
8ABC123-108N
9ABC123-109N
10ABC123-110N
11ABC123-111N
12ABC123-112N
13ABC123-113N
14ABC123-114N
15ABC123-115N

/*Step 7 :Merge Datasets*/

DATA ADSL;

MERGE DM_CLEAN

                EX_SUMMARY

        DEATH_FLAG;

BY USUBJID;

length TRT01A $20 SAFFL $1 ITTFL $1 EOSFL $1 DTHFL $1;

/* SEX Standardization */     

SEX = UPCASE(STRIP(SEX));

IF SEX IN ("MALE","M") THEN SEX = "M";

ELSE IF SEX IN ("FEMALE","F") THEN SEX = "F";

ELSE SEX = "U";

/* RACE */ 

RACE = UPCASE(STRIP(RACE));

/* Actual Treatment */

TRT01A = UPCASE(STRIP(ARM));

/* SAFFL */  

IF NOT MISSING(TRTSDT) THEN SAFFL = "Y";

ELSE SAFFL = "N";

/* ITTFL */ 

IF NOT MISSING(TRT01A) THEN ITTFL = "Y";

ELSE ITTFL = "N";

/* EOSFL */

IF NOT MISSING(RFENDTC) THEN EOSFL = "Y";

ELSE EOSFL = "N";

/* DTHFL*/

IF MISSING(DTHFL) THEN DTHFL = "N";

KEEP STUDYID USUBJID  SITEID SEX RACE TRT01A

     TRTSDT TRTEDT AGE SAFFL ITTFL EOSFL DTHFL;

RUN;

PROC PRINT DATA=ADSL;

RUN;

OUTPUT:

ObsSTUDYIDUSUBJIDSEXRACESITEIDAGETRTSDTTRTEDTDTHFLTRT01ASAFFLITTFLEOSFL
1ABC123ABC123-101MASIAN10014501JAN202515JAN2025NDRUG AYYY
2ABC123ABC123-102FWHITE10015215FEB202525FEB2025NPLACEBOYYN
3ABC123ABC123-103MBLACK10023801MAR202515MAR2025NDRUG BYYY
4ABC123ABC123-104FASIAN1002.01APR202520APR2025NDRUG AYYN
5ABC123ABC123-105MWHITE10036001MAY202510MAY2025NPLACEBOYYY
6ABC123ABC123-106FASIAN10033510JUN202525JUN2025NDRUG BYYY
7ABC123ABC123-107MBLACK10044801JUL2025.NDRUG AYYN
8ABC123ABC123-108FASIAN10042920AUG202530AUG2025NPLACEBOYYY
9ABC123ABC123-109UWHITE10055509SEP202515SEP2025NDRUG BYYY
10ABC123ABC123-110MASIAN10054201OCT2025.NDRUG AYYN
11ABC123ABC123-111FBLACK10065011NOV202515NOV2025NPLACEBOYYY
12ABC123ABC123-112MWHITE10063901DEC202510DEC2025NDRUG BYYY
13ABC123ABC123-113FASIAN10073303JAN2025.NDRUG AYYN
14ABC123ABC123-114MBLACK1007.14FEB202518FEB2025NPLACEBOYYY
15ABC123ABC123-115FWHITE10084115MAR202520MAR2025NDRUG BYYY
16ABC123ABC123-116MASIAN100858..NDRUG ANYN
17ABC123ABC123-117FBLACK100947..NPLACEBONYY
18ABC123ABC123-118MASIAN100936..NDRUG BNYN
19ABC123ABC123-119FWHITE101044..NDRUG ANYY
20ABC123ABC123-120MASIAN101049..NPLACEBONYN

Why MERGE?

Combines:

  • Demographics
  • Exposure summary
  • Death information

using:

BY USUBJID

Why?

Raw data inconsistent:

Raw

Standard

Male

M

male

M

FEMALE

F

unknown

U

Standardization is mandatory.

Actual treatment assigned.

Examples:

  • DRUG A
  • PLACEBO
  • DRUG B

Safety Population Flag.

Meaning:

  • Subject received treatment

Intent-to-Treat Population.

Used in efficacy analyses.

End of Study Flag.

Keeps only required variables.

Benefits:

  • Reduces dataset size
  • Cleaner output
  • Better performance

Creating ADAE Dataset

ADAE = Analysis Dataset for Adverse Events.

/*Step 1: Clean and Prepare AE*/

DATA AE_CLEAN;

SET AE;

LENGTH PARAMCD $10 PARAM $30 ASEV $15 ASER $1    

AREL $20;

/* CONVERT DATES */

ASTDT = INPUT(AESTDTC, ANYDTDTE15.);

AENDT = INPUT(AEENDTC, ANYDTDTE15.);

FORMAT ASTDT AENDT DATE9.;

/* STANDARDIZE SEVERITY */

ASEV = UPCASE(STRIP(AESEV));

/* SERIOUS FLAG */

ASER = UPCASE(STRIP(AESER));

IF ASER IN ("YES","Y") THEN ASER = "Y";

ELSE ASER = "N";

/* RELATIONSHIP */

AREL = UPCASE(STRIP(AEREL));

/* PARAMCD / PARAM MAPPING */

AETERM = UPCASE(STRIP(AETERM));

IF AETERM = "HEADACHE" THEN DO;

  PARAMCD = "HEAD";

     PARAM   = "HEADACHE";

END;

ELSE IF AETERM = "FEVER" THEN DO;

             PARAMCD = "PYRX";

                PARAM   = "PYREXIA";

END;

ELSE IF AETERM = "VOMITING" THEN DO;

             PARAMCD = "VOM";

                 PARAM   = "VOMITING";

END;

ELSE IF AETERM = "COLD" THEN DO;

             PARAMCD = "COLD";

                 PARAM   = "COMMON COLD";

END;

ELSE DO;

PARAMCD = "OTHER";

PARAM   = "OTHER";

END;

RUN;

PROC PRINT DATA=AE_CLEAN;

RUN;

OUTPUT:

ObsUSUBJIDAETERMAESTDTCAEENDTCAESEVAESERAERELPARAMCDPARAMASEVASERARELASTDTAENDT
1ABC123-101HEADACHE01JAN202505JAN2025MildYRelatedHEADHEADACHEMILDYRELATED01JAN202505JAN2025
2ABC123-101FEVER2025-01-202025-01-25SEVEREYESRELATEDPYRXPYREXIASEVEREYRELATED20JAN202525JAN2025
3ABC123-102VOMITING02/10/2025 ModerateNNot RelatedVOMVOMITINGMODERATENNOT RELATED10FEB2025.
4ABC123-102COLD2025-02-282025-03-02mildNORelatedCOLDCOMMON COLDMILDNRELATED28FEB202502MAR2025
5ABC123-103HEADACHE01MAR202505MAR2025SevereYRELATEDHEADHEADACHESEVEREYRELATED01MAR202505MAR2025
6ABC123-103FEVER2025/03/20 MODERATENNot RelatedPYRXPYREXIAMODERATENNOT RELATED20MAR2025.
7ABC123-104VOMITING03-25-202504-01-2025MildNoRelatedVOMVOMITINGMILDNRELATED25MAR202501APR2025
8ABC123-104COLD15APR2025 severeYESRELATEDCOLDCOMMON COLDSEVEREYRELATED15APR2025.
9ABC123-105HEADACHE2025-04-202025-04-22ModerateNNot RelatedHEADHEADACHEMODERATENNOT RELATED20APR202522APR2025
10ABC123-105FEVER2025-05-18 SEVEREYRelatedPYRXPYREXIASEVEREYRELATED18MAY2025.
11ABC123-106VOMITING06/05/202506/08/2025mildNORELATEDVOMVOMITINGMILDNRELATED05JUN202508JUN2025
12ABC123-106COLD06/25/2025 ModerateYESNot RelatedCOLDCOMMON COLDMODERATEYNOT RELATED25JUN2025.
13ABC123-107HEADACHE06-25-202507-02-2025SEVEREYRelatedHEADHEADACHESEVEREYRELATED25JUN202502JUL2025
14ABC123-108FEVER2025-08-15 MildNoRELATEDPYRXPYREXIAMILDNRELATED15AUG2025.
15ABC123-109VOMITING09SEP202512SEP2025severeYESRelatedVOMVOMITINGSEVEREYRELATED09SEP202512SEP2025
16ABC123-110COLD2025/09/25 ModerateNNot RelatedCOLDCOMMON COLDMODERATENNOT RELATED25SEP2025.
17ABC123-111HEADACHE11NOV202515NOV2025MILDNORELATEDHEADHEADACHEMILDNRELATED11NOV202515NOV2025
18ABC123-112FEVER2025-12-05 SevereYRelatedPYRXPYREXIASEVEREYRELATED05DEC2025.
19ABC123-113VOMITING01/02/202501/05/2025ModerateYESNot RelatedVOMVOMITINGMODERATEYNOT RELATED02JAN202505JAN2025
20ABC123-114COLD10FEB2025 mildNRELATEDCOLDCOMMON COLDMILDNRELATED10FEB2025.
21ABC123-115HEADACHE2025-03-102025-03-15SEVEREYESRelatedHEADHEADACHESEVEREYRELATED10MAR202515MAR2025
22ABC123-101HEADACHE01JAN202505JAN2025MildYRelatedHEADHEADACHEMILDYRELATED01JAN202505JAN2025

Explanation

Converts AE dates into SAS dates.

Examples:

Raw

Standard

mild

MILD

severe

SEVERE

Standardizes:

  • YES
  • Y
  • NO
  • N

Makes relationship values consistent.

Why PARAMCD?

ADaM standards require:

  • Short coded parameter
  • Readable parameter name

Used in:

  • Tables
  • Listings
  • Analysis

/*Step 2: Remove Duplicates*/

/*Based on:*/

/*USUBJID + AETERM + ASTDT*/

PROC SORT DATA=AE_CLEAN NODUPKEY;

BY USUBJID AETERM ASTDT;

RUN;

PROC PRINT DATA=AE_CLEAN;

RUN;

OUTPUT:

ObsUSUBJIDAETERMAESTDTCAEENDTCAESEVAESERAERELPARAMCDPARAMASEVASERARELASTDTAENDT
1ABC123-101FEVER2025-01-202025-01-25SEVEREYESRELATEDPYRXPYREXIASEVEREYRELATED20JAN202525JAN2025
2ABC123-101HEADACHE01JAN202505JAN2025MildYRelatedHEADHEADACHEMILDYRELATED01JAN202505JAN2025
3ABC123-102COLD2025-02-282025-03-02mildNORelatedCOLDCOMMON COLDMILDNRELATED28FEB202502MAR2025
4ABC123-102VOMITING02/10/2025 ModerateNNot RelatedVOMVOMITINGMODERATENNOT RELATED10FEB2025.
5ABC123-103FEVER2025/03/20 MODERATENNot RelatedPYRXPYREXIAMODERATENNOT RELATED20MAR2025.
6ABC123-103HEADACHE01MAR202505MAR2025SevereYRELATEDHEADHEADACHESEVEREYRELATED01MAR202505MAR2025
7ABC123-104COLD15APR2025 severeYESRELATEDCOLDCOMMON COLDSEVEREYRELATED15APR2025.
8ABC123-104VOMITING03-25-202504-01-2025MildNoRelatedVOMVOMITINGMILDNRELATED25MAR202501APR2025
9ABC123-105FEVER2025-05-18 SEVEREYRelatedPYRXPYREXIASEVEREYRELATED18MAY2025.
10ABC123-105HEADACHE2025-04-202025-04-22ModerateNNot RelatedHEADHEADACHEMODERATENNOT RELATED20APR202522APR2025
11ABC123-106COLD06/25/2025 ModerateYESNot RelatedCOLDCOMMON COLDMODERATEYNOT RELATED25JUN2025.
12ABC123-106VOMITING06/05/202506/08/2025mildNORELATEDVOMVOMITINGMILDNRELATED05JUN202508JUN2025
13ABC123-107HEADACHE06-25-202507-02-2025SEVEREYRelatedHEADHEADACHESEVEREYRELATED25JUN202502JUL2025
14ABC123-108FEVER2025-08-15 MildNoRELATEDPYRXPYREXIAMILDNRELATED15AUG2025.
15ABC123-109VOMITING09SEP202512SEP2025severeYESRelatedVOMVOMITINGSEVEREYRELATED09SEP202512SEP2025
16ABC123-110COLD2025/09/25 ModerateNNot RelatedCOLDCOMMON COLDMODERATENNOT RELATED25SEP2025.
17ABC123-111HEADACHE11NOV202515NOV2025MILDNORELATEDHEADHEADACHEMILDNRELATED11NOV202515NOV2025
18ABC123-112FEVER2025-12-05 SevereYRelatedPYRXPYREXIASEVEREYRELATED05DEC2025.
19ABC123-113VOMITING01/02/202501/05/2025ModerateYESNot RelatedVOMVOMITINGMODERATEYNOT RELATED02JAN202505JAN2025
20ABC123-114COLD10FEB2025 mildNRELATEDCOLDCOMMON COLDMILDNRELATED10FEB2025.
21ABC123-115HEADACHE2025-03-102025-03-15SEVEREYESRelatedHEADHEADACHESEVEREYRELATED10MAR202515MAR2025

Why These Keys?

Uniqueness based on:

  • Subject
  • Event
  • Start Date

Industry-standard duplicate handling.

/*STEP 3: MERGE TRTSDT FROM ADSL*/

PROC SORT DATA=ADSL;

BY USUBJID;

RUN;

PROC PRINT DATA=ADSL;

RUN;

OUTPUT:

ObsSTUDYIDUSUBJIDSEXRACESITEIDAGETRTSDTTRTEDTDTHFLTRT01ASAFFLITTFLEOSFL
1ABC123ABC123-101MASIAN10014501JAN202515JAN2025NDRUG AYYY
2ABC123ABC123-102FWHITE10015215FEB202525FEB2025NPLACEBOYYN
3ABC123ABC123-103MBLACK10023801MAR202515MAR2025NDRUG BYYY
4ABC123ABC123-104FASIAN1002.01APR202520APR2025NDRUG AYYN
5ABC123ABC123-105MWHITE10036001MAY202510MAY2025NPLACEBOYYY
6ABC123ABC123-106FASIAN10033510JUN202525JUN2025NDRUG BYYY
7ABC123ABC123-107MBLACK10044801JUL2025.NDRUG AYYN
8ABC123ABC123-108FASIAN10042920AUG202530AUG2025NPLACEBOYYY
9ABC123ABC123-109UWHITE10055509SEP202515SEP2025NDRUG BYYY
10ABC123ABC123-110MASIAN10054201OCT2025.NDRUG AYYN
11ABC123ABC123-111FBLACK10065011NOV202515NOV2025NPLACEBOYYY
12ABC123ABC123-112MWHITE10063901DEC202510DEC2025NDRUG BYYY
13ABC123ABC123-113FASIAN10073303JAN2025.NDRUG AYYN
14ABC123ABC123-114MBLACK1007.14FEB202518FEB2025NPLACEBOYYY
15ABC123ABC123-115FWHITE10084115MAR202520MAR2025NDRUG BYYY
16ABC123ABC123-116MASIAN100858..NDRUG ANYN
17ABC123ABC123-117FBLACK100947..NPLACEBONYY
18ABC123ABC123-118MASIAN100936..NDRUG BNYN
19ABC123ABC123-119FWHITE101044..NDRUG ANYY
20ABC123ABC123-120MASIAN101049..NPLACEBONYN

DATA ADAE_PRE;

MERGE AE_CLEAN(IN=A)

      ADSL(KEEP=USUBJID TRTSDT);

BY USUBJID;

IF A;

RUN;

PROC PRINT DATA=ADAE_PRE;;

RUN;

OUTPUT:

ObsUSUBJIDAETERMAESTDTCAEENDTCAESEVAESERAERELPARAMCDPARAMASEVASERARELASTDTAENDTTRTSDT
1ABC123-101FEVER2025-01-202025-01-25SEVEREYESRELATEDPYRXPYREXIASEVEREYRELATED20JAN202525JAN202501JAN2025
2ABC123-101HEADACHE01JAN202505JAN2025MildYRelatedHEADHEADACHEMILDYRELATED01JAN202505JAN202501JAN2025
3ABC123-102COLD2025-02-282025-03-02mildNORelatedCOLDCOMMON COLDMILDNRELATED28FEB202502MAR202515FEB2025
4ABC123-102VOMITING02/10/2025 ModerateNNot RelatedVOMVOMITINGMODERATENNOT RELATED10FEB2025.15FEB2025
5ABC123-103FEVER2025/03/20 MODERATENNot RelatedPYRXPYREXIAMODERATENNOT RELATED20MAR2025.01MAR2025
6ABC123-103HEADACHE01MAR202505MAR2025SevereYRELATEDHEADHEADACHESEVEREYRELATED01MAR202505MAR202501MAR2025
7ABC123-104COLD15APR2025 severeYESRELATEDCOLDCOMMON COLDSEVEREYRELATED15APR2025.01APR2025
8ABC123-104VOMITING03-25-202504-01-2025MildNoRelatedVOMVOMITINGMILDNRELATED25MAR202501APR202501APR2025
9ABC123-105FEVER2025-05-18 SEVEREYRelatedPYRXPYREXIASEVEREYRELATED18MAY2025.01MAY2025
10ABC123-105HEADACHE2025-04-202025-04-22ModerateNNot RelatedHEADHEADACHEMODERATENNOT RELATED20APR202522APR202501MAY2025
11ABC123-106COLD06/25/2025 ModerateYESNot RelatedCOLDCOMMON COLDMODERATEYNOT RELATED25JUN2025.10JUN2025
12ABC123-106VOMITING06/05/202506/08/2025mildNORELATEDVOMVOMITINGMILDNRELATED05JUN202508JUN202510JUN2025
13ABC123-107HEADACHE06-25-202507-02-2025SEVEREYRelatedHEADHEADACHESEVEREYRELATED25JUN202502JUL202501JUL2025
14ABC123-108FEVER2025-08-15 MildNoRELATEDPYRXPYREXIAMILDNRELATED15AUG2025.20AUG2025
15ABC123-109VOMITING09SEP202512SEP2025severeYESRelatedVOMVOMITINGSEVEREYRELATED09SEP202512SEP202509SEP2025
16ABC123-110COLD2025/09/25 ModerateNNot RelatedCOLDCOMMON COLDMODERATENNOT RELATED25SEP2025.01OCT2025
17ABC123-111HEADACHE11NOV202515NOV2025MILDNORELATEDHEADHEADACHEMILDNRELATED11NOV202515NOV202511NOV2025
18ABC123-112FEVER2025-12-05 SevereYRelatedPYRXPYREXIASEVEREYRELATED05DEC2025.01DEC2025
19ABC123-113VOMITING01/02/202501/05/2025ModerateYESNot RelatedVOMVOMITINGMODERATEYNOT RELATED02JAN202505JAN202503JAN2025
20ABC123-114COLD10FEB2025 mildNRELATEDCOLDCOMMON COLDMILDNRELATED10FEB2025.14FEB2025
21ABC123-115HEADACHE2025-03-102025-03-15SEVEREYESRelatedHEADHEADACHESEVEREYRELATED10MAR202515MAR202515MAR2025

Why Merge TRTSDT?

Needed to derive:

TRTEMFL

Treatment Emergent Flag.

IN= Dataset Option

(in=a)

Creates temporary variable.

Used to keep only AE records.

/*STEP 4: CREATE ADAE*/

DATA ADAE;

SET ADAE_PRE;

LENGTH STUDYID $10 TRTEMFL $1 SAEFL $1 SEVFL $1 RELFL $1;

/* STUDYID */

STUDYID = "ABC123";

/* AESEQ */

BY USUBJID;

RETAIN AESEQ;

IF FIRST.USUBJID THEN AESEQ = 1;

ELSE AESEQ + 1;

/* TRTEMFL */

IF ASTDT >= TRTSDT THEN TRTEMFL = "Y";

ELSE TRTEMFL = "N";

/* SAEFL */

IF ASER = "Y" THEN SAEFL = "Y";

ELSE SAEFL = "N";

/* SEVFL */

IF ASEV = "SEVERE" THEN SEVFL = "Y";

ELSE SEVFL = "N";

/* RELFL */

IF AREL = "RELATED" THEN RELFL = "Y";

ELSE RELFL = "N";

KEEP STUDYID USUBJID AESEQ PARAMCD PARAM ASTDT AENDT 

TRTEMFL ASEV ASER AREL SAEFL SEVFL RELFL;

RUN;

PROC PRINT DATA=ADAE;

RUN;

OUTPUT:

ObsUSUBJIDPARAMCDPARAMASEVASERARELASTDTAENDTSTUDYIDTRTEMFLSAEFLSEVFLRELFLAESEQ
1ABC123-101PYRXPYREXIASEVEREYRELATED20JAN202525JAN2025ABC123YYYY1
2ABC123-101HEADHEADACHEMILDYRELATED01JAN202505JAN2025ABC123YYNY2
3ABC123-102COLDCOMMON COLDMILDNRELATED28FEB202502MAR2025ABC123YNNY1
4ABC123-102VOMVOMITINGMODERATENNOT RELATED10FEB2025.ABC123NNNN2
5ABC123-103PYRXPYREXIAMODERATENNOT RELATED20MAR2025.ABC123YNNN1
6ABC123-103HEADHEADACHESEVEREYRELATED01MAR202505MAR2025ABC123YYYY2
7ABC123-104COLDCOMMON COLDSEVEREYRELATED15APR2025.ABC123YYYY1
8ABC123-104VOMVOMITINGMILDNRELATED25MAR202501APR2025ABC123NNNY2
9ABC123-105PYRXPYREXIASEVEREYRELATED18MAY2025.ABC123YYYY1
10ABC123-105HEADHEADACHEMODERATENNOT RELATED20APR202522APR2025ABC123NNNN2
11ABC123-106COLDCOMMON COLDMODERATEYNOT RELATED25JUN2025.ABC123YYNN1
12ABC123-106VOMVOMITINGMILDNRELATED05JUN202508JUN2025ABC123NNNY2
13ABC123-107HEADHEADACHESEVEREYRELATED25JUN202502JUL2025ABC123NYYY1
14ABC123-108PYRXPYREXIAMILDNRELATED15AUG2025.ABC123NNNY1
15ABC123-109VOMVOMITINGSEVEREYRELATED09SEP202512SEP2025ABC123YYYY1
16ABC123-110COLDCOMMON COLDMODERATENNOT RELATED25SEP2025.ABC123NNNN1
17ABC123-111HEADHEADACHEMILDNRELATED11NOV202515NOV2025ABC123YNNY1
18ABC123-112PYRXPYREXIASEVEREYRELATED05DEC2025.ABC123YYYY1
19ABC123-113VOMVOMITINGMODERATEYNOT RELATED02JAN202505JAN2025ABC123NYNN1
20ABC123-114COLDCOMMON COLDMILDNRELATED10FEB2025.ABC123NNNY1
21ABC123-115HEADHEADACHESEVEREYRELATED10MAR202515MAR2025ABC123NYYY1
Expalnation

Prevents variable reset.

Without RETAIN:

  • AESEQ resets every row

Automatic BY-group variable.

Used after sorting.

Indicates:

  • First record for subject

Treatment Emergent AE.

Very important safety variable.

Serious AE flag.

Severe AE flag.

Related-to-treatment flag.

Final ADAE Variables

Variable

Purpose

AESEQ

Sequence number

PARAMCD

Short code

PARAM

Parameter name

TRTEMFL

Treatment emergent

SAEFL

Serious event

SEVFL

Severe event

RELFL

Related event

/*STEP 5: CREATE SERIOUS AE DATASET*/

DATA ADAE_SERIOUS;

SET ADAE;

WHERE SAEFL = "Y";

RUN;

PROC PRINT DATA=ADAE_SERIOUS;

RUN;

OUTPUT:

ObsUSUBJIDPARAMCDPARAMASEVASERARELASTDTAENDTSTUDYIDTRTEMFLSAEFLSEVFLRELFLAESEQ
1ABC123-101PYRXPYREXIASEVEREYRELATED20JAN202525JAN2025ABC123YYYY1
2ABC123-101HEADHEADACHEMILDYRELATED01JAN202505JAN2025ABC123YYNY2
3ABC123-103HEADHEADACHESEVEREYRELATED01MAR202505MAR2025ABC123YYYY2
4ABC123-104COLDCOMMON COLDSEVEREYRELATED15APR2025.ABC123YYYY1
5ABC123-105PYRXPYREXIASEVEREYRELATED18MAY2025.ABC123YYYY1
6ABC123-106COLDCOMMON COLDMODERATEYNOT RELATED25JUN2025.ABC123YYNN1
7ABC123-107HEADHEADACHESEVEREYRELATED25JUN202502JUL2025ABC123NYYY1
8ABC123-109VOMVOMITINGSEVEREYRELATED09SEP202512SEP2025ABC123YYYY1
9ABC123-112PYRXPYREXIASEVEREYRELATED05DEC2025.ABC123YYYY1
10ABC123-113VOMVOMITINGMODERATEYNOT RELATED02JAN202505JAN2025ABC123NYNN1
11ABC123-115HEADHEADACHESEVEREYRELATED10MAR202515MAR2025ABC123NYYY1

Why?

Creates subset for:

  • Serious AE reporting
  • Safety review
  • Regulatory submission

Overall Clinical Trial Flow

DM  ---> Subject Information

EX  ---> Treatment Exposure

AE  ---> Adverse Events

 

        

 

ADSL ---> Subject-Level Analysis Dataset

 

       

 

ADAE ---> Adverse Event Analysis Dataset

Real Industry Importance

This project demonstrates:

  • Raw data cleaning
  • Date conversion
  • Standardization
  • Deduplication
  • Merge logic
  • BY-group processing
  • ADaM derivations
  • Clinical programming workflow

These are core interview topics for:

  • SAS Programmer
  • Clinical SAS Programmer
  • Statistical Programmer
  • ADaM Programmer

Key Interview Concepts Covered

Topic

Included

PROC SQL

Yes

MERGE

Yes

BY-group processing

Yes

RETAIN

Yes

FIRST./LAST.

Yes

INPUT function

Yes

ANYDTDTE informat

Yes

Deduplication

Yes

Standardization

Yes

ADaM derivation

Yes

Treatment Emergent logic

Yes

Population flags

Yes

20 Important Points About ADSL and ADAE Creation

1.     DM dataset is considered the parent dataset in clinical trials.

2.     EX dataset is used to derive treatment start and end dates.

3.     AE dataset stores adverse event information for safety analysis.

4.     ADSL contains one unique record per subject.

5.     Duplicate subjects must be removed before creating ADSL.

6.     ANYDTDTE informat helps convert multiple date formats into SAS dates.

7.     TRTSDT represents the first treatment exposure date.

8.     TRTEDT represents the last treatment exposure date.

9.     Population flags such as SAFFL and ITTFL are critical in analysis.

10.  SEX values must be standardized for consistency.

11.  UPCASE and STRIP functions are commonly used in data cleaning.

12.  PROC SQL helps summarize exposure information efficiently.

13.  MERGE statement combines datasets using common keys.

14.  ADAE is an analysis-ready adverse events dataset.

15.  Treatment Emergent Flag identifies events occurring after treatment start.

16.  Serious adverse events are identified using SAEFL.

17.  Severity levels are standardized into consistent categories.

18.  PARAMCD and PARAM are important ADaM variables for reporting.

19.  BY-group processing is essential for sequence derivation.

20.  ADSL and ADAE datasets are widely used for regulatory submissions and CSR reporting.

Summary

This clinical SAS programming project focused on transforming raw clinical trial datasets into analysis-ready ADaM datasets using SAS programming techniques. The project used three major domains: DM, EX, and AE. The DM dataset provided subject-level demographic information, the EX dataset contained treatment exposure details, and the AE dataset stored adverse event information related to subject safety.

The first part of the project involved creating the ADSL dataset. Important derivations such as TRTSDT, TRTEDT, SAFFL, ITTFL, EOSFL, and DTHFL were developed using exposure and adverse event data. Data cleaning techniques such as standardization, duplicate removal, and date conversion were performed to ensure data quality and consistency.

The second part focused on creating the ADAE dataset. Adverse event dates were converted into SAS date formats, severity and relationship variables were standardized, and treatment emergent logic was derived using treatment start dates from ADSL. Additional flags such as SAEFL, SEVFL, and RELFL were created for safety reporting purposes.

Overall, the project demonstrated practical clinical trial programming concepts widely used in pharmaceutical industries. It also showcased important SAS skills such as PROC SQL, MERGE processing, BY-group logic, RETAIN statements, and ADaM dataset derivations used in real-world regulatory submissions.

Conclusion

The creation of ADSL and ADAE datasets is one of the most important activities in clinical SAS programming because these datasets support statistical analysis, safety evaluation, and regulatory reporting. In this project, raw clinical trial datasets containing inconsistent formats, missing values, and duplicate records were successfully transformed into standardized ADaM datasets using SAS programming techniques.

The project highlighted the importance of data cleaning, date standardization, deduplication, and derivation logic in clinical trials. By deriving subject-level variables such as treatment start date, treatment end date, and population flags in ADSL, the dataset became suitable for subject-level analysis and reporting. Similarly, the ADAE dataset enabled detailed adverse event analysis through derivation of treatment emergent flags, serious adverse event flags, severity flags, and relationship indicators.

This project also demonstrated practical implementation of industry-standard SAS concepts including PROC SQL, BY-group processing, RETAIN statements, INPUT functions, and dataset merging. These techniques are commonly used by SAS programmers working in pharmaceutical companies and CRO environments.

In conclusion, the project provides strong hands-on understanding of the clinical data flow from raw datasets to analysis-ready ADaM datasets. It serves as an excellent example for interview preparation, real-time project learning, and understanding the workflow followed during clinical trial analysis and submission processes.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent REAL ADAM DATA.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics and Exams Reviewers and Observers


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.Can Advanced SAS Programming Detect, Analyze, and Fix Errors in High-Frequency Trading Data While Identifying Fraud Patterns?

2.How Do SAS and R Complement Each Other in Detecting, Cleaning, and Transforming Complex Sensor Fusion Vehicle Data?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact | Privacy Policy

Comments

Popular posts from this blog

Beyond Fabric and Fashion: Turning the World’s Most Beautiful Sarees Dataset into Structured Intelligence with SAS and R

Data Cleaning Secrets Using Famous Food Dataset:Handling Duplicate Records in SAS

Global AI Trends Unlocked Through SCAN and SUBSTR Precision in SAS