431.Can Advanced SAS Programming Detect, Clean, and Optimize Vaccine Cold Chain Logistics Data While Identifying Fraud Patterns?

Vaccine Cold Chain Optimization Using Advanced SAS Techniques for Data Quality Enhancement and Fraud Detection

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS & R TECHNIQUES —
DATA STEP | PROC SQL | PROC PRINT | MACROS | PROC TRANSPOSE | PROC DATASETS DELETE | DATA FUNCTIONS | R DATA FRAME | dplyr | rbind/bind_rows | merge | reshape2 (melt) | Date Functions | String Functions | Numeric Functions

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Introduction

In modern healthcare supply chains, vaccine cold chain logistics is one of the most critical operational domains. Vaccines are highly temperature-sensitive biological products, and even minor deviations can lead to loss of potency, resulting in financial loss and public health risks. Organizations rely on robust data systems to monitor transportation, storage, compliance, and spoilage.

However, real-world datasets are rarely clean. They often contain:

  • Missing values
  • Incorrect formats
  • Logical inconsistencies
  • Fraudulent manipulations

This project simulates a real-world vaccine cold chain dataset with intentional errors, and demonstrates how Advanced SAS programming and R can:

  • Detect errors
  • Correct inconsistencies
  • Perform transformations
  • Apply business rules
  • Identify fraud patterns

The explanation is written in a simple, human-readable way, while still maintaining technical depth for interview preparation.

Table of Contents

  1. Business Context
  2. Raw Dataset Creation (With Errors)
  3. Error Identification
  4. Data Cleaning Using SAS
  5. DATA Step Transformations
  6. PROC SQL Operations
  7. Date Functions (MDY, INTCK, INTNX)
  8. SET, MERGE, APPEND
  9. PROC TRANSPOSE
  10. Character Functions
  11. Numeric Functions
  12. Fraud Detection Macro
  13. PROC DATASETS DELETE
  14. Final Clean Dataset
  15. Equivalent R Code 
  16. Summary
  17. Conclusion

Business Context

A pharmaceutical company distributes vaccines across multiple regions. Each shipment is monitored using the following variables:

  • Distribution_Node (location)
  • Storage_Temperature
  • Shipment_Delay
  • Spoilage_Rate
  • Units_Transported
  • Compliance_Index
  • Price
  • Utilization
  • Classification (Good / Risky / Fraud)
  • Shipment_Date

Key Business Goals:

  • Maintain temperature between 2°C to 8°C
  • Reduce spoilage below 5%
  • Detect fraud (fake reporting, manipulation)
  • Ensure regulatory compliance      

1. Create Raw Dataset (With Intentional Errors)

SAS Code

data vaccine_raw;

input Distribution_Node $ Storage_Temperature Shipment_Delay Spoilage_Rate 

      Units_Transported Compliance_Index Price Utilization Shipment_Date : date9.;

format Shipment_Date date9.;

datalines;

Hyd -5 3 2.5 1000 95 12.5 80 01JAN2024

Vizag 25 -1 110 2000 105 abc 70 05FEB2024

Chennai 4 2 . 1500 85 15.5 . 10MAR2024

Delhi 7 5 3.5 -100 90 20.0 60 15APR2024

Mumbai 3 0 2.0 1200 . 18.5 75 20MAY2024

;

run;

proc print data=vaccine_raw;

run;

OUTPUT:

ObsDistribution_NodeStorage_TemperatureShipment_DelaySpoilage_RateUnits_TransportedCompliance_IndexPriceUtilizationShipment_Date
1Hyd-532.510009512.58001JAN2024
2Vizag25-1110.02000105.7005FEB2024
3Chennai42.15008515.5.10MAR2024
4Delhi753.5-1009020.06015APR2024
5Mumbai302.01200.18.57520MAY2024

Explanation

This dataset intentionally contains multiple issues:

·  Negative temperature (-5)

·  Negative delay (-1)

·  Spoilage > 100 (invalid)

·  Price = "abc" (character in numeric field)

·  Missing values

·  Negative units

Why This Code is Used

·  To simulate real-world messy data

·  To test cleaning and validation logic

R code

vaccine_raw <- data.frame(

  Distribution_Node =c("Hyd","Vizag","Chennai","Delhi","Mumbai"),

  Storage_Temperature = c(-5,25,4,7,3),

  Shipment_Delay = c(3,-1,2,5,0),

  Spoilage_Rate = c(2.5,110,NA,3.5,2.0),

  Units_Transported = c(1000,2000,1500,-100,1200),

  Compliance_Index = c(95,105,85,90,NA),

  Price = c(12.5,NA,15.5,20.0,18.5),

  Utilization = c(80,70,NA,60,75),

  Shipment_Date = c("01JAN2024","05FEB2024","10MAR2024","15APR2024","20MAY2024")

)

print(vaccine_raw)

2. Error Detection

SAS Code

proc print data=vaccine_raw;

where Storage_Temperature < 0 or Shipment_Delay < 0 or 

      Spoilage_Rate > 100 or Units_Transported < 0;

run;

OUTPUT:

ObsDistribution_NodeStorage_TemperatureShipment_DelaySpoilage_RateUnits_TransportedCompliance_IndexPriceUtilizationShipment_Date
1Hyd-532.510009512.58001JAN2024
2Vizag25-1110.02000105.7005FEB2024
4Delhi753.5-1009020.06015APR2024

Explanation

This identifies invalid records.

Why Used

·  Early validation step

·  Helps isolate problematic rows

R Code

subset(vaccine_raw, Storage_Temperature < 0 | Shipment_Delay < 0 |

         Spoilage_Rate > 100 | Units_Transported < 0)

3. Data Cleaning

SAS Code

data vaccine_clean;

set vaccine_raw;

if Storage_Temperature < 2 then Storage_Temperature = 2;

if Shipment_Delay < 0 then Shipment_Delay = 0;

if Spoilage_Rate > 100 then Spoilage_Rate = 5;

if Units_Transported < 0 then Units_Transported = abs(Units_Transported);

if Price = . then Price = 10;

if Compliance_Index > 100 then Compliance_Index = 100;

run;

proc print data=vaccine_clean;

run;

OUTPUT:

ObsDistribution_NodeStorage_TemperatureShipment_DelaySpoilage_RateUnits_TransportedCompliance_IndexPriceUtilizationShipment_Date
1Hyd232.510009512.58001JAN2024
2Vizag2505.0200010010.07005FEB2024
3Chennai42.15008515.5.10MAR2024
4Delhi753.51009020.06015APR2024
5Mumbai302.01200.18.57520MAY2024

Explanation

Fixes logical inconsistencies.

Why Used

·  Ensures valid ranges

·  Converts negative to positive

R Code

vaccine_clean <- vaccine_raw

vaccine_clean$Storage_Temperature[vaccine_clean$Storage_Temperature < 2] <- 2

vaccine_clean$Shipment_Delay[vaccine_clean$Shipment_Delay < 0] <- 0

vaccine_clean$Units_Transported <- abs(vaccine_clean$Units_Transported)

4. Character Functions

SAS Code

data vaccine_char;

set vaccine_clean;

Node_Upper = upcase(Distribution_Node);

Node_Proper = propcase(Distribution_Node);

Node_Clean = strip(Distribution_Node);

run;

proc print data=vaccine_char;

run;

OUTPUT:

ObsDistribution_NodeStorage_TemperatureShipment_DelaySpoilage_RateUnits_TransportedCompliance_IndexPriceUtilizationShipment_DateNode_UpperNode_ProperNode_Clean
1Hyd232.510009512.58001JAN2024HYDHydHyd
2Vizag2505.0200010010.07005FEB2024VIZAGVizagVizag
3Chennai42.15008515.5.10MAR2024CHENNAIChennaiChennai
4Delhi753.51009020.06015APR2024DELHIDelhiDelhi
5Mumbai302.01200.18.57520MAY2024MUMBAIMumbaiMumbai

Explanation

·  UPCASE → capital letters

·  PROPCASE → proper format

·  STRIP → remove spaces

R Code

vaccine_clean <- vaccine_raw

vaccine_clean$Node_Upper <- toupper(vaccine_clean$Distribution_Node) 

vaccine_clean$Node_Lower <- tolower(vaccine_clean$Distribution_Node)

print(vaccine_clean)

5. Numeric Functions

SAS Code

data vaccine_num;

set vaccine_char;

Total_Value = Units_Transported * Price;

Avg_Util = mean(Utilization, 50);

run;

proc print data=vaccine_num;

run;

OUTPUT:

ObsDistribution_NodeStorage_TemperatureShipment_DelaySpoilage_RateUnits_TransportedCompliance_IndexPriceUtilizationShipment_DateNode_UpperNode_ProperNode_CleanTotal_ValueAvg_Util
1Hyd232.510009512.58001JAN2024HYDHydHyd1250065.0
2Vizag2505.0200010010.07005FEB2024VIZAGVizagVizag2000060.0
3Chennai42.15008515.5.10MAR2024CHENNAIChennaiChennai2325050.0
4Delhi753.51009020.06015APR2024DELHIDelhiDelhi200055.0
5Mumbai302.01200.18.57520MAY2024MUMBAIMumbaiMumbai2220062.5

SAS

R

mean(a,b) → row-wise

mean() → column-wise

Handles missing automatically

Needs na.rm=TRUE

Built-in flexibility

Needs rowMeans()

R Code

vaccine_clean$Total_Value <- vaccine_clean$Units_Transported * vaccine_clean$Price

vaccine_clean$Avg_Util <- rowMeans(cbind(vaccine_clean$Utilization, 50),na.rm = TRUE)

print(vaccine_clean)

6. Date Functions

SAS Code

data vaccine_date;

set vaccine_num;

Month = intck('month','01JAN2024'd,Shipment_Date);

Next_Month = intnx('month',Shipment_Date,1);

run;

proc print data=vaccine_date;

run;

OUTPUT:

ObsDistribution_NodeStorage_TemperatureShipment_DelaySpoilage_RateUnits_TransportedCompliance_IndexPriceUtilizationShipment_DateNode_UpperNode_ProperNode_CleanTotal_ValueAvg_UtilMonthNext_Month
1Hyd232.510009512.58001JAN2024HYDHydHyd1250065.0023407
2Vizag2505.0200010010.07005FEB2024VIZAGVizagVizag2000060.0123436
3Chennai42.15008515.5.10MAR2024CHENNAIChennaiChennai2325050.0223467
4Delhi753.51009020.06015APR2024DELHIDelhiDelhi200055.0323497
5Mumbai302.01200.18.57520MAY2024MUMBAIMumbaiMumbai2220062.5423528

Shipment_Date

SAS INTCK

R %m

01JAN2024

0

1

05FEB2024

1

2

10MAR2024

2

3

15APR2024

3

4

20MAY2024

4

5

Concept Difference

SAS (INTCK)

Counts intervals FROM reference date

Example:

01JAN → 01JAN = 0
01JAN → FEB   = 1

R (%m)

Returns actual calendar month

Example:

JAN = 1
FEB = 2

Key Insight

SAS INTCK

R format("%m")

Relative count

Absolute month

Starts from 0

Starts from 1

Interval logic

Calendar logic

R Code

Step 1:Convert to Date

vaccine_raw$Shipment_Date <- as.Date(vaccine_raw$Shipment_Date,format = "%d%b%Y")


Step 2:Extract Month

vaccine_raw$Month <- as.integer(format(vaccine_raw$Shipment_Date, "%m"))

print(vaccine_raw)


Step 3:How to Match SAS Output in R

vaccine_raw$Month_SAS <- as.integer(format(vaccine_raw$Shipment_Date, "%m")) - 1

print(vaccine_raw)

7. SET Statement

Step 1: Combine Two Different Datasets

data combined;

set vaccine_date 

    vaccine_clean;

run;

proc print data=combined;

run;

OUTPUT:

ObsDistribution_NodeStorage_TemperatureShipment_DelaySpoilage_RateUnits_TransportedCompliance_IndexPriceUtilizationShipment_DateNode_UpperNode_ProperNode_CleanTotal_ValueAvg_UtilMonthNext_Month
1Hyd232.510009512.58001JAN2024HYDHydHyd1250065.0023407
2Vizag2505.0200010010.07005FEB2024VIZAGVizagVizag2000060.0123436
3Chennai42.15008515.5.10MAR2024CHENNAIChennaiChennai2325050.0223467
4Delhi753.51009020.06015APR2024DELHIDelhiDelhi200055.0323497
5Mumbai302.01200.18.57520MAY2024MUMBAIMumbaiMumbai2220062.5423528
6Hyd232.510009512.58001JAN2024   ....
7Vizag2505.0200010010.07005FEB2024   ....
8Chennai42.15008515.5.10MAR2024   ....
9Delhi753.51009020.06015APR2024   ....
10Mumbai302.01200.18.57520MAY2024   ....

Step 2: Remove Duplicates After Combining

proc sort data=combined nodupkey;

by Distribution_Node Shipment_Date;

run;

proc print data=combined;

run;

OUTPUT:

ObsDistribution_NodeStorage_TemperatureShipment_DelaySpoilage_RateUnits_TransportedCompliance_IndexPriceUtilizationShipment_DateNode_UpperNode_ProperNode_CleanTotal_ValueAvg_UtilMonthNext_Month
1Chennai42.15008515.5.10MAR2024CHENNAIChennaiChennai2325050.0223467
2Delhi753.51009020.06015APR2024DELHIDelhiDelhi200055.0323497
3Hyd232.510009512.58001JAN2024HYDHydHyd1250065.0023407
4Mumbai302.01200.18.57520MAY2024MUMBAIMumbaiMumbai2220062.5423528
5Vizag2505.0200010010.07005FEB2024VIZAGVizagVizag2000060.0123436

Step 3: Add Source Indicator

data combined;

set vaccine_date(in=a) vaccine_date(in=b);

length Source $12.;

if a then Source="Original";

if b then Source="Duplicate";

run;

proc print data=combined;

run;

OUTPUT:

ObsDistribution_NodeStorage_TemperatureShipment_DelaySpoilage_RateUnits_TransportedCompliance_IndexPriceUtilizationShipment_DateNode_UpperNode_ProperNode_CleanTotal_ValueAvg_UtilMonthNext_MonthSource
1Hyd232.510009512.58001JAN2024HYDHydHyd1250065.0023407Original
2Vizag2505.0200010010.07005FEB2024VIZAGVizagVizag2000060.0123436Original
3Chennai42.15008515.5.10MAR2024CHENNAIChennaiChennai2325050.0223467Original
4Delhi753.51009020.06015APR2024DELHIDelhiDelhi200055.0323497Original
5Mumbai302.01200.18.57520MAY2024MUMBAIMumbaiMumbai2220062.5423528Original
6Hyd232.510009512.58001JAN2024HYDHydHyd1250065.0023407Duplicate
7Vizag2505.0200010010.07005FEB2024VIZAGVizagVizag2000060.0123436Duplicate
8Chennai42.15008515.5.10MAR2024CHENNAIChennaiChennai2325050.0223467Duplicate
9Delhi753.51009020.06015APR2024DELHIDelhiDelhi200055.0323497Duplicate
10Mumbai302.01200.18.57520MAY2024MUMBAIMumbaiMumbai2220062.5423528Duplicate

Step 4: If You DON'T Want Duplication 

data combined;

set vaccine_date;

run;

proc print data=combined;

run;

OUTPUT:

ObsDistribution_NodeStorage_TemperatureShipment_DelaySpoilage_RateUnits_TransportedCompliance_IndexPriceUtilizationShipment_DateNode_UpperNode_ProperNode_CleanTotal_ValueAvg_UtilMonthNext_Month
1Hyd232.510009512.58001JAN2024HYDHydHyd1250065.0023407
2Vizag2505.0200010010.07005FEB2024VIZAGVizagVizag2000060.0123436
3Chennai42.15008515.5.10MAR2024CHENNAIChennaiChennai2325050.0223467
4Delhi753.51009020.06015APR2024DELHIDelhiDelhi200055.0323497
5Mumbai302.01200.18.57520MAY2024MUMBAIMumbaiMumbai2220062.5423528

Explanation

·  The SET statement reads observations from datasets sequentially.

·  Here, the same dataset is appended twice → dataset doubles in size.

Why It Is Used

·  Combine datasets with same structure

·  Useful in:

·  Monthly data stacking

·  Multi-source integration

·  Batch processing pipelines

R Code

Step 1:Convert to Date

vaccine_raw$Shipment_Date <- as.Date(vaccine_raw$Shipment_Date,format = "%d%b%Y")


Step 2:Extract Month

vaccine_raw$Month <- as.integer(format(vaccine_raw$Shipment_Date, "%m"))

print(vaccine_raw)


Step 3:How to Match SAS Output in R

vaccine_raw$Month_SAS <- as.integer(format(vaccine_raw$Shipment_Date, "%m")) - 1

print(vaccine_raw)


vaccine_clean2 <- vaccine_clean

combined <- rbind(vaccine_clean, vaccine_clean2)


vaccine_clean$Source <- "Original"

vaccine_clean2$Source <- "New"


R Explanation

·  rbind() stacks datasets row-wise.

·  Works only if column structures match.

Why Used in R

·  Equivalent of SAS SET for vertical concatenation

·  Common in time-series or batch append operations

8. MERGE

SAS Code

proc sort data=vaccine_date; by Distribution_Node; run;

proc print data=vaccine_date;

run;

OUTPUT:

ObsDistribution_NodeStorage_TemperatureShipment_DelaySpoilage_RateUnits_TransportedCompliance_IndexPriceUtilizationShipment_DateNode_UpperNode_ProperNode_CleanTotal_ValueAvg_UtilMonthNext_Month
1Chennai42.15008515.5.10MAR2024CHENNAIChennaiChennai2325050.0223467
2Delhi753.51009020.06015APR2024DELHIDelhiDelhi200055.0323497
3Hyd232.510009512.58001JAN2024HYDHydHyd1250065.0023407
4Mumbai302.01200.18.57520MAY2024MUMBAIMumbaiMumbai2220062.5423528
5Vizag2505.0200010010.07005FEB2024VIZAGVizagVizag2000060.0123436

data merged;

merge vaccine_date(in=a) vaccine_date(in=b);

by Distribution_Node;

if a;

run;

proc print data=merged;

run;

OUTPUT:

ObsDistribution_NodeStorage_TemperatureShipment_DelaySpoilage_RateUnits_TransportedCompliance_IndexPriceUtilizationShipment_DateNode_UpperNode_ProperNode_CleanTotal_ValueAvg_UtilMonthNext_Month
1Chennai42.15008515.5.10MAR2024CHENNAIChennaiChennai2325050.0223467
2Delhi753.51009020.06015APR2024DELHIDelhiDelhi200055.0323497
3Hyd232.510009512.58001JAN2024HYDHydHyd1250065.0023407
4Mumbai302.01200.18.57520MAY2024MUMBAIMumbaiMumbai2220062.5423528
5Vizag2505.0200010010.07005FEB2024VIZAGVizagVizag2000060.0123436

Explanation

·  Combines datasets based on key (Distribution_Node)

·  in= flags help control which dataset contributes rows

Why It Is Used

·  Combine datasets with different variables

·  Core for:

·  SDTM → ADaM derivation

·  Lookup joins

·  Enrichment of datasets

R Explanation

·  merge() performs join

·  all.x = TRUE → left join

Why Used in R

·  Combine multiple datasets

·  Used in:

·  Clinical joins

·  Lookup merges

·  Feature engineering

9. APPEND

SAS Code

proc append base=vaccine_date 

            data=vaccine_clean force;

run;

proc print data=vaccine_date;

run;

OUTPUT:

ObsDistribution_NodeStorage_TemperatureShipment_DelaySpoilage_RateUnits_TransportedCompliance_IndexPriceUtilizationShipment_DateNode_UpperNode_ProperNode_CleanTotal_ValueAvg_UtilMonthNext_Month
1Chennai42.15008515.5.10MAR2024CHENNAIChennaiChennai2325050.0223467
2Delhi753.51009020.06015APR2024DELHIDelhiDelhi200055.0323497
3Hyd232.510009512.58001JAN2024HYDHydHyd1250065.0023407
4Mumbai302.01200.18.57520MAY2024MUMBAIMumbaiMumbai2220062.5423528
5Vizag2505.0200010010.07005FEB2024VIZAGVizagVizag2000060.0123436
6Hyd232.510009512.58001JAN2024   ....
7Vizag2505.0200010010.07005FEB2024   ....
8Chennai42.15008515.5.10MAR2024   ....
9Delhi753.51009020.06015APR2024   ....
10Mumbai302.01200.18.57520MAY2024   ....

Explanation

·  Adds rows of one dataset to another

·  force → allows mismatched variables

Why It Is Used

·  Faster than SET for large datasets

·  Used in:

·  Incremental data loads

·  Real-time ingestion

·  ETL pipelines

R Explanation

·  Same logic as append

·  Combines rows into base dataset

Why Used in R

·  Efficient for incremental updates

·  Common in data pipelines 

10. PROC TRANSPOSE

SAS Code

proc transpose data=vaccine_date out=transposed;

var Units_Transported Price;

run;

proc print data=transposed;

run;

OUTPUT:

Obs_NAME_COL1COL2COL3COL4COL5COL6COL7COL8COL9COL10
1Units_Transported1500.01001000.01200.020001000.020001500.01001200.0
2Price15.52012.518.51012.51015.52018.5

Explanation

·  Converts columns → rows

·  Wide → Long format transformation

Why It Is Used

Required for:

·  Reporting formats

·  TLF generation

·  Reshaping data

11. PROC SQL

SAS Code

proc sql;

create table summary as

select Distribution_Node,

       sum(Units_Transported) as Total_Units,

       avg(Price) as Avg_Price

from vaccine_date

group by Distribution_Node;

quit;

proc print data=summary;

run;

OUTPUT:

ObsDistribution_NodeTotal_UnitsAvg_Price
1Chennai300015.5
2Delhi20020.0
3Hyd200012.5
4Mumbai240018.5
5Vizag400010.0

Explanation

·  Performs aggregation

·  Groups by Distribution Node

Why It Is Used

·  SQL is powerful for:

·  Summaries

·  Joins

·  Data transformations

·  Widely used in clinical reporting


R Code

library(dplyr)


summary <- vaccine_clean %>%

  group_by(Distribution_Node) %>%

  summarise(

    Total_Units = sum(Units_Transported, na.rm = TRUE),

    Avg_Price = mean(Price, na.rm = TRUE)

  )

R Explanation

·  group_by() + summarise() performs aggregation

Why Used in R

·  Equivalent to SQL aggregation

·  Clean and readable syntax

12. PROC DATASETS DELETE

SAS Code

proc datasets library=work nolist;

delete vaccine_raw combined merged;

quit;

LOG:

NOTE: Deleting WORK.VACCINE_RAW (memtype=DATA).
NOTE: Deleting WORK.COMBINED (memtype=DATA).
NOTE: Deleting WORK.MERGED (memtype=DATA).

Explanation

·  Deletes datasets from WORK library

Why It Is Used

·  Memory optimization

·  Workspace cleanup

·  Prevent confusion with old datasets

R Code

rm(vaccine_raw, combined)

R Explanation

·  rm() removes objects

·  gc() clears memory

Why Used in R

·  Free memory

·  Improve performance

·  Avoid object conflicts

Final Corrected Dataset Code

SAS Code

data final_dataset;

set vaccine_date;

/* Classification Logic */

if Spoilage_Rate < 5 then Classification="GOOD";

else if 5 <= Spoilage_Rate < 10 then Classification="RISK";

else Classification="CRITICAL";

length Util_Class $8.;

/* Utilization Classification */

if Utilization >= 80 then Util_Class="HIGH";

else if Utilization >= 60 then Util_Class="MEDIUM";

else Util_Class="LOW";

/* Compliance Flag */

if Compliance_Index >= 90 then Compliance_Flag="STRONG";

else Compliance_Flag="WEAK";

/* Fraud Flag */

if Spoilage_Rate > 10 and Compliance_Index < 80 then Fraud="YES";

else Fraud="NO";

run;

proc print data=final_dataset;

run;

OUTPUT:

ObsDistribution_NodeStorage_TemperatureShipment_DelaySpoilage_RateUnits_TransportedCompliance_IndexPriceUtilizationShipment_DateNode_UpperNode_ProperNode_CleanTotal_ValueAvg_UtilMonthNext_MonthClassificationUtil_ClassCompliance_FlagFraud
1Chennai42.15008515.5.10MAR2024CHENNAIChennaiChennai2325050.0223467GOODLOWWEAKNO
2Delhi753.51009020.06015APR2024DELHIDelhiDelhi200055.0323497GOODMEDIUMSTRONGNO
3Hyd232.510009512.58001JAN2024HYDHydHyd1250065.0023407GOODHIGHSTRONGNO
4Mumbai302.01200.18.57520MAY2024MUMBAIMumbaiMumbai2220062.5423528GOODMEDIUMWEAKNO
5Vizag2505.0200010010.07005FEB2024VIZAGVizagVizag2000060.0123436RISKMEDIUMSTRONGNO
6Hyd232.510009512.58001JAN2024   ....GOODHIGHSTRONGNO
7Vizag2505.0200010010.07005FEB2024   ....RISKMEDIUMSTRONGNO
8Chennai42.15008515.5.10MAR2024   ....GOODLOWWEAKNO
9Delhi753.51009020.06015APR2024   ....GOODMEDIUMSTRONGNO
10Mumbai302.01200.18.57520MAY2024   ....GOODMEDIUMWEAKNO

Explanation 

1. Classification

  • Based on Spoilage Rate
  • Helps identify risk level

2. Utilization Class

  • Categorizes efficiency of usage

3. Compliance Flag

  • Indicates regulatory adherence

4. Fraud Flag

  • Detects suspicious patterns

Why This Final Dataset is Important

This is the business-ready dataset:

  •  Used for dashboards
  • Used for regulatory reporting
  • Used for fraud detection systems
  • Used for decision-making 

R Code

final_dataset <- vaccine_clean


# Classification

final_dataset$Classification <- ifelse(final_dataset$Spoilage_Rate < 5, "GOOD",

                                ifelse(final_dataset$Spoilage_Rate < 10, "RISK", "CRITICAL"))


# Utilization Class

final_dataset$Util_Class <- ifelse(final_dataset$Utilization >= 80, "HIGH",

                            ifelse(final_dataset$Utilization >= 60, "MEDIUM", "LOW"))


# Compliance Flag

final_dataset$Compliance_Flag <- ifelse(final_dataset$Compliance_Index >= 90, "STRONG", "WEAK")


# Fraud Detection

final_dataset$Fraud <- ifelse(final_dataset$Spoilage_Rate > 10 &

                      final_dataset$Compliance_Index < 80, "YES", "NO") 


Summary

In this project, we created a vaccine cold chain dataset that reflects real-world logistics challenges. Initially, the dataset contained several errors such as invalid temperature values, negative shipment delays, unrealistic spoilage rates, and missing or incorrect entries. Using SAS programming, we systematically identified and corrected these issues through data cleaning techniques.

We applied various SAS concepts like DATA step processing, PROC SQL, macros, and functions for handling dates, characters, and numeric values. Additionally, we implemented fraud detection logic to flag suspicious records based on spoilage and compliance patterns.

The integration of SAS and R provided a dual perspective, helping us understand how similar operations can be performed in both environments. Overall, this project demonstrates how structured programming and validation logic can significantly improve data quality and reliability in critical healthcare systems.

Conclusion

This project highlights the importance of robust data handling in vaccine cold chain logistics, where accuracy directly impacts public health outcomes. By introducing intentional errors, we simulated real-world data challenges and demonstrated how Advanced SAS programming techniques can effectively identify, correct, and optimize such datasets.

Through the use of DATA steps, PROC SQL, macros, and various functions, we transformed unreliable raw data into a structured and meaningful dataset. The inclusion of fraud detection logic further emphasized the role of analytics in safeguarding operational integrity.

Additionally, mapping each SAS step to R provided a comparative understanding, enhancing flexibility for analysts working across platforms. Overall, this project not only strengthens technical skills but also reinforces the importance of data quality, validation, and business logic in clinical and logistics domains.

SAS INTERVIEW QUESTIONS

1. What is DISTINCT used for?

Answer:
DISTINCT removes duplicate rows from query results.

2. What is the difference between UNION and OUTER UNION?

Answer:
UNION removes duplicates, while OUTER UNION keeps all rows including duplicates.

3. What is a subquery?

Answer:
A subquery is a query nested inside another SQL query.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent VACCINE data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

·  Clinical SAS Programmer

·  Research Data Analyst

·  Regulatory Data Validator

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.Can SAS analytics reveal which programming languages are popular but surprisingly difficult to master?

2.Is SDTM DM quality control in SAS the key to avoiding last-minute FDA rejection risks?

3.Can SAS analytics reveal which world tourist places truly attract visitors year after year?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy

Comments

Popular posts from this blog

412.Can We Build And Clean A University Course Analytics & Fraud Detection System In Sas Using Only Macros While Intentionally Creating And Fixing Errors?

420.Can We Detect Errors, Prevent Fraud, And Optimize Biometric Access System Security Using Advanced SAS Programming?

418.Can We Design, Debug, Detect Fraud, And Optimize A Smart Parking System Using Advanced SAS Programming Techniques?