412.Can We Build And Clean A University Course Analytics & Fraud Detection System In Sas Using Only Macros While Intentionally Creating And Fixing Errors?

University Course Analytics And Fraud Detection System In Sas Using Macros

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —

DATA STEP | SET | MERGE | PROC SORT | PROC APPEND | PROC TRANSPOSE | PROC DATASETS | DELETE | INPUT | DATALINES | LENGTH | FORMAT | IF-THEN-ELSE | ABS | COALESCE | ROUND | CEIL | FLOOR | STRIP | TRIM | CAT | CATX | PROPCASE | UPCASE | LOWCASE | MDY | INTCK | INTNX | PROC PRINT | PROC MEANS | PROC FREQ | RUN | QUIT | %MACRO | %MEND | %PUT

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Introduction

In this project, we will design a complete University Course Analytics System in SAS. The goal is not only to create a dataset with more than 12 observations, but also to intentionally introduce multiple logical and structural errors, detect them using macro-based fraud detection logic, and then correct them step by step.

We will apply:

  • DATA step
  • SET, MERGE
  • PROC APPEND
  • PROC TRANSPOSE
  • PROC DATASETS DELETE
  • MDY, INTCK, INTNX
  • Numeric functions
  • Character functions (STRIP, TRIM, CAT, CATX, PROPCASE, UPCASE, LOWCASE, COALESCE)
  • Utilization classification
  • Percentage calculations
  • Date formats
  • Fraud detection rules

Table Of Contents

  1. Business Context
  2. Raw Dataset Creation With Intentional Errors
  3. Macro-Based Standardization
  4. Error Detection Logic
  5. Fraud Detection Logic
  6. Date Handling (MDY, INTCK, INTNX)
  7. Utilization Classification
  8. APPEND, SET, MERGE Usage
  9. PROC TRANSPOSE
  10. PROC DATASETS DELETE
  11. Full Corrected Dataset Code
  12. Explanation Of Every Macro
  13. Conclusion

Business Context

Universities monitor:

  • Enrollment levels
  • Pass rates
  • Dropout rates
  • Student feedback
  • Teaching mode utilization

However, data often contains:

  • Wrong percentages (>100%)
  • Negative enrollment
  • Mixed case department names
  • Trailing blanks
  • Missing values
  • Wrong date formats
  • Duplicate course codes
  • Fraudulent manipulation of pass rate

1. Create Raw Dataset With Intentional Errors

data university_raw;

    length Course_Code $10 Department $30 Teaching_Mode $20;

    format Start_Date End_Date date9.;

    input Course_Code $

          Department :$30.

          Enrollment

          Pass_Rate

          Avg_Feedback_Score

          Dropout_Rate

          Teaching_Mode $

          Start_Date :date9.

          End_Date   :date9.;

datalines;

cs101 computer_science 120 85 4.5 10 offline 01JAN2024 30APR2024

cs102 COMPUTER_SCIENCE 150 110 4.2 5 online 01JAN2024 30APR2024

cs103 Computer_Science -20 75 3.8 8 hybrid 01JAN2024 30APR2024

ec201 economics 200 88 4.0 7 offline 15JAN2024 15MAY2024

ec202 ECONOMICS 180 92 4.1 3 online 15JAN2024 15MAY2024

mg301 management 250 95 4.8 -5 offline 01FEB2024 30JUN2024

mg302 Management 240 89 . 6 hybrid 01FEB2024 30JUN2024

bt401 biotechnology 130 60 3.2 45 offline 01MAR2024 31JUL2024

bt402 BIOTECHNOLOGY 125 102 4.9 2 online 01MAR2024 31JUL2024

mt501 mathematics 90 85 4.3 5 offline 01JAN2024 30APR2024

mt501 mathematics 95 86 4.4 4 offline 01JAN2024 30APR2024

ph601 physics 0 70 3.9 30 hybrid 01APR2024 31AUG2024

;

run;

%macro a;

proc print data=university_raw;

run;

%mend a;


%a;

OUTPUT:

ObsCourse_CodeDepartmentTeaching_ModeStart_DateEnd_DateEnrollmentPass_RateAvg_Feedback_ScoreDropout_Rate
1cs101computer_scienceoffline01JAN202430APR2024120854.510
2cs102COMPUTER_SCIENCEonline01JAN202430APR20241501104.25
3cs103Computer_Sciencehybrid01JAN202430APR2024-20753.88
4ec201economicsoffline15JAN202415MAY2024200884.07
5ec202ECONOMICSonline15JAN202415MAY2024180924.13
6mg301managementoffline01FEB202430JUN2024250954.8-5
7mg302Managementhybrid01FEB202430JUN202424089.6
8bt401biotechnologyoffline01MAR202431JUL2024130603.245
9bt402BIOTECHNOLOGYonline01MAR202431JUL20241251024.92
10mt501mathematicsoffline01JAN202430APR202490854.35
11mt501mathematicsoffline01JAN202430APR202495864.44
12ph601physicshybrid01APR202431AUG20240703.9:30

Errors Introduced:

1.     Pass_Rate > 100 (110, 102)

2.     Negative Enrollment (-20)

3.     Negative Dropout_Rate (-5)

4.     Missing Feedback Score

5.     Duplicate Course_Code (mt501)

6.     Enrollment = 0

7.     Mixed case Department

8.     Untrimmed character values

9.     Dropout 45% suspicious

10.  No utilization classification

11.  No course duration

12.  No fraud flag

2. Standardize Character Variables

%macro b;

data university_clean1;

set university_raw;


Department = propcase(strip(Department));

Teaching_Mode = upcase(strip(Teaching_Mode));

Course_Code = upcase(strip(Course_Code));


run;

proc print data=university_clean1;

run;

%mend b;


%b;

OUTPUT:

ObsCourse_CodeDepartmentTeaching_ModeStart_DateEnd_DateEnrollmentPass_RateAvg_Feedback_ScoreDropout_Rate
1CS101Computer_scienceOFFLINE01JAN202430APR2024120854.510
2CS102Computer_scienceONLINE01JAN202430APR20241501104.25
3CS103Computer_scienceHYBRID01JAN202430APR2024-20753.88
4EC201EconomicsOFFLINE15JAN202415MAY2024200884.07
5EC202EconomicsONLINE15JAN202415MAY2024180924.13
6MG301ManagementOFFLINE01FEB202430JUN2024250954.8-5
7MG302ManagementHYBRID01FEB202430JUN202424089.6
8BT401BiotechnologyOFFLINE01MAR202431JUL2024130603.245
9BT402BiotechnologyONLINE01MAR202431JUL20241251024.92
10MT501MathematicsOFFLINE01JAN202430APR202490854.35
11MT501MathematicsOFFLINE01JAN202430APR202495864.44
12PH601PhysicsHYBRID01APR202431AUG20240703.930

·  STRIP removes spaces

·  PROPCASE standardizes department

·  UPCASE standardizes codes

3. Fix Numeric Logical Errors

%macro c;

data university_clean2;

set university_clean1;


if Pass_Rate > 100 then Pass_Rate = 100;

if Pass_Rate < 0 then Pass_Rate = .;


if Enrollment < 0 then Enrollment = abs(Enrollment);


if Dropout_Rate < 0 then Dropout_Rate = 0;


Avg_Feedback_Score = coalesce(Avg_Feedback_Score,3.5);


run;

proc print data=university_clean2;

run;

%mend c;


%c;

OUTPUT:

ObsCourse_CodeDepartmentTeaching_ModeStart_DateEnd_DateEnrollmentPass_RateAvg_Feedback_ScoreDropout_Rate
1CS101Computer_scienceOFFLINE01JAN202430APR2024120854.510
2CS102Computer_scienceONLINE01JAN202430APR20241501004.25
3CS103Computer_scienceHYBRID01JAN202430APR202420753.88
4EC201EconomicsOFFLINE15JAN202415MAY2024200884.07
5EC202EconomicsONLINE15JAN202415MAY2024180924.13
6MG301ManagementOFFLINE01FEB202430JUN2024250954.80
7MG302ManagementHYBRID01FEB202430JUN2024240893.56
8BT401BiotechnologyOFFLINE01MAR202431JUL2024130603.245
9BT402BiotechnologyONLINE01MAR202431JUL20241251004.92
10MT501MathematicsOFFLINE01JAN202430APR202490854.35
11MT501MathematicsOFFLINE01JAN202430APR202495864.44
12PH601PhysicsHYBRID01APR202431AUG20240703.930

·  ABS fixes negative enrollment

·  COALESCE fills missing feedback

·  Logical validation applied

4. Remove Duplicate Course Codes

%macro d;

proc sort data=university_clean2 nodupkey out=university_clean3;

by Course_Code;

run;

proc print data=university_clean3;

run;

%mend d;


%d;

OUTPUT:

ObsCourse_CodeDepartmentTeaching_ModeStart_DateEnd_DateEnrollmentPass_RateAvg_Feedback_ScoreDropout_Rate
1BT401BiotechnologyOFFLINE01MAR202431JUL2024130603.245
2BT402BiotechnologyONLINE01MAR202431JUL20241251004.92
3CS101Computer_scienceOFFLINE01JAN202430APR2024120854.510
4CS102Computer_scienceONLINE01JAN202430APR20241501004.25
5CS103Computer_scienceHYBRID01JAN202430APR202420753.88
6EC201EconomicsOFFLINE15JAN202415MAY2024200884.07
7EC202EconomicsONLINE15JAN202415MAY2024180924.13
8MG301ManagementOFFLINE01FEB202430JUN2024250954.80
9MG302ManagementHYBRID01FEB202430JUN2024240893.56
10MT501MathematicsOFFLINE01JAN202430APR202490854.35
11PH601PhysicsHYBRID01APR202431AUG20240703.930

5. Create Course Duration Using INTCK

%macro e;

data university_clean4;

set university_clean3;


Course_Duration_Days = intck('day', Start_Date, End_Date);


run;

proc print data=university_clean4;

run;

%mend e;


%e;

OUTPUT:

ObsCourse_CodeDepartmentTeaching_ModeStart_DateEnd_DateEnrollmentPass_RateAvg_Feedback_ScoreDropout_RateCourse_Duration_Days
1BT401BiotechnologyOFFLINE01MAR202431JUL2024130603.245152
2BT402BiotechnologyONLINE01MAR202431JUL20241251004.92152
3CS101Computer_scienceOFFLINE01JAN202430APR2024120854.510120
4CS102Computer_scienceONLINE01JAN202430APR20241501004.25120
5CS103Computer_scienceHYBRID01JAN202430APR202420753.88120
6EC201EconomicsOFFLINE15JAN202415MAY2024200884.07121
7EC202EconomicsONLINE15JAN202415MAY2024180924.13121
8MG301ManagementOFFLINE01FEB202430JUN2024250954.80150
9MG302ManagementHYBRID01FEB202430JUN2024240893.56150
10MT501MathematicsOFFLINE01JAN202430APR202490854.35120
11PH601PhysicsHYBRID01APR202431AUG20240703.930152

6. Create Next Session Date Using INTNX

%macro f;

data university_clean5;

set university_clean4;


Next_Session = intnx('month', End_Date, 6, 'same');

format Next_Session date9.;


run;

proc print data=university_clean5;

run;

%mend f;


%f;

OUTPUT:

ObsCourse_CodeDepartmentTeaching_ModeStart_DateEnd_DateEnrollmentPass_RateAvg_Feedback_ScoreDropout_RateCourse_Duration_DaysNext_Session
1BT401BiotechnologyOFFLINE01MAR202431JUL2024130603.24515231JAN2025
2BT402BiotechnologyONLINE01MAR202431JUL20241251004.9215231JAN2025
3CS101Computer_scienceOFFLINE01JAN202430APR2024120854.51012030OCT2024
4CS102Computer_scienceONLINE01JAN202430APR20241501004.2512030OCT2024
5CS103Computer_scienceHYBRID01JAN202430APR202420753.8812030OCT2024
6EC201EconomicsOFFLINE15JAN202415MAY2024200884.0712115NOV2024
7EC202EconomicsONLINE15JAN202415MAY2024180924.1312115NOV2024
8MG301ManagementOFFLINE01FEB202430JUN2024250954.8015030DEC2024
9MG302ManagementHYBRID01FEB202430JUN2024240893.5615030DEC2024
10MT501MathematicsOFFLINE01JAN202430APR202490854.3512030OCT2024
11PH601PhysicsHYBRID01APR202431AUG20240703.93015228FEB2025

7. Create Utilization Percentage

%macro g;

data university_clean6;

set university_clean5;


Max_Capacity = 300;

Utilization_Percentage = (Enrollment/Max_Capacity)*100;


run;

proc print data=university_clean6;

run;

%mend g;


%g;

OUTPUT:

ObsCourse_CodeDepartmentTeaching_ModeStart_DateEnd_DateEnrollmentPass_RateAvg_Feedback_ScoreDropout_RateCourse_Duration_DaysNext_SessionMax_CapacityUtilization_Percentage
1BT401BiotechnologyOFFLINE01MAR202431JUL2024130603.24515231JAN202530043.3333
2BT402BiotechnologyONLINE01MAR202431JUL20241251004.9215231JAN202530041.6667
3CS101Computer_scienceOFFLINE01JAN202430APR2024120854.51012030OCT202430040.0000
4CS102Computer_scienceONLINE01JAN202430APR20241501004.2512030OCT202430050.0000
5CS103Computer_scienceHYBRID01JAN202430APR202420753.8812030OCT20243006.6667
6EC201EconomicsOFFLINE15JAN202415MAY2024200884.0712115NOV202430066.6667
7EC202EconomicsONLINE15JAN202415MAY2024180924.1312115NOV202430060.0000
8MG301ManagementOFFLINE01FEB202430JUN2024250954.8015030DEC202430083.3333
9MG302ManagementHYBRID01FEB202430JUN2024240893.5615030DEC202430080.0000
10MT501MathematicsOFFLINE01JAN202430APR202490854.3512030OCT202430030.0000
11PH601PhysicsHYBRID01APR202431AUG20240703.93015228FEB20253000.0000

8. Utilization Classification

%macro h;

data university_clean7;

set university_clean6;


length Utilization_Class $20;


if Utilization_Percentage >=80 then Utilization_Class="High";

else if Utilization_Percentage >=50 then Utilization_Class="Medium";

else Utilization_Class="Low";


run;

proc print data=university_clean7;

run;

%mend h;


%h;

OUTPUT:

ObsCourse_CodeDepartmentTeaching_ModeStart_DateEnd_DateEnrollmentPass_RateAvg_Feedback_ScoreDropout_RateCourse_Duration_DaysNext_SessionMax_CapacityUtilization_PercentageUtilization_Class
1BT401BiotechnologyOFFLINE01MAR202431JUL2024130603.24515231JAN202530043.3333Low
2BT402BiotechnologyONLINE01MAR202431JUL20241251004.9215231JAN202530041.6667Low
3CS101Computer_scienceOFFLINE01JAN202430APR2024120854.51012030OCT202430040.0000Low
4CS102Computer_scienceONLINE01JAN202430APR20241501004.2512030OCT202430050.0000Medium
5CS103Computer_scienceHYBRID01JAN202430APR202420753.8812030OCT20243006.6667Low
6EC201EconomicsOFFLINE15JAN202415MAY2024200884.0712115NOV202430066.6667Medium
7EC202EconomicsONLINE15JAN202415MAY2024180924.1312115NOV202430060.0000Medium
8MG301ManagementOFFLINE01FEB202430JUN2024250954.8015030DEC202430083.3333High
9MG302ManagementHYBRID01FEB202430JUN2024240893.5615030DEC202430080.0000High
10MT501MathematicsOFFLINE01JAN202430APR202490854.3512030OCT202430030.0000Low
11PH601PhysicsHYBRID01APR202431AUG20240703.93015228FEB20253000.0000Low

9. Fraud Detection Logic

%macro i;

data university_clean8;

set university_clean7;


length Fraud_Flag $10;


if Pass_Rate=100 and Avg_Feedback_Score <3 then Fraud_Flag="Check";

else if Dropout_Rate >40 then Fraud_Flag="High Risk";

else Fraud_Flag="Normal";


run;

proc print data=university_clean8;

run;

%mend i;


%i;

OUTPUT:

ObsCourse_CodeDepartmentTeaching_ModeStart_DateEnd_DateEnrollmentPass_RateAvg_Feedback_ScoreDropout_RateCourse_Duration_DaysNext_SessionMax_CapacityUtilization_PercentageUtilization_ClassFraud_Flag
1BT401BiotechnologyOFFLINE01MAR202431JUL2024130603.24515231JAN202530043.3333LowHigh Risk
2BT402BiotechnologyONLINE01MAR202431JUL20241251004.9215231JAN202530041.6667LowNormal
3CS101Computer_scienceOFFLINE01JAN202430APR2024120854.51012030OCT202430040.0000LowNormal
4CS102Computer_scienceONLINE01JAN202430APR20241501004.2512030OCT202430050.0000MediumNormal
5CS103Computer_scienceHYBRID01JAN202430APR202420753.8812030OCT20243006.6667LowNormal
6EC201EconomicsOFFLINE15JAN202415MAY2024200884.0712115NOV202430066.6667MediumNormal
7EC202EconomicsONLINE15JAN202415MAY2024180924.1312115NOV202430060.0000MediumNormal
8MG301ManagementOFFLINE01FEB202430JUN2024250954.8015030DEC202430083.3333HighNormal
9MG302ManagementHYBRID01FEB202430JUN2024240893.5615030DEC202430080.0000HighNormal
10MT501MathematicsOFFLINE01JAN202430APR202490854.3512030OCT202430030.0000LowNormal
11PH601PhysicsHYBRID01APR202431AUG20240703.93015228FEB20253000.0000LowNormal

10. Append New Courses

%macro j;

data new_courses;

length Course_Code $10 Department $30 Teaching_Mode $20;

Course_Code="CS999";

Department="Computer Science";

Enrollment=220;

Pass_Rate=90;

Avg_Feedback_Score=4.6;

Dropout_Rate=4;

Teaching_Mode="ONLINE";

Start_Date=mdy(7,1,2024);

End_Date=mdy(10,30,2024);

run;

proc print data=new_courses;

run;


proc append base=university_clean8  data=new_courses;

run;

proc print data=university_clean8;

run;

%mend j;


%j;

OUTPUT:

ObsCourse_CodeDepartmentTeaching_ModeEnrollmentPass_RateAvg_Feedback_ScoreDropout_RateStart_DateEnd_Date
1CS999Computer ScienceONLINE220904.642355823679
ObsCourse_CodeDepartmentTeaching_ModeStart_DateEnd_DateEnrollmentPass_RateAvg_Feedback_ScoreDropout_RateCourse_Duration_DaysNext_SessionMax_CapacityUtilization_PercentageUtilization_ClassFraud_Flag
1BT401BiotechnologyOFFLINE01MAR202431JUL2024130603.24515231JAN202530043.3333LowHigh Risk
2BT402BiotechnologyONLINE01MAR202431JUL20241251004.9215231JAN202530041.6667LowNormal
3CS101Computer_scienceOFFLINE01JAN202430APR2024120854.51012030OCT202430040.0000LowNormal
4CS102Computer_scienceONLINE01JAN202430APR20241501004.2512030OCT202430050.0000MediumNormal
5CS103Computer_scienceHYBRID01JAN202430APR202420753.8812030OCT20243006.6667LowNormal
6EC201EconomicsOFFLINE15JAN202415MAY2024200884.0712115NOV202430066.6667MediumNormal
7EC202EconomicsONLINE15JAN202415MAY2024180924.1312115NOV202430060.0000MediumNormal
8MG301ManagementOFFLINE01FEB202430JUN2024250954.8015030DEC202430083.3333HighNormal
9MG302ManagementHYBRID01FEB202430JUN2024240893.5615030DEC202430080.0000HighNormal
10MT501MathematicsOFFLINE01JAN202430APR202490854.3512030OCT202430030.0000LowNormal
11PH601PhysicsHYBRID01APR202431AUG20240703.93015228FEB20253000.0000LowNormal
12CS999Computer ScienceONLINE01JUL202430OCT2024220904.64....  

11. Merge Department Budget Dataset

data dept_budget;

length Department $30;

input Department $ Budget Pass_Rate;

datalines;

Computer_Science 500000 80

Economics 300000 90

Management 400000 95

Biotechnology 350000 53

Mathematics 200000 78

Physics 250000 69

;

run;


%macro k;

proc print data=dept_budget;

run;

%mend k;


%k;

OUTPUT:

ObsDepartmentBudgetPass_Rate
1Computer_Science50000080
2Economics30000090
3Management40000095
4Biotechnology35000053
5Mathematics20000078
6Physics25000069

%macro l;

proc sort data=university_clean8;by Department;run;

proc print data=university_clean8;

run;


proc sort data=dept_budget;by Department;run;

proc print data=dept_budget;

run;


data university_final1;

merge university_clean8 

      dept_budget;

by Department;

run;

proc print data=university_final1;

run;


proc transpose data=university_final1 out=transposed_pass;

by Department NotSorted;

run;

proc print data=transposed_pass;

run;

%mend l;


%l;

OUTPUT:

ObsCourse_CodeDepartmentTeaching_ModeStart_DateEnd_DateEnrollmentPass_RateAvg_Feedback_ScoreDropout_RateCourse_Duration_DaysNext_SessionMax_CapacityUtilization_PercentageUtilization_ClassFraud_Flag
1BT401BiotechnologyOFFLINE01MAR202431JUL2024130603.24515231JAN202530043.3333LowHigh Risk
2BT402BiotechnologyONLINE01MAR202431JUL20241251004.9215231JAN202530041.6667LowNormal
3CS999Computer ScienceONLINE01JUL202430OCT2024220904.64....  
4CS101Computer_scienceOFFLINE01JAN202430APR2024120854.51012030OCT202430040.0000LowNormal
5CS102Computer_scienceONLINE01JAN202430APR20241501004.2512030OCT202430050.0000MediumNormal
6CS103Computer_scienceHYBRID01JAN202430APR202420753.8812030OCT20243006.6667LowNormal
7EC201EconomicsOFFLINE15JAN202415MAY2024200884.0712115NOV202430066.6667MediumNormal
8EC202EconomicsONLINE15JAN202415MAY2024180924.1312115NOV202430060.0000MediumNormal
9MG301ManagementOFFLINE01FEB202430JUN2024250954.8015030DEC202430083.3333HighNormal
10MG302ManagementHYBRID01FEB202430JUN2024240893.5615030DEC202430080.0000HighNormal
11MT501MathematicsOFFLINE01JAN202430APR202490854.3512030OCT202430030.0000LowNormal
12PH601PhysicsHYBRID01APR202431AUG20240703.93015228FEB20253000.0000LowNormal
ObsDepartmentBudgetPass_Rate
1Biotechnology35000053
2Computer_Science50000080
3Economics30000090
4Management40000095
5Mathematics20000078
6Physics25000069
ObsCourse_CodeDepartmentTeaching_ModeStart_DateEnd_DateEnrollmentPass_RateAvg_Feedback_ScoreDropout_RateCourse_Duration_DaysNext_SessionMax_CapacityUtilization_PercentageUtilization_ClassFraud_FlagBudget
1BT401BiotechnologyOFFLINE01MAR202431JUL2024130533.24515231JAN202530043.3333LowHigh Risk350000
2BT402BiotechnologyONLINE01MAR202431JUL20241251004.9215231JAN202530041.6667LowNormal350000
3CS999Computer ScienceONLINE01JUL202430OCT2024220904.64....  .
4 Computer_Science ...80......  500000
5CS101Computer_scienceOFFLINE01JAN202430APR2024120854.51012030OCT202430040.0000LowNormal.
6CS102Computer_scienceONLINE01JAN202430APR20241501004.2512030OCT202430050.0000MediumNormal.
7CS103Computer_scienceHYBRID01JAN202430APR202420753.8812030OCT20243006.6667LowNormal.
8EC201EconomicsOFFLINE15JAN202415MAY2024200904.0712115NOV202430066.6667MediumNormal300000
9EC202EconomicsONLINE15JAN202415MAY2024180924.1312115NOV202430060.0000MediumNormal300000
10MG301ManagementOFFLINE01FEB202430JUN2024250954.8015030DEC202430083.3333HighNormal400000
11MG302ManagementHYBRID01FEB202430JUN2024240893.5615030DEC202430080.0000HighNormal400000
12MT501MathematicsOFFLINE01JAN202430APR202490784.3512030OCT202430030.0000LowNormal200000
13PH601PhysicsHYBRID01APR202431AUG20240693.93015228FEB20253000.0000LowNormal250000
ObsDepartment_NAME_COL1COL2COL3
1BiotechnologyStart_Date23436.0023436.00.
2BiotechnologyEnd_Date23588.0023588.00.
3BiotechnologyEnrollment130.00125.00.
4BiotechnologyPass_Rate53.00100.00.
5BiotechnologyAvg_Feedback_Score3.204.90.
6BiotechnologyDropout_Rate45.002.00.
7BiotechnologyCourse_Duration_Days152.00152.00.
8BiotechnologyNext_Session23772.0023772.00.
9BiotechnologyMax_Capacity300.00300.00.
10BiotechnologyUtilization_Percentage43.3341.67.
11BiotechnologyBudget350000.00350000.00.
12Computer ScienceStart_Date23558.00..
13Computer ScienceEnd_Date23679.00..
14Computer ScienceEnrollment220.00..
15Computer SciencePass_Rate90.00..
16Computer ScienceAvg_Feedback_Score4.60..
17Computer ScienceDropout_Rate4.00..
18Computer ScienceCourse_Duration_Days...
19Computer ScienceNext_Session...
20Computer ScienceMax_Capacity...
21Computer ScienceUtilization_Percentage...
22Computer ScienceBudget...
23Computer_ScienceStart_Date...
24Computer_ScienceEnd_Date...
25Computer_ScienceEnrollment...
26Computer_SciencePass_Rate80.00..
27Computer_ScienceAvg_Feedback_Score...
28Computer_ScienceDropout_Rate...
29Computer_ScienceCourse_Duration_Days...
30Computer_ScienceNext_Session...
31Computer_ScienceMax_Capacity...
32Computer_ScienceUtilization_Percentage...
33Computer_ScienceBudget500000.00..
34Computer_scienceStart_Date23376.0023376.0023376.00
35Computer_scienceEnd_Date23496.0023496.0023496.00
36Computer_scienceEnrollment120.00150.0020.00
37Computer_sciencePass_Rate85.00100.0075.00
38Computer_scienceAvg_Feedback_Score4.504.203.80
39Computer_scienceDropout_Rate10.005.008.00
40Computer_scienceCourse_Duration_Days120.00120.00120.00
41Computer_scienceNext_Session23679.0023679.0023679.00
42Computer_scienceMax_Capacity300.00300.00300.00
43Computer_scienceUtilization_Percentage40.0050.006.67
44Computer_scienceBudget...
45EconomicsStart_Date23390.0023390.00.
46EconomicsEnd_Date23511.0023511.00.
47EconomicsEnrollment200.00180.00.
48EconomicsPass_Rate90.0092.00.
49EconomicsAvg_Feedback_Score4.004.10.
50EconomicsDropout_Rate7.003.00.
51EconomicsCourse_Duration_Days121.00121.00.
52EconomicsNext_Session23695.0023695.00.
53EconomicsMax_Capacity300.00300.00.
54EconomicsUtilization_Percentage66.6760.00.
55EconomicsBudget300000.00300000.00.
56ManagementStart_Date23407.0023407.00.
57ManagementEnd_Date23557.0023557.00.
58ManagementEnrollment250.00240.00.
59ManagementPass_Rate95.0089.00.
60ManagementAvg_Feedback_Score4.803.50.
61ManagementDropout_Rate0.006.00.
62ManagementCourse_Duration_Days150.00150.00.
63ManagementNext_Session23740.0023740.00.
64ManagementMax_Capacity300.00300.00.
65ManagementUtilization_Percentage83.3380.00.
66ManagementBudget400000.00400000.00.
67MathematicsStart_Date23376.00..
68MathematicsEnd_Date23496.00..
69MathematicsEnrollment90.00..
70MathematicsPass_Rate78.00..
71MathematicsAvg_Feedback_Score4.30..
72MathematicsDropout_Rate5.00..
73MathematicsCourse_Duration_Days120.00..
74MathematicsNext_Session23679.00..
75MathematicsMax_Capacity300.00..
76MathematicsUtilization_Percentage30.00..
77MathematicsBudget200000.00..
78PhysicsStart_Date23467.00..
79PhysicsEnd_Date23619.00..
80PhysicsEnrollment0.00..
81PhysicsPass_Rate69.00..
82PhysicsAvg_Feedback_Score3.90..
83PhysicsDropout_Rate30.00..
84PhysicsCourse_Duration_Days152.00..
85PhysicsNext_Session23800.00..
86PhysicsMax_Capacity300.00..
87PhysicsUtilization_Percentage0.00..
88PhysicsBudget250000.00..

12. Numeric Functions Example

%macro m;

data university_final2;

set university_final1;


Rounded_Pass = round(Pass_Rate,1);

Ceil_Enroll = ceil(Enrollment);

Floor_Enroll = floor(Enrollment);


run;

proc print data=university_final2;

run;

%mend m;


%m;

OUTPUT:

ObsCourse_CodeDepartmentTeaching_ModeStart_DateEnd_DateEnrollmentPass_RateAvg_Feedback_ScoreDropout_RateCourse_Duration_DaysNext_SessionMax_CapacityUtilization_PercentageUtilization_ClassFraud_FlagBudgetRounded_PassCeil_EnrollFloor_Enroll
1BT401BiotechnologyOFFLINE01MAR202431JUL2024130533.24515231JAN202530043.3333LowHigh Risk35000053130130
2BT402BiotechnologyONLINE01MAR202431JUL20241251004.9215231JAN202530041.6667LowNormal350000100125125
3CS999Computer ScienceONLINE01JUL202430OCT2024220904.64....  .90220220
4 Computer_Science ...80......  50000080..
5CS101Computer_scienceOFFLINE01JAN202430APR2024120854.51012030OCT202430040.0000LowNormal.85120120
6CS102Computer_scienceONLINE01JAN202430APR20241501004.2512030OCT202430050.0000MediumNormal.100150150
7CS103Computer_scienceHYBRID01JAN202430APR202420753.8812030OCT20243006.6667LowNormal.752020
8EC201EconomicsOFFLINE15JAN202415MAY2024200904.0712115NOV202430066.6667MediumNormal30000090200200
9EC202EconomicsONLINE15JAN202415MAY2024180924.1312115NOV202430060.0000MediumNormal30000092180180
10MG301ManagementOFFLINE01FEB202430JUN2024250954.8015030DEC202430083.3333HighNormal40000095250250
11MG302ManagementHYBRID01FEB202430JUN2024240893.5615030DEC202430080.0000HighNormal40000089240240
12MT501MathematicsOFFLINE01JAN202430APR202490784.3512030OCT202430030.0000LowNormal200000789090
13PH601PhysicsHYBRID01APR202431AUG20240693.93015228FEB20253000.0000LowNormal2500006900

13. Character Concatenation

%macro n;

data university_final3;

set university_final2;


Course_Label = catx('-',Course_Code,Department);

Mode_Label = cat(Teaching_Mode," MODE");


run;

proc print data=university_final3;

run;

%mend n;


%n;

OUTPUT:

ObsCourse_CodeDepartmentTeaching_ModeStart_DateEnd_DateEnrollmentPass_RateAvg_Feedback_ScoreDropout_RateCourse_Duration_DaysNext_SessionMax_CapacityUtilization_PercentageUtilization_ClassFraud_FlagBudgetRounded_PassCeil_EnrollFloor_EnrollCourse_LabelMode_Label
1BT401BiotechnologyOFFLINE01MAR202431JUL2024130533.24515231JAN202530043.3333LowHigh Risk35000053130130BT401-BiotechnologyOFFLINE MODE
2BT402BiotechnologyONLINE01MAR202431JUL20241251004.9215231JAN202530041.6667LowNormal350000100125125BT402-BiotechnologyONLINE MODE
3CS999Computer ScienceONLINE01JUL202430OCT2024220904.64....  .90220220CS999-Computer ScienceONLINE MODE
4 Computer_Science ...80......  50000080..Computer_ScienceMODE
5CS101Computer_scienceOFFLINE01JAN202430APR2024120854.51012030OCT202430040.0000LowNormal.85120120CS101-Computer_scienceOFFLINE MODE
6CS102Computer_scienceONLINE01JAN202430APR20241501004.2512030OCT202430050.0000MediumNormal.100150150CS102-Computer_scienceONLINE MODE
7CS103Computer_scienceHYBRID01JAN202430APR202420753.8812030OCT20243006.6667LowNormal.752020CS103-Computer_scienceHYBRID MODE
8EC201EconomicsOFFLINE15JAN202415MAY2024200904.0712115NOV202430066.6667MediumNormal30000090200200EC201-EconomicsOFFLINE MODE
9EC202EconomicsONLINE15JAN202415MAY2024180924.1312115NOV202430060.0000MediumNormal30000092180180EC202-EconomicsONLINE MODE
10MG301ManagementOFFLINE01FEB202430JUN2024250954.8015030DEC202430083.3333HighNormal40000095250250MG301-ManagementOFFLINE MODE
11MG302ManagementHYBRID01FEB202430JUN2024240893.5615030DEC202430080.0000HighNormal40000089240240MG302-ManagementHYBRID MODE
12MT501MathematicsOFFLINE01JAN202430APR202490784.3512030OCT202430030.0000LowNormal200000789090MT501-MathematicsOFFLINE MODE
13PH601PhysicsHYBRID01APR202431AUG20240693.93015228FEB20253000.0000LowNormal2500006900PH601-PhysicsHYBRID MODE

14. Delete Temporary Datasets

%macro o;

proc datasets library=work nolist;

delete university_raw university_clean1 university_clean2;

quit;

%mend o;


%o;

LOG:

NOTE: Deleting WORK.UNIVERSITY_RAW (memtype=DATA).
NOTE: Deleting WORK.UNIVERSITY_CLEAN1 (memtype=DATA).
NOTE: Deleting WORK.UNIVERSITY_CLEAN2 (memtype=DATA).

15. Full Corrected Final Dataset Code

data university_master;

set university_final3;


if Enrollment=0 then delete;


Pass_Rate_Percent = Pass_Rate/100;

Dropout_Percent = Dropout_Rate/100;


run;

proc print data=university_master;

run;

OUTPUT:

ObsCourse_CodeDepartmentTeaching_ModeStart_DateEnd_DateEnrollmentPass_RateAvg_Feedback_ScoreDropout_RateCourse_Duration_DaysNext_SessionMax_CapacityUtilization_PercentageUtilization_ClassFraud_FlagBudgetRounded_PassCeil_EnrollFloor_EnrollCourse_LabelMode_LabelPass_Rate_PercentDropout_Percent
1BT401BiotechnologyOFFLINE01MAR202431JUL2024130533.24515231JAN202530043.3333LowHigh Risk35000053130130BT401-BiotechnologyOFFLINE MODE0.530.45
2BT402BiotechnologyONLINE01MAR202431JUL20241251004.9215231JAN202530041.6667LowNormal350000100125125BT402-BiotechnologyONLINE MODE1.000.02
3CS999Computer ScienceONLINE01JUL202430OCT2024220904.64....  .90220220CS999-Computer ScienceONLINE MODE0.900.04
4 Computer_Science ...80......  50000080..Computer_ScienceMODE0.80.
5CS101Computer_scienceOFFLINE01JAN202430APR2024120854.51012030OCT202430040.0000LowNormal.85120120CS101-Computer_scienceOFFLINE MODE0.850.10
6CS102Computer_scienceONLINE01JAN202430APR20241501004.2512030OCT202430050.0000MediumNormal.100150150CS102-Computer_scienceONLINE MODE1.000.05
7CS103Computer_scienceHYBRID01JAN202430APR202420753.8812030OCT20243006.6667LowNormal.752020CS103-Computer_scienceHYBRID MODE0.750.08
8EC201EconomicsOFFLINE15JAN202415MAY2024200904.0712115NOV202430066.6667MediumNormal30000090200200EC201-EconomicsOFFLINE MODE0.900.07
9EC202EconomicsONLINE15JAN202415MAY2024180924.1312115NOV202430060.0000MediumNormal30000092180180EC202-EconomicsONLINE MODE0.920.03
10MG301ManagementOFFLINE01FEB202430JUN2024250954.8015030DEC202430083.3333HighNormal40000095250250MG301-ManagementOFFLINE MODE0.950.00
11MG302ManagementHYBRID01FEB202430JUN2024240893.5615030DEC202430080.0000HighNormal40000089240240MG302-ManagementHYBRID MODE0.890.06
12MT501MathematicsOFFLINE01JAN202430APR202490784.3512030OCT202430030.0000LowNormal200000789090MT501-MathematicsOFFLINE MODE0.780.05

Explanation:

Created Intentional Errors

·       Universities manipulate pass rate

·       Enrollment data may be wrongly entered

·       Dropout rate may be negative due to entry mistake

·       Feedback may be missing

·       Duplicate course codes may exist

Why Use STRIP?

Removes leading and trailing blanks.
Important when joining datasets.

Why Use PROPCASE?

Standardizes department name format.
"computer science" → "Computer Science"

Why Use UPCASE?

Standardizes Course_Code.
Ensures no duplicate due to case mismatch.

Why Use COALESCE?

If feedback score missing, assign default 3.5.

Why Use INTCK?

Calculates difference between two dates.
Used to find course duration.

Why Use INTNX?

Find next session date automatically.

Why Use MDY?

Creates date using numeric values.
MDY(7,1,2024) = 01JUL2024

Why Use APPEND?

Adds new courses without rewriting full dataset.

Why Use MERGE?

Combine budget dataset with course dataset.

Why Use TRANSPOSE?

Convert rows to columns.
Used in reporting.

Why Use PROC DATASETS DELETE?

Clean temporary datasets.
Improves memory efficiency.

Fraud Detection Logic Explanation

Fraud patterns:

1.     Pass rate 100% but low feedback → suspicious

2.     Dropout > 40% → high risk

3.     Enrollment 0 → invalid

4.     Negative values → data entry fraud

Business Impact

After cleaning:

·       Accurate utilization reporting

·       Reliable academic analytics

·       Fraud risk identification

·       Capacity planning improved

·       Budget allocation justified

Conclusion

We successfully built a University Course Analytics and Fraud Detection System in SAS using only macros from A to O.

·       Created dataset with 12+ observations

·       Inserted intentional errors

·       Corrected logical numeric errors

·       Standardized character fields

·       Calculated percentages

·       Applied utilization classification

·       Used MDY, INTCK, INTNX

·       Applied APPEND, MERGE, SET

·       Used numeric and character functions

·       Applied fraud detection logic

·       Deleted temporary datasets

·       Delivered a fully cleaned master dataset


SAS INTERVIEW QUESTIONS

·  How do you handle missing values?

·  What is reconciliation?

·  What is log checking?


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent UNIVERSITY data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

·  Clinical SAS Programmer

·  Research Data Analyst

·  Regulatory Data Validator


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.How Do We Build a Complete River Trade and Pollution Monitoring System in SAS Using MERGE, APPEND, TRANSPOSE, and Macros?


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy

Comments

Popular posts from this blog

409.Can We Build a Reliable Emergency Services Analytics & Fraud Detection System in SAS While Identifying and Fixing Intentional Errors?

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study