412.Can We Build And Clean A University Course Analytics & Fraud Detection System In Sas Using Only Macros While Intentionally Creating And Fixing Errors?
University Course Analytics And Fraud Detection System In Sas Using Macros
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —
DATA STEP | SET | MERGE | PROC SORT | PROC APPEND | PROC TRANSPOSE | PROC DATASETS | DELETE | INPUT | DATALINES | LENGTH | FORMAT | IF-THEN-ELSE | ABS | COALESCE | ROUND | CEIL | FLOOR | STRIP | TRIM | CAT | CATX | PROPCASE | UPCASE | LOWCASE | MDY | INTCK | INTNX | PROC PRINT | PROC MEANS | PROC FREQ | RUN | QUIT | %MACRO | %MEND | %PUT
Introduction
In this
project, we will design a complete University Course Analytics System in SAS.
The goal is not only to create a dataset with more than 12 observations, but
also to intentionally introduce multiple logical and structural errors, detect
them using macro-based fraud detection logic, and then correct them step by
step.
We will
apply:
- DATA step
- SET, MERGE
- PROC APPEND
- PROC TRANSPOSE
- PROC DATASETS DELETE
- MDY, INTCK, INTNX
- Numeric functions
- Character functions (STRIP,
TRIM, CAT, CATX, PROPCASE, UPCASE, LOWCASE, COALESCE)
- Utilization classification
- Percentage calculations
- Date formats
- Fraud detection rules
Table Of Contents
- Business Context
- Raw Dataset Creation With
Intentional Errors
- Macro-Based Standardization
- Error Detection Logic
- Fraud Detection Logic
- Date Handling (MDY, INTCK,
INTNX)
- Utilization Classification
- APPEND, SET, MERGE Usage
- PROC TRANSPOSE
- PROC DATASETS DELETE
- Full Corrected Dataset Code
- Explanation Of Every Macro
- Conclusion
Business Context
Universities
monitor:
- Enrollment levels
- Pass rates
- Dropout rates
- Student feedback
- Teaching mode utilization
However, data
often contains:
- Wrong percentages (>100%)
- Negative enrollment
- Mixed case department names
- Trailing blanks
- Missing values
- Wrong date formats
- Duplicate course codes
- Fraudulent manipulation of
pass rate
1. Create Raw Dataset With Intentional Errors
data university_raw;
length Course_Code $10 Department $30 Teaching_Mode $20;
format Start_Date End_Date date9.;
input Course_Code $
Department :$30.
Enrollment
Pass_Rate
Avg_Feedback_Score
Dropout_Rate
Teaching_Mode $
Start_Date :date9.
End_Date :date9.;
datalines;
cs101 computer_science 120 85 4.5 10 offline 01JAN2024 30APR2024
cs102 COMPUTER_SCIENCE 150 110 4.2 5 online 01JAN2024 30APR2024
cs103 Computer_Science -20 75 3.8 8 hybrid 01JAN2024 30APR2024
ec201 economics 200 88 4.0 7 offline 15JAN2024 15MAY2024
ec202 ECONOMICS 180 92 4.1 3 online 15JAN2024 15MAY2024
mg301 management 250 95 4.8 -5 offline 01FEB2024 30JUN2024
mg302 Management 240 89 . 6 hybrid 01FEB2024 30JUN2024
bt401 biotechnology 130 60 3.2 45 offline 01MAR2024 31JUL2024
bt402 BIOTECHNOLOGY 125 102 4.9 2 online 01MAR2024 31JUL2024
mt501 mathematics 90 85 4.3 5 offline 01JAN2024 30APR2024
mt501 mathematics 95 86 4.4 4 offline 01JAN2024 30APR2024
ph601 physics 0 70 3.9 30 hybrid 01APR2024 31AUG2024
;
run;
%macro a;
proc print data=university_raw;
run;
%mend a;
%a;
OUTPUT:
| Obs | Course_Code | Department | Teaching_Mode | Start_Date | End_Date | Enrollment | Pass_Rate | Avg_Feedback_Score | Dropout_Rate |
|---|---|---|---|---|---|---|---|---|---|
| 1 | cs101 | computer_science | offline | 01JAN2024 | 30APR2024 | 120 | 85 | 4.5 | 10 |
| 2 | cs102 | COMPUTER_SCIENCE | online | 01JAN2024 | 30APR2024 | 150 | 110 | 4.2 | 5 |
| 3 | cs103 | Computer_Science | hybrid | 01JAN2024 | 30APR2024 | -20 | 75 | 3.8 | 8 |
| 4 | ec201 | economics | offline | 15JAN2024 | 15MAY2024 | 200 | 88 | 4.0 | 7 |
| 5 | ec202 | ECONOMICS | online | 15JAN2024 | 15MAY2024 | 180 | 92 | 4.1 | 3 |
| 6 | mg301 | management | offline | 01FEB2024 | 30JUN2024 | 250 | 95 | 4.8 | -5 |
| 7 | mg302 | Management | hybrid | 01FEB2024 | 30JUN2024 | 240 | 89 | . | 6 |
| 8 | bt401 | biotechnology | offline | 01MAR2024 | 31JUL2024 | 130 | 60 | 3.2 | 45 |
| 9 | bt402 | BIOTECHNOLOGY | online | 01MAR2024 | 31JUL2024 | 125 | 102 | 4.9 | 2 |
| 10 | mt501 | mathematics | offline | 01JAN2024 | 30APR2024 | 90 | 85 | 4.3 | 5 |
| 11 | mt501 | mathematics | offline | 01JAN2024 | 30APR2024 | 95 | 86 | 4.4 | 4 |
| 12 | ph601 | physics | hybrid | 01APR2024 | 31AUG2024 | 0 | 70 | 3.9 | :30 |
Errors Introduced:
1. Pass_Rate
> 100 (110, 102)
2.
Negative Enrollment (-20)
3.
Negative Dropout_Rate (-5)
4.
Missing Feedback Score
5.
Duplicate Course_Code (mt501)
6.
Enrollment = 0
7.
Mixed case Department
8.
Untrimmed character values
9.
Dropout 45% suspicious
10. No
utilization classification
11. No
course duration
12. No
fraud flag
2. Standardize Character Variables
%macro b;
data university_clean1;
set university_raw;
Department = propcase(strip(Department));
Teaching_Mode = upcase(strip(Teaching_Mode));
Course_Code = upcase(strip(Course_Code));
run;
proc print data=university_clean1;
run;
%mend b;
%b;
OUTPUT:
| Obs | Course_Code | Department | Teaching_Mode | Start_Date | End_Date | Enrollment | Pass_Rate | Avg_Feedback_Score | Dropout_Rate |
|---|---|---|---|---|---|---|---|---|---|
| 1 | CS101 | Computer_science | OFFLINE | 01JAN2024 | 30APR2024 | 120 | 85 | 4.5 | 10 |
| 2 | CS102 | Computer_science | ONLINE | 01JAN2024 | 30APR2024 | 150 | 110 | 4.2 | 5 |
| 3 | CS103 | Computer_science | HYBRID | 01JAN2024 | 30APR2024 | -20 | 75 | 3.8 | 8 |
| 4 | EC201 | Economics | OFFLINE | 15JAN2024 | 15MAY2024 | 200 | 88 | 4.0 | 7 |
| 5 | EC202 | Economics | ONLINE | 15JAN2024 | 15MAY2024 | 180 | 92 | 4.1 | 3 |
| 6 | MG301 | Management | OFFLINE | 01FEB2024 | 30JUN2024 | 250 | 95 | 4.8 | -5 |
| 7 | MG302 | Management | HYBRID | 01FEB2024 | 30JUN2024 | 240 | 89 | . | 6 |
| 8 | BT401 | Biotechnology | OFFLINE | 01MAR2024 | 31JUL2024 | 130 | 60 | 3.2 | 45 |
| 9 | BT402 | Biotechnology | ONLINE | 01MAR2024 | 31JUL2024 | 125 | 102 | 4.9 | 2 |
| 10 | MT501 | Mathematics | OFFLINE | 01JAN2024 | 30APR2024 | 90 | 85 | 4.3 | 5 |
| 11 | MT501 | Mathematics | OFFLINE | 01JAN2024 | 30APR2024 | 95 | 86 | 4.4 | 4 |
| 12 | PH601 | Physics | HYBRID | 01APR2024 | 31AUG2024 | 0 | 70 | 3.9 | 30 |
·
STRIP removes spaces
·
PROPCASE standardizes department
·
UPCASE standardizes codes
3. Fix Numeric Logical Errors
%macro c;
data university_clean2;
set university_clean1;
if Pass_Rate > 100 then Pass_Rate = 100;
if Pass_Rate < 0 then Pass_Rate = .;
if Enrollment < 0 then Enrollment = abs(Enrollment);
if Dropout_Rate < 0 then Dropout_Rate = 0;
Avg_Feedback_Score = coalesce(Avg_Feedback_Score,3.5);
run;
proc print data=university_clean2;
run;
%mend c;
%c;
OUTPUT:
| Obs | Course_Code | Department | Teaching_Mode | Start_Date | End_Date | Enrollment | Pass_Rate | Avg_Feedback_Score | Dropout_Rate |
|---|---|---|---|---|---|---|---|---|---|
| 1 | CS101 | Computer_science | OFFLINE | 01JAN2024 | 30APR2024 | 120 | 85 | 4.5 | 10 |
| 2 | CS102 | Computer_science | ONLINE | 01JAN2024 | 30APR2024 | 150 | 100 | 4.2 | 5 |
| 3 | CS103 | Computer_science | HYBRID | 01JAN2024 | 30APR2024 | 20 | 75 | 3.8 | 8 |
| 4 | EC201 | Economics | OFFLINE | 15JAN2024 | 15MAY2024 | 200 | 88 | 4.0 | 7 |
| 5 | EC202 | Economics | ONLINE | 15JAN2024 | 15MAY2024 | 180 | 92 | 4.1 | 3 |
| 6 | MG301 | Management | OFFLINE | 01FEB2024 | 30JUN2024 | 250 | 95 | 4.8 | 0 |
| 7 | MG302 | Management | HYBRID | 01FEB2024 | 30JUN2024 | 240 | 89 | 3.5 | 6 |
| 8 | BT401 | Biotechnology | OFFLINE | 01MAR2024 | 31JUL2024 | 130 | 60 | 3.2 | 45 |
| 9 | BT402 | Biotechnology | ONLINE | 01MAR2024 | 31JUL2024 | 125 | 100 | 4.9 | 2 |
| 10 | MT501 | Mathematics | OFFLINE | 01JAN2024 | 30APR2024 | 90 | 85 | 4.3 | 5 |
| 11 | MT501 | Mathematics | OFFLINE | 01JAN2024 | 30APR2024 | 95 | 86 | 4.4 | 4 |
| 12 | PH601 | Physics | HYBRID | 01APR2024 | 31AUG2024 | 0 | 70 | 3.9 | 30 |
·
ABS fixes negative enrollment
·
COALESCE fills missing feedback
·
Logical validation applied
4. Remove Duplicate Course Codes
%macro d;
proc sort data=university_clean2 nodupkey out=university_clean3;
by Course_Code;
run;
proc print data=university_clean3;
run;
%mend d;
%d;
OUTPUT:
| Obs | Course_Code | Department | Teaching_Mode | Start_Date | End_Date | Enrollment | Pass_Rate | Avg_Feedback_Score | Dropout_Rate |
|---|---|---|---|---|---|---|---|---|---|
| 1 | BT401 | Biotechnology | OFFLINE | 01MAR2024 | 31JUL2024 | 130 | 60 | 3.2 | 45 |
| 2 | BT402 | Biotechnology | ONLINE | 01MAR2024 | 31JUL2024 | 125 | 100 | 4.9 | 2 |
| 3 | CS101 | Computer_science | OFFLINE | 01JAN2024 | 30APR2024 | 120 | 85 | 4.5 | 10 |
| 4 | CS102 | Computer_science | ONLINE | 01JAN2024 | 30APR2024 | 150 | 100 | 4.2 | 5 |
| 5 | CS103 | Computer_science | HYBRID | 01JAN2024 | 30APR2024 | 20 | 75 | 3.8 | 8 |
| 6 | EC201 | Economics | OFFLINE | 15JAN2024 | 15MAY2024 | 200 | 88 | 4.0 | 7 |
| 7 | EC202 | Economics | ONLINE | 15JAN2024 | 15MAY2024 | 180 | 92 | 4.1 | 3 |
| 8 | MG301 | Management | OFFLINE | 01FEB2024 | 30JUN2024 | 250 | 95 | 4.8 | 0 |
| 9 | MG302 | Management | HYBRID | 01FEB2024 | 30JUN2024 | 240 | 89 | 3.5 | 6 |
| 10 | MT501 | Mathematics | OFFLINE | 01JAN2024 | 30APR2024 | 90 | 85 | 4.3 | 5 |
| 11 | PH601 | Physics | HYBRID | 01APR2024 | 31AUG2024 | 0 | 70 | 3.9 | 30 |
5. Create Course Duration Using INTCK
%macro e;
data university_clean4;
set university_clean3;
Course_Duration_Days = intck('day', Start_Date, End_Date);
run;
proc print data=university_clean4;
run;
%mend e;
%e;
OUTPUT:
| Obs | Course_Code | Department | Teaching_Mode | Start_Date | End_Date | Enrollment | Pass_Rate | Avg_Feedback_Score | Dropout_Rate | Course_Duration_Days |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | BT401 | Biotechnology | OFFLINE | 01MAR2024 | 31JUL2024 | 130 | 60 | 3.2 | 45 | 152 |
| 2 | BT402 | Biotechnology | ONLINE | 01MAR2024 | 31JUL2024 | 125 | 100 | 4.9 | 2 | 152 |
| 3 | CS101 | Computer_science | OFFLINE | 01JAN2024 | 30APR2024 | 120 | 85 | 4.5 | 10 | 120 |
| 4 | CS102 | Computer_science | ONLINE | 01JAN2024 | 30APR2024 | 150 | 100 | 4.2 | 5 | 120 |
| 5 | CS103 | Computer_science | HYBRID | 01JAN2024 | 30APR2024 | 20 | 75 | 3.8 | 8 | 120 |
| 6 | EC201 | Economics | OFFLINE | 15JAN2024 | 15MAY2024 | 200 | 88 | 4.0 | 7 | 121 |
| 7 | EC202 | Economics | ONLINE | 15JAN2024 | 15MAY2024 | 180 | 92 | 4.1 | 3 | 121 |
| 8 | MG301 | Management | OFFLINE | 01FEB2024 | 30JUN2024 | 250 | 95 | 4.8 | 0 | 150 |
| 9 | MG302 | Management | HYBRID | 01FEB2024 | 30JUN2024 | 240 | 89 | 3.5 | 6 | 150 |
| 10 | MT501 | Mathematics | OFFLINE | 01JAN2024 | 30APR2024 | 90 | 85 | 4.3 | 5 | 120 |
| 11 | PH601 | Physics | HYBRID | 01APR2024 | 31AUG2024 | 0 | 70 | 3.9 | 30 | 152 |
6. Create Next Session Date Using INTNX
%macro f;
data university_clean5;
set university_clean4;
Next_Session = intnx('month', End_Date, 6, 'same');
format Next_Session date9.;
run;
proc print data=university_clean5;
run;
%mend f;
%f;
OUTPUT:
| Obs | Course_Code | Department | Teaching_Mode | Start_Date | End_Date | Enrollment | Pass_Rate | Avg_Feedback_Score | Dropout_Rate | Course_Duration_Days | Next_Session |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | BT401 | Biotechnology | OFFLINE | 01MAR2024 | 31JUL2024 | 130 | 60 | 3.2 | 45 | 152 | 31JAN2025 |
| 2 | BT402 | Biotechnology | ONLINE | 01MAR2024 | 31JUL2024 | 125 | 100 | 4.9 | 2 | 152 | 31JAN2025 |
| 3 | CS101 | Computer_science | OFFLINE | 01JAN2024 | 30APR2024 | 120 | 85 | 4.5 | 10 | 120 | 30OCT2024 |
| 4 | CS102 | Computer_science | ONLINE | 01JAN2024 | 30APR2024 | 150 | 100 | 4.2 | 5 | 120 | 30OCT2024 |
| 5 | CS103 | Computer_science | HYBRID | 01JAN2024 | 30APR2024 | 20 | 75 | 3.8 | 8 | 120 | 30OCT2024 |
| 6 | EC201 | Economics | OFFLINE | 15JAN2024 | 15MAY2024 | 200 | 88 | 4.0 | 7 | 121 | 15NOV2024 |
| 7 | EC202 | Economics | ONLINE | 15JAN2024 | 15MAY2024 | 180 | 92 | 4.1 | 3 | 121 | 15NOV2024 |
| 8 | MG301 | Management | OFFLINE | 01FEB2024 | 30JUN2024 | 250 | 95 | 4.8 | 0 | 150 | 30DEC2024 |
| 9 | MG302 | Management | HYBRID | 01FEB2024 | 30JUN2024 | 240 | 89 | 3.5 | 6 | 150 | 30DEC2024 |
| 10 | MT501 | Mathematics | OFFLINE | 01JAN2024 | 30APR2024 | 90 | 85 | 4.3 | 5 | 120 | 30OCT2024 |
| 11 | PH601 | Physics | HYBRID | 01APR2024 | 31AUG2024 | 0 | 70 | 3.9 | 30 | 152 | 28FEB2025 |
7. Create Utilization Percentage
%macro g;
data university_clean6;
set university_clean5;
Max_Capacity = 300;
Utilization_Percentage = (Enrollment/Max_Capacity)*100;
run;
proc print data=university_clean6;
run;
%mend g;
%g;
OUTPUT:
| Obs | Course_Code | Department | Teaching_Mode | Start_Date | End_Date | Enrollment | Pass_Rate | Avg_Feedback_Score | Dropout_Rate | Course_Duration_Days | Next_Session | Max_Capacity | Utilization_Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | BT401 | Biotechnology | OFFLINE | 01MAR2024 | 31JUL2024 | 130 | 60 | 3.2 | 45 | 152 | 31JAN2025 | 300 | 43.3333 |
| 2 | BT402 | Biotechnology | ONLINE | 01MAR2024 | 31JUL2024 | 125 | 100 | 4.9 | 2 | 152 | 31JAN2025 | 300 | 41.6667 |
| 3 | CS101 | Computer_science | OFFLINE | 01JAN2024 | 30APR2024 | 120 | 85 | 4.5 | 10 | 120 | 30OCT2024 | 300 | 40.0000 |
| 4 | CS102 | Computer_science | ONLINE | 01JAN2024 | 30APR2024 | 150 | 100 | 4.2 | 5 | 120 | 30OCT2024 | 300 | 50.0000 |
| 5 | CS103 | Computer_science | HYBRID | 01JAN2024 | 30APR2024 | 20 | 75 | 3.8 | 8 | 120 | 30OCT2024 | 300 | 6.6667 |
| 6 | EC201 | Economics | OFFLINE | 15JAN2024 | 15MAY2024 | 200 | 88 | 4.0 | 7 | 121 | 15NOV2024 | 300 | 66.6667 |
| 7 | EC202 | Economics | ONLINE | 15JAN2024 | 15MAY2024 | 180 | 92 | 4.1 | 3 | 121 | 15NOV2024 | 300 | 60.0000 |
| 8 | MG301 | Management | OFFLINE | 01FEB2024 | 30JUN2024 | 250 | 95 | 4.8 | 0 | 150 | 30DEC2024 | 300 | 83.3333 |
| 9 | MG302 | Management | HYBRID | 01FEB2024 | 30JUN2024 | 240 | 89 | 3.5 | 6 | 150 | 30DEC2024 | 300 | 80.0000 |
| 10 | MT501 | Mathematics | OFFLINE | 01JAN2024 | 30APR2024 | 90 | 85 | 4.3 | 5 | 120 | 30OCT2024 | 300 | 30.0000 |
| 11 | PH601 | Physics | HYBRID | 01APR2024 | 31AUG2024 | 0 | 70 | 3.9 | 30 | 152 | 28FEB2025 | 300 | 0.0000 |
8. Utilization Classification
%macro h;
data university_clean7;
set university_clean6;
length Utilization_Class $20;
if Utilization_Percentage >=80 then Utilization_Class="High";
else if Utilization_Percentage >=50 then Utilization_Class="Medium";
else Utilization_Class="Low";
run;
proc print data=university_clean7;
run;
%mend h;
%h;
OUTPUT:
| Obs | Course_Code | Department | Teaching_Mode | Start_Date | End_Date | Enrollment | Pass_Rate | Avg_Feedback_Score | Dropout_Rate | Course_Duration_Days | Next_Session | Max_Capacity | Utilization_Percentage | Utilization_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | BT401 | Biotechnology | OFFLINE | 01MAR2024 | 31JUL2024 | 130 | 60 | 3.2 | 45 | 152 | 31JAN2025 | 300 | 43.3333 | Low |
| 2 | BT402 | Biotechnology | ONLINE | 01MAR2024 | 31JUL2024 | 125 | 100 | 4.9 | 2 | 152 | 31JAN2025 | 300 | 41.6667 | Low |
| 3 | CS101 | Computer_science | OFFLINE | 01JAN2024 | 30APR2024 | 120 | 85 | 4.5 | 10 | 120 | 30OCT2024 | 300 | 40.0000 | Low |
| 4 | CS102 | Computer_science | ONLINE | 01JAN2024 | 30APR2024 | 150 | 100 | 4.2 | 5 | 120 | 30OCT2024 | 300 | 50.0000 | Medium |
| 5 | CS103 | Computer_science | HYBRID | 01JAN2024 | 30APR2024 | 20 | 75 | 3.8 | 8 | 120 | 30OCT2024 | 300 | 6.6667 | Low |
| 6 | EC201 | Economics | OFFLINE | 15JAN2024 | 15MAY2024 | 200 | 88 | 4.0 | 7 | 121 | 15NOV2024 | 300 | 66.6667 | Medium |
| 7 | EC202 | Economics | ONLINE | 15JAN2024 | 15MAY2024 | 180 | 92 | 4.1 | 3 | 121 | 15NOV2024 | 300 | 60.0000 | Medium |
| 8 | MG301 | Management | OFFLINE | 01FEB2024 | 30JUN2024 | 250 | 95 | 4.8 | 0 | 150 | 30DEC2024 | 300 | 83.3333 | High |
| 9 | MG302 | Management | HYBRID | 01FEB2024 | 30JUN2024 | 240 | 89 | 3.5 | 6 | 150 | 30DEC2024 | 300 | 80.0000 | High |
| 10 | MT501 | Mathematics | OFFLINE | 01JAN2024 | 30APR2024 | 90 | 85 | 4.3 | 5 | 120 | 30OCT2024 | 300 | 30.0000 | Low |
| 11 | PH601 | Physics | HYBRID | 01APR2024 | 31AUG2024 | 0 | 70 | 3.9 | 30 | 152 | 28FEB2025 | 300 | 0.0000 | Low |
9. Fraud Detection Logic
%macro i;
data university_clean8;
set university_clean7;
length Fraud_Flag $10;
if Pass_Rate=100 and Avg_Feedback_Score <3 then Fraud_Flag="Check";
else if Dropout_Rate >40 then Fraud_Flag="High Risk";
else Fraud_Flag="Normal";
run;
proc print data=university_clean8;
run;
%mend i;
%i;
OUTPUT:
| Obs | Course_Code | Department | Teaching_Mode | Start_Date | End_Date | Enrollment | Pass_Rate | Avg_Feedback_Score | Dropout_Rate | Course_Duration_Days | Next_Session | Max_Capacity | Utilization_Percentage | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | BT401 | Biotechnology | OFFLINE | 01MAR2024 | 31JUL2024 | 130 | 60 | 3.2 | 45 | 152 | 31JAN2025 | 300 | 43.3333 | Low | High Risk |
| 2 | BT402 | Biotechnology | ONLINE | 01MAR2024 | 31JUL2024 | 125 | 100 | 4.9 | 2 | 152 | 31JAN2025 | 300 | 41.6667 | Low | Normal |
| 3 | CS101 | Computer_science | OFFLINE | 01JAN2024 | 30APR2024 | 120 | 85 | 4.5 | 10 | 120 | 30OCT2024 | 300 | 40.0000 | Low | Normal |
| 4 | CS102 | Computer_science | ONLINE | 01JAN2024 | 30APR2024 | 150 | 100 | 4.2 | 5 | 120 | 30OCT2024 | 300 | 50.0000 | Medium | Normal |
| 5 | CS103 | Computer_science | HYBRID | 01JAN2024 | 30APR2024 | 20 | 75 | 3.8 | 8 | 120 | 30OCT2024 | 300 | 6.6667 | Low | Normal |
| 6 | EC201 | Economics | OFFLINE | 15JAN2024 | 15MAY2024 | 200 | 88 | 4.0 | 7 | 121 | 15NOV2024 | 300 | 66.6667 | Medium | Normal |
| 7 | EC202 | Economics | ONLINE | 15JAN2024 | 15MAY2024 | 180 | 92 | 4.1 | 3 | 121 | 15NOV2024 | 300 | 60.0000 | Medium | Normal |
| 8 | MG301 | Management | OFFLINE | 01FEB2024 | 30JUN2024 | 250 | 95 | 4.8 | 0 | 150 | 30DEC2024 | 300 | 83.3333 | High | Normal |
| 9 | MG302 | Management | HYBRID | 01FEB2024 | 30JUN2024 | 240 | 89 | 3.5 | 6 | 150 | 30DEC2024 | 300 | 80.0000 | High | Normal |
| 10 | MT501 | Mathematics | OFFLINE | 01JAN2024 | 30APR2024 | 90 | 85 | 4.3 | 5 | 120 | 30OCT2024 | 300 | 30.0000 | Low | Normal |
| 11 | PH601 | Physics | HYBRID | 01APR2024 | 31AUG2024 | 0 | 70 | 3.9 | 30 | 152 | 28FEB2025 | 300 | 0.0000 | Low | Normal |
10. Append New Courses
%macro j;
data new_courses;
length Course_Code $10 Department $30 Teaching_Mode $20;
Course_Code="CS999";
Department="Computer Science";
Enrollment=220;
Pass_Rate=90;
Avg_Feedback_Score=4.6;
Dropout_Rate=4;
Teaching_Mode="ONLINE";
Start_Date=mdy(7,1,2024);
End_Date=mdy(10,30,2024);
run;
proc print data=new_courses;
run;
proc append base=university_clean8 data=new_courses;
run;
proc print data=university_clean8;
run;
%mend j;
%j;
OUTPUT:
| Obs | Course_Code | Department | Teaching_Mode | Enrollment | Pass_Rate | Avg_Feedback_Score | Dropout_Rate | Start_Date | End_Date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | CS999 | Computer Science | ONLINE | 220 | 90 | 4.6 | 4 | 23558 | 23679 |
| Obs | Course_Code | Department | Teaching_Mode | Start_Date | End_Date | Enrollment | Pass_Rate | Avg_Feedback_Score | Dropout_Rate | Course_Duration_Days | Next_Session | Max_Capacity | Utilization_Percentage | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | BT401 | Biotechnology | OFFLINE | 01MAR2024 | 31JUL2024 | 130 | 60 | 3.2 | 45 | 152 | 31JAN2025 | 300 | 43.3333 | Low | High Risk |
| 2 | BT402 | Biotechnology | ONLINE | 01MAR2024 | 31JUL2024 | 125 | 100 | 4.9 | 2 | 152 | 31JAN2025 | 300 | 41.6667 | Low | Normal |
| 3 | CS101 | Computer_science | OFFLINE | 01JAN2024 | 30APR2024 | 120 | 85 | 4.5 | 10 | 120 | 30OCT2024 | 300 | 40.0000 | Low | Normal |
| 4 | CS102 | Computer_science | ONLINE | 01JAN2024 | 30APR2024 | 150 | 100 | 4.2 | 5 | 120 | 30OCT2024 | 300 | 50.0000 | Medium | Normal |
| 5 | CS103 | Computer_science | HYBRID | 01JAN2024 | 30APR2024 | 20 | 75 | 3.8 | 8 | 120 | 30OCT2024 | 300 | 6.6667 | Low | Normal |
| 6 | EC201 | Economics | OFFLINE | 15JAN2024 | 15MAY2024 | 200 | 88 | 4.0 | 7 | 121 | 15NOV2024 | 300 | 66.6667 | Medium | Normal |
| 7 | EC202 | Economics | ONLINE | 15JAN2024 | 15MAY2024 | 180 | 92 | 4.1 | 3 | 121 | 15NOV2024 | 300 | 60.0000 | Medium | Normal |
| 8 | MG301 | Management | OFFLINE | 01FEB2024 | 30JUN2024 | 250 | 95 | 4.8 | 0 | 150 | 30DEC2024 | 300 | 83.3333 | High | Normal |
| 9 | MG302 | Management | HYBRID | 01FEB2024 | 30JUN2024 | 240 | 89 | 3.5 | 6 | 150 | 30DEC2024 | 300 | 80.0000 | High | Normal |
| 10 | MT501 | Mathematics | OFFLINE | 01JAN2024 | 30APR2024 | 90 | 85 | 4.3 | 5 | 120 | 30OCT2024 | 300 | 30.0000 | Low | Normal |
| 11 | PH601 | Physics | HYBRID | 01APR2024 | 31AUG2024 | 0 | 70 | 3.9 | 30 | 152 | 28FEB2025 | 300 | 0.0000 | Low | Normal |
| 12 | CS999 | Computer Science | ONLINE | 01JUL2024 | 30OCT2024 | 220 | 90 | 4.6 | 4 | . | . | . | . |
11. Merge Department Budget Dataset
data dept_budget;
length Department $30;
input Department $ Budget Pass_Rate;
datalines;
Computer_Science 500000 80
Economics 300000 90
Management 400000 95
Biotechnology 350000 53
Mathematics 200000 78
Physics 250000 69
;
run;
%macro k;
proc print data=dept_budget;
run;
%mend k;
%k;
OUTPUT:
| Obs | Department | Budget | Pass_Rate |
|---|---|---|---|
| 1 | Computer_Science | 500000 | 80 |
| 2 | Economics | 300000 | 90 |
| 3 | Management | 400000 | 95 |
| 4 | Biotechnology | 350000 | 53 |
| 5 | Mathematics | 200000 | 78 |
| 6 | Physics | 250000 | 69 |
%macro l;
proc sort data=university_clean8;by Department;run;
proc print data=university_clean8;
run;
proc sort data=dept_budget;by Department;run;
proc print data=dept_budget;
run;
data university_final1;
merge university_clean8
dept_budget;
by Department;
run;
proc print data=university_final1;
run;
proc transpose data=university_final1 out=transposed_pass;
by Department NotSorted;
run;
proc print data=transposed_pass;
run;
%mend l;
%l;
OUTPUT:
| Obs | Course_Code | Department | Teaching_Mode | Start_Date | End_Date | Enrollment | Pass_Rate | Avg_Feedback_Score | Dropout_Rate | Course_Duration_Days | Next_Session | Max_Capacity | Utilization_Percentage | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | BT401 | Biotechnology | OFFLINE | 01MAR2024 | 31JUL2024 | 130 | 60 | 3.2 | 45 | 152 | 31JAN2025 | 300 | 43.3333 | Low | High Risk |
| 2 | BT402 | Biotechnology | ONLINE | 01MAR2024 | 31JUL2024 | 125 | 100 | 4.9 | 2 | 152 | 31JAN2025 | 300 | 41.6667 | Low | Normal |
| 3 | CS999 | Computer Science | ONLINE | 01JUL2024 | 30OCT2024 | 220 | 90 | 4.6 | 4 | . | . | . | . | ||
| 4 | CS101 | Computer_science | OFFLINE | 01JAN2024 | 30APR2024 | 120 | 85 | 4.5 | 10 | 120 | 30OCT2024 | 300 | 40.0000 | Low | Normal |
| 5 | CS102 | Computer_science | ONLINE | 01JAN2024 | 30APR2024 | 150 | 100 | 4.2 | 5 | 120 | 30OCT2024 | 300 | 50.0000 | Medium | Normal |
| 6 | CS103 | Computer_science | HYBRID | 01JAN2024 | 30APR2024 | 20 | 75 | 3.8 | 8 | 120 | 30OCT2024 | 300 | 6.6667 | Low | Normal |
| 7 | EC201 | Economics | OFFLINE | 15JAN2024 | 15MAY2024 | 200 | 88 | 4.0 | 7 | 121 | 15NOV2024 | 300 | 66.6667 | Medium | Normal |
| 8 | EC202 | Economics | ONLINE | 15JAN2024 | 15MAY2024 | 180 | 92 | 4.1 | 3 | 121 | 15NOV2024 | 300 | 60.0000 | Medium | Normal |
| 9 | MG301 | Management | OFFLINE | 01FEB2024 | 30JUN2024 | 250 | 95 | 4.8 | 0 | 150 | 30DEC2024 | 300 | 83.3333 | High | Normal |
| 10 | MG302 | Management | HYBRID | 01FEB2024 | 30JUN2024 | 240 | 89 | 3.5 | 6 | 150 | 30DEC2024 | 300 | 80.0000 | High | Normal |
| 11 | MT501 | Mathematics | OFFLINE | 01JAN2024 | 30APR2024 | 90 | 85 | 4.3 | 5 | 120 | 30OCT2024 | 300 | 30.0000 | Low | Normal |
| 12 | PH601 | Physics | HYBRID | 01APR2024 | 31AUG2024 | 0 | 70 | 3.9 | 30 | 152 | 28FEB2025 | 300 | 0.0000 | Low | Normal |
| Obs | Department | Budget | Pass_Rate |
|---|---|---|---|
| 1 | Biotechnology | 350000 | 53 |
| 2 | Computer_Science | 500000 | 80 |
| 3 | Economics | 300000 | 90 |
| 4 | Management | 400000 | 95 |
| 5 | Mathematics | 200000 | 78 |
| 6 | Physics | 250000 | 69 |
| Obs | Course_Code | Department | Teaching_Mode | Start_Date | End_Date | Enrollment | Pass_Rate | Avg_Feedback_Score | Dropout_Rate | Course_Duration_Days | Next_Session | Max_Capacity | Utilization_Percentage | Utilization_Class | Fraud_Flag | Budget |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | BT401 | Biotechnology | OFFLINE | 01MAR2024 | 31JUL2024 | 130 | 53 | 3.2 | 45 | 152 | 31JAN2025 | 300 | 43.3333 | Low | High Risk | 350000 |
| 2 | BT402 | Biotechnology | ONLINE | 01MAR2024 | 31JUL2024 | 125 | 100 | 4.9 | 2 | 152 | 31JAN2025 | 300 | 41.6667 | Low | Normal | 350000 |
| 3 | CS999 | Computer Science | ONLINE | 01JUL2024 | 30OCT2024 | 220 | 90 | 4.6 | 4 | . | . | . | . | . | ||
| 4 | Computer_Science | . | . | . | 80 | . | . | . | . | . | . | 500000 | ||||
| 5 | CS101 | Computer_science | OFFLINE | 01JAN2024 | 30APR2024 | 120 | 85 | 4.5 | 10 | 120 | 30OCT2024 | 300 | 40.0000 | Low | Normal | . |
| 6 | CS102 | Computer_science | ONLINE | 01JAN2024 | 30APR2024 | 150 | 100 | 4.2 | 5 | 120 | 30OCT2024 | 300 | 50.0000 | Medium | Normal | . |
| 7 | CS103 | Computer_science | HYBRID | 01JAN2024 | 30APR2024 | 20 | 75 | 3.8 | 8 | 120 | 30OCT2024 | 300 | 6.6667 | Low | Normal | . |
| 8 | EC201 | Economics | OFFLINE | 15JAN2024 | 15MAY2024 | 200 | 90 | 4.0 | 7 | 121 | 15NOV2024 | 300 | 66.6667 | Medium | Normal | 300000 |
| 9 | EC202 | Economics | ONLINE | 15JAN2024 | 15MAY2024 | 180 | 92 | 4.1 | 3 | 121 | 15NOV2024 | 300 | 60.0000 | Medium | Normal | 300000 |
| 10 | MG301 | Management | OFFLINE | 01FEB2024 | 30JUN2024 | 250 | 95 | 4.8 | 0 | 150 | 30DEC2024 | 300 | 83.3333 | High | Normal | 400000 |
| 11 | MG302 | Management | HYBRID | 01FEB2024 | 30JUN2024 | 240 | 89 | 3.5 | 6 | 150 | 30DEC2024 | 300 | 80.0000 | High | Normal | 400000 |
| 12 | MT501 | Mathematics | OFFLINE | 01JAN2024 | 30APR2024 | 90 | 78 | 4.3 | 5 | 120 | 30OCT2024 | 300 | 30.0000 | Low | Normal | 200000 |
| 13 | PH601 | Physics | HYBRID | 01APR2024 | 31AUG2024 | 0 | 69 | 3.9 | 30 | 152 | 28FEB2025 | 300 | 0.0000 | Low | Normal | 250000 |
| Obs | Department | _NAME_ | COL1 | COL2 | COL3 |
|---|---|---|---|---|---|
| 1 | Biotechnology | Start_Date | 23436.00 | 23436.00 | . |
| 2 | Biotechnology | End_Date | 23588.00 | 23588.00 | . |
| 3 | Biotechnology | Enrollment | 130.00 | 125.00 | . |
| 4 | Biotechnology | Pass_Rate | 53.00 | 100.00 | . |
| 5 | Biotechnology | Avg_Feedback_Score | 3.20 | 4.90 | . |
| 6 | Biotechnology | Dropout_Rate | 45.00 | 2.00 | . |
| 7 | Biotechnology | Course_Duration_Days | 152.00 | 152.00 | . |
| 8 | Biotechnology | Next_Session | 23772.00 | 23772.00 | . |
| 9 | Biotechnology | Max_Capacity | 300.00 | 300.00 | . |
| 10 | Biotechnology | Utilization_Percentage | 43.33 | 41.67 | . |
| 11 | Biotechnology | Budget | 350000.00 | 350000.00 | . |
| 12 | Computer Science | Start_Date | 23558.00 | . | . |
| 13 | Computer Science | End_Date | 23679.00 | . | . |
| 14 | Computer Science | Enrollment | 220.00 | . | . |
| 15 | Computer Science | Pass_Rate | 90.00 | . | . |
| 16 | Computer Science | Avg_Feedback_Score | 4.60 | . | . |
| 17 | Computer Science | Dropout_Rate | 4.00 | . | . |
| 18 | Computer Science | Course_Duration_Days | . | . | . |
| 19 | Computer Science | Next_Session | . | . | . |
| 20 | Computer Science | Max_Capacity | . | . | . |
| 21 | Computer Science | Utilization_Percentage | . | . | . |
| 22 | Computer Science | Budget | . | . | . |
| 23 | Computer_Science | Start_Date | . | . | . |
| 24 | Computer_Science | End_Date | . | . | . |
| 25 | Computer_Science | Enrollment | . | . | . |
| 26 | Computer_Science | Pass_Rate | 80.00 | . | . |
| 27 | Computer_Science | Avg_Feedback_Score | . | . | . |
| 28 | Computer_Science | Dropout_Rate | . | . | . |
| 29 | Computer_Science | Course_Duration_Days | . | . | . |
| 30 | Computer_Science | Next_Session | . | . | . |
| 31 | Computer_Science | Max_Capacity | . | . | . |
| 32 | Computer_Science | Utilization_Percentage | . | . | . |
| 33 | Computer_Science | Budget | 500000.00 | . | . |
| 34 | Computer_science | Start_Date | 23376.00 | 23376.00 | 23376.00 |
| 35 | Computer_science | End_Date | 23496.00 | 23496.00 | 23496.00 |
| 36 | Computer_science | Enrollment | 120.00 | 150.00 | 20.00 |
| 37 | Computer_science | Pass_Rate | 85.00 | 100.00 | 75.00 |
| 38 | Computer_science | Avg_Feedback_Score | 4.50 | 4.20 | 3.80 |
| 39 | Computer_science | Dropout_Rate | 10.00 | 5.00 | 8.00 |
| 40 | Computer_science | Course_Duration_Days | 120.00 | 120.00 | 120.00 |
| 41 | Computer_science | Next_Session | 23679.00 | 23679.00 | 23679.00 |
| 42 | Computer_science | Max_Capacity | 300.00 | 300.00 | 300.00 |
| 43 | Computer_science | Utilization_Percentage | 40.00 | 50.00 | 6.67 |
| 44 | Computer_science | Budget | . | . | . |
| 45 | Economics | Start_Date | 23390.00 | 23390.00 | . |
| 46 | Economics | End_Date | 23511.00 | 23511.00 | . |
| 47 | Economics | Enrollment | 200.00 | 180.00 | . |
| 48 | Economics | Pass_Rate | 90.00 | 92.00 | . |
| 49 | Economics | Avg_Feedback_Score | 4.00 | 4.10 | . |
| 50 | Economics | Dropout_Rate | 7.00 | 3.00 | . |
| 51 | Economics | Course_Duration_Days | 121.00 | 121.00 | . |
| 52 | Economics | Next_Session | 23695.00 | 23695.00 | . |
| 53 | Economics | Max_Capacity | 300.00 | 300.00 | . |
| 54 | Economics | Utilization_Percentage | 66.67 | 60.00 | . |
| 55 | Economics | Budget | 300000.00 | 300000.00 | . |
| 56 | Management | Start_Date | 23407.00 | 23407.00 | . |
| 57 | Management | End_Date | 23557.00 | 23557.00 | . |
| 58 | Management | Enrollment | 250.00 | 240.00 | . |
| 59 | Management | Pass_Rate | 95.00 | 89.00 | . |
| 60 | Management | Avg_Feedback_Score | 4.80 | 3.50 | . |
| 61 | Management | Dropout_Rate | 0.00 | 6.00 | . |
| 62 | Management | Course_Duration_Days | 150.00 | 150.00 | . |
| 63 | Management | Next_Session | 23740.00 | 23740.00 | . |
| 64 | Management | Max_Capacity | 300.00 | 300.00 | . |
| 65 | Management | Utilization_Percentage | 83.33 | 80.00 | . |
| 66 | Management | Budget | 400000.00 | 400000.00 | . |
| 67 | Mathematics | Start_Date | 23376.00 | . | . |
| 68 | Mathematics | End_Date | 23496.00 | . | . |
| 69 | Mathematics | Enrollment | 90.00 | . | . |
| 70 | Mathematics | Pass_Rate | 78.00 | . | . |
| 71 | Mathematics | Avg_Feedback_Score | 4.30 | . | . |
| 72 | Mathematics | Dropout_Rate | 5.00 | . | . |
| 73 | Mathematics | Course_Duration_Days | 120.00 | . | . |
| 74 | Mathematics | Next_Session | 23679.00 | . | . |
| 75 | Mathematics | Max_Capacity | 300.00 | . | . |
| 76 | Mathematics | Utilization_Percentage | 30.00 | . | . |
| 77 | Mathematics | Budget | 200000.00 | . | . |
| 78 | Physics | Start_Date | 23467.00 | . | . |
| 79 | Physics | End_Date | 23619.00 | . | . |
| 80 | Physics | Enrollment | 0.00 | . | . |
| 81 | Physics | Pass_Rate | 69.00 | . | . |
| 82 | Physics | Avg_Feedback_Score | 3.90 | . | . |
| 83 | Physics | Dropout_Rate | 30.00 | . | . |
| 84 | Physics | Course_Duration_Days | 152.00 | . | . |
| 85 | Physics | Next_Session | 23800.00 | . | . |
| 86 | Physics | Max_Capacity | 300.00 | . | . |
| 87 | Physics | Utilization_Percentage | 0.00 | . | . |
| 88 | Physics | Budget | 250000.00 | . | . |
12. Numeric Functions Example
%macro m;
data university_final2;
set university_final1;
Rounded_Pass = round(Pass_Rate,1);
Ceil_Enroll = ceil(Enrollment);
Floor_Enroll = floor(Enrollment);
run;
proc print data=university_final2;
run;
%mend m;
%m;
OUTPUT:
| Obs | Course_Code | Department | Teaching_Mode | Start_Date | End_Date | Enrollment | Pass_Rate | Avg_Feedback_Score | Dropout_Rate | Course_Duration_Days | Next_Session | Max_Capacity | Utilization_Percentage | Utilization_Class | Fraud_Flag | Budget | Rounded_Pass | Ceil_Enroll | Floor_Enroll |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | BT401 | Biotechnology | OFFLINE | 01MAR2024 | 31JUL2024 | 130 | 53 | 3.2 | 45 | 152 | 31JAN2025 | 300 | 43.3333 | Low | High Risk | 350000 | 53 | 130 | 130 |
| 2 | BT402 | Biotechnology | ONLINE | 01MAR2024 | 31JUL2024 | 125 | 100 | 4.9 | 2 | 152 | 31JAN2025 | 300 | 41.6667 | Low | Normal | 350000 | 100 | 125 | 125 |
| 3 | CS999 | Computer Science | ONLINE | 01JUL2024 | 30OCT2024 | 220 | 90 | 4.6 | 4 | . | . | . | . | . | 90 | 220 | 220 | ||
| 4 | Computer_Science | . | . | . | 80 | . | . | . | . | . | . | 500000 | 80 | . | . | ||||
| 5 | CS101 | Computer_science | OFFLINE | 01JAN2024 | 30APR2024 | 120 | 85 | 4.5 | 10 | 120 | 30OCT2024 | 300 | 40.0000 | Low | Normal | . | 85 | 120 | 120 |
| 6 | CS102 | Computer_science | ONLINE | 01JAN2024 | 30APR2024 | 150 | 100 | 4.2 | 5 | 120 | 30OCT2024 | 300 | 50.0000 | Medium | Normal | . | 100 | 150 | 150 |
| 7 | CS103 | Computer_science | HYBRID | 01JAN2024 | 30APR2024 | 20 | 75 | 3.8 | 8 | 120 | 30OCT2024 | 300 | 6.6667 | Low | Normal | . | 75 | 20 | 20 |
| 8 | EC201 | Economics | OFFLINE | 15JAN2024 | 15MAY2024 | 200 | 90 | 4.0 | 7 | 121 | 15NOV2024 | 300 | 66.6667 | Medium | Normal | 300000 | 90 | 200 | 200 |
| 9 | EC202 | Economics | ONLINE | 15JAN2024 | 15MAY2024 | 180 | 92 | 4.1 | 3 | 121 | 15NOV2024 | 300 | 60.0000 | Medium | Normal | 300000 | 92 | 180 | 180 |
| 10 | MG301 | Management | OFFLINE | 01FEB2024 | 30JUN2024 | 250 | 95 | 4.8 | 0 | 150 | 30DEC2024 | 300 | 83.3333 | High | Normal | 400000 | 95 | 250 | 250 |
| 11 | MG302 | Management | HYBRID | 01FEB2024 | 30JUN2024 | 240 | 89 | 3.5 | 6 | 150 | 30DEC2024 | 300 | 80.0000 | High | Normal | 400000 | 89 | 240 | 240 |
| 12 | MT501 | Mathematics | OFFLINE | 01JAN2024 | 30APR2024 | 90 | 78 | 4.3 | 5 | 120 | 30OCT2024 | 300 | 30.0000 | Low | Normal | 200000 | 78 | 90 | 90 |
| 13 | PH601 | Physics | HYBRID | 01APR2024 | 31AUG2024 | 0 | 69 | 3.9 | 30 | 152 | 28FEB2025 | 300 | 0.0000 | Low | Normal | 250000 | 69 | 0 | 0 |
13. Character Concatenation
%macro n;
data university_final3;
set university_final2;
Course_Label = catx('-',Course_Code,Department);
Mode_Label = cat(Teaching_Mode," MODE");
run;
proc print data=university_final3;
run;
%mend n;
%n;
OUTPUT:
| Obs | Course_Code | Department | Teaching_Mode | Start_Date | End_Date | Enrollment | Pass_Rate | Avg_Feedback_Score | Dropout_Rate | Course_Duration_Days | Next_Session | Max_Capacity | Utilization_Percentage | Utilization_Class | Fraud_Flag | Budget | Rounded_Pass | Ceil_Enroll | Floor_Enroll | Course_Label | Mode_Label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | BT401 | Biotechnology | OFFLINE | 01MAR2024 | 31JUL2024 | 130 | 53 | 3.2 | 45 | 152 | 31JAN2025 | 300 | 43.3333 | Low | High Risk | 350000 | 53 | 130 | 130 | BT401-Biotechnology | OFFLINE MODE |
| 2 | BT402 | Biotechnology | ONLINE | 01MAR2024 | 31JUL2024 | 125 | 100 | 4.9 | 2 | 152 | 31JAN2025 | 300 | 41.6667 | Low | Normal | 350000 | 100 | 125 | 125 | BT402-Biotechnology | ONLINE MODE |
| 3 | CS999 | Computer Science | ONLINE | 01JUL2024 | 30OCT2024 | 220 | 90 | 4.6 | 4 | . | . | . | . | . | 90 | 220 | 220 | CS999-Computer Science | ONLINE MODE | ||
| 4 | Computer_Science | . | . | . | 80 | . | . | . | . | . | . | 500000 | 80 | . | . | Computer_Science | MODE | ||||
| 5 | CS101 | Computer_science | OFFLINE | 01JAN2024 | 30APR2024 | 120 | 85 | 4.5 | 10 | 120 | 30OCT2024 | 300 | 40.0000 | Low | Normal | . | 85 | 120 | 120 | CS101-Computer_science | OFFLINE MODE |
| 6 | CS102 | Computer_science | ONLINE | 01JAN2024 | 30APR2024 | 150 | 100 | 4.2 | 5 | 120 | 30OCT2024 | 300 | 50.0000 | Medium | Normal | . | 100 | 150 | 150 | CS102-Computer_science | ONLINE MODE |
| 7 | CS103 | Computer_science | HYBRID | 01JAN2024 | 30APR2024 | 20 | 75 | 3.8 | 8 | 120 | 30OCT2024 | 300 | 6.6667 | Low | Normal | . | 75 | 20 | 20 | CS103-Computer_science | HYBRID MODE |
| 8 | EC201 | Economics | OFFLINE | 15JAN2024 | 15MAY2024 | 200 | 90 | 4.0 | 7 | 121 | 15NOV2024 | 300 | 66.6667 | Medium | Normal | 300000 | 90 | 200 | 200 | EC201-Economics | OFFLINE MODE |
| 9 | EC202 | Economics | ONLINE | 15JAN2024 | 15MAY2024 | 180 | 92 | 4.1 | 3 | 121 | 15NOV2024 | 300 | 60.0000 | Medium | Normal | 300000 | 92 | 180 | 180 | EC202-Economics | ONLINE MODE |
| 10 | MG301 | Management | OFFLINE | 01FEB2024 | 30JUN2024 | 250 | 95 | 4.8 | 0 | 150 | 30DEC2024 | 300 | 83.3333 | High | Normal | 400000 | 95 | 250 | 250 | MG301-Management | OFFLINE MODE |
| 11 | MG302 | Management | HYBRID | 01FEB2024 | 30JUN2024 | 240 | 89 | 3.5 | 6 | 150 | 30DEC2024 | 300 | 80.0000 | High | Normal | 400000 | 89 | 240 | 240 | MG302-Management | HYBRID MODE |
| 12 | MT501 | Mathematics | OFFLINE | 01JAN2024 | 30APR2024 | 90 | 78 | 4.3 | 5 | 120 | 30OCT2024 | 300 | 30.0000 | Low | Normal | 200000 | 78 | 90 | 90 | MT501-Mathematics | OFFLINE MODE |
| 13 | PH601 | Physics | HYBRID | 01APR2024 | 31AUG2024 | 0 | 69 | 3.9 | 30 | 152 | 28FEB2025 | 300 | 0.0000 | Low | Normal | 250000 | 69 | 0 | 0 | PH601-Physics | HYBRID MODE |
14. Delete Temporary Datasets
%macro o;
proc datasets library=work nolist;
delete university_raw university_clean1 university_clean2;
quit;
%mend o;
%o;
LOG:
15. Full Corrected Final Dataset Code
data university_master;
set university_final3;
if Enrollment=0 then delete;
Pass_Rate_Percent = Pass_Rate/100;
Dropout_Percent = Dropout_Rate/100;
run;
proc print data=university_master;
run;
OUTPUT:
| Obs | Course_Code | Department | Teaching_Mode | Start_Date | End_Date | Enrollment | Pass_Rate | Avg_Feedback_Score | Dropout_Rate | Course_Duration_Days | Next_Session | Max_Capacity | Utilization_Percentage | Utilization_Class | Fraud_Flag | Budget | Rounded_Pass | Ceil_Enroll | Floor_Enroll | Course_Label | Mode_Label | Pass_Rate_Percent | Dropout_Percent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | BT401 | Biotechnology | OFFLINE | 01MAR2024 | 31JUL2024 | 130 | 53 | 3.2 | 45 | 152 | 31JAN2025 | 300 | 43.3333 | Low | High Risk | 350000 | 53 | 130 | 130 | BT401-Biotechnology | OFFLINE MODE | 0.53 | 0.45 |
| 2 | BT402 | Biotechnology | ONLINE | 01MAR2024 | 31JUL2024 | 125 | 100 | 4.9 | 2 | 152 | 31JAN2025 | 300 | 41.6667 | Low | Normal | 350000 | 100 | 125 | 125 | BT402-Biotechnology | ONLINE MODE | 1.00 | 0.02 |
| 3 | CS999 | Computer Science | ONLINE | 01JUL2024 | 30OCT2024 | 220 | 90 | 4.6 | 4 | . | . | . | . | . | 90 | 220 | 220 | CS999-Computer Science | ONLINE MODE | 0.90 | 0.04 | ||
| 4 | Computer_Science | . | . | . | 80 | . | . | . | . | . | . | 500000 | 80 | . | . | Computer_Science | MODE | 0.80 | . | ||||
| 5 | CS101 | Computer_science | OFFLINE | 01JAN2024 | 30APR2024 | 120 | 85 | 4.5 | 10 | 120 | 30OCT2024 | 300 | 40.0000 | Low | Normal | . | 85 | 120 | 120 | CS101-Computer_science | OFFLINE MODE | 0.85 | 0.10 |
| 6 | CS102 | Computer_science | ONLINE | 01JAN2024 | 30APR2024 | 150 | 100 | 4.2 | 5 | 120 | 30OCT2024 | 300 | 50.0000 | Medium | Normal | . | 100 | 150 | 150 | CS102-Computer_science | ONLINE MODE | 1.00 | 0.05 |
| 7 | CS103 | Computer_science | HYBRID | 01JAN2024 | 30APR2024 | 20 | 75 | 3.8 | 8 | 120 | 30OCT2024 | 300 | 6.6667 | Low | Normal | . | 75 | 20 | 20 | CS103-Computer_science | HYBRID MODE | 0.75 | 0.08 |
| 8 | EC201 | Economics | OFFLINE | 15JAN2024 | 15MAY2024 | 200 | 90 | 4.0 | 7 | 121 | 15NOV2024 | 300 | 66.6667 | Medium | Normal | 300000 | 90 | 200 | 200 | EC201-Economics | OFFLINE MODE | 0.90 | 0.07 |
| 9 | EC202 | Economics | ONLINE | 15JAN2024 | 15MAY2024 | 180 | 92 | 4.1 | 3 | 121 | 15NOV2024 | 300 | 60.0000 | Medium | Normal | 300000 | 92 | 180 | 180 | EC202-Economics | ONLINE MODE | 0.92 | 0.03 |
| 10 | MG301 | Management | OFFLINE | 01FEB2024 | 30JUN2024 | 250 | 95 | 4.8 | 0 | 150 | 30DEC2024 | 300 | 83.3333 | High | Normal | 400000 | 95 | 250 | 250 | MG301-Management | OFFLINE MODE | 0.95 | 0.00 |
| 11 | MG302 | Management | HYBRID | 01FEB2024 | 30JUN2024 | 240 | 89 | 3.5 | 6 | 150 | 30DEC2024 | 300 | 80.0000 | High | Normal | 400000 | 89 | 240 | 240 | MG302-Management | HYBRID MODE | 0.89 | 0.06 |
| 12 | MT501 | Mathematics | OFFLINE | 01JAN2024 | 30APR2024 | 90 | 78 | 4.3 | 5 | 120 | 30OCT2024 | 300 | 30.0000 | Low | Normal | 200000 | 78 | 90 | 90 | MT501-Mathematics | OFFLINE MODE | 0.78 | 0.05 |
Explanation:
Created Intentional Errors
·
Universities manipulate pass rate
·
Enrollment data may be wrongly entered
·
Dropout rate may be negative due to entry
mistake
·
Feedback may be missing
·
Duplicate course codes may exist
Why Use STRIP?
Removes leading and trailing blanks.
Important when joining datasets.
Why Use PROPCASE?
Standardizes department name format.
"computer science" → "Computer Science"
Why Use UPCASE?
Standardizes Course_Code.
Ensures no duplicate due to case mismatch.
Why Use COALESCE?
If feedback score missing, assign default 3.5.
Why Use INTCK?
Calculates difference between two dates.
Used to find course duration.
Why Use INTNX?
Find next session date automatically.
Why Use MDY?
Creates date using numeric values.
MDY(7,1,2024) = 01JUL2024
Why Use APPEND?
Adds new courses without rewriting full
dataset.
Why Use MERGE?
Combine budget dataset with course dataset.
Why Use TRANSPOSE?
Convert rows to columns.
Used in reporting.
Why Use PROC DATASETS DELETE?
Clean temporary datasets.
Improves memory efficiency.
Fraud Detection Logic Explanation
Fraud patterns:
1.
Pass rate 100% but low feedback → suspicious
2.
Dropout > 40% → high risk
3.
Enrollment 0 → invalid
4.
Negative values → data entry fraud
Business Impact
After cleaning:
·
Accurate utilization reporting
·
Reliable academic analytics
·
Fraud risk identification
·
Capacity planning improved
·
Budget allocation justified
Conclusion
We successfully built a University Course
Analytics and Fraud Detection System in SAS using only macros from A to O.
·
Created dataset with 12+ observations
·
Inserted intentional errors
·
Corrected logical numeric errors
·
Standardized character fields
·
Calculated percentages
·
Applied utilization classification
·
Used MDY, INTCK, INTNX
·
Applied APPEND, MERGE, SET
·
Used numeric and character functions
·
Applied fraud detection logic
·
Deleted temporary datasets
·
Delivered a fully cleaned master dataset
SAS INTERVIEW QUESTIONS
· How do you handle missing values?
· What is reconciliation?
· What is log checking?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent UNIVERSITY data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics
· Clinical SAS Programmer
· Research Data Analyst
· Regulatory Data Validator
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment