417.Can We Design, Debug, Detect Fraud, and Optimize Large Public Event Management Using Advanced SAS Programming Techniques?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP| SET | IF-THEN-ELSE | LENGTH | FORMAT | INTCK | INTNX | MDY | MERGE | PROC SORT | PROC SQL | PROC APPEND | PROC TRANSPOSE | PROC REPORT | PROC FORMAT | MACRO (%MACRO / %MEND) | HASH OBJECT | PROC DATASETS DELETE | RUN | QUIT
Introduction
Managing large public events such as concerts,
expos, political rallies, and sports finals requires advanced operational
monitoring. Crowd control, security risk detection, staff deployment
optimization, and fraud surveillance are critical components.
In this integrated SAS project, we designed a
complete Public Event Risk Surveillance
System that:
·
Creates structured datasets
·
Introduces intentional errors
·
Performs debugging and data correction
·
Applies fraud detection logic
·
Derives utilization metrics
·
Implements rolling risk monitoring
·
Uses macros, SQL, hash objects, and reporting
procedures
·
Builds audit and alert mechanisms
This project simulates a real-world operational
intelligence system similar to safety monitoring frameworks used in regulated
industries.
Table Of Contents
1.
Business Context
2.
Raw Dataset Creation with Intentional Errors
3.
Error Identification and Debugging Strategy
4.
Data Cleaning and Standardization
5.
Character and Numeric Function Applications
6.
Date Intelligence (MDY, INTCK, INTNX, DATEPART)
7.
Utilization and Risk Index Derivation
8.
Fraud Detection Macro Logic
9.
Revenue Per Capita Analysis
10. Duplicate
Detection and Data Integrity Checks
11. Dataset
Integration Using SET, MERGE, APPEND
12. Hash
Object Lookup Optimization
13. Rolling
30-Day Risk Monitoring
14. Peak
Hour Crowd Analysis
15. Risk
Band Classification Using PROC FORMAT
16. SQL-Based
Fraud Cross Validation
17. Weighted
Risk Model Enhancement
18. Growth
Trend Analysis Using LAG( )
19. PROC
TRANSPOSE for Structural Reshaping
20. Monthly
Risk Aggregation Macro
21. Dashboard
Reporting via PROC REPORT
22. Automated
Alert Dataset Creation
23. Audit
Trail Generation
24. QC
and Validation Framework
25. WORK Library Cleanup Using PROC DATASETS
Business Context
Imagine a
government intelligence department monitoring large public events across India
to:
- Detect abnormal crowd surges
- Prevent ticket fraud
- Monitor security threats
- Optimize staff deployment
- Predict crowd risk levels
Variables
included:
|
Variable |
Description |
|
Event_Name |
Name of
public event |
|
Location |
City |
|
Attendees |
Total
attendance |
|
Entry_Time |
Entry
timestamp |
|
Security_Incidents |
Reported
incidents |
|
Staff_Deployed |
Number
of staff |
|
Crowd_Risk_Index |
Calculated
risk % |
|
Ticket_Revenue |
Revenue
collected |
|
Fraud_Flag |
Fraud
indicator |
1. Raw Dataset Creation (With Intentional Errors)
data public_events_raw;
length Event_Name $50 Location $30;
format Entry_Time datetime20.;
input Event_Name $ Location $ Attendees Entry_Time :datetime20.
Security_Incidents Staff_Deployed Crowd_Risk_Index Ticket_Revenue;
datalines;
MusicFest Hyderabad 50000 01JAN2026:10:00:00 5 200 85 2500000
TechExpo Bangalore 30000 05FEB2026:09:30:00 2 150 70 1800000
CricketFinal Mumbai 120000 10MAR2026:14:00:00 15 500 95 8000000
PoliticalRally Delhi 200000 20APR2026:11:00:00 25 300 110 10000000
FoodCarnival Chennai -5000 15MAY2026:16:00:00 1 100 40 500000
ArtExpo Pune 15000 25JUN2026:10:30:00 . 80 55 600000
;
run;
proc print data=public_events_raw;
run;
OUTPUT:
| Obs | Event_Name | Location | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue |
|---|---|---|---|---|---|---|---|---|
| 1 | MusicFest | Hyderabad | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 85 | 2500000 |
| 2 | TechExpo | Bangalore | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 70 | 1800000 |
| 3 | CricketFinal | Mumbai | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 95 | 8000000 |
| 4 | PoliticalRally | Delhi | 20APR2026:11:00:00 | 200000 | 25 | 300 | 110 | 10000000 |
| 5 | FoodCarnival | Chennai | 15MAY2026:16:00:00 | -5000 | 1 | 100 | 40 | 500000 |
| 6 | ArtExpo | Pune | 25JUN2026:10:30:00 | 15000 | . | 80 | 55 | 600000 |
Intentional Errors
· Crowd_Risk_Index = 110 (Invalid >100)
·
Negative Attendees = -5000
·
Missing Security_Incidents
·
No Fraud_Flag
·
No Utilization classification
·
No date derivation
·
No cleaning of character strings
2. Corrected Full-Length SAS Code
data public_events_clean;
length Event_Name $50 Location $30 Fraud_Flag $3 Utilization_Class $10;
set public_events_raw;
Event_Name = propcase(strip(Event_Name));
Location = upcase(strip(Location));
if Attendees < 0 then Attendees = abs(Attendees);
Crowd_Risk_Index = coalesce(Crowd_Risk_Index, 0);
if Crowd_Risk_Index > 100 then Crowd_Risk_Index = 100;
Event_Date = datepart(Entry_Time);
Event_Month = intnx('month', Event_Date, 0, 'b');
Days_Since_Event = intck('day', Event_Date, today());
Staff_Utilization = (Attendees / Staff_Deployed);
if Staff_Utilization > 400 then Utilization_Class="High";
else if Staff_Utilization >200 then Utilization_Class="Medium";
else Utilization_Class="Low";
Crowd_Risk_Index = round((Security_Incidents*5 + Staff_Utilization/10),0.1);
format Event_Date date9. Event_Month monyy7.;
run;
proc print data=public_events_clean;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Musicfest | HYDERABAD | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | |
| 2 | Techexpo | BANGALORE | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 | |
| 3 | Cricketfinal | MUMBAI | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 | |
| 4 | Politicalrally | DELHI | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 | |
| 5 | Foodcarnival | CHENNAI | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 | |
| 6 | Artexpo | PUNE | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 |
·
Removes older versions
·
Prevents overwrite confusion
·
Maintains clean WORK library
3. Macro for Fraud Detection
%macro detect_fraud(input=,output=);
data &output.;
set &input.;
length Fraud_Flag $3;
if Crowd_Risk_Index > 90 or Security_Incidents > 20 then Fraud_Flag="YES";
else Fraud_Flag="NO";
run;
proc print data=&output.;
run;
%mend;
%detect_fraud(input=public_events_clean,output=public_events_final);
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 |
| 2 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 |
| 3 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 |
| 4 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 |
| 5 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 |
| 6 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 |
4. Append Example
data new_events;
set public_events_final;
where Location="HYDERABAD";
run;
proc print data=new_events;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250 |
proc append base=public_events_final
data=new_events force;
run;
proc print data=public_events_final;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 |
| 2 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 |
| 3 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 |
| 4 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 |
| 5 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 |
| 6 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 |
| 7 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 |
5. Merge Example
data staff_bonus;
input Location:$30. Bonus;
datalines;
HYDERABAD 50000
MUMBAI 100000
;
run;
proc print data=staff_bonus;
run;
OUTPUT:
| Obs | Location | Bonus |
|---|---|---|
| 1 | HYDERABAD | 50000 |
| 2 | MUMBAI | 100000 |
proc sort data=public_events_final; by Location; run;
proc print data=public_events_final;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 |
| 2 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 |
| 3 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 |
| 4 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 |
| 5 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 |
| 6 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 |
| 7 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 |
proc sort data=staff_bonus; by Location; run;
proc print data=staff_bonus;
run;
OUTPUT:
| Obs | Location | Bonus |
|---|---|---|
| 1 | HYDERABAD | 50000 |
| 2 | MUMBAI | 100000 |
data merged_data;
merge public_events_final staff_bonus;
by Location;
run;
proc print data=merged_data;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization | Bonus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 | . |
| 2 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 | . |
| 3 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 | . |
| 4 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | 50000 |
| 5 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | 50000 |
| 6 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 | 100000 |
| 7 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 | . |
6. Transpose Example
proc sort data=merged_data; by Location; run;
proc print data=merged_data;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization | Bonus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 | . |
| 2 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 | . |
| 3 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 | . |
| 4 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | 50000 |
| 5 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | 50000 |
| 6 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 | 100000 |
| 7 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 | . |
proc transpose data=merged_data out=transposed_data;
by Location;
var Attendees Staff_Deployed Security_Incidents;
run;
proc print data=transposed_data;
run;
OUTPUT:
| Obs | Location | _NAME_ | COL1 | COL2 |
|---|---|---|---|---|
| 1 | BANGALORE | Attendees | 30000 | . |
| 2 | BANGALORE | Staff_Deployed | 150 | . |
| 3 | BANGALORE | Security_Incidents | 2 | . |
| 4 | CHENNAI | Attendees | 5000 | . |
| 5 | CHENNAI | Staff_Deployed | 100 | . |
| 6 | CHENNAI | Security_Incidents | 1 | . |
| 7 | DELHI | Attendees | 200000 | . |
| 8 | DELHI | Staff_Deployed | 300 | . |
| 9 | DELHI | Security_Incidents | 25 | . |
| 10 | HYDERABAD | Attendees | 50000 | 50000 |
| 11 | HYDERABAD | Staff_Deployed | 200 | 200 |
| 12 | HYDERABAD | Security_Incidents | 5 | 5 |
| 13 | MUMBAI | Attendees | 120000 | . |
| 14 | MUMBAI | Staff_Deployed | 500 | . |
| 15 | MUMBAI | Security_Incidents | 15 | . |
| 16 | PUNE | Attendees | 15000 | . |
| 17 | PUNE | Staff_Deployed | 80 | . |
| 18 | PUNE | Security_Incidents | . | . |
7. Datasets Delete/Unwanted Datasets
proc datasets library=work nolist;
delete public_events_clean fraud_summary transposed_data;
quit;
LOG:
8. PROC FORMAT – Risk Band Classification
proc format;
value riskband
low-50 = "Low Risk"
50<-80 = "Moderate Risk"
80<-high = "High Risk";
run;
LOG:
·
Standardizes classification logic
·
Avoids hardcoding IF-THEN repeatedly
·
Regulatory-style mapping (like CDISC controlled terminology)
9. Apply Risk Format
data public_events_enhanced;
set public_events_final;
Risk_Band = put(Crowd_Risk_Index, riskband.);
run;
proc print data=public_events_enhanced;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization | Risk_Band |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 | Low Risk |
| 2 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 | Low Risk |
| 3 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 | High Risk |
| 4 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | Low Risk |
| 5 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | Low Risk |
| 6 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 | High Risk |
| 7 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 | . |
·
Converts numeric risk to descriptive category
·
Improves interpretability
10. Duplicate Detection Logic
proc sort data=public_events_enhanced out=dedup_check nodupkey dupout=duplicates;
by Event_Name Event_Date Location;
run;
proc print data=dedup_check;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization | Risk_Band |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 | . |
| 2 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 | High Risk |
| 3 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 | Low Risk |
| 4 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | Low Risk |
| 5 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 | High Risk |
| 6 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 | Low Risk |
proc print data=duplicates;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization | Risk_Band |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250 | Low Risk |
·
Prevents duplicate operational entries
·
dupout= captures duplicate dataset for audit
11. Revenue Per Capita Derivation
data revenue_analysis;
set dedup_check;
Revenue_Per_Capita = Ticket_Revenue / Attendees;
run;
proc print data=revenue_analysis;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization | Risk_Band | Revenue_Per_Capita |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 | . | 40.000 |
| 2 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 | High Risk | 66.667 |
| 3 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 | Low Risk | 100.000 |
| 4 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | Low Risk | 50.000 |
| 5 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 | High Risk | 50.000 |
| 6 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 | Low Risk | 60.000 |
·
Detects ticket manipulation fraud
·
Low revenue per head = potential illegal
entries
12. Peak Hour Detection
data peak_analysis;
set revenue_analysis;
Entry_Hour = hour(Entry_Time);
if 15 < Entry_Hour < 21 then Peak_Flag="YES";
else Peak_Flag="NO";
run;
proc print data=peak_analysis;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization | Risk_Band | Revenue_Per_Capita | Entry_Hour | Peak_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 | . | 40.000 | 10 | NO |
| 2 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 | High Risk | 66.667 | 14 | NO |
| 3 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 | Low Risk | 100.000 | 16 | YES |
| 4 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | Low Risk | 50.000 | 10 | NO |
| 5 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 | High Risk | 50.000 | 11 | NO |
| 6 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 | Low Risk | 60.000 | 9 | NO |
·
Identifies rush-hour crowd risk
·
Useful for deployment strategy
13. Rolling 30-Day Risk Monitoring
proc sort data=peak_analysis;
by Location Event_Date;
run;
proc print data=peak_analysis;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization | Risk_Band | Revenue_Per_Capita | Entry_Hour | Peak_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 | Low Risk | 60.000 | 9 | NO |
| 2 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 | Low Risk | 100.000 | 16 | YES |
| 3 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 | High Risk | 50.000 | 11 | NO |
| 4 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | Low Risk | 50.000 | 10 | NO |
| 5 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 | High Risk | 66.667 | 14 | NO |
| 6 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 | . | 40.000 | 10 | NO |
data rolling_risk;
set peak_analysis;
by Location;
retain Prev_Date;
Days_Gap = intck('day', Prev_Date, Event_Date);
if Days_Gap <= 30 then Rolling_Risk_Flag="YES";
else Rolling_Risk_Flag="NO";
Prev_Date = Event_Date;
run;
proc print data=rolling_risk;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization | Risk_Band | Revenue_Per_Capita | Entry_Hour | Peak_Flag | Prev_Date | Days_Gap | Rolling_Risk_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 | Low Risk | 60.000 | 9 | NO | 24142 | . | YES |
| 2 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 | Low Risk | 100.000 | 16 | YES | 24241 | 99 | NO |
| 3 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 | High Risk | 50.000 | 11 | NO | 24216 | -25 | YES |
| 4 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | Low Risk | 50.000 | 10 | NO | 24107 | -109 | YES |
| 5 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 | High Risk | 66.667 | 14 | NO | 24175 | 68 | NO |
| 6 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 | . | 40.000 | 10 | NO | 24282 | 107 | NO |
·
Detects frequent risky events in short
duration
·
Prevents repeated security lapses
14. HASH Object Lookup
data bonus_lookup;
if _N_=1 then do;
declare hash h(dataset:"staff_bonus");
h.defineKey("Location");
h.defineData("Bonus");
h.defineDone();
end;
set rolling_risk;
if h.find()=0 then Bonus_Assigned=Bonus;
else Bonus_Assigned=0;
run;
proc print data=bonus_lookup;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization | Risk_Band | Revenue_Per_Capita | Entry_Hour | Peak_Flag | Prev_Date | Days_Gap | Rolling_Risk_Flag | Bonus_Assigned | Bonus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 | Low Risk | 60.000 | 9 | NO | 24142 | . | YES | 0 | . |
| 2 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 | Low Risk | 100.000 | 16 | YES | 24241 | 99 | NO | 0 | . |
| 3 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 | High Risk | 50.000 | 11 | NO | 24216 | -25 | YES | 0 | . |
| 4 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | Low Risk | 50.000 | 10 | NO | 24107 | -109 | YES | 50000 | 50000 |
| 5 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 | High Risk | 66.667 | 14 | NO | 24175 | 68 | NO | 100000 | 100000 |
| 6 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 | . | 40.000 | 10 | NO | 24282 | 107 | NO | 0 | . |
·
Faster than MERGE
·
Efficient in large datasets
·
Real-world production optimization
15. Monthly Risk Summary Macro
%macro monthly_summary;
proc sql;
create table monthly_risk_summary as
select Location,
month(Event_Date) as Month,
mean(Crowd_Risk_Index) as Avg_Risk,
sum(Security_Incidents) as Total_Incidents
from bonus_lookup
group by Location, calculated Month;
quit;
proc print data=monthly_risk_summary;
run;
%mend;
%monthly_summary;
OUTPUT:
| Obs | Location | Month | Avg_Risk | Total_Incidents |
|---|---|---|---|---|
| 1 | BANGALORE | 2 | 30.0 | 2 |
| 2 | CHENNAI | 5 | 10.0 | 1 |
| 3 | DELHI | 4 | 191.7 | 25 |
| 4 | HYDERABAD | 1 | 50.0 | 5 |
| 5 | MUMBAI | 3 | 99.0 | 15 |
| 6 | PUNE | 6 | . | . |
·
Automated aggregation
·
Reusable reporting logic
16. PROC REPORT
proc report data=monthly_risk_summary nowd;
column Location Month Avg_Risk Total_Incidents;
run;
OUTPUT:
| Location | Month | Avg_Risk | Total_Incidents |
|---|---|---|---|
| BANGALORE | 2 | 30 | 2 |
| CHENNAI | 5 | 10 | 1 |
| DELHI | 4 | 191.7 | 25 |
| HYDERABAD | 1 | 50 | 5 |
| MUMBAI | 3 | 99 | 15 |
| PUNE | 6 | . | . |
·
Management reporting
·
Structured tabular output
17. PROC SQL Fraud Cross Validation
proc sql;
create table fraud_cross_check as
select *,
case
when Revenue_Per_Capita < 20 and Crowd_Risk_Index > 85 then "Revenue Mismatch"
when Security_Incidents > 20 then "High Security Threat"
else "Normal"
end as Fraud_Reason
from bonus_lookup;
quit;
proc print data=fraud_cross_check;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization | Risk_Band | Revenue_Per_Capita | Entry_Hour | Peak_Flag | Prev_Date | Days_Gap | Rolling_Risk_Flag | Bonus_Assigned | Bonus | Fraud_Reason |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 | Low Risk | 60.000 | 9 | NO | 24142 | . | YES | 0 | . | Normal |
| 2 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 | Low Risk | 100.000 | 16 | YES | 24241 | 99 | NO | 0 | . | Normal |
| 3 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 | High Risk | 50.000 | 11 | NO | 24216 | -25 | YES | 0 | . | High Security Threat |
| 4 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | Low Risk | 50.000 | 10 | NO | 24107 | -109 | YES | 50000 | 50000 | Normal |
| 5 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 | High Risk | 66.667 | 14 | NO | 24175 | 68 | NO | 100000 | 100000 | Normal |
| 6 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 | . | 40.000 | 10 | NO | 24282 | 107 | NO | 0 | . | Normal |
·
Alternative validation method
·
SQL-style case logic
18. Integrity Check Dataset
data integrity_check;
set fraud_cross_check;
length Issue $20.;
if missing(Attendees) then Issue="Missing Attendees";
else if Attendees=0 then Issue="Zero Attendance";
else if Staff_Deployed=0 then Issue="No Staff Assigned";
run;
proc print data=integrity_check;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization | Risk_Band | Revenue_Per_Capita | Entry_Hour | Peak_Flag | Prev_Date | Days_Gap | Rolling_Risk_Flag | Bonus_Assigned | Bonus | Fraud_Reason | Issue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 | Low Risk | 60.000 | 9 | NO | 24142 | . | YES | 0 | . | Normal | |
| 2 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 | Low Risk | 100.000 | 16 | YES | 24241 | 99 | NO | 0 | . | Normal | |
| 3 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 | High Risk | 50.000 | 11 | NO | 24216 | -25 | YES | 0 | . | High Security Threat | |
| 4 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | Low Risk | 50.000 | 10 | NO | 24107 | -109 | YES | 50000 | 50000 | Normal | |
| 5 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 | High Risk | 66.667 | 14 | NO | 24175 | 68 | NO | 100000 | 100000 | Normal | |
| 6 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 | . | 40.000 | 10 | NO | 24282 | 107 | NO | 0 | . | Normal |
19. Automated Alert Dataset
data alert_dataset;
set fraud_cross_check;
if Fraud_Flag="YES" or Risk_Band="High Risk";
run;
proc print data=alert_dataset;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization | Risk_Band | Revenue_Per_Capita | Entry_Hour | Peak_Flag | Prev_Date | Days_Gap | Rolling_Risk_Flag | Bonus_Assigned | Bonus | Fraud_Reason |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 | High Risk | 50.0000 | 11 | NO | 24216 | -25 | YES | 0 | . | High Security Threat |
| 2 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 | High Risk | 66.6667 | 14 | NO | 24175 | 68 | NO | 100000 | 100000 | Normal |
20. Recalculated Weighted Risk Model
data weighted_model;
set fraud_cross_check;
Weighted_Risk =
(Security_Incidents*0.4)+ (Staff_Utilization*0.3)
+ (Revenue_Per_Capita*0.3);
run;
proc print data=weighted_model;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization | Risk_Band | Revenue_Per_Capita | Entry_Hour | Peak_Flag | Prev_Date | Days_Gap | Rolling_Risk_Flag | Bonus_Assigned | Bonus | Fraud_Reason | Weighted_Risk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 | Low Risk | 60.000 | 9 | NO | 24142 | . | YES | 0 | . | Normal | 78.8 |
| 2 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 | Low Risk | 100.000 | 16 | YES | 24241 | 99 | NO | 0 | . | Normal | 45.4 |
| 3 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 | High Risk | 50.000 | 11 | NO | 24216 | -25 | YES | 0 | . | High Security Threat | 225.0 |
| 4 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | Low Risk | 50.000 | 10 | NO | 24107 | -109 | YES | 50000 | 50000 | Normal | 92.0 |
| 5 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 | High Risk | 66.667 | 14 | NO | 24175 | 68 | NO | 100000 | 100000 | Normal | 98.0 |
| 6 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 | . | 40.000 | 10 | NO | 24282 | 107 | NO | 0 | . | Normal | . |
21. Event Growth Trend Using LAG( )
data growth_trend;
set weighted_model;
by Location;
Prev_Attendees = lag(Attendees);
Growth = Attendees - Prev_Attendees;
run;
proc print data=growth_trend;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization | Risk_Band | Revenue_Per_Capita | Entry_Hour | Peak_Flag | Prev_Date | Days_Gap | Rolling_Risk_Flag | Bonus_Assigned | Bonus | Fraud_Reason | Weighted_Risk | Prev_Attendees | Growth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 | Low Risk | 60.000 | 9 | NO | 24142 | . | YES | 0 | . | Normal | 78.8 | . | . |
| 2 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 | Low Risk | 100.000 | 16 | YES | 24241 | 99 | NO | 0 | . | Normal | 45.4 | 30000 | -25000 |
| 3 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 | High Risk | 50.000 | 11 | NO | 24216 | -25 | YES | 0 | . | High Security Threat | 225.0 | 5000 | 195000 |
| 4 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | Low Risk | 50.000 | 10 | NO | 24107 | -109 | YES | 50000 | 50000 | Normal | 92.0 | 200000 | -150000 |
| 5 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 | High Risk | 66.667 | 14 | NO | 24175 | 68 | NO | 100000 | 100000 | Normal | 98.0 | 50000 | 70000 |
| 6 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 | . | 40.000 | 10 | NO | 24282 | 107 | NO | 0 | . | Normal | . | 120000 | -105000 |
22. Final Audit Trail Dataset
data audit_trail;
set growth_trend;
Audit_Timestamp = datetime();
format Audit_Timestamp datetime20.;
run;
proc print data=audit_trail;
run;
OUTPUT:
| Obs | Event_Name | Location | Fraud_Flag | Utilization_Class | Entry_Time | Attendees | Security_Incidents | Staff_Deployed | Crowd_Risk_Index | Ticket_Revenue | Event_Date | Event_Month | Days_Since_Event | Staff_Utilization | Risk_Band | Revenue_Per_Capita | Entry_Hour | Peak_Flag | Prev_Date | Days_Gap | Rolling_Risk_Flag | Bonus_Assigned | Bonus | Fraud_Reason | Weighted_Risk | Prev_Attendees | Growth | Audit_Timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Techexpo | BANGALORE | NO | Low | 05FEB2026:09:30:00 | 30000 | 2 | 150 | 30.0 | 1800000 | 05FEB2026 | FEB2026 | 25 | 200.000 | Low Risk | 60.000 | 9 | NO | 24142 | . | YES | 0 | . | Normal | 78.8 | . | . | 02MAR2026:20:03:28 |
| 2 | Foodcarnival | CHENNAI | NO | Low | 15MAY2026:16:00:00 | 5000 | 1 | 100 | 10.0 | 500000 | 15MAY2026 | MAY2026 | -74 | 50.000 | Low Risk | 100.000 | 16 | YES | 24241 | 99 | NO | 0 | . | Normal | 45.4 | 30000 | -25000 | 02MAR2026:20:03:28 |
| 3 | Politicalrally | DELHI | YES | High | 20APR2026:11:00:00 | 200000 | 25 | 300 | 191.7 | 10000000 | 20APR2026 | APR2026 | -49 | 666.667 | High Risk | 50.000 | 11 | NO | 24216 | -25 | YES | 0 | . | High Security Threat | 225.0 | 5000 | 195000 | 02MAR2026:20:03:28 |
| 4 | Musicfest | HYDERABAD | NO | Medium | 01JAN2026:10:00:00 | 50000 | 5 | 200 | 50.0 | 2500000 | 01JAN2026 | JAN2026 | 60 | 250.000 | Low Risk | 50.000 | 10 | NO | 24107 | -109 | YES | 50000 | 50000 | Normal | 92.0 | 200000 | -150000 | 02MAR2026:20:03:28 |
| 5 | Cricketfinal | MUMBAI | YES | Medium | 10MAR2026:14:00:00 | 120000 | 15 | 500 | 99.0 | 8000000 | 10MAR2026 | MAR2026 | -8 | 240.000 | High Risk | 66.667 | 14 | NO | 24175 | 68 | NO | 100000 | 100000 | Normal | 98.0 | 50000 | 70000 | 02MAR2026:20:03:28 |
| 6 | Artexpo | PUNE | NO | Low | 25JUN2026:10:30:00 | 15000 | . | 80 | . | 600000 | 25JUN2026 | JUN2026 | -115 | 187.500 | . | 40.000 | 10 | NO | 24282 | 107 | NO | 0 | . | Normal | . | 120000 | -105000 | 02MAR2026:20:03:28 |
Numeric & Character Functions
·
ABS()
·
ROUND()
·
COALESCE()
·
STRIP()
·
TRIM()
·
CAT()
·
CATX()
·
PROPCASE()
·
UPCASE()
·
LOWCASE()
Fraud Logic
Fraud if:
·
Risk Index > 90
OR
·
Security Incidents > 20
Explanation
DATA Step Philosophy
DATA step reads row by row. Similar to
clinical ADaM derivation.
SET Statement
Used to read existing dataset.
LENGTH Statement
Defines character variable storage.
PROPCASE
Standardizes Event_Name (Musicfest →
Musicfest)
STRIP
Removes trailing blanks.
UPCASE
Ensures Location consistency for merges.
ABS
Corrects negative attendance error.
COALESCE
Handles missing numeric values.
INTNX
Derives beginning of month.
INTCK
Calculates duration difference.
DATEPART
Extracts date from datetime.
ROUND
Controls decimal precision.
MACRO
Reusable logic block.
APPEND
Adds new records.
MERGE
Combines datasets by key.
TRANSPOSE
Converts rows to columns.
PROC DATASETS DELETE
Deletes datasets efficiently.
Common Errors Explained and Debugged
|
Error |
Cause |
Fix |
|
Negative Attendees |
Data entry error |
ABS() |
|
Risk > 100 |
Invalid scale |
Cap at 100 |
|
Missing Incidents |
Null value |
COALESCE |
|
Merge mismatch |
Case sensitivity |
UPCASE |
|
Overwritten dataset |
No delete |
PROC DATASETS |
Quality Control Checks
·
Frequency check for Fraud_Flag
·
Mean utilization
·
Check duplicates
·
Missing value check
· Risk distribution
10 Key Points About This Project
1.
The project simulates a real-time public event
monitoring system for large-scale crowd management.
2.
Intentional data errors such as negative attendance,
missing incidents, and risk values above 100 were introduced and systematically
corrected.
3.
Character standardization was implemented using STRIP,
PROPCASE, UPCASE, LOWCASE, CAT, and CATX functions to ensure merge compatibility.
4.
Date functions like MDY, INTCK, INTNX, and DATEPART
were used for event date derivations and rolling surveillance calculations.
5.
Staff utilization was calculated to determine
operational efficiency and classify events into Low, Medium, or High utilization
categories.
6.
A reusable fraud detection macro flagged events based
on high risk index and abnormal security incidents.
7.
Revenue Per Capita analysis helped identify ticket
fraud or illegal crowd entries.
8.
Rolling 30-day monitoring logic identified repeated
high-risk events in short time windows.
9.
SQL cross-validation and hash lookup techniques
improved validation speed and integrity.
10. The
system generated alert datasets and audit trails, ensuring traceability and
compliance readiness.
Summary
This project demonstrates how SAS can be used
to build a comprehensive large public event monitoring and fraud detection
system. Starting from raw data with intentional errors, we applied structured
debugging, cleaning, derivation logic, and validation checks. Advanced SAS
techniques including macros, SQL, hash objects, rolling date logic, and
classification models were integrated to create a scalable surveillance
framework. Operational metrics such as staff utilization, revenue per capita,
and crowd risk index were derived to support decision-making. The project also
incorporated duplicate detection, integrity checks, and audit trails to ensure
governance and compliance. Overall, it reflects a real-world operational
intelligence system suitable for government, security, or enterprise risk
management applications.
Conclusion
This
integrated SAS project successfully designed, debugged, and optimized a large
public event surveillance system. By combining structured dataset creation,
intentional error correction, fraud detection macros, rolling risk monitoring,
and operational performance metrics, we built a scalable and audit-ready
analytical framework. Advanced programming techniques such as SET, MERGE,
APPEND, PROC SQL, HASH objects, TRANSPOSE, and DATE functions strengthened both
performance and validation. The inclusion of utilization classification,
weighted risk modeling, and automated alerts enhanced real-world applicability.
This project not only demonstrates strong SAS technical capability but also
reflects business-oriented thinking in risk analytics, operational intelligence,
and fraud detection making it highly valuable for interviews and
enterprise-level implementation.
SAS INTERVIEW QUESTIONS
·
What is the difference between SET,
MERGE, and UPDATE?
·
What happens if you do not use RUN
statement?
· How does SAS handle missing numeric values in comparisons?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent PUBLIC EVENT data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics
· Clinical SAS Programmer
· Research Data Analyst
· Regulatory Data Validator
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment