417.Can We Design, Debug, Detect Fraud, and Optimize Large Public Event Management Using Advanced SAS Programming Techniques?

Public Event Risk and Fraud Management Using Advanced SAS Techniques

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP| SET | IF-THEN-ELSE | LENGTH | FORMAT | INTCK | INTNX | MDY | MERGE | PROC SORT | PROC SQL | PROC APPEND | PROC TRANSPOSE | PROC REPORT | PROC FORMAT | MACRO (%MACRO / %MEND) | HASH OBJECT | PROC DATASETS DELETE | RUN | QUIT

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Introduction

Managing large public events such as concerts, expos, political rallies, and sports finals requires advanced operational monitoring. Crowd control, security risk detection, staff deployment optimization, and fraud surveillance are critical components.

In this integrated SAS project, we designed a complete Public Event Risk Surveillance System that:

·       Creates structured datasets

·       Introduces intentional errors

·       Performs debugging and data correction

·       Applies fraud detection logic

·       Derives utilization metrics

·       Implements rolling risk monitoring

·       Uses macros, SQL, hash objects, and reporting procedures

·       Builds audit and alert mechanisms

This project simulates a real-world operational intelligence system similar to safety monitoring frameworks used in regulated industries.

Table Of Contents

1.     Business Context 

2.     Raw Dataset Creation with Intentional Errors

3.     Error Identification and Debugging Strategy

4.     Data Cleaning and Standardization

5.     Character and Numeric Function Applications

6.     Date Intelligence (MDY, INTCK, INTNX, DATEPART)

7.     Utilization and Risk Index Derivation

8.     Fraud Detection Macro Logic

9.     Revenue Per Capita Analysis

10.  Duplicate Detection and Data Integrity Checks

11.  Dataset Integration Using SET, MERGE, APPEND

12.  Hash Object Lookup Optimization

13.  Rolling 30-Day Risk Monitoring

14.  Peak Hour Crowd Analysis

15.  Risk Band Classification Using PROC FORMAT

16.  SQL-Based Fraud Cross Validation

17.  Weighted Risk Model Enhancement

18.  Growth Trend Analysis Using LAG( )

19.  PROC TRANSPOSE for Structural Reshaping

20.  Monthly Risk Aggregation Macro

21.  Dashboard Reporting via PROC REPORT

22.  Automated Alert Dataset Creation

23.  Audit Trail Generation

24.  QC and Validation Framework

25.  WORK Library Cleanup Using PROC DATASETS

Business Context

Imagine a government intelligence department monitoring large public events across India to:

  • Detect abnormal crowd surges
  • Prevent ticket fraud
  • Monitor security threats
  • Optimize staff deployment
  • Predict crowd risk levels

Variables included:

Variable

Description

Event_Name

Name of public event

Location

City

Attendees

Total attendance

Entry_Time

Entry timestamp

Security_Incidents

Reported incidents

Staff_Deployed

Number of staff

Crowd_Risk_Index

Calculated risk %

Ticket_Revenue

Revenue collected

Fraud_Flag

Fraud indicator

1. Raw Dataset Creation (With Intentional Errors)

data public_events_raw;

    length Event_Name $50 Location $30;

    format Entry_Time datetime20.;

    input Event_Name $ Location $ Attendees Entry_Time :datetime20.

          Security_Incidents Staff_Deployed Crowd_Risk_Index Ticket_Revenue;

datalines;

MusicFest Hyderabad 50000 01JAN2026:10:00:00 5 200 85 2500000

TechExpo Bangalore 30000 05FEB2026:09:30:00 2 150 70 1800000

CricketFinal Mumbai 120000 10MAR2026:14:00:00 15 500 95 8000000

PoliticalRally Delhi 200000 20APR2026:11:00:00 25 300 110 10000000

FoodCarnival Chennai -5000 15MAY2026:16:00:00 1 100 40 500000

ArtExpo Pune 15000 25JUN2026:10:30:00 . 80 55 600000

;

run;

proc print data=public_events_raw;

run;

OUTPUT:

ObsEvent_NameLocationEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_Revenue
1MusicFestHyderabad01JAN2026:10:00:00500005200852500000
2TechExpoBangalore05FEB2026:09:30:00300002150701800000
3CricketFinalMumbai10MAR2026:14:00:0012000015500958000000
4PoliticalRallyDelhi20APR2026:11:00:002000002530011010000000
5FoodCarnivalChennai15MAY2026:16:00:00-5000110040500000
6ArtExpoPune25JUN2026:10:30:0015000.8055600000

Intentional Errors 

·  Crowd_Risk_Index = 110 (Invalid >100)

·  Negative Attendees = -5000

·  Missing Security_Incidents

·  No Fraud_Flag

·  No Utilization classification

·  No date derivation

·  No cleaning of character strings

2. Corrected Full-Length SAS Code

data public_events_clean;

    length Event_Name $50 Location $30 Fraud_Flag $3 Utilization_Class $10;

    set public_events_raw;

    Event_Name = propcase(strip(Event_Name));

    Location = upcase(strip(Location));

    if Attendees < 0 then Attendees = abs(Attendees);

    Crowd_Risk_Index = coalesce(Crowd_Risk_Index, 0);

    if Crowd_Risk_Index > 100 then Crowd_Risk_Index = 100;

    Event_Date = datepart(Entry_Time);

    Event_Month = intnx('month', Event_Date, 0, 'b');

    Days_Since_Event = intck('day', Event_Date, today());

    Staff_Utilization = (Attendees / Staff_Deployed);    

    if Staff_Utilization > 400 then Utilization_Class="High";

    else if Staff_Utilization >200 then Utilization_Class="Medium";

    else Utilization_Class="Low";

    Crowd_Risk_Index = round((Security_Incidents*5 + Staff_Utilization/10),0.1);

    format Event_Date date9. Event_Month monyy7.;

run;

proc print data=public_events_clean;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_Utilization
1MusicfestHYDERABAD Medium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000
2TechexpoBANGALORE Low05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000
3CricketfinalMUMBAI Medium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000
4PoliticalrallyDELHI High20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667
5FoodcarnivalCHENNAI Low15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000
6ArtexpoPUNE Low25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500

·  Removes older versions

·  Prevents overwrite confusion

·  Maintains clean WORK library

3. Macro for Fraud Detection

%macro detect_fraud(input=,output=);

data &output.;

    set &input.;

    length Fraud_Flag $3;

    if Crowd_Risk_Index > 90 or Security_Incidents > 20 then Fraud_Flag="YES";

    else Fraud_Flag="NO";

run;

proc print data=&output.;

run;

%mend;

%detect_fraud(input=public_events_clean,output=public_events_final);

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_Utilization
1MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000
2TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000
3CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000
4PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667
5FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000
6ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500

4. Append Example

data new_events;

    set public_events_final;

    where Location="HYDERABAD";

run;

proc print data=new_events;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_Utilization
1MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050250000001JAN2026JAN202660250

proc append base=public_events_final 

            data=new_events force;

run;

proc print data=public_events_final;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_Utilization
1MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000
2TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000
3CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000
4PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667
5FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000
6ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500
7MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000

5. Merge Example

data staff_bonus;

    input Location:$30. Bonus;

datalines;

HYDERABAD 50000

MUMBAI 100000

;

run;

proc print data=staff_bonus;

run;

OUTPUT:

ObsLocationBonus
1HYDERABAD50000
2MUMBAI100000

proc sort data=public_events_final; by Location; run;

proc print data=public_events_final;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_Utilization
1TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000
2FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000
3PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667
4MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000
5MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000
6CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000
7ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500

proc sort data=staff_bonus; by Location; run;

proc print data=staff_bonus;

run;

OUTPUT:

ObsLocationBonus
1HYDERABAD50000
2MUMBAI100000

data merged_data;

    merge public_events_final staff_bonus;

    by Location;

run;

proc print data=merged_data;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_UtilizationBonus
1TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000.
2FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000.
3PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667.
4MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.00050000
5MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.00050000
6CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000100000
7ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500.

6. Transpose Example

proc sort data=merged_data; by Location; run;

proc print data=merged_data;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_UtilizationBonus
1TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000.
2FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000.
3PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667.
4MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.00050000
5MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.00050000
6CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000100000
7ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500.

proc transpose data=merged_data out=transposed_data;

    by Location;

    var Attendees Staff_Deployed Security_Incidents;

run;

proc print data=transposed_data;

run;

OUTPUT:

ObsLocation_NAME_COL1COL2
1BANGALOREAttendees30000.
2BANGALOREStaff_Deployed150.
3BANGALORESecurity_Incidents2.
4CHENNAIAttendees5000.
5CHENNAIStaff_Deployed100.
6CHENNAISecurity_Incidents1.
7DELHIAttendees200000.
8DELHIStaff_Deployed300.
9DELHISecurity_Incidents25.
10HYDERABADAttendees5000050000
11HYDERABADStaff_Deployed200200
12HYDERABADSecurity_Incidents55
13MUMBAIAttendees120000.
14MUMBAIStaff_Deployed500.
15MUMBAISecurity_Incidents15.
16PUNEAttendees15000.
17PUNEStaff_Deployed80.
18PUNESecurity_Incidents..

7. Datasets Delete/Unwanted Datasets

proc datasets library=work nolist;

   delete public_events_clean fraud_summary transposed_data;

quit;

LOG:

NOTE: The file WORK.FRAUD_SUMMARY (memtype=DATA) was not found, but appears on a DELETE statement.
NOTE: Deleting WORK.PUBLIC_EVENTS_CLEAN (memtype=DATA).
NOTE: Deleting WORK.TRANSPOSED_DATA (memtype=DATA).

8. PROC FORMAT – Risk Band Classification

proc format;

    value riskband

        low-50 = "Low Risk"

        50<-80 = "Moderate Risk"

        80<-high = "High Risk";

run;

LOG:

NOTE: Format RISKBAND has been output.

·  Standardizes classification logic

·  Avoids hardcoding IF-THEN repeatedly

·  Regulatory-style mapping (like CDISC controlled terminology)

9. Apply Risk Format

data public_events_enhanced;

    set public_events_final;

    Risk_Band = put(Crowd_Risk_Index, riskband.);

run;

proc print data=public_events_enhanced;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_UtilizationRisk_Band
1TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000Low Risk
2FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000Low Risk
3PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667High Risk
4MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000Low Risk
5MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000Low Risk
6CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000High Risk
7ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500.

·  Converts numeric risk to descriptive category

·  Improves interpretability

10. Duplicate Detection Logic

proc sort data=public_events_enhanced out=dedup_check nodupkey dupout=duplicates;

    by Event_Name Event_Date Location;

run;

proc print data=dedup_check;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_UtilizationRisk_Band
1ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500.
2CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000High Risk
3FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000Low Risk
4MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000Low Risk
5PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667High Risk
6TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000Low Risk

proc print data=duplicates;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_UtilizationRisk_Band
1MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050250000001JAN2026JAN202660250Low Risk

·  Prevents duplicate operational entries

·  dupout= captures duplicate dataset for audit

11. Revenue Per Capita Derivation

data revenue_analysis;

    set dedup_check;

    Revenue_Per_Capita = Ticket_Revenue / Attendees;

run;

proc print data=revenue_analysis;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_UtilizationRisk_BandRevenue_Per_Capita
1ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500.40.000
2CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000High Risk66.667
3FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000Low Risk100.000
4MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000Low Risk50.000
5PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667High Risk50.000
6TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000Low Risk60.000

·  Detects ticket manipulation fraud

·  Low revenue per head = potential illegal entries

12. Peak Hour Detection

data peak_analysis;

    set revenue_analysis;

    Entry_Hour = hour(Entry_Time);

    if 15 < Entry_Hour < 21 then Peak_Flag="YES";

    else Peak_Flag="NO";

run;

proc print data=peak_analysis;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_UtilizationRisk_BandRevenue_Per_CapitaEntry_HourPeak_Flag
1ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500.40.00010NO
2CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000High Risk66.66714NO
3FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000Low Risk100.00016YES
4MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000Low Risk50.00010NO
5PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667High Risk50.00011NO
6TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000Low Risk60.0009NO

·  Identifies rush-hour crowd risk

·  Useful for deployment strategy

13. Rolling 30-Day Risk Monitoring

proc sort data=peak_analysis;

    by Location Event_Date;

run;

proc print data=peak_analysis;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_UtilizationRisk_BandRevenue_Per_CapitaEntry_HourPeak_Flag
1TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000Low Risk60.0009NO
2FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000Low Risk100.00016YES
3PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667High Risk50.00011NO
4MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000Low Risk50.00010NO
5CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000High Risk66.66714NO
6ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500.40.00010NO

data rolling_risk;

    set peak_analysis;

    by Location;

    retain Prev_Date;

    Days_Gap = intck('day', Prev_Date, Event_Date);

    if Days_Gap <= 30 then Rolling_Risk_Flag="YES";

    else Rolling_Risk_Flag="NO";

    Prev_Date = Event_Date;

run;

proc print data=rolling_risk;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_UtilizationRisk_BandRevenue_Per_CapitaEntry_HourPeak_FlagPrev_DateDays_GapRolling_Risk_Flag
1TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000Low Risk60.0009NO24142.YES
2FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000Low Risk100.00016YES2424199NO
3PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667High Risk50.00011NO24216-25YES
4MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000Low Risk50.00010NO24107-109YES
5CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000High Risk66.66714NO2417568NO
6ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500.40.00010NO24282107NO

·  Detects frequent risky events in short duration

·  Prevents repeated security lapses

14. HASH Object Lookup 

data bonus_lookup;

    if _N_=1 then do;

        declare hash h(dataset:"staff_bonus");

        h.defineKey("Location");

        h.defineData("Bonus");

        h.defineDone();

    end;

    set rolling_risk;

    if h.find()=0 then Bonus_Assigned=Bonus;

    else Bonus_Assigned=0;

run;

proc print data=bonus_lookup;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_UtilizationRisk_BandRevenue_Per_CapitaEntry_HourPeak_FlagPrev_DateDays_GapRolling_Risk_FlagBonus_AssignedBonus
1TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000Low Risk60.0009NO24142.YES0.
2FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000Low Risk100.00016YES2424199NO0.
3PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667High Risk50.00011NO24216-25YES0.
4MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000Low Risk50.00010NO24107-109YES5000050000
5CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000High Risk66.66714NO2417568NO100000100000
6ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500.40.00010NO24282107NO0.

·  Faster than MERGE

·  Efficient in large datasets

·  Real-world production optimization

15. Monthly Risk Summary Macro

%macro monthly_summary;

proc sql;

    create table monthly_risk_summary as

    select Location,

           month(Event_Date) as Month,

           mean(Crowd_Risk_Index) as Avg_Risk,

           sum(Security_Incidents) as Total_Incidents

    from bonus_lookup

    group by Location, calculated Month;

quit;

proc print data=monthly_risk_summary;

run;

%mend;


%monthly_summary;

OUTPUT:

ObsLocationMonthAvg_RiskTotal_Incidents
1BANGALORE230.02
2CHENNAI510.01
3DELHI4191.725
4HYDERABAD150.05
5MUMBAI399.015
6PUNE6..

·  Automated aggregation

·  Reusable reporting logic

16. PROC REPORT 

proc report data=monthly_risk_summary nowd;

    column Location Month Avg_Risk Total_Incidents;

run;

OUTPUT:

LocationMonthAvg_RiskTotal_Incidents
BANGALORE2302
CHENNAI5101
DELHI4191.725
HYDERABAD1505
MUMBAI39915
PUNE6..

·  Management reporting

·  Structured tabular output

17. PROC SQL Fraud Cross Validation

proc sql;

  create table fraud_cross_check as

  select *,

    case 

     when Revenue_Per_Capita < 20 and Crowd_Risk_Index > 85 then "Revenue Mismatch"

     when Security_Incidents > 20 then "High Security Threat"

     else "Normal"

     end as Fraud_Reason

  from bonus_lookup;

quit;

proc print data=fraud_cross_check;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_UtilizationRisk_BandRevenue_Per_CapitaEntry_HourPeak_FlagPrev_DateDays_GapRolling_Risk_FlagBonus_AssignedBonusFraud_Reason
1TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000Low Risk60.0009NO24142.YES0.Normal
2FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000Low Risk100.00016YES2424199NO0.Normal
3PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667High Risk50.00011NO24216-25YES0.High Security Threat
4MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000Low Risk50.00010NO24107-109YES5000050000Normal
5CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000High Risk66.66714NO2417568NO100000100000Normal
6ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500.40.00010NO24282107NO0.Normal

·  Alternative validation method

·  SQL-style case logic

18. Integrity Check Dataset

data integrity_check;

    set fraud_cross_check;

    length Issue $20.;

    if missing(Attendees) then Issue="Missing Attendees";

    else if Attendees=0 then Issue="Zero Attendance";

    else if Staff_Deployed=0 then Issue="No Staff Assigned";

run;

proc print data=integrity_check;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_UtilizationRisk_BandRevenue_Per_CapitaEntry_HourPeak_FlagPrev_DateDays_GapRolling_Risk_FlagBonus_AssignedBonusFraud_ReasonIssue
1TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000Low Risk60.0009NO24142.YES0.Normal 
2FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000Low Risk100.00016YES2424199NO0.Normal 
3PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667High Risk50.00011NO24216-25YES0.High Security Threat 
4MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000Low Risk50.00010NO24107-109YES5000050000Normal 
5CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000High Risk66.66714NO2417568NO100000100000Normal 
6ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500.40.00010NO24282107NO0.Normal 

19. Automated Alert Dataset

data alert_dataset;

    set fraud_cross_check;

    if Fraud_Flag="YES" or Risk_Band="High Risk";

run;

proc print data=alert_dataset;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_UtilizationRisk_BandRevenue_Per_CapitaEntry_HourPeak_FlagPrev_DateDays_GapRolling_Risk_FlagBonus_AssignedBonusFraud_Reason
1PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667High Risk50.000011NO24216-25YES0.High Security Threat
2CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000High Risk66.666714NO2417568NO100000100000Normal

20. Recalculated Weighted Risk Model

data weighted_model;

    set fraud_cross_check;

    Weighted_Risk =

          (Security_Incidents*0.4)+ (Staff_Utilization*0.3)

        + (Revenue_Per_Capita*0.3);

run;

proc print data=weighted_model;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_UtilizationRisk_BandRevenue_Per_CapitaEntry_HourPeak_FlagPrev_DateDays_GapRolling_Risk_FlagBonus_AssignedBonusFraud_ReasonWeighted_Risk
1TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000Low Risk60.0009NO24142.YES0.Normal78.8
2FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000Low Risk100.00016YES2424199NO0.Normal45.4
3PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667High Risk50.00011NO24216-25YES0.High Security Threat225.0
4MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000Low Risk50.00010NO24107-109YES5000050000Normal92.0
5CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000High Risk66.66714NO2417568NO100000100000Normal98.0
6ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500.40.00010NO24282107NO0.Normal.

21. Event Growth Trend Using LAG( )

data growth_trend;

    set weighted_model;

    by Location;

    Prev_Attendees = lag(Attendees);

    Growth = Attendees - Prev_Attendees;

run;

proc print data=growth_trend;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_UtilizationRisk_BandRevenue_Per_CapitaEntry_HourPeak_FlagPrev_DateDays_GapRolling_Risk_FlagBonus_AssignedBonusFraud_ReasonWeighted_RiskPrev_AttendeesGrowth
1TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000Low Risk60.0009NO24142.YES0.Normal78.8..
2FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000Low Risk100.00016YES2424199NO0.Normal45.430000-25000
3PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667High Risk50.00011NO24216-25YES0.High Security Threat225.05000195000
4MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000Low Risk50.00010NO24107-109YES5000050000Normal92.0200000-150000
5CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000High Risk66.66714NO2417568NO100000100000Normal98.05000070000
6ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500.40.00010NO24282107NO0.Normal.120000-105000

22. Final Audit Trail Dataset

data audit_trail;

    set growth_trend;

    Audit_Timestamp = datetime();

    format Audit_Timestamp datetime20.;

run;

proc print data=audit_trail;

run;

OUTPUT:

ObsEvent_NameLocationFraud_FlagUtilization_ClassEntry_TimeAttendeesSecurity_IncidentsStaff_DeployedCrowd_Risk_IndexTicket_RevenueEvent_DateEvent_MonthDays_Since_EventStaff_UtilizationRisk_BandRevenue_Per_CapitaEntry_HourPeak_FlagPrev_DateDays_GapRolling_Risk_FlagBonus_AssignedBonusFraud_ReasonWeighted_RiskPrev_AttendeesGrowthAudit_Timestamp
1TechexpoBANGALORENOLow05FEB2026:09:30:0030000215030.0180000005FEB2026FEB202625200.000Low Risk60.0009NO24142.YES0.Normal78.8..02MAR2026:20:03:28
2FoodcarnivalCHENNAINOLow15MAY2026:16:00:005000110010.050000015MAY2026MAY2026-7450.000Low Risk100.00016YES2424199NO0.Normal45.430000-2500002MAR2026:20:03:28
3PoliticalrallyDELHIYESHigh20APR2026:11:00:0020000025300191.71000000020APR2026APR2026-49666.667High Risk50.00011NO24216-25YES0.High Security Threat225.0500019500002MAR2026:20:03:28
4MusicfestHYDERABADNOMedium01JAN2026:10:00:0050000520050.0250000001JAN2026JAN202660250.000Low Risk50.00010NO24107-109YES5000050000Normal92.0200000-15000002MAR2026:20:03:28
5CricketfinalMUMBAIYESMedium10MAR2026:14:00:001200001550099.0800000010MAR2026MAR2026-8240.000High Risk66.66714NO2417568NO100000100000Normal98.0500007000002MAR2026:20:03:28
6ArtexpoPUNENOLow25JUN2026:10:30:0015000.80.60000025JUN2026JUN2026-115187.500.40.00010NO24282107NO0.Normal.120000-10500002MAR2026:20:03:28

Numeric & Character Functions 

·       ABS()

·       ROUND()

·       COALESCE()

·       STRIP()

·       TRIM()

·       CAT()

·       CATX()

·       PROPCASE()

·       UPCASE()

·       LOWCASE()

Fraud Logic

Fraud if:

·       Risk Index > 90
OR

·       Security Incidents > 20

Explanation

DATA Step Philosophy

DATA step reads row by row. Similar to clinical ADaM derivation.

SET Statement

Used to read existing dataset.

LENGTH Statement

Defines character variable storage.

PROPCASE

Standardizes Event_Name (Musicfest → Musicfest)

STRIP

Removes trailing blanks.

UPCASE

Ensures Location consistency for merges.

ABS

Corrects negative attendance error.

COALESCE

Handles missing numeric values.

INTNX

Derives beginning of month.

INTCK

Calculates duration difference.

DATEPART

Extracts date from datetime.

ROUND

Controls decimal precision.

MACRO

Reusable logic block.

APPEND

Adds new records.

MERGE

Combines datasets by key.

TRANSPOSE

Converts rows to columns.

PROC DATASETS DELETE

Deletes datasets efficiently.

Common Errors Explained and Debugged

Error

Cause

Fix

Negative Attendees

Data entry error

ABS()

Risk > 100

Invalid scale

Cap at 100

Missing Incidents

Null value

COALESCE

Merge mismatch

Case sensitivity

UPCASE

Overwritten dataset

No delete

PROC DATASETS

Quality Control Checks

·       Frequency check for Fraud_Flag

·       Mean utilization

·       Check duplicates

·       Missing value check

·       Risk distribution

10 Key Points About This Project

1.     The project simulates a real-time public event monitoring system for large-scale crowd management.

2.     Intentional data errors such as negative attendance, missing incidents, and risk values above 100 were introduced and systematically corrected.

3.     Character standardization was implemented using STRIP, PROPCASE, UPCASE, LOWCASE, CAT, and CATX functions to ensure merge compatibility.

4.     Date functions like MDY, INTCK, INTNX, and DATEPART were used for event date derivations and rolling surveillance calculations.

5.     Staff utilization was calculated to determine operational efficiency and classify events into Low, Medium, or High utilization categories.

6.     A reusable fraud detection macro flagged events based on high risk index and abnormal security incidents.

7.     Revenue Per Capita analysis helped identify ticket fraud or illegal crowd entries.

8.     Rolling 30-day monitoring logic identified repeated high-risk events in short time windows.

9.     SQL cross-validation and hash lookup techniques improved validation speed and integrity.

10.  The system generated alert datasets and audit trails, ensuring traceability and compliance readiness.

Summary

This project demonstrates how SAS can be used to build a comprehensive large public event monitoring and fraud detection system. Starting from raw data with intentional errors, we applied structured debugging, cleaning, derivation logic, and validation checks. Advanced SAS techniques including macros, SQL, hash objects, rolling date logic, and classification models were integrated to create a scalable surveillance framework. Operational metrics such as staff utilization, revenue per capita, and crowd risk index were derived to support decision-making. The project also incorporated duplicate detection, integrity checks, and audit trails to ensure governance and compliance. Overall, it reflects a real-world operational intelligence system suitable for government, security, or enterprise risk management applications.

Conclusion

This integrated SAS project successfully designed, debugged, and optimized a large public event surveillance system. By combining structured dataset creation, intentional error correction, fraud detection macros, rolling risk monitoring, and operational performance metrics, we built a scalable and audit-ready analytical framework. Advanced programming techniques such as SET, MERGE, APPEND, PROC SQL, HASH objects, TRANSPOSE, and DATE functions strengthened both performance and validation. The inclusion of utilization classification, weighted risk modeling, and automated alerts enhanced real-world applicability. This project not only demonstrates strong SAS technical capability but also reflects business-oriented thinking in risk analytics, operational intelligence, and fraud detection making it highly valuable for interviews and enterprise-level implementation.


SAS INTERVIEW QUESTIONS

·  What is the difference between SET, MERGE, and UPDATE?

·  What happens if you do not use RUN statement?

·  How does SAS handle missing numeric values in comparisons?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent PUBLIC EVENT data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

·  Clinical SAS Programmer

·  Research Data Analyst

·  Regulatory Data Validator

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:



--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy


Comments

Popular posts from this blog

412.Can We Build And Clean A University Course Analytics & Fraud Detection System In Sas Using Only Macros While Intentionally Creating And Fixing Errors?

416.Can We Design, Debug, Detect Fraud, and Build an ADaM-Ready Autonomous Drone Flight Analytics System Using Advanced SAS Programming Techniques?

409.Can We Build a Reliable Emergency Services Analytics & Fraud Detection System in SAS While Identifying and Fixing Intentional Errors?