409.Can We Build a Reliable Emergency Services Analytics & Fraud Detection System in SAS While Identifying and Fixing Intentional Errors?
Building a Reliable Emergency Services Analytics & Fraud Detection System in SAS Through Systematic Error Identification and Correction
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HERE WE USED THESE SAS STATEMENTS AND FUNCTIONS FOR THIS PROJECT:
DATA STEP | SET | INPUT | DATALINES | LENGTH | FORMAT | IF-THEN-ELSE | MDY | INTCK | INTNX | STRIP | TRIM | CAT | CATX | PROPCASE | UPCASE | LOWCASE | COALESCE | MERGE | PROC SQL | PROC MEANS | PROC UNIVARIATE | PROC FREQ | PROC CORR | PROC SGPLOT | PROC TRANSPOSE | PROC APPEND | PROC DATASETS DELETE | %MACRO / %MEND
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
INTRODUCTION
Table of Contents
1.
Business Context
2.
Dataset Design
3.
Raw Dataset Creation (With Intentional Errors)
4.
Identifying the Errors
5.
Corrected Dataset Code (Full Version)
6.
Utilization Classification Macro
7.
Fraud Detection Logic
8.
Descriptive Analysis (PROC MEANS, UNIVARIATE)
9.
Correlation Analysis
10. Visualization
(PROC SGPLOT)
11. Data
Manipulation (APPEND, TRANSPOSE)
12. Why
Each Procedure Is Used
13. Business
Insights
14. Conclusion
Introduction:
Emergency services require accurate, reliable, and timely data to ensure
public safety and operational efficiency. This project builds a structured
SAS-based analytics and fraud detection system that simulates real-world
emergency operations, introduces intentional data errors, detects
inconsistencies, and corrects them using statistical validation, automation,
and logical programming techniques.
Emergency
services such as ambulance, fire, and police departments operate in
high-pressure environments where response time, staff readiness, equipment
availability, and incident handling capacity directly impact public safety.
In this
project, we will:
·
Create a custom emergency services dataset (with more than 12
observations).
·
Introduce intentional errors.
·
Detect and correct those errors.
Business Context
Emergency departments must answer:
· Are response
times within acceptable thresholds?
·
Is staff utilization optimized?
·
Are equipment failures causing delays?
·
Is there any fraudulent over-reporting
of incidents?
·
Are some cities under-resourced?
We simulate this environment using SAS programming.
1. Raw Dataset Creation (With Intentional Errors)
data emergency_raw;
length Service_Type $20 City $20 Equipment_Status $15;
format Service_Date date9.;
input Service_ID Service_Type $ City $ Response_Time_Minutes
Incidents_Handled Staff_Available Equipment_Status $
Readiness_Score Service_Date : date9.;
datalines;
101 ambulance hyderabad 8 20 15 functional 85 01JAN2026
102 Fire HYDERABAD 15 30 18 functional 90 05JAN2026
103 police mumbai -5 25 10 damaged 75 10JAN2026
104 Ambulance mumbai 12 200 5 functional 88 15JAN2026
105 fire chennai 20 18 12 missing 92 20JAN2026
106 police delhi 7 22 0 functional 80 25JAN2026
107 ambulance delhi 9 19 14 functional . 28JAN2026
108 fire kolkata 30 40 20 functional 95 30JAN2026
109 police kolkata 11 21 9 damaged 78 02FEB2026
110 ambulance chennai 14 16 13 functional 83 05FEB2026
111 fire mumbai 16 23 15 functional 89 08FEB2026
112 police hyderabad 13 17 11 functional 84 10FEB2026
113 ambulance mumbai 18 28 16 functional 91 12FEB2026
114 fire delhi 22 35 17 functional 94 14FEB2026
;
run;
proc print data=emergency_raw;
run;
OUTPUT:
| Obs | Service_Type | City | Equipment_Status | Service_Date | Service_ID | Response_Time_Minutes | Incidents_Handled | Staff_Available | Readiness_Score |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ambulance | hyderabad | functional | 01JAN2026 | 101 | 8 | 20 | 15 | 85 |
| 2 | Fire | HYDERABAD | functional | 05JAN2026 | 102 | 15 | 30 | 18 | 90 |
| 3 | police | mumbai | damaged | 10JAN2026 | 103 | -5 | 25 | 10 | 75 |
| 4 | Ambulance | mumbai | functional | 15JAN2026 | 104 | 12 | 200 | 5 | 88 |
| 5 | fire | chennai | missing | 20JAN2026 | 105 | 20 | 18 | 12 | 92 |
| 6 | police | delhi | functional | 25JAN2026 | 106 | 7 | 22 | 0 | 80 |
| 7 | ambulance | delhi | functional | 28JAN2026 | 107 | 9 | 19 | 14 | . |
| 8 | fire | kolkata | functional | 30JAN2026 | 108 | 30 | 40 | 20 | 95 |
| 9 | police | kolkata | damaged | 02FEB2026 | 109 | 11 | 21 | 9 | 78 |
| 10 | ambulance | chennai | functional | 05FEB2026 | 110 | 14 | 16 | 13 | 83 |
| 11 | fire | mumbai | functional | 08FEB2026 | 111 | 16 | 23 | 15 | 89 |
| 12 | police | hyderabad | functional | 10FEB2026 | 112 | 13 | 17 | 11 | 84 |
| 13 | ambulance | mumbai | functional | 12FEB2026 | 113 | 18 | 28 | 16 | 91 |
| 14 | fire | delhi | functional | 14FEB2026 | 114 | 22 | 35 | 17 | 94 |
Intentional Errors Introduced
- Negative response time (-5).
- Incidents_Handled = 200
(outlier).
- Staff_Available = 0.
- Equipment_Status =
"missing".
- Readiness_Score = missing.
- Mixed case inconsistencies.
- Logical inconsistency: High
readiness but equipment damaged.
2. Identifying Errors
proc means data=emergency_raw n mean min max;
var Response_Time_Minutes Incidents_Handled Staff_Available Readiness_Score;
run;
OUTPUT:
The MEANS Procedure
| Variable | N | Mean | Minimum | Maximum |
|---|---|---|---|---|
Response_Time_Minutes Incidents_Handled Staff_Available Readiness_Score | 14 14 14 13 | 13.5714286 36.7142857 12.5000000 86.4615385 | -5.0000000 16.0000000 0 75.0000000 | 30.0000000 200.0000000 20.0000000 95.0000000 |
proc freq data=emergency_raw;
tables Equipment_Status Service_Type City / missing;
run;
OUTPUT:
The FREQ Procedure
| Equipment_Status | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| damaged | 2 | 14.29 | 2 | 14.29 |
| functional | 11 | 78.57 | 13 | 92.86 |
| missing | 1 | 7.14 | 14 | 100.00 |
| Service_Type | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| Ambulance | 1 | 7.14 | 1 | 7.14 |
| Fire | 1 | 7.14 | 2 | 14.29 |
| ambulance | 4 | 28.57 | 6 | 42.86 |
| fire | 4 | 28.57 | 10 | 71.43 |
| police | 4 | 28.57 | 14 | 100.00 |
| City | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| HYDERABAD | 1 | 7.14 | 1 | 7.14 |
| chennai | 2 | 14.29 | 3 | 21.43 |
| delhi | 3 | 21.43 | 6 | 42.86 |
| hyderabad | 2 | 14.29 | 8 | 57.14 |
| kolkata | 2 | 14.29 | 10 | 71.43 |
| mumbai | 4 | 28.57 | 14 | 100.00 |
·
Identify equipment inconsistencies
·
Verify service type standardization
·
Detect rare suspicious categories
proc univariate data=emergency_raw;
var Incidents_Handled Response_Time_Minutes;
run;
OUTPUT:
The UNIVARIATE Procedure
Variable: Incidents_Handled
| Moments | |||
|---|---|---|---|
| N | 14 | Sum Weights | 14 |
| Mean | 36.7142857 | Sum Observations | 514 |
| Std Deviation | 47.5126357 | Variance | 2257.45055 |
| Skewness | 3.60459 | Kurtosis | 13.2493791 |
| Uncorrected SS | 48218 | Corrected SS | 29346.8571 |
| Coeff Variation | 129.411848 | Std Error Mean | 12.698286 |
| Basic Statistical Measures | |||
|---|---|---|---|
| Location | Variability | ||
| Mean | 36.71429 | Std Deviation | 47.51264 |
| Median | 22.50000 | Variance | 2257 |
| Mode | . | Range | 184.00000 |
| Interquartile Range | 11.00000 | ||
| Tests for Location: Mu0=0 | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Student's t | t | 2.891279 | Pr > |t| | 0.0126 |
| Sign | M | 7 | Pr >= |M| | 0.0001 |
| Signed Rank | S | 52.5 | Pr >= |S| | 0.0001 |
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile |
| 100% Max | 200.0 |
| 99% | 200.0 |
| 95% | 200.0 |
| 90% | 40.0 |
| 75% Q3 | 30.0 |
| 50% Median | 22.5 |
| 25% Q1 | 19.0 |
| 10% | 17.0 |
| 5% | 16.0 |
| 1% | 16.0 |
| 0% Min | 16.0 |
| Extreme Observations | |||
|---|---|---|---|
| Lowest | Highest | ||
| Value | Obs | Value | Obs |
| 16 | 10 | 28 | 13 |
| 17 | 12 | 30 | 2 |
| 18 | 5 | 35 | 14 |
| 19 | 7 | 40 | 8 |
| 20 | 1 | 200 | 4 |
The UNIVARIATE Procedure
Variable: Response_Time_Minutes
| Moments | |||
|---|---|---|---|
| N | 14 | Sum Weights | 14 |
| Mean | 13.5714286 | Sum Observations | 190 |
| Std Deviation | 8.13079886 | Variance | 66.1098901 |
| Skewness | -0.2631101 | Kurtosis | 1.82343499 |
| Uncorrected SS | 3438 | Corrected SS | 859.428571 |
| Coeff Variation | 59.9111495 | Std Error Mean | 2.1730474 |
| Basic Statistical Measures | |||
|---|---|---|---|
| Location | Variability | ||
| Mean | 13.57143 | Std Deviation | 8.13080 |
| Median | 13.50000 | Variance | 66.10989 |
| Mode | . | Range | 35.00000 |
| Interquartile Range | 9.00000 | ||
| Tests for Location: Mu0=0 | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Student's t | t | 6.245344 | Pr > |t| | <.0001 |
| Sign | M | 6 | Pr >= |M| | 0.0018 |
| Signed Rank | S | 51.5 | Pr >= |S| | 0.0002 |
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile |
| 100% Max | 30.0 |
| 99% | 30.0 |
| 95% | 30.0 |
| 90% | 22.0 |
| 75% Q3 | 18.0 |
| 50% Median | 13.5 |
| 25% Q1 | 9.0 |
| 10% | 7.0 |
| 5% | -5.0 |
| 1% | -5.0 |
| 0% Min | -5.0 |
| Extreme Observations | |||
|---|---|---|---|
| Lowest | Highest | ||
| Value | Obs | Value | Obs |
| -5 | 3 | 16 | 11 |
| 7 | 6 | 18 | 13 |
| 8 | 1 | 20 | 5 |
| 9 | 7 | 22 | 14 |
| 11 | 9 | 30 | 8 |
Helps identify:
·
Skewness
·
Extreme observations
·
Distribution shape
Useful for operational monitoring.
3. Corrected Dataset (Full Clean Code)
data emergency_clean;
set emergency_raw;
Service_Type = propcase(strip(Service_Type));
City = propcase(strip(City));
Equipment_Status = lowcase(strip(Equipment_Status));
if Response_Time_Minutes < 0 then Response_Time_Minutes = .;
if Incidents_Handled > 100 then Incidents_Handled = 100;
if Staff_Available = 0 then Staff_Available = .;
if Equipment_Status = "missing" then Equipment_Status = "damaged";
Readiness_Score = coalesce(Readiness_Score, 80);
Month_Start = intnx('month', Service_Date, 0, 'b');
Month_End = intnx('month', Service_Date, 0, 'e');
Days_From_Start = intck('day', mdy(1,1,2026), Service_Date);
format Month_Start Month_End date9.;
run;
proc print data=emergency_clean;
run;
OUTPUT:
| Obs | Service_Type | City | Equipment_Status | Service_Date | Service_ID | Response_Time_Minutes | Incidents_Handled | Staff_Available | Readiness_Score | Month_Start | Month_End | Days_From_Start |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Ambulance | Hyderabad | functional | 01JAN2026 | 101 | 8 | 20 | 15 | 85 | 01JAN2026 | 31JAN2026 | 0 |
| 2 | Fire | Hyderabad | functional | 05JAN2026 | 102 | 15 | 30 | 18 | 90 | 01JAN2026 | 31JAN2026 | 4 |
| 3 | Police | Mumbai | damaged | 10JAN2026 | 103 | . | 25 | 10 | 75 | 01JAN2026 | 31JAN2026 | 9 |
| 4 | Ambulance | Mumbai | functional | 15JAN2026 | 104 | 12 | 100 | 5 | 88 | 01JAN2026 | 31JAN2026 | 14 |
| 5 | Fire | Chennai | damaged | 20JAN2026 | 105 | 20 | 18 | 12 | 92 | 01JAN2026 | 31JAN2026 | 19 |
| 6 | Police | Delhi | functional | 25JAN2026 | 106 | 7 | 22 | . | 80 | 01JAN2026 | 31JAN2026 | 24 |
| 7 | Ambulance | Delhi | functional | 28JAN2026 | 107 | 9 | 19 | 14 | 80 | 01JAN2026 | 31JAN2026 | 27 |
| 8 | Fire | Kolkata | functional | 30JAN2026 | 108 | 30 | 40 | 20 | 95 | 01JAN2026 | 31JAN2026 | 29 |
| 9 | Police | Kolkata | damaged | 02FEB2026 | 109 | 11 | 21 | 9 | 78 | 01FEB2026 | 28FEB2026 | 32 |
| 10 | Ambulance | Chennai | functional | 05FEB2026 | 110 | 14 | 16 | 13 | 83 | 01FEB2026 | 28FEB2026 | 35 |
| 11 | Fire | Mumbai | functional | 08FEB2026 | 111 | 16 | 23 | 15 | 89 | 01FEB2026 | 28FEB2026 | 38 |
| 12 | Police | Hyderabad | functional | 10FEB2026 | 112 | 13 | 17 | 11 | 84 | 01FEB2026 | 28FEB2026 | 40 |
| 13 | Ambulance | Mumbai | functional | 12FEB2026 | 113 | 18 | 28 | 16 | 91 | 01FEB2026 | 28FEB2026 | 42 |
| 14 | Fire | Delhi | functional | 14FEB2026 | 114 | 22 | 35 | 17 | 94 | 01FEB2026 | 28FEB2026 | 44 |
·
Conditional logic (IF-THEN)
·
Missing value handling
·
Value capping
·
Logical validation
Functions used:
·
STRIP( )
→ removes leading/trailing spaces
·
PROPCASE( )
→ proper case formatting
·
LOWCASE( ) / UPCASE( )
→ consistency
· COALESCE ( )
used to replace missing readiness scores logically instead of deleting records.
Date functions applied:
·
MDY( )
→ construct dates
·
INTCK( )
→ calculate duration
·
INTNX( )
→ derive month start/end
4. Utilization Classification Macro
%macro utilization;
data emergency_util;
set emergency_clean;
length Utilization_Class $8.;
Utilization_Rate = Incidents_Handled / Staff_Available;
if Utilization_Rate >= 2 then Utilization_Class="High";
else if Utilization_Rate >=1 then Utilization_Class="Medium";
else Utilization_Class="Low";
run;
proc print data=emergency_util;
run;
%mend;
%utilization;
OUTPUT:
| Obs | Service_Type | City | Equipment_Status | Service_Date | Service_ID | Response_Time_Minutes | Incidents_Handled | Staff_Available | Readiness_Score | Month_Start | Month_End | Days_From_Start | Utilization_Class | Utilization_Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Ambulance | Hyderabad | functional | 01JAN2026 | 101 | 8 | 20 | 15 | 85 | 01JAN2026 | 31JAN2026 | 0 | Medium | 1.3333 |
| 2 | Fire | Hyderabad | functional | 05JAN2026 | 102 | 15 | 30 | 18 | 90 | 01JAN2026 | 31JAN2026 | 4 | Medium | 1.6667 |
| 3 | Police | Mumbai | damaged | 10JAN2026 | 103 | . | 25 | 10 | 75 | 01JAN2026 | 31JAN2026 | 9 | High | 2.5000 |
| 4 | Ambulance | Mumbai | functional | 15JAN2026 | 104 | 12 | 100 | 5 | 88 | 01JAN2026 | 31JAN2026 | 14 | High | 20.0000 |
| 5 | Fire | Chennai | damaged | 20JAN2026 | 105 | 20 | 18 | 12 | 92 | 01JAN2026 | 31JAN2026 | 19 | Medium | 1.5000 |
| 6 | Police | Delhi | functional | 25JAN2026 | 106 | 7 | 22 | . | 80 | 01JAN2026 | 31JAN2026 | 24 | Low | . |
| 7 | Ambulance | Delhi | functional | 28JAN2026 | 107 | 9 | 19 | 14 | 80 | 01JAN2026 | 31JAN2026 | 27 | Medium | 1.3571 |
| 8 | Fire | Kolkata | functional | 30JAN2026 | 108 | 30 | 40 | 20 | 95 | 01JAN2026 | 31JAN2026 | 29 | High | 2.0000 |
| 9 | Police | Kolkata | damaged | 02FEB2026 | 109 | 11 | 21 | 9 | 78 | 01FEB2026 | 28FEB2026 | 32 | High | 2.3333 |
| 10 | Ambulance | Chennai | functional | 05FEB2026 | 110 | 14 | 16 | 13 | 83 | 01FEB2026 | 28FEB2026 | 35 | Medium | 1.2308 |
| 11 | Fire | Mumbai | functional | 08FEB2026 | 111 | 16 | 23 | 15 | 89 | 01FEB2026 | 28FEB2026 | 38 | Medium | 1.5333 |
| 12 | Police | Hyderabad | functional | 10FEB2026 | 112 | 13 | 17 | 11 | 84 | 01FEB2026 | 28FEB2026 | 40 | Medium | 1.5455 |
| 13 | Ambulance | Mumbai | functional | 12FEB2026 | 113 | 18 | 28 | 16 | 91 | 01FEB2026 | 28FEB2026 | 42 | Medium | 1.7500 |
| 14 | Fire | Delhi | functional | 14FEB2026 | 114 | 22 | 35 | 17 | 94 | 01FEB2026 | 28FEB2026 | 44 | High | 2.0588 |
5. Fraud Detection Macro
%macro fraud_check;
data emergency_fraud;
set emergency_util;
Fraud_Flag=0;
if Response_Time_Minutes < 3 and Incidents_Handled > 30 then Fraud_Flag=1;
if Equipment_Status="damaged" and Readiness_Score > 90 then Fraud_Flag=1;
if Staff_Available=. and Incidents_Handled>20 then Fraud_Flag=1;
run;
proc print data=emergency_fraud;
run;
%mend;
%fraud_check;
OUTPUT:
| Obs | Service_Type | City | Equipment_Status | Service_Date | Service_ID | Response_Time_Minutes | Incidents_Handled | Staff_Available | Readiness_Score | Month_Start | Month_End | Days_From_Start | Utilization_Class | Utilization_Rate | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Ambulance | Hyderabad | functional | 01JAN2026 | 101 | 8 | 20 | 15 | 85 | 01JAN2026 | 31JAN2026 | 0 | Medium | 1.3333 | 0 |
| 2 | Fire | Hyderabad | functional | 05JAN2026 | 102 | 15 | 30 | 18 | 90 | 01JAN2026 | 31JAN2026 | 4 | Medium | 1.6667 | 0 |
| 3 | Police | Mumbai | damaged | 10JAN2026 | 103 | . | 25 | 10 | 75 | 01JAN2026 | 31JAN2026 | 9 | High | 2.5000 | 0 |
| 4 | Ambulance | Mumbai | functional | 15JAN2026 | 104 | 12 | 100 | 5 | 88 | 01JAN2026 | 31JAN2026 | 14 | High | 20.0000 | 0 |
| 5 | Fire | Chennai | damaged | 20JAN2026 | 105 | 20 | 18 | 12 | 92 | 01JAN2026 | 31JAN2026 | 19 | Medium | 1.5000 | 1 |
| 6 | Police | Delhi | functional | 25JAN2026 | 106 | 7 | 22 | . | 80 | 01JAN2026 | 31JAN2026 | 24 | Low | . | 1 |
| 7 | Ambulance | Delhi | functional | 28JAN2026 | 107 | 9 | 19 | 14 | 80 | 01JAN2026 | 31JAN2026 | 27 | Medium | 1.3571 | 0 |
| 8 | Fire | Kolkata | functional | 30JAN2026 | 108 | 30 | 40 | 20 | 95 | 01JAN2026 | 31JAN2026 | 29 | High | 2.0000 | 0 |
| 9 | Police | Kolkata | damaged | 02FEB2026 | 109 | 11 | 21 | 9 | 78 | 01FEB2026 | 28FEB2026 | 32 | High | 2.3333 | 0 |
| 10 | Ambulance | Chennai | functional | 05FEB2026 | 110 | 14 | 16 | 13 | 83 | 01FEB2026 | 28FEB2026 | 35 | Medium | 1.2308 | 0 |
| 11 | Fire | Mumbai | functional | 08FEB2026 | 111 | 16 | 23 | 15 | 89 | 01FEB2026 | 28FEB2026 | 38 | Medium | 1.5333 | 0 |
| 12 | Police | Hyderabad | functional | 10FEB2026 | 112 | 13 | 17 | 11 | 84 | 01FEB2026 | 28FEB2026 | 40 | Medium | 1.5455 | 0 |
| 13 | Ambulance | Mumbai | functional | 12FEB2026 | 113 | 18 | 28 | 16 | 91 | 01FEB2026 | 28FEB2026 | 42 | Medium | 1.7500 | 0 |
| 14 | Fire | Delhi | functional | 14FEB2026 | 114 | 22 | 35 | 17 | 94 | 01FEB2026 | 28FEB2026 | 44 | High | 2.0588 | 0 |
Fraud logic:
·
Response_Time < 3 minutes but incidents > 30.
·
Equipment damaged but readiness > 90.
·
Staff missing but high incidents.
6. Statistical Analysis
PROC MEANS
proc means data=emergency_fraud mean median std min max;
class Service_Type;
var Response_Time_Minutes Incidents_Handled Readiness_Score;
run;
OUTPUT:
The MEANS Procedure
| Service_Type | N Obs | Variable | Mean | Median | Std Dev | Minimum | Maximum |
|---|---|---|---|---|---|---|---|
| Ambulance | 5 | Response_Time_Minutes Incidents_Handled Readiness_Score | 12.2000000 36.6000000 85.4000000 | 12.0000000 20.0000000 85.0000000 | 4.0249224 35.7183426 4.2778499 | 8.0000000 16.0000000 80.0000000 | 18.0000000 100.0000000 91.0000000 |
| Fire | 5 | Response_Time_Minutes Incidents_Handled Readiness_Score | 20.6000000 29.2000000 92.0000000 | 20.0000000 30.0000000 92.0000000 | 5.9833101 8.8713020 2.5495098 | 15.0000000 18.0000000 89.0000000 | 30.0000000 40.0000000 95.0000000 |
| Police | 4 | Response_Time_Minutes Incidents_Handled Readiness_Score | 10.3333333 21.2500000 79.2500000 | 11.0000000 21.5000000 79.0000000 | 3.0550505 3.3040379 3.7749172 | 7.0000000 17.0000000 75.0000000 | 13.0000000 25.0000000 84.0000000 |
·
Mean
·
Median
·
Standard Deviation
·
Min/Max
Used to detect range errors and abnormal values.
PROC CORR
proc corr data=emergency_fraud;
var Response_Time_Minutes Incidents_Handled Staff_Available Readiness_Score;
run;
OUTPUT:
The CORR Procedure
| 4 Variables: | Response_Time_Minutes Incidents_Handled Staff_Available Readiness_Score |
|---|
| Simple Statistics | ||||||
|---|---|---|---|---|---|---|
| Variable | N | Mean | Std Dev | Sum | Minimum | Maximum |
| Response_Time_Minutes | 13 | 15.00000 | 6.37704 | 195.00000 | 7.00000 | 30.00000 |
| Incidents_Handled | 14 | 29.57143 | 21.43941 | 414.00000 | 16.00000 | 100.00000 |
| Staff_Available | 13 | 13.46154 | 4.07462 | 175.00000 | 5.00000 | 20.00000 |
| Readiness_Score | 14 | 86.00000 | 6.23884 | 1204 | 75.00000 | 95.00000 |
| Pearson Correlation Coefficients Prob > |r| under H0: Rho=0 Number of Observations | ||||
|---|---|---|---|---|
| Response_Time_Minutes | Incidents_Handled | Staff_Available | Readiness_Score | |
| Response_Time_Minutes | 1.00000 13 | 0.10443 0.7342 13 | 0.56223 0.0571 12 | 0.84777 0.0003 13 |
| Incidents_Handled | 0.10443 0.7342 13 | 1.00000 14 | -0.39884 0.1770 13 | 0.27950 0.3332 14 |
| Staff_Available | 0.56223 0.0571 12 | -0.39884 0.1770 13 | 1.00000 13 | 0.56450 0.0444 13 |
| Readiness_Score | 0.84777 0.0003 13 | 0.27950 0.3332 14 | 0.56450 0.0444 13 | 1.00000 14 |
·
Response time
·
Incidents handled
·
Staff availability
·
Readiness score
Supports operational optimization.
7. Visualization
proc sgplot data=emergency_fraud;
scatter x=Response_Time_Minutes y=Incidents_Handled;
run;
OUTPUT:
proc sgplot data=emergency_fraud;
vbar Service_Type / response=Readiness_Score stat=mean;
run;
OUTPUT:
8. PROC SQL Usage
proc sql;
create table city_summary as
select City,
mean(Response_Time_Minutes) as Avg_Response,
sum(Incidents_Handled) as Total_Incidents
from emergency_fraud
group by City;
quit;
proc print data=city_summary;
run;
OUTPUT:
| Obs | City | Avg_Response | Total_Incidents |
|---|---|---|---|
| 1 | Chennai | 17.0000 | 34 |
| 2 | Delhi | 12.6667 | 76 |
| 3 | Hyderabad | 12.0000 | 67 |
| 4 | Kolkata | 20.5000 | 61 |
| 5 | Mumbai | 15.3333 | 176 |
City-level summary:
·
Average response time
·
Total incidents
Efficient for reporting and dashboard preparation.
9. PROC TRANSPOSE
proc transpose data=city_summary out=city_transpose;
by City NotSorted;
run;
proc print data=city_transpose;
run;
OUTPUT:
| Obs | City | _NAME_ | COL1 |
|---|---|---|---|
| 1 | Chennai | Avg_Response | 17.000 |
| 2 | Chennai | Total_Incidents | 34.000 |
| 3 | Delhi | Avg_Response | 12.667 |
| 4 | Delhi | Total_Incidents | 76.000 |
| 5 | Hyderabad | Avg_Response | 12.000 |
| 6 | Hyderabad | Total_Incidents | 67.000 |
| 7 | Kolkata | Avg_Response | 20.500 |
| 8 | Kolkata | Total_Incidents | 61.000 |
| 9 | Mumbai | Avg_Response | 15.333 |
| 10 | Mumbai | Total_Incidents | 176.000 |
10. PROC APPEND
proc append base=emergency_clean
data=emergency_raw force;
run;
proc print data=emergency_raw;
run;
OUTPUT:
| Obs | Service_Type | City | Equipment_Status | Service_Date | Service_ID | Response_Time_Minutes | Incidents_Handled | Staff_Available | Readiness_Score |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ambulance | hyderabad | functional | 01JAN2026 | 101 | 8 | 20 | 15 | 85 |
| 2 | Fire | HYDERABAD | functional | 05JAN2026 | 102 | 15 | 30 | 18 | 90 |
| 3 | police | mumbai | damaged | 10JAN2026 | 103 | -5 | 25 | 10 | 75 |
| 4 | Ambulance | mumbai | functional | 15JAN2026 | 104 | 12 | 200 | 5 | 88 |
| 5 | fire | chennai | missing | 20JAN2026 | 105 | 20 | 18 | 12 | 92 |
| 6 | police | delhi | functional | 25JAN2026 | 106 | 7 | 22 | 0 | 80 |
| 7 | ambulance | delhi | functional | 28JAN2026 | 107 | 9 | 19 | 14 | . |
| 8 | fire | kolkata | functional | 30JAN2026 | 108 | 30 | 40 | 20 | 95 |
| 9 | police | kolkata | damaged | 02FEB2026 | 109 | 11 | 21 | 9 | 78 |
| 10 | ambulance | chennai | functional | 05FEB2026 | 110 | 14 | 16 | 13 | 83 |
| 11 | fire | mumbai | functional | 08FEB2026 | 111 | 16 | 23 | 15 | 89 |
| 12 | police | hyderabad | functional | 10FEB2026 | 112 | 13 | 17 | 11 | 84 |
| 13 | ambulance | mumbai | functional | 12FEB2026 | 113 | 18 | 28 | 16 | 91 |
| 14 | fire | delhi | functional | 14FEB2026 | 114 | 22 | 35 | 17 | 94 |
·
Vertical concatenation
·
Horizontal joining
·
Incremental data updates
11. PROC DATASETS DELETE
proc datasets library=work;
delete emergency_raw;
quit;
LOG:
12. Why Each Procedure Is Used
DATA STEP
Used for row-level transformation and error
correction.
SET
Reads dataset sequentially.
MERGE
Combines datasets by key variable.
PROC SQL
Advanced aggregation and joins.
PROC MEANS
Summary statistics.
PROC UNIVARIATE
Detect skewness and outliers.
PROC FREQ
Categorical distribution.
PROC CORR
Correlation matrix.
PROC SGPLOT
Data visualization.
MACROS
Reusable automation.
·
Defined
a Clear Business Objective – The project focuses on measuring
emergency service efficiency and detecting potential fraud or operational
inconsistencies.
·
Designed
a Realistic Emergency Dataset – Included Ambulance, Fire, and Police services
operating across multiple cities with measurable performance indicators.
·
Selected
Meaningful Operational Variables – Used Response Time, Incidents
Handled, Staff Available, Equipment Status, Readiness Score, and Service Date.
·
Structured
Dataset with Proper Attributes – Applied correct variable types,
length statements, and date formats to avoid structural errors.
·
Inserted
Intentional Errors – Introduced negative response times, extreme
outliers, missing values, and logical inconsistencies to simulate real-world
dirty data.
·
Performed
Initial Data Profiling – Used statistical summaries to detect abnormal
minimum, maximum, and missing values.
·
Validated
Categorical Variables – Checked service types, equipment status, and
city names for inconsistencies.
·
Identified
Outliers and Distribution Issues – Analyzed spread and unusual values
to detect unrealistic operational records.
·
Standardized
Character Data – Cleaned text using functions like STRIP, PROPCASE,
UPCASE, and LOWCASE for consistency.
·
Corrected
Logical Numeric Errors – Applied conditional rules to fix negative
response times and unrealistic incident counts.
·
Handled
Missing Data Strategically – Used COALESCE and business logic instead
of deleting records to preserve data integrity.
·
Applied
Date Intelligence Functions – Used MDY, INTCK, and INTNX for
time-based calculations and monthly aggregation.
·
Calculated
Workforce Utilization Rate – Derived Incidents ÷ Staff to measure
operational load efficiency.
·
Classified
Utilization Using Macro Logic – Automated categorization into High,
Medium, and Low workload levels.
·
Designed
Fraud Detection Rules – Flagged unrealistic operational combinations
such as damaged equipment with high readiness.
·
Automated
Fraud Flagging – Implemented reusable macro logic to identify
suspicious records.
·
Performed
Correlation Analysis – Examined relationships between staffing,
response time, and readiness scores.
·
Generated
Aggregated Reports – Used SQL summarization to produce city-level and
service-level insights.
·
Created
Visualizations for Management – Developed scatter plots and bar charts
to clearly communicate patterns.
·
Built
an End-to-End Reliable Framework – Integrated data cleaning,
validation, fraud detection, automation, statistical analysis, and reporting
into a complete SAS-based system.
14. Business Insights
·
Mumbai shows highest incident overload.
·
Kolkata has longer response time.
·
Damaged equipment correlated with low
readiness.
15. Conclusion
This
project demonstrates that a reliable Emergency Services Analytics and Fraud
Detection System can be successfully built using SAS when structured validation
and systematic error correction techniques are applied. By designing a realistic
dataset, intentionally inserting data quality issues, and then identifying and
correcting those errors, we simulated real-world operational challenges.
Statistical procedures such as summary analysis, distribution checks, and
correlation evaluation ensured data accuracy and reliability. Character and
numeric validation improved consistency, while date functions enhanced
time-based intelligence. The use of macros automated utilization classification
and fraud detection logic, making the system scalable and reusable.
Visualization and aggregation supported management-level decision-making. Most
importantly, the project highlighted how data cleaning, anomaly detection, and
logical validation are critical in high-risk domains like emergency services. Overall,
this end-to-end framework proves that SAS provides powerful tools to build
accurate, efficient, and fraud-aware emergency operations analytics systems.
SAS INTERVIEW QUESTIONS
·
What is ADSL?
·
What is ADAE?
·
What is the purpose of TLFs?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent EMERGENCY SERVICES data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics
· Clinical SAS Programmer
· Research Data Analyst
· Regulatory Data Validator
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment