414.Can We Design, Debug, and Detect Fraud in a Smart Grid Load Balancing System Using Advanced SAS Programming Techniques?
Designing, Debugging, and Detecting Fraud in a Smart Grid Load Balancing System Using Advanced SAS
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP | INPUT | DATALINES | IF-THEN-ELSE | OUTPUT | DELETE | SET | MERGE | PROC SORT | PROC APPEND | PROC TRANSPOSE | PROC MEANS | PROC FREQ | PROC SQL | PROC REPORT | PROC COMPARE | %MACRO | %MEND
Introduction
In
today’s energy-driven world, smart grid systems play a critical role in
balancing electricity demand and supply. Grid load balancing ensures stable
power distribution across regions by monitoring demand (MW), renewable
contribution, storage usage, system frequency deviation, and outage events.
However, real-world datasets often contain intentional or unintentional errors
such as invalid date formats, incorrect merges, misclassified utilization
levels, inconsistent character formats, and fraudulent manipulations in
reporting, validation
layers, QC comparisons, advanced macro automation, rolling metrics, anomaly
scoring, and audit-ready reporting structure.
In this
project, we will:
- Create a Grid Load
Balancing dataset
- Introduce intentional
errors
- Detect and correct those
errors
- Use SAS macros for fraud
detection
- Apply date functions
(MDY, INTCK, INTNX)
- Use SET, MERGE, APPEND,
TRANSPOSE
- Apply numeric and
character functions
- Use PROC DATASETS DELETE
- Provide full explanation of
every step
Table of Contents
- Business Context
- Dataset Design
- Intentional Errors
- Full Corrected SAS Code
- Macro Fraud Detection Logic
- Date Handling (MDY, INTCK,
INTNX)
- Data Manipulation (SET,
MERGE, APPEND, TRANSPOSE)
- Character Functions
- Numeric Functions
- Classification Logic
- Error Debugging Section
- Rolling 3-Month Stability Trend
- Advanced Fraud Scoring Model
- Duplicate Record Detection
- Data Validation Macro
- QC Comparison Program
- Risk Categorization Layer
- Outlier Detection Using Z-Score
- Monthly Aggregation Dataset
- Dynamic Reporting Macro
- Audit Trail Dataset
- Conclusion
Business Context
Electric
grid operators monitor:
- Demand in MW
- Renewable percentage
- Battery storage usage
- Frequency stability (50 Hz
standard in India)
- Outage counts
- Stability Index (calculated
score)
Fraud or
manipulation may occur when:
- Renewable share is inflated
- Storage usage misreported
- Frequency deviation hidden
- Dates manipulated
- Duplicate records appended
Build
system → Introduce errors → Detect → Correct → Final clean dataset.
1. Create Raw Dataset (With Intentional Errors)
data grid_load_raw;
input Grid_Region $ Demand_MW Renewable_Share Storage_Usage Frequency_Deviation
Outage_Events Stability_Index Month Day Year;
format Grid_Date date9.;
Grid_Date = mdy(Month, Day, Year);
Utilization_Percentage = Demand_MW / 1000 * 100;
datalines;
north 1200 45 30 0.02 1 85 01 15 2025
south 1500 60 40 0.05 2 90 02 20 2025
east 1800 110 50 0.10 3 70 03 25 2025
west 900 30 20 0.50 5 60 04 10 2025
central 1300 55 35 . 0 88 05 05 2025
north 1250 45 30 0.02 1 85 13 15 2025
south 1400 60 40 0.03 2 92 06 31 2025
east 2000 50 45 0.08 4 75 07 18 2025
west 950 35 25 0.02 1 89 08 22 2025
central 1600 65 50 0.01 0 95 09 12 2025
north 1700 48 38 0.03 2 82 10 28 2025
south 1800 70 55 0.09 3 78 11 15 2025
east 1900 52 42 0.04 1 91 12 05 2025
;
run;
proc print data=grid_load_raw;
run;
OUTPUT:
| Obs | Grid_Region | Demand_MW | Renewable_Share | Storage_Usage | Frequency_Deviation | Outage_Events | Stability_Index | Month | Day | Year | Grid_Date | Utilization_Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | north | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 120 |
| 2 | south | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 150 |
| 3 | east | 1800 | 110 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 180 |
| 4 | west | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 90 |
| 5 | central | 1300 | 55 | 35 | . | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 130 |
| 6 | north | 1250 | 45 | 30 | 0.02 | 1 | 85 | 13 | 15 | 2025 | . | 125 |
| 7 | south | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 31 | 2025 | . | 140 |
| 8 | east | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 200 |
| 9 | west | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 95 |
| 10 | central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 160 |
| 11 | north | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 170 |
| 12 | south | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 180 |
| 13 | east | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 190 |
❌ Intentional Errors
·
Renewable_Share = 110 (Impossible)
·
Frequency_Deviation missing
·
Month = 13 (Invalid)
·
June 31 (Invalid date)
·
Utilization formula simplistic
·
No classification
·
Lowercase region inconsistencies
2. Clean & Correct Dataset
data grid_load_clean;
set grid_load_raw;
length Utilization_Class $12.;
Grid_Region = propcase(strip(Grid_Region));
Renewable_Share = min(Renewable_Share,100);
if missing(Frequency_Deviation) then Frequency_Deviation = 0;
if Month > 12 then delete;
if Month=6 and Day>30 then Day=30;
Grid_Date = mdy(Month, Day, Year);
Months_Since_Start = intck('month','01JAN2025'd,Grid_Date);
Next_Month = intnx('month',Grid_Date,1);
Utilization_Percentage = round((Demand_MW/2000)*100,0.1);
if Utilization_Percentage > 85 then Utilization_Class="High";
else if Utilization_Percentage > 60 then Utilization_Class="Medium";
else Utilization_Class="Low";
Fraud_Flag = 0;
if Renewable_Share > 95 and Stability_Index < 70 then Fraud_Flag=1;
if Frequency_Deviation > 0.3 then Fraud_Flag=1;
if Outage_Events > 4 and Stability_Index > 90 then Fraud_Flag=1;
run;
proc print data=grid_load_clean;
run;
OUTPUT:
| Obs | Grid_Region | Demand_MW | Renewable_Share | Storage_Usage | Frequency_Deviation | Outage_Events | Stability_Index | Month | Day | Year | Grid_Date | Utilization_Percentage | Utilization_Class | Months_Since_Start | Next_Month | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | North | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 60.0 | Low | 0 | 23773 | 0 |
| 2 | South | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 75.0 | Medium | 1 | 23801 | 0 |
| 3 | East | 1800 | 100 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 90.0 | High | 2 | 23832 | 0 |
| 4 | West | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 45.0 | Low | 3 | 23862 | 1 |
| 5 | Central | 1300 | 55 | 35 | 0.00 | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 65.0 | Medium | 4 | 23893 | 0 |
| 6 | South | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 30 | 2025 | 30JUN2025 | 70.0 | Medium | 5 | 23923 | 0 |
| 7 | East | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 100.0 | High | 6 | 23954 | 0 |
| 8 | West | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 47.5 | Low | 7 | 23985 | 0 |
| 9 | Central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 80.0 | Medium | 8 | 24015 | 0 |
| 10 | North | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 85.0 | Medium | 9 | 24046 | 0 |
| 11 | South | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 90.0 | High | 10 | 24076 | 0 |
| 12 | East | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 95.0 | High | 11 | 24107 | 0 |
set grid_load_raw;
Reads raw dataset.
propcase(strip())
Removes spaces and standardizes case.
min(Renewable_Share,100);
Caps impossible values.
missing()
Handles null frequency.
intck
Counts months difference.
intnx
Calculates next month date.
Classification logic
Categorizes grid load.
3. SeparateFraud Detection Macro
%macro fraud_check(dataset);
data fraud_flagged;
set &dataset;
Fraud_Flag = 0;
if Renewable_Share > 95 and Stability_Index < 70 then Fraud_Flag=1;
if Frequency_Deviation > 0.3 then Fraud_Flag=1;
if Outage_Events > 4 and Stability_Index > 90 then Fraud_Flag=1;
run;
proc print data=fraud_flagged;
run;
%mend;
%fraud_check(grid_load_clean);
OUTPUT:
| Obs | Grid_Region | Demand_MW | Renewable_Share | Storage_Usage | Frequency_Deviation | Outage_Events | Stability_Index | Month | Day | Year | Grid_Date | Utilization_Percentage | Utilization_Class | Months_Since_Start | Next_Month | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | North | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 60.0 | Low | 0 | 23773 | 0 |
| 2 | South | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 75.0 | Medium | 1 | 23801 | 0 |
| 3 | East | 1800 | 100 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 90.0 | High | 2 | 23832 | 0 |
| 4 | West | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 45.0 | Low | 3 | 23862 | 1 |
| 5 | Central | 1300 | 55 | 35 | 0.00 | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 65.0 | Medium | 4 | 23893 | 0 |
| 6 | South | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 30 | 2025 | 30JUN2025 | 70.0 | Medium | 5 | 23923 | 0 |
| 7 | East | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 100.0 | High | 6 | 23954 | 0 |
| 8 | West | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 47.5 | Low | 7 | 23985 | 0 |
| 9 | Central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 80.0 | Medium | 8 | 24015 | 0 |
| 10 | North | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 85.0 | Medium | 9 | 24046 | 0 |
| 11 | South | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 90.0 | High | 10 | 24076 | 0 |
| 12 | East | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 95.0 | High | 11 | 24107 | 0 |
·
Reusable logic
·
Scalable to multiple datasets
·
Centralized fraud rules
4. Append & Merge
data grid_backup;
set grid_load_clean;
run;
proc print data=grid_backup;
run;
OUTPUT:
| Obs | Grid_Region | Demand_MW | Renewable_Share | Storage_Usage | Frequency_Deviation | Outage_Events | Stability_Index | Month | Day | Year | Grid_Date | Utilization_Percentage | Utilization_Class | Months_Since_Start | Next_Month | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | North | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 60.0 | Low | 0 | 23773 | 0 |
| 2 | South | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 75.0 | Medium | 1 | 23801 | 0 |
| 3 | East | 1800 | 100 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 90.0 | High | 2 | 23832 | 0 |
| 4 | West | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 45.0 | Low | 3 | 23862 | 1 |
| 5 | Central | 1300 | 55 | 35 | 0.00 | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 65.0 | Medium | 4 | 23893 | 0 |
| 6 | South | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 30 | 2025 | 30JUN2025 | 70.0 | Medium | 5 | 23923 | 0 |
| 7 | East | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 100.0 | High | 6 | 23954 | 0 |
| 8 | West | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 47.5 | Low | 7 | 23985 | 0 |
| 9 | Central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 80.0 | Medium | 8 | 24015 | 0 |
| 10 | North | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 85.0 | Medium | 9 | 24046 | 0 |
| 11 | South | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 90.0 | High | 10 | 24076 | 0 |
| 12 | East | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 95.0 | High | 11 | 24107 | 0 |
proc append base=grid_backup
data=grid_load_clean force;
run;
proc print data=grid_backup;
run;
OUTPUT:
| Obs | Grid_Region | Demand_MW | Renewable_Share | Storage_Usage | Frequency_Deviation | Outage_Events | Stability_Index | Month | Day | Year | Grid_Date | Utilization_Percentage | Utilization_Class | Months_Since_Start | Next_Month | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | North | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 60.0 | Low | 0 | 23773 | 0 |
| 2 | South | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 75.0 | Medium | 1 | 23801 | 0 |
| 3 | East | 1800 | 100 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 90.0 | High | 2 | 23832 | 0 |
| 4 | West | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 45.0 | Low | 3 | 23862 | 1 |
| 5 | Central | 1300 | 55 | 35 | 0.00 | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 65.0 | Medium | 4 | 23893 | 0 |
| 6 | South | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 30 | 2025 | 30JUN2025 | 70.0 | Medium | 5 | 23923 | 0 |
| 7 | East | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 100.0 | High | 6 | 23954 | 0 |
| 8 | West | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 47.5 | Low | 7 | 23985 | 0 |
| 9 | Central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 80.0 | Medium | 8 | 24015 | 0 |
| 10 | North | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 85.0 | Medium | 9 | 24046 | 0 |
| 11 | South | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 90.0 | High | 10 | 24076 | 0 |
| 12 | East | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 95.0 | High | 11 | 24107 | 0 |
| 13 | North | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 60.0 | Low | 0 | 23773 | 0 |
| 14 | South | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 75.0 | Medium | 1 | 23801 | 0 |
| 15 | East | 1800 | 100 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 90.0 | High | 2 | 23832 | 0 |
| 16 | West | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 45.0 | Low | 3 | 23862 | 1 |
| 17 | Central | 1300 | 55 | 35 | 0.00 | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 65.0 | Medium | 4 | 23893 | 0 |
| 18 | South | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 30 | 2025 | 30JUN2025 | 70.0 | Medium | 5 | 23923 | 0 |
| 19 | East | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 100.0 | High | 6 | 23954 | 0 |
| 20 | West | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 47.5 | Low | 7 | 23985 | 0 |
| 21 | Central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 80.0 | Medium | 8 | 24015 | 0 |
| 22 | North | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 85.0 | Medium | 9 | 24046 | 0 |
| 23 | South | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 90.0 | High | 10 | 24076 | 0 |
| 24 | East | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 95.0 | High | 11 | 24107 | 0 |
Merge Example
data region_capacity;
input Grid_Region $ Capacity_MW;
datalines;
North 2000
South 2200
East 2100
West 1500
Central 1800
;
run;
proc print data=region_capacity;
run;
OUTPUT:
| Obs | Grid_Region | Capacity_MW |
|---|---|---|
| 1 | North | 2000 |
| 2 | South | 2200 |
| 3 | East | 2100 |
| 4 | West | 1500 |
| 5 | Central | 1800 |
proc sort data=grid_load_clean; by Grid_Region; run;
proc print data=grid_load_clean;
run;
OUTPUT:
| Obs | Grid_Region | Demand_MW | Renewable_Share | Storage_Usage | Frequency_Deviation | Outage_Events | Stability_Index | Month | Day | Year | Grid_Date | Utilization_Percentage | Utilization_Class | Months_Since_Start | Next_Month | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 1300 | 55 | 35 | 0.00 | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 65.0 | Medium | 4 | 23893 | 0 |
| 2 | Central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 80.0 | Medium | 8 | 24015 | 0 |
| 3 | East | 1800 | 100 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 90.0 | High | 2 | 23832 | 0 |
| 4 | East | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 100.0 | High | 6 | 23954 | 0 |
| 5 | East | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 95.0 | High | 11 | 24107 | 0 |
| 6 | North | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 60.0 | Low | 0 | 23773 | 0 |
| 7 | North | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 85.0 | Medium | 9 | 24046 | 0 |
| 8 | South | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 75.0 | Medium | 1 | 23801 | 0 |
| 9 | South | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 30 | 2025 | 30JUN2025 | 70.0 | Medium | 5 | 23923 | 0 |
| 10 | South | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 90.0 | High | 10 | 24076 | 0 |
| 11 | West | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 45.0 | Low | 3 | 23862 | 1 |
| 12 | West | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 47.5 | Low | 7 | 23985 | 0 |
proc sort data=region_capacity; by Grid_Region; run;
proc print data=region_capacity;
run;
OUTPUT:
| Obs | Grid_Region | Capacity_MW |
|---|---|---|
| 1 | Central | 1800 |
| 2 | East | 2100 |
| 3 | North | 2000 |
| 4 | South | 2200 |
| 5 | West | 1500 |
data merged_grid;
merge grid_load_clean region_capacity;
by Grid_Region;
run;
proc print data=merged_grid;
run;
OUTPUT:
| Obs | Grid_Region | Demand_MW | Renewable_Share | Storage_Usage | Frequency_Deviation | Outage_Events | Stability_Index | Month | Day | Year | Grid_Date | Utilization_Percentage | Utilization_Class | Months_Since_Start | Next_Month | Fraud_Flag | Capacity_MW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 1300 | 55 | 35 | 0.00 | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 65.0 | Medium | 4 | 23893 | 0 | 1800 |
| 2 | Central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 80.0 | Medium | 8 | 24015 | 0 | 1800 |
| 3 | East | 1800 | 100 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 90.0 | High | 2 | 23832 | 0 | 2100 |
| 4 | East | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 100.0 | High | 6 | 23954 | 0 | 2100 |
| 5 | East | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 95.0 | High | 11 | 24107 | 0 | 2100 |
| 6 | North | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 60.0 | Low | 0 | 23773 | 0 | 2000 |
| 7 | North | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 85.0 | Medium | 9 | 24046 | 0 | 2000 |
| 8 | South | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 75.0 | Medium | 1 | 23801 | 0 | 2200 |
| 9 | South | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 30 | 2025 | 30JUN2025 | 70.0 | Medium | 5 | 23923 | 0 | 2200 |
| 10 | South | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 90.0 | High | 10 | 24076 | 0 | 2200 |
| 11 | West | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 45.0 | Low | 3 | 23862 | 1 | 1500 |
| 12 | West | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 47.5 | Low | 7 | 23985 | 0 | 1500 |
5. Transpose
proc transpose data=merged_grid out=transposed_grid;
by Grid_Region;
var Demand_MW Renewable_Share;
run;
proc print data=transposed_grid;
run;
OUTPUT:
| Obs | Grid_Region | _NAME_ | COL1 | COL2 | COL3 |
|---|---|---|---|---|---|
| 1 | Central | Demand_MW | 1300 | 1600 | . |
| 2 | Central | Renewable_Share | 55 | 65 | . |
| 3 | East | Demand_MW | 1800 | 2000 | 1900 |
| 4 | East | Renewable_Share | 100 | 50 | 52 |
| 5 | North | Demand_MW | 1200 | 1700 | . |
| 6 | North | Renewable_Share | 45 | 48 | . |
| 7 | South | Demand_MW | 1500 | 1400 | 1800 |
| 8 | South | Renewable_Share | 60 | 60 | 70 |
| 9 | West | Demand_MW | 900 | 950 | . |
| 10 | West | Renewable_Share | 30 | 35 | . |
6. Rolling 3-Month Stability Trend
proc sort data=grid_load_clean;
by Grid_Region Grid_Date;
run;
proc print data=grid_load_clean;
run;
OUTPUT:
| Obs | Grid_Region | Demand_MW | Renewable_Share | Storage_Usage | Frequency_Deviation | Outage_Events | Stability_Index | Month | Day | Year | Grid_Date | Utilization_Percentage | Utilization_Class | Months_Since_Start | Next_Month | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 1300 | 55 | 35 | 0.00 | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 65.0 | Medium | 4 | 23893 | 0 |
| 2 | Central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 80.0 | Medium | 8 | 24015 | 0 |
| 3 | East | 1800 | 100 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 90.0 | High | 2 | 23832 | 0 |
| 4 | East | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 100.0 | High | 6 | 23954 | 0 |
| 5 | East | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 95.0 | High | 11 | 24107 | 0 |
| 6 | North | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 60.0 | Low | 0 | 23773 | 0 |
| 7 | North | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 85.0 | Medium | 9 | 24046 | 0 |
| 8 | South | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 75.0 | Medium | 1 | 23801 | 0 |
| 9 | South | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 30 | 2025 | 30JUN2025 | 70.0 | Medium | 5 | 23923 | 0 |
| 10 | South | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 90.0 | High | 10 | 24076 | 0 |
| 11 | West | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 45.0 | Low | 3 | 23862 | 1 |
| 12 | West | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 47.5 | Low | 7 | 23985 | 0 |
data rolling_stability;
set grid_load_clean;
by Grid_Region;
retain stab_lag1 stab_lag2;
Rolling_3M_Stability = mean(Stability_Index, stab_lag1, stab_lag2);
stab_lag2 = stab_lag1;
stab_lag1 = Stability_Index;
run;
proc print data=rolling_stability;
run;
OUTPUT:
| Obs | Grid_Region | Demand_MW | Renewable_Share | Storage_Usage | Frequency_Deviation | Outage_Events | Stability_Index | Month | Day | Year | Grid_Date | Utilization_Percentage | Utilization_Class | Months_Since_Start | Next_Month | Fraud_Flag | stab_lag1 | stab_lag2 | Rolling_3M_Stability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 1300 | 55 | 35 | 0.00 | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 65.0 | Medium | 4 | 23893 | 0 | 88 | . | 88.0000 |
| 2 | Central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 80.0 | Medium | 8 | 24015 | 0 | 95 | 88 | 91.5000 |
| 3 | East | 1800 | 100 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 90.0 | High | 2 | 23832 | 0 | 70 | 95 | 84.3333 |
| 4 | East | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 100.0 | High | 6 | 23954 | 0 | 75 | 70 | 80.0000 |
| 5 | East | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 95.0 | High | 11 | 24107 | 0 | 91 | 75 | 78.6667 |
| 6 | North | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 60.0 | Low | 0 | 23773 | 0 | 85 | 91 | 83.6667 |
| 7 | North | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 85.0 | Medium | 9 | 24046 | 0 | 82 | 85 | 86.0000 |
| 8 | South | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 75.0 | Medium | 1 | 23801 | 0 | 90 | 82 | 85.6667 |
| 9 | South | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 30 | 2025 | 30JUN2025 | 70.0 | Medium | 5 | 23923 | 0 | 92 | 90 | 88.0000 |
| 10 | South | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 90.0 | High | 10 | 24076 | 0 | 78 | 92 | 86.6667 |
| 11 | West | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 45.0 | Low | 3 | 23862 | 1 | 60 | 78 | 76.6667 |
| 12 | West | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 47.5 | Low | 7 | 23985 | 0 | 89 | 60 | 75.6667 |
·
RETAIN preserves prior values
·
Computes moving average
·
Detects gradual deterioration in grid
stability
7. Advanced Fraud Risk Scoring
data fraud_scoring;
set grid_load_clean;
Fraud_Score = 0;
Fraud_Score + (Renewable_Share > 95) * 20;
Fraud_Score + (Frequency_Deviation > 0.3) * 30;
Fraud_Score + (Outage_Events > 4) * 25;
Fraud_Score + (Stability_Index < 65) * 25;
length Fraud_Level $12.;
if Fraud_Score >= 60 then Fraud_Level="High Risk";
else if Fraud_Score >= 30 then Fraud_Level="Medium Risk";
else Fraud_Level="Low Risk";
run;
proc print data=fraud_scoring;
run;
OUTPUT:
| Obs | Grid_Region | Demand_MW | Renewable_Share | Storage_Usage | Frequency_Deviation | Outage_Events | Stability_Index | Month | Day | Year | Grid_Date | Utilization_Percentage | Utilization_Class | Months_Since_Start | Next_Month | Fraud_Flag | Fraud_Score | Fraud_Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 1300 | 55 | 35 | 0.00 | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 65.0 | Medium | 4 | 23893 | 0 | 0 | Low Risk |
| 2 | Central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 80.0 | Medium | 8 | 24015 | 0 | 0 | Low Risk |
| 3 | East | 1800 | 100 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 90.0 | High | 2 | 23832 | 0 | 20 | Low Risk |
| 4 | East | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 100.0 | High | 6 | 23954 | 0 | 0 | Low Risk |
| 5 | East | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 95.0 | High | 11 | 24107 | 0 | 0 | Low Risk |
| 6 | North | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 60.0 | Low | 0 | 23773 | 0 | 0 | Low Risk |
| 7 | North | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 85.0 | Medium | 9 | 24046 | 0 | 0 | Low Risk |
| 8 | South | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 75.0 | Medium | 1 | 23801 | 0 | 0 | Low Risk |
| 9 | South | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 30 | 2025 | 30JUN2025 | 70.0 | Medium | 5 | 23923 | 0 | 0 | Low Risk |
| 10 | South | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 90.0 | High | 10 | 24076 | 0 | 0 | Low Risk |
| 11 | West | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 45.0 | Low | 3 | 23862 | 1 | 80 | High Risk |
| 12 | West | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 47.5 | Low | 7 | 23985 | 0 | 0 | Low Risk |
·
Weighted scoring is more realistic
·
Helps prioritize investigations
8. Duplicate Detection Logic
proc sort data=grid_load_clean out=dup_check nodupkey dupout=duplicates;
by Grid_Region Grid_Date;
run;
proc print data=dup_check;
run;
OUTPUT:
| Obs | Grid_Region | Demand_MW | Renewable_Share | Storage_Usage | Frequency_Deviation | Outage_Events | Stability_Index | Month | Day | Year | Grid_Date | Utilization_Percentage | Utilization_Class | Months_Since_Start | Next_Month | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 1300 | 55 | 35 | 0.00 | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 65.0 | Medium | 4 | 23893 | 0 |
| 2 | Central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 80.0 | Medium | 8 | 24015 | 0 |
| 3 | East | 1800 | 100 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 90.0 | High | 2 | 23832 | 0 |
| 4 | East | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 100.0 | High | 6 | 23954 | 0 |
| 5 | East | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 95.0 | High | 11 | 24107 | 0 |
| 6 | North | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 60.0 | Low | 0 | 23773 | 0 |
| 7 | North | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 85.0 | Medium | 9 | 24046 | 0 |
| 8 | South | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 75.0 | Medium | 1 | 23801 | 0 |
| 9 | South | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 30 | 2025 | 30JUN2025 | 70.0 | Medium | 5 | 23923 | 0 |
| 10 | South | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 90.0 | High | 10 | 24076 | 0 |
| 11 | West | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 45.0 | Low | 3 | 23862 | 1 |
| 12 | West | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 47.5 | Low | 7 | 23985 | 0 |
·
NODUPKEY removes duplicates
·
DUPOUT captures suspicious duplicates
·
Important after PROC APPEND operations
9. Data Validation Macro
%macro validate_data(dataset);
proc means data=&dataset n nmiss min max;
var Demand_MW Renewable_Share Frequency_Deviation Stability_Index;
run;
proc freq data=&dataset;
tables Utilization_Class Fraud_Flag / missing;
run;
%mend;
%validate_data(grid_load_clean);
OUTPUT:
The MEANS Procedure
| Variable | N | N Miss | Minimum | Maximum |
|---|---|---|---|---|
Demand_MW Renewable_Share Frequency_Deviation Stability_Index | 12 12 12 12 | 0 0 0 0 | 900.0000000 30.0000000 0 60.0000000 | 2000.00 100.0000000 0.5000000 95.0000000 |
The FREQ Procedure
| Utilization_Class | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| High | 4 | 33.33 | 4 | 33.33 |
| Low | 3 | 25.00 | 7 | 58.33 |
| Medium | 5 | 41.67 | 12 | 100.00 |
| Fraud_Flag | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| 0 | 11 | 91.67 | 11 | 91.67 |
| 1 | 1 | 8.33 | 12 | 100.00 |
·
Quick QC summary
·
Detects missing values
·
Validates ranges
10. Independent QC Comparison Program
proc compare base=grid_load_clean
compare=fraud_scoring
out=compare_results
outnoequal;
id Grid_Region Grid_Date;
run;
OUTPUT:
The COMPARE Procedure
Comparison of WORK.GRID_LOAD_CLEAN with WORK.FRAUD_SCORING
(Method=EXACT)
Data Set Summary
Dataset Created Modified NVar NObs
WORK.GRID_LOAD_CLEAN 28FEB26:00:41:29 28FEB26:00:41:29 16 12
WORK.FRAUD_SCORING 28FEB26:00:41:45 28FEB26:00:41:45 18 12
Variables Summary
Number of Variables in Common: 16.
Number of Variables in WORK.FRAUD_SCORING but not in WORK.GRID_LOAD_CLEAN: 2.
Number of ID Variables: 2.
Observation Summary
Observation Base Compare ID
First Obs 1 1 Grid_Region=Central Grid_Date=05MAY2025
Last Obs 12 12 Grid_Region=West Grid_Date=22AUG2025
Number of Observations in Common: 12.
Total Number of Observations Read from WORK.GRID_LOAD_CLEAN: 12.
Total Number of Observations Read from WORK.FRAUD_SCORING: 12.
Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 12.
NOTE: No unequal values were found. All values compared are exactly equal.
·
Standard industry QC practice
· Detects derivation mismatches
11. Z-Score Outlier Detection
proc means data=grid_load_clean mean std noprint;
var Demand_MW;
output out=stats mean=mean_d std=std_d;
run;
proc print data=stats;
run;
OUTPUT:
| Obs | _TYPE_ | _FREQ_ | mean_d | std_d |
|---|---|---|---|---|
| 1 | 0 | 12 | 1504.17 | 362.101 |
data zscore_check;
if _n_=1 then set stats;
set grid_load_clean;
Z_Demand = (Demand_MW - mean_d) / std_d;
if abs(Z_Demand) > 1 then Outlier_Flag="Y";
else Outlier_Flag="N";
run;
proc print data=zscore_check;
run;
OUTPUT:
| Obs | _TYPE_ | _FREQ_ | mean_d | std_d | Grid_Region | Demand_MW | Renewable_Share | Storage_Usage | Frequency_Deviation | Outage_Events | Stability_Index | Month | Day | Year | Grid_Date | Utilization_Percentage | Utilization_Class | Months_Since_Start | Next_Month | Fraud_Flag | Z_Demand | Outlier_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 12 | 1504.17 | 362.101 | Central | 1300 | 55 | 35 | 0.00 | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 65.0 | Medium | 4 | 23893 | 0 | -0.56384 | N |
| 2 | 0 | 12 | 1504.17 | 362.101 | Central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 80.0 | Medium | 8 | 24015 | 0 | 0.26466 | N |
| 3 | 0 | 12 | 1504.17 | 362.101 | East | 1800 | 100 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 90.0 | High | 2 | 23832 | 0 | 0.81699 | N |
| 4 | 0 | 12 | 1504.17 | 362.101 | East | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 100.0 | High | 6 | 23954 | 0 | 1.36932 | Y |
| 5 | 0 | 12 | 1504.17 | 362.101 | East | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 95.0 | High | 11 | 24107 | 0 | 1.09316 | Y |
| 6 | 0 | 12 | 1504.17 | 362.101 | North | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 60.0 | Low | 0 | 23773 | 0 | -0.84000 | N |
| 7 | 0 | 12 | 1504.17 | 362.101 | North | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 85.0 | Medium | 9 | 24046 | 0 | 0.54082 | N |
| 8 | 0 | 12 | 1504.17 | 362.101 | South | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 75.0 | Medium | 1 | 23801 | 0 | -0.01151 | N |
| 9 | 0 | 12 | 1504.17 | 362.101 | South | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 30 | 2025 | 30JUN2025 | 70.0 | Medium | 5 | 23923 | 0 | -0.28767 | N |
| 10 | 0 | 12 | 1504.17 | 362.101 | South | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 90.0 | High | 10 | 24076 | 0 | 0.81699 | N |
| 11 | 0 | 12 | 1504.17 | 362.101 | West | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 45.0 | Low | 3 | 23862 | 1 | -1.66850 | Y |
| 12 | 0 | 12 | 1504.17 | 362.101 | West | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 47.5 | Low | 7 | 23985 | 0 | -1.53042 | Y |
·
Identifies statistical anomalies
·
Important for fraud analytics
12. Monthly Aggregation Dataset
proc sql;
create table monthly_summary as
select Grid_Region,
intnx('month', Grid_Date, 0, 'b') format=date9. as Month_Start,
mean(Demand_MW) as Avg_Demand,
mean(Stability_Index) as Avg_Stability,
sum(Outage_Events) as Total_Outages
from grid_load_clean
group by Grid_Region, calculated Month_Start;
quit;
proc print data=monthly_summary;
run;
OUTPUT:
| Obs | Grid_Region | Month_Start | Avg_Demand | Avg_Stability | Total_Outages |
|---|---|---|---|---|---|
| 1 | Central | 01MAY2025 | 1300 | 88 | 0 |
| 2 | Central | 01SEP2025 | 1600 | 95 | 0 |
| 3 | East | 01MAR2025 | 1800 | 70 | 3 |
| 4 | East | 01JUL2025 | 2000 | 75 | 4 |
| 5 | East | 01DEC2025 | 1900 | 91 | 1 |
| 6 | North | 01JAN2025 | 1200 | 85 | 1 |
| 7 | North | 01OCT2025 | 1700 | 82 | 2 |
| 8 | South | 01FEB2025 | 1500 | 90 | 2 |
| 9 | South | 01JUN2025 | 1400 | 92 | 2 |
| 10 | South | 01NOV2025 | 1800 | 78 | 3 |
| 11 | West | 01APR2025 | 900 | 60 | 5 |
| 12 | West | 01AUG2025 | 950 | 89 | 1 |
·
Supports management dashboards
·
Groups data by month
·
Uses INTNX with beginning option
13. Risk Categorization Layer
data risk_profile;
set fraud_scoring;
Risk_Index = mean(Fraud_Score, Utilization_Percentage);
length Risk_Category $12.;
if Risk_Index > 80 then Risk_Category="Critical";
else if Risk_Index > 60 then Risk_Category="Severe";
else if Risk_Index > 40 then Risk_Category="Moderate";
else Risk_Category="Stable";
run;
proc print data=risk_profile;
run;
OUTPUT:
| Obs | Grid_Region | Demand_MW | Renewable_Share | Storage_Usage | Frequency_Deviation | Outage_Events | Stability_Index | Month | Day | Year | Grid_Date | Utilization_Percentage | Utilization_Class | Months_Since_Start | Next_Month | Fraud_Flag | Fraud_Score | Fraud_Level | Risk_Index | Risk_Category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 1300 | 55 | 35 | 0.00 | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 65.0 | Medium | 4 | 23893 | 0 | 0 | Low Risk | 32.50 | Stable |
| 2 | Central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 80.0 | Medium | 8 | 24015 | 0 | 0 | Low Risk | 40.00 | Stable |
| 3 | East | 1800 | 100 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 90.0 | High | 2 | 23832 | 0 | 20 | Low Risk | 55.00 | Moderate |
| 4 | East | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 100.0 | High | 6 | 23954 | 0 | 0 | Low Risk | 50.00 | Moderate |
| 5 | East | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 95.0 | High | 11 | 24107 | 0 | 0 | Low Risk | 47.50 | Moderate |
| 6 | North | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 60.0 | Low | 0 | 23773 | 0 | 0 | Low Risk | 30.00 | Stable |
| 7 | North | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 85.0 | Medium | 9 | 24046 | 0 | 0 | Low Risk | 42.50 | Moderate |
| 8 | South | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 75.0 | Medium | 1 | 23801 | 0 | 0 | Low Risk | 37.50 | Stable |
| 9 | South | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 30 | 2025 | 30JUN2025 | 70.0 | Medium | 5 | 23923 | 0 | 0 | Low Risk | 35.00 | Stable |
| 10 | South | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 90.0 | High | 10 | 24076 | 0 | 0 | Low Risk | 45.00 | Moderate |
| 11 | West | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 45.0 | Low | 3 | 23862 | 1 | 80 | High Risk | 62.50 | Severe |
| 12 | West | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 47.5 | Low | 7 | 23985 | 0 | 0 | Low Risk | 23.75 | Stable |
14. Dynamic Reporting Macro
%macro region_report(region_name);
proc report data=grid_load_clean nowd;
where Grid_Region="®ion_name";
column Grid_Date Demand_MW Stability_Index Fraud_Flag;
define Grid_Date / display;
define Demand_MW / analysis mean;
define Stability_Index / analysis mean;
run;
%mend;
%region_report(North);
OUTPUT:
| Grid_Date | Demand_MW | Stability_Index | Fraud_Flag |
|---|---|---|---|
| 15JAN2025 | 1200 | 85 | 0 |
| 28OCT2025 | 1700 | 82 | 0 |
·
Generates region-specific reporting
·
Parameterized macro
·
Reusable across regions
15. Audit Trail Dataset
data audit_trail;
length Action $50 User $20 Timestamp 8;
format Timestamp datetime20.;
Action="Fraud Score Generated";
User="SYSTEM";
Timestamp=datetime();
output;
run;
proc print data=audit_trail;
run;
OUTPUT:
| Obs | Action | User | Timestamp |
|---|---|---|---|
| 1 | Fraud Score Generated | SYSTEM | 28FEB2026:06:13:53 |
·
Maintains audit trace
·
Important in regulatory environments
16. Character Function Advanced Usage
data character_demo;
set grid_load_clean;
Region_Upper = upcase(Grid_Region);
Region_Lower = lowcase(Grid_Region);
Region_Label = catx(" - ", propcase(Grid_Region), put(Year,4.));
Region_Clean = strip(trim(Grid_Region));
run;
proc print data=character_demo;
run;
OUTPUT:
| Obs | Grid_Region | Demand_MW | Renewable_Share | Storage_Usage | Frequency_Deviation | Outage_Events | Stability_Index | Month | Day | Year | Grid_Date | Utilization_Percentage | Utilization_Class | Months_Since_Start | Next_Month | Fraud_Flag | Region_Upper | Region_Lower | Region_Label | Region_Clean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 1300 | 55 | 35 | 0.00 | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 65.0 | Medium | 4 | 23893 | 0 | CENTRAL | central | Central - 2025 | Central |
| 2 | Central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 80.0 | Medium | 8 | 24015 | 0 | CENTRAL | central | Central - 2025 | Central |
| 3 | East | 1800 | 100 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 90.0 | High | 2 | 23832 | 0 | EAST | east | East - 2025 | East |
| 4 | East | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 100.0 | High | 6 | 23954 | 0 | EAST | east | East - 2025 | East |
| 5 | East | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 95.0 | High | 11 | 24107 | 0 | EAST | east | East - 2025 | East |
| 6 | North | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 60.0 | Low | 0 | 23773 | 0 | NORTH | north | North - 2025 | North |
| 7 | North | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 85.0 | Medium | 9 | 24046 | 0 | NORTH | north | North - 2025 | North |
| 8 | South | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 75.0 | Medium | 1 | 23801 | 0 | SOUTH | south | South - 2025 | South |
| 9 | South | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 30 | 2025 | 30JUN2025 | 70.0 | Medium | 5 | 23923 | 0 | SOUTH | south | South - 2025 | South |
| 10 | South | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 90.0 | High | 10 | 24076 | 0 | SOUTH | south | South - 2025 | South |
| 11 | West | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 45.0 | Low | 3 | 23862 | 1 | WEST | west | West - 2025 | West |
| 12 | West | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 47.5 | Low | 7 | 23985 | 0 | WEST | west | West - 2025 | West |
17. Numeric Function Enhancements
data numeric_demo;
set grid_load_clean;
Demand_Squared = Demand_MW**2;
Demand_Log = log(Demand_MW);
Demand_Root = sqrt(Demand_MW);
Demand_Rounded = round(Demand_MW,10);
run;
proc print data=numeric_demo;
run;
OUTPUT:
| Obs | Grid_Region | Demand_MW | Renewable_Share | Storage_Usage | Frequency_Deviation | Outage_Events | Stability_Index | Month | Day | Year | Grid_Date | Utilization_Percentage | Utilization_Class | Months_Since_Start | Next_Month | Fraud_Flag | Demand_Squared | Demand_Log | Demand_Root | Demand_Rounded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 1300 | 55 | 35 | 0.00 | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 65.0 | Medium | 4 | 23893 | 0 | 1690000 | 7.17012 | 36.0555 | 1300 |
| 2 | Central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 80.0 | Medium | 8 | 24015 | 0 | 2560000 | 7.37776 | 40.0000 | 1600 |
| 3 | East | 1800 | 100 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 90.0 | High | 2 | 23832 | 0 | 3240000 | 7.49554 | 42.4264 | 1800 |
| 4 | East | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 100.0 | High | 6 | 23954 | 0 | 4000000 | 7.60090 | 44.7214 | 2000 |
| 5 | East | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 95.0 | High | 11 | 24107 | 0 | 3610000 | 7.54961 | 43.5890 | 1900 |
| 6 | North | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 60.0 | Low | 0 | 23773 | 0 | 1440000 | 7.09008 | 34.6410 | 1200 |
| 7 | North | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 85.0 | Medium | 9 | 24046 | 0 | 2890000 | 7.43838 | 41.2311 | 1700 |
| 8 | South | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 75.0 | Medium | 1 | 23801 | 0 | 2250000 | 7.31322 | 38.7298 | 1500 |
| 9 | South | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 30 | 2025 | 30JUN2025 | 70.0 | Medium | 5 | 23923 | 0 | 1960000 | 7.24423 | 37.4166 | 1400 |
| 10 | South | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 90.0 | High | 10 | 24076 | 0 | 3240000 | 7.49554 | 42.4264 | 1800 |
| 11 | West | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 45.0 | Low | 3 | 23862 | 1 | 810000 | 6.80239 | 30.0000 | 900 |
| 12 | West | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 47.5 | Low | 7 | 23985 | 0 | 902500 | 6.85646 | 30.8221 | 950 |
18. Final Corrected Dataset Code
data final_grid_system;
set merged_grid;
Region_Code = catx("-",substr(Grid_Region,1,3),Year);
Adjusted_Stability = mean(Stability_Index,100-Frequency_Deviation*100);
Performance_Score = sum(Utilization_Percentage,
Renewable_Share,
Adjusted_Stability);
if Fraud_Flag=1 then Status="Under Review";
else Status="Stable";
run;
proc print data=final_grid_system;
run;
OUTPUT:
| Obs | Grid_Region | Demand_MW | Renewable_Share | Storage_Usage | Frequency_Deviation | Outage_Events | Stability_Index | Month | Day | Year | Grid_Date | Utilization_Percentage | Utilization_Class | Months_Since_Start | Next_Month | Fraud_Flag | Capacity_MW | Region_Code | Adjusted_Stability | Performance_Score | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 1300 | 55 | 35 | 0.00 | 0 | 88 | 5 | 5 | 2025 | 05MAY2025 | 65.0 | Medium | 4 | 23893 | 0 | 1800 | Cen-2025 | 94.0 | 214.0 | Stable |
| 2 | Central | 1600 | 65 | 50 | 0.01 | 0 | 95 | 9 | 12 | 2025 | 12SEP2025 | 80.0 | Medium | 8 | 24015 | 0 | 1800 | Cen-2025 | 97.0 | 242.0 | Stable |
| 3 | East | 1800 | 100 | 50 | 0.10 | 3 | 70 | 3 | 25 | 2025 | 25MAR2025 | 90.0 | High | 2 | 23832 | 0 | 2100 | Eas-2025 | 80.0 | 270.0 | Stable |
| 4 | East | 2000 | 50 | 45 | 0.08 | 4 | 75 | 7 | 18 | 2025 | 18JUL2025 | 100.0 | High | 6 | 23954 | 0 | 2100 | Eas-2025 | 83.5 | 233.5 | Stable |
| 5 | East | 1900 | 52 | 42 | 0.04 | 1 | 91 | 12 | 5 | 2025 | 05DEC2025 | 95.0 | High | 11 | 24107 | 0 | 2100 | Eas-2025 | 93.5 | 240.5 | Stable |
| 6 | North | 1200 | 45 | 30 | 0.02 | 1 | 85 | 1 | 15 | 2025 | 15JAN2025 | 60.0 | Low | 0 | 23773 | 0 | 2000 | Nor-2025 | 91.5 | 196.5 | Stable |
| 7 | North | 1700 | 48 | 38 | 0.03 | 2 | 82 | 10 | 28 | 2025 | 28OCT2025 | 85.0 | Medium | 9 | 24046 | 0 | 2000 | Nor-2025 | 89.5 | 222.5 | Stable |
| 8 | South | 1500 | 60 | 40 | 0.05 | 2 | 90 | 2 | 20 | 2025 | 20FEB2025 | 75.0 | Medium | 1 | 23801 | 0 | 2200 | Sou-2025 | 92.5 | 227.5 | Stable |
| 9 | South | 1400 | 60 | 40 | 0.03 | 2 | 92 | 6 | 30 | 2025 | 30JUN2025 | 70.0 | Medium | 5 | 23923 | 0 | 2200 | Sou-2025 | 94.5 | 224.5 | Stable |
| 10 | South | 1800 | 70 | 55 | 0.09 | 3 | 78 | 11 | 15 | 2025 | 15NOV2025 | 90.0 | High | 10 | 24076 | 0 | 2200 | Sou-2025 | 84.5 | 244.5 | Stable |
| 11 | West | 900 | 30 | 20 | 0.50 | 5 | 60 | 4 | 10 | 2025 | 10APR2025 | 45.0 | Low | 3 | 23862 | 1 | 1500 | Wes-2025 | 55.0 | 130.0 | Under Review |
| 12 | West | 950 | 35 | 25 | 0.02 | 1 | 89 | 8 | 22 | 2025 | 22AUG2025 | 47.5 | Low | 7 | 23985 | 0 | 1500 | Wes-2025 | 93.5 | 176.0 | Stable |
19. Delete Dataset
proc datasets library=work;
delete grid_backup;
quit;
LOG:
Intentional Error
Explanation
|
Error |
Why
Wrong |
Fix |
|
Month=13 |
Invalid
date |
Delete
record |
|
Renewable
110 |
Impossible
% |
Capped
to 100 |
|
Missing
frequency |
Null
issue |
Set to
0 |
|
June 31 |
Invalid
day |
Adjusted |
|
Duplicate
append |
Data
inflation |
Delete
backup |
🔤 Character Functions
·
strip()
·
trim()
·
cat()
·
catx()
·
propcase()
·
upcase()
·
lowcase()
·
coalesce()
🔢 Numeric Functions
·
round()
·
min()
·
max()
·
abs()
·
sum()
·
mean()
Key Learning
Points
·
Data cleaning is essential.
·
Date validation critical.
·
Macros improve reusability.
·
Character normalization prevents merge
errors.
·
Fraud detection requires logical
thresholds.
·
Append may cause duplication.
·
Transpose useful for reporting.
·
Numeric functions improve calculations.
Conclusion
Here we designed, debugged, and built a
fraud-detection-enabled Smart Grid Load Balancing system using SAS.
· Created
realistic dataset
·
Introduced errors intentionally
·
Corrected them
·
Applied macro logic
·
Used advanced date functions
·
Applied character & numeric transformations
·
Used SET, MERGE, APPEND, TRANSPOSE
·
Managed datasets using PROC DATASETS
·
Used RETAIN for rolling analytics
· Implemented
weighted fraud scoring
· Built
validation macro
· Performed
independent QC using PROC COMPARE
· Applied
statistical outlier detection
· Created
reusable reporting macros
· Maintained
audit trace dataset
This mirrors real-world energy analytics projects where grid stability is
monitored daily and fraud detection safeguards reporting integrity.
SAS INTERVIEW QUESTIONS
1. 1.What is PROC MEANS used for in SAS?
2. 2.What is the difference between SET and MERGE statements?
3. 3.Why are macros used in SAS?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent MONITORING DEMAND data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics
· Clinical SAS Programmer
· Research Data Analyst
· Regulatory Data Validator
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment