414.Can We Design, Debug, and Detect Fraud in a Smart Grid Load Balancing System Using Advanced SAS Programming Techniques?

Designing, Debugging, and Detecting Fraud in a Smart Grid Load Balancing System Using Advanced SAS

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP | INPUT | DATALINES | IF-THEN-ELSE | OUTPUT | DELETE | SET | MERGE | PROC SORT | PROC APPEND | PROC TRANSPOSE | PROC MEANS | PROC FREQ | PROC SQL | PROC REPORT | PROC COMPARE | %MACRO | %MEND 

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Introduction

In today’s energy-driven world, smart grid systems play a critical role in balancing electricity demand and supply. Grid load balancing ensures stable power distribution across regions by monitoring demand (MW), renewable contribution, storage usage, system frequency deviation, and outage events. However, real-world datasets often contain intentional or unintentional errors such as invalid date formats, incorrect merges, misclassified utilization levels, inconsistent character formats, and fraudulent manipulations in reporting, validation layers, QC comparisons, advanced macro automation, rolling metrics, anomaly scoring, and audit-ready reporting structure.

In this project, we will:

  • Create a Grid Load Balancing dataset
  • Introduce intentional errors
  • Detect and correct those errors
  • Use SAS macros for fraud detection
  • Apply date functions (MDY, INTCK, INTNX)
  • Use SET, MERGE, APPEND, TRANSPOSE
  • Apply numeric and character functions
  • Use PROC DATASETS DELETE
  • Provide full explanation of every step

Table of Contents

  1. Business Context
  2. Dataset Design
  3. Intentional Errors
  4. Full Corrected SAS Code
  5. Macro Fraud Detection Logic
  6. Date Handling (MDY, INTCK, INTNX)
  7. Data Manipulation (SET, MERGE, APPEND, TRANSPOSE)
  8. Character Functions
  9. Numeric Functions
  10. Classification Logic
  11. Error Debugging Section
  12. Rolling 3-Month Stability Trend
  13. Advanced Fraud Scoring Model
  14.  Duplicate Record Detection
  15. Data Validation Macro
  16. QC Comparison Program
  17. Risk Categorization Layer
  18. Outlier Detection Using Z-Score
  19. Monthly Aggregation Dataset
  20. Dynamic Reporting Macro
  21. Audit Trail Dataset
  22. Conclusion

Business Context

Electric grid operators monitor:

  • Demand in MW
  • Renewable percentage
  • Battery storage usage
  • Frequency stability (50 Hz standard in India)
  • Outage counts
  • Stability Index (calculated score)

Fraud or manipulation may occur when:

  • Renewable share is inflated
  • Storage usage misreported
  • Frequency deviation hidden
  • Dates manipulated
  • Duplicate records appended

Build system → Introduce errors → Detect → Correct → Final clean dataset.


1. Create Raw Dataset (With Intentional Errors)

data grid_load_raw;

input Grid_Region $ Demand_MW Renewable_Share Storage_Usage Frequency_Deviation 

      Outage_Events Stability_Index Month Day Year;

format Grid_Date date9.;

Grid_Date = mdy(Month, Day, Year);

Utilization_Percentage = Demand_MW / 1000 * 100;

datalines;

north 1200 45 30 0.02 1 85 01 15 2025

south 1500 60 40 0.05 2 90 02 20 2025

east 1800 110 50 0.10 3 70 03 25 2025

west 900 30 20 0.50 5 60 04 10 2025

central 1300 55 35 . 0 88 05 05 2025

north 1250 45 30 0.02 1 85 13 15 2025

south 1400 60 40 0.03 2 92 06 31 2025

east 2000 50 45 0.08 4 75 07 18 2025

west 950 35 25 0.02 1 89 08 22 2025

central 1600 65 50 0.01 0 95 09 12 2025

north 1700 48 38 0.03 2 82 10 28 2025

south 1800 70 55 0.09 3 78 11 15 2025

east 1900 52 42 0.04 1 91 12 05 2025

;

run;

proc print data=grid_load_raw;

run;

OUTPUT:

ObsGrid_RegionDemand_MWRenewable_ShareStorage_UsageFrequency_DeviationOutage_EventsStability_IndexMonthDayYearGrid_DateUtilization_Percentage
1north120045300.02185115202515JAN2025120
2south150060400.05290220202520FEB2025150
3east1800110500.10370325202525MAR2025180
4west90030200.50560410202510APR202590
5central13005535.08855202505MAY2025130
6north125045300.0218513152025.125
7south140060400.032926312025.140
8east200050450.08475718202518JUL2025200
9west95035250.02189822202522AUG202595
10central160065500.01095912202512SEP2025160
11north170048380.032821028202528OCT2025170
12south180070550.093781115202515NOV2025180
13east190052420.04191125202505DEC2025190

 Intentional Errors

·  Renewable_Share = 110 (Impossible)

·  Frequency_Deviation missing

·  Month = 13 (Invalid)

·  June 31 (Invalid date)

·  Utilization formula simplistic

·  No classification

·  Lowercase region inconsistencies

2. Clean & Correct Dataset

data grid_load_clean;

set grid_load_raw;

length Utilization_Class $12.;

Grid_Region = propcase(strip(Grid_Region));

Renewable_Share = min(Renewable_Share,100);

if missing(Frequency_Deviation) then Frequency_Deviation = 0;

if Month > 12 then delete;

if Month=6 and Day>30 then Day=30;

Grid_Date = mdy(Month, Day, Year);

Months_Since_Start = intck('month','01JAN2025'd,Grid_Date);

Next_Month = intnx('month',Grid_Date,1);

Utilization_Percentage = round((Demand_MW/2000)*100,0.1);

if Utilization_Percentage > 85 then Utilization_Class="High";

else if Utilization_Percentage > 60 then Utilization_Class="Medium";

else Utilization_Class="Low";

Fraud_Flag = 0;

if Renewable_Share > 95 and Stability_Index < 70 then Fraud_Flag=1;

if Frequency_Deviation > 0.3 then Fraud_Flag=1;

if Outage_Events > 4 and Stability_Index > 90 then Fraud_Flag=1;

run;

proc print data=grid_load_clean;

run;

OUTPUT:

ObsGrid_RegionDemand_MWRenewable_ShareStorage_UsageFrequency_DeviationOutage_EventsStability_IndexMonthDayYearGrid_DateUtilization_PercentageUtilization_ClassMonths_Since_StartNext_MonthFraud_Flag
1North120045300.02185115202515JAN202560.0Low0237730
2South150060400.05290220202520FEB202575.0Medium1238010
3East1800100500.10370325202525MAR202590.0High2238320
4West90030200.50560410202510APR202545.0Low3238621
5Central130055350.0008855202505MAY202565.0Medium4238930
6South140060400.03292630202530JUN202570.0Medium5239230
7East200050450.08475718202518JUL2025100.0High6239540
8West95035250.02189822202522AUG202547.5Low7239850
9Central160065500.01095912202512SEP202580.0Medium8240150
10North170048380.032821028202528OCT202585.0Medium9240460
11South180070550.093781115202515NOV202590.0High10240760
12East190052420.04191125202505DEC202595.0High11241070

set grid_load_raw;

Reads raw dataset.

propcase(strip())

Removes spaces and standardizes case.

min(Renewable_Share,100);

Caps impossible values.

missing()

Handles null frequency.

intck

Counts months difference.

intnx

Calculates next month date.

Classification logic

Categorizes grid load.

3. SeparateFraud Detection Macro

%macro fraud_check(dataset);

data fraud_flagged;

set &dataset;

Fraud_Flag = 0;

if Renewable_Share > 95 and Stability_Index < 70 then Fraud_Flag=1;

if Frequency_Deviation > 0.3 then Fraud_Flag=1;

if Outage_Events > 4 and Stability_Index > 90 then Fraud_Flag=1;

run;

proc print data=fraud_flagged;

run;

%mend;


%fraud_check(grid_load_clean);

OUTPUT:

ObsGrid_RegionDemand_MWRenewable_ShareStorage_UsageFrequency_DeviationOutage_EventsStability_IndexMonthDayYearGrid_DateUtilization_PercentageUtilization_ClassMonths_Since_StartNext_MonthFraud_Flag
1North120045300.02185115202515JAN202560.0Low0237730
2South150060400.05290220202520FEB202575.0Medium1238010
3East1800100500.10370325202525MAR202590.0High2238320
4West90030200.50560410202510APR202545.0Low3238621
5Central130055350.0008855202505MAY202565.0Medium4238930
6South140060400.03292630202530JUN202570.0Medium5239230
7East200050450.08475718202518JUL2025100.0High6239540
8West95035250.02189822202522AUG202547.5Low7239850
9Central160065500.01095912202512SEP202580.0Medium8240150
10North170048380.032821028202528OCT202585.0Medium9240460
11South180070550.093781115202515NOV202590.0High10240760
12East190052420.04191125202505DEC202595.0High11241070

·  Reusable logic

·  Scalable to multiple datasets

·  Centralized fraud rules

4. Append & Merge

data grid_backup;

set grid_load_clean;

run;

proc print data=grid_backup;

run;

OUTPUT:

ObsGrid_RegionDemand_MWRenewable_ShareStorage_UsageFrequency_DeviationOutage_EventsStability_IndexMonthDayYearGrid_DateUtilization_PercentageUtilization_ClassMonths_Since_StartNext_MonthFraud_Flag
1North120045300.02185115202515JAN202560.0Low0237730
2South150060400.05290220202520FEB202575.0Medium1238010
3East1800100500.10370325202525MAR202590.0High2238320
4West90030200.50560410202510APR202545.0Low3238621
5Central130055350.0008855202505MAY202565.0Medium4238930
6South140060400.03292630202530JUN202570.0Medium5239230
7East200050450.08475718202518JUL2025100.0High6239540
8West95035250.02189822202522AUG202547.5Low7239850
9Central160065500.01095912202512SEP202580.0Medium8240150
10North170048380.032821028202528OCT202585.0Medium9240460
11South180070550.093781115202515NOV202590.0High10240760
12East190052420.04191125202505DEC202595.0High11241070

proc append base=grid_backup

            data=grid_load_clean force;

run;

proc print data=grid_backup;

run;

OUTPUT:

ObsGrid_RegionDemand_MWRenewable_ShareStorage_UsageFrequency_DeviationOutage_EventsStability_IndexMonthDayYearGrid_DateUtilization_PercentageUtilization_ClassMonths_Since_StartNext_MonthFraud_Flag
1North120045300.02185115202515JAN202560.0Low0237730
2South150060400.05290220202520FEB202575.0Medium1238010
3East1800100500.10370325202525MAR202590.0High2238320
4West90030200.50560410202510APR202545.0Low3238621
5Central130055350.0008855202505MAY202565.0Medium4238930
6South140060400.03292630202530JUN202570.0Medium5239230
7East200050450.08475718202518JUL2025100.0High6239540
8West95035250.02189822202522AUG202547.5Low7239850
9Central160065500.01095912202512SEP202580.0Medium8240150
10North170048380.032821028202528OCT202585.0Medium9240460
11South180070550.093781115202515NOV202590.0High10240760
12East190052420.04191125202505DEC202595.0High11241070
13North120045300.02185115202515JAN202560.0Low0237730
14South150060400.05290220202520FEB202575.0Medium1238010
15East1800100500.10370325202525MAR202590.0High2238320
16West90030200.50560410202510APR202545.0Low3238621
17Central130055350.0008855202505MAY202565.0Medium4238930
18South140060400.03292630202530JUN202570.0Medium5239230
19East200050450.08475718202518JUL2025100.0High6239540
20West95035250.02189822202522AUG202547.5Low7239850
21Central160065500.01095912202512SEP202580.0Medium8240150
22North170048380.032821028202528OCT202585.0Medium9240460
23South180070550.093781115202515NOV202590.0High10240760
24East190052420.04191125202505DEC202595.0High11241070
·  Demonstrates duplication risk.

Merge Example

data region_capacity;

input Grid_Region $ Capacity_MW;

datalines;

North 2000

South 2200

East 2100

West 1500

Central 1800

;

run;

proc print data=region_capacity;

run;

OUTPUT:

ObsGrid_RegionCapacity_MW
1North2000
2South2200
3East2100
4West1500
5Central1800

proc sort data=grid_load_clean; by Grid_Region; run;

proc print data=grid_load_clean;

run;

OUTPUT:

ObsGrid_RegionDemand_MWRenewable_ShareStorage_UsageFrequency_DeviationOutage_EventsStability_IndexMonthDayYearGrid_DateUtilization_PercentageUtilization_ClassMonths_Since_StartNext_MonthFraud_Flag
1Central130055350.0008855202505MAY202565.0Medium4238930
2Central160065500.01095912202512SEP202580.0Medium8240150
3East1800100500.10370325202525MAR202590.0High2238320
4East200050450.08475718202518JUL2025100.0High6239540
5East190052420.04191125202505DEC202595.0High11241070
6North120045300.02185115202515JAN202560.0Low0237730
7North170048380.032821028202528OCT202585.0Medium9240460
8South150060400.05290220202520FEB202575.0Medium1238010
9South140060400.03292630202530JUN202570.0Medium5239230
10South180070550.093781115202515NOV202590.0High10240760
11West90030200.50560410202510APR202545.0Low3238621
12West95035250.02189822202522AUG202547.5Low7239850

proc sort data=region_capacity; by Grid_Region; run;

proc print data=region_capacity;

run;

OUTPUT:

ObsGrid_RegionCapacity_MW
1Central1800
2East2100
3North2000
4South2200
5West1500

data merged_grid;

merge grid_load_clean region_capacity;

by Grid_Region;

run;

proc print data=merged_grid;

run;

OUTPUT:

ObsGrid_RegionDemand_MWRenewable_ShareStorage_UsageFrequency_DeviationOutage_EventsStability_IndexMonthDayYearGrid_DateUtilization_PercentageUtilization_ClassMonths_Since_StartNext_MonthFraud_FlagCapacity_MW
1Central130055350.0008855202505MAY202565.0Medium42389301800
2Central160065500.01095912202512SEP202580.0Medium82401501800
3East1800100500.10370325202525MAR202590.0High22383202100
4East200050450.08475718202518JUL2025100.0High62395402100
5East190052420.04191125202505DEC202595.0High112410702100
6North120045300.02185115202515JAN202560.0Low02377302000
7North170048380.032821028202528OCT202585.0Medium92404602000
8South150060400.05290220202520FEB202575.0Medium12380102200
9South140060400.03292630202530JUN202570.0Medium52392302200
10South180070550.093781115202515NOV202590.0High102407602200
11West90030200.50560410202510APR202545.0Low32386211500
12West95035250.02189822202522AUG202547.5Low72398501500

5. Transpose

proc transpose data=merged_grid out=transposed_grid;

by Grid_Region;

var Demand_MW Renewable_Share;

run;

proc print data=transposed_grid;

run;

OUTPUT:

ObsGrid_Region_NAME_COL1COL2COL3
1CentralDemand_MW13001600.
2CentralRenewable_Share5565.
3EastDemand_MW180020001900
4EastRenewable_Share1005052
5NorthDemand_MW12001700.
6NorthRenewable_Share4548.
7SouthDemand_MW150014001800
8SouthRenewable_Share606070
9WestDemand_MW900950.
10WestRenewable_Share3035.

6. Rolling 3-Month Stability Trend

proc sort data=grid_load_clean;

by Grid_Region Grid_Date;

run;

proc print data=grid_load_clean;

run;

OUTPUT:

ObsGrid_RegionDemand_MWRenewable_ShareStorage_UsageFrequency_DeviationOutage_EventsStability_IndexMonthDayYearGrid_DateUtilization_PercentageUtilization_ClassMonths_Since_StartNext_MonthFraud_Flag
1Central130055350.0008855202505MAY202565.0Medium4238930
2Central160065500.01095912202512SEP202580.0Medium8240150
3East1800100500.10370325202525MAR202590.0High2238320
4East200050450.08475718202518JUL2025100.0High6239540
5East190052420.04191125202505DEC202595.0High11241070
6North120045300.02185115202515JAN202560.0Low0237730
7North170048380.032821028202528OCT202585.0Medium9240460
8South150060400.05290220202520FEB202575.0Medium1238010
9South140060400.03292630202530JUN202570.0Medium5239230
10South180070550.093781115202515NOV202590.0High10240760
11West90030200.50560410202510APR202545.0Low3238621
12West95035250.02189822202522AUG202547.5Low7239850

data rolling_stability;

set grid_load_clean;

by Grid_Region;

retain stab_lag1 stab_lag2;

Rolling_3M_Stability = mean(Stability_Index, stab_lag1, stab_lag2);

stab_lag2 = stab_lag1;

stab_lag1 = Stability_Index;

run;

proc print data=rolling_stability;

run;

OUTPUT:

ObsGrid_RegionDemand_MWRenewable_ShareStorage_UsageFrequency_DeviationOutage_EventsStability_IndexMonthDayYearGrid_DateUtilization_PercentageUtilization_ClassMonths_Since_StartNext_MonthFraud_Flagstab_lag1stab_lag2Rolling_3M_Stability
1Central130055350.0008855202505MAY202565.0Medium423893088.88.0000
2Central160065500.01095912202512SEP202580.0Medium8240150958891.5000
3East1800100500.10370325202525MAR202590.0High2238320709584.3333
4East200050450.08475718202518JUL2025100.0High6239540757080.0000
5East190052420.04191125202505DEC202595.0High11241070917578.6667
6North120045300.02185115202515JAN202560.0Low0237730859183.6667
7North170048380.032821028202528OCT202585.0Medium9240460828586.0000
8South150060400.05290220202520FEB202575.0Medium1238010908285.6667
9South140060400.03292630202530JUN202570.0Medium5239230929088.0000
10South180070550.093781115202515NOV202590.0High10240760789286.6667
11West90030200.50560410202510APR202545.0Low3238621607876.6667
12West95035250.02189822202522AUG202547.5Low7239850896075.6667

·  RETAIN preserves prior values

·  Computes moving average

·  Detects gradual deterioration in grid stability

7. Advanced Fraud Risk Scoring

data fraud_scoring;

set grid_load_clean;

Fraud_Score = 0;

Fraud_Score + (Renewable_Share > 95) * 20;

Fraud_Score + (Frequency_Deviation > 0.3) * 30;

Fraud_Score + (Outage_Events > 4) * 25;

Fraud_Score + (Stability_Index < 65) * 25;

length Fraud_Level $12.;

if Fraud_Score >= 60 then Fraud_Level="High Risk";

else if Fraud_Score >= 30 then Fraud_Level="Medium Risk";

else Fraud_Level="Low Risk";

run;

proc print data=fraud_scoring;

run;

OUTPUT:

ObsGrid_RegionDemand_MWRenewable_ShareStorage_UsageFrequency_DeviationOutage_EventsStability_IndexMonthDayYearGrid_DateUtilization_PercentageUtilization_ClassMonths_Since_StartNext_MonthFraud_FlagFraud_ScoreFraud_Level
1Central130055350.0008855202505MAY202565.0Medium42389300Low Risk
2Central160065500.01095912202512SEP202580.0Medium82401500Low Risk
3East1800100500.10370325202525MAR202590.0High223832020Low Risk
4East200050450.08475718202518JUL2025100.0High62395400Low Risk
5East190052420.04191125202505DEC202595.0High112410700Low Risk
6North120045300.02185115202515JAN202560.0Low02377300Low Risk
7North170048380.032821028202528OCT202585.0Medium92404600Low Risk
8South150060400.05290220202520FEB202575.0Medium12380100Low Risk
9South140060400.03292630202530JUN202570.0Medium52392300Low Risk
10South180070550.093781115202515NOV202590.0High102407600Low Risk
11West90030200.50560410202510APR202545.0Low323862180High Risk
12West95035250.02189822202522AUG202547.5Low72398500Low Risk

·  Weighted scoring is more realistic

·  Helps prioritize investigations

8. Duplicate Detection Logic

proc sort data=grid_load_clean out=dup_check nodupkey dupout=duplicates;

by Grid_Region Grid_Date;

run;

proc print data=dup_check;

run;

OUTPUT:

ObsGrid_RegionDemand_MWRenewable_ShareStorage_UsageFrequency_DeviationOutage_EventsStability_IndexMonthDayYearGrid_DateUtilization_PercentageUtilization_ClassMonths_Since_StartNext_MonthFraud_Flag
1Central130055350.0008855202505MAY202565.0Medium4238930
2Central160065500.01095912202512SEP202580.0Medium8240150
3East1800100500.10370325202525MAR202590.0High2238320
4East200050450.08475718202518JUL2025100.0High6239540
5East190052420.04191125202505DEC202595.0High11241070
6North120045300.02185115202515JAN202560.0Low0237730
7North170048380.032821028202528OCT202585.0Medium9240460
8South150060400.05290220202520FEB202575.0Medium1238010
9South140060400.03292630202530JUN202570.0Medium5239230
10South180070550.093781115202515NOV202590.0High10240760
11West90030200.50560410202510APR202545.0Low3238621
12West95035250.02189822202522AUG202547.5Low7239850

·  NODUPKEY removes duplicates

·  DUPOUT captures suspicious duplicates

·  Important after PROC APPEND operations

9. Data Validation Macro

%macro validate_data(dataset);

proc means data=&dataset n nmiss min max;

var Demand_MW Renewable_Share Frequency_Deviation Stability_Index;

run;


proc freq data=&dataset;

tables Utilization_Class Fraud_Flag / missing;

run;

%mend;


%validate_data(grid_load_clean);

OUTPUT:

The MEANS Procedure

VariableNN MissMinimumMaximum
Demand_MW
Renewable_Share
Frequency_Deviation
Stability_Index
12
12
12
12
0
0
0
0
900.0000000
30.0000000
0
60.0000000
2000.00
100.0000000
0.5000000
95.0000000

The FREQ Procedure

Utilization_ClassFrequencyPercentCumulative
Frequency
Cumulative
Percent
High433.33433.33
Low325.00758.33
Medium541.6712100.00
Fraud_FlagFrequencyPercentCumulative
Frequency
Cumulative
Percent
01191.671191.67
118.3312100.00

·  Quick QC summary

·  Detects missing values

·  Validates ranges

10. Independent QC Comparison Program

proc compare base=grid_load_clean

             compare=fraud_scoring

             out=compare_results

             outnoequal;

id Grid_Region Grid_Date;

run;

OUTPUT:

The COMPARE Procedure                                                                                                               
Comparison of WORK.GRID_LOAD_CLEAN with WORK.FRAUD_SCORING                                                                          
(Method=EXACT)                                                                                                                      
                                                                                                                                    
Data Set Summary                                                                                                                    
                                                                                                                                    
Dataset                        Created          Modified  NVar    NObs                                                              
                                                                                                                                    
WORK.GRID_LOAD_CLEAN  28FEB26:00:41:29  28FEB26:00:41:29    16      12                                                              
WORK.FRAUD_SCORING    28FEB26:00:41:45  28FEB26:00:41:45    18      12                                                              
                                                                                                                                    
                                                                                                                                    
Variables Summary                                                                                                                   
                                                                                                                                    
Number of Variables in Common: 16.                                                                                                  
Number of Variables in WORK.FRAUD_SCORING but not in WORK.GRID_LOAD_CLEAN: 2.                                                       
Number of ID Variables: 2.                                                                                                          
                                                                                                                                    
                                                                                                                                    
Observation Summary                                                                                                                 
                                                                                                                                    
Observation      Base  Compare  ID                                                                                                  
                                                                                                                                    
First Obs           1        1  Grid_Region=Central Grid_Date=05MAY2025                                                             
Last  Obs          12       12  Grid_Region=West Grid_Date=22AUG2025                                                                
                                                                                                                                    
Number of Observations in Common: 12.                                                                                               
Total Number of Observations Read from WORK.GRID_LOAD_CLEAN: 12.                                                                    
Total Number of Observations Read from WORK.FRAUD_SCORING: 12.                                                                      
                                                                                                                                    
Number of Observations with Some Compared Variables Unequal: 0.                                                                     
Number of Observations with All Compared Variables Equal: 12.                                                                       
                                                                                                                                    
NOTE: No unequal values were found. All values compared are exactly equal.                                                          
                                                                                                                                    

·  Standard industry QC practice

·  Detects derivation mismatches

11. Z-Score Outlier Detection

proc means data=grid_load_clean mean std noprint;

var Demand_MW;

output out=stats mean=mean_d std=std_d;

run;

proc print data=stats;

run;

OUTPUT:

Obs_TYPE__FREQ_mean_dstd_d
10121504.17362.101

data zscore_check;

if _n_=1 then set stats;

set grid_load_clean;

Z_Demand = (Demand_MW - mean_d) / std_d;

if abs(Z_Demand) > 1 then Outlier_Flag="Y";

else Outlier_Flag="N";

run;

proc print data=zscore_check;

run;

OUTPUT:

Obs_TYPE__FREQ_mean_dstd_dGrid_RegionDemand_MWRenewable_ShareStorage_UsageFrequency_DeviationOutage_EventsStability_IndexMonthDayYearGrid_DateUtilization_PercentageUtilization_ClassMonths_Since_StartNext_MonthFraud_FlagZ_DemandOutlier_Flag
10121504.17362.101Central130055350.0008855202505MAY202565.0Medium4238930-0.56384N
20121504.17362.101Central160065500.01095912202512SEP202580.0Medium82401500.26466N
30121504.17362.101East1800100500.10370325202525MAR202590.0High22383200.81699N
40121504.17362.101East200050450.08475718202518JUL2025100.0High62395401.36932Y
50121504.17362.101East190052420.04191125202505DEC202595.0High112410701.09316Y
60121504.17362.101North120045300.02185115202515JAN202560.0Low0237730-0.84000N
70121504.17362.101North170048380.032821028202528OCT202585.0Medium92404600.54082N
80121504.17362.101South150060400.05290220202520FEB202575.0Medium1238010-0.01151N
90121504.17362.101South140060400.03292630202530JUN202570.0Medium5239230-0.28767N
100121504.17362.101South180070550.093781115202515NOV202590.0High102407600.81699N
110121504.17362.101West90030200.50560410202510APR202545.0Low3238621-1.66850Y
120121504.17362.101West95035250.02189822202522AUG202547.5Low7239850-1.53042Y

·  Identifies statistical anomalies

·  Important for fraud analytics

12. Monthly Aggregation Dataset

proc sql;

create table monthly_summary as

select Grid_Region,

       intnx('month', Grid_Date, 0, 'b') format=date9. as Month_Start,

       mean(Demand_MW) as Avg_Demand,

       mean(Stability_Index) as Avg_Stability,

       sum(Outage_Events) as Total_Outages

from grid_load_clean

group by Grid_Region, calculated Month_Start;

quit;

proc print data=monthly_summary;

run;

OUTPUT:

ObsGrid_RegionMonth_StartAvg_DemandAvg_StabilityTotal_Outages
1Central01MAY20251300880
2Central01SEP20251600950
3East01MAR20251800703
4East01JUL20252000754
5East01DEC20251900911
6North01JAN20251200851
7North01OCT20251700822
8South01FEB20251500902
9South01JUN20251400922
10South01NOV20251800783
11West01APR2025900605
12West01AUG2025950891

·  Supports management dashboards

·  Groups data by month

·  Uses INTNX with beginning option

13. Risk Categorization Layer

data risk_profile;

set fraud_scoring;

Risk_Index = mean(Fraud_Score, Utilization_Percentage);

length Risk_Category $12.;

if Risk_Index > 80 then Risk_Category="Critical";

else if Risk_Index > 60 then Risk_Category="Severe";

else if Risk_Index > 40 then Risk_Category="Moderate";

else Risk_Category="Stable";

run;

proc print data=risk_profile;

run;

OUTPUT:

ObsGrid_RegionDemand_MWRenewable_ShareStorage_UsageFrequency_DeviationOutage_EventsStability_IndexMonthDayYearGrid_DateUtilization_PercentageUtilization_ClassMonths_Since_StartNext_MonthFraud_FlagFraud_ScoreFraud_LevelRisk_IndexRisk_Category
1Central130055350.0008855202505MAY202565.0Medium42389300Low Risk32.50Stable
2Central160065500.01095912202512SEP202580.0Medium82401500Low Risk40.00Stable
3East1800100500.10370325202525MAR202590.0High223832020Low Risk55.00Moderate
4East200050450.08475718202518JUL2025100.0High62395400Low Risk50.00Moderate
5East190052420.04191125202505DEC202595.0High112410700Low Risk47.50Moderate
6North120045300.02185115202515JAN202560.0Low02377300Low Risk30.00Stable
7North170048380.032821028202528OCT202585.0Medium92404600Low Risk42.50Moderate
8South150060400.05290220202520FEB202575.0Medium12380100Low Risk37.50Stable
9South140060400.03292630202530JUN202570.0Medium52392300Low Risk35.00Stable
10South180070550.093781115202515NOV202590.0High102407600Low Risk45.00Moderate
11West90030200.50560410202510APR202545.0Low323862180High Risk62.50Severe
12West95035250.02189822202522AUG202547.5Low72398500Low Risk23.75Stable

14. Dynamic Reporting Macro

%macro region_report(region_name);

proc report data=grid_load_clean nowd;

where Grid_Region="&region_name";

column Grid_Date Demand_MW Stability_Index Fraud_Flag;

define Grid_Date / display;

define Demand_MW / analysis mean;

define Stability_Index / analysis mean;

run;

%mend;


%region_report(North);

OUTPUT:

Grid_DateDemand_MWStability_IndexFraud_Flag
15JAN20251200850
28OCT20251700820

·  Generates region-specific reporting

·  Parameterized macro

·  Reusable across regions

15. Audit Trail Dataset

data audit_trail;

length Action $50 User $20 Timestamp 8;

format Timestamp datetime20.;

Action="Fraud Score Generated";

User="SYSTEM";

Timestamp=datetime();

output;

run;

proc print data=audit_trail;

run;

OUTPUT:

ObsActionUserTimestamp
1Fraud Score GeneratedSYSTEM28FEB2026:06:13:53

·  Maintains audit trace

·  Important in regulatory environments

16. Character Function Advanced Usage

data character_demo;

set grid_load_clean;

Region_Upper = upcase(Grid_Region);

Region_Lower = lowcase(Grid_Region);

Region_Label = catx(" - ", propcase(Grid_Region), put(Year,4.));

Region_Clean = strip(trim(Grid_Region));

run;

proc print data=character_demo;

run;

OUTPUT:

ObsGrid_RegionDemand_MWRenewable_ShareStorage_UsageFrequency_DeviationOutage_EventsStability_IndexMonthDayYearGrid_DateUtilization_PercentageUtilization_ClassMonths_Since_StartNext_MonthFraud_FlagRegion_UpperRegion_LowerRegion_LabelRegion_Clean
1Central130055350.0008855202505MAY202565.0Medium4238930CENTRALcentralCentral - 2025Central
2Central160065500.01095912202512SEP202580.0Medium8240150CENTRALcentralCentral - 2025Central
3East1800100500.10370325202525MAR202590.0High2238320EASTeastEast - 2025East
4East200050450.08475718202518JUL2025100.0High6239540EASTeastEast - 2025East
5East190052420.04191125202505DEC202595.0High11241070EASTeastEast - 2025East
6North120045300.02185115202515JAN202560.0Low0237730NORTHnorthNorth - 2025North
7North170048380.032821028202528OCT202585.0Medium9240460NORTHnorthNorth - 2025North
8South150060400.05290220202520FEB202575.0Medium1238010SOUTHsouthSouth - 2025South
9South140060400.03292630202530JUN202570.0Medium5239230SOUTHsouthSouth - 2025South
10South180070550.093781115202515NOV202590.0High10240760SOUTHsouthSouth - 2025South
11West90030200.50560410202510APR202545.0Low3238621WESTwestWest - 2025West
12West95035250.02189822202522AUG202547.5Low7239850WESTwestWest - 2025West

17. Numeric Function Enhancements

data numeric_demo;

set grid_load_clean;

Demand_Squared = Demand_MW**2;

Demand_Log = log(Demand_MW);

Demand_Root = sqrt(Demand_MW);

Demand_Rounded = round(Demand_MW,10);

run;

proc print data=numeric_demo;

run;

OUTPUT:

ObsGrid_RegionDemand_MWRenewable_ShareStorage_UsageFrequency_DeviationOutage_EventsStability_IndexMonthDayYearGrid_DateUtilization_PercentageUtilization_ClassMonths_Since_StartNext_MonthFraud_FlagDemand_SquaredDemand_LogDemand_RootDemand_Rounded
1Central130055350.0008855202505MAY202565.0Medium423893016900007.1701236.05551300
2Central160065500.01095912202512SEP202580.0Medium824015025600007.3777640.00001600
3East1800100500.10370325202525MAR202590.0High223832032400007.4955442.42641800
4East200050450.08475718202518JUL2025100.0High623954040000007.6009044.72142000
5East190052420.04191125202505DEC202595.0High1124107036100007.5496143.58901900
6North120045300.02185115202515JAN202560.0Low023773014400007.0900834.64101200
7North170048380.032821028202528OCT202585.0Medium924046028900007.4383841.23111700
8South150060400.05290220202520FEB202575.0Medium123801022500007.3132238.72981500
9South140060400.03292630202530JUN202570.0Medium523923019600007.2442337.41661400
10South180070550.093781115202515NOV202590.0High1024076032400007.4955442.42641800
11West90030200.50560410202510APR202545.0Low32386218100006.8023930.0000900
12West95035250.02189822202522AUG202547.5Low72398509025006.8564630.8221950

18. Final Corrected Dataset Code

data final_grid_system;

set merged_grid;

Region_Code = catx("-",substr(Grid_Region,1,3),Year);

Adjusted_Stability = mean(Stability_Index,100-Frequency_Deviation*100);

Performance_Score = sum(Utilization_Percentage,

                        Renewable_Share,

                        Adjusted_Stability);

if Fraud_Flag=1 then Status="Under Review";

else Status="Stable";

run;

proc print data=final_grid_system;

run;

OUTPUT:

ObsGrid_RegionDemand_MWRenewable_ShareStorage_UsageFrequency_DeviationOutage_EventsStability_IndexMonthDayYearGrid_DateUtilization_PercentageUtilization_ClassMonths_Since_StartNext_MonthFraud_FlagCapacity_MWRegion_CodeAdjusted_StabilityPerformance_ScoreStatus
1Central130055350.0008855202505MAY202565.0Medium42389301800Cen-202594.0214.0Stable
2Central160065500.01095912202512SEP202580.0Medium82401501800Cen-202597.0242.0Stable
3East1800100500.10370325202525MAR202590.0High22383202100Eas-202580.0270.0Stable
4East200050450.08475718202518JUL2025100.0High62395402100Eas-202583.5233.5Stable
5East190052420.04191125202505DEC202595.0High112410702100Eas-202593.5240.5Stable
6North120045300.02185115202515JAN202560.0Low02377302000Nor-202591.5196.5Stable
7North170048380.032821028202528OCT202585.0Medium92404602000Nor-202589.5222.5Stable
8South150060400.05290220202520FEB202575.0Medium12380102200Sou-202592.5227.5Stable
9South140060400.03292630202530JUN202570.0Medium52392302200Sou-202594.5224.5Stable
10South180070550.093781115202515NOV202590.0High102407602200Sou-202584.5244.5Stable
11West90030200.50560410202510APR202545.0Low32386211500Wes-202555.0130.0Under Review
12West95035250.02189822202522AUG202547.5Low72398501500Wes-202593.5176.0Stable

19. Delete Dataset

proc datasets library=work;

delete grid_backup;

quit;

LOG:

NOTE: Deleting WORK.GRID_BACKUP (memtype=DATA).

Intentional Error Explanation

Error

Why Wrong

Fix

Month=13

Invalid date

Delete record

Renewable 110

Impossible %

Capped to 100

Missing frequency

Null issue

Set to 0

June 31

Invalid day

Adjusted

Duplicate append

Data inflation

Delete backup


🔤 Character Functions

·  strip()

·  trim()

·  cat()

·  catx()

·  propcase()

·  upcase()

·  lowcase()

·  coalesce()

🔢 Numeric Functions

·  round()

·  min()

·  max()

·  abs()

·  sum()

·  mean()

Key Learning Points

·  Data cleaning is essential.

·  Date validation critical.

·  Macros improve reusability.

·  Character normalization prevents merge errors.

·  Fraud detection requires logical thresholds.

·  Append may cause duplication.

·  Transpose useful for reporting.

·  Numeric functions improve calculations.

Conclusion

Here we designed, debugged, and built a fraud-detection-enabled Smart Grid Load Balancing system using SAS.

·       Created realistic dataset

·       Introduced errors intentionally

·       Corrected them

·       Applied macro logic

·       Used advanced date functions

·       Applied character & numeric transformations

·       Used SET, MERGE, APPEND, TRANSPOSE

·       Managed datasets using PROC DATASETS

·       Used RETAIN for rolling analytics

·       Implemented weighted fraud scoring

·       Built validation macro

·       Performed independent QC using PROC COMPARE

·       Applied statistical outlier detection

·       Created reusable reporting macros

·       Maintained audit trace dataset

This mirrors real-world energy analytics projects where grid stability is monitored daily and fraud detection safeguards reporting integrity.

SAS INTERVIEW QUESTIONS

1.       1.What is PROC MEANS used for in SAS?

2.     2.What is the difference between SET and MERGE statements?

3.       3.Why are macros used in SAS?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent MONITORING DEMAND data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

·  Clinical SAS Programmer

·  Research Data Analyst

·  Regulatory Data Validator


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.Which Country Truly Dominates the Olympics? – A Complete SAS Medal Efficiency Analytics Project

2.Which Airports Are Really the Busiest? – An End-to-End SAS Airport Traffic Analytics Project

3.Can Data Predict Election Outcomes? – A Complete SAS Voting Analytics Project

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy

Comments

Popular posts from this blog

412.Can We Build And Clean A University Course Analytics & Fraud Detection System In Sas Using Only Macros While Intentionally Creating And Fixing Errors?

409.Can We Build a Reliable Emergency Services Analytics & Fraud Detection System in SAS While Identifying and Fixing Intentional Errors?

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study