434.Can Advanced SAS Programming Detect, Clean, and Optimize Grid Stability Data While Identifying Fraud Patterns?
Advanced SAS Intelligence for Grid Stability Optimization and Fraud Detection
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA | SET | MERGE | INPUT | DATALINES | IF - ELSE | PROC SORT | PROC SQL | PROC APPEND | PROC TRANSPOSE | PROC DATASETS | RUN | QUIT | %MACRO | %MEND | FUNCTIONS ETC | R LANGUAGE
Introduction
Power
grid stability is one of the most critical components in modern energy systems.
With the increasing integration of renewable sources like wind and solar,
maintaining grid balance has become complex. Variations in load demand,
frequency deviation, and backup generation directly impact reliability.
In this
project, we simulate a Grid Stability Dataset using both R and SAS,
intentionally introduce errors, detect and fix them using advanced SAS
techniques, and implement fraud detection logic using macros. We also apply
real-world transformations like date handling, utilization classification,
merging, transposing, and dataset optimization.
This
project is designed for:
- SAS Programmer Interviews
- Real-world Data Cleaning
Scenarios
- Clinical/Non-clinical Data
Transformation Practice
Table of Contents
- Business Context
- Raw Dataset Creation (R
& SAS)
- Intentional Errors in
Dataset
- Error Identification &
Fixing
- Data Cleaning using SAS
Functions
- Date Handling (MDY, INTCK,
INTNX)
- Utilization Classification
- Data Transformation (SET,
MERGE, APPEND)
- PROC TRANSPOSE Usage
- Fraud Detection using MACROS
- Numeric & Character
Functions
- PROC DATASETS DELETE
- Final Clean Dataset
- Key Highlights
- 20 Key Points About The Project
- Summary & Conclusion
Business Context
Electric
grid operators monitor:
- Renewable output
(Wind/Solar)
- Demand fluctuations
- Frequency stability
- Backup generation usage
Fraud
scenarios:
- Inflated backup generation
costs
- Artificial frequency
manipulation
- Fake utilization reporting
1. Raw Dataset Creation in R (With Intentional Errors)
library(dplyr)
grid_data <- data.frame(
Grid_Region = c("north","south","east","west","central",
"north","south","east","west","central",
"north","south"),
Wind_Output = c(120,140,NA,160,180,200,210,220,NA,240,250,260),
Solar_Output = c(80,90,100,NA,120,130,140,150,160,NA,180,190),
Load_Demand = c(250,260,270,280,290,300,310,320,330,340,350,360),
Frequency_Deviation = c(0.02,-0.03,0.05,NA,0.01,0.04,0.06,0.02,NA,0.03,0.01,0.05),
Backup_Generation = c(50,60,70,80,90,100,110,120,130,140,150,160),
Percentage = c(90,85,88,92,95,97,99,91,89,93,96,98),
Fees = c(1000,2000,NA,1500,1800,2200,2500,2700,NA,3000,3200,3500),
Date = c("2025-01-01","2025-02-01","wrong","2025-04-01","2025-05-01",
"2025-06-01","2025-07-01","2025-08-01","2025-09-01","wrong",
"2025-11-01","2025-12-01")
)
grid_data
OUTPUT:
|
|
Grid_Region |
Wind_Output |
Solar_Output |
Load_Demand |
Frequency_Deviation |
Backup_Generation |
Percentage |
Fees |
Date |
|
1 |
north |
120 |
80 |
250 |
0.02 |
50 |
90 |
1000 |
01-01-2025 |
|
2 |
south |
140 |
90 |
260 |
-0.03 |
60 |
85 |
2000 |
01-02-2025 |
|
3 |
east |
NA |
100 |
270 |
0.05 |
70 |
88 |
NA |
wrong |
|
4 |
west |
160 |
NA |
280 |
NA |
80 |
92 |
1500 |
01-04-2025 |
|
5 |
central |
180 |
120 |
290 |
0.01 |
90 |
95 |
1800 |
01-05-2025 |
|
6 |
north |
200 |
130 |
300 |
0.04 |
100 |
97 |
2200 |
01-06-2025 |
|
7 |
south |
210 |
140 |
310 |
0.06 |
110 |
99 |
2500 |
01-07-2025 |
|
8 |
east |
220 |
150 |
320 |
0.02 |
120 |
91 |
2700 |
01-08-2025 |
|
9 |
west |
NA |
160 |
330 |
NA |
130 |
89 |
NA |
01-09-2025 |
|
10 |
central |
240 |
NA |
340 |
0.03 |
140 |
93 |
3000 |
wrong |
|
11 |
north |
250 |
180 |
350 |
0.01 |
150 |
96 |
3200 |
01-11-2025 |
|
12 |
south |
260 |
190 |
360 |
0.05 |
160 |
98 |
3500 |
01-12-2025 |
Errors Introduced
· Missing
values (NA)
· Wrong date
format (“wrong”)
· Inconsistent casing (north vs North)
2. Raw Dataset Creation in SAS (With Intentional Errors)
data grid_raw;
input Grid_Region $ Wind_Output Solar_Output Load_Demand Frequency_Deviation
Backup_Generation Percentage Fees Date :$20.;
datalines;
north 120 80 250 0.02 50 90 1000 2025-01-01
south 140 90 260 -0.03 60 85 2000 2025-02-01
east . 100 270 0.05 70 88 . wrong
west 160 . 280 . 80 92 1500 2025-04-01
central 180 120 290 0.01 90 95 1800 2025-05-01
north 200 130 300 0.04 100 97 2200 2025-06-01
south 210 140 310 0.06 110 99 2500 2025-07-01
east 220 150 320 0.02 120 91 2700 2025-08-01
west . 160 330 . 130 89 . 2025-09-01
central 240 . 340 0.03 140 93 3000 wrong
north 250 180 350 0.01 150 96 3200 2025-11-01
south 260 190 360 0.05 160 98 3500 2025-12-01
;
run;
proc print data=grid_raw;
run;
OUTPUT:
| Obs | Grid_Region | Wind_Output | Solar_Output | Load_Demand | Frequency_Deviation | Backup_Generation | Percentage | Fees | Date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | north | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 |
| 2 | south | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 |
| 3 | east | . | 100 | 270 | 0.05 | 70 | 88 | . | wrong |
| 4 | west | 160 | . | 280 | . | 80 | 92 | 1500 | 2025-04-01 |
| 5 | central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 |
| 6 | north | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 |
| 7 | south | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 |
| 8 | east | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 |
| 9 | west | . | 160 | 330 | . | 130 | 89 | . | 2025-09-01 |
| 10 | central | 240 | . | 340 | 0.03 | 140 | 93 | 3000 | wrong |
| 11 | north | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 |
| 12 | south | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 |
3. Error Identification
Issues:
· Missing values
(.)
· Invalid
date (“wrong”)
· Lowercase
region names
· Null fees
· Missing solar/wind output
4. Data Cleaning Using SAS Functions
data grid_clean;
set grid_raw;
Grid_Region = propcase(strip(Grid_Region));
Wind_Output = coalesce(Wind_Output, 150);
Solar_Output = coalesce(Solar_Output, 100);
Fees = coalesce(Fees, 1700);
run;
proc print data=grid_clean;
run;
OUTPUT:
| Obs | Grid_Region | Wind_Output | Solar_Output | Load_Demand | Frequency_Deviation | Backup_Generation | Percentage | Fees | Date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 |
| 2 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 |
| 3 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | wrong |
| 4 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 |
| 5 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 |
| 6 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 |
| 7 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 |
| 8 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 |
| 9 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 |
| 10 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | wrong |
| 11 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 |
| 12 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 |
Why Used
· STRIP →
removes spaces
· PROPCASE →
standard format
· COALESCE → handles missing values
5. Date Handling
data grid_dates;
set grid_clean;
SAS_Date = input(Date, yymmdd10.);
if Date = "wrong" then Date = "01MAR2025";
format SAS_Date date9.;
Month_Start = intnx('month', SAS_Date, 0, 'b');
Months_Diff = intck('month', '01JAN2025'd, SAS_Date);
run;
proc print data=grid_dates;
run;
OUTPUT:
| Obs | Grid_Region | Wind_Output | Solar_Output | Load_Demand | Frequency_Deviation | Backup_Generation | Percentage | Fees | Date | SAS_Date | Month_Start | Months_Diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 |
| 2 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 |
| 3 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . |
| 4 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 |
| 5 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 |
| 6 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 |
| 7 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 |
| 8 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 |
| 9 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 |
| 10 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . |
| 11 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 |
| 12 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 |
Functions
· MDY() →
create dates
· INTNX() →
shift intervals
· INTCK() → calculate differences
6. Utilization Classification
data grid_util;
set grid_dates;
Utilization = (Wind_Output + Solar_Output) / Load_Demand;
length Util_Class $10.;
if Utilization > 0.9 then Util_Class = "High";
else if Utilization > 0.8 then Util_Class = "Medium";
else Util_Class = "Low";
run;
proc print data=grid_util;
run;
OUTPUT:
| Obs | Grid_Region | Wind_Output | Solar_Output | Load_Demand | Frequency_Deviation | Backup_Generation | Percentage | Fees | Date | SAS_Date | Month_Start | Months_Diff | Utilization | Util_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 | 0.80000 | Low |
| 2 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 | 0.88462 | Medium |
| 3 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . | 0.92593 | High |
| 4 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 | 0.92857 | High |
| 5 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 | 1.03448 | High |
| 6 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 | 1.10000 | High |
| 7 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 | 1.12903 | High |
| 8 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 | 1.15625 | High |
| 9 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 | 0.93939 | High |
| 10 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . | 1.00000 | High |
| 11 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 | 1.22857 | High |
| 12 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 | 1.25000 | High |
7. DATA STEP (SET)
data grid_copy;
set grid_util;
run;
proc print data=grid_copy;
run;
OUTPUT:
| Obs | Grid_Region | Wind_Output | Solar_Output | Load_Demand | Frequency_Deviation | Backup_Generation | Percentage | Fees | Date | SAS_Date | Month_Start | Months_Diff | Utilization | Util_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 | 0.80000 | Low |
| 2 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 | 0.88462 | Medium |
| 3 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . | 0.92593 | High |
| 4 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 | 0.92857 | High |
| 5 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 | 1.03448 | High |
| 6 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 | 1.10000 | High |
| 7 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 | 1.12903 | High |
| 8 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 | 1.15625 | High |
| 9 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 | 0.93939 | High |
| 10 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . | 1.00000 | High |
| 11 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 | 1.22857 | High |
| 12 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 | 1.25000 | High |
Why
· Creates
duplicate dataset
· Used for pipeline processing
8. MERGE Example
data fees_update;
input Grid_Region $ Extra_Fees;
datalines;
North 100
South 200
West 50
East 25
Central 0
;
run;
proc print data=fees_update;
run;
OUTPUT:
| Obs | Grid_Region | Extra_Fees |
|---|---|---|
| 1 | North | 100 |
| 2 | South | 200 |
| 3 | West | 50 |
| 4 | East | 25 |
| 5 | Central | 0 |
proc sort data=grid_copy;by Grid_Region;run;
proc print data=grid_copy;
run;
OUTPUT:
| Obs | Grid_Region | Wind_Output | Solar_Output | Load_Demand | Frequency_Deviation | Backup_Generation | Percentage | Fees | Date | SAS_Date | Month_Start | Months_Diff | Utilization | Util_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 | 1.03448 | High |
| 2 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . | 1.00000 | High |
| 3 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . | 0.92593 | High |
| 4 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 | 1.15625 | High |
| 5 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 | 0.80000 | Low |
| 6 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 | 1.10000 | High |
| 7 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 | 1.22857 | High |
| 8 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 | 0.88462 | Medium |
| 9 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 | 1.12903 | High |
| 10 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 | 1.25000 | High |
| 11 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 | 0.92857 | High |
| 12 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 | 0.93939 | High |
proc sort data=fees_update;by Grid_Region;run;
proc print data=fees_update;
run;
OUTPUT:
| Obs | Grid_Region | Extra_Fees |
|---|---|---|
| 1 | Central | 0 |
| 2 | East | 25 |
| 3 | North | 100 |
| 4 | South | 200 |
| 5 | West | 50 |
data merged_data;
merge grid_copy
fees_update;
by Grid_Region;
run;
proc print data=merged_data;
run;
OUTPUT:
| Obs | Grid_Region | Wind_Output | Solar_Output | Load_Demand | Frequency_Deviation | Backup_Generation | Percentage | Fees | Date | SAS_Date | Month_Start | Months_Diff | Utilization | Util_Class | Extra_Fees |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 | 1.03448 | High | 0 |
| 2 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . | 1.00000 | High | 0 |
| 3 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . | 0.92593 | High | 25 |
| 4 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 | 1.15625 | High | 25 |
| 5 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 | 0.80000 | Low | 100 |
| 6 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 | 1.10000 | High | 100 |
| 7 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 | 1.22857 | High | 100 |
| 8 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 | 0.88462 | Medium | 200 |
| 9 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 | 1.12903 | High | 200 |
| 10 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 | 1.25000 | High | 200 |
| 11 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 | 0.92857 | High | 50 |
| 12 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 | 0.93939 | High | 50 |
9. APPEND
proc append base=grid_copy
data=grid_util force;
run;
proc print data=grid_copy;
run;
OUTPUT:
| Obs | Grid_Region | Wind_Output | Solar_Output | Load_Demand | Frequency_Deviation | Backup_Generation | Percentage | Fees | Date | SAS_Date | Month_Start | Months_Diff | Utilization | Util_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 | 1.03448 | High |
| 2 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . | 1.00000 | High |
| 3 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . | 0.92593 | High |
| 4 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 | 1.15625 | High |
| 5 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 | 0.80000 | Low |
| 6 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 | 1.10000 | High |
| 7 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 | 1.22857 | High |
| 8 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 | 0.88462 | Medium |
| 9 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 | 1.12903 | High |
| 10 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 | 1.25000 | High |
| 11 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 | 0.92857 | High |
| 12 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 | 0.93939 | High |
| 13 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 | 0.80000 | Low |
| 14 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 | 0.88462 | Medium |
| 15 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . | 0.92593 | High |
| 16 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 | 0.92857 | High |
| 17 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 | 1.03448 | High |
| 18 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 | 1.10000 | High |
| 19 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 | 1.12903 | High |
| 20 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 | 1.15625 | High |
| 21 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 | 0.93939 | High |
| 22 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . | 1.00000 | High |
| 23 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 | 1.22857 | High |
| 24 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 | 1.25000 | High |
10. PROC TRANSPOSE
proc transpose data=grid_copy out=grid_trans;
by Grid_Region NotSorted;
var Wind_Output Solar_Output Load_Demand;
run;
proc print data=grid_trans;
run;
OUTPUT:
| Obs | Grid_Region | _NAME_ | COL1 | COL2 | COL3 |
|---|---|---|---|---|---|
| 1 | Central | Wind_Output | 180 | 240 | . |
| 2 | Central | Solar_Output | 120 | 100 | . |
| 3 | Central | Load_Demand | 290 | 340 | . |
| 4 | East | Wind_Output | 150 | 220 | . |
| 5 | East | Solar_Output | 100 | 150 | . |
| 6 | East | Load_Demand | 270 | 320 | . |
| 7 | North | Wind_Output | 120 | 200 | 250 |
| 8 | North | Solar_Output | 80 | 130 | 180 |
| 9 | North | Load_Demand | 250 | 300 | 350 |
| 10 | South | Wind_Output | 140 | 210 | 260 |
| 11 | South | Solar_Output | 90 | 140 | 190 |
| 12 | South | Load_Demand | 260 | 310 | 360 |
| 13 | West | Wind_Output | 160 | 150 | . |
| 14 | West | Solar_Output | 100 | 160 | . |
| 15 | West | Load_Demand | 280 | 330 | . |
| 16 | North | Wind_Output | 120 | . | . |
| 17 | North | Solar_Output | 80 | . | . |
| 18 | North | Load_Demand | 250 | . | . |
| 19 | South | Wind_Output | 140 | . | . |
| 20 | South | Solar_Output | 90 | . | . |
| 21 | South | Load_Demand | 260 | . | . |
| 22 | East | Wind_Output | 150 | . | . |
| 23 | East | Solar_Output | 100 | . | . |
| 24 | East | Load_Demand | 270 | . | . |
| 25 | West | Wind_Output | 160 | . | . |
| 26 | West | Solar_Output | 100 | . | . |
| 27 | West | Load_Demand | 280 | . | . |
| 28 | Central | Wind_Output | 180 | . | . |
| 29 | Central | Solar_Output | 120 | . | . |
| 30 | Central | Load_Demand | 290 | . | . |
| 31 | North | Wind_Output | 200 | . | . |
| 32 | North | Solar_Output | 130 | . | . |
| 33 | North | Load_Demand | 300 | . | . |
| 34 | South | Wind_Output | 210 | . | . |
| 35 | South | Solar_Output | 140 | . | . |
| 36 | South | Load_Demand | 310 | . | . |
| 37 | East | Wind_Output | 220 | . | . |
| 38 | East | Solar_Output | 150 | . | . |
| 39 | East | Load_Demand | 320 | . | . |
| 40 | West | Wind_Output | 150 | . | . |
| 41 | West | Solar_Output | 160 | . | . |
| 42 | West | Load_Demand | 330 | . | . |
| 43 | Central | Wind_Output | 240 | . | . |
| 44 | Central | Solar_Output | 100 | . | . |
| 45 | Central | Load_Demand | 340 | . | . |
| 46 | North | Wind_Output | 250 | . | . |
| 47 | North | Solar_Output | 180 | . | . |
| 48 | North | Load_Demand | 350 | . | . |
| 49 | South | Wind_Output | 260 | . | . |
| 50 | South | Solar_Output | 190 | . | . |
| 51 | South | Load_Demand | 360 | . | . |
11. Numeric Functions
data numeric_ops;
set grid_copy;
Total_Output = sum(Wind_Output, Solar_Output);
Avg_Output = mean(Wind_Output, Solar_Output);
run;
proc print data=numeric_ops;
run;
OUTPUT:
| Obs | Grid_Region | Wind_Output | Solar_Output | Load_Demand | Frequency_Deviation | Backup_Generation | Percentage | Fees | Date | SAS_Date | Month_Start | Months_Diff | Utilization | Util_Class | Total_Output | Avg_Output |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 | 1.03448 | High | 300 | 150 |
| 2 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . | 1.00000 | High | 340 | 170 |
| 3 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . | 0.92593 | High | 250 | 125 |
| 4 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 | 1.15625 | High | 370 | 185 |
| 5 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 | 0.80000 | Low | 200 | 100 |
| 6 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 | 1.10000 | High | 330 | 165 |
| 7 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 | 1.22857 | High | 430 | 215 |
| 8 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 | 0.88462 | Medium | 230 | 115 |
| 9 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 | 1.12903 | High | 350 | 175 |
| 10 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 | 1.25000 | High | 450 | 225 |
| 11 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 | 0.92857 | High | 260 | 130 |
| 12 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 | 0.93939 | High | 310 | 155 |
| 13 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 | 0.80000 | Low | 200 | 100 |
| 14 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 | 0.88462 | Medium | 230 | 115 |
| 15 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . | 0.92593 | High | 250 | 125 |
| 16 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 | 0.92857 | High | 260 | 130 |
| 17 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 | 1.03448 | High | 300 | 150 |
| 18 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 | 1.10000 | High | 330 | 165 |
| 19 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 | 1.12903 | High | 350 | 175 |
| 20 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 | 1.15625 | High | 370 | 185 |
| 21 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 | 0.93939 | High | 310 | 155 |
| 22 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . | 1.00000 | High | 340 | 170 |
| 23 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 | 1.22857 | High | 430 | 215 |
| 24 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 | 1.25000 | High | 450 | 225 |
12. Character Functions
data char_ops;
set grid_copy;
Region_Upper = upcase(Grid_Region);
Region_Lower = lowcase(Grid_Region);
Region_Concat = catx("-", Grid_Region, Util_Class);
run;
proc print data=char_ops;
run;
OUTPUT:
| Obs | Grid_Region | Wind_Output | Solar_Output | Load_Demand | Frequency_Deviation | Backup_Generation | Percentage | Fees | Date | SAS_Date | Month_Start | Months_Diff | Utilization | Util_Class | Region_Upper | Region_Lower | Region_Concat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 | 1.03448 | High | CENTRAL | central | Central-High |
| 2 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . | 1.00000 | High | CENTRAL | central | Central-High |
| 3 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . | 0.92593 | High | EAST | east | East-High |
| 4 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 | 1.15625 | High | EAST | east | East-High |
| 5 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 | 0.80000 | Low | NORTH | north | North-Low |
| 6 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 | 1.10000 | High | NORTH | north | North-High |
| 7 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 | 1.22857 | High | NORTH | north | North-High |
| 8 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 | 0.88462 | Medium | SOUTH | south | South-Medium |
| 9 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 | 1.12903 | High | SOUTH | south | South-High |
| 10 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 | 1.25000 | High | SOUTH | south | South-High |
| 11 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 | 0.92857 | High | WEST | west | West-High |
| 12 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 | 0.93939 | High | WEST | west | West-High |
| 13 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 | 0.80000 | Low | NORTH | north | North-Low |
| 14 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 | 0.88462 | Medium | SOUTH | south | South-Medium |
| 15 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . | 0.92593 | High | EAST | east | East-High |
| 16 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 | 0.92857 | High | WEST | west | West-High |
| 17 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 | 1.03448 | High | CENTRAL | central | Central-High |
| 18 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 | 1.10000 | High | NORTH | north | North-High |
| 19 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 | 1.12903 | High | SOUTH | south | South-High |
| 20 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 | 1.15625 | High | EAST | east | East-High |
| 21 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 | 0.93939 | High | WEST | west | West-High |
| 22 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . | 1.00000 | High | CENTRAL | central | Central-High |
| 23 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 | 1.22857 | High | NORTH | north | North-High |
| 24 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 | 1.25000 | High | SOUTH | south | South-High |
13. Fraud Detection Macro
%macro fraud_check;
data fraud_flag;
set grid_copy;
if Backup_Generation > 100 and Fees > 2000 then Fraud = "Yes";
else Fraud = "No";
run;
proc print data=fraud_flag;
run;
%mend;
%fraud_check;
OUTPUT:
| Obs | Grid_Region | Wind_Output | Solar_Output | Load_Demand | Frequency_Deviation | Backup_Generation | Percentage | Fees | Date | SAS_Date | Month_Start | Months_Diff | Utilization | Util_Class | Fraud |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 | 1.03448 | High | No |
| 2 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . | 1.00000 | High | Yes |
| 3 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . | 0.92593 | High | No |
| 4 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 | 1.15625 | High | Yes |
| 5 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 | 0.80000 | Low | No |
| 6 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 | 1.10000 | High | No |
| 7 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 | 1.22857 | High | Yes |
| 8 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 | 0.88462 | Medium | No |
| 9 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 | 1.12903 | High | Yes |
| 10 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 | 1.25000 | High | Yes |
| 11 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 | 0.92857 | High | No |
| 12 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 | 0.93939 | High | No |
| 13 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 | 0.80000 | Low | No |
| 14 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 | 0.88462 | Medium | No |
| 15 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . | 0.92593 | High | No |
| 16 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 | 0.92857 | High | No |
| 17 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 | 1.03448 | High | No |
| 18 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 | 1.10000 | High | No |
| 19 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 | 1.12903 | High | Yes |
| 20 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 | 1.15625 | High | Yes |
| 21 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 | 0.93939 | High | No |
| 22 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . | 1.00000 | High | Yes |
| 23 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 | 1.22857 | High | Yes |
| 24 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 | 1.25000 | High | Yes |
14. Intentional Error Fix (Example)
data fraud_flag;
set grid_copy;
length flag $10.;
if 1500< Fees < 2000 then flag="Error";
run;
proc print data=fraud_flag;
run;
OUTPUT:
| Obs | Grid_Region | Wind_Output | Solar_Output | Load_Demand | Frequency_Deviation | Backup_Generation | Percentage | Fees | Date | SAS_Date | Month_Start | Months_Diff | Utilization | Util_Class | flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 | 1.03448 | High | Error |
| 2 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . | 1.00000 | High | |
| 3 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . | 0.92593 | High | Error |
| 4 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 | 1.15625 | High | |
| 5 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 | 0.80000 | Low | |
| 6 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 | 1.10000 | High | |
| 7 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 | 1.22857 | High | |
| 8 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 | 0.88462 | Medium | |
| 9 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 | 1.12903 | High | |
| 10 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 | 1.25000 | High | |
| 11 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 | 0.92857 | High | |
| 12 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 | 0.93939 | High | Error |
| 13 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 | 0.80000 | Low | |
| 14 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 | 0.88462 | Medium | |
| 15 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . | 0.92593 | High | Error |
| 16 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 | 0.92857 | High | |
| 17 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 | 1.03448 | High | Error |
| 18 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 | 1.10000 | High | |
| 19 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 | 1.12903 | High | |
| 20 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 | 1.15625 | High | |
| 21 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 | 0.93939 | High | Error |
| 22 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . | 1.00000 | High | |
| 23 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 | 1.22857 | High | |
| 24 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 | 1.25000 | High |
data fraud_flag;
set grid_copy;
if Fees < 2000 and Fees > 1500 then flag="Valid";
else flag = "Error";
run;
proc print data=fraud_flag;
run;
OUTPUT:
| Obs | Grid_Region | Wind_Output | Solar_Output | Load_Demand | Frequency_Deviation | Backup_Generation | Percentage | Fees | Date | SAS_Date | Month_Start | Months_Diff | Utilization | Util_Class | flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 | 1.03448 | High | Valid |
| 2 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . | 1.00000 | High | Error |
| 3 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . | 0.92593 | High | Valid |
| 4 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 | 1.15625 | High | Error |
| 5 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 | 0.80000 | Low | Error |
| 6 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 | 1.10000 | High | Error |
| 7 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 | 1.22857 | High | Error |
| 8 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 | 0.88462 | Medium | Error |
| 9 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 | 1.12903 | High | Error |
| 10 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 | 1.25000 | High | Error |
| 11 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 | 0.92857 | High | Error |
| 12 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 | 0.93939 | High | Valid |
| 13 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 01JAN2025 | 23742 | 0 | 0.80000 | Low | Error |
| 14 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 01FEB2025 | 23773 | 1 | 0.88462 | Medium | Error |
| 15 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | 01MAR2025 | . | . | . | 0.92593 | High | Valid |
| 16 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 01APR2025 | 23832 | 3 | 0.92857 | High | Error |
| 17 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 01MAY2025 | 23862 | 4 | 1.03448 | High | Valid |
| 18 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 01JUN2025 | 23893 | 5 | 1.10000 | High | Error |
| 19 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 01JUL2025 | 23923 | 6 | 1.12903 | High | Error |
| 20 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 01AUG2025 | 23954 | 7 | 1.15625 | High | Error |
| 21 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 01SEP2025 | 23985 | 8 | 0.93939 | High | Valid |
| 22 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | 01MAR2025 | . | . | . | 1.00000 | High | Error |
| 23 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 01NOV2025 | 24046 | 10 | 1.22857 | High | Error |
| 24 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 01DEC2025 | 24076 | 11 | 1.25000 | High | Error |
15. PROC DATASETS DELETE
proc datasets library=work;
delete grid_raw grid_copy;
run;
quit;
LOG:
16. Final Corrected Dataset Code
data final_grid;
set grid_clean;
Total_Output = sum(Wind_Output, Solar_Output);
Utilization = Total_Output / Load_Demand;
length Util_Class $10. Risk $10.;
if Utilization > 0.9 then Util_Class="High";
else if Utilization > 0.8 then Util_Class="Medium";
else Util_Class="Low";
if Backup_Generation > 100 then Risk="High";
else Risk="Normal";
if not missing(Date) and Date ne "wrong" then
Clean_Date = input(Date, yymmdd10.);
else
Clean_Date = .;
format Clean_Date date9.;
if missing(Clean_Date) then Clean_Date="01MAR2025"d;
run;
proc print data=final_grid;
run;
OUTPUT:
| Obs | Grid_Region | Wind_Output | Solar_Output | Load_Demand | Frequency_Deviation | Backup_Generation | Percentage | Fees | Date | Total_Output | Utilization | Util_Class | Risk | Clean_Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 200 | 0.80000 | Low | Normal | 01JAN2025 |
| 2 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 230 | 0.88462 | Medium | Normal | 01FEB2025 |
| 3 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | wrong | 250 | 0.92593 | High | Normal | 01MAR2025 |
| 4 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 260 | 0.92857 | High | Normal | 01APR2025 |
| 5 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 300 | 1.03448 | High | Normal | 01MAY2025 |
| 6 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 330 | 1.10000 | High | Normal | 01JUN2025 |
| 7 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 350 | 1.12903 | High | High | 01JUL2025 |
| 8 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 370 | 1.15625 | High | High | 01AUG2025 |
| 9 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 310 | 0.93939 | High | High | 01SEP2025 |
| 10 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | wrong | 340 | 1.00000 | High | High | 01MAR2025 |
| 11 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 430 | 1.22857 | High | High | 01NOV2025 |
| 12 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 450 | 1.25000 | High | High | 01DEC2025 |
/*Alternate*/
data final_grid;
set grid_clean;
Total_Output = sum(Wind_Output, Solar_Output);
Utilization = Total_Output / Load_Demand;
if Utilization > 0.9 then Util_Class="High";
else if Utilization > 0.8 then Util_Class="Medium";
else Util_Class="Low";
if Backup_Generation > 100 then Risk="High";
else Risk="Normal";
if not missing(Date) and Date ne "wrong" then
Clean_Date = input(Date, yymmdd10.);
else
Clean_Date = .;
format Clean_Date date9.;
if missing(Clean_Date) then Clean_Date = mdy(3,1,2025);
run;
proc print data=final_grid;
run;
OUTPUT:
| Obs | Grid_Region | Wind_Output | Solar_Output | Load_Demand | Frequency_Deviation | Backup_Generation | Percentage | Fees | Date | Total_Output | Utilization | Util_Class | Risk | Clean_Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | North | 120 | 80 | 250 | 0.02 | 50 | 90 | 1000 | 2025-01-01 | 200 | 0.80000 | Low | Normal | 01JAN2025 |
| 2 | South | 140 | 90 | 260 | -0.03 | 60 | 85 | 2000 | 2025-02-01 | 230 | 0.88462 | Medium | Normal | 01FEB2025 |
| 3 | East | 150 | 100 | 270 | 0.05 | 70 | 88 | 1700 | wrong | 250 | 0.92593 | High | Normal | 01MAR2025 |
| 4 | West | 160 | 100 | 280 | . | 80 | 92 | 1500 | 2025-04-01 | 260 | 0.92857 | High | Normal | 01APR2025 |
| 5 | Central | 180 | 120 | 290 | 0.01 | 90 | 95 | 1800 | 2025-05-01 | 300 | 1.03448 | High | Normal | 01MAY2025 |
| 6 | North | 200 | 130 | 300 | 0.04 | 100 | 97 | 2200 | 2025-06-01 | 330 | 1.10000 | High | Normal | 01JUN2025 |
| 7 | South | 210 | 140 | 310 | 0.06 | 110 | 99 | 2500 | 2025-07-01 | 350 | 1.12903 | High | High | 01JUL2025 |
| 8 | East | 220 | 150 | 320 | 0.02 | 120 | 91 | 2700 | 2025-08-01 | 370 | 1.15625 | High | High | 01AUG2025 |
| 9 | West | 150 | 160 | 330 | . | 130 | 89 | 1700 | 2025-09-01 | 310 | 0.93939 | High | High | 01SEP2025 |
| 10 | Central | 240 | 100 | 340 | 0.03 | 140 | 93 | 3000 | wrong | 340 | 1.00000 | High | High | 01MAR2025 |
| 11 | North | 250 | 180 | 350 | 0.01 | 150 | 96 | 3200 | 2025-11-01 | 430 | 1.22857 | High | High | 01NOV2025 |
| 12 | South | 260 | 190 | 360 | 0.05 | 160 | 98 | 3500 | 2025-12-01 | 450 | 1.25000 | High | High | 01DEC2025 |
|
Variable |
Type |
Example |
|
Clean_Date |
Numeric |
23375 |
|
Display |
Format |
01MAR2025 |
SAS Date Storage
· Dates are numeric
values
· Formats only change display
|
Method |
Example |
|
Date Literal |
'01MAR2025'd |
|
MDY |
mdy(3,1,2025) |
Key Highlights
· Data
cleaning using COALESCENCE
· Date
derivation using INTNX, INTCK
· Fraud
detection using macros
· Data
reshaping with PROC TRANSPOSE
· Dataset
lifecycle using PROC DATASETS
· Handling missing values and logic errors
20 Key Points About The Project
1. Designed a grid stability dataset capturing renewable energy output, demand, and system deviations.2. Integrated both R and SAS environments to ensure cross-platform data consistency and validation.
3. Introduced intentional data quality issues such as missing values, invalid dates, and inconsistent formats to simulate real-world scenarios.
4. Applied data cleaning techniques using COALESCE to handle missing numeric values efficiently.
5. Standardized categorical variables using character functions like PROPCASE, STRIP, UPCASE, and LOWCASE.
6. Implemented date conversion logic using INPUT, MDY, INTNX, and INTCK for accurate temporal analysis.
7. Developed robust error-handling logic to safely process invalid entries like "wrong" dates without breaking execution.
8. Derived Total_Output and Utilization metrics to measure grid efficiency and performance.
9. Created utilization classification (High/Medium/Low) using conditional logic for operational insights.
10. Performed data transformation using SET, MERGE, and APPEND to combine and extend datasets.
11. Reshaped data using PROC TRANSPOSE for reporting and analytical flexibility.
12. Leveraged numeric functions like SUM and MEAN for aggregated calculations.
13. Built character-based composite variables using CAT and CATX for better reporting readability.
14. Designed a fraud detection framework using SAS MACROS to dynamically flag suspicious patterns.
15. Identified fraud scenarios such as abnormally high backup generation and inflated operational fees.
16. Ensured complete classification logic by covering all edge cases including missing and extreme values.
17. Used PROC DATASETS DELETE for efficient dataset lifecycle management and workspace cleanup.
18. Debugged common SAS errors including type mismatches, invalid INPUT conversions, and logical condition flaws.
19. Applied defensive programming techniques to make the code robust, scalable, and production-ready.
Summary
This
project demonstrates how SAS can efficiently handle real-world grid stability
data. By integrating multiple techniques such as data cleaning, transformation,
merging, and fraud detection, we created a robust pipeline. Intentional errors
were introduced to simulate real scenarios, and each error was carefully
corrected with explanations.
The use
of macros improves scalability, while functions like INTNX, COALESCE, and CATX
enhance data quality. This project highlights how SAS is not just a programming
language but a complete data engineering tool capable of ensuring accuracy and
reliability in critical systems like energy grids.
Conclusion
In this
project, we worked on grid stability data and learned how to handle real-life
data problems step by step. First, we created raw datasets with errors to
understand how messy real data can be. Then we cleaned the data using SAS
functions and fixed missing values, incorrect formats, and logical mistakes. We
also learned how to work with dates, calculate utilization, and classify data
into meaningful categories.
One of
the most important parts was detecting fraud using macros, which is very useful
in real-world scenarios. We also practiced using different SAS techniques like
merge, append, transpose, and dataset deletion. Overall, this project gives a
complete understanding of how SAS can be used in real-time data processing and
analysis. It is very helpful for interviews and practical work.
SAS INTERVIEW QUESTIONS
1. What is the difference between
LENGTH and FORMAT in SAS?
Answer:
LENGTHdefines how much storage space a variable occupies.FORMATcontrols how the value is displayed, not stored.
LENGTH affects data structure, FORMAT affects only appearance.
2. What is the difference between
PROC SORT with NODUPKEY and NODUP?
Answer:
NODUPKEYremoves duplicates based on BY variables only.NODUPremoves completely identical observations.
UseNODUPKEYfor key-based deduplication.
3. What is the role of RETAIN
statement in SAS?
Answer:
- By default, SAS resets
variables each iteration.
RETAINkeeps the previous value across rows.
Commonly used for cumulative sums and carrying forward values.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent GRID STABILITY data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics
· Clinical SAS Programmer
· Research Data Analyst
· Regulatory Data Validator
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment