434.Can Advanced SAS Programming Detect, Clean, and Optimize Grid Stability Data While Identifying Fraud Patterns?

Advanced SAS Intelligence for Grid Stability Optimization and Fraud Detection

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA | SET | MERGE | INPUT | DATALINES | IF - ELSE |  PROC SORT | PROC SQL | PROC APPEND | PROC TRANSPOSE | PROC DATASETS | RUN | QUIT | %MACRO | %MEND | FUNCTIONS ETC | R LANGUAGE

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Introduction

Power grid stability is one of the most critical components in modern energy systems. With the increasing integration of renewable sources like wind and solar, maintaining grid balance has become complex. Variations in load demand, frequency deviation, and backup generation directly impact reliability.

In this project, we simulate a Grid Stability Dataset using both R and SAS, intentionally introduce errors, detect and fix them using advanced SAS techniques, and implement fraud detection logic using macros. We also apply real-world transformations like date handling, utilization classification, merging, transposing, and dataset optimization.

This project is designed for:

  • SAS Programmer Interviews
  • Real-world Data Cleaning Scenarios
  • Clinical/Non-clinical Data Transformation Practice

Table of Contents

  1. Business Context
  2. Raw Dataset Creation (R & SAS)
  3. Intentional Errors in Dataset
  4. Error Identification & Fixing
  5. Data Cleaning using SAS Functions
  6. Date Handling (MDY, INTCK, INTNX)
  7. Utilization Classification
  8. Data Transformation (SET, MERGE, APPEND)
  9. PROC TRANSPOSE Usage
  10. Fraud Detection using MACROS
  11. Numeric & Character Functions
  12. PROC DATASETS DELETE
  13. Final Clean Dataset
  14. Key Highlights
  15. 20 Key Points About The Project
  16. Summary & Conclusion

Business Context

Electric grid operators monitor:

  • Renewable output (Wind/Solar)
  • Demand fluctuations
  • Frequency stability
  • Backup generation usage

Fraud scenarios:

  • Inflated backup generation costs
  • Artificial frequency manipulation
  • Fake utilization reporting 

1. Raw Dataset Creation in R (With Intentional Errors)

library(dplyr)

grid_data <- data.frame(

  Grid_Region = c("north","south","east","west","central",

                  "north","south","east","west","central",

                  "north","south"),

  Wind_Output = c(120,140,NA,160,180,200,210,220,NA,240,250,260),

  Solar_Output = c(80,90,100,NA,120,130,140,150,160,NA,180,190),

  Load_Demand = c(250,260,270,280,290,300,310,320,330,340,350,360),

  Frequency_Deviation = c(0.02,-0.03,0.05,NA,0.01,0.04,0.06,0.02,NA,0.03,0.01,0.05),

  Backup_Generation = c(50,60,70,80,90,100,110,120,130,140,150,160),

  Percentage = c(90,85,88,92,95,97,99,91,89,93,96,98),

  Fees = c(1000,2000,NA,1500,1800,2200,2500,2700,NA,3000,3200,3500),

  Date = c("2025-01-01","2025-02-01","wrong","2025-04-01","2025-05-01",

           "2025-06-01","2025-07-01","2025-08-01","2025-09-01","wrong",

           "2025-11-01","2025-12-01")

)

grid_data

OUTPUT:

 

Grid_Region

Wind_Output

Solar_Output

Load_Demand

Frequency_Deviation

Backup_Generation

Percentage

Fees

Date

1

north

120

80

250

0.02

50

90

1000

01-01-2025

2

south

140

90

260

-0.03

60

85

2000

01-02-2025

3

east

NA

100

270

0.05

70

88

NA

wrong

4

west

160

NA

280

NA

80

92

1500

01-04-2025

5

central

180

120

290

0.01

90

95

1800

01-05-2025

6

north

200

130

300

0.04

100

97

2200

01-06-2025

7

south

210

140

310

0.06

110

99

2500

01-07-2025

8

east

220

150

320

0.02

120

91

2700

01-08-2025

9

west

NA

160

330

NA

130

89

NA

01-09-2025

10

central

240

NA

340

0.03

140

93

3000

wrong

11

north

250

180

350

0.01

150

96

3200

01-11-2025

12

south

260

190

360

0.05

160

98

3500

01-12-2025

Errors Introduced

·  Missing values (NA)

·  Wrong date format (“wrong”)

·  Inconsistent casing (north vs North)

2. Raw Dataset Creation in SAS (With Intentional Errors)

data grid_raw;

 input Grid_Region $ Wind_Output Solar_Output Load_Demand Frequency_Deviation

       Backup_Generation Percentage Fees Date :$20.;

 datalines;

 north 120 80 250 0.02 50 90 1000 2025-01-01

 south 140 90 260 -0.03 60 85 2000 2025-02-01

 east . 100 270 0.05 70 88 . wrong

 west 160 . 280 . 80 92 1500 2025-04-01

 central 180 120 290 0.01 90 95 1800 2025-05-01

 north 200 130 300 0.04 100 97 2200 2025-06-01

 south 210 140 310 0.06 110 99 2500 2025-07-01

 east 220 150 320 0.02 120 91 2700 2025-08-01

 west . 160 330 . 130 89 . 2025-09-01

 central 240 . 340 0.03 140 93 3000 wrong

 north 250 180 350 0.01 150 96 3200 2025-11-01

 south 260 190 360 0.05 160 98 3500 2025-12-01

;

run;

proc print data=grid_raw;

run;

OUTPUT:

ObsGrid_RegionWind_OutputSolar_OutputLoad_DemandFrequency_DeviationBackup_GenerationPercentageFeesDate
1north120802500.02509010002025-01-01
2south14090260-0.03608520002025-02-01
3east.1002700.057088.wrong
4west160.280.809215002025-04-01
5central1801202900.01909518002025-05-01
6north2001303000.041009722002025-06-01
7south2101403100.061109925002025-07-01
8east2201503200.021209127002025-08-01
9west.160330.13089.2025-09-01
10central240.3400.03140933000wrong
11north2501803500.011509632002025-11-01
12south2601903600.051609835002025-12-01

3. Error Identification

Issues:

·  Missing values (.)

·  Invalid date (“wrong”)

·  Lowercase region names

·  Null fees

·  Missing solar/wind output

4. Data Cleaning Using SAS Functions

data grid_clean;

set grid_raw;

Grid_Region = propcase(strip(Grid_Region));

Wind_Output = coalesce(Wind_Output, 150);

Solar_Output = coalesce(Solar_Output, 100);

Fees = coalesce(Fees, 1700);

run;

proc print data=grid_clean;

run;

OUTPUT:

ObsGrid_RegionWind_OutputSolar_OutputLoad_DemandFrequency_DeviationBackup_GenerationPercentageFeesDate
1North120802500.02509010002025-01-01
2South14090260-0.03608520002025-02-01
3East1501002700.0570881700wrong
4West160100280.809215002025-04-01
5Central1801202900.01909518002025-05-01
6North2001303000.041009722002025-06-01
7South2101403100.061109925002025-07-01
8East2201503200.021209127002025-08-01
9West150160330.1308917002025-09-01
10Central2401003400.03140933000wrong
11North2501803500.011509632002025-11-01
12South2601903600.051609835002025-12-01

Why Used

·  STRIP → removes spaces

·  PROPCASE → standard format

·  COALESCE → handles missing values

5. Date Handling

data grid_dates;

set grid_clean;

SAS_Date = input(Date, yymmdd10.);

if Date = "wrong" then Date = "01MAR2025";

format SAS_Date date9.;

Month_Start = intnx('month', SAS_Date, 0, 'b');

Months_Diff = intck('month', '01JAN2025'd, SAS_Date);

run;

proc print data=grid_dates;

run;

OUTPUT:

ObsGrid_RegionWind_OutputSolar_OutputLoad_DemandFrequency_DeviationBackup_GenerationPercentageFeesDateSAS_DateMonth_StartMonths_Diff
1North120802500.02509010002025-01-0101JAN2025237420
2South14090260-0.03608520002025-02-0101FEB2025237731
3East1501002700.057088170001MAR2025...
4West160100280.809215002025-04-0101APR2025238323
5Central1801202900.01909518002025-05-0101MAY2025238624
6North2001303000.041009722002025-06-0101JUN2025238935
7South2101403100.061109925002025-07-0101JUL2025239236
8East2201503200.021209127002025-08-0101AUG2025239547
9West150160330.1308917002025-09-0101SEP2025239858
10Central2401003400.0314093300001MAR2025...
11North2501803500.011509632002025-11-0101NOV20252404610
12South2601903600.051609835002025-12-0101DEC20252407611

Functions

·  MDY() → create dates

·  INTNX() → shift intervals

·  INTCK() → calculate differences

6. Utilization Classification

data grid_util;

set grid_dates;

Utilization = (Wind_Output + Solar_Output) / Load_Demand;

length Util_Class $10.;

if Utilization > 0.9 then Util_Class = "High";

else if Utilization > 0.8 then Util_Class = "Medium";

else Util_Class = "Low";

run;

proc print data=grid_util;

run;

OUTPUT:

ObsGrid_RegionWind_OutputSolar_OutputLoad_DemandFrequency_DeviationBackup_GenerationPercentageFeesDateSAS_DateMonth_StartMonths_DiffUtilizationUtil_Class
1North120802500.02509010002025-01-0101JAN20252374200.80000Low
2South14090260-0.03608520002025-02-0101FEB20252377310.88462Medium
3East1501002700.057088170001MAR2025...0.92593High
4West160100280.809215002025-04-0101APR20252383230.92857High
5Central1801202900.01909518002025-05-0101MAY20252386241.03448High
6North2001303000.041009722002025-06-0101JUN20252389351.10000High
7South2101403100.061109925002025-07-0101JUL20252392361.12903High
8East2201503200.021209127002025-08-0101AUG20252395471.15625High
9West150160330.1308917002025-09-0101SEP20252398580.93939High
10Central2401003400.0314093300001MAR2025...1.00000High
11North2501803500.011509632002025-11-0101NOV202524046101.22857High
12South2601903600.051609835002025-12-0101DEC202524076111.25000High

7. DATA STEP (SET)

data grid_copy;

set grid_util;

run;

proc print data=grid_copy;

run;

OUTPUT:

ObsGrid_RegionWind_OutputSolar_OutputLoad_DemandFrequency_DeviationBackup_GenerationPercentageFeesDateSAS_DateMonth_StartMonths_DiffUtilizationUtil_Class
1North120802500.02509010002025-01-0101JAN20252374200.80000Low
2South14090260-0.03608520002025-02-0101FEB20252377310.88462Medium
3East1501002700.057088170001MAR2025...0.92593High
4West160100280.809215002025-04-0101APR20252383230.92857High
5Central1801202900.01909518002025-05-0101MAY20252386241.03448High
6North2001303000.041009722002025-06-0101JUN20252389351.10000High
7South2101403100.061109925002025-07-0101JUL20252392361.12903High
8East2201503200.021209127002025-08-0101AUG20252395471.15625High
9West150160330.1308917002025-09-0101SEP20252398580.93939High
10Central2401003400.0314093300001MAR2025...1.00000High
11North2501803500.011509632002025-11-0101NOV202524046101.22857High
12South2601903600.051609835002025-12-0101DEC202524076111.25000High

Why

·  Creates duplicate dataset

·  Used for pipeline processing

8. MERGE Example

data fees_update;

input Grid_Region $ Extra_Fees;

datalines;

North 100

South 200

West 50

East 25

Central 0

;

run;

proc print data=fees_update;

run;

OUTPUT:

ObsGrid_RegionExtra_Fees
1North100
2South200
3West50
4East25
5Central0

proc sort data=grid_copy;by Grid_Region;run;

proc print data=grid_copy;

run;

OUTPUT:

ObsGrid_RegionWind_OutputSolar_OutputLoad_DemandFrequency_DeviationBackup_GenerationPercentageFeesDateSAS_DateMonth_StartMonths_DiffUtilizationUtil_Class
1Central1801202900.01909518002025-05-0101MAY20252386241.03448High
2Central2401003400.0314093300001MAR2025...1.00000High
3East1501002700.057088170001MAR2025...0.92593High
4East2201503200.021209127002025-08-0101AUG20252395471.15625High
5North120802500.02509010002025-01-0101JAN20252374200.80000Low
6North2001303000.041009722002025-06-0101JUN20252389351.10000High
7North2501803500.011509632002025-11-0101NOV202524046101.22857High
8South14090260-0.03608520002025-02-0101FEB20252377310.88462Medium
9South2101403100.061109925002025-07-0101JUL20252392361.12903High
10South2601903600.051609835002025-12-0101DEC202524076111.25000High
11West160100280.809215002025-04-0101APR20252383230.92857High
12West150160330.1308917002025-09-0101SEP20252398580.93939High

proc sort data=fees_update;by Grid_Region;run;

proc print data=fees_update;

run;

OUTPUT:

ObsGrid_RegionExtra_Fees
1Central0
2East25
3North100
4South200
5West50

data merged_data;

merge grid_copy

      fees_update;

by Grid_Region;

run;

proc print data=merged_data;

run;

OUTPUT:

ObsGrid_RegionWind_OutputSolar_OutputLoad_DemandFrequency_DeviationBackup_GenerationPercentageFeesDateSAS_DateMonth_StartMonths_DiffUtilizationUtil_ClassExtra_Fees
1Central1801202900.01909518002025-05-0101MAY20252386241.03448High0
2Central2401003400.0314093300001MAR2025...1.00000High0
3East1501002700.057088170001MAR2025...0.92593High25
4East2201503200.021209127002025-08-0101AUG20252395471.15625High25
5North120802500.02509010002025-01-0101JAN20252374200.80000Low100
6North2001303000.041009722002025-06-0101JUN20252389351.10000High100
7North2501803500.011509632002025-11-0101NOV202524046101.22857High100
8South14090260-0.03608520002025-02-0101FEB20252377310.88462Medium200
9South2101403100.061109925002025-07-0101JUL20252392361.12903High200
10South2601903600.051609835002025-12-0101DEC202524076111.25000High200
11West160100280.809215002025-04-0101APR20252383230.92857High50
12West150160330.1308917002025-09-0101SEP20252398580.93939High50

9. APPEND

proc append base=grid_copy 

            data=grid_util force;

run;

proc print data=grid_copy;

run;

OUTPUT:

ObsGrid_RegionWind_OutputSolar_OutputLoad_DemandFrequency_DeviationBackup_GenerationPercentageFeesDateSAS_DateMonth_StartMonths_DiffUtilizationUtil_Class
1Central1801202900.01909518002025-05-0101MAY20252386241.03448High
2Central2401003400.0314093300001MAR2025...1.00000High
3East1501002700.057088170001MAR2025...0.92593High
4East2201503200.021209127002025-08-0101AUG20252395471.15625High
5North120802500.02509010002025-01-0101JAN20252374200.80000Low
6North2001303000.041009722002025-06-0101JUN20252389351.10000High
7North2501803500.011509632002025-11-0101NOV202524046101.22857High
8South14090260-0.03608520002025-02-0101FEB20252377310.88462Medium
9South2101403100.061109925002025-07-0101JUL20252392361.12903High
10South2601903600.051609835002025-12-0101DEC202524076111.25000High
11West160100280.809215002025-04-0101APR20252383230.92857High
12West150160330.1308917002025-09-0101SEP20252398580.93939High
13North120802500.02509010002025-01-0101JAN20252374200.80000Low
14South14090260-0.03608520002025-02-0101FEB20252377310.88462Medium
15East1501002700.057088170001MAR2025...0.92593High
16West160100280.809215002025-04-0101APR20252383230.92857High
17Central1801202900.01909518002025-05-0101MAY20252386241.03448High
18North2001303000.041009722002025-06-0101JUN20252389351.10000High
19South2101403100.061109925002025-07-0101JUL20252392361.12903High
20East2201503200.021209127002025-08-0101AUG20252395471.15625High
21West150160330.1308917002025-09-0101SEP20252398580.93939High
22Central2401003400.0314093300001MAR2025...1.00000High
23North2501803500.011509632002025-11-0101NOV202524046101.22857High
24South2601903600.051609835002025-12-0101DEC202524076111.25000High

10. PROC TRANSPOSE

proc transpose data=grid_copy out=grid_trans;

by Grid_Region NotSorted;

var Wind_Output Solar_Output Load_Demand;

run;

proc print data=grid_trans;

run;

OUTPUT:

ObsGrid_Region_NAME_COL1COL2COL3
1CentralWind_Output180240.
2CentralSolar_Output120100.
3CentralLoad_Demand290340.
4EastWind_Output150220.
5EastSolar_Output100150.
6EastLoad_Demand270320.
7NorthWind_Output120200250
8NorthSolar_Output80130180
9NorthLoad_Demand250300350
10SouthWind_Output140210260
11SouthSolar_Output90140190
12SouthLoad_Demand260310360
13WestWind_Output160150.
14WestSolar_Output100160.
15WestLoad_Demand280330.
16NorthWind_Output120..
17NorthSolar_Output80..
18NorthLoad_Demand250..
19SouthWind_Output140..
20SouthSolar_Output90..
21SouthLoad_Demand260..
22EastWind_Output150..
23EastSolar_Output100..
24EastLoad_Demand270..
25WestWind_Output160..
26WestSolar_Output100..
27WestLoad_Demand280..
28CentralWind_Output180..
29CentralSolar_Output120..
30CentralLoad_Demand290..
31NorthWind_Output200..
32NorthSolar_Output130..
33NorthLoad_Demand300..
34SouthWind_Output210..
35SouthSolar_Output140..
36SouthLoad_Demand310..
37EastWind_Output220..
38EastSolar_Output150..
39EastLoad_Demand320..
40WestWind_Output150..
41WestSolar_Output160..
42WestLoad_Demand330..
43CentralWind_Output240..
44CentralSolar_Output100..
45CentralLoad_Demand340..
46NorthWind_Output250..
47NorthSolar_Output180..
48NorthLoad_Demand350..
49SouthWind_Output260..
50SouthSolar_Output190..
51SouthLoad_Demand360..

11. Numeric Functions

data numeric_ops;

set grid_copy;

Total_Output = sum(Wind_Output, Solar_Output);

Avg_Output = mean(Wind_Output, Solar_Output);

run;

proc print data=numeric_ops;

run;

OUTPUT:

ObsGrid_RegionWind_OutputSolar_OutputLoad_DemandFrequency_DeviationBackup_GenerationPercentageFeesDateSAS_DateMonth_StartMonths_DiffUtilizationUtil_ClassTotal_OutputAvg_Output
1Central1801202900.01909518002025-05-0101MAY20252386241.03448High300150
2Central2401003400.0314093300001MAR2025...1.00000High340170
3East1501002700.057088170001MAR2025...0.92593High250125
4East2201503200.021209127002025-08-0101AUG20252395471.15625High370185
5North120802500.02509010002025-01-0101JAN20252374200.80000Low200100
6North2001303000.041009722002025-06-0101JUN20252389351.10000High330165
7North2501803500.011509632002025-11-0101NOV202524046101.22857High430215
8South14090260-0.03608520002025-02-0101FEB20252377310.88462Medium230115
9South2101403100.061109925002025-07-0101JUL20252392361.12903High350175
10South2601903600.051609835002025-12-0101DEC202524076111.25000High450225
11West160100280.809215002025-04-0101APR20252383230.92857High260130
12West150160330.1308917002025-09-0101SEP20252398580.93939High310155
13North120802500.02509010002025-01-0101JAN20252374200.80000Low200100
14South14090260-0.03608520002025-02-0101FEB20252377310.88462Medium230115
15East1501002700.057088170001MAR2025...0.92593High250125
16West160100280.809215002025-04-0101APR20252383230.92857High260130
17Central1801202900.01909518002025-05-0101MAY20252386241.03448High300150
18North2001303000.041009722002025-06-0101JUN20252389351.10000High330165
19South2101403100.061109925002025-07-0101JUL20252392361.12903High350175
20East2201503200.021209127002025-08-0101AUG20252395471.15625High370185
21West150160330.1308917002025-09-0101SEP20252398580.93939High310155
22Central2401003400.0314093300001MAR2025...1.00000High340170
23North2501803500.011509632002025-11-0101NOV202524046101.22857High430215
24South2601903600.051609835002025-12-0101DEC202524076111.25000High450225

12. Character Functions

data char_ops;

set grid_copy;

Region_Upper = upcase(Grid_Region);

Region_Lower = lowcase(Grid_Region);

Region_Concat = catx("-", Grid_Region, Util_Class);

run;

proc print data=char_ops;

run;

OUTPUT:

ObsGrid_RegionWind_OutputSolar_OutputLoad_DemandFrequency_DeviationBackup_GenerationPercentageFeesDateSAS_DateMonth_StartMonths_DiffUtilizationUtil_ClassRegion_UpperRegion_LowerRegion_Concat
1Central1801202900.01909518002025-05-0101MAY20252386241.03448HighCENTRALcentralCentral-High
2Central2401003400.0314093300001MAR2025...1.00000HighCENTRALcentralCentral-High
3East1501002700.057088170001MAR2025...0.92593HighEASTeastEast-High
4East2201503200.021209127002025-08-0101AUG20252395471.15625HighEASTeastEast-High
5North120802500.02509010002025-01-0101JAN20252374200.80000LowNORTHnorthNorth-Low
6North2001303000.041009722002025-06-0101JUN20252389351.10000HighNORTHnorthNorth-High
7North2501803500.011509632002025-11-0101NOV202524046101.22857HighNORTHnorthNorth-High
8South14090260-0.03608520002025-02-0101FEB20252377310.88462MediumSOUTHsouthSouth-Medium
9South2101403100.061109925002025-07-0101JUL20252392361.12903HighSOUTHsouthSouth-High
10South2601903600.051609835002025-12-0101DEC202524076111.25000HighSOUTHsouthSouth-High
11West160100280.809215002025-04-0101APR20252383230.92857HighWESTwestWest-High
12West150160330.1308917002025-09-0101SEP20252398580.93939HighWESTwestWest-High
13North120802500.02509010002025-01-0101JAN20252374200.80000LowNORTHnorthNorth-Low
14South14090260-0.03608520002025-02-0101FEB20252377310.88462MediumSOUTHsouthSouth-Medium
15East1501002700.057088170001MAR2025...0.92593HighEASTeastEast-High
16West160100280.809215002025-04-0101APR20252383230.92857HighWESTwestWest-High
17Central1801202900.01909518002025-05-0101MAY20252386241.03448HighCENTRALcentralCentral-High
18North2001303000.041009722002025-06-0101JUN20252389351.10000HighNORTHnorthNorth-High
19South2101403100.061109925002025-07-0101JUL20252392361.12903HighSOUTHsouthSouth-High
20East2201503200.021209127002025-08-0101AUG20252395471.15625HighEASTeastEast-High
21West150160330.1308917002025-09-0101SEP20252398580.93939HighWESTwestWest-High
22Central2401003400.0314093300001MAR2025...1.00000HighCENTRALcentralCentral-High
23North2501803500.011509632002025-11-0101NOV202524046101.22857HighNORTHnorthNorth-High
24South2601903600.051609835002025-12-0101DEC202524076111.25000HighSOUTHsouthSouth-High

13. Fraud Detection Macro

%macro fraud_check;

data fraud_flag;

set grid_copy;

if Backup_Generation > 100 and Fees > 2000 then Fraud = "Yes";

else Fraud = "No";

run;

proc print data=fraud_flag;

run;

%mend;


%fraud_check;

OUTPUT:

ObsGrid_RegionWind_OutputSolar_OutputLoad_DemandFrequency_DeviationBackup_GenerationPercentageFeesDateSAS_DateMonth_StartMonths_DiffUtilizationUtil_ClassFraud
1Central1801202900.01909518002025-05-0101MAY20252386241.03448HighNo
2Central2401003400.0314093300001MAR2025...1.00000HighYes
3East1501002700.057088170001MAR2025...0.92593HighNo
4East2201503200.021209127002025-08-0101AUG20252395471.15625HighYes
5North120802500.02509010002025-01-0101JAN20252374200.80000LowNo
6North2001303000.041009722002025-06-0101JUN20252389351.10000HighNo
7North2501803500.011509632002025-11-0101NOV202524046101.22857HighYes
8South14090260-0.03608520002025-02-0101FEB20252377310.88462MediumNo
9South2101403100.061109925002025-07-0101JUL20252392361.12903HighYes
10South2601903600.051609835002025-12-0101DEC202524076111.25000HighYes
11West160100280.809215002025-04-0101APR20252383230.92857HighNo
12West150160330.1308917002025-09-0101SEP20252398580.93939HighNo
13North120802500.02509010002025-01-0101JAN20252374200.80000LowNo
14South14090260-0.03608520002025-02-0101FEB20252377310.88462MediumNo
15East1501002700.057088170001MAR2025...0.92593HighNo
16West160100280.809215002025-04-0101APR20252383230.92857HighNo
17Central1801202900.01909518002025-05-0101MAY20252386241.03448HighNo
18North2001303000.041009722002025-06-0101JUN20252389351.10000HighNo
19South2101403100.061109925002025-07-0101JUL20252392361.12903HighYes
20East2201503200.021209127002025-08-0101AUG20252395471.15625HighYes
21West150160330.1308917002025-09-0101SEP20252398580.93939HighNo
22Central2401003400.0314093300001MAR2025...1.00000HighYes
23North2501803500.011509632002025-11-0101NOV202524046101.22857HighYes
24South2601903600.051609835002025-12-0101DEC202524076111.25000HighYes

14. Intentional Error Fix (Example)

data fraud_flag;

set grid_copy;

length flag $10.;

if 1500< Fees < 2000 then flag="Error";

run;

proc print data=fraud_flag;

run;

OUTPUT:

ObsGrid_RegionWind_OutputSolar_OutputLoad_DemandFrequency_DeviationBackup_GenerationPercentageFeesDateSAS_DateMonth_StartMonths_DiffUtilizationUtil_Classflag
1Central1801202900.01909518002025-05-0101MAY20252386241.03448HighError
2Central2401003400.0314093300001MAR2025...1.00000High 
3East1501002700.057088170001MAR2025...0.92593HighError
4East2201503200.021209127002025-08-0101AUG20252395471.15625High 
5North120802500.02509010002025-01-0101JAN20252374200.80000Low 
6North2001303000.041009722002025-06-0101JUN20252389351.10000High 
7North2501803500.011509632002025-11-0101NOV202524046101.22857High 
8South14090260-0.03608520002025-02-0101FEB20252377310.88462Medium 
9South2101403100.061109925002025-07-0101JUL20252392361.12903High 
10South2601903600.051609835002025-12-0101DEC202524076111.25000High 
11West160100280.809215002025-04-0101APR20252383230.92857High 
12West150160330.1308917002025-09-0101SEP20252398580.93939HighError
13North120802500.02509010002025-01-0101JAN20252374200.80000Low 
14South14090260-0.03608520002025-02-0101FEB20252377310.88462Medium 
15East1501002700.057088170001MAR2025...0.92593HighError
16West160100280.809215002025-04-0101APR20252383230.92857High 
17Central1801202900.01909518002025-05-0101MAY20252386241.03448HighError
18North2001303000.041009722002025-06-0101JUN20252389351.10000High 
19South2101403100.061109925002025-07-0101JUL20252392361.12903High 
20East2201503200.021209127002025-08-0101AUG20252395471.15625High 
21West150160330.1308917002025-09-0101SEP20252398580.93939HighError
22Central2401003400.0314093300001MAR2025...1.00000High 
23North2501803500.011509632002025-11-0101NOV202524046101.22857High 
24South2601903600.051609835002025-12-0101DEC202524076111.25000High 

data fraud_flag;

set grid_copy;

if Fees < 2000 and Fees > 1500 then flag="Valid";

else flag = "Error";

run;

proc print data=fraud_flag;

run;

OUTPUT:

ObsGrid_RegionWind_OutputSolar_OutputLoad_DemandFrequency_DeviationBackup_GenerationPercentageFeesDateSAS_DateMonth_StartMonths_DiffUtilizationUtil_Classflag
1Central1801202900.01909518002025-05-0101MAY20252386241.03448HighValid
2Central2401003400.0314093300001MAR2025...1.00000HighError
3East1501002700.057088170001MAR2025...0.92593HighValid
4East2201503200.021209127002025-08-0101AUG20252395471.15625HighError
5North120802500.02509010002025-01-0101JAN20252374200.80000LowError
6North2001303000.041009722002025-06-0101JUN20252389351.10000HighError
7North2501803500.011509632002025-11-0101NOV202524046101.22857HighError
8South14090260-0.03608520002025-02-0101FEB20252377310.88462MediumError
9South2101403100.061109925002025-07-0101JUL20252392361.12903HighError
10South2601903600.051609835002025-12-0101DEC202524076111.25000HighError
11West160100280.809215002025-04-0101APR20252383230.92857HighError
12West150160330.1308917002025-09-0101SEP20252398580.93939HighValid
13North120802500.02509010002025-01-0101JAN20252374200.80000LowError
14South14090260-0.03608520002025-02-0101FEB20252377310.88462MediumError
15East1501002700.057088170001MAR2025...0.92593HighValid
16West160100280.809215002025-04-0101APR20252383230.92857HighError
17Central1801202900.01909518002025-05-0101MAY20252386241.03448HighValid
18North2001303000.041009722002025-06-0101JUN20252389351.10000HighError
19South2101403100.061109925002025-07-0101JUL20252392361.12903HighError
20East2201503200.021209127002025-08-0101AUG20252395471.15625HighError
21West150160330.1308917002025-09-0101SEP20252398580.93939HighValid
22Central2401003400.0314093300001MAR2025...1.00000HighError
23North2501803500.011509632002025-11-0101NOV202524046101.22857HighError
24South2601903600.051609835002025-12-0101DEC202524076111.25000HighError

15. PROC DATASETS DELETE

proc datasets library=work;

delete grid_raw grid_copy;

run;

quit;

LOG:

NOTE: Deleting WORK.GRID_RAW (memtype=DATA).
NOTE: Deleting WORK.GRID_COPY (memtype=DATA).

16. Final Corrected Dataset Code

data final_grid;

set grid_clean;

Total_Output = sum(Wind_Output, Solar_Output);

Utilization = Total_Output / Load_Demand;

length Util_Class $10. Risk $10.;

if Utilization > 0.9 then Util_Class="High";

else if Utilization > 0.8 then Util_Class="Medium";

else Util_Class="Low";

if Backup_Generation > 100 then Risk="High";

else Risk="Normal";

if not missing(Date) and Date ne "wrong" then 

    Clean_Date = input(Date, yymmdd10.);

else 

    Clean_Date = .;

format Clean_Date date9.;

if missing(Clean_Date) then Clean_Date="01MAR2025"d;

run;

proc print data=final_grid;

run;

OUTPUT:

ObsGrid_RegionWind_OutputSolar_OutputLoad_DemandFrequency_DeviationBackup_GenerationPercentageFeesDateTotal_OutputUtilizationUtil_ClassRiskClean_Date
1North120802500.02509010002025-01-012000.80000LowNormal01JAN2025
2South14090260-0.03608520002025-02-012300.88462MediumNormal01FEB2025
3East1501002700.0570881700wrong2500.92593HighNormal01MAR2025
4West160100280.809215002025-04-012600.92857HighNormal01APR2025
5Central1801202900.01909518002025-05-013001.03448HighNormal01MAY2025
6North2001303000.041009722002025-06-013301.10000HighNormal01JUN2025
7South2101403100.061109925002025-07-013501.12903HighHigh01JUL2025
8East2201503200.021209127002025-08-013701.15625HighHigh01AUG2025
9West150160330.1308917002025-09-013100.93939HighHigh01SEP2025
10Central2401003400.03140933000wrong3401.00000HighHigh01MAR2025
11North2501803500.011509632002025-11-014301.22857HighHigh01NOV2025
12South2601903600.051609835002025-12-014501.25000HighHigh01DEC2025

/*Alternate*/

data final_grid;

set grid_clean;

Total_Output = sum(Wind_Output, Solar_Output);

Utilization = Total_Output / Load_Demand;

if Utilization > 0.9 then Util_Class="High";

else if Utilization > 0.8 then Util_Class="Medium";

else Util_Class="Low";

if Backup_Generation > 100 then Risk="High";

else Risk="Normal";

if not missing(Date) and Date ne "wrong" then 

    Clean_Date = input(Date, yymmdd10.);

else 

    Clean_Date = .;

format Clean_Date date9.;

if missing(Clean_Date) then Clean_Date = mdy(3,1,2025);

run;

proc print data=final_grid;

run;

OUTPUT:

ObsGrid_RegionWind_OutputSolar_OutputLoad_DemandFrequency_DeviationBackup_GenerationPercentageFeesDateTotal_OutputUtilizationUtil_ClassRiskClean_Date
1North120802500.02509010002025-01-012000.80000LowNormal01JAN2025
2South14090260-0.03608520002025-02-012300.88462MediumNormal01FEB2025
3East1501002700.0570881700wrong2500.92593HighNormal01MAR2025
4West160100280.809215002025-04-012600.92857HighNormal01APR2025
5Central1801202900.01909518002025-05-013001.03448HighNormal01MAY2025
6North2001303000.041009722002025-06-013301.10000HighNormal01JUN2025
7South2101403100.061109925002025-07-013501.12903HighHigh01JUL2025
8East2201503200.021209127002025-08-013701.15625HighHigh01AUG2025
9West150160330.1308917002025-09-013100.93939HighHigh01SEP2025
10Central2401003400.03140933000wrong3401.00000HighHigh01MAR2025
11North2501803500.011509632002025-11-014301.22857HighHigh01NOV2025
12South2601903600.051609835002025-12-014501.25000HighHigh01DEC2025

Variable

Type

Example

Clean_Date

Numeric

23375

Display

Format

01MAR2025

 SAS Date Storage

·  Dates are numeric values

·  Formats only change display

Method

Example

Date Literal

'01MAR2025'd

MDY

mdy(3,1,2025)




Key Highlights

·  Data cleaning using COALESCENCE

·  Date derivation using INTNX, INTCK

·  Fraud detection using macros

·  Data reshaping with PROC TRANSPOSE

·  Dataset lifecycle using PROC DATASETS

·  Handling missing values and logic errors


20 Key Points About The Project

1.     Designed a grid stability dataset capturing renewable energy output, demand, and system deviations.
2.     Integrated both R and SAS environments to ensure cross-platform data consistency and validation.
3.     Introduced intentional data quality issues such as missing values, invalid dates, and inconsistent formats to simulate real-world scenarios.
4.     Applied data cleaning techniques using COALESCE to handle missing numeric values efficiently.
5.     Standardized categorical variables using character functions like PROPCASE, STRIP, UPCASE, and LOWCASE.
6.     Implemented date conversion logic using INPUT, MDY, INTNX, and INTCK for accurate temporal analysis.
7.     Developed robust error-handling logic to safely process invalid entries like "wrong" dates without breaking execution.
8.     Derived Total_Output and Utilization metrics to measure grid efficiency and performance.
9.     Created utilization classification (High/Medium/Low) using conditional logic for operational insights.
10.  Performed data transformation using SET, MERGE, and APPEND to combine and extend datasets.
11.  Reshaped data using PROC TRANSPOSE for reporting and analytical flexibility.
12.  Leveraged numeric functions like SUM and MEAN for aggregated calculations.
13.  Built character-based composite variables using CAT and CATX for better reporting readability.
14.  Designed a fraud detection framework using SAS MACROS to dynamically flag suspicious patterns.
15.  Identified fraud scenarios such as abnormally high backup generation and inflated operational fees.
16.  Ensured complete classification logic by covering all edge cases including missing and extreme values.
17.  Used PROC DATASETS DELETE for efficient dataset lifecycle management and workspace cleanup.
18.  Debugged common SAS errors including type mismatches, invalid INPUT conversions, and logical condition flaws.
19.  Applied defensive programming techniques to make the code robust, scalable, and production-ready.
20.  Delivered an end-to-end data pipeline demonstrating SAS capabilities in data engineering, validation, and fraud analytics within energy systems.

Summary

This project demonstrates how SAS can efficiently handle real-world grid stability data. By integrating multiple techniques such as data cleaning, transformation, merging, and fraud detection, we created a robust pipeline. Intentional errors were introduced to simulate real scenarios, and each error was carefully corrected with explanations.

The use of macros improves scalability, while functions like INTNX, COALESCE, and CATX enhance data quality. This project highlights how SAS is not just a programming language but a complete data engineering tool capable of ensuring accuracy and reliability in critical systems like energy grids.

Conclusion

In this project, we worked on grid stability data and learned how to handle real-life data problems step by step. First, we created raw datasets with errors to understand how messy real data can be. Then we cleaned the data using SAS functions and fixed missing values, incorrect formats, and logical mistakes. We also learned how to work with dates, calculate utilization, and classify data into meaningful categories.

One of the most important parts was detecting fraud using macros, which is very useful in real-world scenarios. We also practiced using different SAS techniques like merge, append, transpose, and dataset deletion. Overall, this project gives a complete understanding of how SAS can be used in real-time data processing and analysis. It is very helpful for interviews and practical work.

SAS INTERVIEW QUESTIONS

1. What is the difference between LENGTH and FORMAT in SAS?

Answer:

  • LENGTH defines how much storage space a variable occupies.
  • FORMAT controls how the value is displayed, not stored.
    LENGTH affects data structure, FORMAT affects only appearance.

2. What is the difference between PROC SORT with NODUPKEY and NODUP?

Answer:

  • NODUPKEY removes duplicates based on BY variables only.
  • NODUP removes completely identical observations.
    Use NODUPKEY for key-based deduplication.

3. What is the role of RETAIN statement in SAS?

Answer:

  • By default, SAS resets variables each iteration.
  • RETAIN keeps the previous value across rows.
    Commonly used for cumulative sums and carrying forward values.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent GRID STABILITY data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

·  Clinical SAS Programmer

·  Research Data Analyst

·  Regulatory Data Validator

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:



--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy



Comments

Popular posts from this blog

412.Can We Build And Clean A University Course Analytics & Fraud Detection System In Sas Using Only Macros While Intentionally Creating And Fixing Errors?

420.Can We Detect Errors, Prevent Fraud, And Optimize Biometric Access System Security Using Advanced SAS Programming?

418.Can We Design, Debug, Detect Fraud, And Optimize A Smart Parking System Using Advanced SAS Programming Techniques?