324.RENEWABLE POWER PLANT ANALYSIS USING DATA STEP | PROC FORMAT | PROC SQL | PROC MEANS | PROC TABULATE | PROC CORR | PROC UNIVARIATE | PROC SGPLOT | MACROS

RENEWABLE POWER PLANT ANALYSIS USING DATA STEP | PROC FORMAT | PROC SQL | PROC MEANS | PROC TABULATE | PROC CORR | PROC UNIVARIATE | PROC SGPLOT | MACROS

 options nocenter;

1) CREATE RAW DATASET: renewable_plants_raw

data work.renewable_plants_raw;

    informat Installed_Date date9.;

    format   Installed_Date date9.;

    input Plant_Name :& $40. Country    :& $20. Technology :$8. Capacity_MW

        Installed_Date :date9. Cost_USD_Million Carbon_Reduction_TonsPerYear 

        Efficiency_PCT;

    Year_Installed = year(Installed_Date);

datalines;

Sunrise Solar        India        Solar     150 01JAN2016 120.5 180000 20.5

Azure Ridge         USA          Wind      200 15JUN2014 250.0 250000 33.8

RiverFlow           Brazil       Hydro     300 10MAR2010 480.0 400000 45.2

DesertSun           UAE          Solar     120 20SEP2019 95.0 150000 22.7

CoastalBreeze       UK           Wind       90 05NOV2018 110.0 90000  29.1

HighlandHydro       Canada       Hydro      75 12MAY2015 140.0 90000  42.0

ValleySun           Spain        Solar      60 01APR2020 45.0 70000   19.4

GaleFarm            Germany      Wind      130 30AUG2012 180.0 160000 31.6

MountainFlow        Norway       Hydro     110 22FEB2011 220.0 160000 46.5

SunPeak             Australia    Solar      45 17JUL2017 36.0 50000   21.0

WindEdge            India        Wind      160 03OCT2013 190.0 210000 34.2

BlueDam             China        Hydro     210 29DEC2008 400.0 320000 44.1

;

run;

proc print data=work.renewable_plants_raw(obs=12) noobs label;

    title "RENEWABLE PLANTS - RAW DATA (12 OBS)";

    var Plant_Name Country Technology Capacity_MW Installed_Date

        Year_Installed Cost_USD_Million Carbon_Reduction_TonsPerYear Efficiency_PCT;

run;

OUTPUT:

RENEWABLE PLANTS - RAW DATA (12 OBS)

Plant_NameCountryTechnologyCapacity_MWInstalled_DateYear_InstalledCost_USD_MillionCarbon_Reduction_TonsPerYearEfficiency_PCT
Sunrise SolarIndiaSolar15001JAN20162016120.518000020.5
Azure RidgeUSAWind20015JUN20142014250.025000033.8
RiverFlowBrazilHydro30010MAR20102010480.040000045.2
DesertSunUAESolar12020SEP2019201995.015000022.7
CoastalBreezeUKWind9005NOV20182018110.09000029.1
HighlandHydroCanadaHydro7512MAY20152015140.09000042.0
ValleySunSpainSolar6001APR2020202045.07000019.4
GaleFarmGermanyWind13030AUG20122012180.016000031.6
MountainFlowNorwayHydro11022FEB20112011220.016000046.5
SunPeakAustraliaSolar4517JUL2017201736.05000021.0
WindEdgeIndiaWind16003OCT20132013190.021000034.2
BlueDamChinaHydro21029DEC20082008400.032000044.1

2) DERIVATIONS: Age, Next_Maintenance_Date using INTCK / INTNX

data work.renewable_plants;

  set work.renewable_plants_raw;

  /* Age in completed years */

  Age_Years = intck('year', Installed_Date, today(), 'c'); /* 'c' count completed boundaries */

  /* Days since installed (useful for more precise calculations) */

  Days_Since_Installed = intck('day', Installed_Date, today());

  /* Next scheduled maintenance (example every 5 years from installation) */

  Next_Maintenance_Date = intnx('year', Installed_Date, ceil(Age_Years/5)*5, 'same');

  format Next_Maintenance_Date date9.;

  /* Round numeric variables to sensible precision */

  Capacity_MW = round(Capacity_MW, 0.1);

  Efficiency_PCT = round(Efficiency_PCT, 0.1);

run;

/* Validate derived dates and ages */

proc print data=work.renewable_plants(obs=12) label noobs;

  title "RENEWABLE PLANTS - DERIVED FIELDS";

  var Plant_Name Installed_Date Age_Years Days_Since_Installed Next_Maintenance_Date;

run;

OUTPUT:

RENEWABLE PLANTS - DERIVED FIELDS

Plant_NameInstalled_DateAge_YearsDays_Since_InstalledNext_Maintenance_Date
Sunrise Solar01JAN20169362101JAN2026
Azure Ridge15JUN201411418615JUN2029
RiverFlow10MAR201015574410MAR2025
DesertSun20SEP20196226320SEP2029
CoastalBreeze05NOV20187258205NOV2028
HighlandHydro12MAY201510385512MAY2025
ValleySun01APR20205206901APR2025
GaleFarm30AUG201213484030AUG2027
MountainFlow22FEB201114539522FEB2026
SunPeak17JUL20178305817JUL2027
WindEdge03OCT201312444103OCT2028
BlueDam29DEC200816618029DEC2028

3) MACRO: Efficiency classification

      - Efficiency_Class: 'Low', 'Medium', 'High'

%macro classify_efficiency(in_ds=work.renewable_plants, out_ds=work.renewable_plants_class,

                           low_thr=20, high_thr=40);

  proc format;

    value efffmt

      low - < &low_thr = 'Low'

      &low_thr - < &high_thr = 'Medium'

      &high_thr - high = 'High'

    ;

  run;


  data &out_ds;

    set &in_ds;

    /* apply thresholds: careful about boundary values */

    if Efficiency_PCT < &low_thr then Efficiency_Class = 'Low';

    else if Efficiency_PCT < &high_thr then Efficiency_Class = 'Medium';

    else Efficiency_Class = 'High';


    /* also stash a formatted value */

    Efficiency_Class_F = put(Efficiency_PCT, efffmt.);

  run;

  proc print data=&out_ds;

  run;

%mend classify_efficiency;


%classify_efficiency();

OUTPUT:

ObsInstalled_DatePlant_NameCountryTechnologyCapacity_MWCost_USD_MillionCarbon_Reduction_TonsPerYearEfficiency_PCTYear_InstalledAge_YearsDays_Since_InstalledNext_Maintenance_DateEfficiency_ClassEfficiency_Class_F
101JAN2016Sunrise SolarIndiaSolar150120.518000020.520169362101JAN2026MedMedium
215JUN2014Azure RidgeUSAWind200250.025000033.8201411418615JUN2029MedMedium
310MAR2010RiverFlowBrazilHydro300480.040000045.2201015574410MAR2025HigHigh
420SEP2019DesertSunUAESolar12095.015000022.720196226320SEP2029MedMedium
505NOV2018CoastalBreezeUKWind90110.09000029.120187258205NOV2028MedMedium
612MAY2015HighlandHydroCanadaHydro75140.09000042.0201510385512MAY2025HigHigh
701APR2020ValleySunSpainSolar6045.07000019.420205206901APR2025LowLow
830AUG2012GaleFarmGermanyWind130180.016000031.6201213484030AUG2027MedMedium
922FEB2011MountainFlowNorwayHydro110220.016000046.5201114539522FEB2026HigHigh
1017JUL2017SunPeakAustraliaSolar4536.05000021.020178305817JUL2027MedMedium
1103OCT2013WindEdgeIndiaWind160190.021000034.2201312444103OCT2028MedMedium
1229DEC2008BlueDamChinaHydro210400.032000044.1200816618029DEC2028HigHigh


/* Check classification */

proc freq data=work.renewable_plants_class;

  tables Efficiency_Class Efficiency_Class_F / nocum nopercent;

  title "EFFICIENCY CLASSIFICATION - FREQUENCY";

run;

OUTPUT:

EFFICIENCY CLASSIFICATION - FREQUENCY

The FREQ Procedure

Efficiency_ClassFrequency
Hig4
Low1
Med7
Efficiency_Class_FFrequency
High4
Low1
Medium7

4) PROC SQL: summary tables by Technology and by Country

proc sql;

  create table work.summary_by_tech as

  select Technology,

         count(*) as N_PowerPlants,

         sum(Capacity_MW) as Total_Capacity_MW format=comma12.1,

         mean(Efficiency_PCT) as Mean_Efficiency_PCT format=8.2,

         sum(Carbon_Reduction_TonsPerYear) as Total_Carbon_Reduction format=comma12.,

         mean(Cost_USD_Million) as Mean_Cost_Million format=comma12.2

  from work.renewable_plants_class

  group by Technology

  order by Total_Capacity_MW desc;

quit;

proc print data=work.summary_by_tech noobs label;

  title "SUMMARY BY TECHNOLOGY (PROC SQL)";

run;

OUTPUT:

SUMMARY BY TECHNOLOGY (PROC SQL)

TechnologyN_PowerPlantsTotal_Capacity_MWMean_Efficiency_PCTTotal_Carbon_ReductionMean_Cost_Million
Hydro4695.044.45970,000310.00
Wind4580.032.18710,000182.50
Solar4375.020.90450,00074.13

/* Summary by country */

proc sql;

  create table work.summary_by_country as

  select Country,

         count(*) as N_PowerPlants,

         sum(Capacity_MW) as Total_Capacity_MW format=comma12.1,

         mean(Efficiency_PCT) as Mean_Efficiency_PCT format=8.2

  from work.renewable_plants_class

  group by Country

  order by Total_Capacity_MW desc;

quit;

proc print data=work.summary_by_country noobs label;

  title "SUMMARY BY COUNTRY (PROC SQL)";

run;

OUTPUT:

SUMMARY BY COUNTRY (PROC SQL)

CountryN_PowerPlantsTotal_Capacity_MWMean_Efficiency_PCT
India2310.027.35
Brazil1300.045.20
China1210.044.10
USA1200.033.80
Germany1130.031.60
UAE1120.022.70
Norway1110.046.50
UK190.029.10
Canada175.042.00
Spain160.019.40
Australia145.021.00

5) PROC MEANS: Descriptive statistics by Technology

proc means data=work.renewable_plants_class n mean std min max median maxdec=2;

  class Technology;

  var Capacity_MW Efficiency_PCT Cost_USD_Million Carbon_Reduction_TonsPerYear Age_Years;

  title "DESCRIPTIVE STATISTICS BY TECHNOLOGY (PROC MEANS)";

run;

OUTPUT:

DESCRIPTIVE STATISTICS BY TECHNOLOGY (PROC MEANS)

The MEANS Procedure

TechnologyN ObsVariableNMeanStd DevMinimumMaximumMedian
Hydro4
Capacity_MW
Efficiency_PCT
Cost_USD_Million
Carbon_Reduction_TonsPerYear
Age_Years
4
4
4
4
4
173.75
44.45
310.00
242500.00
13.75
101.77
1.91
157.06
142448.82
2.63
75.00
42.00
140.00
90000.00
10.00
300.00
46.50
480.00
400000.00
16.00
160.00
44.65
310.00
240000.00
14.50
Solar4
Capacity_MW
Efficiency_PCT
Cost_USD_Million
Carbon_Reduction_TonsPerYear
Age_Years
4
4
4
4
4
93.75
20.90
74.13
112500.00
7.00
49.56
1.37
40.37
62383.22
1.83
45.00
19.40
36.00
50000.00
5.00
150.00
22.70
120.50
180000.00
9.00
90.00
20.75
70.00
110000.00
7.00
Wind4
Capacity_MW
Efficiency_PCT
Cost_USD_Million
Carbon_Reduction_TonsPerYear
Age_Years
4
4
4
4
4
145.00
32.18
182.50
177500.00
10.75
46.55
2.35
57.37
68980.67
2.63
90.00
29.10
110.00
90000.00
7.00
200.00
34.20
250.00
250000.00
13.00
145.00
32.70
185.00
185000.00
11.50

6) PROC TABULATE: Nicely formatted report

     - Cross-tab: Technology x Efficiency_Class with sums and means

proc tabulate data=work.renewable_plants_class format=8.2 noseps missing;

  class Technology Efficiency_Class;

  var Capacity_MW Efficiency_PCT Cost_USD_Million Carbon_Reduction_TonsPerYear;

  table Technology,

        (Capacity_MW*sum='Total Capacity (MW)'

         Efficiency_PCT*mean='Avg Efficiency (%)'

         Cost_USD_Million*mean='Avg Cost (M$)')*f=comma12.2

        / rts=20;

  keylabel sum='Sum' mean='Mean';

  title "TABULATED SUMMARY: TECHNOLOGY x METRICS (PROC TABULATE)";

run;

OUTPUT:

TABULATED SUMMARY: TECHNOLOGY x METRICS (PROC TABULATE)

 Capacity_MWEfficiency_PCTCost_USD_Million
Total Capacity (MW)Avg Efficiency (%)Avg Cost (M$)
Technology695.0044.45310.00
Hydro
Solar375.0020.9074.13
Wind580.0032.18182.50

7) PROC SGPLOT: Visualizations

     - Bar chart: Total Capacity by Country

     - Scatter: Efficiency vs Capacity, sized by Carbon_Reduction

     - Series or vbar for count by Efficiency_Class

/* Prepare aggregated dataset for plotting capacity by country */

proc sql;

  create table work.plot_capacity_by_country as

  select Country, sum(Capacity_MW) as Total_Capacity_MW

  from work.renewable_plants_class

  group by Country;

quit;

proc print data=work.plot_capacity_by_country;

run;

OUTPUT:

ObsCountryTotal_Capacity_MW
1Australia45
2Brazil300
3Canada75
4China210
5Germany130
6India310
7Norway110
8Spain60
9UAE120
10UK90
11USA200


proc sgplot data=work.plot_capacity_by_country;

  vbar Country / response=Total_Capacity_MW datalabel;

  yaxis label="Total Capacity (MW)";

  title "TOTAL CAPACITY BY COUNTRY (MW)";

run;

OUTPUT:

The SGPlot Procedure


proc sgplot data=work.renewable_plants_class;

  bubble x=Capacity_MW y=Efficiency_PCT size=Carbon_Reduction_TonsPerYear / datalabel=Plant_Name;

  xaxis label="Capacity (MW)";

  yaxis label="Efficiency (%)";

  title "EFFICIENCY vs CAPACITY (BUBBLE = CARBON REDUCTION)";

run;

OUTPUT:

The SGPlot Procedure


proc sgplot data=work.renewable_plants_class;

  vbar Efficiency_Class / stat=freq datalabel;

  xaxis label="Efficiency Class";

  yaxis label="Number of Plants";

  title "NUMBER OF PLANTS BY EFFICIENCY CLASS";

run;

OUTPUT:

The SGPlot Procedure


8) PROC CORR: Check correlation between key performance metrics

proc corr data=work.renewable_plants_class plots=matrix(histogram);

   var Capacity_MW Cost_USD_Million Efficiency_PCT Carbon_Reduction_TonsPerYear Age_Years;

   title "CORRELATION ANALYSIS OF RENEWABLE PLANT METRICS (PROC CORR)";

run;

OUTPUT:

CORRELATION ANALYSIS OF RENEWABLE PLANT METRICS (PROC CORR)

The CORR Procedure

5 Variables:Capacity_MW Cost_USD_Million Efficiency_PCT Carbon_Reduction_TonsPerYear Age_Years
Simple Statistics
VariableNMeanStd DevSumMinimumMaximum
Capacity_MW12137.5000072.66048165045.00000300.00000
Cost_USD_Million12188.87500134.93469226736.00000480.00000
Efficiency_PCT1232.5083310.19335390.1000019.4000046.50000
Carbon_Reduction_TonsPerYear12177500104718213000050000400000
Age_Years1210.500003.60555126.000005.0000016.00000
Pearson Correlation Coefficients, N = 12
Prob > |r| under H0: Rho=0
 Capacity_MWCost_USD_MillionEfficiency_PCTCarbon_Reduction_TonsPerYearAge_Years
Capacity_MW
1.00000
 
0.91379
<.0001
0.52954
0.0766
0.98718
<.0001
0.69228
0.0126
Cost_USD_Million
0.91379
<.0001
1.00000
 
0.78379
0.0026
0.94888
<.0001
0.85754
0.0004
Efficiency_PCT
0.52954
0.0766
0.78379
0.0026
1.00000
 
0.59440
0.0415
0.84335
0.0006
Carbon_Reduction_TonsPerYear
0.98718
<.0001
0.94888
<.0001
0.59440
0.0415
1.00000
 
0.75724
0.0043
Age_Years
0.69228
0.0126
0.85754
0.0004
0.84335
0.0006
0.75724
0.0043
1.00000
 
Scatter Plot Matrix


9) PROC UNIVARIATE: Distribution, normality, outliers Using OUTPUT statement for percentiles

proc univariate data=work.renewable_plants_class noprint;

   var Capacity_MW Efficiency_PCT Carbon_Reduction_TonsPerYear;

   histogram Capacity_MW / normal;

   inset n mean std min max / position=ne;

   title "DISTRIBUTION AND OUTLIER ANALYSIS (PROC UNIVARIATE)";

run;

OUTPUT:

DISTRIBUTION AND OUTLIER ANALYSIS (PROC UNIVARIATE)

The UNIVARIATE Procedure

Histogram for Capacity_MW

DISTRIBUTION AND OUTLIER ANALYSIS (PROC UNIVARIATE)

The UNIVARIATE Procedure

Fitted Normal Distribution for Capacity_MW

Parameters for Normal Distribution
ParameterSymbolEstimate
MeanMu137.5
Std DevSigma72.66048
Goodness-of-Fit Tests for Normal Distribution
TestStatisticp Value
Kolmogorov-SmirnovD0.12840999Pr > D>0.150
Cramer-von MisesW-Sq0.03540932Pr > W-Sq>0.250
Anderson-DarlingA-Sq0.26628388Pr > A-Sq>0.250
Quantiles for Normal Distribution
PercentQuantile
ObservedEstimated
1.045.0000-31.5336
5.045.000017.9841
10.060.000044.3818
25.082.500088.4913
50.0125.0000137.5000
75.0180.0000186.5087
90.0210.0000230.6182
95.0300.0000257.0159
99.0300.0000306.5336


To Visit My Previous Original Data Management Dataset:Click Here
To Visit My Previous Cricket World Cup Dataset:Click Here
To Visit My Previous Home Tour Dataset:Click Here
To Visit My Previous Real-World Dal Price Dataset:Click Here  



Follow Us On : 


 


--- FOLLOW OUR BLOG FOR MORE INFORMATION.

--->PLEASE DO COMMENTS AND SHARE OUR BLOG.







Comments

Popular posts from this blog

409.Can We Build a Reliable Emergency Services Analytics & Fraud Detection System in SAS While Identifying and Fixing Intentional Errors?

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study