324.RENEWABLE POWER PLANT ANALYSIS USING DATA STEP | PROC FORMAT | PROC SQL | PROC MEANS | PROC TABULATE | PROC CORR | PROC UNIVARIATE | PROC SGPLOT | MACROS
RENEWABLE POWER PLANT ANALYSIS USING DATA STEP | PROC FORMAT | PROC SQL | PROC MEANS | PROC TABULATE | PROC CORR | PROC UNIVARIATE | PROC SGPLOT | MACROS
options nocenter;
1) CREATE RAW DATASET: renewable_plants_raw
data work.renewable_plants_raw;
informat Installed_Date date9.;
format Installed_Date date9.;
input Plant_Name :& $40. Country :& $20. Technology :$8. Capacity_MW
Installed_Date :date9. Cost_USD_Million Carbon_Reduction_TonsPerYear
Efficiency_PCT;
Year_Installed = year(Installed_Date);
datalines;
Sunrise Solar India Solar 150 01JAN2016 120.5 180000 20.5
Azure Ridge USA Wind 200 15JUN2014 250.0 250000 33.8
RiverFlow Brazil Hydro 300 10MAR2010 480.0 400000 45.2
DesertSun UAE Solar 120 20SEP2019 95.0 150000 22.7
CoastalBreeze UK Wind 90 05NOV2018 110.0 90000 29.1
HighlandHydro Canada Hydro 75 12MAY2015 140.0 90000 42.0
ValleySun Spain Solar 60 01APR2020 45.0 70000 19.4
GaleFarm Germany Wind 130 30AUG2012 180.0 160000 31.6
MountainFlow Norway Hydro 110 22FEB2011 220.0 160000 46.5
SunPeak Australia Solar 45 17JUL2017 36.0 50000 21.0
WindEdge India Wind 160 03OCT2013 190.0 210000 34.2
BlueDam China Hydro 210 29DEC2008 400.0 320000 44.1
;
run;
proc print data=work.renewable_plants_raw(obs=12) noobs label;
title "RENEWABLE PLANTS - RAW DATA (12 OBS)";
var Plant_Name Country Technology Capacity_MW Installed_Date
Year_Installed Cost_USD_Million Carbon_Reduction_TonsPerYear Efficiency_PCT;
run;
OUTPUT:
| Plant_Name | Country | Technology | Capacity_MW | Installed_Date | Year_Installed | Cost_USD_Million | Carbon_Reduction_TonsPerYear | Efficiency_PCT |
|---|---|---|---|---|---|---|---|---|
| Sunrise Solar | India | Solar | 150 | 01JAN2016 | 2016 | 120.5 | 180000 | 20.5 |
| Azure Ridge | USA | Wind | 200 | 15JUN2014 | 2014 | 250.0 | 250000 | 33.8 |
| RiverFlow | Brazil | Hydro | 300 | 10MAR2010 | 2010 | 480.0 | 400000 | 45.2 |
| DesertSun | UAE | Solar | 120 | 20SEP2019 | 2019 | 95.0 | 150000 | 22.7 |
| CoastalBreeze | UK | Wind | 90 | 05NOV2018 | 2018 | 110.0 | 90000 | 29.1 |
| HighlandHydro | Canada | Hydro | 75 | 12MAY2015 | 2015 | 140.0 | 90000 | 42.0 |
| ValleySun | Spain | Solar | 60 | 01APR2020 | 2020 | 45.0 | 70000 | 19.4 |
| GaleFarm | Germany | Wind | 130 | 30AUG2012 | 2012 | 180.0 | 160000 | 31.6 |
| MountainFlow | Norway | Hydro | 110 | 22FEB2011 | 2011 | 220.0 | 160000 | 46.5 |
| SunPeak | Australia | Solar | 45 | 17JUL2017 | 2017 | 36.0 | 50000 | 21.0 |
| WindEdge | India | Wind | 160 | 03OCT2013 | 2013 | 190.0 | 210000 | 34.2 |
| BlueDam | China | Hydro | 210 | 29DEC2008 | 2008 | 400.0 | 320000 | 44.1 |
2) DERIVATIONS: Age, Next_Maintenance_Date using INTCK / INTNX
data work.renewable_plants;
set work.renewable_plants_raw;
/* Age in completed years */
Age_Years = intck('year', Installed_Date, today(), 'c'); /* 'c' count completed boundaries */
/* Days since installed (useful for more precise calculations) */
Days_Since_Installed = intck('day', Installed_Date, today());
/* Next scheduled maintenance (example every 5 years from installation) */
Next_Maintenance_Date = intnx('year', Installed_Date, ceil(Age_Years/5)*5, 'same');
format Next_Maintenance_Date date9.;
/* Round numeric variables to sensible precision */
Capacity_MW = round(Capacity_MW, 0.1);
Efficiency_PCT = round(Efficiency_PCT, 0.1);
run;
/* Validate derived dates and ages */
proc print data=work.renewable_plants(obs=12) label noobs;
title "RENEWABLE PLANTS - DERIVED FIELDS";
var Plant_Name Installed_Date Age_Years Days_Since_Installed Next_Maintenance_Date;
run;
OUTPUT:
| Plant_Name | Installed_Date | Age_Years | Days_Since_Installed | Next_Maintenance_Date |
|---|---|---|---|---|
| Sunrise Solar | 01JAN2016 | 9 | 3621 | 01JAN2026 |
| Azure Ridge | 15JUN2014 | 11 | 4186 | 15JUN2029 |
| RiverFlow | 10MAR2010 | 15 | 5744 | 10MAR2025 |
| DesertSun | 20SEP2019 | 6 | 2263 | 20SEP2029 |
| CoastalBreeze | 05NOV2018 | 7 | 2582 | 05NOV2028 |
| HighlandHydro | 12MAY2015 | 10 | 3855 | 12MAY2025 |
| ValleySun | 01APR2020 | 5 | 2069 | 01APR2025 |
| GaleFarm | 30AUG2012 | 13 | 4840 | 30AUG2027 |
| MountainFlow | 22FEB2011 | 14 | 5395 | 22FEB2026 |
| SunPeak | 17JUL2017 | 8 | 3058 | 17JUL2027 |
| WindEdge | 03OCT2013 | 12 | 4441 | 03OCT2028 |
| BlueDam | 29DEC2008 | 16 | 6180 | 29DEC2028 |
3) MACRO: Efficiency classification
- Efficiency_Class: 'Low', 'Medium', 'High'
%macro classify_efficiency(in_ds=work.renewable_plants, out_ds=work.renewable_plants_class,
low_thr=20, high_thr=40);
proc format;
value efffmt
low - < &low_thr = 'Low'
&low_thr - < &high_thr = 'Medium'
&high_thr - high = 'High'
;
run;
data &out_ds;
set &in_ds;
/* apply thresholds: careful about boundary values */
if Efficiency_PCT < &low_thr then Efficiency_Class = 'Low';
else if Efficiency_PCT < &high_thr then Efficiency_Class = 'Medium';
else Efficiency_Class = 'High';
/* also stash a formatted value */
Efficiency_Class_F = put(Efficiency_PCT, efffmt.);
run;
proc print data=&out_ds;
run;
%mend classify_efficiency;
%classify_efficiency();
OUTPUT:
| Obs | Installed_Date | Plant_Name | Country | Technology | Capacity_MW | Cost_USD_Million | Carbon_Reduction_TonsPerYear | Efficiency_PCT | Year_Installed | Age_Years | Days_Since_Installed | Next_Maintenance_Date | Efficiency_Class | Efficiency_Class_F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 01JAN2016 | Sunrise Solar | India | Solar | 150 | 120.5 | 180000 | 20.5 | 2016 | 9 | 3621 | 01JAN2026 | Med | Medium |
| 2 | 15JUN2014 | Azure Ridge | USA | Wind | 200 | 250.0 | 250000 | 33.8 | 2014 | 11 | 4186 | 15JUN2029 | Med | Medium |
| 3 | 10MAR2010 | RiverFlow | Brazil | Hydro | 300 | 480.0 | 400000 | 45.2 | 2010 | 15 | 5744 | 10MAR2025 | Hig | High |
| 4 | 20SEP2019 | DesertSun | UAE | Solar | 120 | 95.0 | 150000 | 22.7 | 2019 | 6 | 2263 | 20SEP2029 | Med | Medium |
| 5 | 05NOV2018 | CoastalBreeze | UK | Wind | 90 | 110.0 | 90000 | 29.1 | 2018 | 7 | 2582 | 05NOV2028 | Med | Medium |
| 6 | 12MAY2015 | HighlandHydro | Canada | Hydro | 75 | 140.0 | 90000 | 42.0 | 2015 | 10 | 3855 | 12MAY2025 | Hig | High |
| 7 | 01APR2020 | ValleySun | Spain | Solar | 60 | 45.0 | 70000 | 19.4 | 2020 | 5 | 2069 | 01APR2025 | Low | Low |
| 8 | 30AUG2012 | GaleFarm | Germany | Wind | 130 | 180.0 | 160000 | 31.6 | 2012 | 13 | 4840 | 30AUG2027 | Med | Medium |
| 9 | 22FEB2011 | MountainFlow | Norway | Hydro | 110 | 220.0 | 160000 | 46.5 | 2011 | 14 | 5395 | 22FEB2026 | Hig | High |
| 10 | 17JUL2017 | SunPeak | Australia | Solar | 45 | 36.0 | 50000 | 21.0 | 2017 | 8 | 3058 | 17JUL2027 | Med | Medium |
| 11 | 03OCT2013 | WindEdge | India | Wind | 160 | 190.0 | 210000 | 34.2 | 2013 | 12 | 4441 | 03OCT2028 | Med | Medium |
| 12 | 29DEC2008 | BlueDam | China | Hydro | 210 | 400.0 | 320000 | 44.1 | 2008 | 16 | 6180 | 29DEC2028 | Hig | High |
/* Check classification */
proc freq data=work.renewable_plants_class;
tables Efficiency_Class Efficiency_Class_F / nocum nopercent;
title "EFFICIENCY CLASSIFICATION - FREQUENCY";
run;
OUTPUT:
The FREQ Procedure
| Efficiency_Class | Frequency |
|---|---|
| Hig | 4 |
| Low | 1 |
| Med | 7 |
| Efficiency_Class_F | Frequency |
|---|---|
| High | 4 |
| Low | 1 |
| Medium | 7 |
4) PROC SQL: summary tables by Technology and by Country
proc sql;
create table work.summary_by_tech as
select Technology,
count(*) as N_PowerPlants,
sum(Capacity_MW) as Total_Capacity_MW format=comma12.1,
mean(Efficiency_PCT) as Mean_Efficiency_PCT format=8.2,
sum(Carbon_Reduction_TonsPerYear) as Total_Carbon_Reduction format=comma12.,
mean(Cost_USD_Million) as Mean_Cost_Million format=comma12.2
from work.renewable_plants_class
group by Technology
order by Total_Capacity_MW desc;
quit;
proc print data=work.summary_by_tech noobs label;
title "SUMMARY BY TECHNOLOGY (PROC SQL)";
run;
OUTPUT:
| Technology | N_PowerPlants | Total_Capacity_MW | Mean_Efficiency_PCT | Total_Carbon_Reduction | Mean_Cost_Million |
|---|---|---|---|---|---|
| Hydro | 4 | 695.0 | 44.45 | 970,000 | 310.00 |
| Wind | 4 | 580.0 | 32.18 | 710,000 | 182.50 |
| Solar | 4 | 375.0 | 20.90 | 450,000 | 74.13 |
/* Summary by country */
proc sql;
create table work.summary_by_country as
select Country,
count(*) as N_PowerPlants,
sum(Capacity_MW) as Total_Capacity_MW format=comma12.1,
mean(Efficiency_PCT) as Mean_Efficiency_PCT format=8.2
from work.renewable_plants_class
group by Country
order by Total_Capacity_MW desc;
quit;
proc print data=work.summary_by_country noobs label;
title "SUMMARY BY COUNTRY (PROC SQL)";
run;
OUTPUT:
| Country | N_PowerPlants | Total_Capacity_MW | Mean_Efficiency_PCT |
|---|---|---|---|
| India | 2 | 310.0 | 27.35 |
| Brazil | 1 | 300.0 | 45.20 |
| China | 1 | 210.0 | 44.10 |
| USA | 1 | 200.0 | 33.80 |
| Germany | 1 | 130.0 | 31.60 |
| UAE | 1 | 120.0 | 22.70 |
| Norway | 1 | 110.0 | 46.50 |
| UK | 1 | 90.0 | 29.10 |
| Canada | 1 | 75.0 | 42.00 |
| Spain | 1 | 60.0 | 19.40 |
| Australia | 1 | 45.0 | 21.00 |
5) PROC MEANS: Descriptive statistics by Technology
proc means data=work.renewable_plants_class n mean std min max median maxdec=2;
class Technology;
var Capacity_MW Efficiency_PCT Cost_USD_Million Carbon_Reduction_TonsPerYear Age_Years;
title "DESCRIPTIVE STATISTICS BY TECHNOLOGY (PROC MEANS)";
run;
OUTPUT:
The MEANS Procedure
| Technology | N Obs | Variable | N | Mean | Std Dev | Minimum | Maximum | Median |
|---|---|---|---|---|---|---|---|---|
| Hydro | 4 | Capacity_MW Efficiency_PCT Cost_USD_Million Carbon_Reduction_TonsPerYear Age_Years | 4 4 4 4 4 | 173.75 44.45 310.00 242500.00 13.75 | 101.77 1.91 157.06 142448.82 2.63 | 75.00 42.00 140.00 90000.00 10.00 | 300.00 46.50 480.00 400000.00 16.00 | 160.00 44.65 310.00 240000.00 14.50 |
| Solar | 4 | Capacity_MW Efficiency_PCT Cost_USD_Million Carbon_Reduction_TonsPerYear Age_Years | 4 4 4 4 4 | 93.75 20.90 74.13 112500.00 7.00 | 49.56 1.37 40.37 62383.22 1.83 | 45.00 19.40 36.00 50000.00 5.00 | 150.00 22.70 120.50 180000.00 9.00 | 90.00 20.75 70.00 110000.00 7.00 |
| Wind | 4 | Capacity_MW Efficiency_PCT Cost_USD_Million Carbon_Reduction_TonsPerYear Age_Years | 4 4 4 4 4 | 145.00 32.18 182.50 177500.00 10.75 | 46.55 2.35 57.37 68980.67 2.63 | 90.00 29.10 110.00 90000.00 7.00 | 200.00 34.20 250.00 250000.00 13.00 | 145.00 32.70 185.00 185000.00 11.50 |
6) PROC TABULATE: Nicely formatted report
- Cross-tab: Technology x Efficiency_Class with sums and means
proc tabulate data=work.renewable_plants_class format=8.2 noseps missing;
class Technology Efficiency_Class;
var Capacity_MW Efficiency_PCT Cost_USD_Million Carbon_Reduction_TonsPerYear;
table Technology,
(Capacity_MW*sum='Total Capacity (MW)'
Efficiency_PCT*mean='Avg Efficiency (%)'
Cost_USD_Million*mean='Avg Cost (M$)')*f=comma12.2
/ rts=20;
keylabel sum='Sum' mean='Mean';
title "TABULATED SUMMARY: TECHNOLOGY x METRICS (PROC TABULATE)";
run;
OUTPUT:
| Capacity_MW | Efficiency_PCT | Cost_USD_Million | |
|---|---|---|---|
| Total Capacity (MW) | Avg Efficiency (%) | Avg Cost (M$) | |
| Technology | 695.00 | 44.45 | 310.00 |
| Hydro | |||
| Solar | 375.00 | 20.90 | 74.13 |
| Wind | 580.00 | 32.18 | 182.50 |
7) PROC SGPLOT: Visualizations
- Bar chart: Total Capacity by Country
- Scatter: Efficiency vs Capacity, sized by Carbon_Reduction
- Series or vbar for count by Efficiency_Class
/* Prepare aggregated dataset for plotting capacity by country */
proc sql;
create table work.plot_capacity_by_country as
select Country, sum(Capacity_MW) as Total_Capacity_MW
from work.renewable_plants_class
group by Country;
quit;
proc print data=work.plot_capacity_by_country;
run;
OUTPUT:
| Obs | Country | Total_Capacity_MW |
|---|---|---|
| 1 | Australia | 45 |
| 2 | Brazil | 300 |
| 3 | Canada | 75 |
| 4 | China | 210 |
| 5 | Germany | 130 |
| 6 | India | 310 |
| 7 | Norway | 110 |
| 8 | Spain | 60 |
| 9 | UAE | 120 |
| 10 | UK | 90 |
| 11 | USA | 200 |
proc sgplot data=work.plot_capacity_by_country;
vbar Country / response=Total_Capacity_MW datalabel;
yaxis label="Total Capacity (MW)";
title "TOTAL CAPACITY BY COUNTRY (MW)";
run;
OUTPUT:
proc sgplot data=work.renewable_plants_class;
bubble x=Capacity_MW y=Efficiency_PCT size=Carbon_Reduction_TonsPerYear / datalabel=Plant_Name;
xaxis label="Capacity (MW)";
yaxis label="Efficiency (%)";
title "EFFICIENCY vs CAPACITY (BUBBLE = CARBON REDUCTION)";
run;
OUTPUT:
proc sgplot data=work.renewable_plants_class;
vbar Efficiency_Class / stat=freq datalabel;
xaxis label="Efficiency Class";
yaxis label="Number of Plants";
title "NUMBER OF PLANTS BY EFFICIENCY CLASS";
run;
OUTPUT:
8) PROC CORR: Check correlation between key performance metrics
proc corr data=work.renewable_plants_class plots=matrix(histogram);
var Capacity_MW Cost_USD_Million Efficiency_PCT Carbon_Reduction_TonsPerYear Age_Years;
title "CORRELATION ANALYSIS OF RENEWABLE PLANT METRICS (PROC CORR)";
run;
OUTPUT:
The CORR Procedure
| 5 Variables: | Capacity_MW Cost_USD_Million Efficiency_PCT Carbon_Reduction_TonsPerYear Age_Years |
|---|
| Simple Statistics | ||||||
|---|---|---|---|---|---|---|
| Variable | N | Mean | Std Dev | Sum | Minimum | Maximum |
| Capacity_MW | 12 | 137.50000 | 72.66048 | 1650 | 45.00000 | 300.00000 |
| Cost_USD_Million | 12 | 188.87500 | 134.93469 | 2267 | 36.00000 | 480.00000 |
| Efficiency_PCT | 12 | 32.50833 | 10.19335 | 390.10000 | 19.40000 | 46.50000 |
| Carbon_Reduction_TonsPerYear | 12 | 177500 | 104718 | 2130000 | 50000 | 400000 |
| Age_Years | 12 | 10.50000 | 3.60555 | 126.00000 | 5.00000 | 16.00000 |
| Pearson Correlation Coefficients, N = 12 Prob > |r| under H0: Rho=0 | |||||
|---|---|---|---|---|---|
| Capacity_MW | Cost_USD_Million | Efficiency_PCT | Carbon_Reduction_TonsPerYear | Age_Years | |
| Capacity_MW | 1.00000 | 0.91379 <.0001 | 0.52954 0.0766 | 0.98718 <.0001 | 0.69228 0.0126 |
| Cost_USD_Million | 0.91379 <.0001 | 1.00000 | 0.78379 0.0026 | 0.94888 <.0001 | 0.85754 0.0004 |
| Efficiency_PCT | 0.52954 0.0766 | 0.78379 0.0026 | 1.00000 | 0.59440 0.0415 | 0.84335 0.0006 |
| Carbon_Reduction_TonsPerYear | 0.98718 <.0001 | 0.94888 <.0001 | 0.59440 0.0415 | 1.00000 | 0.75724 0.0043 |
| Age_Years | 0.69228 0.0126 | 0.85754 0.0004 | 0.84335 0.0006 | 0.75724 0.0043 | 1.00000 |
9) PROC UNIVARIATE: Distribution, normality, outliers Using OUTPUT statement for percentiles
proc univariate data=work.renewable_plants_class noprint;
var Capacity_MW Efficiency_PCT Carbon_Reduction_TonsPerYear;
histogram Capacity_MW / normal;
inset n mean std min max / position=ne;
title "DISTRIBUTION AND OUTLIER ANALYSIS (PROC UNIVARIATE)";
run;
The UNIVARIATE Procedure
The UNIVARIATE Procedure
Fitted Normal Distribution for Capacity_MW
| Parameters for Normal Distribution | ||
|---|---|---|
| Parameter | Symbol | Estimate |
| Mean | Mu | 137.5 |
| Std Dev | Sigma | 72.66048 |
| Goodness-of-Fit Tests for Normal Distribution | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Kolmogorov-Smirnov | D | 0.12840999 | Pr > D | >0.150 |
| Cramer-von Mises | W-Sq | 0.03540932 | Pr > W-Sq | >0.250 |
| Anderson-Darling | A-Sq | 0.26628388 | Pr > A-Sq | >0.250 |
| Quantiles for Normal Distribution | ||
|---|---|---|
| Percent | Quantile | |
| Observed | Estimated | |
| 1.0 | 45.0000 | -31.5336 |
| 5.0 | 45.0000 | 17.9841 |
| 10.0 | 60.0000 | 44.3818 |
| 25.0 | 82.5000 | 88.4913 |
| 50.0 | 125.0000 | 137.5000 |
| 75.0 | 180.0000 | 186.5087 |
| 90.0 | 210.0000 | 230.6182 |
| 95.0 | 300.0000 | 257.0159 |
| 99.0 | 300.0000 | 306.5336 |
Comments
Post a Comment