SAS STADIUMS DATA CREATION AND ANALYSIS DATA STEP | PROC SQL | PROC MEANS | PROC REG | PROC SGPLOT | MACROS
options nocenter;
1.Creating Stadium Raw Master Dataset
data stadiums_raw;
length Stadium_Name $30 Country $20;
input Stadium_Name $ Country $ Capacity Year_Built Revenue Maintenance_Cost;
Built_Date = mdy(1,1,Year_Built);
format Built_Date date9.;
datalines;
Wembley England 90000 2007 120 45
Camp_Nou Spain 99354 1957 140 55
Melbourne_Cricket_Ground Australia 100024 1853 160 70
Maracana Brazil 78838 1950 95 40
MetLife USA 82500 2010 180 60
FNB_Stadium SouthAfrica 94736 1989 85 35
Old_Trafford England 74879 1910 130 50
Allianz_Arena Germany 75000 2005 155 58
Birds_Nest China 80000 2008 110 65
San_Siro Italy 80018 1926 125 52
Azteca Mexico 87000 1966 105 42
Lusail Qatar 80000 2022 200 75
;
run;
proc print data=stadiums_raw;
run;
OUTPUT:
| Obs | Stadium_Name | Country | Capacity | Year_Built | Revenue | Maintenance_Cost | Built_Date |
|---|---|---|---|---|---|---|---|
| 1 | Wembley | England | 90000 | 2007 | 120 | 45 | 01JAN2007 |
| 2 | Camp_Nou | Spain | 99354 | 1957 | 140 | 55 | 01JAN1957 |
| 3 | Melbourne_Cricket_Ground | Australia | 100024 | 1853 | 160 | 70 | 01JAN1853 |
| 4 | Maracana | Brazil | 78838 | 1950 | 95 | 40 | 01JAN1950 |
| 5 | MetLife | USA | 82500 | 2010 | 180 | 60 | 01JAN2010 |
| 6 | FNB_Stadium | SouthAfrica | 94736 | 1989 | 85 | 35 | 01JAN1989 |
| 7 | Old_Trafford | England | 74879 | 1910 | 130 | 50 | 01JAN1910 |
| 8 | Allianz_Arena | Germany | 75000 | 2005 | 155 | 58 | 01JAN2005 |
| 9 | Birds_Nest | China | 80000 | 2008 | 110 | 65 | 01JAN2008 |
| 10 | San_Siro | Italy | 80018 | 1926 | 125 | 52 | 01JAN1926 |
| 11 | Azteca | Mexico | 87000 | 1966 | 105 | 42 | 01JAN1966 |
| 12 | Lusail | Qatar | 80000 | 2022 | 200 | 75 | 01JAN2022 |
2.Creating Derived Variables
2.1 Stadium Age & Profit Calculation
data stadiums_derived;
set stadiums_raw;
Stadium_Age = intck('year', Built_Date, today());
Profit = Revenue - Maintenance_Cost;
run;
proc print data=stadiums_derived;
run;
OUTPUT:
| Obs | Stadium_Name | Country | Capacity | Year_Built | Revenue | Maintenance_Cost | Built_Date | Stadium_Age | Profit |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Wembley | England | 90000 | 2007 | 120 | 45 | 01JAN2007 | 19 | 75 |
| 2 | Camp_Nou | Spain | 99354 | 1957 | 140 | 55 | 01JAN1957 | 69 | 85 |
| 3 | Melbourne_Cricket_Ground | Australia | 100024 | 1853 | 160 | 70 | 01JAN1853 | 173 | 90 |
| 4 | Maracana | Brazil | 78838 | 1950 | 95 | 40 | 01JAN1950 | 76 | 55 |
| 5 | MetLife | USA | 82500 | 2010 | 180 | 60 | 01JAN2010 | 16 | 120 |
| 6 | FNB_Stadium | SouthAfrica | 94736 | 1989 | 85 | 35 | 01JAN1989 | 37 | 50 |
| 7 | Old_Trafford | England | 74879 | 1910 | 130 | 50 | 01JAN1910 | 116 | 80 |
| 8 | Allianz_Arena | Germany | 75000 | 2005 | 155 | 58 | 01JAN2005 | 21 | 97 |
| 9 | Birds_Nest | China | 80000 | 2008 | 110 | 65 | 01JAN2008 | 18 | 45 |
| 10 | San_Siro | Italy | 80018 | 1926 | 125 | 52 | 01JAN1926 | 100 | 73 |
| 11 | Azteca | Mexico | 87000 | 1966 | 105 | 42 | 01JAN1966 | 60 | 63 |
| 12 | Lusail | Qatar | 80000 | 2022 | 200 | 75 | 01JAN2022 | 4 | 125 |
3.Date Manipulation with INTNX
3.1 Renovation Date Estimation
data stadiums_dates;
set stadiums_derived;
Renovation_Date = intnx('year', Built_Date, 20);
format Renovation_Date date9.;
run;
proc print data=stadiums_dates;
run;
OUTPUT:
| Obs | Stadium_Name | Country | Capacity | Year_Built | Revenue | Maintenance_Cost | Built_Date | Stadium_Age | Profit | Renovation_Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Wembley | England | 90000 | 2007 | 120 | 45 | 01JAN2007 | 19 | 75 | 01JAN2027 |
| 2 | Camp_Nou | Spain | 99354 | 1957 | 140 | 55 | 01JAN1957 | 69 | 85 | 01JAN1977 |
| 3 | Melbourne_Cricket_Ground | Australia | 100024 | 1853 | 160 | 70 | 01JAN1853 | 173 | 90 | 01JAN1873 |
| 4 | Maracana | Brazil | 78838 | 1950 | 95 | 40 | 01JAN1950 | 76 | 55 | 01JAN1970 |
| 5 | MetLife | USA | 82500 | 2010 | 180 | 60 | 01JAN2010 | 16 | 120 | 01JAN2030 |
| 6 | FNB_Stadium | SouthAfrica | 94736 | 1989 | 85 | 35 | 01JAN1989 | 37 | 50 | 01JAN2009 |
| 7 | Old_Trafford | England | 74879 | 1910 | 130 | 50 | 01JAN1910 | 116 | 80 | 01JAN1930 |
| 8 | Allianz_Arena | Germany | 75000 | 2005 | 155 | 58 | 01JAN2005 | 21 | 97 | 01JAN2025 |
| 9 | Birds_Nest | China | 80000 | 2008 | 110 | 65 | 01JAN2008 | 18 | 45 | 01JAN2028 |
| 10 | San_Siro | Italy | 80018 | 1926 | 125 | 52 | 01JAN1926 | 100 | 73 | 01JAN1946 |
| 11 | Azteca | Mexico | 87000 | 1966 | 105 | 42 | 01JAN1966 | 60 | 63 | 01JAN1986 |
| 12 | Lusail | Qatar | 80000 | 2022 | 200 | 75 | 01JAN2022 | 4 | 125 | 01JAN2042 |
4.Descriptive Statistics (PROC MEANS)
proc means data=stadiums_dates mean min max std;
var Capacity Revenue Maintenance_Cost Stadium_Age Profit;
run;
OUTPUT:
The MEANS Procedure
| Variable | Mean | Minimum | Maximum | Std Dev |
|---|---|---|---|---|
Capacity Revenue Maintenance_Cost Stadium_Age Profit | 85195.75 133.7500000 53.9166667 59.0833333 79.8333333 | 74879.00 85.0000000 35.0000000 4.0000000 45.0000000 | 100024.00 200.0000000 75.0000000 173.0000000 125.0000000 | 8926.79 34.7147467 12.3322072 50.8106258 25.4980689 |
5.SQL-Based Analysis (PROC SQL)
5.1 High Revenue Stadiums
proc sql;
create table high_revenue as
select Stadium_Name, Country, Revenue
from stadiums_dates
where Revenue > 130
order by Revenue desc;
quit;
proc print data=high_revenue;
run;
OUTPUT:
| Obs | Stadium_Name | Country | Revenue |
|---|---|---|---|
| 1 | Lusail | Qatar | 200 |
| 2 | MetLife | USA | 180 |
| 3 | Melbourne_Cricket_Ground | Australia | 160 |
| 4 | Allianz_Arena | Germany | 155 |
| 5 | Camp_Nou | Spain | 140 |
6.Regression Analysis (PROC REG)
6.1 Capacity Impact on Revenue
proc reg data=stadiums_dates;
model Revenue = Capacity Stadium_Age;
run;
quit;
OUTPUT:
The REG Procedure
Model: MODEL1
Dependent Variable: Revenue
| Number of Observations Read | 12 |
|---|---|
| Number of Observations Used | 12 |
| Analysis of Variance | |||||
|---|---|---|---|---|---|
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 2 | 212.66969 | 106.33485 | 0.07 | 0.9298 |
| Error | 9 | 13044 | 1449.28670 | ||
| Corrected Total | 11 | 13256 | |||
| Root MSE | 38.06950 | R-Square | 0.0160 |
|---|---|---|---|
| Dependent Mean | 133.75000 | Adj R-Sq | -0.2026 |
| Coeff Var | 28.46318 |
| Parameter Estimates | |||||
|---|---|---|---|---|---|
| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > |t| |
| Intercept | 1 | 161.44763 | 112.11259 | 1.44 | 0.1837 |
| Capacity | 1 | -0.00028572 | 0.00135 | -0.21 | 0.8372 |
| Stadium_Age | 1 | -0.05679 | 0.23727 | -0.24 | 0.8162 |
The REG Procedure
Model: MODEL1
Dependent Variable: Revenue
7.Visualization (PROC SGPLOT)
proc sgplot data=stadiums_dates;
scatter x=Capacity y=Revenue;
reg x=Capacity y=Revenue;
title "Revenue vs Capacity Analysis";
run;
OUTPUT:
8.Macro for Stadium Performance Rating
%macro stadium_rating;
data stadiums_rating;
set stadiums_dates;
if Profit >= 100 then Rating="Excellent";
else if Profit >= 70 then Rating="Good";
else if Profit >= 40 then Rating="Average";
else Rating="Poor";
run;
proc print data=stadiums_rating;
run;
%mend;
%stadium_rating;
OUTPUT:
| Obs | Stadium_Name | Country | Capacity | Year_Built | Revenue | Maintenance_Cost | Built_Date | Stadium_Age | Profit | Renovation_Date | Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Wembley | England | 90000 | 2007 | 120 | 45 | 01JAN2007 | 19 | 75 | 01JAN2027 | Good |
| 2 | Camp_Nou | Spain | 99354 | 1957 | 140 | 55 | 01JAN1957 | 69 | 85 | 01JAN1977 | Good |
| 3 | Melbourne_Cricket_Ground | Australia | 100024 | 1853 | 160 | 70 | 01JAN1853 | 173 | 90 | 01JAN1873 | Good |
| 4 | Maracana | Brazil | 78838 | 1950 | 95 | 40 | 01JAN1950 | 76 | 55 | 01JAN1970 | Average |
| 5 | MetLife | USA | 82500 | 2010 | 180 | 60 | 01JAN2010 | 16 | 120 | 01JAN2030 | Excellent |
| 6 | FNB_Stadium | SouthAfrica | 94736 | 1989 | 85 | 35 | 01JAN1989 | 37 | 50 | 01JAN2009 | Average |
| 7 | Old_Trafford | England | 74879 | 1910 | 130 | 50 | 01JAN1910 | 116 | 80 | 01JAN1930 | Good |
| 8 | Allianz_Arena | Germany | 75000 | 2005 | 155 | 58 | 01JAN2005 | 21 | 97 | 01JAN2025 | Good |
| 9 | Birds_Nest | China | 80000 | 2008 | 110 | 65 | 01JAN2008 | 18 | 45 | 01JAN2028 | Average |
| 10 | San_Siro | Italy | 80018 | 1926 | 125 | 52 | 01JAN1926 | 100 | 73 | 01JAN1946 | Good |
| 11 | Azteca | Mexico | 87000 | 1966 | 105 | 42 | 01JAN1966 | 60 | 63 | 01JAN1986 | Average |
| 12 | Lusail | Qatar | 80000 | 2022 | 200 | 75 | 01JAN2022 | 4 | 125 | 01JAN2042 | Excellent |
9.APPEND Example
9.1 New Stadium Data
data new_stadium;
length Stadium_Name $30 Country $20;
Stadium_Name="SoFi";
Country="USA";
Capacity=70000;
Year_Built=2020;
Revenue=210;
Maintenance_Cost=80;
Built_Date=mdy(1,1,2020);
format Built_Date date9.;
run;
proc print data=new_stadium;
run;
OUTPUT:
| Obs | Stadium_Name | Country | Capacity | Year_Built | Revenue | Maintenance_Cost | Built_Date |
|---|---|---|---|---|---|---|---|
| 1 | SoFi | USA | 70000 | 2020 | 210 | 80 | 01JAN2020 |
proc append base=stadiums_raw
data=new_stadium force;
run;
proc print data=stadiums_raw;
run;
OUTPUT:
| Obs | Stadium_Name | Country | Capacity | Year_Built | Revenue | Maintenance_Cost | Built_Date |
|---|---|---|---|---|---|---|---|
| 1 | Wembley | England | 90000 | 2007 | 120 | 45 | 01JAN2007 |
| 2 | Camp_Nou | Spain | 99354 | 1957 | 140 | 55 | 01JAN1957 |
| 3 | Melbourne_Cricket_Ground | Australia | 100024 | 1853 | 160 | 70 | 01JAN1853 |
| 4 | Maracana | Brazil | 78838 | 1950 | 95 | 40 | 01JAN1950 |
| 5 | MetLife | USA | 82500 | 2010 | 180 | 60 | 01JAN2010 |
| 6 | FNB_Stadium | SouthAfrica | 94736 | 1989 | 85 | 35 | 01JAN1989 |
| 7 | Old_Trafford | England | 74879 | 1910 | 130 | 50 | 01JAN1910 |
| 8 | Allianz_Arena | Germany | 75000 | 2005 | 155 | 58 | 01JAN2005 |
| 9 | Birds_Nest | China | 80000 | 2008 | 110 | 65 | 01JAN2008 |
| 10 | San_Siro | Italy | 80018 | 1926 | 125 | 52 | 01JAN1926 |
| 11 | Azteca | Mexico | 87000 | 1966 | 105 | 42 | 01JAN1966 |
| 12 | Lusail | Qatar | 80000 | 2022 | 200 | 75 | 01JAN2022 |
| 13 | SoFi | USA | 70000 | 2020 | 210 | 80 | 01JAN2020 |
10.SET vs MERGE Demonstration
10.1 SET (Vertical Combination)
data combined_stadiums;
set stadiums_raw new_stadium;
run;
proc print data=combined_stadiums;
run;
OUTPUT:
| Obs | Stadium_Name | Country | Capacity | Year_Built | Revenue | Maintenance_Cost | Built_Date |
|---|---|---|---|---|---|---|---|
| 1 | Wembley | England | 90000 | 2007 | 120 | 45 | 01JAN2007 |
| 2 | Camp_Nou | Spain | 99354 | 1957 | 140 | 55 | 01JAN1957 |
| 3 | Melbourne_Cricket_Ground | Australia | 100024 | 1853 | 160 | 70 | 01JAN1853 |
| 4 | Maracana | Brazil | 78838 | 1950 | 95 | 40 | 01JAN1950 |
| 5 | MetLife | USA | 82500 | 2010 | 180 | 60 | 01JAN2010 |
| 6 | FNB_Stadium | SouthAfrica | 94736 | 1989 | 85 | 35 | 01JAN1989 |
| 7 | Old_Trafford | England | 74879 | 1910 | 130 | 50 | 01JAN1910 |
| 8 | Allianz_Arena | Germany | 75000 | 2005 | 155 | 58 | 01JAN2005 |
| 9 | Birds_Nest | China | 80000 | 2008 | 110 | 65 | 01JAN2008 |
| 10 | San_Siro | Italy | 80018 | 1926 | 125 | 52 | 01JAN1926 |
| 11 | Azteca | Mexico | 87000 | 1966 | 105 | 42 | 01JAN1966 |
| 12 | Lusail | Qatar | 80000 | 2022 | 200 | 75 | 01JAN2022 |
| 13 | SoFi | USA | 70000 | 2020 | 210 | 80 | 01JAN2020 |
| 14 | SoFi | USA | 70000 | 2020 | 210 | 80 | 01JAN2020 |
10.2 MERGE (Horizontal Combination)
proc sort data=stadiums_dates;
by Stadium_Name;
run;
proc print data=stadiums_dates;
run;
OUTPUT:
| Obs | Stadium_Name | Country | Capacity | Year_Built | Revenue | Maintenance_Cost | Built_Date | Stadium_Age | Profit | Renovation_Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Allianz_Arena | Germany | 75000 | 2005 | 155 | 58 | 01JAN2005 | 21 | 97 | 01JAN2025 |
| 2 | Azteca | Mexico | 87000 | 1966 | 105 | 42 | 01JAN1966 | 60 | 63 | 01JAN1986 |
| 3 | Birds_Nest | China | 80000 | 2008 | 110 | 65 | 01JAN2008 | 18 | 45 | 01JAN2028 |
| 4 | Camp_Nou | Spain | 99354 | 1957 | 140 | 55 | 01JAN1957 | 69 | 85 | 01JAN1977 |
| 5 | FNB_Stadium | SouthAfrica | 94736 | 1989 | 85 | 35 | 01JAN1989 | 37 | 50 | 01JAN2009 |
| 6 | Lusail | Qatar | 80000 | 2022 | 200 | 75 | 01JAN2022 | 4 | 125 | 01JAN2042 |
| 7 | Maracana | Brazil | 78838 | 1950 | 95 | 40 | 01JAN1950 | 76 | 55 | 01JAN1970 |
| 8 | Melbourne_Cricket_Ground | Australia | 100024 | 1853 | 160 | 70 | 01JAN1853 | 173 | 90 | 01JAN1873 |
| 9 | MetLife | USA | 82500 | 2010 | 180 | 60 | 01JAN2010 | 16 | 120 | 01JAN2030 |
| 10 | Old_Trafford | England | 74879 | 1910 | 130 | 50 | 01JAN1910 | 116 | 80 | 01JAN1930 |
| 11 | San_Siro | Italy | 80018 | 1926 | 125 | 52 | 01JAN1926 | 100 | 73 | 01JAN1946 |
| 12 | Wembley | England | 90000 | 2007 | 120 | 45 | 01JAN2007 | 19 | 75 | 01JAN2027 |
proc sort data=high_revenue;
by Stadium_Name;
run;
proc print data=high_revenue;
run;
OUTPUT:
| Obs | Stadium_Name | Country | Revenue |
|---|---|---|---|
| 1 | Allianz_Arena | Germany | 155 |
| 2 | Camp_Nou | Spain | 140 |
| 3 | Lusail | Qatar | 200 |
| 4 | Melbourne_Cricket_Ground | Australia | 160 |
| 5 | MetLife | USA | 180 |
data stadiums_merge;
merge stadiums_dates(in=a) high_revenue(in=b);
by Stadium_Name;
if a;
run;
proc print data=stadiums_merge;
run;
OUTPUT:
| Obs | Stadium_Name | Country | Capacity | Year_Built | Revenue | Maintenance_Cost | Built_Date | Stadium_Age | Profit | Renovation_Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Allianz_Arena | Germany | 75000 | 2005 | 155 | 58 | 01JAN2005 | 21 | 97 | 01JAN2025 |
| 2 | Azteca | Mexico | 87000 | 1966 | 105 | 42 | 01JAN1966 | 60 | 63 | 01JAN1986 |
| 3 | Birds_Nest | China | 80000 | 2008 | 110 | 65 | 01JAN2008 | 18 | 45 | 01JAN2028 |
| 4 | Camp_Nou | Spain | 99354 | 1957 | 140 | 55 | 01JAN1957 | 69 | 85 | 01JAN1977 |
| 5 | FNB_Stadium | SouthAfrica | 94736 | 1989 | 85 | 35 | 01JAN1989 | 37 | 50 | 01JAN2009 |
| 6 | Lusail | Qatar | 80000 | 2022 | 200 | 75 | 01JAN2022 | 4 | 125 | 01JAN2042 |
| 7 | Maracana | Brazil | 78838 | 1950 | 95 | 40 | 01JAN1950 | 76 | 55 | 01JAN1970 |
| 8 | Melbourne_Cricket_Ground | Australia | 100024 | 1853 | 160 | 70 | 01JAN1853 | 173 | 90 | 01JAN1873 |
| 9 | MetLife | USA | 82500 | 2010 | 180 | 60 | 01JAN2010 | 16 | 120 | 01JAN2030 |
| 10 | Old_Trafford | England | 74879 | 1910 | 130 | 50 | 01JAN1910 | 116 | 80 | 01JAN1930 |
| 11 | San_Siro | Italy | 80018 | 1926 | 125 | 52 | 01JAN1926 | 100 | 73 | 01JAN1946 |
| 12 | Wembley | England | 90000 | 2007 | 120 | 45 | 01JAN2007 | 19 | 75 | 01JAN2027 |
11.TRANSPOSE Example
11.1 Converting Columns to Rows
proc transpose data=stadiums_dates out=stadiums_t;
var Revenue Maintenance_Cost Profit;
by Stadium_Name;
run;
proc print data=stadiums_t;
run;
OUTPUT:
| Obs | Stadium_Name | _NAME_ | COL1 |
|---|---|---|---|
| 1 | Allianz_Arena | Revenue | 155 |
| 2 | Allianz_Arena | Maintenance_Cost | 58 |
| 3 | Allianz_Arena | Profit | 97 |
| 4 | Azteca | Revenue | 105 |
| 5 | Azteca | Maintenance_Cost | 42 |
| 6 | Azteca | Profit | 63 |
| 7 | Birds_Nest | Revenue | 110 |
| 8 | Birds_Nest | Maintenance_Cost | 65 |
| 9 | Birds_Nest | Profit | 45 |
| 10 | Camp_Nou | Revenue | 140 |
| 11 | Camp_Nou | Maintenance_Cost | 55 |
| 12 | Camp_Nou | Profit | 85 |
| 13 | FNB_Stadium | Revenue | 85 |
| 14 | FNB_Stadium | Maintenance_Cost | 35 |
| 15 | FNB_Stadium | Profit | 50 |
| 16 | Lusail | Revenue | 200 |
| 17 | Lusail | Maintenance_Cost | 75 |
| 18 | Lusail | Profit | 125 |
| 19 | Maracana | Revenue | 95 |
| 20 | Maracana | Maintenance_Cost | 40 |
| 21 | Maracana | Profit | 55 |
| 22 | Melbourne_Cricket_Ground | Revenue | 160 |
| 23 | Melbourne_Cricket_Ground | Maintenance_Cost | 70 |
| 24 | Melbourne_Cricket_Ground | Profit | 90 |
| 25 | MetLife | Revenue | 180 |
| 26 | MetLife | Maintenance_Cost | 60 |
| 27 | MetLife | Profit | 120 |
| 28 | Old_Trafford | Revenue | 130 |
| 29 | Old_Trafford | Maintenance_Cost | 50 |
| 30 | Old_Trafford | Profit | 80 |
| 31 | San_Siro | Revenue | 125 |
| 32 | San_Siro | Maintenance_Cost | 52 |
| 33 | San_Siro | Profit | 73 |
| 34 | Wembley | Revenue | 120 |
| 35 | Wembley | Maintenance_Cost | 45 |
| 36 | Wembley | Profit | 75 |
No comments:
Post a Comment