Monday, 17 November 2025

312.AIRPORT TRAFFIC ANALYSIS USING PROC SQL | PROC MEANS | PROC TABULATE | PROC SGPLOT | PROC UNIVARIATE | PROC FREQ | PROC REPORT | PROC TRANSPOSE | PROC CORR | MACRO-BASED BINNING IN SAS

AIRPORT TRAFFIC ANALYSIS USING PROC SQL | PROC MEANS | PROC TABULATE | PROC SGPLOT | PROC UNIVARIATE | PROC FREQ | PROC REPORT | PROC TRANSPOSE | PROC CORR | MACRO-BASED  BINNING IN SAS


 options nocenter validvarname=any nodate nonumber;

1. Create airport dataset 

data work.airports;

    infile datalines dsd truncover;

    informat Airport_Code $8. Country $40. Continent $20. Passengers_per_Year comma12.

             Cargo_Tons comma12. Runways 8. EST_DATE date9. LAST_RENOV date9.;

    format EST_DATE LAST_RENOV date9.;

    input Airport_Code :$8. Country :$40. Continent :$20. Passengers_per_Year :comma12.

          Cargo_Tons :comma12. Runways EST_DATE :date9. LAST_RENOV :date9.;

datalines;

JFK,United States,North_America,61500000,220000,4,04JAN1948,15JUN2019

LHR,United Kingdom,Europe,80000000,210000,2,01MAR1946,01SEP2015

DEL,India,Asia,65000000,150000,3,01APR1962,12DEC2018

DXB,United Arab Emirates,Asia,89000000,250000,3,25JUL1960,10NOV2020

SIN,Singapore,Asia,62000000,180000,3,01APR1971,30APR2017

HND,Japan,Asia,85000000,170000,4,01MAY1964,05MAR2018

LAX,United States,North_America,88000000,300000,4,01OCT1930,20AUG2016

CDG,France,Europe,72000000,160000,4,08MAR1974,20MAY2018

AMS,Netherlands,Europe,52000000,140000,6,01JUN1967,01JUN2019

PEK,China,Asia,101000000,400000,4,01JAN1958,14FEB2021

SYD,Australia,Oceania,42000000,90000,3,15MAY1919,01JUL2017

GRU,Brazil,South_America,43000000,85000,3,01JAN1936,22SEP2016

;

run;

proc print data=work.airports label noobs;

    title "Airport Source Dataset (work.airports)";

    var Airport_Code Country Continent Passengers_per_Year Cargo_Tons Runways EST_DATE          LAST_RENOV;

run;

OUTPUT:

Airport Source Dataset (work.airports)

Airport_CodeCountryContinentPassengers_per_YearCargo_TonsRunwaysEST_DATELAST_RENOV
JFKUnited StatesNorth_America61500000220000404JAN194815JUN2019
LHRUnited KingdomEurope80000000210000201MAR194601SEP2015
DELIndiaAsia65000000150000301APR196212DEC2018
DXBUnited Arab EmiratesAsia89000000250000325JUL196010NOV2020
SINSingaporeAsia62000000180000301APR197130APR2017
HNDJapanAsia85000000170000401MAY196405MAR2018
LAXUnited StatesNorth_America88000000300000401OCT193020AUG2016
CDGFranceEurope72000000160000408MAR197420MAY2018
AMSNetherlandsEurope52000000140000601JUN196701JUN2019
PEKChinaAsia101000000400000401JAN195814FEB2021
SYDAustraliaOceania4200000090000315MAY191901JUL2017
GRUBrazilSouth_America4300000085000301JAN193622SEP2016

2. Derived date variables and numeric checks

data work.airports2;

  set work.airports;

  /* Derive calendar variables */

  Est_Year = year(EST_DATE);

  Est_Qtr  = qtr(EST_DATE);

  Est_Month = month(EST_DATE);

  LastRenov_Year = year(LAST_RENOV);

  Days_Since_Established = today() - EST_DATE;

  Age_Years = int(Days_Since_Established / 365.25);


  /* Normalize continent value (ensure proper capitalization for reporting) */

  Continent_up = propcase(strip(Continent));


  /* Basic data quality flags */

  flag_passengers_missing = (missing(Passengers_per_Year));

  flag_cargo_missing = (missing(Cargo_Tons));

  flag_runways_zero = (Runways <= 0);


  output;

run;

Proc print data=work.airports2;

run;

OUTPUT:

ObsAirport_CodeCountryContinentPassengers_per_YearCargo_TonsRunwaysEST_DATELAST_RENOVEst_YearEst_QtrEst_MonthLastRenov_YearDays_Since_EstablishedAge_YearsContinent_upflag_passengers_missingflag_cargo_missingflag_runways_zero
1JFKUnited StatesNorth_America61500000220000404JAN194815JUN201919481120192844377North_america000
2LHRUnited KingdomEurope80000000210000201MAR194601SEP201519461320152911779Europe000
3DELIndiaAsia65000000150000301APR196212DEC201819622420182324263Asia000
4DXBUnited Arab EmiratesAsia89000000250000325JUL196010NOV202019603720202385765Asia000
5SINSingaporeAsia62000000180000301APR197130APR201719712420171995554Asia000
6HNDJapanAsia85000000170000401MAY196405MAR201819642520182248161Asia000
7LAXUnited StatesNorth_America88000000300000401OCT193020AUG2016193041020163474795North_america000
8CDGFranceEurope72000000160000408MAR197420MAY201819741320181888351Europe000
9AMSNetherlandsEurope52000000140000601JUN196701JUN201919672620192135558Europe000
10PEKChinaAsia101000000400000401JAN195814FEB202119581120212479367Asia000
11SYDAustraliaOceania4200000090000315MAY191901JUL2017191925201738904106Oceania000
12GRUBrazilSouth_America4300000085000301JAN193622SEP201619361120163282989South_america000


3. PROC MEANS & PROC UNIVARIATE — descriptive stats

proc means data=work.airports2 n mean median std min maxdec=0;

  var Passengers_per_Year Cargo_Tons Runways Age_Years;

  title "Numeric Summary: Passengers, Cargo, Runways, Age";

run;

OUTPUT:

Numeric Summary: Passengers, Cargo, Runways, Age

The MEANS Procedure

VariableNMeanMedianStd DevMinimum
Passengers_per_Year
Cargo_Tons
Runways
Age_Years
12
12
12
12
70041667
196250
4
72
68500000
175000
4
66
19005332
89013
1
17
42000000
85000
2
51

proc univariate data=work.airports2 cibasic;

  var Passengers_per_Year Cargo_Tons;

  title "Univariate Distribution (Passengers and Cargo)";

run;

OUTPUT:

Univariate Distribution (Passengers and Cargo)

The UNIVARIATE Procedure

Variable: Passengers_per_Year

Moments
N12Sum Weights12
Mean70041666.7Sum Observations840500000
Std Deviation19005332.2Variance3.61203E14
Skewness-0.0373413Kurtosis-1.0370466
Uncorrected SS6.28433E16Corrected SS3.97323E15
Coeff Variation27.1343232Std Error Mean5486366.83
Basic Statistical Measures
LocationVariability
Mean70041667Std Deviation19005332
Median68500000Variance3.61203E14
Mode.Range59000000
  Interquartile Range29750000
Basic Confidence Limits Assuming Normality
ParameterEstimate95% Confidence Limits
Mean700416675796625582117079
Std Deviation190053321346328532268736
Variance3.61203E141.8126E141.04127E15
Tests for Location: Mu0=0
TestStatisticp Value
Student's tt12.76649Pr > |t|<.0001
SignM6Pr >= |M|0.0005
Signed RankS39Pr >= |S|0.0005
Quantiles (Definition 5)
LevelQuantile
100% Max101000000
99%101000000
95%101000000
90%89000000
75% Q386500000
50% Median68500000
25% Q156750000
10%43000000
5%42000000
1%42000000
0% Min42000000
Extreme Observations
LowestHighest
ValueObsValueObs
4200000011800000002
4300000012850000006
520000009880000007
615000001890000004
62000000510100000010

Univariate Distribution (Passengers and Cargo)

The UNIVARIATE Procedure

Variable: Cargo_Tons

Moments
N12Sum Weights12
Mean196250Sum Observations2355000
Std Deviation89012.8949Variance7923295455
Skewness1.05541209Kurtosis1.35182923
Uncorrected SS5.49325E11Corrected SS8.71563E10
Coeff Variation45.3568891Std Error Mean25695.8094
Basic Statistical Measures
LocationVariability
Mean196250.0Std Deviation89013
Median175000.0Variance7923295455
Mode.Range315000
  Interquartile Range90000
Basic Confidence Limits Assuming Normality
ParameterEstimate95% Confidence Limits
Mean196250139694252806
Std Deviation8901363056151133
Variance792329545539760973602.28412E10
Tests for Location: Mu0=0
TestStatisticp Value
Student's tt7.637432Pr > |t|<.0001
SignM6Pr >= |M|0.0005
Signed RankS39Pr >= |S|0.0005
Quantiles (Definition 5)
LevelQuantile
100% Max400000
99%400000
95%400000
90%300000
75% Q3235000
50% Median175000
25% Q1145000
10%90000
5%85000
1%85000
0% Min85000
Extreme Observations
LowestHighest
ValueObsValueObs
85000122100002
90000112200001
14000092500004
15000033000007
160000840000010

4. Macro-based binning for traffic categories

%macro traffic_bins(in=work.airports2, out=work.airports_binned, var=Passengers_per_Year,

                    low=20000000, mid=50000000, high=80000000);

  /* Create bins: Low, Medium, High, Very_High using thresholds passed as params */

  data &out.;

    set &in.;

    length Traffic_Category $12.;

    if missing(&var.) then Traffic_Category = 'Missing';

    else if &var. < &low. then Traffic_Category = 'Low';

    else if &var. >= &low. and &var. < &mid. then Traffic_Category = 'Medium';

    else if &var. >= &mid. and &var. < &high. then Traffic_Category = 'High';

    else Traffic_Category = 'Very_High';

  run;

  proc print data=&out.;

  run;

  /* Create format for nice printing */

  proc freq data=&out. noprint;

    tables Traffic_Category / out=_freq_;

  run;

  proc print data=_freq_;

  run;

%mend traffic_bins;


%traffic_bins();

OUTPUT:

ObsAirport_CodeCountryContinentPassengers_per_YearCargo_TonsRunwaysEST_DATELAST_RENOVEst_YearEst_QtrEst_MonthLastRenov_YearDays_Since_EstablishedAge_YearsContinent_upflag_passengers_missingflag_cargo_missingflag_runways_zeroTraffic_Category
1JFKUnited StatesNorth_America61500000220000404JAN194815JUN201919481120192844377North_america000High
2LHRUnited KingdomEurope80000000210000201MAR194601SEP201519461320152911779Europe000Very_High
3DELIndiaAsia65000000150000301APR196212DEC201819622420182324263Asia000High
4DXBUnited Arab EmiratesAsia89000000250000325JUL196010NOV202019603720202385765Asia000Very_High
5SINSingaporeAsia62000000180000301APR197130APR201719712420171995554Asia000High
6HNDJapanAsia85000000170000401MAY196405MAR201819642520182248161Asia000Very_High
7LAXUnited StatesNorth_America88000000300000401OCT193020AUG2016193041020163474795North_america000Very_High
8CDGFranceEurope72000000160000408MAR197420MAY201819741320181888351Europe000High
9AMSNetherlandsEurope52000000140000601JUN196701JUN201919672620192135558Europe000High
10PEKChinaAsia101000000400000401JAN195814FEB202119581120212479367Asia000Very_High
11SYDAustraliaOceania4200000090000315MAY191901JUL2017191925201738904106Oceania000Medium
12GRUBrazilSouth_America4300000085000301JAN193622SEP201619361120163282989South_america000Medium
ObsTraffic_CategoryCOUNTPERCENT
1High541.6667
2Medium216.6667
3Very_High541.6667

proc print data=work.airports_binned noobs;

  title "Airports with Traffic Category (based on Passengers per Year)";

  var Airport_Code Passengers_per_Year Traffic_Category;

run;

OUTPUT:

Airports with Traffic Category (based on Passengers per Year)

Airport_CodePassengers_per_YearTraffic_Category
JFK61500000High
LHR80000000Very_High
DEL65000000High
DXB89000000Very_High
SIN62000000High
HND85000000Very_High
LAX88000000Very_High
CDG72000000High
AMS52000000High
PEK101000000Very_High
SYD42000000Medium
GRU43000000Medium

5. PROC TABULATE — multi-dimensional summary

proc tabulate data=work.airports_binned format=12.0;

  class Continent_up Traffic_Category;

  var Passengers_per_Year Cargo_Tons Runways;

  table

    Continent_up, 

    (Traffic_Category all='All Traffic') *

      (Passengers_per_Year*sum='Total Passengers' 

       Passengers_per_Year*mean='Avg Passengers' 

       Cargo_Tons*sum='Total Cargo (t)' 

       Runways*mean='Avg Runways');

  title "Tabulate: Summary of Passengers/Cargo by Continent and Traffic Category";

run;

OUTPUT:

Tabulate: Summary of Passengers/Cargo by Continent and Traffic Category

 Traffic_CategoryAll Traffic
HighMediumVery_High
Passengers_per_YearPassengers_per_YearCargo_TonsRunwaysPassengers_per_YearPassengers_per_YearCargo_TonsRunwaysPassengers_per_YearPassengers_per_YearCargo_TonsRunwaysPassengers_per_YearPassengers_per_YearCargo_TonsRunways
Total PassengersAvg PassengersTotal Cargo (t)Avg RunwaysTotal PassengersAvg PassengersTotal Cargo (t)Avg RunwaysTotal PassengersAvg PassengersTotal Cargo (t)Avg RunwaysTotal PassengersAvg PassengersTotal Cargo (t)Avg Runways
Continent_up127000000635000003300003....2750000009166666782000044020000008040000011500003
Asia
Europe124000000620000003000005....80000000800000002100002204000000680000005100004
North_america61500000615000002200004....88000000880000003000004149500000747500005200004
Oceania....4200000042000000900003....4200000042000000900003
South_america....4300000043000000850003....4300000043000000850003

6. PROC SGPLOT — visualizations

proc sgplot data=work.airports_binned;

  title "Total Passengers by Continent (sum)";

  vbar Continent_up / response=Passengers_per_Year stat=sum datalabel;

run;

OUTPUT:

The SGPlot Procedure


proc sgplot data=work.airports_binned;

  title "Passengers vs Cargo (scatter, labeled by Airport_Code)";

  scatter x=Passengers_per_Year y=Cargo_Tons / datalabel=Airport_Code markerattrs=(symbol=CircleFilled size=10);

  xaxis label="Passengers per Year";

  yaxis label="Cargo (Tons)";

run;

OUTPUT:

The SGPlot Procedure


proc sgplot data=work.airports_binned;

  title "Runways distribution by Traffic Category";

  vbar Traffic_Category / response=Runways stat=mean datalabel;

run;

OUTPUT:

The SGPlot Procedure


7. PROC FREQ — categorical summaries and cross-tabs

proc freq data=work.airports_binned;

  tables Continent_up*Traffic_Category / norow nocol nopercent;

  title "Continent x Traffic Category Cross-tabulation";

run;

OUTPUT:

Continent x Traffic Category Cross-tabulation

The FREQ Procedure

Frequency
Table of Continent_up by Traffic_Category
Continent_upTraffic_Category
HighMediumVery_HighTotal
Asia
2
0
3
5
Europe
2
0
1
3
North_america
1
0
1
2
Oceania
0
1
0
1
South_america
0
1
0
1
Total
5
2
5
12

8. PROC REPORT — formatted summary for reporting

proc report data=work.airports_binned nowd;

    column Continent_up Traffic_Category n Passengers_per_Year Cargo_Tons;

    define Continent_up / group 'Continent';

    define Traffic_Category / group 'Traffic Category';

    define n / computed 'Count';

    define Passengers_per_Year / analysis mean 'Avg Passengers';

    define Cargo_Tons / analysis sum 'Total Cargo (t)';


    compute n;

        n = _c3_;   /* pulls the computed COUNT column */

    endcomp;


    title "Formatted Report: Avg Passengers and Total Cargo by Continent and Traffic Category";

run;

OUTPUT:

Formatted Report: Avg Passengers and Total Cargo by Continent and Traffic Category

ContinentTraffic CategoryCountAvg PassengersTotal Cargo (t)
AsiaHigh.63500000330000
 Very_High.91666667820000
EuropeHigh.62000000300000
 Very_High.80000000210000
North_americaHigh.61500000220000
 Very_High.88000000300000
OceaniaMedium.4200000090000
South_americaMedium.4300000085000

9. PROC CORR — check relationships

proc corr data=work.airports_binned pearson;

  var Passengers_per_Year Cargo_Tons Runways Age_Years;

  title "Correlation Matrix (Passengers, Cargo, Runways, Age)";

run;

OUTPUT:

Correlation Matrix (Passengers, Cargo, Runways, Age)

The CORR Procedure

4 Variables:Passengers_per_Year Cargo_Tons Runways Age_Years
Simple Statistics
VariableNMeanStd DevSumMinimumMaximum
Passengers_per_Year12700416671900533284050000042000000101000000
Cargo_Tons1219625089013235500085000400000
Runways123.583330.9962043.000002.000006.00000
Age_Years1272.0833317.28088865.0000051.00000106.00000
Pearson Correlation Coefficients, N = 12
Prob > |r| under H0: Rho=0
 Passengers_per_YearCargo_TonsRunwaysAge_Years
Passengers_per_Year
1.00000
 
0.86125
0.0003
-0.03261
0.9199
-0.28332
0.3722
Cargo_Tons
0.86125
0.0003
1.00000
 
0.08842
0.7846
-0.10350
0.7489
Runways
-0.03261
0.9199
0.08842
0.7846
1.00000
 
-0.30936
0.3278
Age_Years
-0.28332
0.3722
-0.10350
0.7489
-0.30936
0.3278
1.00000
 

10. PROC TRANSPOSE — wide-to-long example

proc sort data=work.airports_binned out=work.airports_binned_sorted;

    by Airport_Code Country Continent_up Traffic_Category;

run;

proc transpose data=work.airports_binned_sorted out=work.long_pass(drop=_NAME_);

    by Airport_Code Country Continent_up Traffic_Category;

    var Passengers_per_Year Cargo_Tons Runways;

run;

proc print data=work.long_pass noobs;

  title "Transposed (Long) Format of Passengers/Cargo/Runways per Airport";

run;

OUTPUT:

Transposed (Long) Format of Passengers/Cargo/Runways per Airport

Airport_CodeCountryContinent_upTraffic_CategoryCOL1
AMSNetherlandsEuropeHigh52000000
AMSNetherlandsEuropeHigh140000
AMSNetherlandsEuropeHigh6
CDGFranceEuropeHigh72000000
CDGFranceEuropeHigh160000
CDGFranceEuropeHigh4
DELIndiaAsiaHigh65000000
DELIndiaAsiaHigh150000
DELIndiaAsiaHigh3
DXBUnited Arab EmiratesAsiaVery_High89000000
DXBUnited Arab EmiratesAsiaVery_High250000
DXBUnited Arab EmiratesAsiaVery_High3
GRUBrazilSouth_americaMedium43000000
GRUBrazilSouth_americaMedium85000
GRUBrazilSouth_americaMedium3
HNDJapanAsiaVery_High85000000
HNDJapanAsiaVery_High170000
HNDJapanAsiaVery_High4
JFKUnited StatesNorth_americaHigh61500000
JFKUnited StatesNorth_americaHigh220000
JFKUnited StatesNorth_americaHigh4
LAXUnited StatesNorth_americaVery_High88000000
LAXUnited StatesNorth_americaVery_High300000
LAXUnited StatesNorth_americaVery_High4
LHRUnited KingdomEuropeVery_High80000000
LHRUnited KingdomEuropeVery_High210000
LHRUnited KingdomEuropeVery_High2
PEKChinaAsiaVery_High101000000
PEKChinaAsiaVery_High400000
PEKChinaAsiaVery_High4
SINSingaporeAsiaHigh62000000
SINSingaporeAsiaHigh180000
SINSingaporeAsiaHigh3
SYDAustraliaOceaniaMedium42000000
SYDAustraliaOceaniaMedium90000
SYDAustraliaOceaniaMedium3

11. PROC SQL examples — create summary table and export-ready table

proc sql;

  create table work.summary_by_continent as

  select Continent_up as Continent,

         count(*) as Airports,

         sum(Passengers_per_Year) as Total_Passengers format=comma12.,

         mean(Passengers_per_Year) as Avg_Passengers format=comma12.,

         sum(Cargo_Tons) as Total_Cargo format=comma12.,

         mean(Runways) as Avg_Runways format=8.2

  from work.airports_binned

  group by Continent_up

  order by Total_Passengers desc;

quit;


proc print data=work.summary_by_continent noobs;

  title "SQL Summary: Airports by Continent (by total passengers)";

run;

OUTPUT:

SQL Summary: Airports by Continent (by total passengers)

ContinentAirportsTotal_PassengersAvg_PassengersTotal_CargoAvg_Runways
Asia5402,000,00080,400,0001,150,0003.40
Europe3204,000,00068,000,000510,0004.00
North_america2149,500,00074,750,000520,0004.00
South_america143,000,00043,000,00085,0003.00
Oceania142,000,00042,000,00090,0003.00



To Visit My Previous Online Streaming Flatform Dataset:Click Here
To Visit My Previous Statistical Evaluation Of Clinical Trials:Click Here
To Visit My Previous Unlocking Retail Insights Dataset:Click Here
To Visit My Previous Sas Interview Questions-1:Click Here  



Follow Us On : 


 


--- FOLLOW OUR BLOG FOR MORE INFORMATION.

--->PLEASE DO COMMENTS AND SHARE OUR BLOG.


No comments:

Post a Comment