319.MAJOR CITY TRAFFIC AND POLLUTION LEVEL ANALYSIS USING DATA STEP | PROC SQL | PROC MEANS | PROC REG | PROC SGPLOT | DATA STEP | MACRO PROCESSING | DATE FUNCTIONS
MAJOR CITY TRAFFIC AND POLLUTION LEVEL ANALYSIS USING DATA STEP | PROC SQL | PROC MEANS | PROC REG | PROC SGPLOT | DATA STEP | MACRO PROCESSING | DATE FUNCTIONS
options nocenter;
1. CREATE RAW DATASET WITH DATE FORMATS
data work.city_traffic_raw;
length City $25 Country $20;
infile datalines dsd truncover;
input City : $25.Country: $20. Vehicle_Count Pollution_Index Avg_Speed
Transport_Score Measurement_Date: ddmmyy10. Policy_Date: date9.;
format Measurement_Date ddmmyy10. Policy_Date date9.;
datalines;
Hyderabad,India,1250000,95,24,62,15-01-2025,01JAN2024
Delhi,India,1800000,130,20,55,15-06-2025,01JUL2023
Mumbai,India,1700000,120,22,58,10-06-2025,15AUG2022
Bengaluru,India,1350000,88,25,70,05-05-2025,01JAN2023
Chennai,India,1100000,90,23,64,12-05-2025,01APR2023
Kolkata,India,1400000,115,21,57,20-04-2025,01DEC2022
London,United Kingdom,950000,75,30,78,18-04-2025,01MAR2021
Paris,France,900000,82,28,76,25-03-2025,15SEP2021
New York,USA,1600000,105,27,73,30-03-2025,01JAN2022
Los Angeles,USA,1450000,110,26,69,15-02-2025,01JUN2022
Tokyo,Japan,1550000,98,29,80,05-02-2025,01APR2020
Beijing,China,1900000,140,19,52,10-01-2025,01JAN2023
Shanghai,China,1750000,125,20,56,18-01-2025,01MAY2022
Singapore,Singapore,650000,60,32,85,08-03-2025,01JAN2019
Sydney,Australia,700000,65,31,83,22-02-2025,01JAN2020
Dubai,UAE,800000,72,29,79,28-02-2025,01JUN2019
Mexico City,Mexico,1500000,118,22,59,14-03-2025,01JAN2021
Johannesburg,South Africa,900000,92,24,63,11-03-2025,01JAN2022
;
run;
proc print data=work.city_traffic_raw;
run;
OUTPUT:
| Obs | City | Country | Vehicle_Count | Pollution_Index | Avg_Speed | Transport_Score | Measurement_Date | Policy_Date |
|---|---|---|---|---|---|---|---|---|
| 1 | Hyderabad | India | 1250000 | 95 | 24 | 62 | 15/01/2025 | 01JAN2024 |
| 2 | Delhi | India | 1800000 | 130 | 20 | 55 | 15/06/2025 | 01JUL2023 |
| 3 | Mumbai | India | 1700000 | 120 | 22 | 58 | 10/06/2025 | 15AUG2022 |
| 4 | Bengaluru | India | 1350000 | 88 | 25 | 70 | 05/05/2025 | 01JAN2023 |
| 5 | Chennai | India | 1100000 | 90 | 23 | 64 | 12/05/2025 | 01APR2023 |
| 6 | Kolkata | India | 1400000 | 115 | 21 | 57 | 20/04/2025 | 01DEC2022 |
| 7 | London | United Kingdom | 950000 | 75 | 30 | 78 | 18/04/2025 | 01MAR2021 |
| 8 | Paris | France | 900000 | 82 | 28 | 76 | 25/03/2025 | 15SEP2021 |
| 9 | New York | USA | 1600000 | 105 | 27 | 73 | 30/03/2025 | 01JAN2022 |
| 10 | Los Angeles | USA | 1450000 | 110 | 26 | 69 | 15/02/2025 | 01JUN2022 |
| 11 | Tokyo | Japan | 1550000 | 98 | 29 | 80 | 05/02/2025 | 01APR2020 |
| 12 | Beijing | China | 1900000 | 140 | 19 | 52 | 10/01/2025 | 01JAN2023 |
| 13 | Shanghai | China | 1750000 | 125 | 20 | 56 | 18/01/2025 | 01MAY2022 |
| 14 | Singapore | Singapore | 650000 | 60 | 32 | 85 | 08/03/2025 | 01JAN2019 |
| 15 | Sydney | Australia | 700000 | 65 | 31 | 83 | 22/02/2025 | 01JAN2020 |
| 16 | Dubai | UAE | 800000 | 72 | 29 | 79 | 28/02/2025 | 01JUN2019 |
| 17 | Mexico City | Mexico | 1500000 | 118 | 22 | 59 | 14/03/2025 | 01JAN2021 |
| 18 | Johannesburg | South Africa | 900000 | 92 | 24 | 63 | 11/03/2025 | 01JAN2022 |
options MCOMPILENOTE=ALL;
%macro pollution_case(var=);
/* This macro expands into a CASE expression for PROC SQL */
case
when &var. < 40 then "LOW"
when &var. < 80 then "MODERATE"
when &var. < 120 then "HIGH"
else "VERY_HIGH"
end
%mend pollution_case;
LOG:
NOTE: The macro POLLUTION_CASE completed compilation without errors.
3. CREATE ENRICHED DATASET USING PROC SQL + MACRO
proc sql;
create table work.city_traffic as
select City,Country,Vehicle_Count,Pollution_Index,Avg_Speed,Transport_Score,
Measurement_Date,Policy_Date,
%pollution_case(var=Pollution_Index) as Pollution_Category length=12
from work.city_traffic_raw;
quit;
proc print data=work.city_traffic;
run;
OUTPUT:
| Obs | City | Country | Vehicle_Count | Pollution_Index | Avg_Speed | Transport_Score | Measurement_Date | Policy_Date | Pollution_Category |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Hyderabad | India | 1250000 | 95 | 24 | 62 | 15/01/2025 | 01JAN2024 | HIGH |
| 2 | Delhi | India | 1800000 | 130 | 20 | 55 | 15/06/2025 | 01JUL2023 | VERY_HIGH |
| 3 | Mumbai | India | 1700000 | 120 | 22 | 58 | 10/06/2025 | 15AUG2022 | VERY_HIGH |
| 4 | Bengaluru | India | 1350000 | 88 | 25 | 70 | 05/05/2025 | 01JAN2023 | HIGH |
| 5 | Chennai | India | 1100000 | 90 | 23 | 64 | 12/05/2025 | 01APR2023 | HIGH |
| 6 | Kolkata | India | 1400000 | 115 | 21 | 57 | 20/04/2025 | 01DEC2022 | HIGH |
| 7 | London | United Kingdom | 950000 | 75 | 30 | 78 | 18/04/2025 | 01MAR2021 | MODERATE |
| 8 | Paris | France | 900000 | 82 | 28 | 76 | 25/03/2025 | 15SEP2021 | HIGH |
| 9 | New York | USA | 1600000 | 105 | 27 | 73 | 30/03/2025 | 01JAN2022 | HIGH |
| 10 | Los Angeles | USA | 1450000 | 110 | 26 | 69 | 15/02/2025 | 01JUN2022 | HIGH |
| 11 | Tokyo | Japan | 1550000 | 98 | 29 | 80 | 05/02/2025 | 01APR2020 | HIGH |
| 12 | Beijing | China | 1900000 | 140 | 19 | 52 | 10/01/2025 | 01JAN2023 | VERY_HIGH |
| 13 | Shanghai | China | 1750000 | 125 | 20 | 56 | 18/01/2025 | 01MAY2022 | VERY_HIGH |
| 14 | Singapore | Singapore | 650000 | 60 | 32 | 85 | 08/03/2025 | 01JAN2019 | MODERATE |
| 15 | Sydney | Australia | 700000 | 65 | 31 | 83 | 22/02/2025 | 01JAN2020 | MODERATE |
| 16 | Dubai | UAE | 800000 | 72 | 29 | 79 | 28/02/2025 | 01JUN2019 | MODERATE |
| 17 | Mexico City | Mexico | 1500000 | 118 | 22 | 59 | 14/03/2025 | 01JAN2021 | HIGH |
| 18 | Johannesburg | South Africa | 900000 | 92 | 24 | 63 | 11/03/2025 | 01JAN2022 | HIGH |
4. ADD DATE-BASED DERIVED VARIABLES (YEAR, MONTH, POLICY AGE)
data work.city_traffic_enh;
set work.city_traffic;
/* Extract year and month from measurement date */
Measurement_Year = year(Measurement_Date);
Measurement_Month = month(Measurement_Date);
/* Age of policy in years at measurement date */
Policy_Age_Years = intck('year', Policy_Date, Measurement_Date);
format Measurement_Date ddmmyy10. Policy_Date date9.;
run;
proc print data=work.city_traffic_enh;
title "CITY TRAFFIC AND POLLUTION DATA WITH DERIVED VARIABLES";
run;
title;
OUTPUT:
| Obs | City | Country | Vehicle_Count | Pollution_Index | Avg_Speed | Transport_Score | Measurement_Date | Policy_Date | Pollution_Category | Measurement_Year | Measurement_Month | Policy_Age_Years |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Hyderabad | India | 1250000 | 95 | 24 | 62 | 15/01/2025 | 01JAN2024 | HIGH | 2025 | 1 | 1 |
| 2 | Delhi | India | 1800000 | 130 | 20 | 55 | 15/06/2025 | 01JUL2023 | VERY_HIGH | 2025 | 6 | 2 |
| 3 | Mumbai | India | 1700000 | 120 | 22 | 58 | 10/06/2025 | 15AUG2022 | VERY_HIGH | 2025 | 6 | 3 |
| 4 | Bengaluru | India | 1350000 | 88 | 25 | 70 | 05/05/2025 | 01JAN2023 | HIGH | 2025 | 5 | 2 |
| 5 | Chennai | India | 1100000 | 90 | 23 | 64 | 12/05/2025 | 01APR2023 | HIGH | 2025 | 5 | 2 |
| 6 | Kolkata | India | 1400000 | 115 | 21 | 57 | 20/04/2025 | 01DEC2022 | HIGH | 2025 | 4 | 3 |
| 7 | London | United Kingdom | 950000 | 75 | 30 | 78 | 18/04/2025 | 01MAR2021 | MODERATE | 2025 | 4 | 4 |
| 8 | Paris | France | 900000 | 82 | 28 | 76 | 25/03/2025 | 15SEP2021 | HIGH | 2025 | 3 | 4 |
| 9 | New York | USA | 1600000 | 105 | 27 | 73 | 30/03/2025 | 01JAN2022 | HIGH | 2025 | 3 | 3 |
| 10 | Los Angeles | USA | 1450000 | 110 | 26 | 69 | 15/02/2025 | 01JUN2022 | HIGH | 2025 | 2 | 3 |
| 11 | Tokyo | Japan | 1550000 | 98 | 29 | 80 | 05/02/2025 | 01APR2020 | HIGH | 2025 | 2 | 5 |
| 12 | Beijing | China | 1900000 | 140 | 19 | 52 | 10/01/2025 | 01JAN2023 | VERY_HIGH | 2025 | 1 | 2 |
| 13 | Shanghai | China | 1750000 | 125 | 20 | 56 | 18/01/2025 | 01MAY2022 | VERY_HIGH | 2025 | 1 | 3 |
| 14 | Singapore | Singapore | 650000 | 60 | 32 | 85 | 08/03/2025 | 01JAN2019 | MODERATE | 2025 | 3 | 6 |
| 15 | Sydney | Australia | 700000 | 65 | 31 | 83 | 22/02/2025 | 01JAN2020 | MODERATE | 2025 | 2 | 5 |
| 16 | Dubai | UAE | 800000 | 72 | 29 | 79 | 28/02/2025 | 01JUN2019 | MODERATE | 2025 | 2 | 6 |
| 17 | Mexico City | Mexico | 1500000 | 118 | 22 | 59 | 14/03/2025 | 01JAN2021 | HIGH | 2025 | 3 | 4 |
| 18 | Johannesburg | South Africa | 900000 | 92 | 24 | 63 | 11/03/2025 | 01JAN2022 | HIGH | 2025 | 3 | 3 |
5. PROC MEANS – SUMMARY STATISTICS BY COUNTRY
proc means data=work.city_traffic_enh n mean std min max maxdec=2;
class Country;
var Vehicle_Count Pollution_Index Avg_Speed Transport_Score;
output out=work.city_traffic_summary mean= / autoname;
title "SUMMARY STATISTICS OF TRAFFIC AND POLLUTION BY COUNTRY";
run;
title;
proc print data=work.city_traffic_summary;
run;
OUTPUT:
The MEANS Procedure
| Country | N Obs | Variable | N | Mean | Std Dev | Minimum | Maximum |
|---|---|---|---|---|---|---|---|
| Australia | 1 | Vehicle_Count Pollution_Index Avg_Speed Transport_Score | 1 1 1 1 | 700000.00 65.00 31.00 83.00 | . . . . | 700000.00 65.00 31.00 83.00 | 700000.00 65.00 31.00 83.00 |
| China | 2 | Vehicle_Count Pollution_Index Avg_Speed Transport_Score | 2 2 2 2 | 1825000.00 132.50 19.50 54.00 | 106066.02 10.61 0.71 2.83 | 1750000.00 125.00 19.00 52.00 | 1900000.00 140.00 20.00 56.00 |
| France | 1 | Vehicle_Count Pollution_Index Avg_Speed Transport_Score | 1 1 1 1 | 900000.00 82.00 28.00 76.00 | . . . . | 900000.00 82.00 28.00 76.00 | 900000.00 82.00 28.00 76.00 |
| India | 6 | Vehicle_Count Pollution_Index Avg_Speed Transport_Score | 6 6 6 6 | 1433333.33 106.33 22.50 61.00 | 267706.31 17.63 1.87 5.51 | 1100000.00 88.00 20.00 55.00 | 1800000.00 130.00 25.00 70.00 |
| Japan | 1 | Vehicle_Count Pollution_Index Avg_Speed Transport_Score | 1 1 1 1 | 1550000.00 98.00 29.00 80.00 | . . . . | 1550000.00 98.00 29.00 80.00 | 1550000.00 98.00 29.00 80.00 |
| Mexico | 1 | Vehicle_Count Pollution_Index Avg_Speed Transport_Score | 1 1 1 1 | 1500000.00 118.00 22.00 59.00 | . . . . | 1500000.00 118.00 22.00 59.00 | 1500000.00 118.00 22.00 59.00 |
| Singapore | 1 | Vehicle_Count Pollution_Index Avg_Speed Transport_Score | 1 1 1 1 | 650000.00 60.00 32.00 85.00 | . . . . | 650000.00 60.00 32.00 85.00 | 650000.00 60.00 32.00 85.00 |
| South Africa | 1 | Vehicle_Count Pollution_Index Avg_Speed Transport_Score | 1 1 1 1 | 900000.00 92.00 24.00 63.00 | . . . . | 900000.00 92.00 24.00 63.00 | 900000.00 92.00 24.00 63.00 |
| UAE | 1 | Vehicle_Count Pollution_Index Avg_Speed Transport_Score | 1 1 1 1 | 800000.00 72.00 29.00 79.00 | . . . . | 800000.00 72.00 29.00 79.00 | 800000.00 72.00 29.00 79.00 |
| USA | 2 | Vehicle_Count Pollution_Index Avg_Speed Transport_Score | 2 2 2 2 | 1525000.00 107.50 26.50 71.00 | 106066.02 3.54 0.71 2.83 | 1450000.00 105.00 26.00 69.00 | 1600000.00 110.00 27.00 73.00 |
| United Kingdom | 1 | Vehicle_Count Pollution_Index Avg_Speed Transport_Score | 1 1 1 1 | 950000.00 75.00 30.00 78.00 | . . . . | 950000.00 75.00 30.00 78.00 | 950000.00 75.00 30.00 78.00 |
| Obs | Country | _TYPE_ | _FREQ_ | Vehicle_Count_Mean | Pollution_Index_Mean | Avg_Speed_Mean | Transport_Score_Mean |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 18 | 1291666.67 | 98.889 | 25.1111 | 67.7222 | |
| 2 | Australia | 1 | 1 | 700000.00 | 65.000 | 31.0000 | 83.0000 |
| 3 | China | 1 | 2 | 1825000.00 | 132.500 | 19.5000 | 54.0000 |
| 4 | France | 1 | 1 | 900000.00 | 82.000 | 28.0000 | 76.0000 |
| 5 | India | 1 | 6 | 1433333.33 | 106.333 | 22.5000 | 61.0000 |
| 6 | Japan | 1 | 1 | 1550000.00 | 98.000 | 29.0000 | 80.0000 |
| 7 | Mexico | 1 | 1 | 1500000.00 | 118.000 | 22.0000 | 59.0000 |
| 8 | Singapore | 1 | 1 | 650000.00 | 60.000 | 32.0000 | 85.0000 |
| 9 | South Africa | 1 | 1 | 900000.00 | 92.000 | 24.0000 | 63.0000 |
| 10 | UAE | 1 | 1 | 800000.00 | 72.000 | 29.0000 | 79.0000 |
| 11 | USA | 1 | 2 | 1525000.00 | 107.500 | 26.5000 | 71.0000 |
| 12 | United Kingdom | 1 | 1 | 950000.00 | 75.000 | 30.0000 | 78.0000 |
6. PROC SQL – EXAMPLE SUMMARY TABLE USING POLLUTION CATEGORY
proc sql;
create table work.pollution_category_summary as
select Pollution_Category, count(*) as City_Count,avg(Vehicle_Count) as Avg_Vehicles,
avg(Pollution_Index) as Avg_Pollution_Index,
avg(Avg_Speed) as Avg_Speed,
avg(Transport_Score) as Avg_Transport_Score
from work.city_traffic_enh
group by Pollution_Category
order by City_Count desc;
quit;
proc print data=work.city_traffic_summary;
run;
OUTPUT:
| Obs | Country | _TYPE_ | _FREQ_ | Vehicle_Count_Mean | Pollution_Index_Mean | Avg_Speed_Mean | Transport_Score_Mean |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 18 | 1291666.67 | 98.889 | 25.1111 | 67.7222 | |
| 2 | Australia | 1 | 1 | 700000.00 | 65.000 | 31.0000 | 83.0000 |
| 3 | China | 1 | 2 | 1825000.00 | 132.500 | 19.5000 | 54.0000 |
| 4 | France | 1 | 1 | 900000.00 | 82.000 | 28.0000 | 76.0000 |
| 5 | India | 1 | 6 | 1433333.33 | 106.333 | 22.5000 | 61.0000 |
| 6 | Japan | 1 | 1 | 1550000.00 | 98.000 | 29.0000 | 80.0000 |
| 7 | Mexico | 1 | 1 | 1500000.00 | 118.000 | 22.0000 | 59.0000 |
| 8 | Singapore | 1 | 1 | 650000.00 | 60.000 | 32.0000 | 85.0000 |
| 9 | South Africa | 1 | 1 | 900000.00 | 92.000 | 24.0000 | 63.0000 |
| 10 | UAE | 1 | 1 | 800000.00 | 72.000 | 29.0000 | 79.0000 |
| 11 | USA | 1 | 2 | 1525000.00 | 107.500 | 26.5000 | 71.0000 |
| 12 | United Kingdom | 1 | 1 | 950000.00 | 75.000 | 30.0000 | 78.0000 |
6. PROC REG – MODEL POLLUTION AS FUNCTION OF TRAFFIC METRICS
proc reg data=work.city_traffic_enh outest=work.city_reg_estimates;
model Pollution_Index = Vehicle_Count Avg_Speed Transport_Score;
title "REGRESSION OF POLLUTION INDEX ON TRAFFIC AND TRANSPORT VARIABLES";
run;
quit;
title;
OUTPUT:
The REG Procedure
Model: MODEL1
Dependent Variable: Pollution_Index
| Number of Observations Read | 18 |
|---|---|
| Number of Observations Used | 18 |
| Analysis of Variance | |||||
|---|---|---|---|---|---|
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 3 | 8739.02458 | 2913.00819 | 122.56 | <.0001 |
| Error | 14 | 332.75320 | 23.76809 | ||
| Corrected Total | 17 | 9071.77778 | |||
| Root MSE | 4.87525 | R-Square | 0.9633 |
|---|---|---|---|
| Dependent Mean | 98.88889 | Adj R-Sq | 0.9555 |
| Coeff Var | 4.93003 |
| Parameter Estimates | |||||
|---|---|---|---|---|---|
| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > |t| |
| Intercept | 1 | 110.11708 | 16.22334 | 6.79 | <.0001 |
| Vehicle_Count | 1 | 0.00003770 | 0.00000466 | 8.09 | <.0001 |
| Avg_Speed | 1 | 1.17574 | 2.13305 | 0.55 | 0.5902 |
| Transport_Score | 1 | -1.32089 | 0.77895 | -1.70 | 0.1121 |
The REG Procedure
Model: MODEL1
Dependent Variable: Pollution_Index
7. PROC SGPLOT – SCATTER + REGRESSION LINE
proc sgplot data=work.city_traffic_enh;
scatter x=Vehicle_Count
y=Pollution_Index
/ group=Pollution_Category datalabel=City;
reg x=Vehicle_Count
y=Pollution_Index;
xaxis label="Vehicle Count (approx. daily vehicles)";
yaxis label="Pollution Index (AQI-like score)";
title "RELATIONSHIP BETWEEN VEHICLE COUNT AND POLLUTION INDEX";
run;
title;
OUTPUT:
8. PROC SGPLOT – BOXPLOT OF POLLUTION BY CATEGORY
proc sgplot data=work.city_traffic_enh;
vbox Pollution_Index / category=Pollution_Category;
yaxis label="Pollution Index";
xaxis label="Pollution Category";
title "DISTRIBUTION OF POLLUTION INDEX BY CATEGORY";
run;
title;
Comments
Post a Comment