319.MAJOR CITY TRAFFIC AND POLLUTION LEVEL ANALYSIS USING DATA STEP | PROC SQL | PROC MEANS | PROC REG | PROC SGPLOT | DATA STEP | MACRO PROCESSING | DATE FUNCTIONS

MAJOR CITY TRAFFIC AND POLLUTION LEVEL ANALYSIS USING DATA STEP | PROC SQL | PROC MEANS | PROC REG | PROC SGPLOT | DATA STEP | MACRO PROCESSING | DATE FUNCTIONS

options nocenter;

1. CREATE RAW DATASET WITH DATE FORMATS

data work.city_traffic_raw;

    length City $25 Country $20;

    infile datalines dsd truncover;

    input City : $25.Country: $20. Vehicle_Count Pollution_Index Avg_Speed

          Transport_Score Measurement_Date: ddmmyy10. Policy_Date: date9.;

    format  Measurement_Date ddmmyy10. Policy_Date date9.;

datalines;

Hyderabad,India,1250000,95,24,62,15-01-2025,01JAN2024

Delhi,India,1800000,130,20,55,15-06-2025,01JUL2023

Mumbai,India,1700000,120,22,58,10-06-2025,15AUG2022

Bengaluru,India,1350000,88,25,70,05-05-2025,01JAN2023

Chennai,India,1100000,90,23,64,12-05-2025,01APR2023

Kolkata,India,1400000,115,21,57,20-04-2025,01DEC2022

London,United Kingdom,950000,75,30,78,18-04-2025,01MAR2021

Paris,France,900000,82,28,76,25-03-2025,15SEP2021

New York,USA,1600000,105,27,73,30-03-2025,01JAN2022

Los Angeles,USA,1450000,110,26,69,15-02-2025,01JUN2022

Tokyo,Japan,1550000,98,29,80,05-02-2025,01APR2020

Beijing,China,1900000,140,19,52,10-01-2025,01JAN2023

Shanghai,China,1750000,125,20,56,18-01-2025,01MAY2022

Singapore,Singapore,650000,60,32,85,08-03-2025,01JAN2019

Sydney,Australia,700000,65,31,83,22-02-2025,01JAN2020

Dubai,UAE,800000,72,29,79,28-02-2025,01JUN2019

Mexico City,Mexico,1500000,118,22,59,14-03-2025,01JAN2021

Johannesburg,South Africa,900000,92,24,63,11-03-2025,01JAN2022

;

run;

proc print data=work.city_traffic_raw;

run;

OUTPUT:

ObsCityCountryVehicle_CountPollution_IndexAvg_SpeedTransport_ScoreMeasurement_DatePolicy_Date
1HyderabadIndia125000095246215/01/202501JAN2024
2DelhiIndia1800000130205515/06/202501JUL2023
3MumbaiIndia1700000120225810/06/202515AUG2022
4BengaluruIndia135000088257005/05/202501JAN2023
5ChennaiIndia110000090236412/05/202501APR2023
6KolkataIndia1400000115215720/04/202501DEC2022
7LondonUnited Kingdom95000075307818/04/202501MAR2021
8ParisFrance90000082287625/03/202515SEP2021
9New YorkUSA1600000105277330/03/202501JAN2022
10Los AngelesUSA1450000110266915/02/202501JUN2022
11TokyoJapan155000098298005/02/202501APR2020
12BeijingChina1900000140195210/01/202501JAN2023
13ShanghaiChina1750000125205618/01/202501MAY2022
14SingaporeSingapore65000060328508/03/202501JAN2019
15SydneyAustralia70000065318322/02/202501JAN2020
16DubaiUAE80000072297928/02/202501JUN2019
17Mexico CityMexico1500000118225914/03/202501JAN2021
18JohannesburgSouth Africa90000092246311/03/202501JAN2022

2. MACRO FOR POLLUTION-BASED CATEGORIZATION (USED IN PROC SQL)

options MCOMPILENOTE=ALL;

%macro pollution_case(var=);

    /* This macro expands into a CASE expression for PROC SQL */

    case 

        when &var. < 40 then "LOW"

        when &var. < 80 then "MODERATE"

        when &var. < 120 then "HIGH"

        else "VERY_HIGH"

    end

%mend pollution_case;

LOG:

NOTE: The macro POLLUTION_CASE completed compilation without errors.


3. CREATE ENRICHED DATASET USING PROC SQL + MACRO

proc sql;

    create table work.city_traffic as

    select City,Country,Vehicle_Count,Pollution_Index,Avg_Speed,Transport_Score,

        Measurement_Date,Policy_Date,

        %pollution_case(var=Pollution_Index) as Pollution_Category length=12

    from  work.city_traffic_raw;

 quit;

proc print data=work.city_traffic;

run;

OUTPUT:

ObsCityCountryVehicle_CountPollution_IndexAvg_SpeedTransport_ScoreMeasurement_DatePolicy_DatePollution_Category
1HyderabadIndia125000095246215/01/202501JAN2024HIGH
2DelhiIndia1800000130205515/06/202501JUL2023VERY_HIGH
3MumbaiIndia1700000120225810/06/202515AUG2022VERY_HIGH
4BengaluruIndia135000088257005/05/202501JAN2023HIGH
5ChennaiIndia110000090236412/05/202501APR2023HIGH
6KolkataIndia1400000115215720/04/202501DEC2022HIGH
7LondonUnited Kingdom95000075307818/04/202501MAR2021MODERATE
8ParisFrance90000082287625/03/202515SEP2021HIGH
9New YorkUSA1600000105277330/03/202501JAN2022HIGH
10Los AngelesUSA1450000110266915/02/202501JUN2022HIGH
11TokyoJapan155000098298005/02/202501APR2020HIGH
12BeijingChina1900000140195210/01/202501JAN2023VERY_HIGH
13ShanghaiChina1750000125205618/01/202501MAY2022VERY_HIGH
14SingaporeSingapore65000060328508/03/202501JAN2019MODERATE
15SydneyAustralia70000065318322/02/202501JAN2020MODERATE
16DubaiUAE80000072297928/02/202501JUN2019MODERATE
17Mexico CityMexico1500000118225914/03/202501JAN2021HIGH
18JohannesburgSouth Africa90000092246311/03/202501JAN2022HIGH


4. ADD DATE-BASED DERIVED VARIABLES (YEAR, MONTH, POLICY AGE)

data work.city_traffic_enh;

    set work.city_traffic;

    /* Extract year and month from measurement date */

    Measurement_Year  = year(Measurement_Date);

    Measurement_Month = month(Measurement_Date);

    /* Age of policy in years at measurement date */

    Policy_Age_Years = intck('year', Policy_Date, Measurement_Date);

    format Measurement_Date ddmmyy10. Policy_Date date9.;

run;

proc print data=work.city_traffic_enh;

    title "CITY TRAFFIC AND POLLUTION DATA WITH DERIVED VARIABLES";

run;

title;

OUTPUT:

CITY TRAFFIC AND POLLUTION DATA WITH DERIVED VARIABLES

ObsCityCountryVehicle_CountPollution_IndexAvg_SpeedTransport_ScoreMeasurement_DatePolicy_DatePollution_CategoryMeasurement_YearMeasurement_MonthPolicy_Age_Years
1HyderabadIndia125000095246215/01/202501JAN2024HIGH202511
2DelhiIndia1800000130205515/06/202501JUL2023VERY_HIGH202562
3MumbaiIndia1700000120225810/06/202515AUG2022VERY_HIGH202563
4BengaluruIndia135000088257005/05/202501JAN2023HIGH202552
5ChennaiIndia110000090236412/05/202501APR2023HIGH202552
6KolkataIndia1400000115215720/04/202501DEC2022HIGH202543
7LondonUnited Kingdom95000075307818/04/202501MAR2021MODERATE202544
8ParisFrance90000082287625/03/202515SEP2021HIGH202534
9New YorkUSA1600000105277330/03/202501JAN2022HIGH202533
10Los AngelesUSA1450000110266915/02/202501JUN2022HIGH202523
11TokyoJapan155000098298005/02/202501APR2020HIGH202525
12BeijingChina1900000140195210/01/202501JAN2023VERY_HIGH202512
13ShanghaiChina1750000125205618/01/202501MAY2022VERY_HIGH202513
14SingaporeSingapore65000060328508/03/202501JAN2019MODERATE202536
15SydneyAustralia70000065318322/02/202501JAN2020MODERATE202525
16DubaiUAE80000072297928/02/202501JUN2019MODERATE202526
17Mexico CityMexico1500000118225914/03/202501JAN2021HIGH202534
18JohannesburgSouth Africa90000092246311/03/202501JAN2022HIGH202533

5. PROC MEANS – SUMMARY STATISTICS BY COUNTRY

proc means data=work.city_traffic_enh n mean std min max maxdec=2;

    class Country;

    var Vehicle_Count Pollution_Index Avg_Speed Transport_Score;

    output out=work.city_traffic_summary     mean= / autoname;

    title "SUMMARY STATISTICS OF TRAFFIC AND POLLUTION BY COUNTRY";

run;

title;

proc print data=work.city_traffic_summary;

run;

OUTPUT:

SUMMARY STATISTICS OF TRAFFIC AND POLLUTION BY COUNTRY

The MEANS Procedure

CountryN ObsVariableNMeanStd DevMinimumMaximum
Australia1
Vehicle_Count
Pollution_Index
Avg_Speed
Transport_Score
1
1
1
1
700000.00
65.00
31.00
83.00
.
.
.
.
700000.00
65.00
31.00
83.00
700000.00
65.00
31.00
83.00
China2
Vehicle_Count
Pollution_Index
Avg_Speed
Transport_Score
2
2
2
2
1825000.00
132.50
19.50
54.00
106066.02
10.61
0.71
2.83
1750000.00
125.00
19.00
52.00
1900000.00
140.00
20.00
56.00
France1
Vehicle_Count
Pollution_Index
Avg_Speed
Transport_Score
1
1
1
1
900000.00
82.00
28.00
76.00
.
.
.
.
900000.00
82.00
28.00
76.00
900000.00
82.00
28.00
76.00
India6
Vehicle_Count
Pollution_Index
Avg_Speed
Transport_Score
6
6
6
6
1433333.33
106.33
22.50
61.00
267706.31
17.63
1.87
5.51
1100000.00
88.00
20.00
55.00
1800000.00
130.00
25.00
70.00
Japan1
Vehicle_Count
Pollution_Index
Avg_Speed
Transport_Score
1
1
1
1
1550000.00
98.00
29.00
80.00
.
.
.
.
1550000.00
98.00
29.00
80.00
1550000.00
98.00
29.00
80.00
Mexico1
Vehicle_Count
Pollution_Index
Avg_Speed
Transport_Score
1
1
1
1
1500000.00
118.00
22.00
59.00
.
.
.
.
1500000.00
118.00
22.00
59.00
1500000.00
118.00
22.00
59.00
Singapore1
Vehicle_Count
Pollution_Index
Avg_Speed
Transport_Score
1
1
1
1
650000.00
60.00
32.00
85.00
.
.
.
.
650000.00
60.00
32.00
85.00
650000.00
60.00
32.00
85.00
South Africa1
Vehicle_Count
Pollution_Index
Avg_Speed
Transport_Score
1
1
1
1
900000.00
92.00
24.00
63.00
.
.
.
.
900000.00
92.00
24.00
63.00
900000.00
92.00
24.00
63.00
UAE1
Vehicle_Count
Pollution_Index
Avg_Speed
Transport_Score
1
1
1
1
800000.00
72.00
29.00
79.00
.
.
.
.
800000.00
72.00
29.00
79.00
800000.00
72.00
29.00
79.00
USA2
Vehicle_Count
Pollution_Index
Avg_Speed
Transport_Score
2
2
2
2
1525000.00
107.50
26.50
71.00
106066.02
3.54
0.71
2.83
1450000.00
105.00
26.00
69.00
1600000.00
110.00
27.00
73.00
United Kingdom1
Vehicle_Count
Pollution_Index
Avg_Speed
Transport_Score
1
1
1
1
950000.00
75.00
30.00
78.00
.
.
.
.
950000.00
75.00
30.00
78.00
950000.00
75.00
30.00
78.00
ObsCountry_TYPE__FREQ_Vehicle_Count_MeanPollution_Index_MeanAvg_Speed_MeanTransport_Score_Mean
1 0181291666.6798.88925.111167.7222
2Australia11700000.0065.00031.000083.0000
3China121825000.00132.50019.500054.0000
4France11900000.0082.00028.000076.0000
5India161433333.33106.33322.500061.0000
6Japan111550000.0098.00029.000080.0000
7Mexico111500000.00118.00022.000059.0000
8Singapore11650000.0060.00032.000085.0000
9South Africa11900000.0092.00024.000063.0000
10UAE11800000.0072.00029.000079.0000
11USA121525000.00107.50026.500071.0000
12United Kingdom11950000.0075.00030.000078.0000

6. PROC SQL – EXAMPLE SUMMARY TABLE USING POLLUTION CATEGORY

proc sql;

    create table work.pollution_category_summary as

    select Pollution_Category, count(*) as City_Count,avg(Vehicle_Count) as Avg_Vehicles,

           avg(Pollution_Index) as Avg_Pollution_Index,

           avg(Avg_Speed) as Avg_Speed,

           avg(Transport_Score) as Avg_Transport_Score

   from work.city_traffic_enh

   group by Pollution_Category

   order by City_Count desc;

quit;    

proc print data=work.city_traffic_summary;

run;   

OUTPUT:

ObsCountry_TYPE__FREQ_Vehicle_Count_MeanPollution_Index_MeanAvg_Speed_MeanTransport_Score_Mean
1 0181291666.6798.88925.111167.7222
2Australia11700000.0065.00031.000083.0000
3China121825000.00132.50019.500054.0000
4France11900000.0082.00028.000076.0000
5India161433333.33106.33322.500061.0000
6Japan111550000.0098.00029.000080.0000
7Mexico111500000.00118.00022.000059.0000
8Singapore11650000.0060.00032.000085.0000
9South Africa11900000.0092.00024.000063.0000
10UAE11800000.0072.00029.000079.0000
11USA121525000.00107.50026.500071.0000
12United Kingdom11950000.0075.00030.000078.0000


6. PROC REG – MODEL POLLUTION AS FUNCTION OF TRAFFIC METRICS

proc reg data=work.city_traffic_enh outest=work.city_reg_estimates;

    model Pollution_Index = Vehicle_Count Avg_Speed Transport_Score;

    title "REGRESSION OF POLLUTION INDEX ON TRAFFIC AND TRANSPORT VARIABLES";

run;

quit;

title;

OUTPUT:

REGRESSION OF POLLUTION INDEX ON TRAFFIC AND TRANSPORT VARIABLES

The REG Procedure

Model: MODEL1

Dependent Variable: Pollution_Index

Number of Observations Read18
Number of Observations Used18
Analysis of Variance
SourceDFSum of
Squares
Mean
Square
F ValuePr > F
Model38739.024582913.00819122.56<.0001
Error14332.7532023.76809  
Corrected Total179071.77778   
Root MSE4.87525R-Square0.9633
Dependent Mean98.88889Adj R-Sq0.9555
Coeff Var4.93003  
Parameter Estimates
VariableDFParameter
Estimate
Standard
Error
t ValuePr > |t|
Intercept1110.1170816.223346.79<.0001
Vehicle_Count10.000037700.000004668.09<.0001
Avg_Speed11.175742.133050.550.5902
Transport_Score1-1.320890.77895-1.700.1121

REGRESSION OF POLLUTION INDEX ON TRAFFIC AND TRANSPORT VARIABLES

The REG Procedure

Model: MODEL1

Dependent Variable: Pollution_Index

Panel of fit diagnostics for Pollution_Index.
Panel of scatterplots of residuals by regressors for Pollution_Index.

7. PROC SGPLOT – SCATTER + REGRESSION LINE

proc sgplot data=work.city_traffic_enh;

    scatter x=Vehicle_Count 

            y=Pollution_Index 

            / group=Pollution_Category datalabel=City;

    reg     x=Vehicle_Count 

            y=Pollution_Index;

    xaxis label="Vehicle Count (approx. daily vehicles)";

    yaxis label="Pollution Index (AQI-like score)";

    title "RELATIONSHIP BETWEEN VEHICLE COUNT AND POLLUTION INDEX";

run;

title;

OUTPUT:

The SGPlot Procedure


8. PROC SGPLOT – BOXPLOT OF POLLUTION BY CATEGORY

proc sgplot data=work.city_traffic_enh;

    vbox Pollution_Index / category=Pollution_Category;

    yaxis label="Pollution Index";

    xaxis label="Pollution Category";

    title "DISTRIBUTION OF POLLUTION INDEX BY CATEGORY";

run;

title;

OUTPUT:
The SGPlot Procedure





To Visit My Previous Software Company Analysis Dataset:Click Here
To Visit My Previous Vote Program Dataset:Click Here
To Visit My Previous Audi Cars Performance Analysis Dataset:Click Here
To Visit My Previous Global Clothing Trends Dataset:Click Here  



Follow Us On : 


 


--- FOLLOW OUR BLOG FOR MORE INFORMATION.

--->PLEASE DO COMMENTS AND SHARE OUR BLOG.





Comments

Popular posts from this blog

409.Can We Build a Reliable Emergency Services Analytics & Fraud Detection System in SAS While Identifying and Fixing Intentional Errors?

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study