REALWORLD HIGHWAYS IN INDIA PROJECT DEMONSTRATED USING PROC PRINT | PROC MEANS | PROC FREQ | PROC SQL | PROC SORT | PROC REPORT | MACROS FOR CREATION, AGGREGATION, SUMMARIES, FILTERING, QC CHECKS AND AUTOMATED REPORTING WITH FULL ANALYSIS AND VALIDATION
1) Create raw dataset
options nocenter;
data work.india_highways;
infile datalines dlm='|' dsd truncover;
length HWY_ID $8 HWY_NAME $60 HWY_TYPE $6 START_CITY $30 END_CITY $30 STATES $120;
format LENGTH_KM 8.2;
input HWY_ID $ HWY_NAME $ HWY_TYPE $ START_CITY $ END_CITY $ LENGTH_KM STATES $;
datalines;
NH001|Delhi–Mumbai Expressway|NH|Delhi|Mumbai|1400.50|Delhi,Haryana,Rajasthan,Gujarat,Maharashtra
NH002|Bangalore–Chennai Highway|NH|Bangalore|Chennai|350.75|Karnataka,Tamil_Nadu
SH101|Mumbai–Pune Road|SH|Mumbai|Pune|150.40|Maharashtra
NH003|Kolkata–Durgapur Corridor|NH|Kolkata|Durgapur|180.20|West_Bengal
NH004|Amritsar–Jalandhar Link|NH|Amritsar|Jalandhar|80.00|Punjab
NH005|Agra–Lucknow Expressway|NH|Agra|Lucknow|302.00|Uttar_Pradesh
NH006|Hyderabad–Vijayawada Road|NH|Hyderabad|Vijayawada|275.30|Telangana,Andhra_Pradesh
SH102|Goa Coastal Road|SH|Panaji|Margao|60.10|Goa
NH007|Surat–Vadodara Highway|NH|Surat|Vadodara|150.00|Gujarat
NH008|Patna–Gaya Connector|NH|Patna|Gaya|92.80|Bihar
NH009|Chandigarh–Shimla Hill Road|NH|Chandigarh|Shimla|112.50|Punjab,Himachal_Pradesh
NH010|Bhopal–Indore Expressway|NH|Bhopal|Indore|200.25|Madhya_Pradesh
SH103|Siliguri Bypass|SH|Siliguri|Bagdogra|25.00|West_Bengal
NH011|Kochi–Calicut Coastal Link|NH|Kochi|Calicut|190.60|Kerala
NH012|Visakhapatnam–Vijayanagaram|NH|Visakhapatnam|Vijayanagaram|110.75|Andhra_Pradesh
;
run;
proc print;run;
Output:
| Obs | HWY_ID | HWY_NAME | HWY_TYPE | START_CITY | END_CITY | STATES | LENGTH_KM |
|---|---|---|---|---|---|---|---|
| 1 | NH001 | Delhi–Mumbai Expressway | NH | Delhi | Mumbai | Delhi,Haryana,Rajasthan,Gujarat,Maharashtra | 1400.50 |
| 2 | NH002 | Bangalore–Chennai Highway | NH | Bangalore | Chennai | Karnataka,Tamil_Nadu | 350.75 |
| 3 | SH101 | Mumbai–Pune Road | SH | Mumbai | Pune | Maharashtra | 150.40 |
| 4 | NH003 | Kolkata–Durgapur Corridor | NH | Kolkata | Durgapur | West_Bengal | 180.20 |
| 5 | NH004 | Amritsar–Jalandhar Link | NH | Amritsar | Jalandhar | Punjab | 80.00 |
| 6 | NH005 | Agra–Lucknow Expressway | NH | Agra | Lucknow | Uttar_Pradesh | 302.00 |
| 7 | NH006 | Hyderabad–Vijayawada Road | NH | Hyderabad | Vijayawada | Telangana,Andhra_Pradesh | 275.30 |
| 8 | SH102 | Goa Coastal Road | SH | Panaji | Margao | Goa | 60.10 |
| 9 | NH007 | Surat–Vadodara Highway | NH | Surat | Vadodara | Gujarat | 150.00 |
| 10 | NH008 | Patna–Gaya Connector | NH | Patna | Gaya | Bihar | 92.80 |
| 11 | NH009 | Chandigarh–Shimla Hill Road | NH | Chandigarh | Shimla | Punjab,Himachal_Pradesh | 112.50 |
| 12 | NH010 | Bhopal–Indore Expressway | NH | Bhopal | Indore | Madhya_Pradesh | 200.25 |
| 13 | SH103 | Siliguri Bypass | SH | Siliguri | Bagdogra | West_Bengal | 25.00 |
| 14 | NH011 | Kochi–Calicut Coastal Link | NH | Kochi | Calicut | Kerala | 190.60 |
| 15 | NH012 | Visakhapatnam–Vijayanagaram | NH | Visakhapatnam | Vijayanagaram | Andhra_Pradesh | 110.75 |
2) Quick print to inspect (PROC PRINT)
Purpose: Quick tabular inspection of the raw dataset to ensure correct import.
proc print data=work.india_highways(obs=10) noobs label;
title "India Highways - Raw Data";
run;
Output:
| HWY_ID | HWY_NAME | HWY_TYPE | START_CITY | END_CITY | STATES | LENGTH_KM |
|---|---|---|---|---|---|---|
| NH001 | Delhi–Mumbai Expressway | NH | Delhi | Mumbai | Delhi,Haryana,Rajasthan,Gujarat,Maharashtra | 1400.50 |
| NH002 | Bangalore–Chennai Highway | NH | Bangalore | Chennai | Karnataka,Tamil_Nadu | 350.75 |
| SH101 | Mumbai–Pune Road | SH | Mumbai | Pune | Maharashtra | 150.40 |
| NH003 | Kolkata–Durgapur Corridor | NH | Kolkata | Durgapur | West_Bengal | 180.20 |
| NH004 | Amritsar–Jalandhar Link | NH | Amritsar | Jalandhar | Punjab | 80.00 |
| NH005 | Agra–Lucknow Expressway | NH | Agra | Lucknow | Uttar_Pradesh | 302.00 |
| NH006 | Hyderabad–Vijayawada Road | NH | Hyderabad | Vijayawada | Telangana,Andhra_Pradesh | 275.30 |
| SH102 | Goa Coastal Road | SH | Panaji | Margao | Goa | 60.10 |
| NH007 | Surat–Vadodara Highway | NH | Surat | Vadodara | Gujarat | 150.00 |
| NH008 | Patna–Gaya Connector | NH | Patna | Gaya | Bihar | 92.80 |
3) Create a numeric variable for number of states traversed using DATA step + INSTR count
Purpose: Derive a numeric variable STATES_COUNT for aggregation and sorting.
data work.india_highways2;
set work.india_highways;
/* count states by counting commas + 1 (if not missing) */
if missing(STATES) then STATES_COUNT = 0;
else STATES_COUNT = countw(STATES, ',');
run;
proc print;run;
Output:
| Obs | HWY_ID | HWY_NAME | HWY_TYPE | START_CITY | END_CITY | STATES | LENGTH_KM | STATES_COUNT |
|---|---|---|---|---|---|---|---|---|
| 1 | NH001 | Delhi–Mumbai Expressway | NH | Delhi | Mumbai | Delhi,Haryana,Rajasthan,Gujarat,Maharashtra | 1400.50 | 5 |
| 2 | NH002 | Bangalore–Chennai Highway | NH | Bangalore | Chennai | Karnataka,Tamil_Nadu | 350.75 | 2 |
| 3 | SH101 | Mumbai–Pune Road | SH | Mumbai | Pune | Maharashtra | 150.40 | 1 |
| 4 | NH003 | Kolkata–Durgapur Corridor | NH | Kolkata | Durgapur | West_Bengal | 180.20 | 1 |
| 5 | NH004 | Amritsar–Jalandhar Link | NH | Amritsar | Jalandhar | Punjab | 80.00 | 1 |
| 6 | NH005 | Agra–Lucknow Expressway | NH | Agra | Lucknow | Uttar_Pradesh | 302.00 | 1 |
| 7 | NH006 | Hyderabad–Vijayawada Road | NH | Hyderabad | Vijayawada | Telangana,Andhra_Pradesh | 275.30 | 2 |
| 8 | SH102 | Goa Coastal Road | SH | Panaji | Margao | Goa | 60.10 | 1 |
| 9 | NH007 | Surat–Vadodara Highway | NH | Surat | Vadodara | Gujarat | 150.00 | 1 |
| 10 | NH008 | Patna–Gaya Connector | NH | Patna | Gaya | Bihar | 92.80 | 1 |
| 11 | NH009 | Chandigarh–Shimla Hill Road | NH | Chandigarh | Shimla | Punjab,Himachal_Pradesh | 112.50 | 2 |
| 12 | NH010 | Bhopal–Indore Expressway | NH | Bhopal | Indore | Madhya_Pradesh | 200.25 | 1 |
| 13 | SH103 | Siliguri Bypass | SH | Siliguri | Bagdogra | West_Bengal | 25.00 | 1 |
| 14 | NH011 | Kochi–Calicut Coastal Link | NH | Kochi | Calicut | Kerala | 190.60 | 1 |
| 15 | NH012 | Visakhapatnam–Vijayanagaram | NH | Visakhapatnam | Vijayanagaram | Andhra_Pradesh | 110.75 | 1 |
4) Format: define LENGTH categories using PROC FORMAT
Purpose: Create user-friendly categories for highway lengths to use in reports.
proc format;
value lengthcat
low - <100 = 'Short (<100 km)'
100 - <300 = 'Medium (100-299 km)'
300 - high = 'Long (>=300 km)';
run;
Log:
5) PROC MEANS for numeric summary
Purpose: Provide numerical summaries (count, mean, median, min, max, std) of numeric
vars.
proc means data=work.india_highways2 n mean median min max std maxdec=2;
var LENGTH_KM STATES_COUNT;
title "Numeric Summary: Lengths and States Count";
run;
Output:
The MEANS Procedure
| Variable | N | Mean | Median | Minimum | Maximum | Std Dev |
|---|---|---|---|---|---|---|
LENGTH_KM STATES_COUNT | 15 15 | 245.41 1.47 | 150.40 1.00 | 25.00 1.00 | 1400.50 5.00 | 332.33 1.06 |
6) PROC FREQ for categorical summaries
Purpose: Frequency distributions for types and cross-tab with length categories.
proc freq data=work.india_highways2;
tables HWY_TYPE / nocum norow nopercent;
format LENGTH_KM lengthcat.;
tables LENGTH_KM*HWY_TYPE / list;
title "Categorical Summary: Highway Type and Length categories";
run;
Output:
The FREQ Procedure
| HWY_TYPE | Frequency |
|---|---|
| NH | 12 |
| SH | 3 |
| LENGTH_KM | HWY_TYPE | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|---|
| Short (<100 km) | NH | 2 | 13.33 | 2 | 13.33 |
| Short (<100 km) | SH | 2 | 13.33 | 4 | 26.67 |
| Medium (100-299 km) | NH | 7 | 46.67 | 11 | 73.33 |
| Medium (100-299 km) | SH | 1 | 6.67 | 12 | 80.00 |
| Long (>=300 km) | NH | 3 | 20.00 | 15 | 100.00 |
7) PROC SQL: aggregate by HWY_TYPE
Purpose: Aggregate totals and averages by highway type using PROC SQL.
proc sql;
create table work.type_summary as
select HWY_TYPE,
count(*) as Num_Highways,
sum(LENGTH_KM) as Total_Length format=8.2,
mean(LENGTH_KM) as Avg_Length format=8.2
from work.india_highways2
group by HWY_TYPE
order by Total_Length desc;
quit;
proc print;run;
Output:
| Obs | HWY_TYPE | Num_Highways | Total_Length | Avg_Length |
|---|---|---|---|---|
| 1 | NH | 12 | 3445.65 | 287.14 |
| 2 | SH | 3 | 235.50 | 78.50 |
8) PROC SORT then PROC REPORT
Purpose: Sort highways by length (descending) for prioritized reports.
proc sort data=work.india_highways2 out=work.high_sorted;
by descending LENGTH_KM;
run;
proc print;run;
Output:
| Obs | HWY_ID | HWY_NAME | HWY_TYPE | START_CITY | END_CITY | STATES | LENGTH_KM | STATES_COUNT |
|---|---|---|---|---|---|---|---|---|
| 1 | NH001 | Delhi–Mumbai Expressway | NH | Delhi | Mumbai | Delhi,Haryana,Rajasthan,Gujarat,Maharashtra | 1400.50 | 5 |
| 2 | NH002 | Bangalore–Chennai Highway | NH | Bangalore | Chennai | Karnataka,Tamil_Nadu | 350.75 | 2 |
| 3 | NH005 | Agra–Lucknow Expressway | NH | Agra | Lucknow | Uttar_Pradesh | 302.00 | 1 |
| 4 | NH006 | Hyderabad–Vijayawada Road | NH | Hyderabad | Vijayawada | Telangana,Andhra_Pradesh | 275.30 | 2 |
| 5 | NH010 | Bhopal–Indore Expressway | NH | Bhopal | Indore | Madhya_Pradesh | 200.25 | 1 |
| 6 | NH011 | Kochi–Calicut Coastal Link | NH | Kochi | Calicut | Kerala | 190.60 | 1 |
| 7 | NH003 | Kolkata–Durgapur Corridor | NH | Kolkata | Durgapur | West_Bengal | 180.20 | 1 |
| 8 | SH101 | Mumbai–Pune Road | SH | Mumbai | Pune | Maharashtra | 150.40 | 1 |
| 9 | NH007 | Surat–Vadodara Highway | NH | Surat | Vadodara | Gujarat | 150.00 | 1 |
| 10 | NH009 | Chandigarh–Shimla Hill Road | NH | Chandigarh | Shimla | Punjab,Himachal_Pradesh | 112.50 | 2 |
| 11 | NH012 | Visakhapatnam–Vijayanagaram | NH | Visakhapatnam | Vijayanagaram | Andhra_Pradesh | 110.75 | 1 |
| 12 | NH008 | Patna–Gaya Connector | NH | Patna | Gaya | Bihar | 92.80 | 1 |
| 13 | NH004 | Amritsar–Jalandhar Link | NH | Amritsar | Jalandhar | Punjab | 80.00 | 1 |
| 14 | SH102 | Goa Coastal Road | SH | Panaji | Margao | Goa | 60.10 | 1 |
| 15 | SH103 | Siliguri Bypass | SH | Siliguri | Bagdogra | West_Bengal | 25.00 | 1 |
Purpose: Produce a formatted report combining multiple variables for a compact summary.
proc report data=work.high_sorted nowd;
columns HWY_ID HWY_NAME HWY_TYPE START_CITY END_CITY LENGTH_KM STATES_COUNT STATES;
define LENGTH_KM / analysis sum format=8.2 'Length (km)';
define STATES_COUNT / analysis sum 'States Count';
title "Detailed Highway Report (sorted by length)";
run;
Output:
| HWY_ID | HWY_NAME | HWY_TYPE | START_CITY | END_CITY | Length (km) | States Count | STATES |
|---|---|---|---|---|---|---|---|
| NH001 | Delhi–Mumbai Expressway | NH | Delhi | Mumbai | 1400.50 | 5 | Delhi,Haryana,Rajasthan,Gujarat,Maharashtra |
| NH002 | Bangalore–Chennai Highway | NH | Bangalore | Chennai | 350.75 | 2 | Karnataka,Tamil_Nadu |
| NH005 | Agra–Lucknow Expressway | NH | Agra | Lucknow | 302.00 | 1 | Uttar_Pradesh |
| NH006 | Hyderabad–Vijayawada Road | NH | Hyderabad | Vijayawada | 275.30 | 2 | Telangana,Andhra_Pradesh |
| NH010 | Bhopal–Indore Expressway | NH | Bhopal | Indore | 200.25 | 1 | Madhya_Pradesh |
| NH011 | Kochi–Calicut Coastal Link | NH | Kochi | Calicut | 190.60 | 1 | Kerala |
| NH003 | Kolkata–Durgapur Corridor | NH | Kolkata | Durgapur | 180.20 | 1 | West_Bengal |
| SH101 | Mumbai–Pune Road | SH | Mumbai | Pune | 150.40 | 1 | Maharashtra |
| NH007 | Surat–Vadodara Highway | NH | Surat | Vadodara | 150.00 | 1 | Gujarat |
| NH009 | Chandigarh–Shimla Hill Road | NH | Chandigarh | Shimla | 112.50 | 2 | Punjab,Himachal_Pradesh |
| NH012 | Visakhapatnam–Vijayanagaram | NH | Visakhapatnam | Vijayanagaram | 110.75 | 1 | Andhra_Pradesh |
| NH008 | Patna–Gaya Connector | NH | Patna | Gaya | 92.80 | 1 | Bihar |
| NH004 | Amritsar–Jalandhar Link | NH | Amritsar | Jalandhar | 80.00 | 1 | Punjab |
| SH102 | Goa Coastal Road | SH | Panaji | Margao | 60.10 | 1 | Goa |
| SH103 | Siliguri Bypass | SH | Siliguri | Bagdogra | 25.00 | 1 | West_Bengal |
9) Macro: filter dataset by type and produce summary (macro parameter: type=)
Purpose: Reusable macro to create a filtered table and numeric summary for a given
highway type.
%macro summary_by_type(type=NH);
%put NOTE: Running summary for type=&type;
proc sql;
create table work.&type._highways as
select * from work.india_highways2
where upcase(HWY_TYPE)=upcase("&type")
order by LENGTH_KM desc;
quit;
proc means data=work.&type._highways n mean median min max maxdec=2;
var LENGTH_KM STATES_COUNT;
title "Macro Summary: Type=&type - Numeric Summary";
run;
proc print data=work.&type._highways noobs;
title "Macro Output: All highways of type &type";
run;
%mend summary_by_type;
%summary_by_type(type=NH);
Output:
The MEANS Procedure
| Variable | N | Mean | Median | Minimum | Maximum |
|---|---|---|---|---|---|
LENGTH_KM STATES_COUNT | 12 12 | 287.14 1.58 | 185.40 1.00 | 80.00 1.00 | 1400.50 5.00 |
| HWY_ID | HWY_NAME | HWY_TYPE | START_CITY | END_CITY | STATES | LENGTH_KM | STATES_COUNT |
|---|---|---|---|---|---|---|---|
| NH001 | Delhi–Mumbai Expressway | NH | Delhi | Mumbai | Delhi,Haryana,Rajasthan,Gujarat,Maharashtra | 1400.50 | 5 |
| NH002 | Bangalore–Chennai Highway | NH | Bangalore | Chennai | Karnataka,Tamil_Nadu | 350.75 | 2 |
| NH005 | Agra–Lucknow Expressway | NH | Agra | Lucknow | Uttar_Pradesh | 302.00 | 1 |
| NH006 | Hyderabad–Vijayawada Road | NH | Hyderabad | Vijayawada | Telangana,Andhra_Pradesh | 275.30 | 2 |
| NH010 | Bhopal–Indore Expressway | NH | Bhopal | Indore | Madhya_Pradesh | 200.25 | 1 |
| NH011 | Kochi–Calicut Coastal Link | NH | Kochi | Calicut | Kerala | 190.60 | 1 |
| NH003 | Kolkata–Durgapur Corridor | NH | Kolkata | Durgapur | West_Bengal | 180.20 | 1 |
| NH007 | Surat–Vadodara Highway | NH | Surat | Vadodara | Gujarat | 150.00 | 1 |
| NH009 | Chandigarh–Shimla Hill Road | NH | Chandigarh | Shimla | Punjab,Himachal_Pradesh | 112.50 | 2 |
| NH012 | Visakhapatnam–Vijayanagaram | NH | Visakhapatnam | Vijayanagaram | Andhra_Pradesh | 110.75 | 1 |
| NH008 | Patna–Gaya Connector | NH | Patna | Gaya | Bihar | 92.80 | 1 |
| NH004 | Amritsar–Jalandhar Link | NH | Amritsar | Jalandhar | Punjab | 80.00 | 1 |
%summary_by_type(type=SH);
Output:
The MEANS Procedure
| Variable | N | Mean | Median | Minimum | Maximum |
|---|---|---|---|---|---|
LENGTH_KM STATES_COUNT | 3 3 | 78.50 1.00 | 60.10 1.00 | 25.00 1.00 | 150.40 1.00 |
| HWY_ID | HWY_NAME | HWY_TYPE | START_CITY | END_CITY | STATES | LENGTH_KM | STATES_COUNT |
|---|---|---|---|---|---|---|---|
| SH101 | Mumbai–Pune Road | SH | Mumbai | Pune | Maharashtra | 150.40 | 1 |
| SH102 | Goa Coastal Road | SH | Panaji | Margao | Goa | 60.10 | 1 |
| SH103 | Siliguri Bypass | SH | Siliguri | Bagdogra | West_Bengal | 25.00 | 1 |
10) Macro: create LENGTH_CATEGORY var and output a crosswalk table
Purpose: Macro to derive a categorical length variable and summarize counts/averages.
%macro create_length_cat(in=work.india_highways2, out=work.high_cat);
data &out;
set ∈
if LENGTH_KM < 100 then LENGTH_CAT='Short (<100)';
else if LENGTH_KM < 300 then LENGTH_CAT='Medium (100-299)';
else LENGTH_CAT='Long (>=300)';
run;
proc sql;
create table &out._summary as
select LENGTH_CAT, count(*) as Count_Hwys, mean(LENGTH_KM) as Avg_Length format=8.2
from &out
group by LENGTH_CAT;
quit;
proc print;run;
%mend create_length_cat;
%create_length_cat(in=work.india_highways2, out=work.high_cat);
Output:
| Obs | LENGTH_CAT | Count_Hwys | Avg_Length |
|---|---|---|---|
| 1 | Long (>=300) | 3 | 684.42 |
| 2 | Medium (100- | 8 | 171.25 |
| 3 | Short (<100) | 4 | 64.48 |
11) PROC SQL to find top 5 longest highways
Purpose: Extract top N (here 5) longest highways for focused review.
proc sql outobs=5;
create table work.top5_longest as
select HWY_ID, HWY_NAME, LENGTH_KM, STATES
from work.india_highways2
order by LENGTH_KM desc;
quit;
proc print;run;
Output:
| Obs | HWY_ID | HWY_NAME | LENGTH_KM | STATES |
|---|---|---|---|---|
| 1 | NH001 | Delhi–Mumbai Expressway | 1400.50 | Delhi,Haryana,Rajasthan,Gujarat,Maharashtra |
| 2 | NH002 | Bangalore–Chennai Highway | 350.75 | Karnataka,Tamil_Nadu |
| 3 | NH005 | Agra–Lucknow Expressway | 302.00 | Uttar_Pradesh |
| 4 | NH006 | Hyderabad–Vijayawada Road | 275.30 | Telangana,Andhra_Pradesh |
| 5 | NH010 | Bhopal–Indore Expressway | 200.25 | Madhya_Pradesh |
12) Basic QC: check duplicates on HWY_ID
Purpose: Identify duplicate HWY_IDs (if any) for data integrity checks.
proc sort data=work.india_highways2 out=work.qc_sorted nodupkey dupout=work.dup_hwys;
by HWY_ID;
run;
proc print data=work.dup_hwys noobs;
title "QC: Duplicate HWY_IDs (if present)";
run;
Log:
No comments:
Post a Comment