267.Which Bus Routes In India Carry The Most Passengers? Let Sas Reveal The Data

Which Bus Routes In India Carry The Most Passengers? Let Sas Reveal The Data

/*CREATING A DATASET OF BUS PASSENGERS OF INDIA*/

1) MASTER CONTROL OPTIONS 

options nocenter nodate nonumber missing=' ' formchar="|----|+|---|+=|-/\<>*";


2) USER FORMATS FOR CLEANER REPORTING    

proc format;

  value $gender   'M'='Male' 'F'='Female' 'O'='Other';

  value $tkt      'REG'='Regular' 'STU'='Student' 'SNR'='Senior' 'WKP'='Weekly Pass';

  value $pay      'CASH'='Cash' 'UPI'='UPI' 'CARD'='Card' 'WAL'='Wallet';

  value $seat     'S'='Seater' 'SL'='Sleeper';

  value $ac       'A'='AC' 'N'='Non-AC';

  value $btype    'EXP'='Express' 'LOC'='Local' 'VOL'='Volvo';

  value ageseg    low-17='Child (<=17)'

                  18-24='Youth (18-24)'

                  25-44='Adult (25-44)'

                  45-59='Mid-age (45-59)'

                  60-high='Senior (60+)';

  value sats      1='Very Low' 2='Low' 3='Medium' 4='High' 5='Very High';

run;

Log:

NOTE: Format $GENDER has been output.

NOTE: Format $TKT has been output.

NOTE: Format $PAY has been output.

NOTE: Format $SEAT has been output.

NOTE: Format $AC has been output.

NOTE: Format $BTYPE has been output.

NOTE: Format AGESEG has been output.

NOTE: Format SATS has been output.


3) LOOKUP TABLES          |

3.1 Bus Operators 

data operators;

  length Operator_ID 8 Operator_Name $30 HQ_City $20 Rating 8;

  infile datalines dsd truncover;

  input Operator_ID Operator_Name :$30. HQ_City :$20. Rating;

  datalines;

1,TSRTC,Hyderabad,4.2

2,APSRTC,Vijayawada,4.0

3,MSRTC,Mumbai,3.9

4,KSRTC_Bengaluru,Bengaluru,4.4

5,UPSRTC,Delhi,3.7

;

run;

proc print;run;

Output:

ObsOperator_IDOperator_NameHQ_CityRating
11TSRTCHyderabad4.2
22APSRTCVijayawada4.0
33MSRTCMumbai3.9
44KSRTC_BengaluruBengaluru4.4
55UPSRTCDelhi3.7

3.2 Routes (origin-destination pairs) 

data routes;

  length Route_ID 8 Origin $20 Destination $20 Distance_km 8 Std_Fare 8;

  infile datalines dsd;

  input Route_ID Origin :$20. Destination :$20. Distance_km Std_Fare;

  datalines;

101,Hyderabad,Warangal,150,280

102,Hyderabad,Vijayawada,275,520

103,Mumbai,Pune,150,300

104,Bengaluru,Mysuru,150,320

105,Delhi,Agra,230,480

106,Chennai,Vellore,140,260

107,Hyderabad,Karimnagar,165,300

108,Pune,Nashik,210,360

109,Bengaluru,Tumakuru,75,150

110,Mumbai,Nashik,170,340

;

run;

proc print;run;

Output:

ObsRoute_IDOriginDestinationDistance_kmStd_Fare
1101HyderabadWarangal150280
2102HyderabadVijayawada275520
3103MumbaiPune150300
4104BengaluruMysuru150320
5105DelhiAgra230480
6106ChennaiVellore140260
7107HyderabadKarimnagar165300
8108PuneNashik210360
9109BengaluruTumakuru75150
10110MumbaiNashik170340

4) PRIMARY FACT TABLE: PASSENGER TRIPS   

data passengers;

  length Trip_ID 8 Passenger_ID 8 Passenger_Name $28 Gender $1 Age 8

         City $20 Route_ID 8 Operator_ID 8 Ticket_Type $3 Fare_INR 8

         Distance_km 8 Boarding_Time 8 Alighting_Time 8

         Payment_Method $4 Seat_Type $2 AC_Flag $1 Bus_Type $3

         Trip_Date 8 Delay_Minutes 8 Satisfaction 8;

  format Gender $gender. Ticket_Type $tkt. Payment_Method $pay.

         Seat_Type $seat. AC_Flag $ac. Bus_Type $btype. Trip_Date date9.

         Boarding_Time datetime19. Alighting_Time datetime19.

         Age ageseg. Satisfaction sats.;

  infile datalines dsd truncover;

  input Trip_ID Passenger_ID Passenger_Name :$28. Gender $ Age City :$20.

        Route_ID Operator_ID Ticket_Type $ Fare_INR Distance_km

        Boarding_Time :datetime20. Alighting_Time :datetime20.

        Payment_Method $ Seat_Type $ AC_Flag $ Bus_Type $ Trip_Date :date9.

        Delay_Minutes Satisfaction;

  datalines;

1,1001,Anil Reddy,M,29,Hyderabad,101,1,REG,300,150,21JUL2025:07:10:00,21JUL2025:10:05:00,UPI,S,A,EXP,21JUL2025,5,4

2,1002,Neha Sharma,F,22,Hyderabad,102,2,STU,450,275,22JUL2025:06:40:00,22JUL2025:12:10:00,CARD,S,A,VOL,22JUL2025,10,5

3,1003,Rahul Patil,M,34,Mumbai,103,3,REG,320,150,19JUL2025:08:05:00,19JUL2025:11:05:00,UPI,S,N,LOC,19JUL2025,0,3

4,1004,Priyanka Rao,F,27,Bengaluru,104,4,REG,340,150,18JUL2025:09:00:00,18JUL2025:12:00:00,WAL,S,A,EXP,18JUL2025,7,4

5,1005,Amit Singh,M,41,Delhi,105,5,REG,500,230,17JUL2025:13:20:00,17JUL2025:17:50:00,CASH,SL,A,VOL,17JUL2025,20,2

6,1006,Sunita Iyer,F,63,Chennai,106,2,SNR,220,140,16JUL2025:06:30:00,16JUL2025:09:15:00,UPI,S,N,LOC,16JUL2025,3,4

7,1007,Farhan Shaikh,M,19,Mumbai,103,3,STU,250,150,21JUL2025:18:30:00,21JUL2025:21:30:00,WAL,S,A,EXP,21JUL2025,2,5

8,1008,Sruthi Nair,F,31,Bengaluru,109,4,WKP,140,75,21JUL2025:07:50:00,21JUL2025:09:10:00,UPI,S,N,LOC,21JUL2025,1,4

9,1009,Rakesh Kumar,M,47,Hyderabad,107,1,REG,310,165,20JUL2025:05:55:00,20JUL2025:09:05:00,CARD,SL,A,VOL,20JUL2025,0,3

10,1010,Kavya Joshi,F,25,Pune,108,3,REG,360,210,22JUL2025:10:30:00,22JUL2025:14:15:00,UPI,S,N,EXP,22JUL2025,4,4

11,1011,Abhinav Jain,M,33,Delhi,105,5,REG,480,230,21JUL2025:08:00:00,21JUL2025:12:30:00,UPI,S,A,EXP,21JUL2025,12,3

12,1012,Meena Kumari,F,58,Chennai,106,2,SNR,240,140,21JUL2025:15:45:00,21JUL2025:18:30:00,CASH,S,N,LOC,21JUL2025,0,5

13,1013,Arun Gupta,M,28,Hyderabad,102,2,REG,520,275,20JUL2025:20:00:00,21JUL2025:01:30:00,UPI,S,A,VOL,20JUL2025,15,4

14,1014,Sahana R,F,24,Bengaluru,104,4,STU,260,150,22JUL2025:16:40:00,22JUL2025:19:40:00,UPI,S,N,LOC,22JUL2025,0,5

15,1015,Vikram Mehta,M,39,Mumbai,110,3,REG,340,170,18JUL2025:06:20:00,18JUL2025:09:10:00,CARD,SL,A,EXP,18JUL2025,6,4

;

run;

proc print;run;

Output:

ObsTrip_IDPassenger_IDPassenger_NameGenderAgeCityRoute_IDOperator_IDTicket_TypeFare_INRDistance_kmBoarding_TimeAlighting_TimePayment_MethodSeat_TypeAC_FlagBus_TypeTrip_DateDelay_MinutesSatisfaction
111001Anil ReddyMaleAdult (25-44)Hyderabad1011Regular30015021JUL2025:07:10:0021JUL2025:10:05:00UPISeaterACExpress21JUL20255High
221002Neha SharmaFemaleYouth (18-24)Hyderabad1022Student45027522JUL2025:06:40:0022JUL2025:12:10:00CardSeaterACVolvo22JUL202510Very High
331003Rahul PatilMaleAdult (25-44)Mumbai1033Regular32015019JUL2025:08:05:0019JUL2025:11:05:00UPISeaterNon-ACLocal19JUL20250Medium
441004Priyanka RaoFemaleAdult (25-44)Bengaluru1044Regular34015018JUL2025:09:00:0018JUL2025:12:00:00WalletSeaterACExpress18JUL20257High
551005Amit SinghMaleAdult (25-44)Delhi1055Regular50023017JUL2025:13:20:0017JUL2025:17:50:00CashSleeperACVolvo17JUL202520Low
661006Sunita IyerFemaleSenior (60+)Chennai1062Senior22014016JUL2025:06:30:0016JUL2025:09:15:00UPISeaterNon-ACLocal16JUL20253High
771007Farhan ShaikhMaleYouth (18-24)Mumbai1033Student25015021JUL2025:18:30:0021JUL2025:21:30:00WalletSeaterACExpress21JUL20252Very High
881008Sruthi NairFemaleAdult (25-44)Bengaluru1094Weekly Pass1407521JUL2025:07:50:0021JUL2025:09:10:00UPISeaterNon-ACLocal21JUL20251High
991009Rakesh KumarMaleMid-age (45-59)Hyderabad1071Regular31016520JUL2025:05:55:0020JUL2025:09:05:00CardSleeperACVolvo20JUL20250Medium
10101010Kavya JoshiFemaleAdult (25-44)Pune1083Regular36021022JUL2025:10:30:0022JUL2025:14:15:00UPISeaterNon-ACExpress22JUL20254High
11111011Abhinav JainMaleAdult (25-44)Delhi1055Regular48023021JUL2025:08:00:0021JUL2025:12:30:00UPISeaterACExpress21JUL202512Medium
12121012Meena KumariFemaleMid-age (45-59)Chennai1062Senior24014021JUL2025:15:45:0021JUL2025:18:30:00CashSeaterNon-ACLocal21JUL20250Very High
13131013Arun GuptaMaleAdult (25-44)Hyderabad1022Regular52027520JUL2025:20:00:0021JUL2025:01:30:00UPISeaterACVolvo20JUL202515High
14141014Sahana RFemaleYouth (18-24)Bengaluru1044Student26015022JUL2025:16:40:0022JUL2025:19:40:00UPISeaterNon-ACLocal22JUL20250Very High
15151015Vikram MehtaMaleAdult (25-44)Mumbai1103Regular34017018JUL2025:06:20:0018JUL2025:09:10:00CardSleeperACExpress18JUL20256High


5) METADATA INSPECTION                   

proc contents data=passengers varnum; 

run;

Output:

The CONTENTS Procedure

Data Set NameWORK.PASSENGERSObservations15
Member TypeDATAVariables20
EngineV9Indexes0
Created08/21/2025 19:16:53Observation Length160
Last Modified08/21/2025 19:16:53Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
Engine/Host Dependent Information
Data Set Page Size131072
Number of Data Set Pages1
First Data Page1
Max Obs per Page818
Obs in First Data Page15
Number of Data Set Repairs0
Filename/saswork/SAS_work4EA100014E17_odaws01-apse1-2.oda.sas.com/SAS_work819100014E17_odaws01-apse1-2.oda.sas.com/passengers.sas7bdat
Release Created9.0401M8
Host CreatedLinux
Inode Number201326817
Access Permissionrw-r--r--
Owner Nameu63247146
File Size256KB
File Size (bytes)262144
Variables in Creation Order
#VariableTypeLenFormat
1Trip_IDNum8 
2Passenger_IDNum8 
3Passenger_NameChar28 
4GenderChar1$GENDER.
5AgeNum8AGESEG.
6CityChar20 
7Route_IDNum8 
8Operator_IDNum8 
9Ticket_TypeChar3$TKT.
10Fare_INRNum8 
11Distance_kmNum8 
12Boarding_TimeNum8DATETIME19.
13Alighting_TimeNum8DATETIME19.
14Payment_MethodChar4$PAY.
15Seat_TypeChar2$SEAT.
16AC_FlagChar1$AC.
17Bus_TypeChar3$BTYPE.
18Trip_DateNum8DATE9.
19Delay_MinutesNum8 
20SatisfactionNum8SATS.

6) ENRICH FACTS WITH LOOKUPS (SQL JOIN)  

proc sql outobs=5;

  create table trips as

  select p.*,

         o.Operator_Name, o.HQ_City, o.Rating as Operator_Rating,

         r.Origin, r.Destination, r.Distance_km as Route_Distance_km,

         r.Std_Fare as Route_Std_Fare

  from passengers p

  left join operators o on p.Operator_ID = o.Operator_ID

  left join routes r    on p.Route_ID    = r.Route_ID;

quit;

proc print;run;

Output:

ObsTrip_IDPassenger_IDPassenger_NameGenderAgeCityRoute_IDOperator_IDTicket_TypeFare_INRDistance_kmBoarding_TimeAlighting_TimePayment_MethodSeat_TypeAC_FlagBus_TypeTrip_DateDelay_MinutesSatisfactionOperator_NameHQ_CityOperator_RatingOriginDestinationRoute_Distance_kmRoute_Std_Fare
111001Anil ReddyMaleAdult (25-44)Hyderabad1011Regular30015021JUL2025:07:10:0021JUL2025:10:05:00UPISeaterACExpress21JUL20255HighTSRTCHyderabad4.2HyderabadWarangal150280
2131013Arun GuptaMaleAdult (25-44)Hyderabad1022Regular52027520JUL2025:20:00:0021JUL2025:01:30:00UPISeaterACVolvo20JUL202515HighAPSRTCVijayawada4.0HyderabadVijayawada275520
321002Neha SharmaFemaleYouth (18-24)Hyderabad1022Student45027522JUL2025:06:40:0022JUL2025:12:10:00CardSeaterACVolvo22JUL202510Very HighAPSRTCVijayawada4.0HyderabadVijayawada275520
471007Farhan ShaikhMaleYouth (18-24)Mumbai1033Student25015021JUL2025:18:30:0021JUL2025:21:30:00WalletSeaterACExpress21JUL20252Very HighMSRTCMumbai3.9MumbaiPune150300
531003Rahul PatilMaleAdult (25-44)Mumbai1033Regular32015019JUL2025:08:05:0019JUL2025:11:05:00UPISeaterNon-ACLocal19JUL20250MediumMSRTCMumbai3.9MumbaiPune150300


7) DERIVATIONS & QUALITY CHECKS          

data trips_enriched;

  set trips;

  /* Travel time in minutes */

  Travel_Minutes = intck('minute', Boarding_Time, Alighting_Time);

  /* On-time flag (<=5 min delay) */

  On_Time = (Delay_Minutes <= 5);

  /* Fare per km (observed) and deviation from standard */

  Fare_per_km = round(Fare_INR / Distance_km, .01);

  Std_Fare_per_km = round(Route_Std_Fare / Route_Distance_km, .01);

  Fare_Deviation = Fare_per_km - Std_Fare_per_km;

  /* Day part for operational analysis */

  length Day_Part $10;

  hour_b = hour(Boarding_Time);

  if 5 <= hour_b < 12 then Day_Part='Morning';

  else if 12 <= hour_b < 17 then Day_Part='Afternoon';

  else if 17 <= hour_b <= 23 then Day_Part='Evening';

  else Day_Part='Night';

  drop hour_b;

  label Travel_Minutes = "Trip Duration (mins)"

        On_Time        = "On-time (<=5 min delay)"

        Fare_per_km    = "Fare per km (INR)"

        Fare_Deviation = "Fare-per-km minus standard";

run;

proc print;run;

Output:

ObsTrip_IDPassenger_IDPassenger_NameGenderAgeCityRoute_IDOperator_IDTicket_TypeFare_INRDistance_kmBoarding_TimeAlighting_TimePayment_MethodSeat_TypeAC_FlagBus_TypeTrip_DateDelay_MinutesSatisfactionOperator_NameHQ_CityOperator_RatingOriginDestinationRoute_Distance_kmRoute_Std_FareTravel_MinutesOn_TimeFare_per_kmStd_Fare_per_kmFare_DeviationDay_Part
111001Anil ReddyMaleAdult (25-44)Hyderabad1011Regular30015021JUL2025:07:10:0021JUL2025:10:05:00UPISeaterACExpress21JUL20255HighTSRTCHyderabad4.2HyderabadWarangal15028017512.001.870.13Morning
2131013Arun GuptaMaleAdult (25-44)Hyderabad1022Regular52027520JUL2025:20:00:0021JUL2025:01:30:00UPISeaterACVolvo20JUL202515HighAPSRTCVijayawada4.0HyderabadVijayawada27552033001.891.890.00Evening
321002Neha SharmaFemaleYouth (18-24)Hyderabad1022Student45027522JUL2025:06:40:0022JUL2025:12:10:00CardSeaterACVolvo22JUL202510Very HighAPSRTCVijayawada4.0HyderabadVijayawada27552033001.641.89-0.25Morning
471007Farhan ShaikhMaleYouth (18-24)Mumbai1033Student25015021JUL2025:18:30:0021JUL2025:21:30:00WalletSeaterACExpress21JUL20252Very HighMSRTCMumbai3.9MumbaiPune15030018011.672.00-0.33Evening
531003Rahul PatilMaleAdult (25-44)Mumbai1033Regular32015019JUL2025:08:05:0019JUL2025:11:05:00UPISeaterNon-ACLocal19JUL20250MediumMSRTCMumbai3.9MumbaiPune15030018012.132.000.13Morning


/* Basic sort to prepare for reports */

proc sort data=trips_enriched out=trips_sorted;

  by City Trip_Date descending Fare_INR;

run;

proc print data=trips_sorted (obs=5);

run;

ObsTrip_IDPassenger_IDPassenger_NameGenderAgeCityRoute_IDOperator_IDTicket_TypeFare_INRDistance_kmBoarding_TimeAlighting_TimePayment_MethodSeat_TypeAC_FlagBus_TypeTrip_DateDelay_MinutesSatisfactionOperator_NameHQ_CityOperator_RatingOriginDestinationRoute_Distance_kmRoute_Std_FareTravel_MinutesOn_TimeFare_per_kmStd_Fare_per_kmFare_DeviationDay_Part
1131013Arun GuptaMaleAdult (25-44)Hyderabad1022Regular52027520JUL2025:20:00:0021JUL2025:01:30:00UPISeaterACVolvo20JUL202515HighAPSRTCVijayawada4.0HyderabadVijayawada27552033001.891.890.00Evening
211001Anil ReddyMaleAdult (25-44)Hyderabad1011Regular30015021JUL2025:07:10:0021JUL2025:10:05:00UPISeaterACExpress21JUL20255HighTSRTCHyderabad4.2HyderabadWarangal15028017512.001.870.13Morning
321002Neha SharmaFemaleYouth (18-24)Hyderabad1022Student45027522JUL2025:06:40:0022JUL2025:12:10:00CardSeaterACVolvo22JUL202510Very HighAPSRTCVijayawada4.0HyderabadVijayawada27552033001.641.89-0.25Morning
431003Rahul PatilMaleAdult (25-44)Mumbai1033Regular32015019JUL2025:08:05:0019JUL2025:11:05:00UPISeaterNon-ACLocal19JUL20250MediumMSRTCMumbai3.9MumbaiPune15030018012.132.000.13Morning
571007Farhan ShaikhMaleYouth (18-24)Mumbai1033Student25015021JUL2025:18:30:0021JUL2025:21:30:00WalletSeaterACExpress21JUL20252Very HighMSRTCMumbai3.9MumbaiPune15030018011.672.00-0.33Evening


8) QUICK VIEWS                           

proc print data=trips_sorted(obs=10) label noobs;

  title "Sample of Enriched Trips — First 10 Rows";

run;

title;

Output:

Sample of Enriched Trips — First 15 Rows

Trip_IDPassenger_IDPassenger_NameGenderAgeCityRoute_IDOperator_IDTicket_TypeFare_INRDistance_kmBoarding_TimeAlighting_TimePayment_MethodSeat_TypeAC_FlagBus_TypeTrip_DateDelay_MinutesSatisfactionOperator_NameHQ_CityOperator_RatingOriginDestinationRoute_Distance_kmRoute_Std_FareTrip Duration (mins)On-time (<=5 min delay)Fare per km (INR)Std_Fare_per_kmFare-per-km minus standardDay_Part
131013Arun GuptaMaleAdult (25-44)Hyderabad1022Regular52027520JUL2025:20:00:0021JUL2025:01:30:00UPISeaterACVolvo20JUL202515HighAPSRTCVijayawada4.0HyderabadVijayawada27552033001.891.890.00Evening
11001Anil ReddyMaleAdult (25-44)Hyderabad1011Regular30015021JUL2025:07:10:0021JUL2025:10:05:00UPISeaterACExpress21JUL20255HighTSRTCHyderabad4.2HyderabadWarangal15028017512.001.870.13Morning
21002Neha SharmaFemaleYouth (18-24)Hyderabad1022Student45027522JUL2025:06:40:0022JUL2025:12:10:00CardSeaterACVolvo22JUL202510Very HighAPSRTCVijayawada4.0HyderabadVijayawada27552033001.641.89-0.25Morning
31003Rahul PatilMaleAdult (25-44)Mumbai1033Regular32015019JUL2025:08:05:0019JUL2025:11:05:00UPISeaterNon-ACLocal19JUL20250MediumMSRTCMumbai3.9MumbaiPune15030018012.132.000.13Morning
71007Farhan ShaikhMaleYouth (18-24)Mumbai1033Student25015021JUL2025:18:30:0021JUL2025:21:30:00WalletSeaterACExpress21JUL20252Very HighMSRTCMumbai3.9MumbaiPune15030018011.672.00-0.33Evening

9) FREQUENCY & DISTRIBUTION ANALYSIS     

9.1 Ticket type by city 

proc freq data=trips_enriched;

  tables City*Ticket_Type / nocol nopercent norow;

  title "Ticket Type Mix by City";

run;

Output:

Ticket Type Mix by City

The FREQ Procedure

Frequency
Table of City by Ticket_Type
CityTicket_Type
RegularStudentTotal
Hyderabad
2
1
3
Mumbai
1
1
2
Total
3
2
5

9.2 Payment method and AC preference 

proc freq data=trips_enriched;

  tables Payment_Method*AC_Flag / chisq;

  title "Payment Method vs AC/Non-AC";

run;

Output:

Payment Method vs AC/Non-AC

The FREQ Procedure

Frequency
Percent
Row Pct
Col Pct
Table of Payment_Method by AC_Flag
Payment_MethodAC_Flag
ACNon-ACTotal
Card
1
20.00
100.00
25.00
0
0.00
0.00
0.00
1
20.00
 
 
UPI
2
40.00
66.67
50.00
1
20.00
33.33
100.00
3
60.00
 
 
Wallet
1
20.00
100.00
25.00
0
0.00
0.00
0.00
1
20.00
 
 
Total
4
80.00
1
20.00
5
100.00

Statistics for Table of Payment_Method by AC_Flag

StatisticDFValueProb
WARNING: 100% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Chi-Square20.83330.6592
Likelihood Ratio Chi-Square21.18490.5530
Mantel-Haenszel Chi-Square10.00001.0000
Phi Coefficient 0.4082 
Contingency Coefficient 0.3780 
Cramer's V 0.4082 

Sample Size = 5

 9.3 Age distribution and satisfaction 

proc univariate data=trips_enriched noprint;

  var Fare_INR Travel_Minutes Delay_Minutes;

  histogram Fare_INR Travel_Minutes / normal;

  inset n mean std min max / position=ne;

run;

Output:

The UNIVARIATE Procedure

Histogram for Fare_INR


The UNIVARIATE Procedure

Fitted Normal Distribution for Fare_INR

Parameters for Normal Distribution
ParameterSymbolEstimate
MeanMu368
Std DevSigma112.5611
Goodness-of-Fit Tests for Normal Distribution
TestStatisticp Value
Kolmogorov-SmirnovD0.26510458Pr > D>0.150
Cramer-von MisesW-Sq0.05149527Pr > W-Sq>0.250
Anderson-DarlingA-Sq0.29400112Pr > A-Sq>0.250
Quantiles for Normal Distribution
PercentQuantile
ObservedEstimated
1.0250.000106.144
5.0250.000182.853
10.0250.000223.747
25.0300.000292.079
50.0320.000368.000
75.0450.000443.921
90.0520.000512.253
95.0520.000553.147
99.0520.000629.856

The UNIVARIATE Procedure

Histogram for Travel_Minutes


The UNIVARIATE Procedure

Fitted Normal Distribution for Travel_Minutes (Trip Duration (mins))

Parameters for Normal Distribution
ParameterSymbolEstimate
MeanMu239
Std DevSigma83.09633
Goodness-of-Fit Tests for Normal Distribution
TestStatisticp Value
Kolmogorov-SmirnovD0.36115392Pr > D0.031
Cramer-von MisesW-Sq0.13115599Pr > W-Sq0.029
Anderson-DarlingA-Sq0.75927167Pr > A-Sq0.019
Quantiles for Normal Distribution
PercentQuantile
ObservedEstimated
1.0175.00045.6890
5.0175.000102.3187
10.0175.000132.5078
25.0180.000182.9524
50.0180.000239.0000
75.0330.000295.0476
90.0330.000345.4922
95.0330.000375.6813
99.0330.000432.3110

10) NUMERIC SUMMARIES                    

proc means data=trips_enriched n mean std min p25 median p75 max maxdec=1;

  class City Bus_Type;

  var Fare_INR Distance_km Travel_Minutes Delay_Minutes Satisfaction;

  title "Key Measures by City and Bus Type";

run;

Output:

Key Measures by City and Bus Type

The MEANS Procedure

CityBus_TypeN ObsVariableLabelNMeanStd DevMinimum25th PctlMedian75th PctlMaximum
HyderabadExpress1
Fare_INR
Distance_km
Travel_Minutes
Delay_Minutes
Satisfaction
 
 
Trip Duration (mins)
 
 
1
1
1
1
1
300.0
150.0
175.0
5.0
4.0
 
 
 
 
 
300.0
150.0
175.0
5.0
4.0
300.0
150.0
175.0
5.0
4.0
300.0
150.0
175.0
5.0
4.0
300.0
150.0
175.0
5.0
4.0
300.0
150.0
175.0
5.0
4.0
 Volvo2
Fare_INR
Distance_km
Travel_Minutes
Delay_Minutes
Satisfaction
 
 
Trip Duration (mins)
 
 
2
2
2
2
2
485.0
275.0
330.0
12.5
4.5
49.5
0.0
0.0
3.5
0.7
450.0
275.0
330.0
10.0
4.0
450.0
275.0
330.0
10.0
4.0
485.0
275.0
330.0
12.5
4.5
520.0
275.0
330.0
15.0
5.0
520.0
275.0
330.0
15.0
5.0
MumbaiExpress1
Fare_INR
Distance_km
Travel_Minutes
Delay_Minutes
Satisfaction
 
 
Trip Duration (mins)
 
 
1
1
1
1
1
250.0
150.0
180.0
2.0
5.0
 
 
 
 
 
250.0
150.0
180.0
2.0
5.0
250.0
150.0
180.0
2.0
5.0
250.0
150.0
180.0
2.0
5.0
250.0
150.0
180.0
2.0
5.0
250.0
150.0
180.0
2.0
5.0
 Local1
Fare_INR
Distance_km
Travel_Minutes
Delay_Minutes
Satisfaction
 
 
Trip Duration (mins)
 
 
1
1
1
1
1
320.0
150.0
180.0
0.0
3.0
 
 
 
 
 
320.0
150.0
180.0
0.0
3.0
320.0
150.0
180.0
0.0
3.0
320.0
150.0
180.0
0.0
3.0
320.0
150.0
180.0
0.0
3.0
320.0
150.0
180.0
0.0
3.0

proc summary data=trips_enriched nway;

  class Operator_Name City;

  var Fare_INR Delay_Minutes;

  output out=op_city_summary(drop=_type_ _freq_) 

         n=Trips sum(Fare_INR)=Revenue_INR mean(Delay_Minutes)=Avg_Delay_Min;

run;

proc print;run;

Output:

ObsOperator_NameCityTripsRevenue_INRAvg_Delay_Min
1APSRTCHyderabad297012.5
2MSRTCMumbai25701.0
3TSRTCHyderabad13005.0


10.1 Rank operators by revenue within city 

proc rank data=op_city_summary out=op_city_rank ties=low descending;

  by City;

  var Revenue_INR;

  ranks Rev_Rank;

run;

Output:

ObsOperator_NameCityTripsRevenue_INRAvg_Delay_MinRev_Rank
1APSRTCHyderabad297012.51
2MSRTCMumbai25701.01
3TSRTCHyderabad13005.01


11) TABULATE & REPORT 

proc tabulate data=trips_enriched format=8.1;

  class City Ticket_Type Bus_Type;

  var Fare_INR Travel_Minutes;

  table City,

        (Ticket_Type all)*(Fare_INR*(mean) Travel_Minutes*(mean))

        Bus_Type*(Fare_INR*(mean)) / misstext='0';

  title "Average Fare and Duration by Ticket Type and Bus Type (City-wise)";

run;

Output:

Average Fare and Duration by Ticket Type and Bus Type (City-wise)

 Ticket_TypeAllBus_Type
RegularStudentExpressLocalVolvo
Fare_INRTrip Duration (mins)Fare_INRTrip Duration (mins)Fare_INRTrip Duration (mins)Fare_INRFare_INRFare_INR
MeanMeanMeanMeanMeanMeanMeanMeanMean
City410.0252.5450.0330.0423.3278.3300.00485.0
Hyderabad
Mumbai320.0180.0250.0180.0285.0180.0250.0320.00

proc report data=op_city_rank nowd headline headskip;

  columns City Operator_Name Trips Revenue_INR Rev_Rank Avg_Delay_Min;

  define City / group "City";

  define Operator_Name / display "Operator";

  define Trips / analysis sum "Trips";

  define Revenue_INR / analysis sum "Revenue (INR)" format=comma10.;

  define Rev_Rank / display "Revenue Rank (Within City)";

  define Avg_Delay_Min / analysis mean format=6.1 "Avg Delay (min)";

  title "Operator Performance by City — Ranked by Revenue";

run;

Output:

Operator Performance by City — Ranked by Revenue

CityOperatorTripsRevenue (INR)Revenue Rank (Within City)Avg Delay (min)
HyderabadAPSRTC2970112.5
 TSRTC130015.0
MumbaiMSRTC257011.0

12) TRANSPOSE EXAMPLE (WIDE PIVOT) 

12.1 Payment method revenue by city — then pivot to wide 

proc sql ;

  create table pay_city as

  select City, Payment_Method, sum(Fare_INR) as Revenue_INR

  from trips_enriched

  group by City, Payment_Method;

quit;

proc print;run;

Output:

ObsCityPayment_MethodRevenue_INR
1HyderabadCard450
2HyderabadUPI820
3MumbaiUPI320
4MumbaiWallet250


proc sort data=pay_city; 

    by City Payment_Method;

 run;


proc transpose data=pay_city out=pay_city_wide prefix=Rev_;

  by City;

  id Payment_Method;

  var Revenue_INR;

run;

proc print data=pay_city_wide;

run;

Output:

ObsCity_NAME_Rev_CardRev_UPIRev_Wallet
1HyderabadRevenue_INR450820 
2MumbaiRevenue_INR 320250

13) PROC SQL: KPIs, MONTH BUCKET, VIEWS  

proc sql;

/*   Month bucket (using Trip_Date) */

  create table kpi_month as

  select put(intnx('month', Trip_Date, 0, 'b'), monyy7.) as Mon label='Month',

         City,

         count(*) as Trips,

         mean(Fare_INR) as Avg_Fare,

         sum(Fare_INR)  as Revenue_INR,

         mean(Delay_Minutes) as Avg_Delay

  from trips_enriched

  group by calculated Mon, City;


/*   Route profitability metric (fare vs route standard) */

  create table route_perf as

  select Route_ID, Origin, Destination,

         count(*) as Trips,

         mean(Fare_per_km) as Avg_Fare_per_km,

         mean(Std_Fare_per_km) as Std_per_km,

         mean(Fare_Deviation)  as Avg_Deviation

  from trips_enriched

  group by Route_ID, Origin, Destination

  order by Avg_Deviation desc;


proc print data=kpi_month;

run;

Output:

ObsMonCityTripsAvg_FareRevenue_INRAvg_Delay
1JUL2025Hyderabad3423.333127010
2JUL2025Mumbai2285.0005701

proc print data=route_perf ;

run;

Output:

ObsRoute_IDOriginDestinationTripsAvg_Fare_per_kmStd_per_kmAvg_Deviation
1101HyderabadWarangal12.0001.870.130
2103MumbaiPune21.9002.00-0.100
3102HyderabadVijayawada21.7651.89-0.125

/*   A view for frequent student riders (ticket=STU or WKP) */

  create view v_student_frequent as

  select Passenger_ID, Passenger_Name, City, Ticket_Type, count(*) as Trips

  from trips_enriched

  where Ticket_Type in ('STU', 'WKP')

  group by Passenger_ID, Passenger_Name, City, Ticket_Type

  having calculated Trips >= 2;

quit;

proc print data=v_student_frequent ;

run;

Log:

NOTE: No observations in data set WORK.V_STUDENT_FREQUENT.
NOTE: There were 2 observations read from the data set WORK.TRIPS_ENRICHED.
WHERE Ticket_Type in ('STU', 'WKP');
NOTE: There were 0 observations read from the data set WORK.V_STUDENT_FREQUENT.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 5528.34k
OS Memory 29608.00k
Timestamp 08/21/2025 02:04:04 PM
Step Count 227 Switch Count 0
Page Faults 0
Page Reclaims 205
Page Swaps 0
Voluntary Context Switches 10
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0

14) VISUAL: SGPLOT                       

ods graphics on;

proc sgplot data=trips_enriched;

  vbox Fare_per_km / category=City;

  title "Fare per km — Distribution by City";

run;


proc sgplot data=trips_enriched;

  vbar City / response=Satisfaction stat=mean datalabel;

  title "Average Satisfaction by City";

run;

ods graphics off;

Output:

The SGPlot Procedure

The SGPlot Procedure

15) MACROS FOR AUTOMATION                

15.1 Macro to print a city-specific report with parameterized filters 

%macro city_report(city=, bus=, mindate=, maxdate=);

  %put NOTE: Running city_report for &city. bus=&bus. date=&mindate - &maxdate;

  title "City Report: &city (Bus=&bus, Dates=&mindate to &maxdate)";

  proc sql;

    create table _city as

    select *

    from trips_enriched

    where City="&city"

      %if %length(&bus) %then and Bus_Type="&bus";

      %if %length(&mindate) %then and Trip_Date >= "&mindate"d;

      %if %length(&maxdate) %then and Trip_Date <= "&maxdate"d;

    ;

  quit;


  proc means data=_city n mean std min p25 median p75 max;

    class Ticket_Type;

    var Fare_INR Travel_Minutes Delay_Minutes Satisfaction;

    title2 "Descriptive Stats by Ticket Type";

  run;


  proc freq data=_city;

    tables Payment_Method*AC_Flag / chisq;

    title2 "Payment vs AC Flag — &city";

  run;


  proc report data=_city nowd;

    columns Route_ID Origin Destination Bus_Type Ticket_Type Fare_INR Travel_Minutes Delay_Minutes Satisfaction;

    define Route_ID / group;

    define Origin / display;

    define Destination / display;

    define Bus_Type / display;

    define Ticket_Type / display;

    define Fare_INR / analysis mean format=8.1 "Avg Fare";

    define Travel_Minutes / analysis mean format=6.1 "Avg Minutes";

    define Delay_Minutes / analysis mean format=5.1 "Avg Delay";

    define Satisfaction / analysis mean format=4.1 "Avg Sat";

  run;

  title;

%mend;


15.2 Macro to generate operator leaderboard by KPI 

%macro operator_leaderboard(metric=Revenue_INR);

  proc sql;

    create table _op as

    select Operator_Name,

           sum(Fare_INR) as Revenue_INR,

           count(*) as Trips,

           mean(Satisfaction) as Avg_Sat,

           mean(Delay_Minutes) as Avg_Delay

    from trips_enriched

    group by Operator_Name;

  quit;


  %if %upcase(&metric)=REVENUE_INR %then %do;

    proc sort data=_op; by descending Revenue_INR; run;

  %end;

  %else %if %upcase(&metric)=TRIPS %then %do;

    proc sort data=_op; by descending Trips; run;

  %end;

  %else %if %upcase(&metric)=AVG_SAT %then %do;

    proc sort data=_op; by descending Avg_Sat; run;

  %end;

  %else %if %upcase(&metric)=AVG_DELAY %then %do;

    proc sort data=_op; by Avg_Delay; run;

  %end;


  proc print data=_op label noobs;

    label Revenue_INR='Revenue (INR)' Trips='Trips' Avg_Sat='Avg Satisfaction' Avg_Delay='Avg Delay (min)';

    title "Operator Leaderboard — Sorted by &metric";

  run; title;

%mend;


15.3 Macro loop to print common city snapshots 

%macro common_cities;

  %city_report(city=Hyderabad, bus=EXP, mindate=16JUL2025, maxdate=24JUL2025);

  %city_report(city=Mumbai,    bus=EXP, mindate=16JUL2025, maxdate=24JUL2025);

  %city_report(city=Bengaluru, bus=,    mindate=16JUL2025, maxdate=24JUL2025);

%mend;


%common_cities;

Output:

City Report: Hyderabad (Bus=EXP, Dates=16JUL2025 to 24JUL2025)

Descriptive Stats by Ticket Type

The MEANS Procedure

Ticket_TypeN ObsVariableLabelNMeanStd DevMinimum25th PctlMedian75th PctlMaximum
Regular1
Fare_INR
Travel_Minutes
Delay_Minutes
Satisfaction
 
Trip Duration (mins)
 
 
1
1
1
1
300.0000000
175.0000000
5.0000000
4.0000000
 
 
 
 
300.0000000
175.0000000
5.0000000
4.0000000
300.0000000
175.0000000
5.0000000
4.0000000
300.0000000
175.0000000
5.0000000
4.0000000
300.0000000
175.0000000
5.0000000
4.0000000
300.0000000
175.0000000
5.0000000
4.0000000

City Report: Hyderabad (Bus=EXP, Dates=16JUL2025 to 24JUL2025)

Payment vs AC Flag — Hyderabad

The FREQ Procedure

Frequency
Percent
Row Pct
Col Pct
Table of Payment_Method by AC_Flag
Payment_MethodAC_Flag
ACTotal
UPI
1
100.00
100.00
100.00
1
100.00
 
 
Total
1
100.00
1
100.00

City Report: Hyderabad (Bus=EXP, Dates=16JUL2025 to 24JUL2025)

Payment vs AC Flag — Hyderabad

Route_IDOriginDestinationBus_TypeTicket_TypeAvg FareAvg MinutesAvg DelayAvg Sat
101HyderabadWarangalExpressRegular300.0175.05.04.0

City Report: Mumbai (Bus=EXP, Dates=16JUL2025 to 24JUL2025)

Descriptive Stats by Ticket Type

The MEANS Procedure

Ticket_TypeN ObsVariableLabelNMeanStd DevMinimum25th PctlMedian75th PctlMaximum
Student1
Fare_INR
Travel_Minutes
Delay_Minutes
Satisfaction
 
Trip Duration (mins)
 
 
1
1
1
1
250.0000000
180.0000000
2.0000000
5.0000000
 
 
 
 
250.0000000
180.0000000
2.0000000
5.0000000
250.0000000
180.0000000
2.0000000
5.0000000
250.0000000
180.0000000
2.0000000
5.0000000
250.0000000
180.0000000
2.0000000
5.0000000
250.0000000
180.0000000
2.0000000
5.0000000

City Report: Mumbai (Bus=EXP, Dates=16JUL2025 to 24JUL2025)

Payment vs AC Flag — Mumbai

The FREQ Procedure

Frequency
Percent
Row Pct
Col Pct
Table of Payment_Method by AC_Flag
Payment_MethodAC_Flag
ACTotal
Wallet
1
100.00
100.00
100.00
1
100.00
 
 
Total
1
100.00
1
100.00

City Report: Mumbai (Bus=EXP, Dates=16JUL2025 to 24JUL2025)

Payment vs AC Flag — Mumbai

Route_IDOriginDestinationBus_TypeTicket_TypeAvg FareAvg MinutesAvg DelayAvg Sat
103MumbaiPuneExpressStudent250.0180.02.05.0

%operator_leaderboard(metric=Revenue_INR);

Output:

Operator Leaderboard — Sorted by Revenue_INR

Operator_NameRevenue (INR)TripsAvg SatisfactionAvg Delay (min)
APSRTC97024.512.5
MSRTC57024.01.0
TSRTC30014.05.0

%operator_leaderboard(metric=AVG_Delay);

Output:

Operator Leaderboard — Sorted by AVG_Delay

Operator_NameRevenue (INR)TripsAvg SatisfactionAvg Delay (min)
MSRTC57024.01.0
TSRTC30014.05.0
APSRTC97024.512.5


To Visit My Previous Online Streaming Flatform Dataset:Click Here
To Visit My Previous Statistical Evaluation Of Clinical Trials:Click Here
To Visit My Previous Unlocking Retail Insights Dataset:Click Here
To Visit My Previous Sas Interview Questions-1:Click Here





--->PLEASE FOLLOW OUR BLOG FOR MORE INFORMATION.
--->PLEASE DO COMMENTS AND SHARE OUR BLOG.

PLEASE FOLLOW OUR TELEGRAM CHANNEL CLICK HERE

PLEASE FOLLOW OUR FACEBOOK PAGE  CLICK HERE

PLEASE FOLLOW OUR INSTAGRAM PAGE CLICK HERE

Comments

Popular posts from this blog

409.Can We Build a Reliable Emergency Services Analytics & Fraud Detection System in SAS While Identifying and Fixing Intentional Errors?

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study