Thursday, 21 August 2025

267.BUS PASSENGERS OF INDIA — DATA CREATION | PROC FORMAT | PROC CONTENTS | PROC SORT | PROC PRINT | PROC FREQ | PROC MEANS | PROC SUMMARY | PROC UNIVARIATE | PROC TABULATE |PROC REPORT | PROC TRANSPOSE | PROC SQL | PROC RANK | PROC SGPLOT | MACROS — IN SAS

BUS PASSENGERS OF INDIA — DATA CREATION | PROC FORMAT | PROC CONTENTS | PROC SORT | PROC PRINT | PROC FREQ | PROC MEANS | PROC SUMMARY | PROC UNIVARIATE | PROC TABULATE |PROC REPORT | PROC TRANSPOSE | PROC SQL | PROC RANK | PROC SGPLOT | MACROS — IN SAS

/*CREATING A DATASET OF BUS PASSENGERS OF INDIA*/

1) MASTER CONTROL OPTIONS 

options nocenter nodate nonumber missing=' ' formchar="|----|+|---|+=|-/\<>*";


2) USER FORMATS FOR CLEANER REPORTING    

proc format;

  value $gender   'M'='Male' 'F'='Female' 'O'='Other';

  value $tkt      'REG'='Regular' 'STU'='Student' 'SNR'='Senior' 'WKP'='Weekly Pass';

  value $pay      'CASH'='Cash' 'UPI'='UPI' 'CARD'='Card' 'WAL'='Wallet';

  value $seat     'S'='Seater' 'SL'='Sleeper';

  value $ac       'A'='AC' 'N'='Non-AC';

  value $btype    'EXP'='Express' 'LOC'='Local' 'VOL'='Volvo';

  value ageseg    low-17='Child (<=17)'

                  18-24='Youth (18-24)'

                  25-44='Adult (25-44)'

                  45-59='Mid-age (45-59)'

                  60-high='Senior (60+)';

  value sats      1='Very Low' 2='Low' 3='Medium' 4='High' 5='Very High';

run;

Log:

NOTE: Format $GENDER has been output.

NOTE: Format $TKT has been output.

NOTE: Format $PAY has been output.

NOTE: Format $SEAT has been output.

NOTE: Format $AC has been output.

NOTE: Format $BTYPE has been output.

NOTE: Format AGESEG has been output.

NOTE: Format SATS has been output.


3) LOOKUP TABLES          |

3.1 Bus Operators 

data operators;

  length Operator_ID 8 Operator_Name $30 HQ_City $20 Rating 8;

  infile datalines dsd truncover;

  input Operator_ID Operator_Name :$30. HQ_City :$20. Rating;

  datalines;

1,TSRTC,Hyderabad,4.2

2,APSRTC,Vijayawada,4.0

3,MSRTC,Mumbai,3.9

4,KSRTC_Bengaluru,Bengaluru,4.4

5,UPSRTC,Delhi,3.7

;

run;

proc print;run;

Output:

ObsOperator_IDOperator_NameHQ_CityRating
11TSRTCHyderabad4.2
22APSRTCVijayawada4.0
33MSRTCMumbai3.9
44KSRTC_BengaluruBengaluru4.4
55UPSRTCDelhi3.7

3.2 Routes (origin-destination pairs) 

data routes;

  length Route_ID 8 Origin $20 Destination $20 Distance_km 8 Std_Fare 8;

  infile datalines dsd;

  input Route_ID Origin :$20. Destination :$20. Distance_km Std_Fare;

  datalines;

101,Hyderabad,Warangal,150,280

102,Hyderabad,Vijayawada,275,520

103,Mumbai,Pune,150,300

104,Bengaluru,Mysuru,150,320

105,Delhi,Agra,230,480

106,Chennai,Vellore,140,260

107,Hyderabad,Karimnagar,165,300

108,Pune,Nashik,210,360

109,Bengaluru,Tumakuru,75,150

110,Mumbai,Nashik,170,340

;

run;

proc print;run;

Output:

ObsRoute_IDOriginDestinationDistance_kmStd_Fare
1101HyderabadWarangal150280
2102HyderabadVijayawada275520
3103MumbaiPune150300
4104BengaluruMysuru150320
5105DelhiAgra230480
6106ChennaiVellore140260
7107HyderabadKarimnagar165300
8108PuneNashik210360
9109BengaluruTumakuru75150
10110MumbaiNashik170340

4) PRIMARY FACT TABLE: PASSENGER TRIPS   

data passengers;

  length Trip_ID 8 Passenger_ID 8 Passenger_Name $28 Gender $1 Age 8

         City $20 Route_ID 8 Operator_ID 8 Ticket_Type $3 Fare_INR 8

         Distance_km 8 Boarding_Time 8 Alighting_Time 8

         Payment_Method $4 Seat_Type $2 AC_Flag $1 Bus_Type $3

         Trip_Date 8 Delay_Minutes 8 Satisfaction 8;

  format Gender $gender. Ticket_Type $tkt. Payment_Method $pay.

         Seat_Type $seat. AC_Flag $ac. Bus_Type $btype. Trip_Date date9.

         Boarding_Time datetime19. Alighting_Time datetime19.

         Age ageseg. Satisfaction sats.;

  infile datalines dsd truncover;

  input Trip_ID Passenger_ID Passenger_Name :$28. Gender $ Age City :$20.

        Route_ID Operator_ID Ticket_Type $ Fare_INR Distance_km

        Boarding_Time :datetime20. Alighting_Time :datetime20.

        Payment_Method $ Seat_Type $ AC_Flag $ Bus_Type $ Trip_Date :date9.

        Delay_Minutes Satisfaction;

  datalines;

1,1001,Anil Reddy,M,29,Hyderabad,101,1,REG,300,150,21JUL2025:07:10:00,21JUL2025:10:05:00,UPI,S,A,EXP,21JUL2025,5,4

2,1002,Neha Sharma,F,22,Hyderabad,102,2,STU,450,275,22JUL2025:06:40:00,22JUL2025:12:10:00,CARD,S,A,VOL,22JUL2025,10,5

3,1003,Rahul Patil,M,34,Mumbai,103,3,REG,320,150,19JUL2025:08:05:00,19JUL2025:11:05:00,UPI,S,N,LOC,19JUL2025,0,3

4,1004,Priyanka Rao,F,27,Bengaluru,104,4,REG,340,150,18JUL2025:09:00:00,18JUL2025:12:00:00,WAL,S,A,EXP,18JUL2025,7,4

5,1005,Amit Singh,M,41,Delhi,105,5,REG,500,230,17JUL2025:13:20:00,17JUL2025:17:50:00,CASH,SL,A,VOL,17JUL2025,20,2

6,1006,Sunita Iyer,F,63,Chennai,106,2,SNR,220,140,16JUL2025:06:30:00,16JUL2025:09:15:00,UPI,S,N,LOC,16JUL2025,3,4

7,1007,Farhan Shaikh,M,19,Mumbai,103,3,STU,250,150,21JUL2025:18:30:00,21JUL2025:21:30:00,WAL,S,A,EXP,21JUL2025,2,5

8,1008,Sruthi Nair,F,31,Bengaluru,109,4,WKP,140,75,21JUL2025:07:50:00,21JUL2025:09:10:00,UPI,S,N,LOC,21JUL2025,1,4

9,1009,Rakesh Kumar,M,47,Hyderabad,107,1,REG,310,165,20JUL2025:05:55:00,20JUL2025:09:05:00,CARD,SL,A,VOL,20JUL2025,0,3

10,1010,Kavya Joshi,F,25,Pune,108,3,REG,360,210,22JUL2025:10:30:00,22JUL2025:14:15:00,UPI,S,N,EXP,22JUL2025,4,4

11,1011,Abhinav Jain,M,33,Delhi,105,5,REG,480,230,21JUL2025:08:00:00,21JUL2025:12:30:00,UPI,S,A,EXP,21JUL2025,12,3

12,1012,Meena Kumari,F,58,Chennai,106,2,SNR,240,140,21JUL2025:15:45:00,21JUL2025:18:30:00,CASH,S,N,LOC,21JUL2025,0,5

13,1013,Arun Gupta,M,28,Hyderabad,102,2,REG,520,275,20JUL2025:20:00:00,21JUL2025:01:30:00,UPI,S,A,VOL,20JUL2025,15,4

14,1014,Sahana R,F,24,Bengaluru,104,4,STU,260,150,22JUL2025:16:40:00,22JUL2025:19:40:00,UPI,S,N,LOC,22JUL2025,0,5

15,1015,Vikram Mehta,M,39,Mumbai,110,3,REG,340,170,18JUL2025:06:20:00,18JUL2025:09:10:00,CARD,SL,A,EXP,18JUL2025,6,4

;

run;

proc print;run;

Output:

ObsTrip_IDPassenger_IDPassenger_NameGenderAgeCityRoute_IDOperator_IDTicket_TypeFare_INRDistance_kmBoarding_TimeAlighting_TimePayment_MethodSeat_TypeAC_FlagBus_TypeTrip_DateDelay_MinutesSatisfaction
111001Anil ReddyMaleAdult (25-44)Hyderabad1011Regular30015021JUL2025:07:10:0021JUL2025:10:05:00UPISeaterACExpress21JUL20255High
221002Neha SharmaFemaleYouth (18-24)Hyderabad1022Student45027522JUL2025:06:40:0022JUL2025:12:10:00CardSeaterACVolvo22JUL202510Very High
331003Rahul PatilMaleAdult (25-44)Mumbai1033Regular32015019JUL2025:08:05:0019JUL2025:11:05:00UPISeaterNon-ACLocal19JUL20250Medium
441004Priyanka RaoFemaleAdult (25-44)Bengaluru1044Regular34015018JUL2025:09:00:0018JUL2025:12:00:00WalletSeaterACExpress18JUL20257High
551005Amit SinghMaleAdult (25-44)Delhi1055Regular50023017JUL2025:13:20:0017JUL2025:17:50:00CashSleeperACVolvo17JUL202520Low
661006Sunita IyerFemaleSenior (60+)Chennai1062Senior22014016JUL2025:06:30:0016JUL2025:09:15:00UPISeaterNon-ACLocal16JUL20253High
771007Farhan ShaikhMaleYouth (18-24)Mumbai1033Student25015021JUL2025:18:30:0021JUL2025:21:30:00WalletSeaterACExpress21JUL20252Very High
881008Sruthi NairFemaleAdult (25-44)Bengaluru1094Weekly Pass1407521JUL2025:07:50:0021JUL2025:09:10:00UPISeaterNon-ACLocal21JUL20251High
991009Rakesh KumarMaleMid-age (45-59)Hyderabad1071Regular31016520JUL2025:05:55:0020JUL2025:09:05:00CardSleeperACVolvo20JUL20250Medium
10101010Kavya JoshiFemaleAdult (25-44)Pune1083Regular36021022JUL2025:10:30:0022JUL2025:14:15:00UPISeaterNon-ACExpress22JUL20254High
11111011Abhinav JainMaleAdult (25-44)Delhi1055Regular48023021JUL2025:08:00:0021JUL2025:12:30:00UPISeaterACExpress21JUL202512Medium
12121012Meena KumariFemaleMid-age (45-59)Chennai1062Senior24014021JUL2025:15:45:0021JUL2025:18:30:00CashSeaterNon-ACLocal21JUL20250Very High
13131013Arun GuptaMaleAdult (25-44)Hyderabad1022Regular52027520JUL2025:20:00:0021JUL2025:01:30:00UPISeaterACVolvo20JUL202515High
14141014Sahana RFemaleYouth (18-24)Bengaluru1044Student26015022JUL2025:16:40:0022JUL2025:19:40:00UPISeaterNon-ACLocal22JUL20250Very High
15151015Vikram MehtaMaleAdult (25-44)Mumbai1103Regular34017018JUL2025:06:20:0018JUL2025:09:10:00CardSleeperACExpress18JUL20256High


5) METADATA INSPECTION                   

proc contents data=passengers varnum; 

run;

Output:

The CONTENTS Procedure

Data Set NameWORK.PASSENGERSObservations15
Member TypeDATAVariables20
EngineV9Indexes0
Created08/21/2025 19:16:53Observation Length160
Last Modified08/21/2025 19:16:53Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
Engine/Host Dependent Information
Data Set Page Size131072
Number of Data Set Pages1
First Data Page1
Max Obs per Page818
Obs in First Data Page15
Number of Data Set Repairs0
Filename/saswork/SAS_work4EA100014E17_odaws01-apse1-2.oda.sas.com/SAS_work819100014E17_odaws01-apse1-2.oda.sas.com/passengers.sas7bdat
Release Created9.0401M8
Host CreatedLinux
Inode Number201326817
Access Permissionrw-r--r--
Owner Nameu63247146
File Size256KB
File Size (bytes)262144
Variables in Creation Order
#VariableTypeLenFormat
1Trip_IDNum8 
2Passenger_IDNum8 
3Passenger_NameChar28 
4GenderChar1$GENDER.
5AgeNum8AGESEG.
6CityChar20 
7Route_IDNum8 
8Operator_IDNum8 
9Ticket_TypeChar3$TKT.
10Fare_INRNum8 
11Distance_kmNum8 
12Boarding_TimeNum8DATETIME19.
13Alighting_TimeNum8DATETIME19.
14Payment_MethodChar4$PAY.
15Seat_TypeChar2$SEAT.
16AC_FlagChar1$AC.
17Bus_TypeChar3$BTYPE.
18Trip_DateNum8DATE9.
19Delay_MinutesNum8 
20SatisfactionNum8SATS.

6) ENRICH FACTS WITH LOOKUPS (SQL JOIN)  

proc sql outobs=5;

  create table trips as

  select p.*,

         o.Operator_Name, o.HQ_City, o.Rating as Operator_Rating,

         r.Origin, r.Destination, r.Distance_km as Route_Distance_km,

         r.Std_Fare as Route_Std_Fare

  from passengers p

  left join operators o on p.Operator_ID = o.Operator_ID

  left join routes r    on p.Route_ID    = r.Route_ID;

quit;

proc print;run;

Output:

ObsTrip_IDPassenger_IDPassenger_NameGenderAgeCityRoute_IDOperator_IDTicket_TypeFare_INRDistance_kmBoarding_TimeAlighting_TimePayment_MethodSeat_TypeAC_FlagBus_TypeTrip_DateDelay_MinutesSatisfactionOperator_NameHQ_CityOperator_RatingOriginDestinationRoute_Distance_kmRoute_Std_Fare
111001Anil ReddyMaleAdult (25-44)Hyderabad1011Regular30015021JUL2025:07:10:0021JUL2025:10:05:00UPISeaterACExpress21JUL20255HighTSRTCHyderabad4.2HyderabadWarangal150280
2131013Arun GuptaMaleAdult (25-44)Hyderabad1022Regular52027520JUL2025:20:00:0021JUL2025:01:30:00UPISeaterACVolvo20JUL202515HighAPSRTCVijayawada4.0HyderabadVijayawada275520
321002Neha SharmaFemaleYouth (18-24)Hyderabad1022Student45027522JUL2025:06:40:0022JUL2025:12:10:00CardSeaterACVolvo22JUL202510Very HighAPSRTCVijayawada4.0HyderabadVijayawada275520
471007Farhan ShaikhMaleYouth (18-24)Mumbai1033Student25015021JUL2025:18:30:0021JUL2025:21:30:00WalletSeaterACExpress21JUL20252Very HighMSRTCMumbai3.9MumbaiPune150300
531003Rahul PatilMaleAdult (25-44)Mumbai1033Regular32015019JUL2025:08:05:0019JUL2025:11:05:00UPISeaterNon-ACLocal19JUL20250MediumMSRTCMumbai3.9MumbaiPune150300


7) DERIVATIONS & QUALITY CHECKS          

data trips_enriched;

  set trips;

  /* Travel time in minutes */

  Travel_Minutes = intck('minute', Boarding_Time, Alighting_Time);

  /* On-time flag (<=5 min delay) */

  On_Time = (Delay_Minutes <= 5);

  /* Fare per km (observed) and deviation from standard */

  Fare_per_km = round(Fare_INR / Distance_km, .01);

  Std_Fare_per_km = round(Route_Std_Fare / Route_Distance_km, .01);

  Fare_Deviation = Fare_per_km - Std_Fare_per_km;

  /* Day part for operational analysis */

  length Day_Part $10;

  hour_b = hour(Boarding_Time);

  if 5 <= hour_b < 12 then Day_Part='Morning';

  else if 12 <= hour_b < 17 then Day_Part='Afternoon';

  else if 17 <= hour_b <= 23 then Day_Part='Evening';

  else Day_Part='Night';

  drop hour_b;

  label Travel_Minutes = "Trip Duration (mins)"

        On_Time        = "On-time (<=5 min delay)"

        Fare_per_km    = "Fare per km (INR)"

        Fare_Deviation = "Fare-per-km minus standard";

run;

proc print;run;

Output:

ObsTrip_IDPassenger_IDPassenger_NameGenderAgeCityRoute_IDOperator_IDTicket_TypeFare_INRDistance_kmBoarding_TimeAlighting_TimePayment_MethodSeat_TypeAC_FlagBus_TypeTrip_DateDelay_MinutesSatisfactionOperator_NameHQ_CityOperator_RatingOriginDestinationRoute_Distance_kmRoute_Std_FareTravel_MinutesOn_TimeFare_per_kmStd_Fare_per_kmFare_DeviationDay_Part
111001Anil ReddyMaleAdult (25-44)Hyderabad1011Regular30015021JUL2025:07:10:0021JUL2025:10:05:00UPISeaterACExpress21JUL20255HighTSRTCHyderabad4.2HyderabadWarangal15028017512.001.870.13Morning
2131013Arun GuptaMaleAdult (25-44)Hyderabad1022Regular52027520JUL2025:20:00:0021JUL2025:01:30:00UPISeaterACVolvo20JUL202515HighAPSRTCVijayawada4.0HyderabadVijayawada27552033001.891.890.00Evening
321002Neha SharmaFemaleYouth (18-24)Hyderabad1022Student45027522JUL2025:06:40:0022JUL2025:12:10:00CardSeaterACVolvo22JUL202510Very HighAPSRTCVijayawada4.0HyderabadVijayawada27552033001.641.89-0.25Morning
471007Farhan ShaikhMaleYouth (18-24)Mumbai1033Student25015021JUL2025:18:30:0021JUL2025:21:30:00WalletSeaterACExpress21JUL20252Very HighMSRTCMumbai3.9MumbaiPune15030018011.672.00-0.33Evening
531003Rahul PatilMaleAdult (25-44)Mumbai1033Regular32015019JUL2025:08:05:0019JUL2025:11:05:00UPISeaterNon-ACLocal19JUL20250MediumMSRTCMumbai3.9MumbaiPune15030018012.132.000.13Morning


/* Basic sort to prepare for reports */

proc sort data=trips_enriched out=trips_sorted;

  by City Trip_Date descending Fare_INR;

run;

proc print data=trips_sorted (obs=5);

run;

ObsTrip_IDPassenger_IDPassenger_NameGenderAgeCityRoute_IDOperator_IDTicket_TypeFare_INRDistance_kmBoarding_TimeAlighting_TimePayment_MethodSeat_TypeAC_FlagBus_TypeTrip_DateDelay_MinutesSatisfactionOperator_NameHQ_CityOperator_RatingOriginDestinationRoute_Distance_kmRoute_Std_FareTravel_MinutesOn_TimeFare_per_kmStd_Fare_per_kmFare_DeviationDay_Part
1131013Arun GuptaMaleAdult (25-44)Hyderabad1022Regular52027520JUL2025:20:00:0021JUL2025:01:30:00UPISeaterACVolvo20JUL202515HighAPSRTCVijayawada4.0HyderabadVijayawada27552033001.891.890.00Evening
211001Anil ReddyMaleAdult (25-44)Hyderabad1011Regular30015021JUL2025:07:10:0021JUL2025:10:05:00UPISeaterACExpress21JUL20255HighTSRTCHyderabad4.2HyderabadWarangal15028017512.001.870.13Morning
321002Neha SharmaFemaleYouth (18-24)Hyderabad1022Student45027522JUL2025:06:40:0022JUL2025:12:10:00CardSeaterACVolvo22JUL202510Very HighAPSRTCVijayawada4.0HyderabadVijayawada27552033001.641.89-0.25Morning
431003Rahul PatilMaleAdult (25-44)Mumbai1033Regular32015019JUL2025:08:05:0019JUL2025:11:05:00UPISeaterNon-ACLocal19JUL20250MediumMSRTCMumbai3.9MumbaiPune15030018012.132.000.13Morning
571007Farhan ShaikhMaleYouth (18-24)Mumbai1033Student25015021JUL2025:18:30:0021JUL2025:21:30:00WalletSeaterACExpress21JUL20252Very HighMSRTCMumbai3.9MumbaiPune15030018011.672.00-0.33Evening


8) QUICK VIEWS                           

proc print data=trips_sorted(obs=10) label noobs;

  title "Sample of Enriched Trips — First 10 Rows";

run;

title;

Output:

Sample of Enriched Trips — First 15 Rows

Trip_IDPassenger_IDPassenger_NameGenderAgeCityRoute_IDOperator_IDTicket_TypeFare_INRDistance_kmBoarding_TimeAlighting_TimePayment_MethodSeat_TypeAC_FlagBus_TypeTrip_DateDelay_MinutesSatisfactionOperator_NameHQ_CityOperator_RatingOriginDestinationRoute_Distance_kmRoute_Std_FareTrip Duration (mins)On-time (<=5 min delay)Fare per km (INR)Std_Fare_per_kmFare-per-km minus standardDay_Part
131013Arun GuptaMaleAdult (25-44)Hyderabad1022Regular52027520JUL2025:20:00:0021JUL2025:01:30:00UPISeaterACVolvo20JUL202515HighAPSRTCVijayawada4.0HyderabadVijayawada27552033001.891.890.00Evening
11001Anil ReddyMaleAdult (25-44)Hyderabad1011Regular30015021JUL2025:07:10:0021JUL2025:10:05:00UPISeaterACExpress21JUL20255HighTSRTCHyderabad4.2HyderabadWarangal15028017512.001.870.13Morning
21002Neha SharmaFemaleYouth (18-24)Hyderabad1022Student45027522JUL2025:06:40:0022JUL2025:12:10:00CardSeaterACVolvo22JUL202510Very HighAPSRTCVijayawada4.0HyderabadVijayawada27552033001.641.89-0.25Morning
31003Rahul PatilMaleAdult (25-44)Mumbai1033Regular32015019JUL2025:08:05:0019JUL2025:11:05:00UPISeaterNon-ACLocal19JUL20250MediumMSRTCMumbai3.9MumbaiPune15030018012.132.000.13Morning
71007Farhan ShaikhMaleYouth (18-24)Mumbai1033Student25015021JUL2025:18:30:0021JUL2025:21:30:00WalletSeaterACExpress21JUL20252Very HighMSRTCMumbai3.9MumbaiPune15030018011.672.00-0.33Evening

9) FREQUENCY & DISTRIBUTION ANALYSIS     

9.1 Ticket type by city 

proc freq data=trips_enriched;

  tables City*Ticket_Type / nocol nopercent norow;

  title "Ticket Type Mix by City";

run;

Output:

Ticket Type Mix by City

The FREQ Procedure

Frequency
Table of City by Ticket_Type
CityTicket_Type
RegularStudentTotal
Hyderabad
2
1
3
Mumbai
1
1
2
Total
3
2
5

9.2 Payment method and AC preference 

proc freq data=trips_enriched;

  tables Payment_Method*AC_Flag / chisq;

  title "Payment Method vs AC/Non-AC";

run;

Output:

Payment Method vs AC/Non-AC

The FREQ Procedure

Frequency
Percent
Row Pct
Col Pct
Table of Payment_Method by AC_Flag
Payment_MethodAC_Flag
ACNon-ACTotal
Card
1
20.00
100.00
25.00
0
0.00
0.00
0.00
1
20.00
 
 
UPI
2
40.00
66.67
50.00
1
20.00
33.33
100.00
3
60.00
 
 
Wallet
1
20.00
100.00
25.00
0
0.00
0.00
0.00
1
20.00
 
 
Total
4
80.00
1
20.00
5
100.00

Statistics for Table of Payment_Method by AC_Flag

StatisticDFValueProb
WARNING: 100% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Chi-Square20.83330.6592
Likelihood Ratio Chi-Square21.18490.5530
Mantel-Haenszel Chi-Square10.00001.0000
Phi Coefficient 0.4082 
Contingency Coefficient 0.3780 
Cramer's V 0.4082 

Sample Size = 5

 9.3 Age distribution and satisfaction 

proc univariate data=trips_enriched noprint;

  var Fare_INR Travel_Minutes Delay_Minutes;

  histogram Fare_INR Travel_Minutes / normal;

  inset n mean std min max / position=ne;

run;

Output:

The UNIVARIATE Procedure

Histogram for Fare_INR


The UNIVARIATE Procedure

Fitted Normal Distribution for Fare_INR

Parameters for Normal Distribution
ParameterSymbolEstimate
MeanMu368
Std DevSigma112.5611
Goodness-of-Fit Tests for Normal Distribution
TestStatisticp Value
Kolmogorov-SmirnovD0.26510458Pr > D>0.150
Cramer-von MisesW-Sq0.05149527Pr > W-Sq>0.250
Anderson-DarlingA-Sq0.29400112Pr > A-Sq>0.250
Quantiles for Normal Distribution
PercentQuantile
ObservedEstimated
1.0250.000106.144
5.0250.000182.853
10.0250.000223.747
25.0300.000292.079
50.0320.000368.000
75.0450.000443.921
90.0520.000512.253
95.0520.000553.147
99.0520.000629.856

The UNIVARIATE Procedure

Histogram for Travel_Minutes


The UNIVARIATE Procedure

Fitted Normal Distribution for Travel_Minutes (Trip Duration (mins))

Parameters for Normal Distribution
ParameterSymbolEstimate
MeanMu239
Std DevSigma83.09633
Goodness-of-Fit Tests for Normal Distribution
TestStatisticp Value
Kolmogorov-SmirnovD0.36115392Pr > D0.031
Cramer-von MisesW-Sq0.13115599Pr > W-Sq0.029
Anderson-DarlingA-Sq0.75927167Pr > A-Sq0.019
Quantiles for Normal Distribution
PercentQuantile
ObservedEstimated
1.0175.00045.6890
5.0175.000102.3187
10.0175.000132.5078
25.0180.000182.9524
50.0180.000239.0000
75.0330.000295.0476
90.0330.000345.4922
95.0330.000375.6813
99.0330.000432.3110

10) NUMERIC SUMMARIES                    

proc means data=trips_enriched n mean std min p25 median p75 max maxdec=1;

  class City Bus_Type;

  var Fare_INR Distance_km Travel_Minutes Delay_Minutes Satisfaction;

  title "Key Measures by City and Bus Type";

run;

Output:

Key Measures by City and Bus Type

The MEANS Procedure

CityBus_TypeN ObsVariableLabelNMeanStd DevMinimum25th PctlMedian75th PctlMaximum
HyderabadExpress1
Fare_INR
Distance_km
Travel_Minutes
Delay_Minutes
Satisfaction
 
 
Trip Duration (mins)
 
 
1
1
1
1
1
300.0
150.0
175.0
5.0
4.0
 
 
 
 
 
300.0
150.0
175.0
5.0
4.0
300.0
150.0
175.0
5.0
4.0
300.0
150.0
175.0
5.0
4.0
300.0
150.0
175.0
5.0
4.0
300.0
150.0
175.0
5.0
4.0
 Volvo2
Fare_INR
Distance_km
Travel_Minutes
Delay_Minutes
Satisfaction
 
 
Trip Duration (mins)
 
 
2
2
2
2
2
485.0
275.0
330.0
12.5
4.5
49.5
0.0
0.0
3.5
0.7
450.0
275.0
330.0
10.0
4.0
450.0
275.0
330.0
10.0
4.0
485.0
275.0
330.0
12.5
4.5
520.0
275.0
330.0
15.0
5.0
520.0
275.0
330.0
15.0
5.0
MumbaiExpress1
Fare_INR
Distance_km
Travel_Minutes
Delay_Minutes
Satisfaction
 
 
Trip Duration (mins)
 
 
1
1
1
1
1
250.0
150.0
180.0
2.0
5.0
 
 
 
 
 
250.0
150.0
180.0
2.0
5.0
250.0
150.0
180.0
2.0
5.0
250.0
150.0
180.0
2.0
5.0
250.0
150.0
180.0
2.0
5.0
250.0
150.0
180.0
2.0
5.0
 Local1
Fare_INR
Distance_km
Travel_Minutes
Delay_Minutes
Satisfaction
 
 
Trip Duration (mins)
 
 
1
1
1
1
1
320.0
150.0
180.0
0.0
3.0
 
 
 
 
 
320.0
150.0
180.0
0.0
3.0
320.0
150.0
180.0
0.0
3.0
320.0
150.0
180.0
0.0
3.0
320.0
150.0
180.0
0.0
3.0
320.0
150.0
180.0
0.0
3.0

proc summary data=trips_enriched nway;

  class Operator_Name City;

  var Fare_INR Delay_Minutes;

  output out=op_city_summary(drop=_type_ _freq_) 

         n=Trips sum(Fare_INR)=Revenue_INR mean(Delay_Minutes)=Avg_Delay_Min;

run;

proc print;run;

Output:

ObsOperator_NameCityTripsRevenue_INRAvg_Delay_Min
1APSRTCHyderabad297012.5
2MSRTCMumbai25701.0
3TSRTCHyderabad13005.0


10.1 Rank operators by revenue within city 

proc rank data=op_city_summary out=op_city_rank ties=low descending;

  by City;

  var Revenue_INR;

  ranks Rev_Rank;

run;

Output:

ObsOperator_NameCityTripsRevenue_INRAvg_Delay_MinRev_Rank
1APSRTCHyderabad297012.51
2MSRTCMumbai25701.01
3TSRTCHyderabad13005.01


11) TABULATE & REPORT 

proc tabulate data=trips_enriched format=8.1;

  class City Ticket_Type Bus_Type;

  var Fare_INR Travel_Minutes;

  table City,

        (Ticket_Type all)*(Fare_INR*(mean) Travel_Minutes*(mean))

        Bus_Type*(Fare_INR*(mean)) / misstext='0';

  title "Average Fare and Duration by Ticket Type and Bus Type (City-wise)";

run;

Output:

Average Fare and Duration by Ticket Type and Bus Type (City-wise)

 Ticket_TypeAllBus_Type
RegularStudentExpressLocalVolvo
Fare_INRTrip Duration (mins)Fare_INRTrip Duration (mins)Fare_INRTrip Duration (mins)Fare_INRFare_INRFare_INR
MeanMeanMeanMeanMeanMeanMeanMeanMean
City410.0252.5450.0330.0423.3278.3300.00485.0
Hyderabad
Mumbai320.0180.0250.0180.0285.0180.0250.0320.00

proc report data=op_city_rank nowd headline headskip;

  columns City Operator_Name Trips Revenue_INR Rev_Rank Avg_Delay_Min;

  define City / group "City";

  define Operator_Name / display "Operator";

  define Trips / analysis sum "Trips";

  define Revenue_INR / analysis sum "Revenue (INR)" format=comma10.;

  define Rev_Rank / display "Revenue Rank (Within City)";

  define Avg_Delay_Min / analysis mean format=6.1 "Avg Delay (min)";

  title "Operator Performance by City — Ranked by Revenue";

run;

Output:

Operator Performance by City — Ranked by Revenue

CityOperatorTripsRevenue (INR)Revenue Rank (Within City)Avg Delay (min)
HyderabadAPSRTC2970112.5
 TSRTC130015.0
MumbaiMSRTC257011.0

12) TRANSPOSE EXAMPLE (WIDE PIVOT) 

12.1 Payment method revenue by city — then pivot to wide 

proc sql ;

  create table pay_city as

  select City, Payment_Method, sum(Fare_INR) as Revenue_INR

  from trips_enriched

  group by City, Payment_Method;

quit;

proc print;run;

Output:

ObsCityPayment_MethodRevenue_INR
1HyderabadCard450
2HyderabadUPI820
3MumbaiUPI320
4MumbaiWallet250


proc sort data=pay_city; 

    by City Payment_Method;

 run;


proc transpose data=pay_city out=pay_city_wide prefix=Rev_;

  by City;

  id Payment_Method;

  var Revenue_INR;

run;

proc print data=pay_city_wide;

run;

Output:

ObsCity_NAME_Rev_CardRev_UPIRev_Wallet
1HyderabadRevenue_INR450820 
2MumbaiRevenue_INR 320250

13) PROC SQL: KPIs, MONTH BUCKET, VIEWS  

proc sql;

/*   Month bucket (using Trip_Date) */

  create table kpi_month as

  select put(intnx('month', Trip_Date, 0, 'b'), monyy7.) as Mon label='Month',

         City,

         count(*) as Trips,

         mean(Fare_INR) as Avg_Fare,

         sum(Fare_INR)  as Revenue_INR,

         mean(Delay_Minutes) as Avg_Delay

  from trips_enriched

  group by calculated Mon, City;


/*   Route profitability metric (fare vs route standard) */

  create table route_perf as

  select Route_ID, Origin, Destination,

         count(*) as Trips,

         mean(Fare_per_km) as Avg_Fare_per_km,

         mean(Std_Fare_per_km) as Std_per_km,

         mean(Fare_Deviation)  as Avg_Deviation

  from trips_enriched

  group by Route_ID, Origin, Destination

  order by Avg_Deviation desc;


proc print data=kpi_month;

run;

Output:

ObsMonCityTripsAvg_FareRevenue_INRAvg_Delay
1JUL2025Hyderabad3423.333127010
2JUL2025Mumbai2285.0005701

proc print data=route_perf ;

run;

Output:

ObsRoute_IDOriginDestinationTripsAvg_Fare_per_kmStd_per_kmAvg_Deviation
1101HyderabadWarangal12.0001.870.130
2103MumbaiPune21.9002.00-0.100
3102HyderabadVijayawada21.7651.89-0.125

/*   A view for frequent student riders (ticket=STU or WKP) */

  create view v_student_frequent as

  select Passenger_ID, Passenger_Name, City, Ticket_Type, count(*) as Trips

  from trips_enriched

  where Ticket_Type in ('STU', 'WKP')

  group by Passenger_ID, Passenger_Name, City, Ticket_Type

  having calculated Trips >= 2;

quit;

proc print data=v_student_frequent ;

run;

Log:

NOTE: No observations in data set WORK.V_STUDENT_FREQUENT.
NOTE: There were 2 observations read from the data set WORK.TRIPS_ENRICHED.
WHERE Ticket_Type in ('STU', 'WKP');
NOTE: There were 0 observations read from the data set WORK.V_STUDENT_FREQUENT.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 5528.34k
OS Memory 29608.00k
Timestamp 08/21/2025 02:04:04 PM
Step Count 227 Switch Count 0
Page Faults 0
Page Reclaims 205
Page Swaps 0
Voluntary Context Switches 10
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0

14) VISUAL: SGPLOT                       

ods graphics on;

proc sgplot data=trips_enriched;

  vbox Fare_per_km / category=City;

  title "Fare per km — Distribution by City";

run;


proc sgplot data=trips_enriched;

  vbar City / response=Satisfaction stat=mean datalabel;

  title "Average Satisfaction by City";

run;

ods graphics off;

Output:

The SGPlot Procedure

The SGPlot Procedure

15) MACROS FOR AUTOMATION                

15.1 Macro to print a city-specific report with parameterized filters 

%macro city_report(city=, bus=, mindate=, maxdate=);

  %put NOTE: Running city_report for &city. bus=&bus. date=&mindate - &maxdate;

  title "City Report: &city (Bus=&bus, Dates=&mindate to &maxdate)";

  proc sql;

    create table _city as

    select *

    from trips_enriched

    where City="&city"

      %if %length(&bus) %then and Bus_Type="&bus";

      %if %length(&mindate) %then and Trip_Date >= "&mindate"d;

      %if %length(&maxdate) %then and Trip_Date <= "&maxdate"d;

    ;

  quit;


  proc means data=_city n mean std min p25 median p75 max;

    class Ticket_Type;

    var Fare_INR Travel_Minutes Delay_Minutes Satisfaction;

    title2 "Descriptive Stats by Ticket Type";

  run;


  proc freq data=_city;

    tables Payment_Method*AC_Flag / chisq;

    title2 "Payment vs AC Flag — &city";

  run;


  proc report data=_city nowd;

    columns Route_ID Origin Destination Bus_Type Ticket_Type Fare_INR Travel_Minutes Delay_Minutes Satisfaction;

    define Route_ID / group;

    define Origin / display;

    define Destination / display;

    define Bus_Type / display;

    define Ticket_Type / display;

    define Fare_INR / analysis mean format=8.1 "Avg Fare";

    define Travel_Minutes / analysis mean format=6.1 "Avg Minutes";

    define Delay_Minutes / analysis mean format=5.1 "Avg Delay";

    define Satisfaction / analysis mean format=4.1 "Avg Sat";

  run;

  title;

%mend;


15.2 Macro to generate operator leaderboard by KPI 

%macro operator_leaderboard(metric=Revenue_INR);

  proc sql;

    create table _op as

    select Operator_Name,

           sum(Fare_INR) as Revenue_INR,

           count(*) as Trips,

           mean(Satisfaction) as Avg_Sat,

           mean(Delay_Minutes) as Avg_Delay

    from trips_enriched

    group by Operator_Name;

  quit;


  %if %upcase(&metric)=REVENUE_INR %then %do;

    proc sort data=_op; by descending Revenue_INR; run;

  %end;

  %else %if %upcase(&metric)=TRIPS %then %do;

    proc sort data=_op; by descending Trips; run;

  %end;

  %else %if %upcase(&metric)=AVG_SAT %then %do;

    proc sort data=_op; by descending Avg_Sat; run;

  %end;

  %else %if %upcase(&metric)=AVG_DELAY %then %do;

    proc sort data=_op; by Avg_Delay; run;

  %end;


  proc print data=_op label noobs;

    label Revenue_INR='Revenue (INR)' Trips='Trips' Avg_Sat='Avg Satisfaction' Avg_Delay='Avg Delay (min)';

    title "Operator Leaderboard — Sorted by &metric";

  run; title;

%mend;


15.3 Macro loop to print common city snapshots 

%macro common_cities;

  %city_report(city=Hyderabad, bus=EXP, mindate=16JUL2025, maxdate=24JUL2025);

  %city_report(city=Mumbai,    bus=EXP, mindate=16JUL2025, maxdate=24JUL2025);

  %city_report(city=Bengaluru, bus=,    mindate=16JUL2025, maxdate=24JUL2025);

%mend;


%common_cities;

Output:

City Report: Hyderabad (Bus=EXP, Dates=16JUL2025 to 24JUL2025)

Descriptive Stats by Ticket Type

The MEANS Procedure

Ticket_TypeN ObsVariableLabelNMeanStd DevMinimum25th PctlMedian75th PctlMaximum
Regular1
Fare_INR
Travel_Minutes
Delay_Minutes
Satisfaction
 
Trip Duration (mins)
 
 
1
1
1
1
300.0000000
175.0000000
5.0000000
4.0000000
 
 
 
 
300.0000000
175.0000000
5.0000000
4.0000000
300.0000000
175.0000000
5.0000000
4.0000000
300.0000000
175.0000000
5.0000000
4.0000000
300.0000000
175.0000000
5.0000000
4.0000000
300.0000000
175.0000000
5.0000000
4.0000000

City Report: Hyderabad (Bus=EXP, Dates=16JUL2025 to 24JUL2025)

Payment vs AC Flag — Hyderabad

The FREQ Procedure

Frequency
Percent
Row Pct
Col Pct
Table of Payment_Method by AC_Flag
Payment_MethodAC_Flag
ACTotal
UPI
1
100.00
100.00
100.00
1
100.00
 
 
Total
1
100.00
1
100.00

City Report: Hyderabad (Bus=EXP, Dates=16JUL2025 to 24JUL2025)

Payment vs AC Flag — Hyderabad

Route_IDOriginDestinationBus_TypeTicket_TypeAvg FareAvg MinutesAvg DelayAvg Sat
101HyderabadWarangalExpressRegular300.0175.05.04.0

City Report: Mumbai (Bus=EXP, Dates=16JUL2025 to 24JUL2025)

Descriptive Stats by Ticket Type

The MEANS Procedure

Ticket_TypeN ObsVariableLabelNMeanStd DevMinimum25th PctlMedian75th PctlMaximum
Student1
Fare_INR
Travel_Minutes
Delay_Minutes
Satisfaction
 
Trip Duration (mins)
 
 
1
1
1
1
250.0000000
180.0000000
2.0000000
5.0000000
 
 
 
 
250.0000000
180.0000000
2.0000000
5.0000000
250.0000000
180.0000000
2.0000000
5.0000000
250.0000000
180.0000000
2.0000000
5.0000000
250.0000000
180.0000000
2.0000000
5.0000000
250.0000000
180.0000000
2.0000000
5.0000000

City Report: Mumbai (Bus=EXP, Dates=16JUL2025 to 24JUL2025)

Payment vs AC Flag — Mumbai

The FREQ Procedure

Frequency
Percent
Row Pct
Col Pct
Table of Payment_Method by AC_Flag
Payment_MethodAC_Flag
ACTotal
Wallet
1
100.00
100.00
100.00
1
100.00
 
 
Total
1
100.00
1
100.00

City Report: Mumbai (Bus=EXP, Dates=16JUL2025 to 24JUL2025)

Payment vs AC Flag — Mumbai

Route_IDOriginDestinationBus_TypeTicket_TypeAvg FareAvg MinutesAvg DelayAvg Sat
103MumbaiPuneExpressStudent250.0180.02.05.0

%operator_leaderboard(metric=Revenue_INR);

Output:

Operator Leaderboard — Sorted by Revenue_INR

Operator_NameRevenue (INR)TripsAvg SatisfactionAvg Delay (min)
APSRTC97024.512.5
MSRTC57024.01.0
TSRTC30014.05.0

%operator_leaderboard(metric=AVG_Delay);

Output:

Operator Leaderboard — Sorted by AVG_Delay

Operator_NameRevenue (INR)TripsAvg SatisfactionAvg Delay (min)
MSRTC57024.01.0
TSRTC30014.05.0
APSRTC97024.512.5


To Visit My Previous Online Streaming Flatform Dataset:Click Here
To Visit My Previous Statistical Evaluation Of Clinical Trials:Click Here
To Visit My Previous Unlocking Retail Insights Dataset:Click Here
To Visit My Previous Sas Interview Questions-1:Click Here





--->PLEASE FOLLOW OUR BLOG FOR MORE INFORMATION.
--->PLEASE DO COMMENTS AND SHARE OUR BLOG.

PLEASE FOLLOW OUR TELEGRAM CHANNEL CLICK HERE

PLEASE FOLLOW OUR FACEBOOK PAGE  CLICK HERE

PLEASE FOLLOW OUR INSTAGRAM PAGE CLICK HERE

No comments:

Post a Comment