259.REAL-WORLD VEHICLES DATASET CREATION AND COMPARISON IN INDIA WITH DATA STEP | PROC FORMAT | PROC PRINT | PROC MEANS | PROC FREQ | PROC SQL | PROC SORT | PROC TRANSPOSE | PROC REPORT | PROC SGPLOT

REAL-WORLD VEHICLES DATASET CREATION AND COMPARISON IN INDIA WITH DATA STEP | PROC FORMAT | PROC PRINT | PROC MEANS | PROC FREQ | PROC SQL | PROC SORT | PROC TRANSPOSE | PROC REPORT | PROC SGPLOT

/*CREATING A DATASET OF REAL-WORLD VEHICLES COMPARISON IN INDIA*/

options nodate nonumber nocenter; 

1) Create dataset: vehicles

data vehicles;

length VehicleID 8 Vehicle_Type $25 Manufacturer $25 Region $15 Fuel_Type $12 Owner_Type $12 Emission_Std $8;

format Price_INR comma12. Mileage_kmpl 8.2 Running_km comma12. Year_Mfg 4.;

input VehicleID Vehicle_Type $ Manufacturer $ Year_Mfg Price_INR Mileage_kmpl Running_km Fuel_Type $ Seating Owner_Type $ Region $ Emission_Std $ Battery_kWh;

datalines;

1 Car Maruti 2018 550000 18.5 42000 Petrol 5 Individual North BSIV .

2 Motorcycle Bajaj 2020 75000 65.0 12000 Petrol 2 Individual West BSIV .

3 Scooter TVS 2019 62000 45.0 8000 Petrol 2 Individual South BSIV .

4 Truck Tata 2015 1500000 6.5 120000 Diesel 2 Commercial North BSIII .

5 Bus AshokLeyland 2014 3500000 3.2 250000 Diesel 40 Commercial South BSIII .

6 Auto_Rickshaw Bajaj 2017 65000 35.0 60000 CNG 3 Commercial East BSIV .

7 Electric_Car Tata 2021 1400000 0 15000 Electric 5 Individual West BSVI 22

8 E_Bike Ather 2022 170000 0 5000 Electric 2 Individual South BSVI 3.7

9 Van Maruti_Env 2016 850000 14.0 60000 Diesel 8 Commercial West BSIV .

10 Tractor Mahindra 2013 800000 0 90000 Diesel 2 Agricultural East BSIV .

11 Bicycle Hero 2020 4000 0 300 Individual 1 Individual North . .

12 Luxury_Car BMW 2019 4200000 12.0 25000 Petrol 5 Individual South BSIV .

13 Delivery_Van Tata 2018 720000 11.0 80000 Diesel 3 Commercial West BSIV .

14 Scooter_Hybrid Hero 2020 85000 55.0 9000 Petrol 2 Individual East BSIV .

15 Electric_Scooter Ola 2022 90000 0 3000 Electric 2 Individual North BSVI 2.5

16 Pickup Mahindra 2017 650000 12.5 40000 Diesel 3 Commercial South BSIV .

17 CNG_Car Maruti_CNG 2016 780000 22.0 50000 CNG 5 Individual West BSIV .

18 School_Bus Tata 2015 1800000 3.8 180000 Diesel 45 Commercial North BSIII .

19 Cargo_Truck Volvo 2018 6000000 5.8 100000 Diesel 2 Commercial East BSIV .

20 E_Rickshaw Kinetic 2021 42000 0 15000 Electric 3 Commercial South BSVI 1.5

21 Electric_Bicycle Yulu 2022 12000 0 2000 Electric 1 Individual West BSVI 0.5

22 SUV Hyundai 2020 1800000 14.0 22000 Petrol 7 Individual North BSIV .

;

run;

proc print;run;

Output:

ObsVehicleIDVehicle_TypeManufacturerRegionFuel_TypeOwner_TypeEmission_StdPrice_INRMileage_kmplRunning_kmYear_MfgSeatingBattery_kWh
11CarMarutiNorthPetrolIndividualBSIV550,00018.5042,00020185.
22MotorcycleBajajWestPetrolIndividualBSIV75,00065.0012,00020202.
33ScooterTVSSouthPetrolIndividualBSIV62,00045.008,00020192.
44TruckTataNorthDieselCommercialBSIII1,500,0006.50120,00020152.
55BusAshokLeylandSouthDieselCommercialBSIII3,500,0003.20250,000201440.
66Auto_RickshawBajajEastCNGCommercialBSIV65,00035.0060,00020173.
77Electric_CarTataWestElectricIndividualBSVI1,400,0000.0015,0002021522.0
88E_BikeAtherSouthElectricIndividualBSVI170,0000.005,000202223.7
99VanMaruti_EnvWestDieselCommercialBSIV850,00014.0060,00020168.
1010TractorMahindraEastDieselAgriculturalBSIV800,0000.0090,00020132.
1111BicycleHeroNorthIndividualIndividual 4,0000.0030020201.
1212Luxury_CarBMWSouthPetrolIndividualBSIV4,200,00012.0025,00020195.
1313Delivery_VanTataWestDieselCommercialBSIV720,00011.0080,00020183.
1414Scooter_HybridHeroEastPetrolIndividualBSIV85,00055.009,00020202.
1515Electric_ScooterOlaNorthElectricIndividualBSVI90,0000.003,000202222.5
1616PickupMahindraSouthDieselCommercialBSIV650,00012.5040,00020173.
1717CNG_CarMaruti_CNGWestCNGIndividualBSIV780,00022.0050,00020165.
1818School_BusTataNorthDieselCommercialBSIII1,800,0003.80180,000201545.
1919Cargo_TruckVolvoEastDieselCommercialBSIV6,000,0005.80100,00020182.
2020E_RickshawKineticSouthElectricCommercialBSVI42,0000.0015,000202131.5
2121Electric_BicycleYuluWestElectricIndividualBSVI12,0000.002,000202210.5
2222SUVHyundaiNorthPetrolIndividualBSIV1,800,00014.0022,00020207.


2) Format variables for reporting

proc format;

value $vtfmt

'Car'='Car'

'Motorcycle'='Motorcycle'

'Scooter'='Scooter'

'Truck'='Truck'

'Bus'='Bus'

'Auto_Rickshaw'='Auto Rickshaw'

'Electric_Car'='Electric Car'

'E_Bike'='E-Bike'

'Van'='Van'

'Tractor'='Tractor'

'Bicycle'='Bicycle'

'Luxury_Car'='Luxury Car'

'Delivery_Van'='Delivery Van'

'Scooter_Hybrid'='Scooter Hybrid'

'Electric_Scooter'='Electric Scooter'

'Pickup'='Pickup'

'CNG_Car'='CNG Car'

'School_Bus'='School Bus'

'Cargo_Truck'='Cargo Truck'

'E_Rickshaw'='E-Rickshaw'

'Electric_Bicycle'='Electric Bicycle'

'SUV'='SUV';

run;


3) Quick look

proc print data=vehicles(obs=10) label noobs;

title "SAMPLE: VEHICLES (FIRST 10)";

run;

title;

Output:

SAMPLE: VEHICLES (FIRST 10)

VehicleIDVehicle_TypeManufacturerRegionFuel_TypeOwner_TypeEmission_StdPrice_INRMileage_kmplRunning_kmYear_MfgSeatingBattery_kWh
1CarMarutiNorthPetrolIndividualBSIV550,00018.5042,00020185.
2MotorcycleBajajWestPetrolIndividualBSIV75,00065.0012,00020202.
3ScooterTVSSouthPetrolIndividualBSIV62,00045.008,00020192.
4TruckTataNorthDieselCommercialBSIII1,500,0006.50120,00020152.
5BusAshokLeylandSouthDieselCommercialBSIII3,500,0003.20250,000201440.
6Auto_RickshawBajajEastCNGCommercialBSIV65,00035.0060,00020173.
7Electric_CarTataWestElectricIndividualBSVI1,400,0000.0015,0002021522.0
8E_BikeAtherSouthElectricIndividualBSVI170,0000.005,000202223.7
9VanMaruti_EnvWestDieselCommercialBSIV850,00014.0060,00020168.
10TractorMahindraEastDieselAgriculturalBSIV800,0000.0090,00020132.

4) Summary statistics (continuous vars)

proc means data=vehicles n mean median min max std q1 q3;

var Price_INR Mileage_kmpl Running_km Battery_kWh;

class Vehicle_Type;

title "SUMMARY STATISTICS BY VEHICLE TYPE";

run;

title;

Output:

SUMMARY STATISTICS BY VEHICLE TYPE

The MEANS Procedure

Vehicle_TypeN ObsVariableNMeanMedianMinimumMaximumStd DevLower QuartileUpper Quartile
Auto_Rickshaw1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
65000.00
35.0000000
60000.00
.
65000.00
35.0000000
60000.00
.
65000.00
35.0000000
60000.00
.
65000.00
35.0000000
60000.00
.
.
.
.
.
65000.00
35.0000000
60000.00
.
65000.00
35.0000000
60000.00
.
Bicycle1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
4000.00
0
300.0000000
.
4000.00
0
300.0000000
.
4000.00
0
300.0000000
.
4000.00
0
300.0000000
.
.
.
.
.
4000.00
0
300.0000000
.
4000.00
0
300.0000000
.
Bus1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
3500000.00
3.2000000
250000.00
.
3500000.00
3.2000000
250000.00
.
3500000.00
3.2000000
250000.00
.
3500000.00
3.2000000
250000.00
.
.
.
.
.
3500000.00
3.2000000
250000.00
.
3500000.00
3.2000000
250000.00
.
CNG_Car1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
780000.00
22.0000000
50000.00
.
780000.00
22.0000000
50000.00
.
780000.00
22.0000000
50000.00
.
780000.00
22.0000000
50000.00
.
.
.
.
.
780000.00
22.0000000
50000.00
.
780000.00
22.0000000
50000.00
.
Car1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
550000.00
18.5000000
42000.00
.
550000.00
18.5000000
42000.00
.
550000.00
18.5000000
42000.00
.
550000.00
18.5000000
42000.00
.
.
.
.
.
550000.00
18.5000000
42000.00
.
550000.00
18.5000000
42000.00
.
Cargo_Truck1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
6000000.00
5.8000000
100000.00
.
6000000.00
5.8000000
100000.00
.
6000000.00
5.8000000
100000.00
.
6000000.00
5.8000000
100000.00
.
.
.
.
.
6000000.00
5.8000000
100000.00
.
6000000.00
5.8000000
100000.00
.
Delivery_Van1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
720000.00
11.0000000
80000.00
.
720000.00
11.0000000
80000.00
.
720000.00
11.0000000
80000.00
.
720000.00
11.0000000
80000.00
.
.
.
.
.
720000.00
11.0000000
80000.00
.
720000.00
11.0000000
80000.00
.
E_Bike1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
1
170000.00
0
5000.00
3.7000000
170000.00
0
5000.00
3.7000000
170000.00
0
5000.00
3.7000000
170000.00
0
5000.00
3.7000000
.
.
.
.
170000.00
0
5000.00
3.7000000
170000.00
0
5000.00
3.7000000
E_Rickshaw1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
1
42000.00
0
15000.00
1.5000000
42000.00
0
15000.00
1.5000000
42000.00
0
15000.00
1.5000000
42000.00
0
15000.00
1.5000000
.
.
.
.
42000.00
0
15000.00
1.5000000
42000.00
0
15000.00
1.5000000
Electric_Bicycle1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
1
12000.00
0
2000.00
0.5000000
12000.00
0
2000.00
0.5000000
12000.00
0
2000.00
0.5000000
12000.00
0
2000.00
0.5000000
.
.
.
.
12000.00
0
2000.00
0.5000000
12000.00
0
2000.00
0.5000000
Electric_Car1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
1
1400000.00
0
15000.00
22.0000000
1400000.00
0
15000.00
22.0000000
1400000.00
0
15000.00
22.0000000
1400000.00
0
15000.00
22.0000000
.
.
.
.
1400000.00
0
15000.00
22.0000000
1400000.00
0
15000.00
22.0000000
Electric_Scooter1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
1
90000.00
0
3000.00
2.5000000
90000.00
0
3000.00
2.5000000
90000.00
0
3000.00
2.5000000
90000.00
0
3000.00
2.5000000
.
.
.
.
90000.00
0
3000.00
2.5000000
90000.00
0
3000.00
2.5000000
Luxury_Car1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
4200000.00
12.0000000
25000.00
.
4200000.00
12.0000000
25000.00
.
4200000.00
12.0000000
25000.00
.
4200000.00
12.0000000
25000.00
.
.
.
.
.
4200000.00
12.0000000
25000.00
.
4200000.00
12.0000000
25000.00
.
Motorcycle1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
75000.00
65.0000000
12000.00
.
75000.00
65.0000000
12000.00
.
75000.00
65.0000000
12000.00
.
75000.00
65.0000000
12000.00
.
.
.
.
.
75000.00
65.0000000
12000.00
.
75000.00
65.0000000
12000.00
.
Pickup1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
650000.00
12.5000000
40000.00
.
650000.00
12.5000000
40000.00
.
650000.00
12.5000000
40000.00
.
650000.00
12.5000000
40000.00
.
.
.
.
.
650000.00
12.5000000
40000.00
.
650000.00
12.5000000
40000.00
.
SUV1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
1800000.00
14.0000000
22000.00
.
1800000.00
14.0000000
22000.00
.
1800000.00
14.0000000
22000.00
.
1800000.00
14.0000000
22000.00
.
.
.
.
.
1800000.00
14.0000000
22000.00
.
1800000.00
14.0000000
22000.00
.
School_Bus1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
1800000.00
3.8000000
180000.00
.
1800000.00
3.8000000
180000.00
.
1800000.00
3.8000000
180000.00
.
1800000.00
3.8000000
180000.00
.
.
.
.
.
1800000.00
3.8000000
180000.00
.
1800000.00
3.8000000
180000.00
.
Scooter1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
62000.00
45.0000000
8000.00
.
62000.00
45.0000000
8000.00
.
62000.00
45.0000000
8000.00
.
62000.00
45.0000000
8000.00
.
.
.
.
.
62000.00
45.0000000
8000.00
.
62000.00
45.0000000
8000.00
.
Scooter_Hybrid1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
85000.00
55.0000000
9000.00
.
85000.00
55.0000000
9000.00
.
85000.00
55.0000000
9000.00
.
85000.00
55.0000000
9000.00
.
.
.
.
.
85000.00
55.0000000
9000.00
.
85000.00
55.0000000
9000.00
.
Tractor1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
800000.00
0
90000.00
.
800000.00
0
90000.00
.
800000.00
0
90000.00
.
800000.00
0
90000.00
.
.
.
.
.
800000.00
0
90000.00
.
800000.00
0
90000.00
.
Truck1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
1500000.00
6.5000000
120000.00
.
1500000.00
6.5000000
120000.00
.
1500000.00
6.5000000
120000.00
.
1500000.00
6.5000000
120000.00
.
.
.
.
.
1500000.00
6.5000000
120000.00
.
1500000.00
6.5000000
120000.00
.
Van1
Price_INR
Mileage_kmpl
Running_km
Battery_kWh
1
1
1
0
850000.00
14.0000000
60000.00
.
850000.00
14.0000000
60000.00
.
850000.00
14.0000000
60000.00
.
850000.00
14.0000000
60000.00
.
.
.
.
.
850000.00
14.0000000
60000.00
.
850000.00
14.0000000
60000.00
.

5) Frequency / distribution (categorical vars)

proc freq data=vehicles;

tables Vehicle_Type Fuel_Type Region Owner_Type Emission_Std / nocum nopercent;

title "FREQUENCIES: CATEGORICAL VARIABLES";

run;

title;

Output:

FREQUENCIES: CATEGORICAL VARIABLES

The FREQ Procedure

Vehicle_TypeFrequency
Auto_Rickshaw1
Bicycle1
Bus1
CNG_Car1
Car1
Cargo_Truck1
Delivery_Van1
E_Bike1
E_Rickshaw1
Electric_Bicycle1
Electric_Car1
Electric_Scooter1
Luxury_Car1
Motorcycle1
Pickup1
SUV1
School_Bus1
Scooter1
Scooter_Hybrid1
Tractor1
Truck1
Van1
Fuel_TypeFrequency
CNG2
Diesel8
Electric5
Individual1
Petrol6
RegionFrequency
East4
North6
South6
West6
Owner_TypeFrequency
Agricultural1
Commercial9
Individual12
Emission_StdFrequency
Frequency Missing = 1
BSIII3
BSIV13
BSVI5

6) PROC SQL: Aggregations

proc sql;

create table agg_by_fuel as

select Fuel_Type, count(*) as N, mean(Price_INR) as Avg_Price format=comma12., mean(Mileage_kmpl) as Avg_Mileage format=8.2

from vehicles

group by Fuel_Type

order by Avg_Price desc;

quit;


proc print data=agg_by_fuel noobs; 

    title "AGGREGATE BY FUEL TYPE"; 

run; 

title;

Output:

AGGREGATE BY FUEL TYPE

Fuel_TypeNAvg_PriceAvg_Mileage
Diesel81,977,5007.10
Petrol61,128,66734.92
CNG2422,50028.50
Electric5342,8000.00
Individual14,0000.00

7) PROC TRANSPOSE: Price by Year (wide)

proc sort data=vehicles out=_vsorted; by Year_Mfg; run;

proc transpose data=_vsorted out=price_by_year prefix=Price_;

by Year_Mfg notsorted;

id Vehicle_Type;

var Price_INR;

run;


proc print data=price_by_year(obs=10) noobs; 

   title "TRANSPOSED: PRICE BY YEAR (WIDE)";

run; 

title;

Output:

TRANSPOSED: PRICE BY YEAR (WIDE)

Year_Mfg_NAME_Price_TractorPrice_BusPrice_TruckPrice_School_BusPrice_VanPrice_CNG_CarPrice_Auto_RickshawPrice_PickupPrice_CarPrice_Delivery_VanPrice_Cargo_TruckPrice_ScooterPrice_Luxury_CarPrice_MotorcyclePrice_BicyclePrice_Scooter_HybridPrice_SUVPrice_Electric_CarPrice_E_RickshawPrice_E_BikePrice_Electric_ScooterPrice_Electric_Bicycle
2013Price_INR800,000.....................
2014Price_INR.3,500,000....................
2015Price_INR..1,500,0001,800,000..................
2016Price_INR....850,000780,000................
2017Price_INR......65,000650,000..............
2018Price_INR........550,000720,0006,000,000...........
2019Price_INR...........62,0004,200,000.........
2020Price_INR.............75,0004,00085,0001,800,000.....
2021Price_INR.................1,400,00042,000...
2022Price_INR...................170,00090,00012,000

8) Macro: generate region-wise summary tables

%macro region_report(reg=);

%let reg_trim=%sysfunc(strip(&reg));

proc sql;

create table region_&reg_trim as

select Vehicle_Type, count(*) as Count, mean(Price_INR) as Avg_Price format=comma12., mean(Mileage_kmpl) as Avg_Mileage format=8.2

from vehicles

where upcase(Region)=upcase("&reg_trim")

group by Vehicle_Type;

quit;


proc report data=region_&reg_trim nowd headline;

column Vehicle_Type Count Avg_Price Avg_Mileage;

define Vehicle_Type / group;

define Count / "COUNT";

define Avg_Price / "AVG PRICE (INR)";

define Avg_Mileage / "AVG MILEAGE";

title "REGION REPORT: &reg_trim";

run;

title;

%mend region_report;


%region_report(reg=North);

Output:

REGION REPORT: North

Vehicle_TypeCOUNTAVG PRICE (INR)AVG MILEAGE
Bicycle14,0000.00
Car1550,00018.50
Electric_Scooter190,0000.00
SUV11,800,00014.00
School_Bus11,800,0003.80
Truck11,500,0006.50

%region_report(reg=South);

Output:

REGION REPORT: South

Vehicle_TypeCOUNTAVG PRICE (INR)AVG MILEAGE
Bus13,500,0003.20
E_Bike1170,0000.00
E_Rickshaw142,0000.00
Luxury_Car14,200,00012.00
Pickup1650,00012.50
Scooter162,00045.00

%region_report(reg=West);

Output:

REGION REPORT: West

Vehicle_TypeCOUNTAVG PRICE (INR)AVG MILEAGE
CNG_Car1780,00022.00
Delivery_Van1720,00011.00
Electric_Bicycle112,0000.00
Electric_Car11,400,0000.00
Motorcycle175,00065.00
Van1850,00014.00

9) PROC SGPLOT: visualize price vs mileage (scatter)

proc sgplot data=vehicles;

scatter x=Mileage_kmpl y=Price_INR / group=Fuel_Type markerattrs=(symbol=circlefilled) jitter;

xaxis label="Mileage (kmpl)";

yaxis label="Price (INR)";

title "PRICE vs MILEAGE (COLOURED BY FUEL TYPE)";

run;

title;

Output:



10) PROC SQL: Create a derived table for inexpensive vehicles under 1 Lakh

proc sql;

create table under_1lakh as

select VehicleID, Vehicle_Type, Manufacturer, Price_INR, Fuel_Type, Region

from vehicles

where Price_INR <= 100000

order by Price_INR;

quit;


proc print data=under_1lakh noobs; 

   title "VEHICLES UNDER INR 1,00,000"; 

run;

title;

Output:

VEHICLES UNDER INR 1,00,000

VehicleIDVehicle_TypeManufacturerPrice_INRFuel_TypeRegion
11BicycleHero4,000IndividualNorth
21Electric_BicycleYulu12,000ElectricWest
20E_RickshawKinetic42,000ElectricSouth
3ScooterTVS62,000PetrolSouth
6Auto_RickshawBajaj65,000CNGEast
2MotorcycleBajaj75,000PetrolWest
14Scooter_HybridHero85,000PetrolEast
15Electric_ScooterOla90,000ElectricNorth

11) Cleanup intermediate

proc datasets lib=work nolist;

delete _vsorted;

quit;

Log:
NOTE: Deleting WORK._VSORTED (memtype=DATA).
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 493.25k
OS Memory 22180.00k
Timestamp 08/12/2025 12:59:16 PM
Step Count 137 Switch Count 2
Page Faults 0
Page Reclaims 53
Page Swaps 0
Voluntary Context Switches 11
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 8







Comments