405.What Do Vehicles, Waiting Time, and Cashless Payments Really Say About Toll Plaza Performance?A Complete SAS Analytics Investigation

What Happens When High Traffic Meets Long Waiting Time?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE WE USED THESE SAS STATEMENTS AND FUNCTIONS FOR THIS PROJECT:

DATA STEP | SET | INPUT | DATALINES | LENGTH | FORMAT | INFORMAT | IF-THEN-ELSE | RENAME | DROP | KEEP | LABEL | ROUND | INTNX | INTCK | MDY | STRIP | TRIM | CAT | CATX | PROPCASE | UPCASE | LOWCASE | COALESCE | PROC CONTENTS | PROC SQL | CREATE TABLE | SELECT | GROUP BY | ORDER BY | PROC FREQ | PROC MEANS | PROC UNIVARIATE | PROC CORR | PROC SGPLOT | SCATTER | PROC APPEND | PROC TRANSPOSE | PROC DATASETS DELETE | MERGE | BY | %MACRO | %MEND

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

INTRODUCTION

Toll plazas are critical infrastructure points that directly impact:

·       Traffic flow

·       Revenue collection

·       Public satisfaction

·       Fraud risk

·       Digital payment adoption

This project simulates real-world toll plaza data and demonstrates how a SAS programmer would analyze operations, efficiency, utilization, and fraud patterns using Base SAS, PROC SQL, Macros, and Analytics procedures.

TABLE OF CONTENTS

1.     Business Context

2.     Project Objectives

3.     Dataset Design

4.     Raw Data Creation (with intentional error)

5.     Identifying the Error

6.     Correcting the Error

7.     Final Clean Dataset Creation

8.     Date Handling & Derivations

9.     Efficiency & Utilization Logic (Macro)

10.  Fraud Detection Logic (Macro)

11.  PROC SQL Analytics

12.  PROC FREQ, MEANS, UNIVARIATE

13.  Correlation Analysis

14.  Visualization (PROC SGPLOT)

15.  Append, Set, Merge, Transpose

16.  Character & Numeric Functions

17.  PROC DATASETS Delete

18.  Business Insights

19.  Key Points

20.  Conclusion

1. BUSINESS CONTEXT

Government authorities and private concessionaires monitor toll plazas to:

·       Reduce congestion

·       Improve digital payments (FASTag / RFID)

·       Detect revenue leakage

·       Improve operational efficiency

SAS is widely used in:

·       Transport analytics

·       Government data platforms

·       Fraud and compliance systems

2. PROJECT OBJECTIVES

·       Create a realistic toll plaza dataset

·       Introduce a deliberate data error

·       Detect and correct the error

·       Calculate Efficiency Score (%)

·       Classify Utilization Levels

·       Identify Fraud Risk patterns

·       Use full SAS toolchain

3. DATASET DESIGN

Variables

Variable

Description

Plaza_ID

Unique toll plaza ID

Highway_Name

Name of highway

Vehicles_Per_Day

Average daily vehicles

Revenue

Daily revenue (INR)

Avg_Wait_Time

Avg waiting time (minutes)

Cashless_Ratio

% of digital payments

Efficiency_Score

Calculated (%)

Collection_Date

Observation date

Region

Geographic region

Fraud_Flag

Fraud indicator

 

Plaza_ID

A unique identifier for each toll plaza.
Used as a primary reference across datasets.

Highway_Name

Indicates the highway on which the toll plaza operates.
Useful for infrastructure planning and comparative studies.

Vehicles_Per_Day

Represents average daily traffic volume.
This is a core operational KPI.

Revenue

Daily revenue collected at the toll plaza.
Highly sensitive and audit-critical.

Avg_Wait_Time

Average waiting time per vehicle (minutes).
Directly impacts public satisfaction.

Cashless_Ratio

Percentage of vehicles using digital payment methods.
Indicates modernization and efficiency.

Efficiency_Score

Derived metric measuring operational effectiveness.
Combines traffic, payment mode, and waiting time.

Collection_Date

Date of data collection.
Enables trend and time-series analysis.

Region

Geographical grouping for regional insights.

Fraud_Flag

Derived indicator highlighting suspicious patterns.

 

4. RAW DATA CREATION (WITH INTENTIONAL ERROR)

Intentional Error:

Revenue typed as character ($) instead of numeric

This will break calculations later

data toll_raw;

    length Plaza_ID $6 Highway_Name $30 Region $15 Revenue $12;

    input Plaza_ID $ Highway_Name $ Vehicles_Per_Day Revenue $ Avg_Wait_Time

          Cashless_Ratio Collection_Date : date9. Region $;

    format Collection_Date date9.;

datalines;

TP001 NH44 45000 850000 3.5 82 01JAN2026 South

TP002 NH48 38000 720000 4.2 78 01JAN2026 West

TP003 NH16 51000 910000 2.9 88 01JAN2026 East

TP004 NH65 29000 560000 6.8 60 01JAN2026 South

TP005 NH19 47000 880000 3.1 85 01JAN2026 North

TP006 NH27 26000 490000 7.5 55 01JAN2026 West

TP007 NH30 34000 650000 5.2 70 01JAN2026 Central

TP008 NH66 52000 970000 2.7 90 01JAN2026 South

TP009 NH52 31000 600000 6.1 62 01JAN2026 North

TP010 NH40 28000 540000 7.0 58 01JAN2026 East

TP011 NH75 36000 690000 4.8 73 01JAN2026 West

TP012 NH22 24000 460000 8.0 50 01JAN2026 North

TP013 NH13 49000 920000 3.0 86 01JAN2026 South

;

run;

proc print data=toll_raw;

run;

OUTPUT:

ObsPlaza_IDHighway_NameRegionRevenueVehicles_Per_DayAvg_Wait_TimeCashless_RatioCollection_Date
1TP001NH44South850000450003.58201JAN2026
2TP002NH48West720000380004.27801JAN2026
3TP003NH16East910000510002.98801JAN2026
4TP004NH65South560000290006.86001JAN2026
5TP005NH19North880000470003.18501JAN2026
6TP006NH27West490000260007.55501JAN2026
7TP007NH30Central650000340005.27001JAN2026
8TP008NH66South970000520002.79001JAN2026
9TP009NH52North600000310006.16201JAN2026
10TP010NH40East540000280007.05801JAN2026
11TP011NH75West690000360004.87301JAN2026
12TP012NH22North460000240008.05001JAN2026
13TP013NH13South920000490003.08601JAN2026

Common issues:

·       Numeric values stored as character

·       Missing values

·       Format mismatches

·       Inconsistent lengths

This project intentionally introduces an error where Revenue is stored as a character variable instead of numeric.

This simulates:

·       Manual data entry mistakes

·       Poor upstream system design

·       CSV or Excel ingestion issues

5. IDENTIFYING THE ERROR

proc contents data=toll_raw;

run;

OUTPUT:

The CONTENTS Procedure

Data Set NameWORK.TOLL_RAWObservations13
Member TypeDATAVariables8
EngineV9Indexes0
Created02/18/2026 19:27:00Observation Length96
Last Modified02/18/2026 19:27:00Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
Engine/Host Dependent Information
Data Set Page Size131072
Number of Data Set Pages1
First Data Page1
Max Obs per Page1363
Obs in First Data Page13
Number of Data Set Repairs0
Filename/saswork/SAS_work9BFC00019166_odaws02-apse1-2.oda.sas.com/SAS_workD59100019166_odaws02-apse1-2.oda.sas.com/toll_raw.sas7bdat
Release Created9.0401M8
Host CreatedLinux
Inode Number201335737
Access Permissionrw-r--r--
Owner Nameu63247146
File Size256KB
File Size (bytes)262144
Alphabetic List of Variables and Attributes
#VariableTypeLenFormat
6Avg_Wait_TimeNum8 
7Cashless_RatioNum8 
8Collection_DateNum8DATE9.
2Highway_NameChar30 
1Plaza_IDChar6 
3RegionChar15 
4RevenueChar12 
5Vehicles_Per_DayNum8 

PROC CONTENTS is one of the most underrated but critical procedures in SAS.

It allows us to:

·  Validate variable types

·  Check lengths and formats

·  Confirm dataset structure

Problem Identified

·  Revenue is character

·  Cannot compute efficiency or fraud metrics

If Revenue is character:

·  Arithmetic operations fail

·  Efficiency calculations break

·  Fraud logic becomes invalid

·  Reports become misleading

6. CORRECTING THE ERROR

data toll_clean;

    set toll_raw;

    Revenue_Num = input(strip(Revenue), best12.);

    drop Revenue;

    rename Revenue_Num = Revenue;

run;

proc print data=toll_clean;

run;

OUTPUT:

ObsPlaza_IDHighway_NameRegionVehicles_Per_DayAvg_Wait_TimeCashless_RatioCollection_DateRevenue
1TP001NH44South450003.58201JAN2026850000
2TP002NH48West380004.27801JAN2026720000
3TP003NH16East510002.98801JAN2026910000
4TP004NH65South290006.86001JAN2026560000
5TP005NH19North470003.18501JAN2026880000
6TP006NH27West260007.55501JAN2026490000
7TP007NH30Central340005.27001JAN2026650000
8TP008NH66South520002.79001JAN2026970000
9TP009NH52North310006.16201JAN2026600000
10TP010NH40East280007.05801JAN2026540000
11TP011NH75West360004.87301JAN2026690000
12TP012NH22North240008.05001JAN2026460000
13TP013NH13South490003.08601JAN2026920000

·  STRIP ( ) removes spaces

·  INPUT ( ) converts character → numeric

·  Dataset becomes analytics-ready

7. FINAL CLEAN DATASET

data toll_final;

    set toll_clean;

    Efficiency_Score =

        round((Cashless_Ratio * Vehicles_Per_Day) /(Avg_Wait_Time * 1000),0.01);

    format Efficiency_Score percent8.2;

run;

proc print data=toll_final;

run;

OUTPUT:

ObsPlaza_IDHighway_NameRegionVehicles_Per_DayAvg_Wait_TimeCashless_RatioCollection_DateRevenueEfficiency_Score
1TP001NH44South450003.58201JAN2026850000105E3%
2TP002NH48West380004.27801JAN202672000070571%
3TP003NH16East510002.98801JAN2026910000155E3%
4TP004NH65South290006.86001JAN202656000025588%
5TP005NH19North470003.18501JAN2026880000129E3%
6TP006NH27West260007.55501JAN202649000019067%
7TP007NH30Central340005.27001JAN202665000045769%
8TP008NH66South520002.79001JAN2026970000173E3%
9TP009NH52North310006.16201JAN202660000031508%
10TP010NH40East280007.05801JAN202654000023200%
11TP011NH75West360004.87301JAN202669000054750%
12TP012NH22North240008.05001JAN202646000015000%
13TP013NH13South490003.08601JAN202692000014E4%

WHY PERCENT FORMAT IS USED:

Efficiency is expressed as a percentage because:

·  It is easy to compare across plazas

·  It is intuitive for business users

·  It aligns with KPI dashboards

8. DATE HANDLING & DERIVATIONS

data toll_dates;

    set toll_final;

    Month = month(Collection_Date);

    Next_Month = intnx('month', Collection_Date, 1);

    Days_Since = intck('day', Collection_Date, today());

    format Next_Month date9.;

run;

proc print data=toll_dates;

run;

OUTPUT:

ObsPlaza_IDHighway_NameRegionVehicles_Per_DayAvg_Wait_TimeCashless_RatioCollection_DateRevenueEfficiency_ScoreMonthNext_MonthDays_Since
1TP001NH44South450003.58201JAN2026850000105E3%101FEB202648
2TP002NH48West380004.27801JAN202672000070571%101FEB202648
3TP003NH16East510002.98801JAN2026910000155E3%101FEB202648
4TP004NH65South290006.86001JAN202656000025588%101FEB202648
5TP005NH19North470003.18501JAN2026880000129E3%101FEB202648
6TP006NH27West260007.55501JAN202649000019067%101FEB202648
7TP007NH30Central340005.27001JAN202665000045769%101FEB202648
8TP008NH66South520002.79001JAN2026970000173E3%101FEB202648
9TP009NH52North310006.16201JAN202660000031508%101FEB202648
10TP010NH40East280007.05801JAN202654000023200%101FEB202648
11TP011NH75West360004.87301JAN202669000054750%101FEB202648
12TP012NH22North240008.05001JAN202646000015000%101FEB202648
13TP013NH13South490003.08601JAN202692000014E4%101FEB202648

Functions used:

·  INTCK() → difference between dates

·  INTNX() → shifting dates

In toll systems, fraud indicators include:

·  High traffic but low revenue

·  Sudden revenue drops

·  Inconsistent payment ratios

9. UTILIZATION CLASSIFICATION MACRO

%macro utilization;

data toll_util;

    set toll_dates;

    length Utilization $10.;

    if Vehicles_Per_Day >= 50000 then Utilization = 'Very High';

    else if Vehicles_Per_Day >= 35000 then Utilization = 'High';

    else if Vehicles_Per_Day >= 25000 then Utilization = 'Medium';

    else Utilization = 'Low';

run;

proc print data=toll_util;

run;

%mend;


%utilization;

OUTPUT:

ObsPlaza_IDHighway_NameRegionVehicles_Per_DayAvg_Wait_TimeCashless_RatioCollection_DateRevenueEfficiency_ScoreMonthNext_MonthDays_SinceUtilization
1TP001NH44South450003.58201JAN2026850000105E3%101FEB202648High
2TP002NH48West380004.27801JAN202672000070571%101FEB202648High
3TP003NH16East510002.98801JAN2026910000155E3%101FEB202648Very High
4TP004NH65South290006.86001JAN202656000025588%101FEB202648Medium
5TP005NH19North470003.18501JAN2026880000129E3%101FEB202648High
6TP006NH27West260007.55501JAN202649000019067%101FEB202648Medium
7TP007NH30Central340005.27001JAN202665000045769%101FEB202648Medium
8TP008NH66South520002.79001JAN2026970000173E3%101FEB202648Very High
9TP009NH52North310006.16201JAN202660000031508%101FEB202648Medium
10TP010NH40East280007.05801JAN202654000023200%101FEB202648Medium
11TP011NH75West360004.87301JAN202669000054750%101FEB202648High
12TP012NH22North240008.05001JAN202646000015000%101FEB202648Low
13TP013NH13South490003.08601JAN202692000014E4%101FEB202648High

 

10. FRAUD DETECTION MACRO

%macro fraud_check;

data toll_fraud;

    set toll_util;

    length Fraud_Flag $3.;

    if Revenue < 500000 and Vehicles_Per_Day > 20000 then Fraud_Flag = 'YES';

    else Fraud_Flag = 'NO';

run;

proc print data=toll_fraud;

run;

%mend;


%fraud_check;

OUTPUT:

ObsPlaza_IDHighway_NameRegionVehicles_Per_DayAvg_Wait_TimeCashless_RatioCollection_DateRevenueEfficiency_ScoreMonthNext_MonthDays_SinceUtilizationFraud_Flag
1TP001NH44South450003.58201JAN2026850000105E3%101FEB202648HighNO
2TP002NH48West380004.27801JAN202672000070571%101FEB202648HighNO
3TP003NH16East510002.98801JAN2026910000155E3%101FEB202648Very HighNO
4TP004NH65South290006.86001JAN202656000025588%101FEB202648MediumNO
5TP005NH19North470003.18501JAN2026880000129E3%101FEB202648HighNO
6TP006NH27West260007.55501JAN202649000019067%101FEB202648MediumYES
7TP007NH30Central340005.27001JAN202665000045769%101FEB202648MediumNO
8TP008NH66South520002.79001JAN2026970000173E3%101FEB202648Very HighNO
9TP009NH52North310006.16201JAN202660000031508%101FEB202648MediumNO
10TP010NH40East280007.05801JAN202654000023200%101FEB202648MediumNO
11TP011NH75West360004.87301JAN202669000054750%101FEB202648HighNO
12TP012NH22North240008.05001JAN202646000015000%101FEB202648LowYES
13TP013NH13South490003.08601JAN202692000014E4%101FEB202648HighNO

In toll systems, fraud indicators include:

·  High traffic but low revenue

·  Sudden revenue drops

·  Inconsistent payment ratios

11. PROC SQL ANALYTICS

proc sql;

    create table region_summary as

    select Region,

           count(*) as Plaza_Count,

           avg(Revenue) as Avg_Revenue format=comma12.,

           avg(Efficiency_Score) as Avg_Efficiency

    from toll_fraud

    group by Region;

quit;

proc print data=region_summary;

run;

OUTPUT:

ObsRegionPlaza_CountAvg_RevenueAvg_Efficiency
1Central1650,000457.69
2East2725,000889.80
3North3646,667584.60
4South4825,0001112.04
5West3633,333481.29

·  It is powerful for aggregation

·  Business users understand SQL logic

·  It simplifies grouping and summarization

12. PROC FREQ, MEANS, UNIVARIATE

proc freq data=toll_fraud;

    tables Utilization Fraud_Flag;

run;

OUTPUT:

The FREQ Procedure

UtilizationFrequencyPercentCumulative
Frequency
Cumulative
Percent
High538.46538.46
Low17.69646.15
Medium538.461184.62
Very High215.3813100.00
Fraud_FlagFrequencyPercentCumulative
Frequency
Cumulative
Percent
NO1184.621184.62
YES215.3813100.00

proc means data=toll_fraud mean min max;

    var Revenue Vehicles_Per_Day Avg_Wait_Time;

run;

OUTPUT:

The MEANS Procedure

VariableMeanMinimumMaximum
Revenue
Vehicles_Per_Day
Avg_Wait_Time
710769.23
37692.31
4.9846154
460000.00
24000.00
2.7000000
970000.00
52000.00
8.0000000

PROC MEANS

·  Quick summary

·  Operational monitoring

proc univariate data=toll_fraud;

    var Revenue;

run;

OUTPUT:

The UNIVARIATE Procedure

Variable: Revenue

Moments
N13Sum Weights13
Mean710769.231Sum Observations9240000
Std Deviation177644.849Variance3.15577E10
Skewness0.096711Kurtosis-1.5645914
Uncorrected SS6.9462E12Corrected SS3.78692E11
Coeff Variation24.9933229Std Error Mean49269.8162
Basic Statistical Measures
LocationVariability
Mean710769.2Std Deviation177645
Median690000.0Variance3.15577E10
Mode.Range510000
  Interquartile Range320000
Tests for Location: Mu0=0
TestStatisticp Value
Student's tt14.42606Pr > |t|<.0001
SignM6.5Pr >= |M|0.0002
Signed RankS45.5Pr >= |S|0.0002
Quantiles (Definition 5)
LevelQuantile
100% Max970000
99%970000
95%970000
90%920000
75% Q3880000
50% Median690000
25% Q1560000
10%490000
5%460000
1%460000
0% Min460000
Extreme Observations
LowestHighest
ValueObsValueObs
460000128500001
49000068800005
540000109100003
560000492000013
60000099700008

PROC UNIVARIATE

·  Detailed distribution

·  Outlier detection

·  Statistical depth

13. CORRELATION ANALYSIS

proc corr data=toll_fraud;

    var Vehicles_Per_Day Revenue Cashless_Ratio Avg_Wait_Time;

run;

OUTPUT:

The CORR Procedure

4 Variables:Vehicles_Per_Day Revenue Cashless_Ratio Avg_Wait_Time
Simple Statistics
VariableNMeanStd DevSumMinimumMaximum
Vehicles_Per_Day1337692100204900002400052000
Revenue137107691776459240000460000970000
Cashless_Ratio1372.0769213.84113937.0000050.0000090.00000
Avg_Wait_Time134.984621.9143564.800002.700008.00000
Pearson Correlation Coefficients, N = 13
Prob > |r| under H0: Rho=0
 Vehicles_Per_DayRevenueCashless_RatioAvg_Wait_Time
Vehicles_Per_Day
1.00000
 
0.99735
<.0001
0.98382
<.0001
-0.97951
<.0001
Revenue
0.99735
<.0001
1.00000
 
0.98656
<.0001
-0.98455
<.0001
Cashless_Ratio
0.98382
<.0001
0.98656
<.0001
1.00000
 
-0.99756
<.0001
Avg_Wait_Time
-0.97951
<.0001
-0.98455
<.0001
-0.99756
<.0001
1.00000
 

14. VISUALIZATION

proc sgplot data=toll_fraud;

    scatter x=Vehicles_Per_Day y=Revenue;

    title "Vehicles vs Revenue";

run;

OUTPUT:

The SGPlot Procedure

15. APPEND,TRANSPOSE

proc append base=toll_fraud 

            data=toll_fraud force;

run;

proc print data=toll_fraud;

run;

OUTPUT:

ObsPlaza_IDHighway_NameRegionVehicles_Per_DayAvg_Wait_TimeCashless_RatioCollection_DateRevenueEfficiency_ScoreMonthNext_MonthDays_SinceUtilizationFraud_Flag
1TP001NH44South450003.58201JAN2026850000105E3%101FEB202648HighNO
2TP002NH48West380004.27801JAN202672000070571%101FEB202648HighNO
3TP003NH16East510002.98801JAN2026910000155E3%101FEB202648Very HighNO
4TP004NH65South290006.86001JAN202656000025588%101FEB202648MediumNO
5TP005NH19North470003.18501JAN2026880000129E3%101FEB202648HighNO
6TP006NH27West260007.55501JAN202649000019067%101FEB202648MediumYES
7TP007NH30Central340005.27001JAN202665000045769%101FEB202648MediumNO
8TP008NH66South520002.79001JAN2026970000173E3%101FEB202648Very HighNO
9TP009NH52North310006.16201JAN202660000031508%101FEB202648MediumNO
10TP010NH40East280007.05801JAN202654000023200%101FEB202648MediumNO
11TP011NH75West360004.87301JAN202669000054750%101FEB202648HighNO
12TP012NH22North240008.05001JAN202646000015000%101FEB202648LowYES
13TP013NH13South490003.08601JAN202692000014E4%101FEB202648HighNO
14TP001NH44South450003.58201JAN2026850000105E3%101FEB202648HighNO
15TP002NH48West380004.27801JAN202672000070571%101FEB202648HighNO
16TP003NH16East510002.98801JAN2026910000155E3%101FEB202648Very HighNO
17TP004NH65South290006.86001JAN202656000025588%101FEB202648MediumNO
18TP005NH19North470003.18501JAN2026880000129E3%101FEB202648HighNO
19TP006NH27West260007.55501JAN202649000019067%101FEB202648MediumYES
20TP007NH30Central340005.27001JAN202665000045769%101FEB202648MediumNO
21TP008NH66South520002.79001JAN2026970000173E3%101FEB202648Very HighNO
22TP009NH52North310006.16201JAN202660000031508%101FEB202648MediumNO
23TP010NH40East280007.05801JAN202654000023200%101FEB202648MediumNO
24TP011NH75West360004.87301JAN202669000054750%101FEB202648HighNO
25TP012NH22North240008.05001JAN202646000015000%101FEB202648LowYES
26TP013NH13South490003.08601JAN202692000014E4%101FEB202648HighNO

What Is Happening Here?

we are trying to append a dataset to itself.

·       BASE=toll_fraud

·       DATA=toll_fraud

·       Both point to the same physical dataset

Step 1:New Day Toll Data

data toll_fraud_new;

    set toll_fraud;

    Collection_Date = intnx('day', Collection_Date, 1);

    Revenue = Revenue * 1.03;

run;

proc print data=toll_fraud_new;

run;

OUTPUT:

ObsPlaza_IDHighway_NameRegionVehicles_Per_DayAvg_Wait_TimeCashless_RatioCollection_DateRevenueEfficiency_ScoreMonthNext_MonthDays_SinceUtilizationFraud_Flag
1TP001NH44South450003.58202JAN2026875500105E3%101FEB202648HighNO
2TP002NH48West380004.27802JAN202674160070571%101FEB202648HighNO
3TP003NH16East510002.98802JAN2026937300155E3%101FEB202648Very HighNO
4TP004NH65South290006.86002JAN202657680025588%101FEB202648MediumNO
5TP005NH19North470003.18502JAN2026906400129E3%101FEB202648HighNO
6TP006NH27West260007.55502JAN202650470019067%101FEB202648MediumYES
7TP007NH30Central340005.27002JAN202666950045769%101FEB202648MediumNO
8TP008NH66South520002.79002JAN2026999100173E3%101FEB202648Very HighNO
9TP009NH52North310006.16202JAN202661800031508%101FEB202648MediumNO
10TP010NH40East280007.05802JAN202655620023200%101FEB202648MediumNO
11TP011NH75West360004.87302JAN202671070054750%101FEB202648HighNO
12TP012NH22North240008.05002JAN202647380015000%101FEB202648LowYES
13TP013NH13South490003.08602JAN202694760014E4%101FEB202648HighNO

Step 2:Correct Append

proc append base=toll_fraud

            data=toll_fraud_new force;

run;

proc print data=toll_fraud;

run;

OUTPUT:

ObsPlaza_IDHighway_NameRegionVehicles_Per_DayAvg_Wait_TimeCashless_RatioCollection_DateRevenueEfficiency_ScoreMonthNext_MonthDays_SinceUtilizationFraud_Flag
1TP001NH44South450003.58201JAN2026850000105E3%101FEB202648HighNO
2TP002NH48West380004.27801JAN202672000070571%101FEB202648HighNO
3TP003NH16East510002.98801JAN2026910000155E3%101FEB202648Very HighNO
4TP004NH65South290006.86001JAN202656000025588%101FEB202648MediumNO
5TP005NH19North470003.18501JAN2026880000129E3%101FEB202648HighNO
6TP006NH27West260007.55501JAN202649000019067%101FEB202648MediumYES
7TP007NH30Central340005.27001JAN202665000045769%101FEB202648MediumNO
8TP008NH66South520002.79001JAN2026970000173E3%101FEB202648Very HighNO
9TP009NH52North310006.16201JAN202660000031508%101FEB202648MediumNO
10TP010NH40East280007.05801JAN202654000023200%101FEB202648MediumNO
11TP011NH75West360004.87301JAN202669000054750%101FEB202648HighNO
12TP012NH22North240008.05001JAN202646000015000%101FEB202648LowYES
13TP013NH13South490003.08601JAN202692000014E4%101FEB202648HighNO
14TP001NH44South450003.58202JAN2026875500105E3%101FEB202648HighNO
15TP002NH48West380004.27802JAN202674160070571%101FEB202648HighNO
16TP003NH16East510002.98802JAN2026937300155E3%101FEB202648Very HighNO
17TP004NH65South290006.86002JAN202657680025588%101FEB202648MediumNO
18TP005NH19North470003.18502JAN2026906400129E3%101FEB202648HighNO
19TP006NH27West260007.55502JAN202650470019067%101FEB202648MediumYES
20TP007NH30Central340005.27002JAN202666950045769%101FEB202648MediumNO
21TP008NH66South520002.79002JAN2026999100173E3%101FEB202648Very HighNO
22TP009NH52North310006.16202JAN202661800031508%101FEB202648MediumNO
23TP010NH40East280007.05802JAN202655620023200%101FEB202648MediumNO
24TP011NH75West360004.87302JAN202671070054750%101FEB202648HighNO
25TP012NH22North240008.05002JAN202647380015000%101FEB202648LowYES
26TP013NH13South490003.08602JAN202694760014E4%101FEB202648HighNO

PROC APPEND is designed for:

·  Daily incremental loads

·  Monthly data ingestion

·  Combining historical + current data

·  Faster than SET (no full rewrite)

Step 1: Sort first

proc sort data=toll_fraud;

by Plaza_ID;

run;

proc print data=toll_fraud;

run;

OUTPUT:

ObsPlaza_IDHighway_NameRegionVehicles_Per_DayAvg_Wait_TimeCashless_RatioCollection_DateRevenueEfficiency_ScoreMonthNext_MonthDays_SinceUtilizationFraud_Flag
1TP001NH44South450003.58201JAN2026850000105E3%101FEB202648HighNO
2TP001NH44South450003.58202JAN2026875500105E3%101FEB202648HighNO
3TP002NH48West380004.27801JAN202672000070571%101FEB202648HighNO
4TP002NH48West380004.27802JAN202674160070571%101FEB202648HighNO
5TP003NH16East510002.98801JAN2026910000155E3%101FEB202648Very HighNO
6TP003NH16East510002.98802JAN2026937300155E3%101FEB202648Very HighNO
7TP004NH65South290006.86001JAN202656000025588%101FEB202648MediumNO
8TP004NH65South290006.86002JAN202657680025588%101FEB202648MediumNO
9TP005NH19North470003.18501JAN2026880000129E3%101FEB202648HighNO
10TP005NH19North470003.18502JAN2026906400129E3%101FEB202648HighNO
11TP006NH27West260007.55501JAN202649000019067%101FEB202648MediumYES
12TP006NH27West260007.55502JAN202650470019067%101FEB202648MediumYES
13TP007NH30Central340005.27001JAN202665000045769%101FEB202648MediumNO
14TP007NH30Central340005.27002JAN202666950045769%101FEB202648MediumNO
15TP008NH66South520002.79001JAN2026970000173E3%101FEB202648Very HighNO
16TP008NH66South520002.79002JAN2026999100173E3%101FEB202648Very HighNO
17TP009NH52North310006.16201JAN202660000031508%101FEB202648MediumNO
18TP009NH52North310006.16202JAN202661800031508%101FEB202648MediumNO
19TP010NH40East280007.05801JAN202654000023200%101FEB202648MediumNO
20TP010NH40East280007.05802JAN202655620023200%101FEB202648MediumNO
21TP011NH75West360004.87301JAN202669000054750%101FEB202648HighNO
22TP011NH75West360004.87302JAN202671070054750%101FEB202648HighNO
23TP012NH22North240008.05001JAN202646000015000%101FEB202648LowYES
24TP012NH22North240008.05002JAN202647380015000%101FEB202648LowYES
25TP013NH13South490003.08601JAN202692000014E4%101FEB202648HighNO
26TP013NH13South490003.08602JAN202694760014E4%101FEB202648HighNO

Step 2:Transpose with BY

proc transpose data=toll_fraud out=toll_transposed;

    by Plaza_ID  NotSorted;

    var Revenue Vehicles_Per_Day;

run;

proc print data=toll_transposed;

run;

OUTPUT:

ObsPlaza_ID_NAME_COL1COL2
1TP001Revenue850000875500
2TP001Vehicles_Per_Day4500045000
3TP002Revenue720000741600
4TP002Vehicles_Per_Day3800038000
5TP003Revenue910000937300
6TP003Vehicles_Per_Day5100051000
7TP004Revenue560000576800
8TP004Vehicles_Per_Day2900029000
9TP005Revenue880000906400
10TP005Vehicles_Per_Day4700047000
11TP006Revenue490000504700
12TP006Vehicles_Per_Day2600026000
13TP007Revenue650000669500
14TP007Vehicles_Per_Day3400034000
15TP008Revenue970000999100
16TP008Vehicles_Per_Day5200052000
17TP009Revenue600000618000
18TP009Vehicles_Per_Day3100031000
19TP010Revenue540000556200
20TP010Vehicles_Per_Day2800028000
21TP011Revenue690000710700
22TP011Vehicles_Per_Day3600036000
23TP012Revenue460000473800
24TP012Vehicles_Per_Day2400024000
25TP013Revenue920000947600
26TP013Vehicles_Per_Day4900049000

16. CHARACTER & NUMERIC FUNCTIONS

data toll_text;

    set toll_fraud;

    Highway_Clean = propcase(strip(Highway_Name));

    Region_Upper = upcase(Region);

    Combined = catx('-', Plaza_ID, Region);

run;

proc print data=toll_text;

 var Highway_Name Highway_Clean Region Region_Upper Plaza_ID Combined;

run;

OUTPUT:

ObsHighway_NameHighway_CleanRegionRegion_UpperPlaza_IDCombined
1NH44Nh44SouthSOUTHTP001TP001-South
2NH44Nh44SouthSOUTHTP001TP001-South
3NH48Nh48WestWESTTP002TP002-West
4NH48Nh48WestWESTTP002TP002-West
5NH16Nh16EastEASTTP003TP003-East
6NH16Nh16EastEASTTP003TP003-East
7NH65Nh65SouthSOUTHTP004TP004-South
8NH65Nh65SouthSOUTHTP004TP004-South
9NH19Nh19NorthNORTHTP005TP005-North
10NH19Nh19NorthNORTHTP005TP005-North
11NH27Nh27WestWESTTP006TP006-West
12NH27Nh27WestWESTTP006TP006-West
13NH30Nh30CentralCENTRALTP007TP007-Central
14NH30Nh30CentralCENTRALTP007TP007-Central
15NH66Nh66SouthSOUTHTP008TP008-South
16NH66Nh66SouthSOUTHTP008TP008-South
17NH52Nh52NorthNORTHTP009TP009-North
18NH52Nh52NorthNORTHTP009TP009-North
19NH40Nh40EastEASTTP010TP010-East
20NH40Nh40EastEASTTP010TP010-East
21NH75Nh75WestWESTTP011TP011-West
22NH75Nh75WestWESTTP011TP011-West
23NH22Nh22NorthNORTHTP012TP012-North
24NH22Nh22NorthNORTHTP012TP012-North
25NH13Nh13SouthSOUTHTP013TP013-South
26NH13Nh13SouthSOUTHTP013TP013-South

17. PROC DATASETS DELETE

proc datasets lib=work;

    delete toll_raw toll_clean;

quit;

LOG:

NOTE: Deleting WORK.TOLL_RAW (memtype=DATA).
NOTE: Deleting WORK.TOLL_CLEAN (memtype=DATA).

Deleting intermediate datasets:

·  Saves memory

·  Reduces confusion

·  Improves workflow hygiene

18. BUSINESS INSIGHTS

·  High cashless ratio improves efficiency

·  Low revenue + high traffic = fraud risk

·  Waiting time is a critical KPI

·  Digital tolling directly impacts congestion

19. 5 KEY POINTS ABOUT THIS PROJECT

1.This project covers the complete SAS workflow — from raw data creation and validation to cleaning, ransformation, analysis, visualization, and reporting. It demonstrates how real-world toll plaza data is handled in structured enterprise environments.

2. It calculates key performance indicators such as Vehicles_Per_Day, Revenue, Avg_Wait_Time, Cashless_Ratio, and derives an Efficiency_Score (%). This helps evaluate how effectively each toll plaza operates under varying traffic conditions.

3. The project includes rule-based fraud identification (e.g., high traffic but low revenue scenarios). It simulates how analytics teams detect potential revenue leakage or operational manipulation in infrastructure systems.

4. It integrates DATA step, PROC SQL, PROC MEANS, PROC UNIVARIATE, PROC FREQ, PROC CORR, PROC SGPLOT, APPEND, MERGE, TRANSPOSE, and macro programming. This shows strong command over Base SAS and advanced programming concepts.

5. An intentional data-type error is introduced and corrected to demonstrate real-world debugging, validation, and best practices. This strengthens production readiness and audit reliability.

20. CONCLUSION

The Toll Plaza Operations, Revenue, Efficiency, and Fraud Analytics project demonstrates how SAS can be applied to real-world infrastructure and financial monitoring systems. By simulating daily toll plaza data, the project highlights the importance of structured data management, validation, and analytical modeling. It shows how operational metrics such as traffic volume, waiting time, and digital payment adoption directly influence revenue generation and overall efficiency.

The inclusion of fraud detection logic adds practical business value, as revenue leakage and irregular patterns are critical concerns in toll management systems. Through procedures like PROC SQL, PROC MEANS, PROC FREQ, PROC CORR, and visualization tools, the project transforms raw operational data into meaningful insights.

Overall, this project not only strengthens technical SAS skills but also builds business understanding, analytical thinking, and data governance awareness. It is a strong portfolio- demonstrating both programming proficiency and the ability to translate data into strategic decision-making insights.


INTERVIEW QUESTIONS FOR YOU

·  What is the difference between the DATA step and PROC step in SAS, and when would you choose one over the other?

·  What is the difference between SET and MERGE statements in SAS, and how does SAS handle matching observations during a MERGE with BY variables?

·  How do you debug a SAS program if the log shows warnings or errors, and what is the importance of checking the SAS log before validating output?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent TOLL data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.Can SAS analytics reveal which programming languages are popular but surprisingly difficult to master?

2.Is SDTM DM quality control in SAS the key to avoiding last-minute FDA rejection risks?

3.Can SAS analytics reveal which world tourist places truly attract visitors year after year?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy



Comments

Popular posts from this blog

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study

383.Which Pharma Distributors Are High-Risk? Can SAS Detect Delays, Temperature Violations, and Fraud?