421.Can We Detect Errors, Prevent Fraud, And Optimize Last-Mile Delivery Operations Using Advanced SAS Programming?

Smart SAS Intelligence For Last-Mile Delivery Error Detection And Fraud Prevention

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS: DATA STEP | LENGTH | INPUT | DATALINES | SET | MERGE | IF-THEN | FORMAT | INFORMAT | PROC SORT | PROC PRINT | PROC CONTENTS | PROC TRANSPOSE | PROC APPEND | PROC DATASETS DELETE | MACRO / %MACRO / %MEND

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Table Of Contents

  1. Introduction
  2. Business Context
  3. Project Objectives
  4. Dataset Structure
  5. Raw Dataset With Intentional Errors
  6. Understanding The Errors In The Dataset
  7. Corrected SAS Code For Clean Dataset
  8. Data Preparation Using SAS Functions
  9. Numeric Functions Used
  10. Character Functions Used
  11. Date Functions Using MDY, INTCK, INTNX
  12. Dataset Operations Using SET, MERGE, APPEND
  13. Data Transformation Using PROC TRANSPOSE
  14. Fraud Detection Macro Logic
  15. Utilization Classification Logic
  16. Data Cleaning Using PROC DATASETS DELETE
  17. Analytical Insights From The Dataset
  18. Business Benefits
  19. 20 Key Points About This Project
  20. Conclusion

Introduction

Last-mile delivery is the final stage of the logistics process where goods move from a distribution center to the customer's location. This phase is often considered the most complex and expensive part of supply chain logistics. Companies such as e-commerce platforms, courier services, and food delivery providers heavily depend on efficient last-mile delivery operations to ensure customer satisfaction.

However, managing last-mile deliveries involves multiple operational challenges. These include route optimization, delivery delays, failed delivery attempts, fuel cost fluctuations, and sometimes fraudulent reporting by drivers or logistics partners. Errors in operational data can lead to incorrect decision making and financial losses.

In modern analytics environments, SAS programming is widely used to manage operational datasets, detect anomalies, and optimize logistics performance. By combining data steps, SQL queries, macros, and advanced functions, organizations can identify inefficiencies, detect suspicious behavior, and improve service quality.

This project demonstrates how SAS programming can be used to:

  • Create a last-mile delivery dataset
  • Introduce intentional data errors
  • Identify and correct those errors
  • Detect potential delivery fraud
  • Calculate operational metrics
  • Perform dataset transformations and reporting

The project uses multiple SAS techniques including:

  • Data Step Programming
  • Character Functions
  • Numeric Functions
  • Date Functions
  • PROC APPEND
  • PROC TRANSPOSE
  • Dataset MERGE
  • Fraud Detection Macros

The final result is a clean dataset that can be used for performance analysis and fraud monitoring.

Business Context

Imagine a logistics company operating across several Indian cities such as Hyderabad, Bangalore, Chennai, Mumbai, and Delhi. The company manages thousands of daily deliveries and records operational metrics for each delivery.

Each delivery record contains:

  • Delivery ID
  • City of delivery
  • Distance traveled
  • Delivery time
  • Number of failed attempts
  • Fuel cost
  • Service rating
  • Operational utilization
  • Percentage performance score
  • Delivery service fees
  • Delivery date

However, the company notices inconsistencies in their operational data.

Examples include:

  • Incorrect city name formatting
  • Missing delivery time
  • Negative fuel cost values
  • Invalid ratings
  • Fraudulent reporting of failed deliveries

To solve these issues, the company assigns a data analytics team to develop a SAS program that:

  1. Creates a delivery dataset
  2. Identifies errors
  3. Corrects the data
  4. Detects suspicious patterns
  5. Generates insights for operations managers

Project Objectives

The main objectives of this project are:

  1. Create a simulated dataset representing last-mile delivery operations
  2. Introduce intentional errors in the raw dataset
  3. Identify and explain those errors
  4. Correct the dataset using SAS programming
  5. Apply character and numeric functions for cleaning data
  6. Use date functions for delivery scheduling analysis
  7. Implement macros for fraud detection
  8. Perform dataset merging and transformation
  9. Create utilization classifications
  10. Generate an optimized final dataset for analysis

Dataset Structure

The dataset includes the following variables:

  • Delivery_ID - Unique identifier for the delivery
  • City - City where the delivery occurred
  • Distance_km - Distance traveled
  • Delivery_Time - Time taken for delivery
  • Failed_Attempts - Number of failed attempts
  • Fuel_Cost - Fuel expenditure
  • Service_Rating - Customer rating (1 to 5)
  • Percentage - Performance percentage
  • Fees - Delivery charges
  • Utilization - Operational classification (Low / Medium / High)
  • Delivery_Date - Date of delivery
  • Driver_Name - Name of the delivery driver


1. Raw Dataset With Intentional Errors

data delivery_raw;
length Delivery_ID $6 City $20 Driver_Name $25 Utilization $20;
input Delivery_ID $ City $ Distance_km Delivery_Time Failed_Attempts
      Fuel_Cost Service_Rating Percentage Fees Driver_Name $ Delivery_Date :ddmmyy10.;
datalines;
D001 hyderabad 12 45 0 150 4.5 90 40 ramesh 12/01/2025
D002 BANGALORE 20 . 1 200 4.2 88 50 SURESH 15/01/2025
D003 chennai -10 60 0 180 4.8 92 45 mahesh 18/01/2025
D004 mumbai 15 55 2 -120 3.5 70 35 anil 20/01/2025
D005 DELHI 25 70 0 220 5.5 95 55 raju 25/01/2025
D006 kolkata 18 65 1 190 4.1 85 42 kumar 28/01/2025
D007 Hyderabad 22 80 3 250 3.8 60 60 venkat 02/02/2025
D008 bangalore 14 50 0 160 4.4 89 41 ravi 05/02/2025
D009 chennai 30 90 2 300 3.9 75 70 ganesh 07/02/2025
D010 mumbai 11 40 0 120 4.7 91 38 sandeep 10/02/2025
D011 delhi 17 58 1 170 4.3 87 43 karthik 12/02/2025
D012 kolkata 16 52 0 165 4.6 90 40 ajay 14/02/2025
D013 hyderabad 19 68 0 195 4.2 86 48 manoj 16/02/2025
D014 bangalore 23 75 1 210 4.1 84 52 srikanth 18/02/2025
D015 chennai 27 82 2 240 3.7 72 58 praveen 20/02/2025
D016 mumbai 13 48 0 155 4.5 93 39 rajesh 22/02/2025
D017 delhi 21 70 1 205 4.0 83 51 deepak 25/02/2025
D018 kolkata 24 78 0 215 4.6 92 54 arjun 27/02/2025
;
run;

proc print data=delivery_raw;

run;

OUTPUT:

PROC PRINT lists all 18 raw observations. The Utilization column is blank (it is declared but never read or assigned in this step), and Delivery_Date appears as unformatted SAS date values (for example, 23753 for 12/01/2025).

Understanding The Errors In The Dataset

Several intentional errors exist in the dataset:

1. Negative Distance

D003 chennai -10

Distance cannot be negative.

2. Missing Delivery Time

D002 BANGALORE 20 .

Delivery time is missing (recorded as a single period).

3. Negative Fuel Cost

D004 mumbai Fuel_Cost = -120

Fuel cost cannot be negative.

4. Invalid Service Rating

D005 rating = 5.5

Rating must be between 1 and 5.

5. Inconsistent City Names

Example:

hyderabad
HYDERABAD
Hyderabad

City names need standard formatting.

2. Corrected SAS Code

data delivery_clean;
set delivery_raw;
City = propcase(strip(City));
Driver_Name = propcase(strip(Driver_Name));
Distance_km = abs(Distance_km);
Fuel_Cost = abs(Fuel_Cost);
if Service_Rating > 5 then Service_Rating = 5;
if missing(Delivery_Time) then Delivery_Time = round(Distance_km*3);
Delivery_Date2 = mdy(month(Delivery_Date), day(Delivery_Date), year(Delivery_Date));
Month_Gap = intck('month', '01JAN2025'd, Delivery_Date);
Next_Service = intnx('month', Delivery_Date, 1);
if Distance_km <= 15 then Utilization = "Low";
else if Distance_km <= 25 then Utilization = "Medium";
else Utilization = "High";
run;

proc print data=delivery_clean;

run;

OUTPUT:

PROC PRINT shows the 18 cleaned observations. City and Driver_Name are now in proper case, Distance_km and Fuel_Cost are positive, D005's rating is capped at 5.0, D002's missing Delivery_Time is imputed as 60 (20 * 3), and each row carries a Low, Medium, or High Utilization value plus the derived Delivery_Date2, Month_Gap, and Next_Service variables.

Character Functions 

Functions used include:

STRIP

Removes leading and trailing spaces.

TRIM

Removes trailing spaces.

CAT

Concatenates values.

CATX

Concatenates values with delimiter.

PROPCASE

Converts text to proper case.

UPCASE

Converts text to uppercase.

LOWCASE

Converts text to lowercase.

COALESCE / COALESCEC

Returns the first non-missing value in a list (COALESCE for numeric values, COALESCEC for character values), which is how missing values are handled.
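As a quick illustration, the cleaning functions above can be combined in one DATA step (a minimal sketch with made-up values, not part of the project dataset):

```sas
data name_demo;
   raw_city   = "  HYDERABAD  ";
   raw_driver = "ramesh";
   City        = propcase(strip(raw_city));      /* "Hyderabad" */
   Driver_Name = propcase(strip(raw_driver));    /* "Ramesh" */
   Label_Text  = catx(" - ", City, Driver_Name); /* "Hyderabad - Ramesh" */
   Safe_City   = coalescec(City, "Unknown");     /* first non-missing character value */
run;
```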

Numeric Functions

Examples used:

ABS( )
ROUND( )
SUM( )
MEAN( )

These help ensure numerical accuracy and standardization.
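A minimal sketch of these numeric functions, using the same kinds of values as the raw dataset:

```sas
data metric_demo;
   Distance_km = -10;                          /* the intentional error from D003 */
   Fuel_Cost   = 149.67;
   Distance_km = abs(Distance_km);             /* -10 becomes 10 */
   Fuel_Cost   = round(Fuel_Cost, 1);          /* 149.67 becomes 150 */
   Total_Cost  = sum(Fuel_Cost, 40, .);        /* SUM ignores missing arguments: 190 */
   Avg_Value   = mean(Distance_km, Fuel_Cost); /* (10 + 150) / 2 = 80 */
run;
```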

Date Functions

MDY

Creates SAS date from month, day, year.

INTCK

Counts intervals between dates.

INTNX

Moves date forward or backward.
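A small worked example of the three date functions, with values chosen to match delivery D001:

```sas
data date_demo;
   Delivery_Date = mdy(1, 12, 2025);                            /* builds 12JAN2025 */
   Month_Gap     = intck('month', '01JAN2025'd, Delivery_Date); /* 0: same month */
   Next_Service  = intnx('month', Delivery_Date, 1);            /* start of next month */
   format Delivery_Date Next_Service date9.;
run;
```

By default INTNX aligns the result to the beginning of the interval, which is why all January deliveries in the project share the same Next_Service date.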

3. Best Practice In Real Projects

Before using PROC APPEND, analysts usually check dataset structures using:

proc contents data=delivery_clean;

run;

proc print data=delivery_clean;

run;

OUTPUT:

The CONTENTS Procedure

Data Set NameWORK.DELIVERY_CLEANObservations18
Member TypeDATAVariables15
EngineV9Indexes0
Created03/06/2026 16:39:51Observation Length160
Last Modified03/06/2026 16:39:51Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
Engine/Host Dependent Information
Data Set Page Size131072
Number of Data Set Pages1
First Data Page1
Max Obs per Page818
Obs in First Data Page18
Number of Data Set Repairs0
Filename/saswork/SAS_workD19200007C04_odaws01-apse1-2.oda.sas.com/SAS_workF85F00007C04_odaws01-apse1-2.oda.sas.com/delivery_clean.sas7bdat
Release Created9.0401M8
Host CreatedLinux
Inode Number201344327
Access Permissionrw-r--r--
Owner Nameu63247146
File Size256KB
File Size (bytes)262144
Alphabetic List of Variables and Attributes
#VariableTypeLen
2CityChar20
12Delivery_DateNum8
13Delivery_Date2Num8
1Delivery_IDChar6
6Delivery_TimeNum8
5Distance_kmNum8
3Driver_NameChar25
7Failed_AttemptsNum8
11FeesNum8
8Fuel_CostNum8
14Month_GapNum8
15Next_ServiceNum8
10PercentageNum8
9Service_RatingNum8
4UtilizationChar20
ObsDelivery_IDCityDriver_NameUtilizationDistance_kmDelivery_TimeFailed_AttemptsFuel_CostService_RatingPercentageFeesDelivery_DateDelivery_Date2Month_GapNext_Service
1D001HyderabadRameshLow124501504.590402375323753023773
2D002BangaloreSureshMedium206012004.288502375623756023773
3D003ChennaiMaheshLow106001804.892452375923759023773
4D004MumbaiAnilLow155521203.570352376123761023773
5D005DelhiRajuMedium257002205.095552376623766023773
6D006KolkataKumarMedium186511904.185422376923769023773
7D007HyderabadVenkatMedium228032503.860602377423774123801
8D008BangaloreRaviLow145001604.489412377723777123801
9D009ChennaiGaneshHigh309023003.975702377923779123801
10D010MumbaiSandeepLow114001204.791382378223782123801
11D011DelhiKarthikMedium175811704.387432378423784123801
12D012KolkataAjayMedium165201654.690402378623786123801
13D013HyderabadManojMedium196801954.286482378823788123801
14D014BangaloreSrikanthMedium237512104.184522379023790123801
15D015ChennaiPraveenHigh278222403.772582379223792123801
16D016MumbaiRajeshLow134801554.593392379423794123801
17D017DelhiDeepakMedium217012054.083512379723797123801
18D018KolkataArjunMedium247802154.692542379923799123801

proc contents data=delivery_new;

run;

proc print data=delivery_new;

run;

OUTPUT:

The CONTENTS Procedure (key details)

Data Set Name: WORK.DELIVERY_NEW     Observations: 3
Member Type:   DATA                  Variables:    11

Alphabetic List of Variables and Attributes

#   Variable         Type  Len
2   City             Char  20
11  Delivery_Date    Num   8
1   Delivery_ID      Char  6
5   Delivery_Time    Num   8
4   Distance_km      Num   8
3   Driver_Name      Char  25
6   Failed_Attempts  Num   8
10  Fees             Num   8
7   Fuel_Cost        Num   8
9   Percentage       Num   8
8   Service_Rating   Num   8

The PROC PRINT step lists the three new records (D019, D020, D021).

·  All shared variables match in name, data type, and length

·  delivery_new does not yet contain the derived variables (Utilization, Delivery_Date2, Month_Gap, Next_Service); these will simply be missing for the appended rows

4. PROC APPEND

Create New Dataset

data delivery_new;
length Delivery_ID $6 City $20 Driver_Name $25;
input Delivery_ID $ City $ Distance_km Delivery_Time Failed_Attempts Fuel_Cost
      Service_Rating Percentage Fees Driver_Name $ Delivery_Date :ddmmyy10.;
datalines;
D019 Hyderabad 26 85 1 230 4.4 90 56 Raghav 01/03/2025
D020 Bangalore 18 62 0 175 4.6 92 44 Mohan 03/03/2025
D021 Chennai 22 72 1 205 4.3 87 49 Suresh 05/03/2025
;
run;

proc print data=delivery_new;

run;

OUTPUT:

Obs  Delivery_ID  City       Driver_Name  Distance_km  Delivery_Time  Failed_Attempts  Fuel_Cost  Service_Rating  Percentage  Fees  Delivery_Date
1    D019         Hyderabad  Raghav       26           85             1                230        4.4             90          56    23801
2    D020         Bangalore  Mohan        18           62             0                175        4.6             92          44    23803
3    D021         Chennai    Suresh       22           72             1                205        4.3             87          49    23805

Explanation Of The Code

DATA delivery_new

Creates a new SAS dataset that contains new delivery records.

length

Defines the character variable lengths.

Example:

length Delivery_ID $6

This ensures SAS stores the ID as a character value with a length of 6.

Why it is important:

·       Prevents truncation

·       Ensures consistent structure with the base dataset.

INPUT Statement

Reads variables in the order specified.

:ddmmyy10.

This informat reads dates written as:

01/03/2025

Without this informat, SAS would not correctly interpret the date.
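Note that an informat only controls how the raw text is read; the stored value is still a SAS date number (for example, 23801 for 01/03/2025), which is why the PROC PRINT outputs show plain numbers. To display dates readably, attach a format, for instance:

```sas
proc print data=delivery_new;
   format Delivery_Date ddmmyy10.;  /* displays 01/03/2025 instead of 23801 */
run;
```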

DATALINES

Allows manual entry of data.

Each row represents a new delivery record.

PROC APPEND

proc append base=delivery_clean data=delivery_new;
run;

proc print data=delivery_clean;

run;

OUTPUT:

PROC PRINT now lists 21 observations. Rows 1–18 are unchanged; rows 19–21 (D019–D021) have been appended with missing values for Utilization, Delivery_Date2, Month_Gap, and Next_Service, since those variables exist only in the base dataset.

Explanation Of PROC APPEND

PROC APPEND

Used to add observations from one dataset to another without recreating the entire dataset.

BASE=delivery_clean

This is the target dataset.

New records will be added here.

DATA=delivery_new

This dataset contains new observations to be appended.

What Happens After Execution

Before append:

delivery_clean → 18 observations
delivery_new   → 3 observations

After append:

delivery_clean → 21 observations

Advantages:

·  Faster

·  Does not recreate dataset

·  Efficient for large datasets
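One caveat worth knowing: by default PROC APPEND refuses to append when the DATA= dataset contains variables that are missing from the BASE= dataset, are longer, or have a different type. The standard FORCE option overrides this check, dropping or truncating the offending variables. A sketch of how it would be used here:

```sas
/* Append even if delivery_new carried extra or mismatched variables;
   such variables would be dropped or truncated to fit the base structure. */
proc append base=delivery_clean data=delivery_new force;
run;
```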

Why PROC APPEND Is Used Instead Of SET

DATA STEP SET

data combined;
set delivery_clean delivery_new;
run;

proc print data=combined;

run;

OUTPUT:

The combined dataset contains 24 observations: the 21 rows already in delivery_clean plus the 3 rows of delivery_new read in again, so D019–D021 appear twice. (They were already appended to delivery_clean in the previous step.)

Problems With The SET Approach

·  Creates an entirely new dataset

·  Requires extra storage

·  Rereads every existing observation instead of only adding the new ones

It can also duplicate records, as seen above: because delivery_new had already been appended to delivery_clean, D019–D021 appear twice in the combined dataset.


5. Best Practice In SAS Projects

Before merging, always check dataset structure:

proc contents data=delivery_clean;

run;

proc print data=delivery_clean;

run;

OUTPUT:

The CONTENTS Procedure (key details)

Data Set Name: WORK.DELIVERY_CLEAN     Observations: 21
Member Type:   DATA                    Variables:    15

The variable list is identical to the earlier PROC CONTENTS run, and PROC PRINT lists all 21 observations, including the three appended records with missing derived variables.

proc contents data=driver_data;

run;

proc print data=driver_data;

run;

OUTPUT:

The CONTENTS Procedure (key details)

Data Set Name: WORK.DRIVER_DATA     Observations: 18
Member Type:   DATA                 Variables:    4

Alphabetic List of Variables and Attributes

#  Variable       Type  Len
1  Delivery_ID    Char  6
2  Driver_Name    Char  25
3  Driver_Rating  Num   8
4  Vehicle_Type   Char  15

PROC PRINT lists the 18 driver records (D001 Ramesh 4.5 Bike through D018 Arjun 4.5 Van).

·  Variable types match

·  Variable lengths match

·  Key variables exist

6. MERGE

Step 1 — Create Driver Dataset

data driver_data;
length Delivery_ID $6 Driver_Name $25 Driver_Rating 8 Vehicle_Type $15;
input Delivery_ID $ Driver_Name $ Driver_Rating Vehicle_Type $;
datalines;
D001 Ramesh 4.5 Bike
D002 Suresh 4.2 Van
D003 Mahesh 4.6 Bike
D004 Anil 3.8 Truck
D005 Raju 4.7 Bike
D006 Kumar 4.3 Van
D007 Venkat 3.9 Bike
D008 Ravi 4.4 Bike
D009 Ganesh 4.0 Van
D010 Sandeep 4.6 Bike
D011 Karthik 4.3 Van
D012 Ajay 4.5 Bike
D013 Manoj 4.2 Bike
D014 Srikanth 4.1 Truck
D015 Praveen 3.7 Van
D016 Rajesh 4.6 Bike
D017 Deepak 4.0 Bike
D018 Arjun 4.5 Van
;
run;

proc print data=driver_data;

run;

OUTPUT:

Obs  Delivery_ID  Driver_Name  Driver_Rating  Vehicle_Type
1    D001         Ramesh       4.5            Bike
2    D002         Suresh       4.2            Van
3    D003         Mahesh       4.6            Bike
4    D004         Anil         3.8            Truck
5    D005         Raju         4.7            Bike
6    D006         Kumar        4.3            Van
7    D007         Venkat       3.9            Bike
8    D008         Ravi         4.4            Bike
9    D009         Ganesh       4.0            Van
10   D010         Sandeep      4.6            Bike
11   D011         Karthik      4.3            Van
12   D012         Ajay         4.5            Bike
13   D013         Manoj        4.2            Bike
14   D014         Srikanth     4.1            Truck
15   D015         Praveen      3.7            Van
16   D016         Rajesh       4.6            Bike
17   D017         Deepak       4.0            Bike
18   D018         Arjun        4.5            Van

Explanation Of The Code

DATA driver_data

Creates a new dataset that stores driver-level information.

This dataset will later be merged with delivery data.

LENGTH Statement

length Delivery_ID $6 Driver_Name $25 Driver_Rating 8 Vehicle_Type $15;

Driver_Name $25 means SAS can store up to 25 characters; Driver_Rating 8 defines a standard 8-byte numeric variable.

·  Defines character variable sizes

·  Prevents variable truncation

·  Ensures consistent structure during merge

INPUT Statement

Defines how SAS should read the variables.

Variables read:

Delivery_ID

Driver_Name

Driver_Rating

Vehicle_Type

Each row represents driver information linked to a delivery.

Example row:

D001 Ramesh 4.5 Bike

Variable       Value
Delivery_ID    D001
Driver_Name    Ramesh
Driver_Rating  4.5
Vehicle_Type   Bike

Step 2 — Sort Datasets

proc sort data=delivery_clean;
by Delivery_ID;
run;

proc print data=delivery_clean;

run;

OUTPUT:

PROC PRINT lists the 21 delivery observations in Delivery_ID order (D001 through D021). The data were already entered in that order, so the listing is unchanged.

proc sort data=driver_data;
by Delivery_ID;
run;

proc print data=driver_data;

run;

OUTPUT:

The 18 driver records print in the same order as before; they were already in Delivery_ID order, so sorting does not change the listing.

Why Sorting Is Required

The MERGE statement performs sequential matching.

Without sorting, SAS may show:

ERROR: BY variables are not properly sorted.

Step 3 — Perform MERGE

data delivery_final;
merge delivery_clean driver_data;
by Delivery_ID;
run;

proc print data=delivery_final;

run;

OUTPUT:

delivery_final contains 21 observations and 17 variables. Rows D001–D018 now carry Driver_Rating and Vehicle_Type from driver_data; D019–D021 have no matching driver record, so those two fields are missing for them.

Explanation Of MERGE Code

DATA delivery_final

Creates a new dataset containing combined information.

MERGE Statement

merge delivery_clean driver_data;

Combines both datasets row-by-row using a common key.

BY Delivery_ID

Specifies the matching variable.

SAS matches records like this:

delivery_clean.Delivery_ID ↔ driver_data.Delivery_ID

Example:

D001 → delivery record

D001 → driver record

These two rows are merged into a single observation.

Example Result Dataset

Delivery_ID  City       Distance_km  Driver_Name  Vehicle_Type
D001         Hyderabad  12           Ramesh       Bike
D002         Bangalore  20           Suresh       Van
D003         Chennai    10           Mahesh       Bike

Now each delivery record also contains driver information.

Why MERGE Is Important In Real Projects

In real logistics systems, data comes from multiple systems.

Example datasets:

Dataset           Contains
Delivery dataset  Delivery metrics
Driver dataset    Driver details
Vehicle dataset   Vehicle information
Fuel dataset      Fuel consumption

MERGE helps combine all these datasets into one analytical dataset.
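By default, MERGE also keeps non-matching records, which is why D019–D021 end up with missing driver fields. When only matched records are wanted, the standard IN= dataset option flags where each observation came from. A minimal sketch:

```sas
data matched_only;
   merge delivery_clean(in=a) driver_data(in=b);
   by Delivery_ID;
   if a and b;   /* keep only deliveries that have a matching driver record */
run;
```

Applied here, this would drop D019–D021, since they have no rows in driver_data.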

 

7. PROC TRANSPOSE

proc transpose data=delivery_clean out=delivery_transpose;
var Fuel_Cost Fees;
run;

proc print data=delivery_transpose;

run;

OUTPUT:

Obs  _NAME_     COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10 COL11 COL12 COL13 COL14 COL15 COL16 COL17 COL18 COL19 COL20 COL21
1    Fuel_Cost  150  200  180  120  220  190  250  160  300  120   170   165   195   210   240   155   205   215   230   175   205
2    Fees       40   50   45   35   55   42   60   41   70   38    43    40    48    52    58    39    51    54    56    44    49

PROC TRANSPOSE converts rows to columns: each variable named on the VAR statement becomes one output row (identified by _NAME_), and each of the 21 input observations becomes a column (COL1–COL21).
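When named columns are preferred over COL1–COLn, the standard ID statement takes column names from a variable's values. A sketch (the output dataset name cost_wide is illustrative):

```sas
proc transpose data=delivery_clean out=cost_wide name=Metric;
   id Delivery_ID;        /* columns become D001, D002, ... */
   var Fuel_Cost Fees;    /* two output rows: Fuel_Cost and Fees */
run;
```

This works here because Delivery_ID values are unique and are valid SAS names.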

8. Fraud Detection Macro

%macro fraud_check;

data fraud_detection;
set delivery_clean;
if Failed_Attempts > 2 and Service_Rating < 4 then Fraud_Flag = "YES";
else Fraud_Flag = "NO";
run;

proc print data=fraud_detection;
run;

%mend fraud_check;

%fraud_check;

OUTPUT:

Obs  Delivery_ID  City       Driver_Name  Utilization  Distance_km  Delivery_Time  Failed_Attempts  Fuel_Cost  Service_Rating  Percentage  Fees  Delivery_Date  Delivery_Date2  Month_Gap  Next_Service  Fraud_Flag
  1  D001         Hyderabad  Ramesh       Low               12           45               0            150          4.5            90        40      23753          23753           0          23773      NO
  2  D002         Bangalore  Suresh       Medium            20           60               1            200          4.2            88        50      23756          23756           0          23773      NO
  3  D003         Chennai    Mahesh       Low               10           60               0            180          4.8            92        45      23759          23759           0          23773      NO
  4  D004         Mumbai     Anil         Low               15           55               2            120          3.5            70        35      23761          23761           0          23773      NO
  5  D005         Delhi      Raju         Medium            25           70               0            220          5.0            95        55      23766          23766           0          23773      NO
  6  D006         Kolkata    Kumar        Medium            18           65               1            190          4.1            85        42      23769          23769           0          23773      NO
  7  D007         Hyderabad  Venkat       Medium            22           80               3            250          3.8            60        60      23774          23774           1          23801      YES
  8  D008         Bangalore  Ravi         Low               14           50               0            160          4.4            89        41      23777          23777           1          23801      NO
  9  D009         Chennai    Ganesh       High              30           90               2            300          3.9            75        70      23779          23779           1          23801      NO
 10  D010         Mumbai     Sandeep      Low               11           40               0            120          4.7            91        38      23782          23782           1          23801      NO
 11  D011         Delhi      Karthik      Medium            17           58               1            170          4.3            87        43      23784          23784           1          23801      NO
 12  D012         Kolkata    Ajay         Medium            16           52               0            165          4.6            90        40      23786          23786           1          23801      NO
 13  D013         Hyderabad  Manoj        Medium            19           68               0            195          4.2            86        48      23788          23788           1          23801      NO
 14  D014         Bangalore  Srikanth     Medium            23           75               1            210          4.1            84        52      23790          23790           1          23801      NO
 15  D015         Chennai    Praveen      High              27           82               2            240          3.7            72        58      23792          23792           1          23801      NO
 16  D016         Mumbai     Rajesh       Low               13           48               0            155          4.5            93        39      23794          23794           1          23801      NO
 17  D017         Delhi      Deepak       Medium            21           70               1            205          4.0            83        51      23797          23797           1          23801      NO
 18  D018         Kolkata    Arjun        Medium            24           78               0            215          4.6            92        54      23799          23799           1          23801      NO
 19  D019         Hyderabad  Raghav                         26           85               1            230          4.4            90        56      23801              .           .              .      NO
 20  D020         Bangalore  Mohan                          18           62               0            175          4.6            92        44      23803              .           .              .      NO
 21  D021         Chennai    Suresh                         22           72               1            205          4.3            87        49      23805              .           .              .      NO
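The macro above hard-codes its thresholds. A slightly more reusable version could accept them as keyword parameters (a sketch; the parameter names attempts and rating are illustrative):

%macro fraud_check(attempts=2, rating=4);

data fraud_detection;

set delivery_clean;

length Fraud_Flag $3;

if Failed_Attempts > &attempts and Service_Rating < &rating then Fraud_Flag = "YES";

else Fraud_Flag = "NO";

run;

%mend fraud_check;

%fraud_check(attempts=2, rating=4); /* same rule as the original macro */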

9. PROC DATASETS DELETE

proc datasets library=work nolist;

delete delivery_raw;

quit;

LOG:

NOTE: Deleting WORK.DELIVERY_RAW (memtype=DATA).

Removes temporary datasets.
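The DELETE statement accepts several dataset names at once, so cleanup can be batched into a single step (a sketch; the second dataset name is illustrative, and PROC DATASETS also offers a KILL statement that removes every dataset in the library):

proc datasets library=work nolist;

delete delivery_raw fraud_detection; /* several datasets in one statement */

quit;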

10. Business Insights

Using this dataset we can analyze:

·  Average delivery time per city

·  Failed delivery patterns

·  Fuel cost optimization

·  Driver performance

·  Fraud detection patterns
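For example, the average delivery time per city could be computed directly from the cleaned dataset with PROC MEANS (a sketch; using CLASS avoids having to sort by City first):

proc means data=delivery_clean mean maxdec=1;

class City;

var Delivery_Time;

run;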

11. 20 Key Points About This Project

·  Create a simulated last-mile delivery dataset containing operational variables like Delivery_ID, City, Distance_km, Delivery_Time, Fuel_Cost, and Service_Rating.

·  Introduce intentional data quality errors such as negative distance, missing delivery time, invalid ratings, and inconsistent city names.

·  Use DATA step programming to read raw delivery data and construct the initial dataset.

·  Apply character cleaning functions such as STRIP, TRIM, UPCASE, LOWCASE, and PROPCASE to standardize city and driver names.

·  Use concatenation functions like CAT, CATX, and CATS to combine variables such as driver and city information for reporting.

·  Implement numeric functions like ABS, ROUND, and SUM to correct negative values and standardize numeric metrics.

·  Handle missing values using the COALESCE function to ensure data completeness for analysis.

·  Standardize date values using the MDY function to construct proper SAS date variables.

·  Calculate time intervals between dates using the INTCK function for delivery scheduling analysis.

·  Generate future or next scheduled dates using the INTNX function to simulate operational planning.

·  Use conditional IF-THEN logic in the DATA step to classify delivery utilization levels as Low, Medium, or High.

·  Detect potential operational anomalies like excessive failed delivery attempts using conditional rules.

·  Implement macro programming to create a reusable fraud detection logic for suspicious deliveries.

·  Use PROC APPEND to add new delivery records to the existing dataset without recreating the dataset.

·  Use the MERGE statement to combine delivery information with driver or operational datasets.

·  Transform dataset structure using PROC TRANSPOSE to convert rows into columns for reporting.

·  Validate dataset structure using PROC CONTENTS to verify variables, types, and lengths.

·  Perform dataset management using PROC DATASETS DELETE to remove unnecessary intermediate datasets.

·  Prepare the cleaned and corrected dataset for logistics performance analysis and reporting.

·  Generate insights to help companies reduce delivery errors, detect fraud patterns, optimize fuel usage, and improve last-mile operational efficiency.
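The date handling listed above (MDY, INTCK, INTNX) can be sketched in a few lines (the variable names follow the project dataset; the date values are illustrative):

data date_demo;

Delivery_Date = mdy(1, 15, 2025); /* build a SAS date from month, day, year */

Next_Service = intnx('month', Delivery_Date, 1, 'same'); /* same day, one month later */

Month_Gap = intck('month', Delivery_Date, Next_Service); /* month boundaries crossed = 1 */

format Delivery_Date Next_Service date9.;

run;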

Conclusion

This project demonstrates how SAS programming can support logistics analytics in last-mile delivery operations. Through the creation of a simulated dataset with intentional errors, the project illustrates the importance of data validation and cleaning in operational environments.

Key SAS techniques used in this project include:

·  Data Step transformations

·  Character and numeric functions

·  Date functions such as MDY, INTCK, and INTNX

·  Dataset merging and appending

·  Macro-based fraud detection

·  Dataset management using PROC DATASETS

By applying these techniques, the raw dataset was successfully transformed into a clean and analysis-ready dataset.

The fraud detection macro further enables organizations to identify suspicious delivery patterns, improving operational transparency and preventing financial losses.

Overall, this project shows how SAS can be used not only for statistical analysis but also for real-world operational data management and fraud detection in logistics systems.

SAS INTERVIEW QUESTIONS

1. What Is The Difference Between MERGE And JOIN?

Answer:

Feature           MERGE                      SQL JOIN
----------------  -------------------------  -------------
Location          DATA Step                  PROC SQL
Sorting required  Yes                        No
Performance       Faster for large datasets  Flexible
Use case          Sequential merging         Complex joins


2. What is the difference between SET and PROC APPEND?

Answer:

Feature              SET                 PROC APPEND
-------------------  ------------------  --------------
Creates new dataset  Yes                 No
Performance          Slower              Faster
Memory usage         Higher              Lower
Use case             Combining datasets  Adding records


3. What is the POINT= option in the SET statement?

Answer:

POINT= option in a SET statement is used to read a specific observation directly from a dataset instead of reading sequentially.

It allows random access to observations using the observation number.

Example:

data example;

i = 5; /* number of the observation to read */

set sales point=i;

output;

stop; /* required: POINT= never reads end-of-file, so STOP prevents an infinite loop */

run;

Here, point=i tells SAS to read the observation whose number is stored in i. The STOP statement is required because direct access never triggers the end-of-file condition that normally ends a DATA step.

Key point:
POINT= reads observations randomly (direct access) rather than sequentially from top to bottom.
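As an extension, POINT= combined with the NOBS= option can sample a dataset directly, for example reading every fifth observation (a sketch; the dataset name sales is carried over from the answer above):

data sample;

do i = 5 to total by 5; /* visit observations 5, 10, 15, ... */

set sales point=i nobs=total; /* nobs= is filled in at compile time */

output;

end;

stop; /* as always with POINT=, STOP is required */

run;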

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent real delivery operations data from any company.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics

·  Clinical SAS programmers

·  Research data analysts

·  Regulatory data validators

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---> Follow our blog for more SAS-based analytics projects and industry data models.

---> Support us by following our blog.

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------



