421.Can We Detect Errors, Prevent Fraud, And Optimize Last-Mile Delivery Operations Using Advanced SAS Programming?
Smart SAS Intelligence For Last-Mile Delivery Error Detection And Fraud Prevention
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP | LENGTH | INPUT | DATALINES | SET | MERGE | IF-THEN | FORMAT | INFORMAT | PROC SORT | PROC PRINT | PROC CONTENTS | PROC TRANSPOSE | PROC APPEND | PROC DATASETS DELETE | MACRO / %MACRO / %MEND
Table Of Contents
- Introduction
- Business Context
- Project Objectives
- Dataset Structure
- Raw Dataset With Intentional
Errors
- Understanding The Errors In
The Dataset
- Corrected SAS Code For Clean
Dataset
- Data Preparation Using SAS
Functions
- Numeric Functions Used
- Character Functions Used
- Date Functions Using MDY,
INTCK, INTNX
- Dataset Operations Using
SET, MERGE, APPEND
- Data Transformation Using
PROC TRANSPOSE
- Fraud Detection Macro Logic
- Utilization Classification
Logic
- Data Cleaning Using PROC DATASETS
DELETE
- Analytical Insights From The
Dataset
- Business Benefits
- 20 Key Points About This Project
- Conclusion
Introduction
Last-mile
delivery is the final stage of the logistics process where goods move from a
distribution center to the customer's location. This phase is often considered
the most complex and expensive part of supply chain logistics. Companies such
as e-commerce platforms, courier services, and food delivery providers heavily
depend on efficient last-mile delivery operations to ensure customer
satisfaction.
However,
managing last-mile deliveries involves multiple operational challenges. These
include route optimization, delivery delays, failed delivery attempts, fuel
cost fluctuations, and sometimes fraudulent reporting by drivers or logistics
partners. Errors in operational data can lead to incorrect decision making and
financial losses.
In modern
analytics environments, SAS programming is widely used to manage
operational datasets, detect anomalies, and optimize logistics performance. By
combining data steps, SQL queries, macros, and advanced functions,
organizations can identify inefficiencies, detect suspicious behavior, and
improve service quality.
This
project demonstrates how SAS programming can be used to:
- Create a last-mile delivery
dataset
- Introduce intentional data
errors
- Identify and correct those
errors
- Detect potential delivery
fraud
- Calculate operational
metrics
- Perform dataset
transformations and reporting
The
project uses multiple SAS techniques including:
- Data Step Programming
- Character Functions
- Numeric Functions
- Date Functions
- PROC APPEND
- PROC TRANSPOSE
- Dataset MERGE
- Fraud Detection Macros
The final
result is a clean dataset that can be used for performance analysis and fraud
monitoring.
Business Context
Imagine a
logistics company operating across several Indian cities such as Hyderabad,
Bangalore, Chennai, Mumbai, and Delhi. The company manages thousands of daily
deliveries and records operational metrics for each delivery.
Each
delivery record contains:
- Delivery ID
- City of delivery
- Distance traveled
- Delivery time
- Number of failed attempts
- Fuel cost
- Service rating
- Operational utilization
- Percentage performance score
- Delivery service fees
- Delivery date
However,
the company notices inconsistencies in their operational data.
Examples
include:
- Incorrect city name
formatting
- Missing delivery time
- Negative fuel cost values
- Invalid ratings
- Fraudulent reporting of
failed deliveries
To solve
these issues, the company assigns a data analytics team to develop a SAS
program that:
- Creates a delivery dataset
- Identifies errors
- Corrects the data
- Detects suspicious patterns
- Generates insights for
operations managers
Project Objectives
The main
objectives of this project are:
- Create a simulated dataset
representing last-mile delivery operations
- Introduce intentional errors
in the raw dataset
- Identify and explain those
errors
- Correct the dataset using
SAS programming
- Apply character and numeric
functions for cleaning data
- Use date functions for
delivery scheduling analysis
- Implement macros for fraud
detection
- Perform dataset merging and
transformation
- Create utilization
classifications
- Generate an optimized final
dataset for analysis
Dataset Structure
The
dataset includes the following variables:
|
Variable |
Description |
|
Delivery_ID |
Unique
identifier for delivery |
|
City |
City
where delivery occurred |
|
Distance_KM |
Distance
traveled |
|
Delivery_Time |
Time
taken for delivery |
|
Failed_Attempts |
Number
of failed attempts |
|
Fuel_Cost |
Fuel
expenditure |
|
Service_Rating |
Customer
rating |
|
Percentage |
Performance
percentage |
|
Fees |
Delivery
charges |
|
Utilization |
Operational
classification |
|
Delivery_Date |
Date of
delivery |
|
Driver_Name |
Name of
delivery driver |
1. Raw Dataset With Intentional Errors
data delivery_raw;
length Delivery_ID $6 City $20 Driver_Name $25 Utilization $20;
input Delivery_ID $ City $ Distance_km Delivery_Time Failed_Attempts
Fuel_Cost Service_Rating Percentage Fees Driver_Name $ Delivery_Date :ddmmyy10.;
datalines;
D001 hyderabad 12 45 0 150 4.5 90 40 ramesh 12/01/2025
D002 BANGALORE 20 . 1 200 4.2 88 50 SURESH 15/01/2025
D003 chennai -10 60 0 180 4.8 92 45 mahesh 18/01/2025
D004 mumbai 15 55 2 -120 3.5 70 35 anil 20/01/2025
D005 DELHI 25 70 0 220 5.5 95 55 raju 25/01/2025
D006 kolkata 18 65 1 190 4.1 85 42 kumar 28/01/2025
D007 Hyderabad 22 80 3 250 3.8 60 60 venkat 02/02/2025
D008 bangalore 14 50 0 160 4.4 89 41 ravi 05/02/2025
D009 chennai 30 90 2 300 3.9 75 70 ganesh 07/02/2025
D010 mumbai 11 40 0 120 4.7 91 38 sandeep 10/02/2025
D011 delhi 17 58 1 170 4.3 87 43 karthik 12/02/2025
D012 kolkata 16 52 0 165 4.6 90 40 ajay 14/02/2025
D013 hyderabad 19 68 0 195 4.2 86 48 manoj 16/02/2025
D014 bangalore 23 75 1 210 4.1 84 52 srikanth 18/02/2025
D015 chennai 27 82 2 240 3.7 72 58 praveen 20/02/2025
D016 mumbai 13 48 0 155 4.5 93 39 rajesh 22/02/2025
D017 delhi 21 70 1 205 4.0 83 51 deepak 25/02/2025
D018 kolkata 24 78 0 215 4.6 92 54 arjun 27/02/2025
;
run;
proc print data=delivery_raw;
run;
OUTPUT:
| Obs | Delivery_ID | City | Driver_Name | Utilization | Distance_km | Delivery_Time | Failed_Attempts | Fuel_Cost | Service_Rating | Percentage | Fees | Delivery_Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | D001 | hyderabad | ramesh | 12 | 45 | 0 | 150 | 4.5 | 90 | 40 | 23753 | |
| 2 | D002 | BANGALORE | SURESH | 20 | . | 1 | 200 | 4.2 | 88 | 50 | 23756 | |
| 3 | D003 | chennai | mahesh | -10 | 60 | 0 | 180 | 4.8 | 92 | 45 | 23759 | |
| 4 | D004 | mumbai | anil | 15 | 55 | 2 | -120 | 3.5 | 70 | 35 | 23761 | |
| 5 | D005 | DELHI | raju | 25 | 70 | 0 | 220 | 5.5 | 95 | 55 | 23766 | |
| 6 | D006 | kolkata | kumar | 18 | 65 | 1 | 190 | 4.1 | 85 | 42 | 23769 | |
| 7 | D007 | Hyderabad | venkat | 22 | 80 | 3 | 250 | 3.8 | 60 | 60 | 23774 | |
| 8 | D008 | bangalore | ravi | 14 | 50 | 0 | 160 | 4.4 | 89 | 41 | 23777 | |
| 9 | D009 | chennai | ganesh | 30 | 90 | 2 | 300 | 3.9 | 75 | 70 | 23779 | |
| 10 | D010 | mumbai | sandeep | 11 | 40 | 0 | 120 | 4.7 | 91 | 38 | 23782 | |
| 11 | D011 | delhi | karthik | 17 | 58 | 1 | 170 | 4.3 | 87 | 43 | 23784 | |
| 12 | D012 | kolkata | ajay | 16 | 52 | 0 | 165 | 4.6 | 90 | 40 | 23786 | |
| 13 | D013 | hyderabad | manoj | 19 | 68 | 0 | 195 | 4.2 | 86 | 48 | 23788 | |
| 14 | D014 | bangalore | srikanth | 23 | 75 | 1 | 210 | 4.1 | 84 | 52 | 23790 | |
| 15 | D015 | chennai | praveen | 27 | 82 | 2 | 240 | 3.7 | 72 | 58 | 23792 | |
| 16 | D016 | mumbai | rajesh | 13 | 48 | 0 | 155 | 4.5 | 93 | 39 | 23794 | |
| 17 | D017 | delhi | deepak | 21 | 70 | 1 | 205 | 4.0 | 83 | 51 | 23797 | |
| 18 | D018 | kolkata | arjun | 24 | 78 | 0 | 215 | 4.6 | 92 | 54 | 23799 |
Understanding The Errors In The Dataset
Several intentional errors exist in the
dataset:
1.Negative Distance
D003 chennai -10
Distance cannot be negative.
2.Missing Delivery Time
D002 BANGALORE 20 .
Delivery time is missing.
3.Negative Fuel Cost
D004 mumbai fuel_cost = -120
Fuel cost cannot be negative.
4.Invalid Service Rating
D005 rating = 5.5
Rating must be between 1 and 5.
5.Inconsistent City Names
Example:
hyderabadHYDERABADHyderabad
Needs standard formatting.
2. Corrected SAS Code
data delivery_clean;
set delivery_raw;
City = propcase(strip(City));
Driver_Name = propcase(strip(Driver_Name));
Distance_km = abs(Distance_km);
Fuel_Cost = abs(Fuel_Cost);
if Service_Rating > 5 then Service_Rating = 5;
if Delivery_Time = . then Delivery_Time = round(Distance_km*3);
Delivery_Date2 = mdy(month(Delivery_Date),day(Delivery_Date),year(Delivery_Date));
Month_Gap = intck('month','01JAN2025'd,Delivery_Date);
Next_Service = intnx('month',Delivery_Date,1);
if Distance_km <=15 then Utilization="Low";
else if Distance_km <=25 then Utilization="Medium";
else Utilization="High";
run;
proc print data=delivery_clean;
run;
OUTPUT:
| Obs | Delivery_ID | City | Driver_Name | Utilization | Distance_km | Delivery_Time | Failed_Attempts | Fuel_Cost | Service_Rating | Percentage | Fees | Delivery_Date | Delivery_Date2 | Month_Gap | Next_Service |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | D001 | Hyderabad | Ramesh | Low | 12 | 45 | 0 | 150 | 4.5 | 90 | 40 | 23753 | 23753 | 0 | 23773 |
| 2 | D002 | Bangalore | Suresh | Medium | 20 | 60 | 1 | 200 | 4.2 | 88 | 50 | 23756 | 23756 | 0 | 23773 |
| 3 | D003 | Chennai | Mahesh | Low | 10 | 60 | 0 | 180 | 4.8 | 92 | 45 | 23759 | 23759 | 0 | 23773 |
| 4 | D004 | Mumbai | Anil | Low | 15 | 55 | 2 | 120 | 3.5 | 70 | 35 | 23761 | 23761 | 0 | 23773 |
| 5 | D005 | Delhi | Raju | Medium | 25 | 70 | 0 | 220 | 5.0 | 95 | 55 | 23766 | 23766 | 0 | 23773 |
| 6 | D006 | Kolkata | Kumar | Medium | 18 | 65 | 1 | 190 | 4.1 | 85 | 42 | 23769 | 23769 | 0 | 23773 |
| 7 | D007 | Hyderabad | Venkat | Medium | 22 | 80 | 3 | 250 | 3.8 | 60 | 60 | 23774 | 23774 | 1 | 23801 |
| 8 | D008 | Bangalore | Ravi | Low | 14 | 50 | 0 | 160 | 4.4 | 89 | 41 | 23777 | 23777 | 1 | 23801 |
| 9 | D009 | Chennai | Ganesh | High | 30 | 90 | 2 | 300 | 3.9 | 75 | 70 | 23779 | 23779 | 1 | 23801 |
| 10 | D010 | Mumbai | Sandeep | Low | 11 | 40 | 0 | 120 | 4.7 | 91 | 38 | 23782 | 23782 | 1 | 23801 |
| 11 | D011 | Delhi | Karthik | Medium | 17 | 58 | 1 | 170 | 4.3 | 87 | 43 | 23784 | 23784 | 1 | 23801 |
| 12 | D012 | Kolkata | Ajay | Medium | 16 | 52 | 0 | 165 | 4.6 | 90 | 40 | 23786 | 23786 | 1 | 23801 |
| 13 | D013 | Hyderabad | Manoj | Medium | 19 | 68 | 0 | 195 | 4.2 | 86 | 48 | 23788 | 23788 | 1 | 23801 |
| 14 | D014 | Bangalore | Srikanth | Medium | 23 | 75 | 1 | 210 | 4.1 | 84 | 52 | 23790 | 23790 | 1 | 23801 |
| 15 | D015 | Chennai | Praveen | High | 27 | 82 | 2 | 240 | 3.7 | 72 | 58 | 23792 | 23792 | 1 | 23801 |
| 16 | D016 | Mumbai | Rajesh | Low | 13 | 48 | 0 | 155 | 4.5 | 93 | 39 | 23794 | 23794 | 1 | 23801 |
| 17 | D017 | Delhi | Deepak | Medium | 21 | 70 | 1 | 205 | 4.0 | 83 | 51 | 23797 | 23797 | 1 | 23801 |
| 18 | D018 | Kolkata | Arjun | Medium | 24 | 78 | 0 | 215 | 4.6 | 92 | 54 | 23799 | 23799 | 1 | 23801 |
Character Functions
Functions used include:
STRIP
Removes leading and trailing spaces.
TRIM
Removes trailing spaces.
CAT
Concatenates values.
CATX
Concatenates values with delimiter.
PROPCASE
Converts text to proper case.
UPCASE
Converts text to uppercase.
LOWCASE
Converts text to lowercase.
COALESCE
Handles missing values.
Numeric Functions
Examples used:
ABS( )ROUND( )SUM( )MEAN( )
These help ensure numerical accuracy and
standardization.
Date Functions
MDY
Creates SAS date from month, day, year.
INTCK
Counts intervals between dates.
INTNX
Moves date forward or backward.
3. Best Practice In Real Projects
Before using PROC APPEND, analysts usually check dataset structures using:
proc contents data=delivery_clean;
run;
proc print data=delivery_clean;
run;
OUTPUT:
The CONTENTS Procedure
| Data Set Name | WORK.DELIVERY_CLEAN | Observations | 18 |
|---|---|---|---|
| Member Type | DATA | Variables | 15 |
| Engine | V9 | Indexes | 0 |
| Created | 03/06/2026 16:39:51 | Observation Length | 160 |
| Last Modified | 03/06/2026 16:39:51 | Deleted Observations | 0 |
| Protection | Compressed | NO | |
| Data Set Type | Sorted | NO | |
| Label | |||
| Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 | ||
| Encoding | utf-8 Unicode (UTF-8) |
| Engine/Host Dependent Information | |
|---|---|
| Data Set Page Size | 131072 |
| Number of Data Set Pages | 1 |
| First Data Page | 1 |
| Max Obs per Page | 818 |
| Obs in First Data Page | 18 |
| Number of Data Set Repairs | 0 |
| Filename | /saswork/SAS_workD19200007C04_odaws01-apse1-2.oda.sas.com/SAS_workF85F00007C04_odaws01-apse1-2.oda.sas.com/delivery_clean.sas7bdat |
| Release Created | 9.0401M8 |
| Host Created | Linux |
| Inode Number | 201344327 |
| Access Permission | rw-r--r-- |
| Owner Name | u63247146 |
| File Size | 256KB |
| File Size (bytes) | 262144 |
| Alphabetic List of Variables and Attributes | |||
|---|---|---|---|
| # | Variable | Type | Len |
| 2 | City | Char | 20 |
| 12 | Delivery_Date | Num | 8 |
| 13 | Delivery_Date2 | Num | 8 |
| 1 | Delivery_ID | Char | 6 |
| 6 | Delivery_Time | Num | 8 |
| 5 | Distance_km | Num | 8 |
| 3 | Driver_Name | Char | 25 |
| 7 | Failed_Attempts | Num | 8 |
| 11 | Fees | Num | 8 |
| 8 | Fuel_Cost | Num | 8 |
| 14 | Month_Gap | Num | 8 |
| 15 | Next_Service | Num | 8 |
| 10 | Percentage | Num | 8 |
| 9 | Service_Rating | Num | 8 |
| 4 | Utilization | Char | 20 |
| Obs | Delivery_ID | City | Driver_Name | Utilization | Distance_km | Delivery_Time | Failed_Attempts | Fuel_Cost | Service_Rating | Percentage | Fees | Delivery_Date | Delivery_Date2 | Month_Gap | Next_Service |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | D001 | Hyderabad | Ramesh | Low | 12 | 45 | 0 | 150 | 4.5 | 90 | 40 | 23753 | 23753 | 0 | 23773 |
| 2 | D002 | Bangalore | Suresh | Medium | 20 | 60 | 1 | 200 | 4.2 | 88 | 50 | 23756 | 23756 | 0 | 23773 |
| 3 | D003 | Chennai | Mahesh | Low | 10 | 60 | 0 | 180 | 4.8 | 92 | 45 | 23759 | 23759 | 0 | 23773 |
| 4 | D004 | Mumbai | Anil | Low | 15 | 55 | 2 | 120 | 3.5 | 70 | 35 | 23761 | 23761 | 0 | 23773 |
| 5 | D005 | Delhi | Raju | Medium | 25 | 70 | 0 | 220 | 5.0 | 95 | 55 | 23766 | 23766 | 0 | 23773 |
| 6 | D006 | Kolkata | Kumar | Medium | 18 | 65 | 1 | 190 | 4.1 | 85 | 42 | 23769 | 23769 | 0 | 23773 |
| 7 | D007 | Hyderabad | Venkat | Medium | 22 | 80 | 3 | 250 | 3.8 | 60 | 60 | 23774 | 23774 | 1 | 23801 |
| 8 | D008 | Bangalore | Ravi | Low | 14 | 50 | 0 | 160 | 4.4 | 89 | 41 | 23777 | 23777 | 1 | 23801 |
| 9 | D009 | Chennai | Ganesh | High | 30 | 90 | 2 | 300 | 3.9 | 75 | 70 | 23779 | 23779 | 1 | 23801 |
| 10 | D010 | Mumbai | Sandeep | Low | 11 | 40 | 0 | 120 | 4.7 | 91 | 38 | 23782 | 23782 | 1 | 23801 |
| 11 | D011 | Delhi | Karthik | Medium | 17 | 58 | 1 | 170 | 4.3 | 87 | 43 | 23784 | 23784 | 1 | 23801 |
| 12 | D012 | Kolkata | Ajay | Medium | 16 | 52 | 0 | 165 | 4.6 | 90 | 40 | 23786 | 23786 | 1 | 23801 |
| 13 | D013 | Hyderabad | Manoj | Medium | 19 | 68 | 0 | 195 | 4.2 | 86 | 48 | 23788 | 23788 | 1 | 23801 |
| 14 | D014 | Bangalore | Srikanth | Medium | 23 | 75 | 1 | 210 | 4.1 | 84 | 52 | 23790 | 23790 | 1 | 23801 |
| 15 | D015 | Chennai | Praveen | High | 27 | 82 | 2 | 240 | 3.7 | 72 | 58 | 23792 | 23792 | 1 | 23801 |
| 16 | D016 | Mumbai | Rajesh | Low | 13 | 48 | 0 | 155 | 4.5 | 93 | 39 | 23794 | 23794 | 1 | 23801 |
| 17 | D017 | Delhi | Deepak | Medium | 21 | 70 | 1 | 205 | 4.0 | 83 | 51 | 23797 | 23797 | 1 | 23801 |
| 18 | D018 | Kolkata | Arjun | Medium | 24 | 78 | 0 | 215 | 4.6 | 92 | 54 | 23799 | 23799 | 1 | 23801 |
proc contents data=delivery_new;
run;
proc print data=delivery_new;
run;
OUTPUT:
The CONTENTS Procedure
| Data Set Name | WORK.DELIVERY_NEW | Observations | 3 |
|---|---|---|---|
| Member Type | DATA | Variables | 11 |
| Engine | V9 | Indexes | 0 |
| Created | 03/06/2026 16:40:50 | Observation Length | 120 |
| Last Modified | 03/06/2026 16:40:50 | Deleted Observations | 0 |
| Protection | Compressed | NO | |
| Data Set Type | Sorted | NO | |
| Label | |||
| Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 | ||
| Encoding | utf-8 Unicode (UTF-8) |
| Engine/Host Dependent Information | |
|---|---|
| Data Set Page Size | 131072 |
| Number of Data Set Pages | 1 |
| First Data Page | 1 |
| Max Obs per Page | 1090 |
| Obs in First Data Page | 3 |
| Number of Data Set Repairs | 0 |
| Filename | /saswork/SAS_workD19200007C04_odaws01-apse1-2.oda.sas.com/SAS_workF85F00007C04_odaws01-apse1-2.oda.sas.com/delivery_new.sas7bdat |
| Release Created | 9.0401M8 |
| Host Created | Linux |
| Inode Number | 201344326 |
| Access Permission | rw-r--r-- |
| Owner Name | u63247146 |
| File Size | 256KB |
| File Size (bytes) | 262144 |
| Alphabetic List of Variables and Attributes | |||
|---|---|---|---|
| # | Variable | Type | Len |
| 2 | City | Char | 20 |
| 11 | Delivery_Date | Num | 8 |
| 1 | Delivery_ID | Char | 6 |
| 5 | Delivery_Time | Num | 8 |
| 4 | Distance_km | Num | 8 |
| 3 | Driver_Name | Char | 25 |
| 6 | Failed_Attempts | Num | 8 |
| 10 | Fees | Num | 8 |
| 7 | Fuel_Cost | Num | 8 |
| 9 | Percentage | Num | 8 |
| 8 | Service_Rating | Num | 8 |
| Obs | Delivery_ID | City | Driver_Name | Distance_km | Delivery_Time | Failed_Attempts | Fuel_Cost | Service_Rating | Percentage | Fees | Delivery_Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | D019 | Hyderabad | Raghav | 26 | 85 | 1 | 230 | 4.4 | 90 | 56 | 23801 |
| 2 | D020 | Bangalore | Mohan | 18 | 62 | 0 | 175 | 4.6 | 92 | 44 | 23803 |
| 3 | D021 | Chennai | Suresh | 22 | 72 | 1 | 205 | 4.3 | 87 | 49 | 23805 |
·
Variables match
·
Data types match
·
Lengths match
4. PROC APPEND
Create New Dataset
data delivery_new;
length Delivery_ID $6 City $20 Driver_Name $25;
input Delivery_ID $ City $ Distance_km Delivery_Time Failed_Attempts Fuel_Cost
Service_Rating Percentage Fees Driver_Name $ Delivery_Date :ddmmyy10.;
datalines;
D019 Hyderabad 26 85 1 230 4.4 90 56 Raghav 01/03/2025
D020 Bangalore 18 62 0 175 4.6 92 44 Mohan 03/03/2025
D021 Chennai 22 72 1 205 4.3 87 49 Suresh 05/03/2025
;
run;
proc print data=delivery_new;
run;
OUTPUT:
| Obs | Delivery_ID | City | Driver_Name | Distance_km | Delivery_Time | Failed_Attempts | Fuel_Cost | Service_Rating | Percentage | Fees | Delivery_Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | D019 | Hyderabad | Raghav | 26 | 85 | 1 | 230 | 4.4 | 90 | 56 | 23801 |
| 2 | D020 | Bangalore | Mohan | 18 | 62 | 0 | 175 | 4.6 | 92 | 44 | 23803 |
| 3 | D021 | Chennai | Suresh | 22 | 72 | 1 | 205 | 4.3 | 87 | 49 | 23805 |
Explanation Of The
Code
DATA
delivery_new
Creates a new SAS dataset that contains new delivery records.
length
Defines the character variable lengths.
Example:
length Delivery_ID $6
This ensures SAS stores the ID as a character
value with a length of 6.
Why it is important:
·
Prevents truncation
·
Ensures consistent structure with the base
dataset.
INPUT
Statement
Reads variables in the order specified.
:ddmmyy10.
This informat
reads dates written as:
01/03/2025
Without this informat, SAS would not correctly
interpret the date.
DATALINES
Allows manual entry of data.
Each row represents a new delivery record.
PROC APPEND
proc append base=delivery_clean
data=delivery_new;
run;
proc print data=delivery_clean;
run;
OUTPUT:
| Obs | Delivery_ID | City | Driver_Name | Utilization | Distance_km | Delivery_Time | Failed_Attempts | Fuel_Cost | Service_Rating | Percentage | Fees | Delivery_Date | Delivery_Date2 | Month_Gap | Next_Service |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | D001 | Hyderabad | Ramesh | Low | 12 | 45 | 0 | 150 | 4.5 | 90 | 40 | 23753 | 23753 | 0 | 23773 |
| 2 | D002 | Bangalore | Suresh | Medium | 20 | 60 | 1 | 200 | 4.2 | 88 | 50 | 23756 | 23756 | 0 | 23773 |
| 3 | D003 | Chennai | Mahesh | Low | 10 | 60 | 0 | 180 | 4.8 | 92 | 45 | 23759 | 23759 | 0 | 23773 |
| 4 | D004 | Mumbai | Anil | Low | 15 | 55 | 2 | 120 | 3.5 | 70 | 35 | 23761 | 23761 | 0 | 23773 |
| 5 | D005 | Delhi | Raju | Medium | 25 | 70 | 0 | 220 | 5.0 | 95 | 55 | 23766 | 23766 | 0 | 23773 |
| 6 | D006 | Kolkata | Kumar | Medium | 18 | 65 | 1 | 190 | 4.1 | 85 | 42 | 23769 | 23769 | 0 | 23773 |
| 7 | D007 | Hyderabad | Venkat | Medium | 22 | 80 | 3 | 250 | 3.8 | 60 | 60 | 23774 | 23774 | 1 | 23801 |
| 8 | D008 | Bangalore | Ravi | Low | 14 | 50 | 0 | 160 | 4.4 | 89 | 41 | 23777 | 23777 | 1 | 23801 |
| 9 | D009 | Chennai | Ganesh | High | 30 | 90 | 2 | 300 | 3.9 | 75 | 70 | 23779 | 23779 | 1 | 23801 |
| 10 | D010 | Mumbai | Sandeep | Low | 11 | 40 | 0 | 120 | 4.7 | 91 | 38 | 23782 | 23782 | 1 | 23801 |
| 11 | D011 | Delhi | Karthik | Medium | 17 | 58 | 1 | 170 | 4.3 | 87 | 43 | 23784 | 23784 | 1 | 23801 |
| 12 | D012 | Kolkata | Ajay | Medium | 16 | 52 | 0 | 165 | 4.6 | 90 | 40 | 23786 | 23786 | 1 | 23801 |
| 13 | D013 | Hyderabad | Manoj | Medium | 19 | 68 | 0 | 195 | 4.2 | 86 | 48 | 23788 | 23788 | 1 | 23801 |
| 14 | D014 | Bangalore | Srikanth | Medium | 23 | 75 | 1 | 210 | 4.1 | 84 | 52 | 23790 | 23790 | 1 | 23801 |
| 15 | D015 | Chennai | Praveen | High | 27 | 82 | 2 | 240 | 3.7 | 72 | 58 | 23792 | 23792 | 1 | 23801 |
| 16 | D016 | Mumbai | Rajesh | Low | 13 | 48 | 0 | 155 | 4.5 | 93 | 39 | 23794 | 23794 | 1 | 23801 |
| 17 | D017 | Delhi | Deepak | Medium | 21 | 70 | 1 | 205 | 4.0 | 83 | 51 | 23797 | 23797 | 1 | 23801 |
| 18 | D018 | Kolkata | Arjun | Medium | 24 | 78 | 0 | 215 | 4.6 | 92 | 54 | 23799 | 23799 | 1 | 23801 |
| 19 | D019 | Hyderabad | Raghav | 26 | 85 | 1 | 230 | 4.4 | 90 | 56 | 23801 | . | . | . | |
| 20 | D020 | Bangalore | Mohan | 18 | 62 | 0 | 175 | 4.6 | 92 | 44 | 23803 | . | . | . | |
| 21 | D021 | Chennai | Suresh | 22 | 72 | 1 | 205 | 4.3 | 87 | 49 | 23805 | . | . | . |
Explanation Of PROC APPEND
PROC APPEND
Used to add observations from one dataset to another without recreating the entire dataset.
BASE=delivery_clean
This is the target dataset.
New records will be added here.
DATA=delivery_new
This dataset contains new observations to be appended.
What Happens After Execution
Before append:
delivery_clean → 18 observationsdelivery_new → 3 observationsAfter append:
delivery_clean → 21 observationsAdvantages:
· Faster
· Does not recreate dataset
· Efficient for large datasets
Why PROC APPEND Is Used Instead Of SET
DATA STEP SET
data combined;
set delivery_clean
delivery_new;
run;
proc print data=combined;
run;
OUTPUT:
| Obs | Delivery_ID | City | Driver_Name | Utilization | Distance_km | Delivery_Time | Failed_Attempts | Fuel_Cost | Service_Rating | Percentage | Fees | Delivery_Date | Delivery_Date2 | Month_Gap | Next_Service |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | D001 | Hyderabad | Ramesh | Low | 12 | 45 | 0 | 150 | 4.5 | 90 | 40 | 23753 | 23753 | 0 | 23773 |
| 2 | D002 | Bangalore | Suresh | Medium | 20 | 60 | 1 | 200 | 4.2 | 88 | 50 | 23756 | 23756 | 0 | 23773 |
| 3 | D003 | Chennai | Mahesh | Low | 10 | 60 | 0 | 180 | 4.8 | 92 | 45 | 23759 | 23759 | 0 | 23773 |
| 4 | D004 | Mumbai | Anil | Low | 15 | 55 | 2 | 120 | 3.5 | 70 | 35 | 23761 | 23761 | 0 | 23773 |
| 5 | D005 | Delhi | Raju | Medium | 25 | 70 | 0 | 220 | 5.0 | 95 | 55 | 23766 | 23766 | 0 | 23773 |
| 6 | D006 | Kolkata | Kumar | Medium | 18 | 65 | 1 | 190 | 4.1 | 85 | 42 | 23769 | 23769 | 0 | 23773 |
| 7 | D007 | Hyderabad | Venkat | Medium | 22 | 80 | 3 | 250 | 3.8 | 60 | 60 | 23774 | 23774 | 1 | 23801 |
| 8 | D008 | Bangalore | Ravi | Low | 14 | 50 | 0 | 160 | 4.4 | 89 | 41 | 23777 | 23777 | 1 | 23801 |
| 9 | D009 | Chennai | Ganesh | High | 30 | 90 | 2 | 300 | 3.9 | 75 | 70 | 23779 | 23779 | 1 | 23801 |
| 10 | D010 | Mumbai | Sandeep | Low | 11 | 40 | 0 | 120 | 4.7 | 91 | 38 | 23782 | 23782 | 1 | 23801 |
| 11 | D011 | Delhi | Karthik | Medium | 17 | 58 | 1 | 170 | 4.3 | 87 | 43 | 23784 | 23784 | 1 | 23801 |
| 12 | D012 | Kolkata | Ajay | Medium | 16 | 52 | 0 | 165 | 4.6 | 90 | 40 | 23786 | 23786 | 1 | 23801 |
| 13 | D013 | Hyderabad | Manoj | Medium | 19 | 68 | 0 | 195 | 4.2 | 86 | 48 | 23788 | 23788 | 1 | 23801 |
| 14 | D014 | Bangalore | Srikanth | Medium | 23 | 75 | 1 | 210 | 4.1 | 84 | 52 | 23790 | 23790 | 1 | 23801 |
| 15 | D015 | Chennai | Praveen | High | 27 | 82 | 2 | 240 | 3.7 | 72 | 58 | 23792 | 23792 | 1 | 23801 |
| 16 | D016 | Mumbai | Rajesh | Low | 13 | 48 | 0 | 155 | 4.5 | 93 | 39 | 23794 | 23794 | 1 | 23801 |
| 17 | D017 | Delhi | Deepak | Medium | 21 | 70 | 1 | 205 | 4.0 | 83 | 51 | 23797 | 23797 | 1 | 23801 |
| 18 | D018 | Kolkata | Arjun | Medium | 24 | 78 | 0 | 215 | 4.6 | 92 | 54 | 23799 | 23799 | 1 | 23801 |
| 19 | D019 | Hyderabad | Raghav | 26 | 85 | 1 | 230 | 4.4 | 90 | 56 | 23801 | . | . | . | |
| 20 | D020 | Bangalore | Mohan | 18 | 62 | 0 | 175 | 4.6 | 92 | 44 | 23803 | . | . | . | |
| 21 | D021 | Chennai | Suresh | 22 | 72 | 1 | 205 | 4.3 | 87 | 49 | 23805 | . | . | . | |
| 22 | D019 | Hyderabad | Raghav | 26 | 85 | 1 | 230 | 4.4 | 90 | 56 | 23801 | . | . | . | |
| 23 | D020 | Bangalore | Mohan | 18 | 62 | 0 | 175 | 4.6 | 92 | 44 | 23803 | . | . | . | |
| 24 | D021 | Chennai | Suresh | 22 | 72 | 1 | 205 | 4.3 | 87 | 49 | 23805 | . | . | . |
Why PROC APPEND Is Used
Instead Of SET
Problem:
·
Creates a new dataset
·
Requires extra storage
5. Best Practice In SAS Projects
Before merging, always check dataset structure:
proc contents data=delivery_clean;
run;
proc print data=delivery_clean;
run;
OUTPUT:
The CONTENTS Procedure
| Data Set Name | WORK.DELIVERY_CLEAN | Observations | 21 |
|---|---|---|---|
| Member Type | DATA | Variables | 15 |
| Engine | V9 | Indexes | 0 |
| Created | 03/06/2026 16:39:51 | Observation Length | 160 |
| Last Modified | 03/06/2026 16:41:30 | Deleted Observations | 0 |
| Protection | Compressed | NO | |
| Data Set Type | Sorted | NO | |
| Label | |||
| Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 | ||
| Encoding | utf-8 Unicode (UTF-8) |
| Engine/Host Dependent Information | |
|---|---|
| Data Set Page Size | 131072 |
| Number of Data Set Pages | 1 |
| First Data Page | 1 |
| Max Obs per Page | 818 |
| Obs in First Data Page | 21 |
| Number of Data Set Repairs | 0 |
| Filename | /saswork/SAS_workD19200007C04_odaws01-apse1-2.oda.sas.com/SAS_workF85F00007C04_odaws01-apse1-2.oda.sas.com/delivery_clean.sas7bdat |
| Release Created | 9.0401M8 |
| Host Created | Linux |
| Inode Number | 201344327 |
| Access Permission | rw-r--r-- |
| Owner Name | u63247146 |
| File Size | 256KB |
| File Size (bytes) | 262144 |
| Alphabetic List of Variables and Attributes | |||
|---|---|---|---|
| # | Variable | Type | Len |
| 2 | City | Char | 20 |
| 12 | Delivery_Date | Num | 8 |
| 13 | Delivery_Date2 | Num | 8 |
| 1 | Delivery_ID | Char | 6 |
| 6 | Delivery_Time | Num | 8 |
| 5 | Distance_km | Num | 8 |
| 3 | Driver_Name | Char | 25 |
| 7 | Failed_Attempts | Num | 8 |
| 11 | Fees | Num | 8 |
| 8 | Fuel_Cost | Num | 8 |
| 14 | Month_Gap | Num | 8 |
| 15 | Next_Service | Num | 8 |
| 10 | Percentage | Num | 8 |
| 9 | Service_Rating | Num | 8 |
| 4 | Utilization | Char | 20 |
| Obs | Delivery_ID | City | Driver_Name | Utilization | Distance_km | Delivery_Time | Failed_Attempts | Fuel_Cost | Service_Rating | Percentage | Fees | Delivery_Date | Delivery_Date2 | Month_Gap | Next_Service |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | D001 | Hyderabad | Ramesh | Low | 12 | 45 | 0 | 150 | 4.5 | 90 | 40 | 23753 | 23753 | 0 | 23773 |
| 2 | D002 | Bangalore | Suresh | Medium | 20 | 60 | 1 | 200 | 4.2 | 88 | 50 | 23756 | 23756 | 0 | 23773 |
| 3 | D003 | Chennai | Mahesh | Low | 10 | 60 | 0 | 180 | 4.8 | 92 | 45 | 23759 | 23759 | 0 | 23773 |
| 4 | D004 | Mumbai | Anil | Low | 15 | 55 | 2 | 120 | 3.5 | 70 | 35 | 23761 | 23761 | 0 | 23773 |
| 5 | D005 | Delhi | Raju | Medium | 25 | 70 | 0 | 220 | 5.0 | 95 | 55 | 23766 | 23766 | 0 | 23773 |
| 6 | D006 | Kolkata | Kumar | Medium | 18 | 65 | 1 | 190 | 4.1 | 85 | 42 | 23769 | 23769 | 0 | 23773 |
| 7 | D007 | Hyderabad | Venkat | Medium | 22 | 80 | 3 | 250 | 3.8 | 60 | 60 | 23774 | 23774 | 1 | 23801 |
| 8 | D008 | Bangalore | Ravi | Low | 14 | 50 | 0 | 160 | 4.4 | 89 | 41 | 23777 | 23777 | 1 | 23801 |
| 9 | D009 | Chennai | Ganesh | High | 30 | 90 | 2 | 300 | 3.9 | 75 | 70 | 23779 | 23779 | 1 | 23801 |
| 10 | D010 | Mumbai | Sandeep | Low | 11 | 40 | 0 | 120 | 4.7 | 91 | 38 | 23782 | 23782 | 1 | 23801 |
| 11 | D011 | Delhi | Karthik | Medium | 17 | 58 | 1 | 170 | 4.3 | 87 | 43 | 23784 | 23784 | 1 | 23801 |
| 12 | D012 | Kolkata | Ajay | Medium | 16 | 52 | 0 | 165 | 4.6 | 90 | 40 | 23786 | 23786 | 1 | 23801 |
| 13 | D013 | Hyderabad | Manoj | Medium | 19 | 68 | 0 | 195 | 4.2 | 86 | 48 | 23788 | 23788 | 1 | 23801 |
| 14 | D014 | Bangalore | Srikanth | Medium | 23 | 75 | 1 | 210 | 4.1 | 84 | 52 | 23790 | 23790 | 1 | 23801 |
| 15 | D015 | Chennai | Praveen | High | 27 | 82 | 2 | 240 | 3.7 | 72 | 58 | 23792 | 23792 | 1 | 23801 |
| 16 | D016 | Mumbai | Rajesh | Low | 13 | 48 | 0 | 155 | 4.5 | 93 | 39 | 23794 | 23794 | 1 | 23801 |
| 17 | D017 | Delhi | Deepak | Medium | 21 | 70 | 1 | 205 | 4.0 | 83 | 51 | 23797 | 23797 | 1 | 23801 |
| 18 | D018 | Kolkata | Arjun | Medium | 24 | 78 | 0 | 215 | 4.6 | 92 | 54 | 23799 | 23799 | 1 | 23801 |
| 19 | D019 | Hyderabad | Raghav | 26 | 85 | 1 | 230 | 4.4 | 90 | 56 | 23801 | . | . | . | |
| 20 | D020 | Bangalore | Mohan | 18 | 62 | 0 | 175 | 4.6 | 92 | 44 | 23803 | . | . | . | |
| 21 | D021 | Chennai | Suresh | 22 | 72 | 1 | 205 | 4.3 | 87 | 49 | 23805 | . | . | . |
proc contents data=driver_data;
run;
proc print data=driver_data;
run;
OUTPUT:
The CONTENTS Procedure
| Data Set Name | WORK.DRIVER_DATA | Observations | 18 |
|---|---|---|---|
| Member Type | DATA | Variables | 4 |
| Engine | V9 | Indexes | 0 |
| Created | 03/06/2026 16:42:31 | Observation Length | 56 |
| Last Modified | 03/06/2026 16:42:31 | Deleted Observations | 0 |
| Protection | Compressed | NO | |
| Data Set Type | Sorted | NO | |
| Label | |||
| Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 | ||
| Encoding | utf-8 Unicode (UTF-8) |
| Engine/Host Dependent Information | |
|---|---|
| Data Set Page Size | 131072 |
| Number of Data Set Pages | 1 |
| First Data Page | 1 |
| Max Obs per Page | 2334 |
| Obs in First Data Page | 18 |
| Number of Data Set Repairs | 0 |
| Filename | /saswork/SAS_workD19200007C04_odaws01-apse1-2.oda.sas.com/SAS_workF85F00007C04_odaws01-apse1-2.oda.sas.com/driver_data.sas7bdat |
| Release Created | 9.0401M8 |
| Host Created | Linux |
| Inode Number | 201344328 |
| Access Permission | rw-r--r-- |
| Owner Name | u63247146 |
| File Size | 256KB |
| File Size (bytes) | 262144 |
| Alphabetic List of Variables and Attributes | |||
|---|---|---|---|
| # | Variable | Type | Len |
| 1 | Delivery_ID | Char | 6 |
| 2 | Driver_Name | Char | 25 |
| 3 | Driver_Rating | Num | 8 |
| 4 | Vehicle_Type | Char | 15 |
| Obs | Delivery_ID | Driver_Name | Driver_Rating | Vehicle_Type |
|---|---|---|---|---|
| 1 | D001 | Ramesh | 4.5 | Bike |
| 2 | D002 | Suresh | 4.2 | Van |
| 3 | D003 | Mahesh | 4.6 | Bike |
| 4 | D004 | Anil | 3.8 | Truck |
| 5 | D005 | Raju | 4.7 | Bike |
| 6 | D006 | Kumar | 4.3 | Van |
| 7 | D007 | Venkat | 3.9 | Bike |
| 8 | D008 | Ravi | 4.4 | Bike |
| 9 | D009 | Ganesh | 4.0 | Van |
| 10 | D010 | Sandeep | 4.6 | Bike |
| 11 | D011 | Karthik | 4.3 | Van |
| 12 | D012 | Ajay | 4.5 | Bike |
| 13 | D013 | Manoj | 4.2 | Bike |
| 14 | D014 | Srikanth | 4.1 | Truck |
| 15 | D015 | Praveen | 3.7 | Van |
| 16 | D016 | Rajesh | 4.6 | Bike |
| 17 | D017 | Deepak | 4.0 | Bike |
| 18 | D018 | Arjun | 4.5 | Van |
·
Variable types match
·
Variable lengths match
·
Key variables exist
6. MERGE
Step 1 — Create Driver Dataset
data driver_data;
length Delivery_ID $6 Driver_Name $25 Driver_Rating 8 Vehicle_Type $15;
input Delivery_ID $ Driver_Name $ Driver_Rating Vehicle_Type $;
datalines;
D001 Ramesh 4.5 Bike
D002 Suresh 4.2 Van
D003 Mahesh 4.6 Bike
D004 Anil 3.8 Truck
D005 Raju 4.7 Bike
D006 Kumar 4.3 Van
D007 Venkat 3.9 Bike
D008 Ravi 4.4 Bike
D009 Ganesh 4.0 Van
D010 Sandeep 4.6 Bike
D011 Karthik 4.3 Van
D012 Ajay 4.5 Bike
D013 Manoj 4.2 Bike
D014 Srikanth 4.1 Truck
D015 Praveen 3.7 Van
D016 Rajesh 4.6 Bike
D017 Deepak 4.0 Bike
D018 Arjun 4.5 Van
;
run;
proc print data=driver_data;
run;
OUTPUT:
| Obs | Delivery_ID | Driver_Name | Driver_Rating | Vehicle_Type |
|---|---|---|---|---|
| 1 | D001 | Ramesh | 4.5 | Bike |
| 2 | D002 | Suresh | 4.2 | Van |
| 3 | D003 | Mahesh | 4.6 | Bike |
| 4 | D004 | Anil | 3.8 | Truck |
| 5 | D005 | Raju | 4.7 | Bike |
| 6 | D006 | Kumar | 4.3 | Van |
| 7 | D007 | Venkat | 3.9 | Bike |
| 8 | D008 | Ravi | 4.4 | Bike |
| 9 | D009 | Ganesh | 4.0 | Van |
| 10 | D010 | Sandeep | 4.6 | Bike |
| 11 | D011 | Karthik | 4.3 | Van |
| 12 | D012 | Ajay | 4.5 | Bike |
| 13 | D013 | Manoj | 4.2 | Bike |
| 14 | D014 | Srikanth | 4.1 | Truck |
| 15 | D015 | Praveen | 3.7 | Van |
| 16 | D016 | Rajesh | 4.6 | Bike |
| 17 | D017 | Deepak | 4.0 | Bike |
| 18 | D018 | Arjun | 4.5 | Van |
Explanation Of The Code
DATA
driver_data
Creates a new dataset that stores driver-level information.
This dataset will later be merged with delivery data.
LENGTH
Statement
length Delivery_ID $6 Driver_Name $25 Vehicle_Type $15;
Driver_Name $25
means SAS can store up to 25 characters.
·
Defines character variable sizes
·
Prevents variable truncation
·
Ensures consistent structure during
merge
INPUT
Statement
Defines how SAS should read the variables.
Variables read:
Delivery_ID
Driver_Name
Driver_Rating
Vehicle_Type
Each row represents driver information linked to a delivery.
Example row:
D001 Ramesh 4.5 Bike
|
Variable |
Value |
|
Delivery_ID |
D001 |
|
Driver_Name |
Ramesh |
|
Driver_Rating |
4.5 |
|
Vehicle_Type |
Bike |
Step 2 — Sort Datasets
proc sort data=delivery_clean;
by Delivery_ID;
run;
proc print data=delivery_clean;
run;
OUTPUT:
| Obs | Delivery_ID | City | Driver_Name | Utilization | Distance_km | Delivery_Time | Failed_Attempts | Fuel_Cost | Service_Rating | Percentage | Fees | Delivery_Date | Delivery_Date2 | Month_Gap | Next_Service |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | D001 | Hyderabad | Ramesh | Low | 12 | 45 | 0 | 150 | 4.5 | 90 | 40 | 23753 | 23753 | 0 | 23773 |
| 2 | D002 | Bangalore | Suresh | Medium | 20 | 60 | 1 | 200 | 4.2 | 88 | 50 | 23756 | 23756 | 0 | 23773 |
| 3 | D003 | Chennai | Mahesh | Low | 10 | 60 | 0 | 180 | 4.8 | 92 | 45 | 23759 | 23759 | 0 | 23773 |
| 4 | D004 | Mumbai | Anil | Low | 15 | 55 | 2 | 120 | 3.5 | 70 | 35 | 23761 | 23761 | 0 | 23773 |
| 5 | D005 | Delhi | Raju | Medium | 25 | 70 | 0 | 220 | 5.0 | 95 | 55 | 23766 | 23766 | 0 | 23773 |
| 6 | D006 | Kolkata | Kumar | Medium | 18 | 65 | 1 | 190 | 4.1 | 85 | 42 | 23769 | 23769 | 0 | 23773 |
| 7 | D007 | Hyderabad | Venkat | Medium | 22 | 80 | 3 | 250 | 3.8 | 60 | 60 | 23774 | 23774 | 1 | 23801 |
| 8 | D008 | Bangalore | Ravi | Low | 14 | 50 | 0 | 160 | 4.4 | 89 | 41 | 23777 | 23777 | 1 | 23801 |
| 9 | D009 | Chennai | Ganesh | High | 30 | 90 | 2 | 300 | 3.9 | 75 | 70 | 23779 | 23779 | 1 | 23801 |
| 10 | D010 | Mumbai | Sandeep | Low | 11 | 40 | 0 | 120 | 4.7 | 91 | 38 | 23782 | 23782 | 1 | 23801 |
| 11 | D011 | Delhi | Karthik | Medium | 17 | 58 | 1 | 170 | 4.3 | 87 | 43 | 23784 | 23784 | 1 | 23801 |
| 12 | D012 | Kolkata | Ajay | Medium | 16 | 52 | 0 | 165 | 4.6 | 90 | 40 | 23786 | 23786 | 1 | 23801 |
| 13 | D013 | Hyderabad | Manoj | Medium | 19 | 68 | 0 | 195 | 4.2 | 86 | 48 | 23788 | 23788 | 1 | 23801 |
| 14 | D014 | Bangalore | Srikanth | Medium | 23 | 75 | 1 | 210 | 4.1 | 84 | 52 | 23790 | 23790 | 1 | 23801 |
| 15 | D015 | Chennai | Praveen | High | 27 | 82 | 2 | 240 | 3.7 | 72 | 58 | 23792 | 23792 | 1 | 23801 |
| 16 | D016 | Mumbai | Rajesh | Low | 13 | 48 | 0 | 155 | 4.5 | 93 | 39 | 23794 | 23794 | 1 | 23801 |
| 17 | D017 | Delhi | Deepak | Medium | 21 | 70 | 1 | 205 | 4.0 | 83 | 51 | 23797 | 23797 | 1 | 23801 |
| 18 | D018 | Kolkata | Arjun | Medium | 24 | 78 | 0 | 215 | 4.6 | 92 | 54 | 23799 | 23799 | 1 | 23801 |
| 19 | D019 | Hyderabad | Raghav | 26 | 85 | 1 | 230 | 4.4 | 90 | 56 | 23801 | . | . | . | |
| 20 | D020 | Bangalore | Mohan | 18 | 62 | 0 | 175 | 4.6 | 92 | 44 | 23803 | . | . | . | |
| 21 | D021 | Chennai | Suresh | 22 | 72 | 1 | 205 | 4.3 | 87 | 49 | 23805 | . | . | . |
proc sort data=driver_data;
by Delivery_ID;
run;
proc print data=driver_data;
run;
OUTPUT:
| Obs | Delivery_ID | Driver_Name | Driver_Rating | Vehicle_Type |
|---|---|---|---|---|
| 1 | D001 | Ramesh | 4.5 | Bike |
| 2 | D002 | Suresh | 4.2 | Van |
| 3 | D003 | Mahesh | 4.6 | Bike |
| 4 | D004 | Anil | 3.8 | Truck |
| 5 | D005 | Raju | 4.7 | Bike |
| 6 | D006 | Kumar | 4.3 | Van |
| 7 | D007 | Venkat | 3.9 | Bike |
| 8 | D008 | Ravi | 4.4 | Bike |
| 9 | D009 | Ganesh | 4.0 | Van |
| 10 | D010 | Sandeep | 4.6 | Bike |
| 11 | D011 | Karthik | 4.3 | Van |
| 12 | D012 | Ajay | 4.5 | Bike |
| 13 | D013 | Manoj | 4.2 | Bike |
| 14 | D014 | Srikanth | 4.1 | Truck |
| 15 | D015 | Praveen | 3.7 | Van |
| 16 | D016 | Rajesh | 4.6 | Bike |
| 17 | D017 | Deepak | 4.0 | Bike |
| 18 | D018 | Arjun | 4.5 | Van |
Why Sorting Is Required
The MERGE
statement performs sequential matching.
Without sorting, SAS may show:
ERROR: BY variables are
not properly sorted.
Step 3 — Perform MERGE
data delivery_final;
merge delivery_clean driver_data;
by Delivery_ID;
run;
proc print data=delivery_final;
run;
OUTPUT:
| Obs | Delivery_ID | City | Driver_Name | Utilization | Distance_km | Delivery_Time | Failed_Attempts | Fuel_Cost | Service_Rating | Percentage | Fees | Delivery_Date | Delivery_Date2 | Month_Gap | Next_Service | Driver_Rating | Vehicle_Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | D001 | Hyderabad | Ramesh | Low | 12 | 45 | 0 | 150 | 4.5 | 90 | 40 | 23753 | 23753 | 0 | 23773 | 4.5 | Bike |
| 2 | D002 | Bangalore | Suresh | Medium | 20 | 60 | 1 | 200 | 4.2 | 88 | 50 | 23756 | 23756 | 0 | 23773 | 4.2 | Van |
| 3 | D003 | Chennai | Mahesh | Low | 10 | 60 | 0 | 180 | 4.8 | 92 | 45 | 23759 | 23759 | 0 | 23773 | 4.6 | Bike |
| 4 | D004 | Mumbai | Anil | Low | 15 | 55 | 2 | 120 | 3.5 | 70 | 35 | 23761 | 23761 | 0 | 23773 | 3.8 | Truck |
| 5 | D005 | Delhi | Raju | Medium | 25 | 70 | 0 | 220 | 5.0 | 95 | 55 | 23766 | 23766 | 0 | 23773 | 4.7 | Bike |
| 6 | D006 | Kolkata | Kumar | Medium | 18 | 65 | 1 | 190 | 4.1 | 85 | 42 | 23769 | 23769 | 0 | 23773 | 4.3 | Van |
| 7 | D007 | Hyderabad | Venkat | Medium | 22 | 80 | 3 | 250 | 3.8 | 60 | 60 | 23774 | 23774 | 1 | 23801 | 3.9 | Bike |
| 8 | D008 | Bangalore | Ravi | Low | 14 | 50 | 0 | 160 | 4.4 | 89 | 41 | 23777 | 23777 | 1 | 23801 | 4.4 | Bike |
| 9 | D009 | Chennai | Ganesh | High | 30 | 90 | 2 | 300 | 3.9 | 75 | 70 | 23779 | 23779 | 1 | 23801 | 4.0 | Van |
| 10 | D010 | Mumbai | Sandeep | Low | 11 | 40 | 0 | 120 | 4.7 | 91 | 38 | 23782 | 23782 | 1 | 23801 | 4.6 | Bike |
| 11 | D011 | Delhi | Karthik | Medium | 17 | 58 | 1 | 170 | 4.3 | 87 | 43 | 23784 | 23784 | 1 | 23801 | 4.3 | Van |
| 12 | D012 | Kolkata | Ajay | Medium | 16 | 52 | 0 | 165 | 4.6 | 90 | 40 | 23786 | 23786 | 1 | 23801 | 4.5 | Bike |
| 13 | D013 | Hyderabad | Manoj | Medium | 19 | 68 | 0 | 195 | 4.2 | 86 | 48 | 23788 | 23788 | 1 | 23801 | 4.2 | Bike |
| 14 | D014 | Bangalore | Srikanth | Medium | 23 | 75 | 1 | 210 | 4.1 | 84 | 52 | 23790 | 23790 | 1 | 23801 | 4.1 | Truck |
| 15 | D015 | Chennai | Praveen | High | 27 | 82 | 2 | 240 | 3.7 | 72 | 58 | 23792 | 23792 | 1 | 23801 | 3.7 | Van |
| 16 | D016 | Mumbai | Rajesh | Low | 13 | 48 | 0 | 155 | 4.5 | 93 | 39 | 23794 | 23794 | 1 | 23801 | 4.6 | Bike |
| 17 | D017 | Delhi | Deepak | Medium | 21 | 70 | 1 | 205 | 4.0 | 83 | 51 | 23797 | 23797 | 1 | 23801 | 4.0 | Bike |
| 18 | D018 | Kolkata | Arjun | Medium | 24 | 78 | 0 | 215 | 4.6 | 92 | 54 | 23799 | 23799 | 1 | 23801 | 4.5 | Van |
| 19 | D019 | Hyderabad | Raghav | 26 | 85 | 1 | 230 | 4.4 | 90 | 56 | 23801 | . | . | . | . | ||
| 20 | D020 | Bangalore | Mohan | 18 | 62 | 0 | 175 | 4.6 | 92 | 44 | 23803 | . | . | . | . | ||
| 21 | D021 | Chennai | Suresh | 22 | 72 | 1 | 205 | 4.3 | 87 | 49 | 23805 | . | . | . | . |
Explanation Of MERGE Code
DATA
delivery_final
Creates a new dataset containing combined
information.
MERGE
Statement
merge delivery_clean driver_data;
Combines both datasets row-by-row
using a common key.
BY
Delivery_ID
Specifies the matching variable.
SAS matches records like this:
|
delivery_clean |
driver_data |
|||||||||||||||||||||||||||||||||||
|
Delivery_ID |
Delivery_ID |
|||||||||||||||||||||||||||||||||||
|
Example: D001 →
delivery record D001 →
driver record These rows are two merged . Example Result Dataset
Why MERGE Is Important In Real Projects
In real logistics systems, data comes from multiple systems. Example datasets:
MERGE
helps combine all these datasets into one analytical dataset. |
|
7. PROC TRANSPOSE
proc transpose data=delivery_clean out=delivery_transpose;
var Fuel_Cost Fees;
run;
proc print data=delivery_transpose;
run;
OUTPUT:
| Obs | _NAME_ | COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 | COL8 | COL9 | COL10 | COL11 | COL12 | COL13 | COL14 | COL15 | COL16 | COL17 | COL18 | COL19 | COL20 | COL21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Fuel_Cost | 150 | 200 | 180 | 120 | 220 | 190 | 250 | 160 | 300 | 120 | 170 | 165 | 195 | 210 | 240 | 155 | 205 | 215 | 230 | 175 | 205 |
| 2 | Fees | 40 | 50 | 45 | 35 | 55 | 42 | 60 | 41 | 70 | 38 | 43 | 40 | 48 | 52 | 58 | 39 | 51 | 54 | 56 | 44 | 49 |
Converts rows to columns.
8. Fraud Detection Macro
%macro fraud_check;
data fraud_detection;
set delivery_clean;
if Failed_Attempts >2 and Service_Rating <4 then Fraud_Flag="YES";
else Fraud_Flag="NO";
run;
proc print data=fraud_detection;
run;
%mend;
%fraud_check;
OUTPUT:
| Obs | Delivery_ID | City | Driver_Name | Utilization | Distance_km | Delivery_Time | Failed_Attempts | Fuel_Cost | Service_Rating | Percentage | Fees | Delivery_Date | Delivery_Date2 | Month_Gap | Next_Service | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | D001 | Hyderabad | Ramesh | Low | 12 | 45 | 0 | 150 | 4.5 | 90 | 40 | 23753 | 23753 | 0 | 23773 | NO |
| 2 | D002 | Bangalore | Suresh | Medium | 20 | 60 | 1 | 200 | 4.2 | 88 | 50 | 23756 | 23756 | 0 | 23773 | NO |
| 3 | D003 | Chennai | Mahesh | Low | 10 | 60 | 0 | 180 | 4.8 | 92 | 45 | 23759 | 23759 | 0 | 23773 | NO |
| 4 | D004 | Mumbai | Anil | Low | 15 | 55 | 2 | 120 | 3.5 | 70 | 35 | 23761 | 23761 | 0 | 23773 | NO |
| 5 | D005 | Delhi | Raju | Medium | 25 | 70 | 0 | 220 | 5.0 | 95 | 55 | 23766 | 23766 | 0 | 23773 | NO |
| 6 | D006 | Kolkata | Kumar | Medium | 18 | 65 | 1 | 190 | 4.1 | 85 | 42 | 23769 | 23769 | 0 | 23773 | NO |
| 7 | D007 | Hyderabad | Venkat | Medium | 22 | 80 | 3 | 250 | 3.8 | 60 | 60 | 23774 | 23774 | 1 | 23801 | YES |
| 8 | D008 | Bangalore | Ravi | Low | 14 | 50 | 0 | 160 | 4.4 | 89 | 41 | 23777 | 23777 | 1 | 23801 | NO |
| 9 | D009 | Chennai | Ganesh | High | 30 | 90 | 2 | 300 | 3.9 | 75 | 70 | 23779 | 23779 | 1 | 23801 | NO |
| 10 | D010 | Mumbai | Sandeep | Low | 11 | 40 | 0 | 120 | 4.7 | 91 | 38 | 23782 | 23782 | 1 | 23801 | NO |
| 11 | D011 | Delhi | Karthik | Medium | 17 | 58 | 1 | 170 | 4.3 | 87 | 43 | 23784 | 23784 | 1 | 23801 | NO |
| 12 | D012 | Kolkata | Ajay | Medium | 16 | 52 | 0 | 165 | 4.6 | 90 | 40 | 23786 | 23786 | 1 | 23801 | NO |
| 13 | D013 | Hyderabad | Manoj | Medium | 19 | 68 | 0 | 195 | 4.2 | 86 | 48 | 23788 | 23788 | 1 | 23801 | NO |
| 14 | D014 | Bangalore | Srikanth | Medium | 23 | 75 | 1 | 210 | 4.1 | 84 | 52 | 23790 | 23790 | 1 | 23801 | NO |
| 15 | D015 | Chennai | Praveen | High | 27 | 82 | 2 | 240 | 3.7 | 72 | 58 | 23792 | 23792 | 1 | 23801 | NO |
| 16 | D016 | Mumbai | Rajesh | Low | 13 | 48 | 0 | 155 | 4.5 | 93 | 39 | 23794 | 23794 | 1 | 23801 | NO |
| 17 | D017 | Delhi | Deepak | Medium | 21 | 70 | 1 | 205 | 4.0 | 83 | 51 | 23797 | 23797 | 1 | 23801 | NO |
| 18 | D018 | Kolkata | Arjun | Medium | 24 | 78 | 0 | 215 | 4.6 | 92 | 54 | 23799 | 23799 | 1 | 23801 | NO |
| 19 | D019 | Hyderabad | Raghav | 26 | 85 | 1 | 230 | 4.4 | 90 | 56 | 23801 | . | . | . | NO | |
| 20 | D020 | Bangalore | Mohan | 18 | 62 | 0 | 175 | 4.6 | 92 | 44 | 23803 | . | . | . | NO | |
| 21 | D021 | Chennai | Suresh | 22 | 72 | 1 | 205 | 4.3 | 87 | 49 | 23805 | . | . | . | NO |
9. PROC DATASETS DELETE
proc datasets library=work nolist;
delete delivery_raw;
quit;
LOG:
Removes temporary datasets.
10. Business Insights
Using this dataset we can analyze:
·
Average delivery time per city
·
Failed delivery patterns
·
Fuel cost optimization
·
Driver performance
·
Fraud detection patterns
·
Create a simulated last-mile delivery dataset containing
operational variables like Delivery_ID, City, Distance_km, Delivery_Time,
Fuel_Cost, and Service_Rating.
·
Introduce intentional data quality errors such as negative
distance, missing delivery time, invalid ratings, and inconsistent city names.
·
Use DATA
step programming to read raw delivery data and construct the initial
dataset.
·
Apply character cleaning functions such as STRIP, TRIM, UPCASE, LOWCASE, and PROPCASE to
standardize city and driver names.
·
Use concatenation
functions like CAT,
CATX,
and CATS
to combine variables such as driver and city information for reporting.
·
Implement numeric functions like ABS,
ROUND,
and SUM to correct negative
values and standardize numeric metrics.
·
Handle missing values using the COALESCE function to ensure
data completeness for analysis.
·
Standardize date values using the MDY function to
construct proper SAS date variables.
·
Calculate time intervals between dates using the INTCK function for
delivery scheduling analysis.
·
Generate future or next scheduled dates using the INTNX
function to simulate operational planning.
·
Use conditional logic in DATA step to classify delivery
utilization levels such as Low, Medium, and High.
·
Detect potential operational anomalies
like excessive failed delivery attempts
using conditional rules.
·
Implement macro programming to create a reusable fraud detection
logic for suspicious deliveries.
·
Use PROC APPEND to add new delivery records to the existing
dataset without recreating the dataset.
·
Use MERGE statement to combine delivery information with
driver or operational datasets.
·
Transform dataset structure using PROC TRANSPOSE to convert rows into
columns for reporting.
·
Validate dataset structure using PROC CONTENTS to verify variables,
types, and lengths.
·
Perform dataset management using PROC DATASETS DELETE to remove
unnecessary intermediate datasets.
·
Prepare the cleaned and corrected
dataset for logistics performance
analysis and reporting.
·
Generate insights to help companies reduce delivery errors, detect fraud patterns,
optimize fuel usage, and improve last-mile operational efficiency.
Conclusion
This project demonstrates how SAS programming can support logistics analytics
in last-mile delivery operations. Through the creation of a simulated
dataset with intentional errors, the project illustrates the importance of data
validation and cleaning in operational environments.
Key SAS techniques used in this project
include:
·
Data Step transformations
·
Character and numeric functions
·
Date functions such as MDY, INTCK, and
INTNX
·
Dataset merging and appending
·
Macro-based fraud detection
·
Dataset management using PROC DATASETS
By applying these techniques, the raw dataset was successfully transformed
into a clean and analysis-ready dataset.
The fraud detection macro further enables
organizations to identify suspicious delivery patterns, improving operational
transparency and preventing financial losses.
Overall, this project shows how SAS can be
used not only for statistical analysis but also for real-world operational data management and fraud detection in
logistics systems.
SAS INTERVIEW QUESTIONS
1.What Is The Difference Between MERGE And JOIN?
Answer:
|
Feature |
MERGE |
SQL JOIN |
|
Location |
DATA Step |
PROC SQL |
|
Sorting Required |
Yes |
No |
|
Performance |
Faster for large datasets |
Flexible |
|
Use Case |
Sequential merging |
Complex joins |
2.What is the difference between SET
and PROC APPEND?
Answer:
|
Feature |
SET |
PROC APPEND |
|
Creates new dataset |
Yes |
No |
|
Performance |
Slower |
Faster |
|
Memory usage |
Higher |
Lower |
|
Use case |
Combining datasets |
Adding records |
POINT= option in a SET
statement is used to read a
specific observation directly from a dataset instead of reading sequentially.
It allows random
access to observations using the observation number.
Example:
data example;
set sales point=i;
run;
Here, point=i
tells SAS to read the i-th observation
from the dataset.
Key point:
POINT= reads observations randomly (direct
access) rather than sequentially
from top to bottom.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent DELIVERY OPERATIONS data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics
· Clinical SAS Programmer
· Research Data Analyst
· Regulatory Data Validator
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment