Smart SAS Intelligence For Last-Mile Delivery Error Detection And Fraud Prevention

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP | LENGTH | INPUT | DATALINES | SET | MERGE | IF-THEN | FORMAT | INFORMAT | PROC SORT | PROC PRINT | PROC CONTENTS | PROC TRANSPOSE | PROC APPEND | PROC DATASETS DELETE | MACRO / %MACRO / %MEND

Table Of Contents

Introduction
Business Context
Project Objectives
Dataset Structure
Raw Dataset With Intentional Errors
Understanding The Errors In The Dataset
Corrected SAS Code For Clean Dataset
Data Preparation Using SAS Functions
Numeric Functions Used
Character Functions Used
Date Functions Using MDY, INTCK, INTNX
Dataset Operations Using SET, MERGE, APPEND
Data Transformation Using PROC TRANSPOSE
Fraud Detection Macro Logic
Utilization Classification Logic
Data Cleaning Using PROC DATASETS DELETE
Analytical Insights From The Dataset
Business Benefits
20 Key Points About This Project
Conclusion

Introduction

Last-mile delivery is the final stage of the logistics process where goods move from a distribution center to the customer's location. This phase is often considered the most complex and expensive part of supply chain logistics. Companies such as e-commerce platforms, courier services, and food delivery providers heavily depend on efficient last-mile delivery operations to ensure customer satisfaction.

However, managing last-mile deliveries involves multiple operational challenges. These include route optimization, delivery delays, failed delivery attempts, fuel cost fluctuations, and sometimes fraudulent reporting by drivers or logistics partners. Errors in operational data can lead to incorrect decision making and financial losses.

In modern analytics environments, SAS programming is widely used to manage operational datasets, detect anomalies, and optimize logistics performance. By combining data steps, SQL queries, macros, and advanced functions, organizations can identify inefficiencies, detect suspicious behavior, and improve service quality.

This project demonstrates how SAS programming can be used to:

Create a last-mile delivery dataset
Introduce intentional data errors
Identify and correct those errors
Detect potential delivery fraud
Calculate operational metrics
Perform dataset transformations and reporting

The project uses multiple SAS techniques including:

Data Step Programming
Character Functions
Numeric Functions
Date Functions
PROC APPEND
PROC TRANSPOSE
Dataset MERGE
Fraud Detection Macros

The final result is a clean dataset that can be used for performance analysis and fraud monitoring.

Business Context

Imagine a logistics company operating across several Indian cities such as Hyderabad, Bangalore, Chennai, Mumbai, and Delhi. The company manages thousands of daily deliveries and records operational metrics for each delivery.

Each delivery record contains:

Delivery ID
City of delivery
Distance traveled
Delivery time
Number of failed attempts
Fuel cost
Service rating
Operational utilization
Percentage performance score
Delivery service fees
Delivery date

However, the company notices inconsistencies in their operational data.

Examples include:

Incorrect city name formatting
Missing delivery time
Negative fuel cost values
Invalid ratings
Fraudulent reporting of failed deliveries

To solve these issues, the company assigns a data analytics team to develop a SAS program that:

Creates a delivery dataset
Identifies errors
Corrects the data
Detects suspicious patterns
Generates insights for operations managers

Project Objectives

The main objectives of this project are:

Create a simulated dataset representing last-mile delivery operations
Introduce intentional errors in the raw dataset
Identify and explain those errors
Correct the dataset using SAS programming
Apply character and numeric functions for cleaning data
Use date functions for delivery scheduling analysis
Implement macros for fraud detection
Perform dataset merging and transformation
Create utilization classifications
Generate an optimized final dataset for analysis

Dataset Structure

The dataset includes the following variables:

Variable	Description
Delivery_ID	Unique identifier for delivery
City	City where delivery occurred
Distance_KM	Distance traveled
Delivery_Time	Time taken for delivery
Failed_Attempts	Number of failed attempts
Fuel_Cost	Fuel expenditure
Service_Rating	Customer rating
Percentage	Performance percentage
Fees	Delivery charges
Utilization	Operational classification
Delivery_Date	Date of delivery
Driver_Name	Name of delivery driver

1. Raw Dataset With Intentional Errors

data delivery_raw;

length Delivery_ID $6 City $20 Driver_Name $25 Utilization $20;

input Delivery_ID $ City $ Distance_km Delivery_Time Failed_Attempts

Fuel_Cost Service_Rating Percentage Fees Driver_Name $ Delivery_Date :ddmmyy10.;

datalines;

D001 hyderabad 12 45 0 150 4.5 90 40 ramesh 12/01/2025

D002 BANGALORE 20 . 1 200 4.2 88 50 SURESH 15/01/2025

D003 chennai -10 60 0 180 4.8 92 45 mahesh 18/01/2025

D004 mumbai 15 55 2 -120 3.5 70 35 anil 20/01/2025

D005 DELHI 25 70 0 220 5.5 95 55 raju 25/01/2025

D006 kolkata 18 65 1 190 4.1 85 42 kumar 28/01/2025

D007 Hyderabad 22 80 3 250 3.8 60 60 venkat 02/02/2025

D008 bangalore 14 50 0 160 4.4 89 41 ravi 05/02/2025

D009 chennai 30 90 2 300 3.9 75 70 ganesh 07/02/2025

D010 mumbai 11 40 0 120 4.7 91 38 sandeep 10/02/2025

D011 delhi 17 58 1 170 4.3 87 43 karthik 12/02/2025

D012 kolkata 16 52 0 165 4.6 90 40 ajay 14/02/2025

D013 hyderabad 19 68 0 195 4.2 86 48 manoj 16/02/2025

D014 bangalore 23 75 1 210 4.1 84 52 srikanth 18/02/2025

D015 chennai 27 82 2 240 3.7 72 58 praveen 20/02/2025

D016 mumbai 13 48 0 155 4.5 93 39 rajesh 22/02/2025

D017 delhi 21 70 1 205 4.0 83 51 deepak 25/02/2025

D018 kolkata 24 78 0 215 4.6 92 54 arjun 27/02/2025

;

run;

proc print data=delivery_raw;

run;

OUTPUT:


Obs	Delivery_ID	City	Driver_Name	Distance_km	Delivery_Time	Failed_Attempts	Fuel_Cost	Service_Rating	Percentage	Fees	Delivery_Date
1	D001	hyderabad	ramesh	12	45	0	150	4.5	90	40	23753
2	D002	BANGALORE	SURESH	20	.	1	200	4.2	88	50	23756
3	D003	chennai	mahesh	-10	60	0	180	4.8	92	45	23759
4	D004	mumbai	anil	15	55	2	-120	3.5	70	35	23761
5	D005	DELHI	raju	25	70	0	220	5.5	95	55	23766
6	D006	kolkata	kumar	18	65	1	190	4.1	85	42	23769
7	D007	Hyderabad	venkat	22	80	3	250	3.8	60	60	23774
8	D008	bangalore	ravi	14	50	0	160	4.4	89	41	23777
9	D009	chennai	ganesh	30	90	2	300	3.9	75	70	23779
10	D010	mumbai	sandeep	11	40	0	120	4.7	91	38	23782
11	D011	delhi	karthik	17	58	1	170	4.3	87	43	23784
12	D012	kolkata	ajay	16	52	0	165	4.6	90	40	23786
13	D013	hyderabad	manoj	19	68	0	195	4.2	86	48	23788
14	D014	bangalore	srikanth	23	75	1	210	4.1	84	52	23790
15	D015	chennai	praveen	27	82	2	240	3.7	72	58	23792
16	D016	mumbai	rajesh	13	48	0	155	4.5	93	39	23794
17	D017	delhi	deepak	21	70	1	205	4.0	83	51	23797
18	D018	kolkata	arjun	24	78	0	215	4.6	92	54	23799

Understanding The Errors In The Dataset

Several intentional errors exist in the dataset:

1.Negative Distance

D003 chennai -10

Distance cannot be negative.

2.Missing Delivery Time

D002 BANGALORE 20 .

Delivery time is missing.

3.Negative Fuel Cost

D004 mumbai fuel_cost = -120

Fuel cost cannot be negative.

4.Invalid Service Rating

D005 rating = 5.5

Rating must be between 1 and 5.

5.Inconsistent City Names

Example:

hyderabad

HYDERABAD

Hyderabad

Needs standard formatting.

2. Corrected SAS Code

data delivery_clean;

set delivery_raw;

City = propcase(strip(City));

Driver_Name = propcase(strip(Driver_Name));

Distance_km = abs(Distance_km);

Fuel_Cost = abs(Fuel_Cost);

if Service_Rating > 5 then Service_Rating = 5;

if Delivery_Time = . then Delivery_Time = round(Distance_km*3);

Delivery_Date2 = mdy(month(Delivery_Date),day(Delivery_Date),year(Delivery_Date));

Month_Gap = intck('month','01JAN2025'd,Delivery_Date);

Next_Service = intnx('month',Delivery_Date,1);

if Distance_km <=15 then Utilization="Low";

else if Distance_km <=25 then Utilization="Medium";

else Utilization="High";

run;

proc print data=delivery_clean;

run;

OUTPUT:


Obs	Delivery_ID	City	Driver_Name	Utilization	Distance_km	Delivery_Time	Failed_Attempts	Fuel_Cost	Service_Rating	Percentage	Fees	Delivery_Date	Delivery_Date2	Month_Gap	Next_Service
1	D001	Hyderabad	Ramesh	Low	12	45	0	150	4.5	90	40	23753	23753	0	23773
2	D002	Bangalore	Suresh	Medium	20	60	1	200	4.2	88	50	23756	23756	0	23773
3	D003	Chennai	Mahesh	Low	10	60	0	180	4.8	92	45	23759	23759	0	23773
4	D004	Mumbai	Anil	Low	15	55	2	120	3.5	70	35	23761	23761	0	23773
5	D005	Delhi	Raju	Medium	25	70	0	220	5.0	95	55	23766	23766	0	23773
6	D006	Kolkata	Kumar	Medium	18	65	1	190	4.1	85	42	23769	23769	0	23773
7	D007	Hyderabad	Venkat	Medium	22	80	3	250	3.8	60	60	23774	23774	1	23801
8	D008	Bangalore	Ravi	Low	14	50	0	160	4.4	89	41	23777	23777	1	23801
9	D009	Chennai	Ganesh	High	30	90	2	300	3.9	75	70	23779	23779	1	23801
10	D010	Mumbai	Sandeep	Low	11	40	0	120	4.7	91	38	23782	23782	1	23801
11	D011	Delhi	Karthik	Medium	17	58	1	170	4.3	87	43	23784	23784	1	23801
12	D012	Kolkata	Ajay	Medium	16	52	0	165	4.6	90	40	23786	23786	1	23801
13	D013	Hyderabad	Manoj	Medium	19	68	0	195	4.2	86	48	23788	23788	1	23801
14	D014	Bangalore	Srikanth	Medium	23	75	1	210	4.1	84	52	23790	23790	1	23801
15	D015	Chennai	Praveen	High	27	82	2	240	3.7	72	58	23792	23792	1	23801
16	D016	Mumbai	Rajesh	Low	13	48	0	155	4.5	93	39	23794	23794	1	23801
17	D017	Delhi	Deepak	Medium	21	70	1	205	4.0	83	51	23797	23797	1	23801
18	D018	Kolkata	Arjun	Medium	24	78	0	215	4.6	92	54	23799	23799	1	23801

Character Functions

Functions used include:

STRIP

Removes leading and trailing spaces.

TRIM

Removes trailing spaces.

CAT

Concatenates values.

CATX

Concatenates values with delimiter.

PROPCASE

Converts text to proper case.

UPCASE

Converts text to uppercase.

LOWCASE

Converts text to lowercase.

COALESCE

Handles missing values.

Numeric Functions

Examples used:

ABS( )

ROUND( )

SUM( )

MEAN( )

These help ensure numerical accuracy and standardization.

Date Functions

MDY

Creates SAS date from month, day, year.

INTCK

Counts intervals between dates.

INTNX

Moves date forward or backward.

3. Best Practice In Real Projects

Before using PROC APPEND, analysts usually check dataset structures using:

proc contents data=delivery_clean;

run;

proc print data=delivery_clean;

run;

OUTPUT:

The CONTENTS Procedure


Data Set Name	WORK.DELIVERY_CLEAN	Observations	18
Member Type	DATA	Variables	15
Engine	V9	Indexes	0
Created	03/06/2026 16:39:51	Observation Length	160
Last Modified	03/06/2026 16:39:51	Deleted Observations	0
Protection		Compressed	NO
Data Set Type		Sorted	NO
Label
Data Representation	SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64
Encoding	utf-8 Unicode (UTF-8)


Engine/Host Dependent Information
Data Set Page Size	131072
Number of Data Set Pages	1
First Data Page	1
Max Obs per Page	818
Obs in First Data Page	18
Number of Data Set Repairs	0
Filename	/saswork/SAS_workD19200007C04_odaws01-apse1-2.oda.sas.com/SAS_workF85F00007C04_odaws01-apse1-2.oda.sas.com/delivery_clean.sas7bdat
Release Created	9.0401M8
Host Created	Linux
Inode Number	201344327
Access Permission	rw-r--r--
Owner Name	u63247146
File Size	256KB
File Size (bytes)	262144


Alphabetic List of Variables and Attributes
#	Variable	Type	Len
2	City	Char	20
12	Delivery_Date	Num	8
13	Delivery_Date2	Num	8
1	Delivery_ID	Char	6
6	Delivery_Time	Num	8
5	Distance_km	Num	8
3	Driver_Name	Char	25
7	Failed_Attempts	Num	8
11	Fees	Num	8
8	Fuel_Cost	Num	8
14	Month_Gap	Num	8
15	Next_Service	Num	8
10	Percentage	Num	8
9	Service_Rating	Num	8
4	Utilization	Char	20


Obs	Delivery_ID	City	Driver_Name	Utilization	Distance_km	Delivery_Time	Failed_Attempts	Fuel_Cost	Service_Rating	Percentage	Fees	Delivery_Date	Delivery_Date2	Month_Gap	Next_Service
1	D001	Hyderabad	Ramesh	Low	12	45	0	150	4.5	90	40	23753	23753	0	23773
2	D002	Bangalore	Suresh	Medium	20	60	1	200	4.2	88	50	23756	23756	0	23773
3	D003	Chennai	Mahesh	Low	10	60	0	180	4.8	92	45	23759	23759	0	23773
4	D004	Mumbai	Anil	Low	15	55	2	120	3.5	70	35	23761	23761	0	23773
5	D005	Delhi	Raju	Medium	25	70	0	220	5.0	95	55	23766	23766	0	23773
6	D006	Kolkata	Kumar	Medium	18	65	1	190	4.1	85	42	23769	23769	0	23773
7	D007	Hyderabad	Venkat	Medium	22	80	3	250	3.8	60	60	23774	23774	1	23801
8	D008	Bangalore	Ravi	Low	14	50	0	160	4.4	89	41	23777	23777	1	23801
9	D009	Chennai	Ganesh	High	30	90	2	300	3.9	75	70	23779	23779	1	23801
10	D010	Mumbai	Sandeep	Low	11	40	0	120	4.7	91	38	23782	23782	1	23801
11	D011	Delhi	Karthik	Medium	17	58	1	170	4.3	87	43	23784	23784	1	23801
12	D012	Kolkata	Ajay	Medium	16	52	0	165	4.6	90	40	23786	23786	1	23801
13	D013	Hyderabad	Manoj	Medium	19	68	0	195	4.2	86	48	23788	23788	1	23801
14	D014	Bangalore	Srikanth	Medium	23	75	1	210	4.1	84	52	23790	23790	1	23801
15	D015	Chennai	Praveen	High	27	82	2	240	3.7	72	58	23792	23792	1	23801
16	D016	Mumbai	Rajesh	Low	13	48	0	155	4.5	93	39	23794	23794	1	23801
17	D017	Delhi	Deepak	Medium	21	70	1	205	4.0	83	51	23797	23797	1	23801
18	D018	Kolkata	Arjun	Medium	24	78	0	215	4.6	92	54	23799	23799	1	23801

proc contents data=delivery_new;

run;

proc print data=delivery_new;

run;

OUTPUT:

The CONTENTS Procedure


Data Set Name	WORK.DELIVERY_NEW	Observations	3
Member Type	DATA	Variables	11
Engine	V9	Indexes	0
Created	03/06/2026 16:40:50	Observation Length	120
Last Modified	03/06/2026 16:40:50	Deleted Observations	0
Protection		Compressed	NO
Data Set Type		Sorted	NO
Label
Data Representation	SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64
Encoding	utf-8 Unicode (UTF-8)


Engine/Host Dependent Information
Data Set Page Size	131072
Number of Data Set Pages	1
First Data Page	1
Max Obs per Page	1090
Obs in First Data Page	3
Number of Data Set Repairs	0
Filename	/saswork/SAS_workD19200007C04_odaws01-apse1-2.oda.sas.com/SAS_workF85F00007C04_odaws01-apse1-2.oda.sas.com/delivery_new.sas7bdat
Release Created	9.0401M8
Host Created	Linux
Inode Number	201344326
Access Permission	rw-r--r--
Owner Name	u63247146
File Size	256KB
File Size (bytes)	262144


Alphabetic List of Variables and Attributes
#	Variable	Type	Len
2	City	Char	20
11	Delivery_Date	Num	8
1	Delivery_ID	Char	6
5	Delivery_Time	Num	8
4	Distance_km	Num	8
3	Driver_Name	Char	25
6	Failed_Attempts	Num	8
10	Fees	Num	8
7	Fuel_Cost	Num	8
9	Percentage	Num	8
8	Service_Rating	Num	8


Obs	Delivery_ID	City	Driver_Name	Distance_km	Delivery_Time	Failed_Attempts	Fuel_Cost	Service_Rating	Percentage	Fees	Delivery_Date
1	D019	Hyderabad	Raghav	26	85	1	230	4.4	90	56	23801
2	D020	Bangalore	Mohan	18	62	0	175	4.6	92	44	23803
3	D021	Chennai	Suresh	22	72	1	205	4.3	87	49	23805

· Variables match

· Data types match

· Lengths match

4. PROC APPEND

Create New Dataset

data delivery_new;

length Delivery_ID $6 City $20 Driver_Name $25;

input Delivery_ID $ City $ Distance_km Delivery_Time Failed_Attempts Fuel_Cost

Service_Rating Percentage Fees Driver_Name $ Delivery_Date :ddmmyy10.;

datalines;

D019 Hyderabad 26 85 1 230 4.4 90 56 Raghav 01/03/2025

D020 Bangalore 18 62 0 175 4.6 92 44 Mohan 03/03/2025

D021 Chennai 22 72 1 205 4.3 87 49 Suresh 05/03/2025

;

run;

proc print data=delivery_new;

run;

OUTPUT:


Obs	Delivery_ID	City	Driver_Name	Distance_km	Delivery_Time	Failed_Attempts	Fuel_Cost	Service_Rating	Percentage	Fees	Delivery_Date
1	D019	Hyderabad	Raghav	26	85	1	230	4.4	90	56	23801
2	D020	Bangalore	Mohan	18	62	0	175	4.6	92	44	23803
3	D021	Chennai	Suresh	22	72	1	205	4.3	87	49	23805

Explanation Of The Code

`DATA delivery_new`

Creates a new SAS dataset that contains new delivery records.

`length`

Defines the character variable lengths.

Example:

length Delivery_ID $6

This ensures SAS stores the ID as a character value with a length of 6.

Why it is important:

· Prevents truncation

· Ensures consistent structure with the base dataset.

`INPUT Statement`

Reads variables in the order specified.

`:ddmmyy10.`

This informat reads dates written as:

01/03/2025

Without this informat, SAS would not correctly interpret the date.

`DATALINES`

Allows manual entry of data.

Each row represents a new delivery record.

PROC APPEND

proc append base=delivery_clean

data=delivery_new;

run;

proc print data=delivery_clean;

run;

OUTPUT:


Obs	Delivery_ID	City	Driver_Name	Utilization	Distance_km	Delivery_Time	Failed_Attempts	Fuel_Cost	Service_Rating	Percentage	Fees	Delivery_Date	Delivery_Date2	Month_Gap	Next_Service
1	D001	Hyderabad	Ramesh	Low	12	45	0	150	4.5	90	40	23753	23753	0	23773
2	D002	Bangalore	Suresh	Medium	20	60	1	200	4.2	88	50	23756	23756	0	23773
3	D003	Chennai	Mahesh	Low	10	60	0	180	4.8	92	45	23759	23759	0	23773
4	D004	Mumbai	Anil	Low	15	55	2	120	3.5	70	35	23761	23761	0	23773
5	D005	Delhi	Raju	Medium	25	70	0	220	5.0	95	55	23766	23766	0	23773
6	D006	Kolkata	Kumar	Medium	18	65	1	190	4.1	85	42	23769	23769	0	23773
7	D007	Hyderabad	Venkat	Medium	22	80	3	250	3.8	60	60	23774	23774	1	23801
8	D008	Bangalore	Ravi	Low	14	50	0	160	4.4	89	41	23777	23777	1	23801
9	D009	Chennai	Ganesh	High	30	90	2	300	3.9	75	70	23779	23779	1	23801
10	D010	Mumbai	Sandeep	Low	11	40	0	120	4.7	91	38	23782	23782	1	23801
11	D011	Delhi	Karthik	Medium	17	58	1	170	4.3	87	43	23784	23784	1	23801
12	D012	Kolkata	Ajay	Medium	16	52	0	165	4.6	90	40	23786	23786	1	23801
13	D013	Hyderabad	Manoj	Medium	19	68	0	195	4.2	86	48	23788	23788	1	23801
14	D014	Bangalore	Srikanth	Medium	23	75	1	210	4.1	84	52	23790	23790	1	23801
15	D015	Chennai	Praveen	High	27	82	2	240	3.7	72	58	23792	23792	1	23801
16	D016	Mumbai	Rajesh	Low	13	48	0	155	4.5	93	39	23794	23794	1	23801
17	D017	Delhi	Deepak	Medium	21	70	1	205	4.0	83	51	23797	23797	1	23801
18	D018	Kolkata	Arjun	Medium	24	78	0	215	4.6	92	54	23799	23799	1	23801
19	D019	Hyderabad	Raghav		26	85	1	230	4.4	90	56	23801	.	.	.
20	D020	Bangalore	Mohan		18	62	0	175	4.6	92	44	23803	.	.	.
21	D021	Chennai	Suresh		22	72	1	205	4.3	87	49	23805	.	.	.

Explanation Of PROC APPEND

`PROC APPEND`

Used to add observations from one dataset to another without recreating the entire dataset.

`BASE=delivery_clean`

This is the target dataset.

New records will be added here.

`DATA=delivery_new`

This dataset contains new observations to be appended.

What Happens After Execution

Before append:

delivery_clean → 18 observations

delivery_new   → 3 observations

After append:

delivery_clean → 21 observations

Advantages:

· Faster

· Does not recreate dataset

· Efficient for large datasets

Why PROC APPEND Is Used Instead Of SET

DATA STEP SET

data combined;

set delivery_clean

delivery_new;

run;

proc print data=combined;

run;

OUTPUT:


Obs	Delivery_ID	City	Driver_Name	Utilization	Distance_km	Delivery_Time	Failed_Attempts	Fuel_Cost	Service_Rating	Percentage	Fees	Delivery_Date	Delivery_Date2	Month_Gap	Next_Service
1	D001	Hyderabad	Ramesh	Low	12	45	0	150	4.5	90	40	23753	23753	0	23773
2	D002	Bangalore	Suresh	Medium	20	60	1	200	4.2	88	50	23756	23756	0	23773
3	D003	Chennai	Mahesh	Low	10	60	0	180	4.8	92	45	23759	23759	0	23773
4	D004	Mumbai	Anil	Low	15	55	2	120	3.5	70	35	23761	23761	0	23773
5	D005	Delhi	Raju	Medium	25	70	0	220	5.0	95	55	23766	23766	0	23773
6	D006	Kolkata	Kumar	Medium	18	65	1	190	4.1	85	42	23769	23769	0	23773
7	D007	Hyderabad	Venkat	Medium	22	80	3	250	3.8	60	60	23774	23774	1	23801
8	D008	Bangalore	Ravi	Low	14	50	0	160	4.4	89	41	23777	23777	1	23801
9	D009	Chennai	Ganesh	High	30	90	2	300	3.9	75	70	23779	23779	1	23801
10	D010	Mumbai	Sandeep	Low	11	40	0	120	4.7	91	38	23782	23782	1	23801
11	D011	Delhi	Karthik	Medium	17	58	1	170	4.3	87	43	23784	23784	1	23801
12	D012	Kolkata	Ajay	Medium	16	52	0	165	4.6	90	40	23786	23786	1	23801
13	D013	Hyderabad	Manoj	Medium	19	68	0	195	4.2	86	48	23788	23788	1	23801
14	D014	Bangalore	Srikanth	Medium	23	75	1	210	4.1	84	52	23790	23790	1	23801
15	D015	Chennai	Praveen	High	27	82	2	240	3.7	72	58	23792	23792	1	23801
16	D016	Mumbai	Rajesh	Low	13	48	0	155	4.5	93	39	23794	23794	1	23801
17	D017	Delhi	Deepak	Medium	21	70	1	205	4.0	83	51	23797	23797	1	23801
18	D018	Kolkata	Arjun	Medium	24	78	0	215	4.6	92	54	23799	23799	1	23801
19	D019	Hyderabad	Raghav		26	85	1	230	4.4	90	56	23801	.	.	.
20	D020	Bangalore	Mohan		18	62	0	175	4.6	92	44	23803	.	.	.
21	D021	Chennai	Suresh		22	72	1	205	4.3	87	49	23805	.	.	.
22	D019	Hyderabad	Raghav		26	85	1	230	4.4	90	56	23801	.	.	.
23	D020	Bangalore	Mohan		18	62	0	175	4.6	92	44	23803	.	.	.
24	D021	Chennai	Suresh		22	72	1	205	4.3	87	49	23805	.	.	.

Why PROC APPEND Is Used Instead Of SET

Problem:

· Creates a new dataset

· Requires extra storage

5. Best Practice In SAS Projects

Before merging, always check dataset structure:

proc contents data=delivery_clean;

run;

proc print data=delivery_clean;

run;

OUTPUT:

The CONTENTS Procedure


Data Set Name	WORK.DELIVERY_CLEAN	Observations	21
Member Type	DATA	Variables	15
Engine	V9	Indexes	0
Created	03/06/2026 16:39:51	Observation Length	160
Last Modified	03/06/2026 16:41:30	Deleted Observations	0
Protection		Compressed	NO
Data Set Type		Sorted	NO
Label
Data Representation	SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64
Encoding	utf-8 Unicode (UTF-8)


Engine/Host Dependent Information
Data Set Page Size	131072
Number of Data Set Pages	1
First Data Page	1
Max Obs per Page	818
Obs in First Data Page	21
Number of Data Set Repairs	0
Filename	/saswork/SAS_workD19200007C04_odaws01-apse1-2.oda.sas.com/SAS_workF85F00007C04_odaws01-apse1-2.oda.sas.com/delivery_clean.sas7bdat
Release Created	9.0401M8
Host Created	Linux
Inode Number	201344327
Access Permission	rw-r--r--
Owner Name	u63247146
File Size	256KB
File Size (bytes)	262144


Alphabetic List of Variables and Attributes
#	Variable	Type	Len
2	City	Char	20
12	Delivery_Date	Num	8
13	Delivery_Date2	Num	8
1	Delivery_ID	Char	6
6	Delivery_Time	Num	8
5	Distance_km	Num	8
3	Driver_Name	Char	25
7	Failed_Attempts	Num	8
11	Fees	Num	8
8	Fuel_Cost	Num	8
14	Month_Gap	Num	8
15	Next_Service	Num	8
10	Percentage	Num	8
9	Service_Rating	Num	8
4	Utilization	Char	20


Obs	Delivery_ID	City	Driver_Name	Utilization	Distance_km	Delivery_Time	Failed_Attempts	Fuel_Cost	Service_Rating	Percentage	Fees	Delivery_Date	Delivery_Date2	Month_Gap	Next_Service
1	D001	Hyderabad	Ramesh	Low	12	45	0	150	4.5	90	40	23753	23753	0	23773
2	D002	Bangalore	Suresh	Medium	20	60	1	200	4.2	88	50	23756	23756	0	23773
3	D003	Chennai	Mahesh	Low	10	60	0	180	4.8	92	45	23759	23759	0	23773
4	D004	Mumbai	Anil	Low	15	55	2	120	3.5	70	35	23761	23761	0	23773
5	D005	Delhi	Raju	Medium	25	70	0	220	5.0	95	55	23766	23766	0	23773
6	D006	Kolkata	Kumar	Medium	18	65	1	190	4.1	85	42	23769	23769	0	23773
7	D007	Hyderabad	Venkat	Medium	22	80	3	250	3.8	60	60	23774	23774	1	23801
8	D008	Bangalore	Ravi	Low	14	50	0	160	4.4	89	41	23777	23777	1	23801
9	D009	Chennai	Ganesh	High	30	90	2	300	3.9	75	70	23779	23779	1	23801
10	D010	Mumbai	Sandeep	Low	11	40	0	120	4.7	91	38	23782	23782	1	23801
11	D011	Delhi	Karthik	Medium	17	58	1	170	4.3	87	43	23784	23784	1	23801
12	D012	Kolkata	Ajay	Medium	16	52	0	165	4.6	90	40	23786	23786	1	23801
13	D013	Hyderabad	Manoj	Medium	19	68	0	195	4.2	86	48	23788	23788	1	23801
14	D014	Bangalore	Srikanth	Medium	23	75	1	210	4.1	84	52	23790	23790	1	23801
15	D015	Chennai	Praveen	High	27	82	2	240	3.7	72	58	23792	23792	1	23801
16	D016	Mumbai	Rajesh	Low	13	48	0	155	4.5	93	39	23794	23794	1	23801
17	D017	Delhi	Deepak	Medium	21	70	1	205	4.0	83	51	23797	23797	1	23801
18	D018	Kolkata	Arjun	Medium	24	78	0	215	4.6	92	54	23799	23799	1	23801
19	D019	Hyderabad	Raghav		26	85	1	230	4.4	90	56	23801	.	.	.
20	D020	Bangalore	Mohan		18	62	0	175	4.6	92	44	23803	.	.	.
21	D021	Chennai	Suresh		22	72	1	205	4.3	87	49	23805	.	.	.

proc contents data=driver_data;

run;

proc print data=driver_data;

run;

OUTPUT:

The CONTENTS Procedure


Data Set Name	WORK.DRIVER_DATA	Observations	18
Member Type	DATA	Variables	4
Engine	V9	Indexes	0
Created	03/06/2026 16:42:31	Observation Length	56
Last Modified	03/06/2026 16:42:31	Deleted Observations	0
Protection		Compressed	NO
Data Set Type		Sorted	NO
Label
Data Representation	SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64
Encoding	utf-8 Unicode (UTF-8)


Engine/Host Dependent Information
Data Set Page Size	131072
Number of Data Set Pages	1
First Data Page	1
Max Obs per Page	2334
Obs in First Data Page	18
Number of Data Set Repairs	0
Filename	/saswork/SAS_workD19200007C04_odaws01-apse1-2.oda.sas.com/SAS_workF85F00007C04_odaws01-apse1-2.oda.sas.com/driver_data.sas7bdat
Release Created	9.0401M8
Host Created	Linux
Inode Number	201344328
Access Permission	rw-r--r--
Owner Name	u63247146
File Size	256KB
File Size (bytes)	262144


Alphabetic List of Variables and Attributes
#	Variable	Type	Len
1	Delivery_ID	Char	6
2	Driver_Name	Char	25
3	Driver_Rating	Num	8
4	Vehicle_Type	Char	15


Obs	Delivery_ID	Driver_Name	Driver_Rating	Vehicle_Type
1	D001	Ramesh	4.5	Bike
2	D002	Suresh	4.2	Van
3	D003	Mahesh	4.6	Bike
4	D004	Anil	3.8	Truck
5	D005	Raju	4.7	Bike
6	D006	Kumar	4.3	Van
7	D007	Venkat	3.9	Bike
8	D008	Ravi	4.4	Bike
9	D009	Ganesh	4.0	Van
10	D010	Sandeep	4.6	Bike
11	D011	Karthik	4.3	Van
12	D012	Ajay	4.5	Bike
13	D013	Manoj	4.2	Bike
14	D014	Srikanth	4.1	Truck
15	D015	Praveen	3.7	Van
16	D016	Rajesh	4.6	Bike
17	D017	Deepak	4.0	Bike
18	D018	Arjun	4.5	Van

· Variable types match

· Variable lengths match

· Key variables exist

6. MERGE

Step 1 — Create Driver Dataset

data driver_data;

length Delivery_ID $6 Driver_Name $25 Driver_Rating 8 Vehicle_Type $15;

input Delivery_ID $ Driver_Name $ Driver_Rating Vehicle_Type $;

datalines;

D001 Ramesh 4.5 Bike

D002 Suresh 4.2 Van

D003 Mahesh 4.6 Bike

D004 Anil 3.8 Truck

D005 Raju 4.7 Bike

D006 Kumar 4.3 Van

D007 Venkat 3.9 Bike

D008 Ravi 4.4 Bike

D009 Ganesh 4.0 Van

D010 Sandeep 4.6 Bike

D011 Karthik 4.3 Van

D012 Ajay 4.5 Bike

D013 Manoj 4.2 Bike

D014 Srikanth 4.1 Truck

D015 Praveen 3.7 Van

D016 Rajesh 4.6 Bike

D017 Deepak 4.0 Bike

D018 Arjun 4.5 Van

;

run;

proc print data=driver_data;

run;

OUTPUT:


Obs	Delivery_ID	Driver_Name	Driver_Rating	Vehicle_Type
1	D001	Ramesh	4.5	Bike
2	D002	Suresh	4.2	Van
3	D003	Mahesh	4.6	Bike
4	D004	Anil	3.8	Truck
5	D005	Raju	4.7	Bike
6	D006	Kumar	4.3	Van
7	D007	Venkat	3.9	Bike
8	D008	Ravi	4.4	Bike
9	D009	Ganesh	4.0	Van
10	D010	Sandeep	4.6	Bike
11	D011	Karthik	4.3	Van
12	D012	Ajay	4.5	Bike
13	D013	Manoj	4.2	Bike
14	D014	Srikanth	4.1	Truck
15	D015	Praveen	3.7	Van
16	D016	Rajesh	4.6	Bike
17	D017	Deepak	4.0	Bike
18	D018	Arjun	4.5	Van

Explanation Of The Code

`DATA driver_data`

Creates a new dataset that stores driver-level information.

This dataset will later be merged with delivery data.

`LENGTH Statement`

length Delivery_ID $6 Driver_Name $25 Vehicle_Type $15;

Driver_Name $25

means SAS can store up to 25 characters.

· Defines character variable sizes

· Prevents variable truncation

· Ensures consistent structure during merge

`INPUT Statement`

Defines how SAS should read the variables.

Variables read:

Delivery_ID

Driver_Name

Driver_Rating

Vehicle_Type

Each row represents driver information linked to a delivery.

Example row:

D001 Ramesh 4.5 Bike

Variable	Value
Delivery_ID	D001
Driver_Name	Ramesh
Driver_Rating	4.5
Vehicle_Type	Bike

Step 2 — Sort Datasets

proc sort data=delivery_clean;

by Delivery_ID;

run;

proc print data=delivery_clean;

run;

OUTPUT:


Obs	Delivery_ID	City	Driver_Name	Utilization	Distance_km	Delivery_Time	Failed_Attempts	Fuel_Cost	Service_Rating	Percentage	Fees	Delivery_Date	Delivery_Date2	Month_Gap	Next_Service
1	D001	Hyderabad	Ramesh	Low	12	45	0	150	4.5	90	40	23753	23753	0	23773
2	D002	Bangalore	Suresh	Medium	20	60	1	200	4.2	88	50	23756	23756	0	23773
3	D003	Chennai	Mahesh	Low	10	60	0	180	4.8	92	45	23759	23759	0	23773
4	D004	Mumbai	Anil	Low	15	55	2	120	3.5	70	35	23761	23761	0	23773
5	D005	Delhi	Raju	Medium	25	70	0	220	5.0	95	55	23766	23766	0	23773
6	D006	Kolkata	Kumar	Medium	18	65	1	190	4.1	85	42	23769	23769	0	23773
7	D007	Hyderabad	Venkat	Medium	22	80	3	250	3.8	60	60	23774	23774	1	23801
8	D008	Bangalore	Ravi	Low	14	50	0	160	4.4	89	41	23777	23777	1	23801
9	D009	Chennai	Ganesh	High	30	90	2	300	3.9	75	70	23779	23779	1	23801
10	D010	Mumbai	Sandeep	Low	11	40	0	120	4.7	91	38	23782	23782	1	23801
11	D011	Delhi	Karthik	Medium	17	58	1	170	4.3	87	43	23784	23784	1	23801
12	D012	Kolkata	Ajay	Medium	16	52	0	165	4.6	90	40	23786	23786	1	23801
13	D013	Hyderabad	Manoj	Medium	19	68	0	195	4.2	86	48	23788	23788	1	23801
14	D014	Bangalore	Srikanth	Medium	23	75	1	210	4.1	84	52	23790	23790	1	23801
15	D015	Chennai	Praveen	High	27	82	2	240	3.7	72	58	23792	23792	1	23801
16	D016	Mumbai	Rajesh	Low	13	48	0	155	4.5	93	39	23794	23794	1	23801
17	D017	Delhi	Deepak	Medium	21	70	1	205	4.0	83	51	23797	23797	1	23801
18	D018	Kolkata	Arjun	Medium	24	78	0	215	4.6	92	54	23799	23799	1	23801
19	D019	Hyderabad	Raghav		26	85	1	230	4.4	90	56	23801	.	.	.
20	D020	Bangalore	Mohan		18	62	0	175	4.6	92	44	23803	.	.	.
21	D021	Chennai	Suresh		22	72	1	205	4.3	87	49	23805	.	.	.

proc sort data=driver_data;

by Delivery_ID;

run;

proc print data=driver_data;

run;

OUTPUT:


Obs	Delivery_ID	Driver_Name	Driver_Rating	Vehicle_Type
1	D001	Ramesh	4.5	Bike
2	D002	Suresh	4.2	Van
3	D003	Mahesh	4.6	Bike
4	D004	Anil	3.8	Truck
5	D005	Raju	4.7	Bike
6	D006	Kumar	4.3	Van
7	D007	Venkat	3.9	Bike
8	D008	Ravi	4.4	Bike
9	D009	Ganesh	4.0	Van
10	D010	Sandeep	4.6	Bike
11	D011	Karthik	4.3	Van
12	D012	Ajay	4.5	Bike
13	D013	Manoj	4.2	Bike
14	D014	Srikanth	4.1	Truck
15	D015	Praveen	3.7	Van
16	D016	Rajesh	4.6	Bike
17	D017	Deepak	4.0	Bike
18	D018	Arjun	4.5	Van

Why Sorting Is Required

The MERGE statement performs sequential matching.

Without sorting, SAS may show:

ERROR: BY variables are not properly sorted.

Step 3 — Perform MERGE

data delivery_final;

merge delivery_clean driver_data;

by Delivery_ID;

run;

proc print data=delivery_final;

run;

OUTPUT:


Obs	Delivery_ID	City	Driver_Name	Utilization	Distance_km	Delivery_Time	Failed_Attempts	Fuel_Cost	Service_Rating	Percentage	Fees	Delivery_Date	Delivery_Date2	Month_Gap	Next_Service	Driver_Rating	Vehicle_Type
1	D001	Hyderabad	Ramesh	Low	12	45	0	150	4.5	90	40	23753	23753	0	23773	4.5	Bike
2	D002	Bangalore	Suresh	Medium	20	60	1	200	4.2	88	50	23756	23756	0	23773	4.2	Van
3	D003	Chennai	Mahesh	Low	10	60	0	180	4.8	92	45	23759	23759	0	23773	4.6	Bike
4	D004	Mumbai	Anil	Low	15	55	2	120	3.5	70	35	23761	23761	0	23773	3.8	Truck
5	D005	Delhi	Raju	Medium	25	70	0	220	5.0	95	55	23766	23766	0	23773	4.7	Bike
6	D006	Kolkata	Kumar	Medium	18	65	1	190	4.1	85	42	23769	23769	0	23773	4.3	Van
7	D007	Hyderabad	Venkat	Medium	22	80	3	250	3.8	60	60	23774	23774	1	23801	3.9	Bike
8	D008	Bangalore	Ravi	Low	14	50	0	160	4.4	89	41	23777	23777	1	23801	4.4	Bike
9	D009	Chennai	Ganesh	High	30	90	2	300	3.9	75	70	23779	23779	1	23801	4.0	Van
10	D010	Mumbai	Sandeep	Low	11	40	0	120	4.7	91	38	23782	23782	1	23801	4.6	Bike
11	D011	Delhi	Karthik	Medium	17	58	1	170	4.3	87	43	23784	23784	1	23801	4.3	Van
12	D012	Kolkata	Ajay	Medium	16	52	0	165	4.6	90	40	23786	23786	1	23801	4.5	Bike
13	D013	Hyderabad	Manoj	Medium	19	68	0	195	4.2	86	48	23788	23788	1	23801	4.2	Bike
14	D014	Bangalore	Srikanth	Medium	23	75	1	210	4.1	84	52	23790	23790	1	23801	4.1	Truck
15	D015	Chennai	Praveen	High	27	82	2	240	3.7	72	58	23792	23792	1	23801	3.7	Van
16	D016	Mumbai	Rajesh	Low	13	48	0	155	4.5	93	39	23794	23794	1	23801	4.6	Bike
17	D017	Delhi	Deepak	Medium	21	70	1	205	4.0	83	51	23797	23797	1	23801	4.0	Bike
18	D018	Kolkata	Arjun	Medium	24	78	0	215	4.6	92	54	23799	23799	1	23801	4.5	Van
19	D019	Hyderabad	Raghav		26	85	1	230	4.4	90	56	23801	.	.	.	.
20	D020	Bangalore	Mohan		18	62	0	175	4.6	92	44	23803	.	.	.	.
21	D021	Chennai	Suresh		22	72	1	205	4.3	87	49	23805	.	.	.	.

Explanation Of MERGE Code

`DATA delivery_final`

Creates a new dataset containing combined information.

`MERGE Statement`

merge delivery_clean driver_data;

Combines both datasets row-by-row using a common key.

`BY Delivery_ID`

Specifies the matching variable.

SAS matches records like this:

delivery_clean

driver_data

Delivery_ID

Example:

D001 → delivery record

D001 → driver record

These rows are two merged .

Example Result Dataset

Delivery_ID	City	Distance_km	Driver_Name	Vehicle_Type
D001	Hyderabad	12	Ramesh	Bike
D002	Bangalore	20	Suresh	Van
D003	Chennai	10	Mahesh	Bike

Now each delivery record contains driver information also.

Why MERGE Is Important In Real Projects

In real logistics systems, data comes from multiple systems.

Example datasets:

Dataset	Contains
Delivery dataset	Delivery metrics
Driver dataset	Driver details
Vehicle dataset	Vehicle information
Fuel dataset	Fuel consumption

MERGE helps combine all these datasets into one analytical dataset.

7. PROC TRANSPOSE

proc transpose data=delivery_clean out=delivery_transpose;

var Fuel_Cost Fees;

run;

proc print data=delivery_transpose;

run;

OUTPUT:


Obs	_NAME_	COL1	COL2	COL3	COL4	COL5	COL6	COL7	COL8	COL9	COL10	COL11	COL12	COL13	COL14	COL15	COL16	COL17	COL18	COL19	COL20	COL21
1	Fuel_Cost	150	200	180	120	220	190	250	160	300	120	170	165	195	210	240	155	205	215	230	175	205
2	Fees	40	50	45	35	55	42	60	41	70	38	43	40	48	52	58	39	51	54	56	44	49

Converts rows to columns.

8. Fraud Detection Macro

%macro fraud_check;

data fraud_detection;

set delivery_clean;

if Failed_Attempts >2 and Service_Rating <4 then Fraud_Flag="YES";

else Fraud_Flag="NO";

run;

proc print data=fraud_detection;

run;

%mend;

%fraud_check;

OUTPUT:


Obs	Delivery_ID	City	Driver_Name	Utilization	Distance_km	Delivery_Time	Failed_Attempts	Fuel_Cost	Service_Rating	Percentage	Fees	Delivery_Date	Delivery_Date2	Month_Gap	Next_Service	Fraud_Flag
1	D001	Hyderabad	Ramesh	Low	12	45	0	150	4.5	90	40	23753	23753	0	23773	NO
2	D002	Bangalore	Suresh	Medium	20	60	1	200	4.2	88	50	23756	23756	0	23773	NO
3	D003	Chennai	Mahesh	Low	10	60	0	180	4.8	92	45	23759	23759	0	23773	NO
4	D004	Mumbai	Anil	Low	15	55	2	120	3.5	70	35	23761	23761	0	23773	NO
5	D005	Delhi	Raju	Medium	25	70	0	220	5.0	95	55	23766	23766	0	23773	NO
6	D006	Kolkata	Kumar	Medium	18	65	1	190	4.1	85	42	23769	23769	0	23773	NO
7	D007	Hyderabad	Venkat	Medium	22	80	3	250	3.8	60	60	23774	23774	1	23801	YES
8	D008	Bangalore	Ravi	Low	14	50	0	160	4.4	89	41	23777	23777	1	23801	NO
9	D009	Chennai	Ganesh	High	30	90	2	300	3.9	75	70	23779	23779	1	23801	NO
10	D010	Mumbai	Sandeep	Low	11	40	0	120	4.7	91	38	23782	23782	1	23801	NO
11	D011	Delhi	Karthik	Medium	17	58	1	170	4.3	87	43	23784	23784	1	23801	NO
12	D012	Kolkata	Ajay	Medium	16	52	0	165	4.6	90	40	23786	23786	1	23801	NO
13	D013	Hyderabad	Manoj	Medium	19	68	0	195	4.2	86	48	23788	23788	1	23801	NO
14	D014	Bangalore	Srikanth	Medium	23	75	1	210	4.1	84	52	23790	23790	1	23801	NO
15	D015	Chennai	Praveen	High	27	82	2	240	3.7	72	58	23792	23792	1	23801	NO
16	D016	Mumbai	Rajesh	Low	13	48	0	155	4.5	93	39	23794	23794	1	23801	NO
17	D017	Delhi	Deepak	Medium	21	70	1	205	4.0	83	51	23797	23797	1	23801	NO
18	D018	Kolkata	Arjun	Medium	24	78	0	215	4.6	92	54	23799	23799	1	23801	NO
19	D019	Hyderabad	Raghav		26	85	1	230	4.4	90	56	23801	.	.	.	NO
20	D020	Bangalore	Mohan		18	62	0	175	4.6	92	44	23803	.	.	.	NO
21	D021	Chennai	Suresh		22	72	1	205	4.3	87	49	23805	.	.	.	NO

9. PROC DATASETS DELETE

proc datasets library=work nolist;

delete delivery_raw;

quit;

LOG:

 NOTE: Deleting WORK.DELIVERY_RAW (memtype=DATA).

Removes temporary datasets.

10. Business Insights

Using this dataset we can analyze:

· Average delivery time per city

· Failed delivery patterns

· Fuel cost optimization

· Driver performance

· Fraud detection patterns

11. 20 Key Points About This Project

· Create a simulated last-mile delivery dataset containing operational variables like Delivery_ID, City, Distance_km, Delivery_Time, Fuel_Cost, and Service_Rating.

· Introduce intentional data quality errors such as negative distance, missing delivery time, invalid ratings, and inconsistent city names.

· Use DATA step programming to read raw delivery data and construct the initial dataset.

· Apply character cleaning functions such as STRIP, TRIM, UPCASE, LOWCASE, and PROPCASE to standardize city and driver names.

· Use concatenation functions like CAT, CATX, and CATS to combine variables such as driver and city information for reporting.

· Implement numeric functions like ABS, ROUND, and SUM to correct negative values and standardize numeric metrics.

· Handle missing values using the COALESCE function to ensure data completeness for analysis.

· Standardize date values using the MDY function to construct proper SAS date variables.

· Calculate time intervals between dates using the INTCK function for delivery scheduling analysis.

· Generate future or next scheduled dates using the INTNX function to simulate operational planning.

· Use conditional logic in DATA step to classify delivery utilization levels such as Low, Medium, and High.

· Detect potential operational anomalies like excessive failed delivery attempts using conditional rules.

· Implement macro programming to create a reusable fraud detection logic for suspicious deliveries.

· Use PROC APPEND to add new delivery records to the existing dataset without recreating the dataset.

· Use MERGE statement to combine delivery information with driver or operational datasets.

· Transform dataset structure using PROC TRANSPOSE to convert rows into columns for reporting.

· Validate dataset structure using PROC CONTENTS to verify variables, types, and lengths.

· Perform dataset management using PROC DATASETS DELETE to remove unnecessary intermediate datasets.

· Prepare the cleaned and corrected dataset for logistics performance analysis and reporting.

· Generate insights to help companies reduce delivery errors, detect fraud patterns, optimize fuel usage, and improve last-mile operational efficiency.

Conclusion

This project demonstrates how SAS programming can support logistics analytics in last-mile delivery operations. Through the creation of a simulated dataset with intentional errors, the project illustrates the importance of data validation and cleaning in operational environments.

Key SAS techniques used in this project include:

· Data Step transformations

· Character and numeric functions

· Date functions such as MDY, INTCK, and INTNX

· Dataset merging and appending

· Macro-based fraud detection

· Dataset management using PROC DATASETS

By applying these techniques, the raw dataset was successfully transformed into a clean and analysis-ready dataset.

The fraud detection macro further enables organizations to identify suspicious delivery patterns, improving operational transparency and preventing financial losses.

Overall, this project shows how SAS can be used not only for statistical analysis but also for real-world operational data management and fraud detection in logistics systems.

SAS INTERVIEW QUESTIONS

1.What Is The Difference Between MERGE And JOIN?

Answer:

Feature

MERGE

SQL JOIN

Location

DATA Step

PROC SQL

Sorting Required

Yes

No

Performance

Faster for large datasets

Flexible

Use Case

Sequential merging

Complex joins

Feature	MERGE	SQL JOIN
Location	DATA Step	PROC SQL
Sorting Required	Yes	No
Performance	Faster for large datasets	Flexible
Use Case	Sequential merging	Complex joins

2.What is the difference between SET and PROC APPEND?

Answer:

Feature	SET	PROC APPEND
Creates new dataset	Yes	No
Performance	Slower	Faster
Memory usage	Higher	Lower
Use case	Combining datasets	Adding records

3.What is POINT= option in SET statement?

Answer:

POINT= option in a SET statement is used to read a specific observation directly from a dataset instead of reading sequentially.

It allows random access to observations using the observation number.

Example:

data example;

set sales point=i;

run;

Here, point=i tells SAS to read the i-th observation from the dataset.

Key point:
POINT= reads observations randomly (direct access) rather than sequentially from top to bottom.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.

Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent DELIVERY OPERATIONS data.

Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.

This project is suitable for:
·  Students learning SAS
·  Data analysts building portfolios
·  Professionals preparing for SAS interviews
·  Bloggers writing about analytics
·  Clinical SAS Programmer
·  Research Data Analyst
·  Regulatory Data Validator
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On :

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.HOW SMART ARE OUR CITIES? — ANALYZING URBAN PERFORMANCE USING SAS

2.Which Breakfast Is The Healthiest And Most Popular? — Using Sas Data Analytics
3.Which Glass Formula Is The Strongest? — Real – World GlassAnalytics Using Sas

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact | Privacy Policy

421.Can We Detect Errors, Prevent Fraud, And Optimize Last-Mile Delivery Operations Using Advanced SAS Programming?

Smart SAS Intelligence For Last-Mile Delivery Error Detection And Fraud Prevention

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP | LENGTH | INPUT | DATALINES | SET | MERGE | IF-THEN | FORMAT | INFORMAT | PROC SORT | PROC PRINT | PROC CONTENTS | PROC TRANSPOSE | PROC APPEND | PROC DATASETS DELETE | MACRO / %MACRO / %MEND

Understanding The Errors In The Dataset

1.Negative Distance

2.Missing Delivery Time

3.Negative Fuel Cost

4.Invalid Service Rating

5.Inconsistent City Names

Character Functions

STRIP

TRIM

CAT

CATX

PROPCASE

UPCASE

LOWCASE

COALESCE

Numeric Functions

Date Functions

MDY

INTCK

INTNX

Explanation Of The Code

DATA delivery_new

length

INPUT Statement

:ddmmyy10.

DATALINES

Explanation Of PROC APPEND

PROC APPEND

BASE=delivery_clean

DATA=delivery_new

What Happens After Execution

Explanation Of The Code

DATA driver_data

LENGTH Statement

INPUT Statement

Why Sorting Is Required

Explanation Of MERGE Code

DATA delivery_final

MERGE Statement

BY Delivery_ID

Why MERGE Is Important In Real Projects

10. Business Insights

Conclusion

1.What Is The Difference Between MERGE And JOIN?

Answer: Feature MERGE SQL JOIN Location DATA Step PROC SQL Sorting Required Yes No Performance Faster for large datasets Flexible Use Case Sequential merging Complex joins

Answer:

Follow Us On :

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.HOW SMART ARE OUR CITIES? — ANALYZING URBAN PERFORMANCE USING SAS2.Which Breakfast Is The Healthiest And Most Popular? — Using Sas Data Analytics3.Which Glass Formula Is The Strongest? — Real – World GlassAnalytics Using Sas

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact | Privacy Policy

Comments

Post a Comment

Popular posts from this blog

Beyond Fabric and Fashion: Turning the World’s Most Beautiful Sarees Dataset into Structured Intelligence with SAS and R

Data Cleaning Secrets Using Famous Food Dataset:Handling Duplicate Records in SAS

Global AI Trends Unlocked Through SCAN and SUBSTR Precision in SAS

`DATA delivery_new`

`length`

`INPUT Statement`

`:ddmmyy10.`

`DATALINES`

`PROC APPEND`

`BASE=delivery_clean`

`DATA=delivery_new`

`DATA driver_data`

`LENGTH Statement`

`INPUT Statement`

`DATA delivery_final`

`MERGE Statement`

`BY Delivery_ID`

Answer:

Feature

MERGE

SQL JOIN

Location

DATA Step

PROC SQL

Sorting Required

Yes

No

Performance

Faster for large datasets

Flexible

Use Case

Sequential merging

Complex joins

1.HOW SMART ARE OUR CITIES? — ANALYZING URBAN PERFORMANCE USING SAS

2.Which Breakfast Is The Healthiest And Most Popular? — Using Sas Data Analytics
3.Which Glass Formula Is The Strongest? — Real – World GlassAnalytics Using Sas