Detecting And Correcting Digital Advertising Fraud Data Errors Using Advanced SAS Programming

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP | LENGTH | INPUT | DATALINES | SET | MERGE | PROC SORT | PROC PRINT | PROC CONTENTS | PROC FREQ | PROC TRANSPOSE | PROC APPEND | PROC DATASETS DELETE | MACRO / %MACRO / %MEND | NUMERIC AND CHARACTER FUNCTIONS

Introduction

Digital advertising is one of the fastest-growing industries in the world. Companies spend billions of dollars on online advertising campaigns across platforms such as Google Ads, social media platforms, display networks, and affiliate traffic sources. While digital marketing enables companies to reach targeted audiences efficiently, it also introduces a major challenge: digital advertising fraud.

Advertising fraud occurs when malicious actors generate fake clicks, impressions, or conversions in order to steal advertising budgets. These fraudulent activities can involve bot traffic, click farms, automated scripts, or manipulated traffic sources. As a result, companies lose significant amounts of money and receive inaccurate campaign performance metrics.

Data analytics plays a critical role in identifying suspicious patterns in advertising traffic. Organizations analyze campaign datasets to detect abnormal click patterns, invalid traffic sources, and unusual conversion behavior. Advanced analytics tools such as SAS (Statistical Analysis System) are widely used in industries for fraud detection because of their powerful data manipulation capabilities.

In this project, we will simulate a Digital Advertising Fraud Dataset containing campaign performance information such as:

Campaign ID
Traffic Source
Clicks
Invalid Clicks
Conversion Rate
Loss Amount
Fraud Score
Utilization Classification
Fees
Date Formats

The dataset will intentionally include multiple programming errors, which will later be identified and corrected. The goal is to demonstrate how SAS programmers debug datasets and implement fraud detection logic using DATA step programming, macros, character functions, numeric functions, and date manipulation techniques.

By the end of this project, we will:

Create a dataset with intentional coding errors
Detect those errors
Correct the code
Implement fraud detection logic using SAS macros
Use character and numeric functions
Use date functions such as MDY, INTCK, INTNX
Apply SET, MERGE, APPEND, TRANSPOSE
Clean datasets using PROC DATASETS DELETE

This project simulates a real-world scenario where analysts must correct and analyze campaign data before performing fraud analytics.

Table Of Contents

Introduction
Business Context
Dataset Structure
Creating Raw Dataset with Intentional Errors
Explanation of Errors
Corrected Dataset Code
Character Functions in SAS
Numeric Functions in SAS
Date Functions (MDY, INTCK, INTNX)
Dataset Combination using SET
Dataset Combination using MERGE
Dataset Combination using PROC APPEND
Dataset Transformation using PROC TRANSPOSE
Fraud Detection Macro
Utilization Classification Logic
Fraud Score Classification
Cleaning Temporary Datasets using PROC DATASETS DELETE
Final Dataset Output
20 Key Points About This Project
Project Summary and Conclusion

Business Context

Companies invest heavily in digital advertising campaigns to generate customer engagement and conversions. However, fraudulent traffic significantly affects campaign performance metrics.

Common fraud indicators include:

Extremely high click counts with low conversions
High invalid click percentages
Unusual traffic spikes within short time intervals
Traffic from suspicious sources
Extremely high fraud risk scores

Organizations use analytics systems like SAS to monitor advertising datasets and automatically detect fraud patterns.

For example:

Campaign	Clicks	Invalid Clicks	Conversion Rate
C101	5000	50	4%
C102	8000	4000	0.2%

Campaign C102 is suspicious because half of the clicks are invalid.

By applying SAS analytics, companies can:

Identify fraudulent campaigns
Calculate financial losses
Block suspicious traffic sources
Optimize marketing budgets

Dataset Structure

Our dataset will contain the following variables:

Variable	Description
Campaign_ID	Unique advertising campaign identifier
Traffic_Source	Source of advertising traffic
Clicks	Total number of clicks
Invalid_Clicks	Number of fraudulent clicks
Conversion_Rate	Percentage of clicks converting to sales
Loss_Amount	Estimated monetary loss due to fraud
Fraud_Score	Fraud risk score
Percentage	Percentage of invalid clicks
Fees	Advertising platform fees
Utilization_Class	Traffic utilization classification
Campaign_Date	Date of campaign

1. Raw Dataset with Intentional Errors

data ad_fraud_raw;

length Campaign_ID $5 Traffic_Source $20 Utilization_Class $15;

input Campaign_ID $ Traffic_Source $ Clicks

Invalid_Clicks Conversion_Rate Loss_Amount

Fraud_Score Percentage Fees Utilization_Class $

Campaign_Date :ddmmyy10.;

format Campaign_Date date9.;

datalines;

C101 google 5000 50 4.2 200 15 1.0 20 High 01/01/2025

C102 facebook 8000 4000 0.2 900 90 50 35 High 02/01/2025

C103 affiliate 3500 200 2.3 150 30 5.7 10 Medium 03/01/2025

C104 bottraffic 9000 8500 0.1 1200 98 94 45 High 04/01/2025

C105 instagram 4200 150 3.0 180 25 3.5 15 Medium 05/01/2025

C106 youtube 3900 100 4.1 120 18 2.5 12 Low 06/01/2025

C107 google 5200 400 2.8 250 40 7.6 18 Medium 07/01/2025

C108 facebook 6100 300 3.2 200 35 4.9 17 Medium 08/01/2025

C109 affiliate 4500 600 1.5 310 55 13.3 20 High 09/01/2025

C110 bottraffic 10000 9000 0.0 1500 99 90 50 High 10/01/2025

C111 google 4800 120 3.9 160 22 2.5 14 Low 11/01/2025

C112 facebook 5100 180 3.5 170 28 3.5 15 Medium 12/01/2025

C113 youtube 4300 140 3.8 165 26 3.2 14 Low 13/01/2025

C114 affiliate 3700 500 2.0 300 50 13.5 22 High 14/01/2025

C115 bottraffic 8800 8400 0.1 1100 97 95 40 High 15/01/2025

C116 instagram 4600 160 3.3 175 27 3.4 16 Medium 16/01/2025

;

run;

proc print data=ad_fraud_raw;

run;

OUTPUT:


Obs	Campaign_ID	Traffic_Source	Utilization_Class	Clicks	Invalid_Clicks	Conversion_Rate	Loss_Amount	Fraud_Score	Percentage	Fees	Campaign_Date
1	C101	google	High	5000	50	4.2	200	15	1.0	20	01JAN2025
2	C102	facebook	High	8000	4000	0.2	900	90	50.0	35	02JAN2025
3	C103	affiliate	Medium	3500	200	2.3	150	30	5.7	10	03JAN2025
4	C104	bottraffic	High	9000	8500	0.1	1200	98	94.0	45	04JAN2025
5	C105	instagram	Medium	4200	150	3.0	180	25	3.5	15	05JAN2025
6	C106	youtube	Low	3900	100	4.1	120	18	2.5	12	06JAN2025
7	C107	google	Medium	5200	400	2.8	250	40	7.6	18	07JAN2025
8	C108	facebook	Medium	6100	300	3.2	200	35	4.9	17	08JAN2025
9	C109	affiliate	High	4500	600	1.5	310	55	13.3	20	09JAN2025
10	C110	bottraffic	High	10000	9000	0.0	1500	99	90.0	50	10JAN2025
11	C111	google	Low	4800	120	3.9	160	22	2.5	14	11JAN2025
12	C112	facebook	Medium	5100	180	3.5	170	28	3.5	15	12JAN2025
13	C113	youtube	Low	4300	140	3.8	165	26	3.2	14	13JAN2025
14	C114	affiliate	High	3700	500	2.0	300	50	13.5	22	14JAN2025
15	C115	bottraffic	High	8800	8400	0.1	1100	97	95.0	40	15JAN2025
16	C116	instagram	Medium	4600	160	3.3	175	27	3.4	16	16JAN2025

Intentional Errors in the Dataset

The dataset above contains multiple potential issues that analysts must detect.

Examples:

Error 1 — Traffic Source Formatting

Traffic sources appear in inconsistent formats.

Example:

google

facebook

affiliate

bottraffic

Best practice is to standardize case using UPCASE or PROPCASE.

Error 2 — Percentage Not Derived

Percentage should be calculated using:

Invalid_Clicks / Clicks * 100

But the dataset manually entered values.

Error 3 — Fraud Score Not Calculated Programmatically

Fraud scores are entered manually instead of using analytics logic.

Error 4 — Character Cleaning

Traffic source strings may contain leading or trailing spaces.

Functions required:strip,trim

2. Corrected Dataset Code

data ad_fraud_clean;

set ad_fraud_raw;

Traffic_Source = propcase(strip(Traffic_Source));

Percentage = round((Invalid_Clicks/Clicks)*100,0.1);

Fraud_Score = round((Percentage*0.7) + ((1-Conversion_Rate)*20),0.1);

Total_Cost = sum(Loss_Amount,Fees);

Campaign_Name = catx("_",Campaign_ID,Traffic_Source);

run;

proc print data=ad_fraud_clean;

run;

OUTPUT:


Obs	Campaign_ID	Traffic_Source	Utilization_Class	Clicks	Invalid_Clicks	Conversion_Rate	Loss_Amount	Fraud_Score	Percentage	Fees	Campaign_Date	Total_Cost	Campaign_Name
1	C101	Google	High	5000	50	4.2	200	-63.3	1.0	20	01JAN2025	220	C101_Google
2	C102	Facebook	High	8000	4000	0.2	900	51.0	50.0	35	02JAN2025	935	C102_Facebook
3	C103	Affiliate	Medium	3500	200	2.3	150	-22.0	5.7	10	03JAN2025	160	C103_Affiliate
4	C104	Bottraffic	High	9000	8500	0.1	1200	84.1	94.4	45	04JAN2025	1245	C104_Bottraffic
5	C105	Instagram	Medium	4200	150	3.0	180	-37.5	3.6	15	05JAN2025	195	C105_Instagram
6	C106	Youtube	Low	3900	100	4.1	120	-60.2	2.6	12	06JAN2025	132	C106_Youtube
7	C107	Google	Medium	5200	400	2.8	250	-30.6	7.7	18	07JAN2025	268	C107_Google
8	C108	Facebook	Medium	6100	300	3.2	200	-40.6	4.9	17	08JAN2025	217	C108_Facebook
9	C109	Affiliate	High	4500	600	1.5	310	-0.7	13.3	20	09JAN2025	330	C109_Affiliate
10	C110	Bottraffic	High	10000	9000	0.0	1500	83.0	90.0	50	10JAN2025	1550	C110_Bottraffic
11	C111	Google	Low	4800	120	3.9	160	-56.3	2.5	14	11JAN2025	174	C111_Google
12	C112	Facebook	Medium	5100	180	3.5	170	-47.6	3.5	15	12JAN2025	185	C112_Facebook
13	C113	Youtube	Low	4300	140	3.8	165	-53.7	3.3	14	13JAN2025	179	C113_Youtube
14	C114	Affiliate	High	3700	500	2.0	300	-10.6	13.5	22	14JAN2025	322	C114_Affiliate
15	C115	Bottraffic	High	8800	8400	0.1	1100	84.9	95.5	40	15JAN2025	1140	C115_Bottraffic
16	C116	Instagram	Medium	4600	160	3.3	175	-43.6	3.5	16	16JAN2025	191	C116_Instagram

Explanation of Character Functions

STRIP

Removes leading and trailing spaces.

strip(Traffic_Source)

Example:

" google " → "google"

TRIM

Removes trailing spaces.

trim(variable)

CAT

Concatenates values.

cat(a,b)

CATX

Concatenates values with a delimiter.

catx("_",Campaign_ID,Traffic_Source)

Result:

C101_Google

UPCASE

Converts text to uppercase.

upcase(variable)

LOWCASE

Converts text to lowercase.

PROPCASE

Capitalizes the first letter.

Example:

google → Google

Numeric Functions

SUM

Adds numeric variables.

Total_Cost = sum(Loss_Amount,Fees);

ROUND

Rounds numeric values.

round(value,0.1)

Date Functions

MDY

Creates a SAS date.

Campaign_Start = mdy(1,1,2025);

INTCK

Counts intervals between dates.

Days_Between = intck('day',Campaign_Start,Campaign_Date);

INTNX

Moves forward or backward in time.

Next_Campaign = intnx('month',Campaign_Date,1);

3. Using SET Statement

Method 1 — If You Want the Same Dataset Copied Then

data combined_data;

set ad_fraud_clean;

run;

proc print data=combined_data;

run;

OUTPUT:


Obs	Campaign_ID	Traffic_Source	Utilization_Class	Clicks	Invalid_Clicks	Conversion_Rate	Loss_Amount	Fraud_Score	Percentage	Fees	Campaign_Date	Total_Cost	Campaign_Name
1	C101	Google	High	5000	50	4.2	200	-63.3	1.0	20	01JAN2025	220	C101_Google
2	C102	Facebook	High	8000	4000	0.2	900	51.0	50.0	35	02JAN2025	935	C102_Facebook
3	C103	Affiliate	Medium	3500	200	2.3	150	-22.0	5.7	10	03JAN2025	160	C103_Affiliate
4	C104	Bottraffic	High	9000	8500	0.1	1200	84.1	94.4	45	04JAN2025	1245	C104_Bottraffic
5	C105	Instagram	Medium	4200	150	3.0	180	-37.5	3.6	15	05JAN2025	195	C105_Instagram
6	C106	Youtube	Low	3900	100	4.1	120	-60.2	2.6	12	06JAN2025	132	C106_Youtube
7	C107	Google	Medium	5200	400	2.8	250	-30.6	7.7	18	07JAN2025	268	C107_Google
8	C108	Facebook	Medium	6100	300	3.2	200	-40.6	4.9	17	08JAN2025	217	C108_Facebook
9	C109	Affiliate	High	4500	600	1.5	310	-0.7	13.3	20	09JAN2025	330	C109_Affiliate
10	C110	Bottraffic	High	10000	9000	0.0	1500	83.0	90.0	50	10JAN2025	1550	C110_Bottraffic
11	C111	Google	Low	4800	120	3.9	160	-56.3	2.5	14	11JAN2025	174	C111_Google
12	C112	Facebook	Medium	5100	180	3.5	170	-47.6	3.5	15	12JAN2025	185	C112_Facebook
13	C113	Youtube	Low	4300	140	3.8	165	-53.7	3.3	14	13JAN2025	179	C113_Youtube
14	C114	Affiliate	High	3700	500	2.0	300	-10.6	13.5	22	14JAN2025	322	C114_Affiliate
15	C115	Bottraffic	High	8800	8400	0.1	1100	84.9	95.5	40	15JAN2025	1140	C115_Bottraffic
16	C116	Instagram	Medium	4600	160	3.3	175	-43.6	3.5	16	16JAN2025	191	C116_Instagram

Explanation:

· DATA combined_data;
Creates a new dataset named combined_data.

· SET ad_fraud_clean;
Reads all observations from ad_fraud_clean.

· RUN;
Executes the DATA step.

Result:
combined_data will contain exactly the same observations as ad_fraud_clean.

Method 2 — If You Intentionally Want Two Copies of Data Then

data campaign_part1;

set ad_fraud_clean;

run;

proc print data=campaign_part1;

run;

OUTPUT:


Obs	Campaign_ID	Traffic_Source	Utilization_Class	Clicks	Invalid_Clicks	Conversion_Rate	Loss_Amount	Fraud_Score	Percentage	Fees	Campaign_Date	Total_Cost	Campaign_Name
1	C101	Google	High	5000	50	4.2	200	-63.3	1.0	20	01JAN2025	220	C101_Google
2	C102	Facebook	High	8000	4000	0.2	900	51.0	50.0	35	02JAN2025	935	C102_Facebook
3	C103	Affiliate	Medium	3500	200	2.3	150	-22.0	5.7	10	03JAN2025	160	C103_Affiliate
4	C104	Bottraffic	High	9000	8500	0.1	1200	84.1	94.4	45	04JAN2025	1245	C104_Bottraffic
5	C105	Instagram	Medium	4200	150	3.0	180	-37.5	3.6	15	05JAN2025	195	C105_Instagram
6	C106	Youtube	Low	3900	100	4.1	120	-60.2	2.6	12	06JAN2025	132	C106_Youtube
7	C107	Google	Medium	5200	400	2.8	250	-30.6	7.7	18	07JAN2025	268	C107_Google
8	C108	Facebook	Medium	6100	300	3.2	200	-40.6	4.9	17	08JAN2025	217	C108_Facebook
9	C109	Affiliate	High	4500	600	1.5	310	-0.7	13.3	20	09JAN2025	330	C109_Affiliate
10	C110	Bottraffic	High	10000	9000	0.0	1500	83.0	90.0	50	10JAN2025	1550	C110_Bottraffic
11	C111	Google	Low	4800	120	3.9	160	-56.3	2.5	14	11JAN2025	174	C111_Google
12	C112	Facebook	Medium	5100	180	3.5	170	-47.6	3.5	15	12JAN2025	185	C112_Facebook
13	C113	Youtube	Low	4300	140	3.8	165	-53.7	3.3	14	13JAN2025	179	C113_Youtube
14	C114	Affiliate	High	3700	500	2.0	300	-10.6	13.5	22	14JAN2025	322	C114_Affiliate
15	C115	Bottraffic	High	8800	8400	0.1	1100	84.9	95.5	40	15JAN2025	1140	C115_Bottraffic
16	C116	Instagram	Medium	4600	160	3.3	175	-43.6	3.5	16	16JAN2025	191	C116_Instagram

data campaign_part2;

set ad_fraud_clean;

run;

proc print data=campaign_part2;

run;

OUTPUT:


Obs	Campaign_ID	Traffic_Source	Utilization_Class	Clicks	Invalid_Clicks	Conversion_Rate	Loss_Amount	Fraud_Score	Percentage	Fees	Campaign_Date	Total_Cost	Campaign_Name
1	C101	Google	High	5000	50	4.2	200	-63.3	1.0	20	01JAN2025	220	C101_Google
2	C102	Facebook	High	8000	4000	0.2	900	51.0	50.0	35	02JAN2025	935	C102_Facebook
3	C103	Affiliate	Medium	3500	200	2.3	150	-22.0	5.7	10	03JAN2025	160	C103_Affiliate
4	C104	Bottraffic	High	9000	8500	0.1	1200	84.1	94.4	45	04JAN2025	1245	C104_Bottraffic
5	C105	Instagram	Medium	4200	150	3.0	180	-37.5	3.6	15	05JAN2025	195	C105_Instagram
6	C106	Youtube	Low	3900	100	4.1	120	-60.2	2.6	12	06JAN2025	132	C106_Youtube
7	C107	Google	Medium	5200	400	2.8	250	-30.6	7.7	18	07JAN2025	268	C107_Google
8	C108	Facebook	Medium	6100	300	3.2	200	-40.6	4.9	17	08JAN2025	217	C108_Facebook
9	C109	Affiliate	High	4500	600	1.5	310	-0.7	13.3	20	09JAN2025	330	C109_Affiliate
10	C110	Bottraffic	High	10000	9000	0.0	1500	83.0	90.0	50	10JAN2025	1550	C110_Bottraffic
11	C111	Google	Low	4800	120	3.9	160	-56.3	2.5	14	11JAN2025	174	C111_Google
12	C112	Facebook	Medium	5100	180	3.5	170	-47.6	3.5	15	12JAN2025	185	C112_Facebook
13	C113	Youtube	Low	4300	140	3.8	165	-53.7	3.3	14	13JAN2025	179	C113_Youtube
14	C114	Affiliate	High	3700	500	2.0	300	-10.6	13.5	22	14JAN2025	322	C114_Affiliate
15	C115	Bottraffic	High	8800	8400	0.1	1100	84.9	95.5	40	15JAN2025	1140	C115_Bottraffic
16	C116	Instagram	Medium	4600	160	3.3	175	-43.6	3.5	16	16JAN2025	191	C116_Instagram

data combined_data;

set campaign_part1 campaign_part2;

run;

proc print data=combined_data;

run;

OUTPUT:


Obs	Campaign_ID	Traffic_Source	Utilization_Class	Clicks	Invalid_Clicks	Conversion_Rate	Loss_Amount	Fraud_Score	Percentage	Fees	Campaign_Date	Total_Cost	Campaign_Name
1	C101	Google	High	5000	50	4.2	200	-63.3	1.0	20	01JAN2025	220	C101_Google
2	C102	Facebook	High	8000	4000	0.2	900	51.0	50.0	35	02JAN2025	935	C102_Facebook
3	C103	Affiliate	Medium	3500	200	2.3	150	-22.0	5.7	10	03JAN2025	160	C103_Affiliate
4	C104	Bottraffic	High	9000	8500	0.1	1200	84.1	94.4	45	04JAN2025	1245	C104_Bottraffic
5	C105	Instagram	Medium	4200	150	3.0	180	-37.5	3.6	15	05JAN2025	195	C105_Instagram
6	C106	Youtube	Low	3900	100	4.1	120	-60.2	2.6	12	06JAN2025	132	C106_Youtube
7	C107	Google	Medium	5200	400	2.8	250	-30.6	7.7	18	07JAN2025	268	C107_Google
8	C108	Facebook	Medium	6100	300	3.2	200	-40.6	4.9	17	08JAN2025	217	C108_Facebook
9	C109	Affiliate	High	4500	600	1.5	310	-0.7	13.3	20	09JAN2025	330	C109_Affiliate
10	C110	Bottraffic	High	10000	9000	0.0	1500	83.0	90.0	50	10JAN2025	1550	C110_Bottraffic
11	C111	Google	Low	4800	120	3.9	160	-56.3	2.5	14	11JAN2025	174	C111_Google
12	C112	Facebook	Medium	5100	180	3.5	170	-47.6	3.5	15	12JAN2025	185	C112_Facebook
13	C113	Youtube	Low	4300	140	3.8	165	-53.7	3.3	14	13JAN2025	179	C113_Youtube
14	C114	Affiliate	High	3700	500	2.0	300	-10.6	13.5	22	14JAN2025	322	C114_Affiliate
15	C115	Bottraffic	High	8800	8400	0.1	1100	84.9	95.5	40	15JAN2025	1140	C115_Bottraffic
16	C116	Instagram	Medium	4600	160	3.3	175	-43.6	3.5	16	16JAN2025	191	C116_Instagram
17	C101	Google	High	5000	50	4.2	200	-63.3	1.0	20	01JAN2025	220	C101_Google
18	C102	Facebook	High	8000	4000	0.2	900	51.0	50.0	35	02JAN2025	935	C102_Facebook
19	C103	Affiliate	Medium	3500	200	2.3	150	-22.0	5.7	10	03JAN2025	160	C103_Affiliate
20	C104	Bottraffic	High	9000	8500	0.1	1200	84.1	94.4	45	04JAN2025	1245	C104_Bottraffic
21	C105	Instagram	Medium	4200	150	3.0	180	-37.5	3.6	15	05JAN2025	195	C105_Instagram
22	C106	Youtube	Low	3900	100	4.1	120	-60.2	2.6	12	06JAN2025	132	C106_Youtube
23	C107	Google	Medium	5200	400	2.8	250	-30.6	7.7	18	07JAN2025	268	C107_Google
24	C108	Facebook	Medium	6100	300	3.2	200	-40.6	4.9	17	08JAN2025	217	C108_Facebook
25	C109	Affiliate	High	4500	600	1.5	310	-0.7	13.3	20	09JAN2025	330	C109_Affiliate
26	C110	Bottraffic	High	10000	9000	0.0	1500	83.0	90.0	50	10JAN2025	1550	C110_Bottraffic
27	C111	Google	Low	4800	120	3.9	160	-56.3	2.5	14	11JAN2025	174	C111_Google
28	C112	Facebook	Medium	5100	180	3.5	170	-47.6	3.5	15	12JAN2025	185	C112_Facebook
29	C113	Youtube	Low	4300	140	3.8	165	-53.7	3.3	14	13JAN2025	179	C113_Youtube
30	C114	Affiliate	High	3700	500	2.0	300	-10.6	13.5	22	14JAN2025	322	C114_Affiliate
31	C115	Bottraffic	High	8800	8400	0.1	1100	84.9	95.5	40	15JAN2025	1140	C115_Bottraffic
32	C116	Instagram	Medium	4600	160	3.3	175	-43.6	3.5	16	16JAN2025	191	C116_Instagram

Explanation:

· campaign_part1 and campaign_part2 are identical copies.

· SET campaign_part1 campaign_part2; stacks them vertically.

· Final dataset will contain double the observations.

Method 3 — If the Goal is Appending Then

proc append base=combined_data

data=ad_fraud_clean;

run;

proc print data=combined_data;

run;

OUTPUT:


Obs	Campaign_ID	Traffic_Source	Utilization_Class	Clicks	Invalid_Clicks	Conversion_Rate	Loss_Amount	Fraud_Score	Percentage	Fees	Campaign_Date	Total_Cost	Campaign_Name
1	C101	Google	High	5000	50	4.2	200	-63.3	1.0	20	01JAN2025	220	C101_Google
2	C102	Facebook	High	8000	4000	0.2	900	51.0	50.0	35	02JAN2025	935	C102_Facebook
3	C103	Affiliate	Medium	3500	200	2.3	150	-22.0	5.7	10	03JAN2025	160	C103_Affiliate
4	C104	Bottraffic	High	9000	8500	0.1	1200	84.1	94.4	45	04JAN2025	1245	C104_Bottraffic
5	C105	Instagram	Medium	4200	150	3.0	180	-37.5	3.6	15	05JAN2025	195	C105_Instagram
6	C106	Youtube	Low	3900	100	4.1	120	-60.2	2.6	12	06JAN2025	132	C106_Youtube
7	C107	Google	Medium	5200	400	2.8	250	-30.6	7.7	18	07JAN2025	268	C107_Google
8	C108	Facebook	Medium	6100	300	3.2	200	-40.6	4.9	17	08JAN2025	217	C108_Facebook
9	C109	Affiliate	High	4500	600	1.5	310	-0.7	13.3	20	09JAN2025	330	C109_Affiliate
10	C110	Bottraffic	High	10000	9000	0.0	1500	83.0	90.0	50	10JAN2025	1550	C110_Bottraffic
11	C111	Google	Low	4800	120	3.9	160	-56.3	2.5	14	11JAN2025	174	C111_Google
12	C112	Facebook	Medium	5100	180	3.5	170	-47.6	3.5	15	12JAN2025	185	C112_Facebook
13	C113	Youtube	Low	4300	140	3.8	165	-53.7	3.3	14	13JAN2025	179	C113_Youtube
14	C114	Affiliate	High	3700	500	2.0	300	-10.6	13.5	22	14JAN2025	322	C114_Affiliate
15	C115	Bottraffic	High	8800	8400	0.1	1100	84.9	95.5	40	15JAN2025	1140	C115_Bottraffic
16	C116	Instagram	Medium	4600	160	3.3	175	-43.6	3.5	16	16JAN2025	191	C116_Instagram
17	C101	Google	High	5000	50	4.2	200	-63.3	1.0	20	01JAN2025	220	C101_Google
18	C102	Facebook	High	8000	4000	0.2	900	51.0	50.0	35	02JAN2025	935	C102_Facebook
19	C103	Affiliate	Medium	3500	200	2.3	150	-22.0	5.7	10	03JAN2025	160	C103_Affiliate
20	C104	Bottraffic	High	9000	8500	0.1	1200	84.1	94.4	45	04JAN2025	1245	C104_Bottraffic
21	C105	Instagram	Medium	4200	150	3.0	180	-37.5	3.6	15	05JAN2025	195	C105_Instagram
22	C106	Youtube	Low	3900	100	4.1	120	-60.2	2.6	12	06JAN2025	132	C106_Youtube
23	C107	Google	Medium	5200	400	2.8	250	-30.6	7.7	18	07JAN2025	268	C107_Google
24	C108	Facebook	Medium	6100	300	3.2	200	-40.6	4.9	17	08JAN2025	217	C108_Facebook
25	C109	Affiliate	High	4500	600	1.5	310	-0.7	13.3	20	09JAN2025	330	C109_Affiliate
26	C110	Bottraffic	High	10000	9000	0.0	1500	83.0	90.0	50	10JAN2025	1550	C110_Bottraffic
27	C111	Google	Low	4800	120	3.9	160	-56.3	2.5	14	11JAN2025	174	C111_Google
28	C112	Facebook	Medium	5100	180	3.5	170	-47.6	3.5	15	12JAN2025	185	C112_Facebook
29	C113	Youtube	Low	4300	140	3.8	165	-53.7	3.3	14	13JAN2025	179	C113_Youtube
30	C114	Affiliate	High	3700	500	2.0	300	-10.6	13.5	22	14JAN2025	322	C114_Affiliate
31	C115	Bottraffic	High	8800	8400	0.1	1100	84.9	95.5	40	15JAN2025	1140	C115_Bottraffic
32	C116	Instagram	Medium	4600	160	3.3	175	-43.6	3.5	16	16JAN2025	191	C116_Instagram
33	C101	Google	High	5000	50	4.2	200	-63.3	1.0	20	01JAN2025	220	C101_Google
34	C102	Facebook	High	8000	4000	0.2	900	51.0	50.0	35	02JAN2025	935	C102_Facebook
35	C103	Affiliate	Medium	3500	200	2.3	150	-22.0	5.7	10	03JAN2025	160	C103_Affiliate
36	C104	Bottraffic	High	9000	8500	0.1	1200	84.1	94.4	45	04JAN2025	1245	C104_Bottraffic
37	C105	Instagram	Medium	4200	150	3.0	180	-37.5	3.6	15	05JAN2025	195	C105_Instagram
38	C106	Youtube	Low	3900	100	4.1	120	-60.2	2.6	12	06JAN2025	132	C106_Youtube
39	C107	Google	Medium	5200	400	2.8	250	-30.6	7.7	18	07JAN2025	268	C107_Google
40	C108	Facebook	Medium	6100	300	3.2	200	-40.6	4.9	17	08JAN2025	217	C108_Facebook
41	C109	Affiliate	High	4500	600	1.5	310	-0.7	13.3	20	09JAN2025	330	C109_Affiliate
42	C110	Bottraffic	High	10000	9000	0.0	1500	83.0	90.0	50	10JAN2025	1550	C110_Bottraffic
43	C111	Google	Low	4800	120	3.9	160	-56.3	2.5	14	11JAN2025	174	C111_Google
44	C112	Facebook	Medium	5100	180	3.5	170	-47.6	3.5	15	12JAN2025	185	C112_Facebook
45	C113	Youtube	Low	4300	140	3.8	165	-53.7	3.3	14	13JAN2025	179	C113_Youtube
46	C114	Affiliate	High	3700	500	2.0	300	-10.6	13.5	22	14JAN2025	322	C114_Affiliate
47	C115	Bottraffic	High	8800	8400	0.1	1100	84.9	95.5	40	15JAN2025	1140	C115_Bottraffic
48	C116	Instagram	Medium	4600	160	3.3	175	-43.6	3.5	16	16JAN2025	191	C116_Instagram

Explanation:

· BASE= → target dataset

· DATA= → dataset to append

· Adds new observations without recreating dataset structure.

APPEND adds new observations efficiently.

4. Using PROC TRANSPOSE

proc sort data=ad_fraud_clean;by Campaign_ID;run;

proc print data=ad_fraud_clean;

run;

OUTPUT:


Obs	Campaign_ID	Traffic_Source	Utilization_Class	Clicks	Invalid_Clicks	Conversion_Rate	Loss_Amount	Fraud_Score	Percentage	Fees	Campaign_Date	Total_Cost	Campaign_Name
1	C101	Google	High	5000	50	4.2	200	-63.3	1.0	20	01JAN2025	220	C101_Google
2	C102	Facebook	High	8000	4000	0.2	900	51.0	50.0	35	02JAN2025	935	C102_Facebook
3	C103	Affiliate	Medium	3500	200	2.3	150	-22.0	5.7	10	03JAN2025	160	C103_Affiliate
4	C104	Bottraffic	High	9000	8500	0.1	1200	84.1	94.4	45	04JAN2025	1245	C104_Bottraffic
5	C105	Instagram	Medium	4200	150	3.0	180	-37.5	3.6	15	05JAN2025	195	C105_Instagram
6	C106	Youtube	Low	3900	100	4.1	120	-60.2	2.6	12	06JAN2025	132	C106_Youtube
7	C107	Google	Medium	5200	400	2.8	250	-30.6	7.7	18	07JAN2025	268	C107_Google
8	C108	Facebook	Medium	6100	300	3.2	200	-40.6	4.9	17	08JAN2025	217	C108_Facebook
9	C109	Affiliate	High	4500	600	1.5	310	-0.7	13.3	20	09JAN2025	330	C109_Affiliate
10	C110	Bottraffic	High	10000	9000	0.0	1500	83.0	90.0	50	10JAN2025	1550	C110_Bottraffic
11	C111	Google	Low	4800	120	3.9	160	-56.3	2.5	14	11JAN2025	174	C111_Google
12	C112	Facebook	Medium	5100	180	3.5	170	-47.6	3.5	15	12JAN2025	185	C112_Facebook
13	C113	Youtube	Low	4300	140	3.8	165	-53.7	3.3	14	13JAN2025	179	C113_Youtube
14	C114	Affiliate	High	3700	500	2.0	300	-10.6	13.5	22	14JAN2025	322	C114_Affiliate
15	C115	Bottraffic	High	8800	8400	0.1	1100	84.9	95.5	40	15JAN2025	1140	C115_Bottraffic
16	C116	Instagram	Medium	4600	160	3.3	175	-43.6	3.5	16	16JAN2025	191	C116_Instagram

proc transpose data=ad_fraud_clean out=fraud_transpose;

by Campaign_ID;

var Clicks Invalid_Clicks Loss_Amount;

run;

proc print data=fraud_transpose;

run;

OUTPUT:


Obs	Campaign_ID	_NAME_	COL1
1	C101	Clicks	5000
2	C101	Invalid_Clicks	50
3	C101	Loss_Amount	200
4	C102	Clicks	8000
5	C102	Invalid_Clicks	4000
6	C102	Loss_Amount	900
7	C103	Clicks	3500
8	C103	Invalid_Clicks	200
9	C103	Loss_Amount	150
10	C104	Clicks	9000
11	C104	Invalid_Clicks	8500
12	C104	Loss_Amount	1200
13	C105	Clicks	4200
14	C105	Invalid_Clicks	150
15	C105	Loss_Amount	180
16	C106	Clicks	3900
17	C106	Invalid_Clicks	100
18	C106	Loss_Amount	120
19	C107	Clicks	5200
20	C107	Invalid_Clicks	400
21	C107	Loss_Amount	250
22	C108	Clicks	6100
23	C108	Invalid_Clicks	300
24	C108	Loss_Amount	200
25	C109	Clicks	4500
26	C109	Invalid_Clicks	600
27	C109	Loss_Amount	310
28	C110	Clicks	10000
29	C110	Invalid_Clicks	9000
30	C110	Loss_Amount	1500
31	C111	Clicks	4800
32	C111	Invalid_Clicks	120
33	C111	Loss_Amount	160
34	C112	Clicks	5100
35	C112	Invalid_Clicks	180
36	C112	Loss_Amount	170
37	C113	Clicks	4300
38	C113	Invalid_Clicks	140
39	C113	Loss_Amount	165
40	C114	Clicks	3700
41	C114	Invalid_Clicks	500
42	C114	Loss_Amount	300
43	C115	Clicks	8800
44	C115	Invalid_Clicks	8400
45	C115	Loss_Amount	1100
46	C116	Clicks	4600
47	C116	Invalid_Clicks	160
48	C116	Loss_Amount	175

TRANSPOSE converts rows into columns.

5. Fraud Detection Macro

%macro fraud_check;

data fraud_analysis;

set ad_fraud_clean;

length Fraud_Level $12.;

if Percentage > 40 then Fraud_Level="Severe";

else if Percentage > 10 then Fraud_Level="Moderate";

else Fraud_Level="Low";

run;

proc print data=fraud_analysis;

run;

%mend;

%fraud_check;

OUTPUT:


Obs	Campaign_ID	Traffic_Source	Utilization_Class	Clicks	Invalid_Clicks	Conversion_Rate	Loss_Amount	Fraud_Score	Percentage	Fees	Campaign_Date	Total_Cost	Campaign_Name	Fraud_Level
1	C101	Google	High	5000	50	4.2	200	-63.3	1.0	20	01JAN2025	220	C101_Google	Low
2	C102	Facebook	High	8000	4000	0.2	900	51.0	50.0	35	02JAN2025	935	C102_Facebook	Severe
3	C103	Affiliate	Medium	3500	200	2.3	150	-22.0	5.7	10	03JAN2025	160	C103_Affiliate	Low
4	C104	Bottraffic	High	9000	8500	0.1	1200	84.1	94.4	45	04JAN2025	1245	C104_Bottraffic	Severe
5	C105	Instagram	Medium	4200	150	3.0	180	-37.5	3.6	15	05JAN2025	195	C105_Instagram	Low
6	C106	Youtube	Low	3900	100	4.1	120	-60.2	2.6	12	06JAN2025	132	C106_Youtube	Low
7	C107	Google	Medium	5200	400	2.8	250	-30.6	7.7	18	07JAN2025	268	C107_Google	Low
8	C108	Facebook	Medium	6100	300	3.2	200	-40.6	4.9	17	08JAN2025	217	C108_Facebook	Low
9	C109	Affiliate	High	4500	600	1.5	310	-0.7	13.3	20	09JAN2025	330	C109_Affiliate	Moderate
10	C110	Bottraffic	High	10000	9000	0.0	1500	83.0	90.0	50	10JAN2025	1550	C110_Bottraffic	Severe
11	C111	Google	Low	4800	120	3.9	160	-56.3	2.5	14	11JAN2025	174	C111_Google	Low
12	C112	Facebook	Medium	5100	180	3.5	170	-47.6	3.5	15	12JAN2025	185	C112_Facebook	Low
13	C113	Youtube	Low	4300	140	3.8	165	-53.7	3.3	14	13JAN2025	179	C113_Youtube	Low
14	C114	Affiliate	High	3700	500	2.0	300	-10.6	13.5	22	14JAN2025	322	C114_Affiliate	Moderate
15	C115	Bottraffic	High	8800	8400	0.1	1100	84.9	95.5	40	15JAN2025	1140	C115_Bottraffic	Severe
16	C116	Instagram	Medium	4600	160	3.3	175	-43.6	3.5	16	16JAN2025	191	C116_Instagram	Low

proc contents data=fraud_analysis;

run;

OUTPUT:

The CONTENTS Procedure


Data Set Name	WORK.FRAUD_ANALYSIS	Observations	16
Member Type	DATA	Variables	14
Engine	V9	Indexes	0
Created	03/07/2026 11:49:58	Observation Length	328
Last Modified	03/07/2026 11:49:58	Deleted Observations	0
Protection		Compressed	NO
Data Set Type		Sorted	NO
Label
Data Representation	SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64
Encoding	utf-8 Unicode (UTF-8)


Engine/Host Dependent Information
Data Set Page Size	131072
Number of Data Set Pages	1
First Data Page	1
Max Obs per Page	399
Obs in First Data Page	16
Number of Data Set Repairs	0
Filename	/saswork/SAS_workAFA100004268_odaws02-apse1-2.oda.sas.com/SAS_workDBAA00004268_odaws02-apse1-2.oda.sas.com/fraud_analysis.sas7bdat
Release Created	9.0401M8
Host Created	Linux
Inode Number	134320160
Access Permission	rw-r--r--
Owner Name	u63247146
File Size	256KB
File Size (bytes)	262144


Alphabetic List of Variables and Attributes
#	Variable	Type	Len	Format
11	Campaign_Date	Num	8	DATE9.
1	Campaign_ID	Char	5
13	Campaign_Name	Char	200
4	Clicks	Num	8
6	Conversion_Rate	Num	8
10	Fees	Num	8
14	Fraud_Level	Char	12
8	Fraud_Score	Num	8
5	Invalid_Clicks	Num	8
7	Loss_Amount	Num	8
9	Percentage	Num	8
12	Total_Cost	Num	8
2	Traffic_Source	Char	20
3	Utilization_Class	Char	15

proc freq data=fraud_analysis;

tables Fraud_Level Traffic_Source;

run;

OUTPUT:

The FREQ Procedure


Fraud_Level	Frequency	Percent	Cumulative Frequency	Cumulative Percent
Low	10	62.50	10	62.50
Moderate	2	12.50	12	75.00
Severe	4	25.00	16	100.00


Traffic_Source	Frequency	Percent	Cumulative Frequency	Cumulative Percent
Affiliate	3	18.75	3	18.75
Bottraffic	3	18.75	6	37.50
Facebook	3	18.75	9	56.25
Google	3	18.75	12	75.00
Instagram	2	12.50	14	87.50
Youtube	2	12.50	16	100.00

6. Utilization Classification

%macro Utilization;

data Final;

set fraud_analysis;

length Utilization_Class $12.;

if Clicks > 8000 then Utilization_Class="Very High";

else if Clicks > 5000 then Utilization_Class="High";

else if Clicks > 3000 then Utilization_Class="Medium";

else Utilization_Class="Low";

run;

proc print data=Final;

run;

%mend;

%Utilization;

OUTPUT:


Obs	Campaign_ID	Traffic_Source	Utilization_Class	Clicks	Invalid_Clicks	Conversion_Rate	Loss_Amount	Fraud_Score	Percentage	Fees	Campaign_Date	Total_Cost	Campaign_Name	Fraud_Level
1	C101	Google	Medium	5000	50	4.2	200	-63.3	1.0	20	01JAN2025	220	C101_Google	Low
2	C102	Facebook	High	8000	4000	0.2	900	51.0	50.0	35	02JAN2025	935	C102_Facebook	Severe
3	C103	Affiliate	Medium	3500	200	2.3	150	-22.0	5.7	10	03JAN2025	160	C103_Affiliate	Low
4	C104	Bottraffic	Very High	9000	8500	0.1	1200	84.1	94.4	45	04JAN2025	1245	C104_Bottraffic	Severe
5	C105	Instagram	Medium	4200	150	3.0	180	-37.5	3.6	15	05JAN2025	195	C105_Instagram	Low
6	C106	Youtube	Medium	3900	100	4.1	120	-60.2	2.6	12	06JAN2025	132	C106_Youtube	Low
7	C107	Google	High	5200	400	2.8	250	-30.6	7.7	18	07JAN2025	268	C107_Google	Low
8	C108	Facebook	High	6100	300	3.2	200	-40.6	4.9	17	08JAN2025	217	C108_Facebook	Low
9	C109	Affiliate	Medium	4500	600	1.5	310	-0.7	13.3	20	09JAN2025	330	C109_Affiliate	Moderate
10	C110	Bottraffic	Very High	10000	9000	0.0	1500	83.0	90.0	50	10JAN2025	1550	C110_Bottraffic	Severe
11	C111	Google	Medium	4800	120	3.9	160	-56.3	2.5	14	11JAN2025	174	C111_Google	Low
12	C112	Facebook	High	5100	180	3.5	170	-47.6	3.5	15	12JAN2025	185	C112_Facebook	Low
13	C113	Youtube	Medium	4300	140	3.8	165	-53.7	3.3	14	13JAN2025	179	C113_Youtube	Low
14	C114	Affiliate	Medium	3700	500	2.0	300	-10.6	13.5	22	14JAN2025	322	C114_Affiliate	Moderate
15	C115	Bottraffic	Very High	8800	8400	0.1	1100	84.9	95.5	40	15JAN2025	1140	C115_Bottraffic	Severe
16	C116	Instagram	Medium	4600	160	3.3	175	-43.6	3.5	16	16JAN2025	191	C116_Instagram	Low

7. Cleaning Temporary Datasets

proc datasets library=work nolist;

delete combined_data merged_data fraud_transpose;

quit;

LOG:

 NOTE: The file WORK.MERGED_DATA (memtype=DATA) was not found, but appears on a DELETE statement.

 NOTE: Deleting WORK.COMBINED_DATA (memtype=DATA).

 NOTE: Deleting WORK.FRAUD_TRANSPOSE (memtype=DATA).

8. Final Dataset Creation (Including Utilization Classification)

data ad_fraud_final;

set ad_fraud_clean;

length Fraud_Level $10 Utilization_Class $12;

* Fraud classification;

if Percentage > 40 then Fraud_Level="Severe";

else if Percentage > 10 then Fraud_Level="Moderate";

else Fraud_Level="Low";

* Utilization classification;

if Clicks >= 8000 then Utilization_Class="Very High";

else if Clicks >= 5000 then Utilization_Class="High";

else if Clicks >= 3000 then Utilization_Class="Medium";

else Utilization_Class="Low";

run;

proc print data=ad_fraud_final;

run;

OUTPUT:


Obs	Campaign_ID	Traffic_Source	Utilization_Class	Clicks	Invalid_Clicks	Conversion_Rate	Loss_Amount	Fraud_Score	Percentage	Fees	Campaign_Date	Total_Cost	Campaign_Name	Fraud_Level
1	C101	Google	High	5000	50	4.2	200	-63.3	1.0	20	01JAN2025	220	C101_Google	Low
2	C102	Facebook	Very High	8000	4000	0.2	900	51.0	50.0	35	02JAN2025	935	C102_Facebook	Severe
3	C103	Affiliate	Medium	3500	200	2.3	150	-22.0	5.7	10	03JAN2025	160	C103_Affiliate	Low
4	C104	Bottraffic	Very High	9000	8500	0.1	1200	84.1	94.4	45	04JAN2025	1245	C104_Bottraffic	Severe
5	C105	Instagram	Medium	4200	150	3.0	180	-37.5	3.6	15	05JAN2025	195	C105_Instagram	Low
6	C106	Youtube	Medium	3900	100	4.1	120	-60.2	2.6	12	06JAN2025	132	C106_Youtube	Low
7	C107	Google	High	5200	400	2.8	250	-30.6	7.7	18	07JAN2025	268	C107_Google	Low
8	C108	Facebook	High	6100	300	3.2	200	-40.6	4.9	17	08JAN2025	217	C108_Facebook	Low
9	C109	Affiliate	Medium	4500	600	1.5	310	-0.7	13.3	20	09JAN2025	330	C109_Affiliate	Moderate
10	C110	Bottraffic	Very High	10000	9000	0.0	1500	83.0	90.0	50	10JAN2025	1550	C110_Bottraffic	Severe
11	C111	Google	Medium	4800	120	3.9	160	-56.3	2.5	14	11JAN2025	174	C111_Google	Low
12	C112	Facebook	High	5100	180	3.5	170	-47.6	3.5	15	12JAN2025	185	C112_Facebook	Low
13	C113	Youtube	Medium	4300	140	3.8	165	-53.7	3.3	14	13JAN2025	179	C113_Youtube	Low
14	C114	Affiliate	Medium	3700	500	2.0	300	-10.6	13.5	22	14JAN2025	322	C114_Affiliate	Moderate
15	C115	Bottraffic	Very High	8800	8400	0.1	1100	84.9	95.5	40	15JAN2025	1140	C115_Bottraffic	Severe
16	C116	Instagram	Medium	4600	160	3.3	175	-43.6	3.5	16	16JAN2025	191	C116_Instagram	Low

Explanation of the Code

`DATA ad_fraud_final`

Creates the final analytical dataset that contains all cleaned and derived variables.

`SET ad_fraud_clean`

Reads observations from the cleaned dataset and allows us to add more derived variables.

`LENGTH Fraud_Level $10 Utilization_Class $12`

Defines the length of the character variables.

This is important because SAS assigns the length based on the first value encountered if not specified.

This classification evaluates campaign traffic utilization.

Click Range	Utilization
≥ 8000	Very High
5000–7999	High
3000–4999	Medium
< 3000	Low

This helps analysts identify campaign popularity and resource usage.

Final Dataset Variables

Variable	Description
Campaign_ID	Campaign identifier
Traffic_Source	Advertising platform
Clicks	Total campaign clicks
Invalid_Clicks	Fraud clicks
Conversion_Rate	Conversion percentage
Loss_Amount	Fraud loss
Fraud_Score	Fraud risk score
Percentage	Invalid click percentage
Fees	Platform fees
Total_Cost	Loss + fees
Campaign_Name	Combined campaign label
Fraud_Level	Fraud classification
Utilization_Class	Campaign utilization category
Campaign_Date	Campaign date

Key Skills Demonstrated in This Project

• DATA step programming
• Character data cleaning techniques
• Numeric calculations and derived metrics
• Date function usage
• Dataset combination methods
• SAS macro development

• Fraud detection logic implementation
• Dataset management and cleanup

20 Key Points About This Project:

A simulated digital advertising fraud dataset was created to analyze campaign traffic behavior and identify fraudulent activity patterns.
The dataset included key variables such as Campaign_ID, Traffic_Source, Clicks, Invalid_Clicks, Conversion_Rate, Loss_Amount, Fraud_Score, Percentage, Fees, Utilization_Class, and Campaign_Date.
More than 15 observations were generated to simulate multiple advertising campaigns across different traffic sources.
Intentional data issues and inconsistencies were introduced to simulate real-world raw marketing data problems.
The raw dataset AD_FRAUD_RAW served as the starting point for the fraud analytics workflow.
Data cleaning and standardization were performed using character functions such as STRIP, TRIM, PROPCASE, UPCASE, LOWCASE, CAT, and CATX.
Numeric calculations were implemented using SUM and ROUND functions to derive campaign cost and fraud-related metrics.
The invalid click percentage was calculated using the formula (Invalid_Clicks / Clicks) × 100.
A Fraud_Score metric was derived to quantify the likelihood of fraudulent traffic within each campaign.
Campaign date analysis was implemented using date functions MDY, INTCK, and INTNX to handle campaign timelines and date intervals.
The SET statement was used to read and combine datasets vertically during data processing steps.
Dataset integration techniques such as MERGE were used to combine campaign data with additional analytical variables.
Additional observations could be appended using PROC APPEND, which efficiently adds new campaign records.
The PROC TRANSPOSE procedure was demonstrated to restructure campaign metrics for analytical reporting.
A reusable SAS macro was developed to automate fraud detection classification across campaign records.
Campaigns were categorized into Fraud Levels (Low, Moderate, Severe) based on the percentage of invalid clicks.
Campaign traffic was classified into Utilization Classes (Low, Medium, High, Very High) based on total click counts.
Temporary or intermediate datasets were cleaned using PROC DATASETS DELETE to maintain a structured SAS working environment.
The final analytical dataset AD_FRAUD_FINAL contained cleaned variables, calculated fraud metrics, and campaign classifications.
The project demonstrates how SAS programming techniques can be used to detect digital advertising fraud, clean marketing datasets, and support campaign performance analytics.

Project Summary

This project demonstrates how SAS programming can be used to detect and correct errors in digital advertising fraud data.Digital advertising platforms generate massive volumes of campaign performance data every day. However, raw marketing data often contains inconsistencies, formatting issues, incorrect calculations, and missing fraud indicators. Before meaningful analysis can be performed, data must be cleaned and validated.

In this project, a simulated dataset representing digital advertising campaigns was created with variables such as campaign identifiers, traffic sources, click counts, invalid clicks, conversion rates, fraud scores, and financial loss estimates.To simulate real-world analytics scenarios, intentional errors were introduced into the dataset. These errors included inconsistent formatting of traffic source names, manually entered fraud percentages, and unverified fraud scores.

Using SAS DATA step programming, these issues were corrected through various techniques including character functions, numeric functions, and calculated variables.Several SAS character functions were applied to standardize and clean textual data. Functions such as STRIP, TRIM, PROPCASE, UPCASE, and LOWCASE were used to remove unnecessary spaces and convert text values into consistent formats.Numeric functions were used to calculate derived metrics such as fraud percentages, total campaign cost, and fraud scores. Functions like SUM and ROUND helped ensure accurate financial calculations.

The project also demonstrated the use of date manipulation functions such as MDY, INTCK, and INTNX. These functions are important when analyzing campaign timelines, calculating campaign durations, and scheduling future campaign evaluations.

Dataset manipulation techniques such as SET, MERGE, and PROC APPEND were used to combine multiple datasets and manage campaign data efficiently. The PROC TRANSPOSE procedure was used to restructure data for analytical reporting.A fraud detection macro was developed to automatically classify campaigns into fraud categories based on invalid click percentages. Macros allow SAS programmers to automate repetitive logic and build reusable fraud detection frameworks.

Finally, unnecessary temporary datasets were removed using PROC DATASETS DELETE, ensuring that the working library remained clean and organized.The final dataset produced in this project provides a structured and validated view of campaign fraud risks. Analysts can use this information to identify suspicious campaigns and prevent advertising budget losses.

Conclusion

Digital advertising fraud analytics requires strong data management and fraud detection techniques. In this project we demonstrated how SAS programming can be used to build and analyze a campaign dataset containing digital advertising performance metrics.

We created a dataset with more than fifteen observations representing advertising campaigns from multiple traffic sources. The dataset contained important variables such as clicks, invalid clicks, conversion rates, fraud scores, and loss amounts.

Intentional errors were introduced to simulate real-world data problems. These errors included inconsistent formatting, manually entered percentages, and improper fraud scoring. We then corrected these issues using SAS DATA step programming.

Key SAS techniques used in this project include:

· Character functions for text cleaning

· Numeric functions for calculations

· Date functions for campaign timeline analysis

· Dataset manipulation using SET, MERGE, and APPEND

· Data restructuring using PROC TRANSPOSE

· Fraud detection logic implemented through SAS macros

· Dataset cleanup using PROC DATASETS

This type of analytics approach allows companies to detect suspicious campaign behavior, reduce financial losses, and improve digital marketing efficiency.

For SAS programmers preparing for analytics or fraud detection roles, projects like this demonstrate practical knowledge of data cleaning, dataset transformation, and automated fraud classification.

SAS INTERVIEW QUESTIONS

1. How do FIRST. and LAST. work?
Answer:
FIRST. and LAST. are temporary variables automatically created by SAS when using a BY statement after sorting data.

FIRST.variable = 1 for the first observation of each BY group

LAST.variable = 1 for the last observation of each BY group

Example: Used for group totals or identifying first/last records in a group.

2. Difference between CALL SYMPUT and CALL SYMPUTX?
Answer:

Feature

CALL SYMPUT

CALL SYMPUTX

Spaces

Keeps leading/trailing spaces

Removes spaces automatically

Type conversion

May cause numeric-to-character notes

Automatically converts values

Usage

Older method

Recommended method

Short: CALL SYMPUTX is an improved version of CALL SYMPUT.

3. Difference between INPUT and PUT functions?
Answer:

Function

Purpose

INPUT

Converts character → numeric

PUT

Converts numeric → character

Example:

INPUT(char_var,8.) → character to numeric

PUT(num_var,8.) → numeric to character.

Feature	CALL SYMPUT	CALL SYMPUTX
Spaces	Keeps leading/trailing spaces	Removes spaces automatically
Type conversion	May cause numeric-to-character notes	Automatically converts values
Usage	Older method	Recommended method

Function	Purpose
INPUT	Converts character → numeric
PUT	Converts numeric → character

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.

Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent DIGITAL ADVERTISING data.

Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.

This project is suitable for:
·  Students learning SAS
·  Data analysts building portfolios
·  Professionals preparing for SAS interviews
·  Bloggers writing about analytics
·  Clinical SAS Programmer
·  Research Data Analyst
·  Regulatory Data Validator
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On :

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.Which Country Truly Dominates the Olympics? – A Complete SAS Medal Efficiency Analytics Project
2.Which Airports Are Really the Busiest? – An End-to-End SAS Airport Traffic Analytics Project
3.Can Data Predict Election Outcomes? – A Complete SAS Voting Analytics Project

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact | Privacy Policy

422.Can We Detect And Correct Errors In Digital Advertising Fraud Data Using Advanced SAS Programming?

Detecting And Correcting Digital Advertising Fraud Data Errors Using Advanced SAS Programming

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP | LENGTH | INPUT | DATALINES | SET | MERGE | PROC SORT | PROC PRINT | PROC CONTENTS | PROC FREQ | PROC TRANSPOSE | PROC APPEND | PROC DATASETS DELETE | MACRO / %MACRO / %MEND | NUMERIC AND CHARACTER FUNCTIONS

Intentional Errors in the Dataset

Error 1 — Traffic Source Formatting

Error 2 — Percentage Not Derived

Error 3 — Fraud Score Not Calculated Programmatically

Error 4 — Character Cleaning

Explanation of Character Functions

STRIP

TRIM

CAT

CATX

UPCASE

LOWCASE

PROPCASE

Numeric Functions

SUM

ROUND

Date Functions

MDY

INTCK

INTNX

Explanation of the Code

DATA ad_fraud_final

SET ad_fraud_clean

LENGTH Fraud_Level $10 Utilization_Class $12

Project Summary

Conclusion

Follow Us On :

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.Which Country Truly Dominates the Olympics? – A Complete SAS Medal Efficiency Analytics Project2.Which Airports Are Really the Busiest? – An End-to-End SAS Airport Traffic Analytics Project3.Can Data Predict Election Outcomes? – A Complete SAS Voting Analytics Project

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact | Privacy Policy

Comments

Post a Comment

Popular posts from this blog

Beyond Fabric and Fashion: Turning the World’s Most Beautiful Sarees Dataset into Structured Intelligence with SAS and R

Data Cleaning Secrets Using Famous Food Dataset:Handling Duplicate Records in SAS

Global AI Trends Unlocked Through SCAN and SUBSTR Precision in SAS

`DATA ad_fraud_final`

`SET ad_fraud_clean`

`LENGTH Fraud_Level $10 Utilization_Class $12`

1.Which Country Truly Dominates the Olympics? – A Complete SAS Medal Efficiency Analytics Project
2.Which Airports Are Really the Busiest? – An End-to-End SAS Airport Traffic Analytics Project
3.Can Data Predict Election Outcomes? – A Complete SAS Voting Analytics Project