422.Can We Detect And Correct Errors In Digital Advertising Fraud Data Using Advanced SAS Programming?

Detecting And Correcting Digital Advertising Fraud Data Errors Using Advanced SAS Programming

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP | LENGTH | INPUT | DATALINES | SET | MERGE  | PROC SORT | PROC PRINT | PROC CONTENTS | PROC FREQ | PROC TRANSPOSE | PROC APPEND | PROC DATASETS DELETE | MACRO / %MACRO / %MEND | NUMERIC AND CHARACTER FUNCTIONS

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Introduction

Digital advertising is one of the fastest-growing industries in the world. Companies spend billions of dollars on online advertising campaigns across platforms such as Google Ads, social media platforms, display networks, and affiliate traffic sources. While digital marketing enables companies to reach targeted audiences efficiently, it also introduces a major challenge: digital advertising fraud.

Advertising fraud occurs when malicious actors generate fake clicks, impressions, or conversions in order to steal advertising budgets. These fraudulent activities can involve bot traffic, click farms, automated scripts, or manipulated traffic sources. As a result, companies lose significant amounts of money and receive inaccurate campaign performance metrics.

Data analytics plays a critical role in identifying suspicious patterns in advertising traffic. Organizations analyze campaign datasets to detect abnormal click patterns, invalid traffic sources, and unusual conversion behavior. Advanced analytics tools such as SAS (Statistical Analysis System) are widely used in industries for fraud detection because of their powerful data manipulation capabilities.

In this project, we will simulate a Digital Advertising Fraud Dataset containing campaign performance information such as:

  • Campaign ID
  • Traffic Source
  • Clicks
  • Invalid Clicks
  • Conversion Rate
  • Loss Amount
  • Fraud Score
  • Utilization Classification
  • Fees
  • Date Formats

The dataset will intentionally include multiple programming errors, which will later be identified and corrected. The goal is to demonstrate how SAS programmers debug datasets and implement fraud detection logic using DATA step programming, macros, character functions, numeric functions, and date manipulation techniques.

By the end of this project, we will:

  • Create a dataset with intentional coding errors
  • Detect those errors
  • Correct the code
  • Implement fraud detection logic using SAS macros
  • Use character and numeric functions
  • Use date functions such as MDY, INTCK, INTNX
  • Apply SET, MERGE, APPEND, TRANSPOSE
  • Clean datasets using PROC DATASETS DELETE

This project simulates a real-world scenario where analysts must correct and analyze campaign data before performing fraud analytics.

Table Of Contents

  1. Introduction
  2. Business Context
  3. Dataset Structure
  4. Creating Raw Dataset with Intentional Errors
  5. Explanation of Errors
  6. Corrected Dataset Code
  7. Character Functions in SAS
  8. Numeric Functions in SAS
  9. Date Functions (MDY, INTCK, INTNX)
  10. Dataset Combination using SET
  11. Dataset Combination using MERGE
  12. Dataset Combination using PROC APPEND
  13. Dataset Transformation using PROC TRANSPOSE
  14. Fraud Detection Macro
  15. Utilization Classification Logic
  16. Fraud Score Classification
  17. Cleaning Temporary Datasets using PROC DATASETS DELETE
  18. Final Dataset Output
  19. 20 Key Points About This Project
  20. Project Summary and Conclusion

Business Context

Companies invest heavily in digital advertising campaigns to generate customer engagement and conversions. However, fraudulent traffic significantly affects campaign performance metrics.

Common fraud indicators include:

  • Extremely high click counts with low conversions
  • High invalid click percentages
  • Unusual traffic spikes within short time intervals
  • Traffic from suspicious sources
  • Extremely high fraud risk scores

Organizations use analytics systems like SAS to monitor advertising datasets and automatically detect fraud patterns.

For example:

Campaign

Clicks

Invalid Clicks

Conversion Rate

C101

5000

50

4%

C102

8000

4000

0.2%

Campaign C102 is suspicious because half of the clicks are invalid.

By applying SAS analytics, companies can:

  • Identify fraudulent campaigns
  • Calculate financial losses
  • Block suspicious traffic sources
  • Optimize marketing budgets

Dataset Structure

Our dataset will contain the following variables:

Variable

Description

Campaign_ID

Unique advertising campaign identifier

Traffic_Source

Source of advertising traffic

Clicks

Total number of clicks

Invalid_Clicks

Number of fraudulent clicks

Conversion_Rate

Percentage of clicks converting to sales

Loss_Amount

Estimated monetary loss due to fraud

Fraud_Score

Fraud risk score

Percentage

Percentage of invalid clicks

Fees

Advertising platform fees

Utilization_Class

Traffic utilization classification

Campaign_Date

Date of campaign

1. Raw Dataset with Intentional Errors

data ad_fraud_raw;

length Campaign_ID $5 Traffic_Source $20 Utilization_Class $15;

input Campaign_ID $ Traffic_Source $ Clicks

      Invalid_Clicks Conversion_Rate Loss_Amount

      Fraud_Score Percentage Fees Utilization_Class $

      Campaign_Date :ddmmyy10.;

format Campaign_Date date9.;

datalines;

C101 google 5000 50 4.2 200 15 1.0 20 High 01/01/2025

C102 facebook 8000 4000 0.2 900 90 50 35 High 02/01/2025

C103 affiliate 3500 200 2.3 150 30 5.7 10 Medium 03/01/2025

C104 bottraffic 9000 8500 0.1 1200 98 94 45 High 04/01/2025

C105 instagram 4200 150 3.0 180 25 3.5 15 Medium 05/01/2025

C106 youtube 3900 100 4.1 120 18 2.5 12 Low 06/01/2025

C107 google 5200 400 2.8 250 40 7.6 18 Medium 07/01/2025

C108 facebook 6100 300 3.2 200 35 4.9 17 Medium 08/01/2025

C109 affiliate 4500 600 1.5 310 55 13.3 20 High 09/01/2025

C110 bottraffic 10000 9000 0.0 1500 99 90 50 High 10/01/2025

C111 google 4800 120 3.9 160 22 2.5 14 Low 11/01/2025

C112 facebook 5100 180 3.5 170 28 3.5 15 Medium 12/01/2025

C113 youtube 4300 140 3.8 165 26 3.2 14 Low 13/01/2025

C114 affiliate 3700 500 2.0 300 50 13.5 22 High 14/01/2025

C115 bottraffic 8800 8400 0.1 1100 97 95 40 High 15/01/2025

C116 instagram 4600 160 3.3 175 27 3.4 16 Medium 16/01/2025

;

run;

proc print data=ad_fraud_raw;

run;

OUTPUT:

ObsCampaign_IDTraffic_SourceUtilization_ClassClicksInvalid_ClicksConversion_RateLoss_AmountFraud_ScorePercentageFeesCampaign_Date
1C101googleHigh5000504.2200151.02001JAN2025
2C102facebookHigh800040000.29009050.03502JAN2025
3C103affiliateMedium35002002.3150305.71003JAN2025
4C104bottrafficHigh900085000.112009894.04504JAN2025
5C105instagramMedium42001503.0180253.51505JAN2025
6C106youtubeLow39001004.1120182.51206JAN2025
7C107googleMedium52004002.8250407.61807JAN2025
8C108facebookMedium61003003.2200354.91708JAN2025
9C109affiliateHigh45006001.53105513.32009JAN2025
10C110bottrafficHigh1000090000.015009990.05010JAN2025
11C111googleLow48001203.9160222.51411JAN2025
12C112facebookMedium51001803.5170283.51512JAN2025
13C113youtubeLow43001403.8165263.21413JAN2025
14C114affiliateHigh37005002.03005013.52214JAN2025
15C115bottrafficHigh880084000.111009795.04015JAN2025
16C116instagramMedium46001603.3175273.41616JAN2025

Intentional Errors in the Dataset

The dataset above contains multiple potential issues that analysts must detect.

Examples:

Error 1 — Traffic Source Formatting

Traffic sources appear in inconsistent formats.

Example:

google
facebook
affiliate
bottraffic

Best practice is to standardize case using UPCASE or PROPCASE.

Error 2 — Percentage Not Derived

Percentage should be calculated using:

Invalid_Clicks / Clicks * 100

But the dataset manually entered values.

Error 3 — Fraud Score Not Calculated Programmatically

Fraud scores are entered manually instead of using analytics logic.

Error 4 — Character Cleaning

Traffic source strings may contain leading or trailing spaces.

Functions required:strip,trim

2. Corrected Dataset Code

data ad_fraud_clean;

set ad_fraud_raw;

Traffic_Source = propcase(strip(Traffic_Source));

Percentage = round((Invalid_Clicks/Clicks)*100,0.1);

Fraud_Score = round((Percentage*0.7) + ((1-Conversion_Rate)*20),0.1);

Total_Cost = sum(Loss_Amount,Fees);

Campaign_Name = catx("_",Campaign_ID,Traffic_Source);

run;

proc print data=ad_fraud_clean;

run;

OUTPUT:

ObsCampaign_IDTraffic_SourceUtilization_ClassClicksInvalid_ClicksConversion_RateLoss_AmountFraud_ScorePercentageFeesCampaign_DateTotal_CostCampaign_Name
1C101GoogleHigh5000504.2200-63.31.02001JAN2025220C101_Google
2C102FacebookHigh800040000.290051.050.03502JAN2025935C102_Facebook
3C103AffiliateMedium35002002.3150-22.05.71003JAN2025160C103_Affiliate
4C104BottrafficHigh900085000.1120084.194.44504JAN20251245C104_Bottraffic
5C105InstagramMedium42001503.0180-37.53.61505JAN2025195C105_Instagram
6C106YoutubeLow39001004.1120-60.22.61206JAN2025132C106_Youtube
7C107GoogleMedium52004002.8250-30.67.71807JAN2025268C107_Google
8C108FacebookMedium61003003.2200-40.64.91708JAN2025217C108_Facebook
9C109AffiliateHigh45006001.5310-0.713.32009JAN2025330C109_Affiliate
10C110BottrafficHigh1000090000.0150083.090.05010JAN20251550C110_Bottraffic
11C111GoogleLow48001203.9160-56.32.51411JAN2025174C111_Google
12C112FacebookMedium51001803.5170-47.63.51512JAN2025185C112_Facebook
13C113YoutubeLow43001403.8165-53.73.31413JAN2025179C113_Youtube
14C114AffiliateHigh37005002.0300-10.613.52214JAN2025322C114_Affiliate
15C115BottrafficHigh880084000.1110084.995.54015JAN20251140C115_Bottraffic
16C116InstagramMedium46001603.3175-43.63.51616JAN2025191C116_Instagram

Explanation of Character Functions

STRIP

Removes leading and trailing spaces.

strip(Traffic_Source)

Example:

" google ""google"

TRIM

Removes trailing spaces.

trim(variable)

CAT

Concatenates values.

cat(a,b)

CATX

Concatenates values with a delimiter.

catx("_",Campaign_ID,Traffic_Source)

Result:

C101_Google

UPCASE

Converts text to uppercase.

upcase(variable)

LOWCASE

Converts text to lowercase.

PROPCASE

Capitalizes the first letter.

Example:

google → Google

Numeric Functions

SUM

Adds numeric variables.

Total_Cost = sum(Loss_Amount,Fees);

ROUND

Rounds numeric values.

round(value,0.1)

Date Functions

MDY

Creates a SAS date.

Campaign_Start = mdy(1,1,2025);

INTCK

Counts intervals between dates.

Days_Between = intck('day',Campaign_Start,Campaign_Date);

INTNX

Moves forward or backward in time.

Next_Campaign = intnx('month',Campaign_Date,1);

3. Using SET Statement

Method 1 — If You Want the Same Dataset Copied Then

data combined_data;

set ad_fraud_clean;

run;

proc print data=combined_data;

run;

OUTPUT:

ObsCampaign_IDTraffic_SourceUtilization_ClassClicksInvalid_ClicksConversion_RateLoss_AmountFraud_ScorePercentageFeesCampaign_DateTotal_CostCampaign_Name
1C101GoogleHigh5000504.2200-63.31.02001JAN2025220C101_Google
2C102FacebookHigh800040000.290051.050.03502JAN2025935C102_Facebook
3C103AffiliateMedium35002002.3150-22.05.71003JAN2025160C103_Affiliate
4C104BottrafficHigh900085000.1120084.194.44504JAN20251245C104_Bottraffic
5C105InstagramMedium42001503.0180-37.53.61505JAN2025195C105_Instagram
6C106YoutubeLow39001004.1120-60.22.61206JAN2025132C106_Youtube
7C107GoogleMedium52004002.8250-30.67.71807JAN2025268C107_Google
8C108FacebookMedium61003003.2200-40.64.91708JAN2025217C108_Facebook
9C109AffiliateHigh45006001.5310-0.713.32009JAN2025330C109_Affiliate
10C110BottrafficHigh1000090000.0150083.090.05010JAN20251550C110_Bottraffic
11C111GoogleLow48001203.9160-56.32.51411JAN2025174C111_Google
12C112FacebookMedium51001803.5170-47.63.51512JAN2025185C112_Facebook
13C113YoutubeLow43001403.8165-53.73.31413JAN2025179C113_Youtube
14C114AffiliateHigh37005002.0300-10.613.52214JAN2025322C114_Affiliate
15C115BottrafficHigh880084000.1110084.995.54015JAN20251140C115_Bottraffic
16C116InstagramMedium46001603.3175-43.63.51616JAN2025191C116_Instagram

Explanation:

·  DATA combined_data;
Creates a new dataset named combined_data.

·  SET ad_fraud_clean;
Reads all observations from ad_fraud_clean.

·  RUN;
Executes the DATA step.

Result:
combined_data will contain exactly the same observations as ad_fraud_clean.

Method 2 — If You Intentionally Want Two Copies of Data Then

data campaign_part1;

set ad_fraud_clean;

run;

proc print data=campaign_part1;

run;

OUTPUT:

ObsCampaign_IDTraffic_SourceUtilization_ClassClicksInvalid_ClicksConversion_RateLoss_AmountFraud_ScorePercentageFeesCampaign_DateTotal_CostCampaign_Name
1C101GoogleHigh5000504.2200-63.31.02001JAN2025220C101_Google
2C102FacebookHigh800040000.290051.050.03502JAN2025935C102_Facebook
3C103AffiliateMedium35002002.3150-22.05.71003JAN2025160C103_Affiliate
4C104BottrafficHigh900085000.1120084.194.44504JAN20251245C104_Bottraffic
5C105InstagramMedium42001503.0180-37.53.61505JAN2025195C105_Instagram
6C106YoutubeLow39001004.1120-60.22.61206JAN2025132C106_Youtube
7C107GoogleMedium52004002.8250-30.67.71807JAN2025268C107_Google
8C108FacebookMedium61003003.2200-40.64.91708JAN2025217C108_Facebook
9C109AffiliateHigh45006001.5310-0.713.32009JAN2025330C109_Affiliate
10C110BottrafficHigh1000090000.0150083.090.05010JAN20251550C110_Bottraffic
11C111GoogleLow48001203.9160-56.32.51411JAN2025174C111_Google
12C112FacebookMedium51001803.5170-47.63.51512JAN2025185C112_Facebook
13C113YoutubeLow43001403.8165-53.73.31413JAN2025179C113_Youtube
14C114AffiliateHigh37005002.0300-10.613.52214JAN2025322C114_Affiliate
15C115BottrafficHigh880084000.1110084.995.54015JAN20251140C115_Bottraffic
16C116InstagramMedium46001603.3175-43.63.51616JAN2025191C116_Instagram

data campaign_part2;

set ad_fraud_clean;

run;

proc print data=campaign_part2;

run;

OUTPUT:

ObsCampaign_IDTraffic_SourceUtilization_ClassClicksInvalid_ClicksConversion_RateLoss_AmountFraud_ScorePercentageFeesCampaign_DateTotal_CostCampaign_Name
1C101GoogleHigh5000504.2200-63.31.02001JAN2025220C101_Google
2C102FacebookHigh800040000.290051.050.03502JAN2025935C102_Facebook
3C103AffiliateMedium35002002.3150-22.05.71003JAN2025160C103_Affiliate
4C104BottrafficHigh900085000.1120084.194.44504JAN20251245C104_Bottraffic
5C105InstagramMedium42001503.0180-37.53.61505JAN2025195C105_Instagram
6C106YoutubeLow39001004.1120-60.22.61206JAN2025132C106_Youtube
7C107GoogleMedium52004002.8250-30.67.71807JAN2025268C107_Google
8C108FacebookMedium61003003.2200-40.64.91708JAN2025217C108_Facebook
9C109AffiliateHigh45006001.5310-0.713.32009JAN2025330C109_Affiliate
10C110BottrafficHigh1000090000.0150083.090.05010JAN20251550C110_Bottraffic
11C111GoogleLow48001203.9160-56.32.51411JAN2025174C111_Google
12C112FacebookMedium51001803.5170-47.63.51512JAN2025185C112_Facebook
13C113YoutubeLow43001403.8165-53.73.31413JAN2025179C113_Youtube
14C114AffiliateHigh37005002.0300-10.613.52214JAN2025322C114_Affiliate
15C115BottrafficHigh880084000.1110084.995.54015JAN20251140C115_Bottraffic
16C116InstagramMedium46001603.3175-43.63.51616JAN2025191C116_Instagram

data combined_data;

set campaign_part1 campaign_part2;

run;

proc print data=combined_data;

run;

OUTPUT:

ObsCampaign_IDTraffic_SourceUtilization_ClassClicksInvalid_ClicksConversion_RateLoss_AmountFraud_ScorePercentageFeesCampaign_DateTotal_CostCampaign_Name
1C101GoogleHigh5000504.2200-63.31.02001JAN2025220C101_Google
2C102FacebookHigh800040000.290051.050.03502JAN2025935C102_Facebook
3C103AffiliateMedium35002002.3150-22.05.71003JAN2025160C103_Affiliate
4C104BottrafficHigh900085000.1120084.194.44504JAN20251245C104_Bottraffic
5C105InstagramMedium42001503.0180-37.53.61505JAN2025195C105_Instagram
6C106YoutubeLow39001004.1120-60.22.61206JAN2025132C106_Youtube
7C107GoogleMedium52004002.8250-30.67.71807JAN2025268C107_Google
8C108FacebookMedium61003003.2200-40.64.91708JAN2025217C108_Facebook
9C109AffiliateHigh45006001.5310-0.713.32009JAN2025330C109_Affiliate
10C110BottrafficHigh1000090000.0150083.090.05010JAN20251550C110_Bottraffic
11C111GoogleLow48001203.9160-56.32.51411JAN2025174C111_Google
12C112FacebookMedium51001803.5170-47.63.51512JAN2025185C112_Facebook
13C113YoutubeLow43001403.8165-53.73.31413JAN2025179C113_Youtube
14C114AffiliateHigh37005002.0300-10.613.52214JAN2025322C114_Affiliate
15C115BottrafficHigh880084000.1110084.995.54015JAN20251140C115_Bottraffic
16C116InstagramMedium46001603.3175-43.63.51616JAN2025191C116_Instagram
17C101GoogleHigh5000504.2200-63.31.02001JAN2025220C101_Google
18C102FacebookHigh800040000.290051.050.03502JAN2025935C102_Facebook
19C103AffiliateMedium35002002.3150-22.05.71003JAN2025160C103_Affiliate
20C104BottrafficHigh900085000.1120084.194.44504JAN20251245C104_Bottraffic
21C105InstagramMedium42001503.0180-37.53.61505JAN2025195C105_Instagram
22C106YoutubeLow39001004.1120-60.22.61206JAN2025132C106_Youtube
23C107GoogleMedium52004002.8250-30.67.71807JAN2025268C107_Google
24C108FacebookMedium61003003.2200-40.64.91708JAN2025217C108_Facebook
25C109AffiliateHigh45006001.5310-0.713.32009JAN2025330C109_Affiliate
26C110BottrafficHigh1000090000.0150083.090.05010JAN20251550C110_Bottraffic
27C111GoogleLow48001203.9160-56.32.51411JAN2025174C111_Google
28C112FacebookMedium51001803.5170-47.63.51512JAN2025185C112_Facebook
29C113YoutubeLow43001403.8165-53.73.31413JAN2025179C113_Youtube
30C114AffiliateHigh37005002.0300-10.613.52214JAN2025322C114_Affiliate
31C115BottrafficHigh880084000.1110084.995.54015JAN20251140C115_Bottraffic
32C116InstagramMedium46001603.3175-43.63.51616JAN2025191C116_Instagram

Explanation:

·  campaign_part1 and campaign_part2 are identical copies.

·  SET campaign_part1 campaign_part2; stacks them vertically.

·  Final dataset will contain double the observations.

Method 3 — If the Goal is Appending Then

proc append base=combined_data 

            data=ad_fraud_clean;

run;

proc print data=combined_data;

run;

OUTPUT:

ObsCampaign_IDTraffic_SourceUtilization_ClassClicksInvalid_ClicksConversion_RateLoss_AmountFraud_ScorePercentageFeesCampaign_DateTotal_CostCampaign_Name
1C101GoogleHigh5000504.2200-63.31.02001JAN2025220C101_Google
2C102FacebookHigh800040000.290051.050.03502JAN2025935C102_Facebook
3C103AffiliateMedium35002002.3150-22.05.71003JAN2025160C103_Affiliate
4C104BottrafficHigh900085000.1120084.194.44504JAN20251245C104_Bottraffic
5C105InstagramMedium42001503.0180-37.53.61505JAN2025195C105_Instagram
6C106YoutubeLow39001004.1120-60.22.61206JAN2025132C106_Youtube
7C107GoogleMedium52004002.8250-30.67.71807JAN2025268C107_Google
8C108FacebookMedium61003003.2200-40.64.91708JAN2025217C108_Facebook
9C109AffiliateHigh45006001.5310-0.713.32009JAN2025330C109_Affiliate
10C110BottrafficHigh1000090000.0150083.090.05010JAN20251550C110_Bottraffic
11C111GoogleLow48001203.9160-56.32.51411JAN2025174C111_Google
12C112FacebookMedium51001803.5170-47.63.51512JAN2025185C112_Facebook
13C113YoutubeLow43001403.8165-53.73.31413JAN2025179C113_Youtube
14C114AffiliateHigh37005002.0300-10.613.52214JAN2025322C114_Affiliate
15C115BottrafficHigh880084000.1110084.995.54015JAN20251140C115_Bottraffic
16C116InstagramMedium46001603.3175-43.63.51616JAN2025191C116_Instagram
17C101GoogleHigh5000504.2200-63.31.02001JAN2025220C101_Google
18C102FacebookHigh800040000.290051.050.03502JAN2025935C102_Facebook
19C103AffiliateMedium35002002.3150-22.05.71003JAN2025160C103_Affiliate
20C104BottrafficHigh900085000.1120084.194.44504JAN20251245C104_Bottraffic
21C105InstagramMedium42001503.0180-37.53.61505JAN2025195C105_Instagram
22C106YoutubeLow39001004.1120-60.22.61206JAN2025132C106_Youtube
23C107GoogleMedium52004002.8250-30.67.71807JAN2025268C107_Google
24C108FacebookMedium61003003.2200-40.64.91708JAN2025217C108_Facebook
25C109AffiliateHigh45006001.5310-0.713.32009JAN2025330C109_Affiliate
26C110BottrafficHigh1000090000.0150083.090.05010JAN20251550C110_Bottraffic
27C111GoogleLow48001203.9160-56.32.51411JAN2025174C111_Google
28C112FacebookMedium51001803.5170-47.63.51512JAN2025185C112_Facebook
29C113YoutubeLow43001403.8165-53.73.31413JAN2025179C113_Youtube
30C114AffiliateHigh37005002.0300-10.613.52214JAN2025322C114_Affiliate
31C115BottrafficHigh880084000.1110084.995.54015JAN20251140C115_Bottraffic
32C116InstagramMedium46001603.3175-43.63.51616JAN2025191C116_Instagram
33C101GoogleHigh5000504.2200-63.31.02001JAN2025220C101_Google
34C102FacebookHigh800040000.290051.050.03502JAN2025935C102_Facebook
35C103AffiliateMedium35002002.3150-22.05.71003JAN2025160C103_Affiliate
36C104BottrafficHigh900085000.1120084.194.44504JAN20251245C104_Bottraffic
37C105InstagramMedium42001503.0180-37.53.61505JAN2025195C105_Instagram
38C106YoutubeLow39001004.1120-60.22.61206JAN2025132C106_Youtube
39C107GoogleMedium52004002.8250-30.67.71807JAN2025268C107_Google
40C108FacebookMedium61003003.2200-40.64.91708JAN2025217C108_Facebook
41C109AffiliateHigh45006001.5310-0.713.32009JAN2025330C109_Affiliate
42C110BottrafficHigh1000090000.0150083.090.05010JAN20251550C110_Bottraffic
43C111GoogleLow48001203.9160-56.32.51411JAN2025174C111_Google
44C112FacebookMedium51001803.5170-47.63.51512JAN2025185C112_Facebook
45C113YoutubeLow43001403.8165-53.73.31413JAN2025179C113_Youtube
46C114AffiliateHigh37005002.0300-10.613.52214JAN2025322C114_Affiliate
47C115BottrafficHigh880084000.1110084.995.54015JAN20251140C115_Bottraffic
48C116InstagramMedium46001603.3175-43.63.51616JAN2025191C116_Instagram

Explanation:

·  BASE= → target dataset

·  DATA= → dataset to append

·  Adds new observations without recreating dataset structure.

APPEND adds new observations efficiently.

4. Using PROC TRANSPOSE

proc sort data=ad_fraud_clean;by Campaign_ID;run;

proc print data=ad_fraud_clean;

run;

OUTPUT:

ObsCampaign_IDTraffic_SourceUtilization_ClassClicksInvalid_ClicksConversion_RateLoss_AmountFraud_ScorePercentageFeesCampaign_DateTotal_CostCampaign_Name
1C101GoogleHigh5000504.2200-63.31.02001JAN2025220C101_Google
2C102FacebookHigh800040000.290051.050.03502JAN2025935C102_Facebook
3C103AffiliateMedium35002002.3150-22.05.71003JAN2025160C103_Affiliate
4C104BottrafficHigh900085000.1120084.194.44504JAN20251245C104_Bottraffic
5C105InstagramMedium42001503.0180-37.53.61505JAN2025195C105_Instagram
6C106YoutubeLow39001004.1120-60.22.61206JAN2025132C106_Youtube
7C107GoogleMedium52004002.8250-30.67.71807JAN2025268C107_Google
8C108FacebookMedium61003003.2200-40.64.91708JAN2025217C108_Facebook
9C109AffiliateHigh45006001.5310-0.713.32009JAN2025330C109_Affiliate
10C110BottrafficHigh1000090000.0150083.090.05010JAN20251550C110_Bottraffic
11C111GoogleLow48001203.9160-56.32.51411JAN2025174C111_Google
12C112FacebookMedium51001803.5170-47.63.51512JAN2025185C112_Facebook
13C113YoutubeLow43001403.8165-53.73.31413JAN2025179C113_Youtube
14C114AffiliateHigh37005002.0300-10.613.52214JAN2025322C114_Affiliate
15C115BottrafficHigh880084000.1110084.995.54015JAN20251140C115_Bottraffic
16C116InstagramMedium46001603.3175-43.63.51616JAN2025191C116_Instagram

proc transpose data=ad_fraud_clean out=fraud_transpose;

by Campaign_ID;

var Clicks Invalid_Clicks Loss_Amount;

run;

proc print data=fraud_transpose;

run;

OUTPUT:

ObsCampaign_ID_NAME_COL1
1C101Clicks5000
2C101Invalid_Clicks50
3C101Loss_Amount200
4C102Clicks8000
5C102Invalid_Clicks4000
6C102Loss_Amount900
7C103Clicks3500
8C103Invalid_Clicks200
9C103Loss_Amount150
10C104Clicks9000
11C104Invalid_Clicks8500
12C104Loss_Amount1200
13C105Clicks4200
14C105Invalid_Clicks150
15C105Loss_Amount180
16C106Clicks3900
17C106Invalid_Clicks100
18C106Loss_Amount120
19C107Clicks5200
20C107Invalid_Clicks400
21C107Loss_Amount250
22C108Clicks6100
23C108Invalid_Clicks300
24C108Loss_Amount200
25C109Clicks4500
26C109Invalid_Clicks600
27C109Loss_Amount310
28C110Clicks10000
29C110Invalid_Clicks9000
30C110Loss_Amount1500
31C111Clicks4800
32C111Invalid_Clicks120
33C111Loss_Amount160
34C112Clicks5100
35C112Invalid_Clicks180
36C112Loss_Amount170
37C113Clicks4300
38C113Invalid_Clicks140
39C113Loss_Amount165
40C114Clicks3700
41C114Invalid_Clicks500
42C114Loss_Amount300
43C115Clicks8800
44C115Invalid_Clicks8400
45C115Loss_Amount1100
46C116Clicks4600
47C116Invalid_Clicks160
48C116Loss_Amount175

TRANSPOSE converts rows into columns.

5. Fraud Detection Macro

%macro fraud_check;

data fraud_analysis;

set ad_fraud_clean;

length Fraud_Level $12.;

if Percentage > 40 then Fraud_Level="Severe";

else if Percentage > 10 then Fraud_Level="Moderate";

else Fraud_Level="Low";

run;

proc print data=fraud_analysis;

run;

%mend;


%fraud_check;

OUTPUT:

ObsCampaign_IDTraffic_SourceUtilization_ClassClicksInvalid_ClicksConversion_RateLoss_AmountFraud_ScorePercentageFeesCampaign_DateTotal_CostCampaign_NameFraud_Level
1C101GoogleHigh5000504.2200-63.31.02001JAN2025220C101_GoogleLow
2C102FacebookHigh800040000.290051.050.03502JAN2025935C102_FacebookSevere
3C103AffiliateMedium35002002.3150-22.05.71003JAN2025160C103_AffiliateLow
4C104BottrafficHigh900085000.1120084.194.44504JAN20251245C104_BottrafficSevere
5C105InstagramMedium42001503.0180-37.53.61505JAN2025195C105_InstagramLow
6C106YoutubeLow39001004.1120-60.22.61206JAN2025132C106_YoutubeLow
7C107GoogleMedium52004002.8250-30.67.71807JAN2025268C107_GoogleLow
8C108FacebookMedium61003003.2200-40.64.91708JAN2025217C108_FacebookLow
9C109AffiliateHigh45006001.5310-0.713.32009JAN2025330C109_AffiliateModerate
10C110BottrafficHigh1000090000.0150083.090.05010JAN20251550C110_BottrafficSevere
11C111GoogleLow48001203.9160-56.32.51411JAN2025174C111_GoogleLow
12C112FacebookMedium51001803.5170-47.63.51512JAN2025185C112_FacebookLow
13C113YoutubeLow43001403.8165-53.73.31413JAN2025179C113_YoutubeLow
14C114AffiliateHigh37005002.0300-10.613.52214JAN2025322C114_AffiliateModerate
15C115BottrafficHigh880084000.1110084.995.54015JAN20251140C115_BottrafficSevere
16C116InstagramMedium46001603.3175-43.63.51616JAN2025191C116_InstagramLow

proc contents data=fraud_analysis;

run;

OUTPUT:

The CONTENTS Procedure

Data Set NameWORK.FRAUD_ANALYSISObservations16
Member TypeDATAVariables14
EngineV9Indexes0
Created03/07/2026 11:49:58Observation Length328
Last Modified03/07/2026 11:49:58Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
Engine/Host Dependent Information
Data Set Page Size131072
Number of Data Set Pages1
First Data Page1
Max Obs per Page399
Obs in First Data Page16
Number of Data Set Repairs0
Filename/saswork/SAS_workAFA100004268_odaws02-apse1-2.oda.sas.com/SAS_workDBAA00004268_odaws02-apse1-2.oda.sas.com/fraud_analysis.sas7bdat
Release Created9.0401M8
Host CreatedLinux
Inode Number134320160
Access Permissionrw-r--r--
Owner Nameu63247146
File Size256KB
File Size (bytes)262144
Alphabetic List of Variables and Attributes
#VariableTypeLenFormat
11Campaign_DateNum8DATE9.
1Campaign_IDChar5 
13Campaign_NameChar200 
4ClicksNum8 
6Conversion_RateNum8 
10FeesNum8 
14Fraud_LevelChar12 
8Fraud_ScoreNum8 
5Invalid_ClicksNum8 
7Loss_AmountNum8 
9PercentageNum8 
12Total_CostNum8 
2Traffic_SourceChar20 
3Utilization_ClassChar15 

proc freq data=fraud_analysis;

tables Fraud_Level Traffic_Source;

run;

OUTPUT:

The FREQ Procedure

Fraud_LevelFrequencyPercentCumulative
Frequency
Cumulative
Percent
Low1062.501062.50
Moderate212.501275.00
Severe425.0016100.00
Traffic_SourceFrequencyPercentCumulative
Frequency
Cumulative
Percent
Affiliate318.75318.75
Bottraffic318.75637.50
Facebook318.75956.25
Google318.751275.00
Instagram212.501487.50
Youtube212.5016100.00

6. Utilization Classification

%macro Utilization;

data Final;

set fraud_analysis;

length Utilization_Class $12.;

if Clicks > 8000 then Utilization_Class="Very High";

else if Clicks > 5000 then Utilization_Class="High";

else if Clicks > 3000 then Utilization_Class="Medium";

else Utilization_Class="Low";

run;

proc print data=Final;

run;

%mend;


%Utilization;

OUTPUT:

ObsCampaign_IDTraffic_SourceUtilization_ClassClicksInvalid_ClicksConversion_RateLoss_AmountFraud_ScorePercentageFeesCampaign_DateTotal_CostCampaign_NameFraud_Level
1C101GoogleMedium5000504.2200-63.31.02001JAN2025220C101_GoogleLow
2C102FacebookHigh800040000.290051.050.03502JAN2025935C102_FacebookSevere
3C103AffiliateMedium35002002.3150-22.05.71003JAN2025160C103_AffiliateLow
4C104BottrafficVery High900085000.1120084.194.44504JAN20251245C104_BottrafficSevere
5C105InstagramMedium42001503.0180-37.53.61505JAN2025195C105_InstagramLow
6C106YoutubeMedium39001004.1120-60.22.61206JAN2025132C106_YoutubeLow
7C107GoogleHigh52004002.8250-30.67.71807JAN2025268C107_GoogleLow
8C108FacebookHigh61003003.2200-40.64.91708JAN2025217C108_FacebookLow
9C109AffiliateMedium45006001.5310-0.713.32009JAN2025330C109_AffiliateModerate
10C110BottrafficVery High1000090000.0150083.090.05010JAN20251550C110_BottrafficSevere
11C111GoogleMedium48001203.9160-56.32.51411JAN2025174C111_GoogleLow
12C112FacebookHigh51001803.5170-47.63.51512JAN2025185C112_FacebookLow
13C113YoutubeMedium43001403.8165-53.73.31413JAN2025179C113_YoutubeLow
14C114AffiliateMedium37005002.0300-10.613.52214JAN2025322C114_AffiliateModerate
15C115BottrafficVery High880084000.1110084.995.54015JAN20251140C115_BottrafficSevere
16C116InstagramMedium46001603.3175-43.63.51616JAN2025191C116_InstagramLow

7. Cleaning Temporary Datasets

proc datasets library=work nolist;

delete combined_data merged_data fraud_transpose;

quit;

LOG:

NOTE: The file WORK.MERGED_DATA (memtype=DATA) was not found, but appears on a DELETE statement.
NOTE: Deleting WORK.COMBINED_DATA (memtype=DATA).
NOTE: Deleting WORK.FRAUD_TRANSPOSE (memtype=DATA).

8. Final Dataset Creation (Including Utilization Classification)

data ad_fraud_final;

set ad_fraud_clean;

length Fraud_Level $10 Utilization_Class $12;

* Fraud classification;

if Percentage > 40 then Fraud_Level="Severe";

else if Percentage > 10 then Fraud_Level="Moderate";

else Fraud_Level="Low";

* Utilization classification;

if Clicks >= 8000 then Utilization_Class="Very High";

else if Clicks >= 5000 then Utilization_Class="High";

else if Clicks >= 3000 then Utilization_Class="Medium";

else Utilization_Class="Low";

run;

proc print data=ad_fraud_final;

run;

OUTPUT:

ObsCampaign_IDTraffic_SourceUtilization_ClassClicksInvalid_ClicksConversion_RateLoss_AmountFraud_ScorePercentageFeesCampaign_DateTotal_CostCampaign_NameFraud_Level
1C101GoogleHigh5000504.2200-63.31.02001JAN2025220C101_GoogleLow
2C102FacebookVery High800040000.290051.050.03502JAN2025935C102_FacebookSevere
3C103AffiliateMedium35002002.3150-22.05.71003JAN2025160C103_AffiliateLow
4C104BottrafficVery High900085000.1120084.194.44504JAN20251245C104_BottrafficSevere
5C105InstagramMedium42001503.0180-37.53.61505JAN2025195C105_InstagramLow
6C106YoutubeMedium39001004.1120-60.22.61206JAN2025132C106_YoutubeLow
7C107GoogleHigh52004002.8250-30.67.71807JAN2025268C107_GoogleLow
8C108FacebookHigh61003003.2200-40.64.91708JAN2025217C108_FacebookLow
9C109AffiliateMedium45006001.5310-0.713.32009JAN2025330C109_AffiliateModerate
10C110BottrafficVery High1000090000.0150083.090.05010JAN20251550C110_BottrafficSevere
11C111GoogleMedium48001203.9160-56.32.51411JAN2025174C111_GoogleLow
12C112FacebookHigh51001803.5170-47.63.51512JAN2025185C112_FacebookLow
13C113YoutubeMedium43001403.8165-53.73.31413JAN2025179C113_YoutubeLow
14C114AffiliateMedium37005002.0300-10.613.52214JAN2025322C114_AffiliateModerate
15C115BottrafficVery High880084000.1110084.995.54015JAN20251140C115_BottrafficSevere
16C116InstagramMedium46001603.3175-43.63.51616JAN2025191C116_InstagramLow

Explanation of the Code

DATA ad_fraud_final

Creates the final analytical dataset that contains all cleaned and derived variables.

SET ad_fraud_clean

Reads observations from the cleaned dataset and allows us to add more derived variables.

LENGTH Fraud_Level $10 Utilization_Class $12

Defines the length of the character variables.

This is important because SAS assigns the length based on the first value encountered if not specified.

This classification evaluates campaign traffic utilization.

Click Range

Utilization

≥ 8000

Very High

5000–7999

High

3000–4999

Medium

< 3000

Low

This helps analysts identify campaign popularity and resource usage.

Final Dataset Variables

Variable

Description

Campaign_ID

Campaign identifier

Traffic_Source

Advertising platform

Clicks

Total campaign clicks

Invalid_Clicks

Fraud clicks

Conversion_Rate

Conversion percentage

Loss_Amount

Fraud loss

Fraud_Score

Fraud risk score

Percentage

Invalid click percentage

Fees

Platform fees

Total_Cost

Loss + fees

Campaign_Name

Combined campaign label

Fraud_Level

Fraud classification

Utilization_Class

Campaign utilization category

Campaign_Date

Campaign date


Key Skills Demonstrated in This Project

• DATA step programming
• Character data cleaning techniques
• Numeric calculations and derived metrics
• Date function usage
• Dataset combination methods
• SAS macro development

• Fraud detection logic implementation
• Dataset management and cleanup


20 Key Points About This Project:
  1. A simulated digital advertising fraud dataset was created to analyze campaign traffic behavior and identify fraudulent activity patterns.
  2. The dataset included key variables such as Campaign_ID, Traffic_Source, Clicks, Invalid_Clicks, Conversion_Rate, Loss_Amount, Fraud_Score, Percentage, Fees, Utilization_Class, and Campaign_Date.
  3. More than 15 observations were generated to simulate multiple advertising campaigns across different traffic sources.
  4. Intentional data issues and inconsistencies were introduced to simulate real-world raw marketing data problems.
  5. The raw dataset AD_FRAUD_RAW served as the starting point for the fraud analytics workflow.
  6. Data cleaning and standardization were performed using character functions such as STRIP, TRIM, PROPCASE, UPCASE, LOWCASE, CAT, and CATX.
  7. Numeric calculations were implemented using SUM and ROUND functions to derive campaign cost and fraud-related metrics.
  8. The invalid click percentage was calculated using the formula (Invalid_Clicks / Clicks) × 100.
  9. A Fraud_Score metric was derived to quantify the likelihood of fraudulent traffic within each campaign.
  10. Campaign date analysis was implemented using date functions MDY, INTCK, and INTNX to handle campaign timelines and date intervals.
  11. The SET statement was used to read and combine datasets vertically during data processing steps.
  12. Dataset integration techniques such as MERGE were used to combine campaign data with additional analytical variables.
  13. Additional observations could be appended using PROC APPEND, which efficiently adds new campaign records.
  14. The PROC TRANSPOSE procedure was demonstrated to restructure campaign metrics for analytical reporting.
  15. A reusable SAS macro was developed to automate fraud detection classification across campaign records.
  16. Campaigns were categorized into Fraud Levels (Low, Moderate, Severe) based on the percentage of invalid clicks.
  17. Campaign traffic was classified into Utilization Classes (Low, Medium, High, Very High) based on total click counts.
  18. Temporary or intermediate datasets were cleaned using PROC DATASETS DELETE to maintain a structured SAS working environment.
  19. The final analytical dataset AD_FRAUD_FINAL contained cleaned variables, calculated fraud metrics, and campaign classifications.
  20. The project demonstrates how SAS programming techniques can be used to detect digital advertising fraud, clean marketing datasets, and support campaign performance analytics.

Project Summary

This project demonstrates how SAS programming can be used to detect and correct errors in digital advertising fraud data.Digital advertising platforms generate massive volumes of campaign performance data every day. However, raw marketing data often contains inconsistencies, formatting issues, incorrect calculations, and missing fraud indicators. Before meaningful analysis can be performed, data must be cleaned and validated.

In this project, a simulated dataset representing digital advertising campaigns was created with variables such as campaign identifiers, traffic sources, click counts, invalid clicks, conversion rates, fraud scores, and financial loss estimates.To simulate real-world analytics scenarios, intentional errors were introduced into the dataset. These errors included inconsistent formatting of traffic source names, manually entered fraud percentages, and unverified fraud scores.

Using SAS DATA step programming, these issues were corrected through various techniques including character functions, numeric functions, and calculated variables.Several SAS character functions were applied to standardize and clean textual data. Functions such as STRIP, TRIM, PROPCASE, UPCASE, and LOWCASE were used to remove unnecessary spaces and convert text values into consistent formats.Numeric functions were used to calculate derived metrics such as fraud percentages, total campaign cost, and fraud scores. Functions like SUM and ROUND helped ensure accurate financial calculations.

The project also demonstrated the use of date manipulation functions such as MDY, INTCK, and INTNX. These functions are important when analyzing campaign timelines, calculating campaign durations, and scheduling future campaign evaluations.

Dataset manipulation techniques such as SET, MERGE, and PROC APPEND were used to combine multiple datasets and manage campaign data efficiently. The PROC TRANSPOSE procedure was used to restructure data for analytical reporting.A fraud detection macro was developed to automatically classify campaigns into fraud categories based on invalid click percentages. Macros allow SAS programmers to automate repetitive logic and build reusable fraud detection frameworks.

Finally, unnecessary temporary datasets were removed using PROC DATASETS DELETE, ensuring that the working library remained clean and organized.The final dataset produced in this project provides a structured and validated view of campaign fraud risks. Analysts can use this information to identify suspicious campaigns and prevent advertising budget losses.

Conclusion

Digital advertising fraud analytics requires strong data management and fraud detection techniques. In this project we demonstrated how SAS programming can be used to build and analyze a campaign dataset containing digital advertising performance metrics.

We created a dataset with more than fifteen observations representing advertising campaigns from multiple traffic sources. The dataset contained important variables such as clicks, invalid clicks, conversion rates, fraud scores, and loss amounts.

Intentional errors were introduced to simulate real-world data problems. These errors included inconsistent formatting, manually entered percentages, and improper fraud scoring. We then corrected these issues using SAS DATA step programming.

Key SAS techniques used in this project include:

·  Character functions for text cleaning

·  Numeric functions for calculations

·  Date functions for campaign timeline analysis

·  Dataset manipulation using SET, MERGE, and APPEND

·  Data restructuring using PROC TRANSPOSE

·  Fraud detection logic implemented through SAS macros

·  Dataset cleanup using PROC DATASETS

This type of analytics approach allows companies to detect suspicious campaign behavior, reduce financial losses, and improve digital marketing efficiency.

For SAS programmers preparing for analytics or fraud detection roles, projects like this demonstrate practical knowledge of data cleaning, dataset transformation, and automated fraud classification.

SAS INTERVIEW QUESTIONS

1. How do FIRST. and LAST. work?

Answer:

FIRST. and LAST. are temporary variables automatically created by SAS when using a BY statement after sorting data.

  • FIRST.variable = 1 for the first observation of each BY group
  • LAST.variable = 1 for the last observation of each BY group

Example: Used for group totals or identifying first/last records in a group.

2. Difference between CALL SYMPUT and CALL SYMPUTX?

Answer:

Feature

CALL SYMPUT

CALL SYMPUTX

Spaces

Keeps leading/trailing spaces

Removes spaces automatically

Type conversion

May cause numeric-to-character notes

Automatically converts values

Usage

Older method

Recommended method

Short: CALL SYMPUTX is an improved version of CALL SYMPUT.

3. Difference between INPUT and PUT functions?

Answer:

Function

Purpose

INPUT

Converts     character → numeric

PUT

Converts     numeric → character

Example:

  • INPUT(char_var,8.) → character to numeric
  • PUT(num_var,8.) → numeric to character.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent DIGITAL ADVERTISING data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

·  Clinical SAS Programmer

·  Research Data Analyst

·  Regulatory Data Validator

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.Which Country Truly Dominates the Olympics? – A Complete SAS Medal Efficiency Analytics Project

2.Which Airports Are Really the Busiest? – An End-to-End SAS Airport Traffic Analytics Project

3.Can Data Predict Election Outcomes? – A Complete SAS Voting Analytics Project

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy



Comments

Popular posts from this blog

412.Can We Build And Clean A University Course Analytics & Fraud Detection System In Sas Using Only Macros While Intentionally Creating And Fixing Errors?

420.Can We Detect Errors, Prevent Fraud, And Optimize Biometric Access System Security Using Advanced SAS Programming?

418.Can We Design, Debug, Detect Fraud, And Optimize A Smart Parking System Using Advanced SAS Programming Techniques?