Detect Fraud In A Content Moderation Analytics System Using Advanced SAS

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —
DATA STEP | PROC PRINT | MACROS | PROC APPEND | SET | PROC SORT | MERGE | PROC TRANSPOSE | PROC DATASETS DELETE | DATA FUNCTIONS

Introduction

In today’s digital era, content moderation plays a critical role in maintaining platform integrity, user trust, and regulatory compliance. Social media platforms, video-sharing websites, and online forums process millions of user-generated posts daily. As volumes grow, platforms rely on automated moderation systems combined with human reviewers.

However, analytics systems supporting moderation operations can contain data inconsistencies, fraud indicators, incorrect calculations, or programming errors. If not identified and corrected, these issues can lead to inaccurate reporting, unfair content blocking, increased operational costs, and regulatory risks.

In this project, we will:

Create a Content Moderation dataset
Introduce intentional programming errors
Identify and fix those errors
Apply SAS Macros for fraud detection logic
Use:

Date functions: MDY, INTCK, INTNX
Data combination: SET, MERGE, APPEND
PROC TRANSPOSE
Numeric & character functions
PROC DATASETS DELETE
Utilization calculations
Classification logic

Provide full explanation for every step

This explanation is written in a simple style so that even beginners can understand.

Table Of Contents

Business Context
Variables Definition
Step 1 – Create Dataset (With Intentional Errors)
Identify And Explain Errors
Corrected Full-Length Dataset Code
Character Functions Usage
Numeric Functions Usage
Date Functions (MDY, INTCK, INTNX)
Utilization Calculation
Fraud Detection Macro
SET vs MERGE vs APPEND
PROC TRANSPOSE
PROC DATASETS DELETE
Final Clean Dataset
Business Insights
20 Key Points About This Project
Summary
Conclusion

Business Context

Large content platforms like:

Facebook
YouTube
Instagram
X

receive:

Millions of user posts
Spam
Hate speech
Fraud attempts
Fake reporting manipulation

Analytics teams monitor:

Reports count
False positive rate
Moderator workload
Accuracy score
Review time
Fraud signals

Our system will simulate this scenario.

Variables In Our Dataset

Variable	Type	Description
Platform	Character	Social platform name
Content_Type	Character	Type of content
Reports_Count	Numeric	Number of reports
Review_Time	Numeric	Time in minutes
False_Positive_Rate	Numeric	% incorrect flags
Moderator_Load	Numeric	Cases handled
Accuracy_Score	Numeric	% correct decisions
Review_Date	Date	Date reviewed
Utilization	Numeric	Load efficiency
Risk_Level	Character	Fraud classification

1. INTENTIONALLY WRONG CODE

data content_raw;
    input Platform:$12. Content_Type $ Reports_Count Review_Time False_Positive_Rate
          Moderator_Load Accuracy_Score Month $ Day Year;
    format Review_Date date9.;
    Review_Date = mdy(Month, Day, Year);
    Utilization = Moderator_Load/Reports_Count*100;
    length Risk_Level $8.;
    if Accuracy_Score > 90 then Risk_Level = "Low";
    else if False_Positive_Rate > 20 then Risk_Level = "High";
datalines;
Facebook Text 120 15 5 200 96 1 12 2026
YouTube Video 300 25 10 400 92 2 15 2026
Instagram Image 50 10 25 60 85 3 10 2026
X Audio 500 40 30 600 70 4 5 2026
;
run;

proc print data=content_raw;
run;

OUTPUT:


Obs	Platform	Content_Type	Reports_Count	Review_Time	False_Positive_Rate	Moderator_Load	Accuracy_Score	Month	Day	Year	Review_Date	Utilization	Risk_Level
1	Facebook	Text	120	15	5	200	96	1	12	2026	12JAN2026	166.667	Low
2	YouTube	Video	300	25	10	400	92	2	15	2026	15FEB2026	133.333	Low
3	Instagram	Image	50	10	25	60	85	3	10	2026	10MAR2026	120.000	High
4	X	Audio	500	40	30	600	70	4	5	2026	05APR2026	120.000	High

ERRORS IN ABOVE CODE

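Error 1: Month defined as Character

Month $ reads Month as a character variable, but MDY() expects numeric arguments.
SAS converts the value automatically and writes a "Character values have been converted to numeric values" note to the log, which is why the date still appears in the output. Relying on that implicit conversion hides a type mistake.

Error 2: Utilization Division Risk

If Reports_Count = 0, the expression divides by zero. SAS sets Utilization to missing and writes a "Division by zero detected" note to the log.

Error 3: Risk_Level incomplete logic

If Accuracy_Score <= 90 and False_Positive_Rate <= 20, neither condition is true and Risk_Level is left blank. None of the four sample records hits this gap, but real data would.

Error 4: Incomplete LENGTH statement

Only Risk_Level has a LENGTH statement. Content_Type is read with a plain $, so it defaults to 8 bytes and longer values would be truncated. Platform is capped at 12 characters by its informat.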
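
One way to avoid the implicit conversion in Error 1, if the month really does arrive as text, is to convert it explicitly with the INPUT function before calling MDY(). The following is a minimal sketch; the variable name Month_Num and the sample values are assumptions for illustration and are not part of the original program.

```sas
/* Sketch only: Month_Num is a hypothetical helper variable */
data _null_;
    length Month $2;
    Month = '3';                            /* character value, as read by "Month $" */
    Month_Num = input(Month, 2.);           /* explicit character-to-numeric conversion */
    Review_Date = mdy(Month_Num, 10, 2026); /* MDY now receives numeric arguments */
    put Review_Date= date9.;                /* writes Review_Date=10MAR2026 to the log */
run;
```

The corrected program takes the simpler route of reading Month as numeric in the INPUT statement, which is preferable when the raw data allows it.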

2. CORRECTED FULL-LENGTH DATASET CODE

data content_moderation;
    /* Declare character lengths before INPUT so nothing is truncated */
    length Platform $20 Content_Type $20 Risk_Level $15 Classification $20;
    input Platform $ Content_Type $ Reports_Count Review_Time False_Positive_Rate
          Moderator_Load Accuracy_Score Month Day Year;
    format Review_Date date9. Next_Review date9.;
    Review_Date = mdy(Month, Day, Year);

    /* Date Calculations */
    /* Review_Age_Days is negative when Review_Date is later than the run date */
    Review_Age_Days = intck('day', Review_Date, today());
    Next_Review = intnx('month', Review_Date, 1, 'same');

    /* Character Cleaning */
    Platform = propcase(strip(Platform));
    Content_Type = upcase(trim(Content_Type));

    /* Utilization Calculation */
    if Reports_Count > 0 then
        Utilization = (Moderator_Load / Reports_Count) * 100;
    else Utilization = .;

    /* Risk Classification */
    /* Upper bound is 95 so that a score of exactly 94 is not left to fall through to High */
    if Accuracy_Score >= 95 and False_Positive_Rate < 10 then
        Risk_Level = "Low";
    else if 85 < Accuracy_Score < 95 then
        Risk_Level = "Medium";
    else Risk_Level = "High";

    /* Fraud Classification */
    if Reports_Count > 400 and Accuracy_Score < 80 then
        Classification = "Potential Fraud";
    else Classification = "Normal";
datalines;
Facebook Text 120 15 5 200 96 1 12 2026
YouTube Video 300 25 10 400 92 2 15 2026
Instagram Image 50 10 25 60 85 3 10 2026
X TEXT 500 40 30 600 70 4 5 2026
LinkedIn Text 80 20 8 100 90 5 8 2026
Reddit Text 450 35 28 500 75 6 14 2026
Snapchat Image 200 18 12 220 88 7 21 2026
TikTok Video 600 45 35 700 65 8 2 2026
Pinterest Image 150 12 7 170 93 9 18 2026
Quora Text 90 14 5 110 97 10 9 2026
Threads Text 250 22 15 300 89 11 11 2026
Discord IMAGE 310 28 18 350 87 12 4 2026
Telegram Text 420 30 20 450 82 1 25 2026
Medium Text 60 9 4 70 98 2 19 2026
Tumblr Image 180 16 11 210 91 3 30 2026
;
run;

proc print data=content_moderation;
run;

OUTPUT:


Obs	Platform	Content_Type	Risk_Level	Classification	Reports_Count	Review_Time	False_Positive_Rate	Moderator_Load	Accuracy_Score	Month	Day	Year	Review_Date	Next_Review	Review_Age_Days	Utilization
1	Facebook	TEXT	Low	Normal	120	15	5	200	96	1	12	2026	12JAN2026	12FEB2026	48	166.667
2	Youtube	VIDEO	Medium	Normal	300	25	10	400	92	2	15	2026	15FEB2026	15MAR2026	14	133.333
3	Instagram	IMAGE	High	Normal	50	10	25	60	85	3	10	2026	10MAR2026	10APR2026	-9	120.000
4	X	TEXT	High	Potential Fraud	500	40	30	600	70	4	5	2026	05APR2026	05MAY2026	-35	120.000
5	Linkedin	TEXT	Medium	Normal	80	20	8	100	90	5	8	2026	08MAY2026	08JUN2026	-68	125.000
6	Reddit	TEXT	High	Potential Fraud	450	35	28	500	75	6	14	2026	14JUN2026	14JUL2026	-105	111.111
7	Snapchat	IMAGE	Medium	Normal	200	18	12	220	88	7	21	2026	21JUL2026	21AUG2026	-142	110.000
8	Tiktok	VIDEO	High	Potential Fraud	600	45	35	700	65	8	2	2026	02AUG2026	02SEP2026	-154	116.667
9	Pinterest	IMAGE	Medium	Normal	150	12	7	170	93	9	18	2026	18SEP2026	18OCT2026	-201	113.333
10	Quora	TEXT	Low	Normal	90	14	5	110	97	10	9	2026	09OCT2026	09NOV2026	-222	122.222
11	Threads	TEXT	Medium	Normal	250	22	15	300	89	11	11	2026	11NOV2026	11DEC2026	-255	120.000
12	Discord	IMAGE	Medium	Normal	310	28	18	350	87	12	4	2026	04DEC2026	04JAN2027	-278	112.903
13	Telegram	TEXT	High	Normal	420	30	20	450	82	1	25	2026	25JAN2026	25FEB2026	35	107.143
14	Medium	TEXT	Low	Normal	60	9	4	70	98	2	19	2026	19FEB2026	19MAR2026	10	116.667
15	Tumblr	IMAGE	Medium	Normal	180	16	11	210	91	3	30	2026	30MAR2026	30APR2026	-29	116.667

EXPLANATION OF EACH CODE SECTION

LENGTH Statement

Declares character variable lengths before the INPUT statement so values are not truncated to the 8-byte default.
Important in production datasets.

MDY Function

Creates SAS date.
Needed for:

· Time analysis

· Aging reports

· SLA tracking

INTCK

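Counts the interval boundaries between two dates; with 'day' this is the difference in days. A negative Review_Age_Days in the output means Review_Date is later than the date the program was run.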

Used for:

· Pending case monitoring

· KPI tracking

INTNX

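Advances a date by a number of intervals. Here it moves Review_Date forward one month to schedule the next review, and 'same' keeps the same day of the month where possible.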

Used in:

· Recurring moderation audits
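
The behaviour of INTCK and INTNX is easiest to see with fixed dates. The sketch below uses a hard-coded as-of date instead of today() so the results do not change from run to run; the variable names and dates are chosen only for illustration.

```sas
/* Sketch: INTCK counts boundaries crossed, INTNX advances by intervals */
data _null_;
    review = mdy(1, 31, 2026);
    as_of  = mdy(3, 1, 2026);                        /* fixed date used in place of today() */

    age_days   = intck('day',   review, as_of);      /* 29 days */
    age_months = intck('month', review, as_of);      /* 2: crosses 01FEB and 01MAR */

    next_same  = intnx('month', review, 1, 'same');  /* 28FEB2026: day clamped to month end */
    next_begin = intnx('month', review, 1, 'b');     /* 01FEB2026: start of next month */

    put age_days= age_months= next_same= date9. next_begin= date9.;
run;
```

Note that the 'month' count is 2 even though only 29 days have passed, because INTCK counts boundaries by default rather than whole elapsed months.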

Character Functions

Function	Purpose
strip()	Removes leading/trailing blanks
trim()	Removes trailing blanks
propcase()	Proper case formatting (note that "YouTube" becomes "Youtube")
upcase()	Convert to uppercase
lowcase()	Convert to lowercase
cat()	Concatenate without separator
catx()	Concatenate with separator
coalescec()	First non-missing character value (COALESCE is the numeric version)
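
A short sketch of several of these functions working together follows. The variable names (raw, clean, label, region) and the sample strings are assumptions made for the example.

```sas
/* Sketch: cleaning and combining character values */
data _null_;
    length raw clean $20 label $40 region $10;
    raw    = '  youTube  ';
    clean  = propcase(strip(raw));                 /* Youtube */
    label  = catx(' - ', clean, upcase('video'));  /* Youtube - VIDEO */
    region = coalescec(' ', 'Unknown');            /* Unknown: a blank counts as missing */
    put clean= label= region=;
run;
```

CATX strips leading and trailing blanks from each argument before joining them with the separator, which is why it is usually preferred over CAT for building labels.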

Numeric Functions

· ROUND

· SUM

· MEAN

· MAX

· MIN

· COALESCE
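
The main practical difference between these functions and plain arithmetic is how they treat missing values. The following sketch uses made-up values to show it.

```sas
/* Sketch: numeric functions skip missing values, arithmetic operators do not */
data _null_;
    a = 10; b = .; c = 5;
    total   = sum(a, b, c);           /* 15 */
    plus    = a + b + c;              /* missing, because b is missing */
    average = mean(a, b, c);          /* 7.5, computed over the two non-missing values */
    highest = max(a, b, c);           /* 10 */
    lowest  = min(a, b, c);           /* 5 */
    filled  = coalesce(b, 0);         /* 0: first non-missing argument */
    rounded = round(166.6667, 0.01);  /* 166.67 */
    put total= plus= average= highest= lowest= filled= rounded=;
run;
```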

3. MACRO FOR FRAUD DETECTION

%macro fraud_check(data=, threshold=400);
    data fraud_flagged;
        set &data;
        if Reports_Count > &threshold and Accuracy_Score < 80 then Fraud_Flag = 1;
        else Fraud_Flag = 0;
    run;

    proc print data=fraud_flagged;
    run;
%mend;

%fraud_check(data=content_moderation, threshold=400);

OUTPUT:


Obs	Platform	Content_Type	Risk_Level	Classification	Reports_Count	Review_Time	False_Positive_Rate	Moderator_Load	Accuracy_Score	Month	Day	Year	Review_Date	Next_Review	Review_Age_Days	Utilization	Fraud_Flag
1	Facebook	TEXT	Low	Normal	120	15	5	200	96	1	12	2026	12JAN2026	12FEB2026	48	166.667	0
2	Youtube	VIDEO	Medium	Normal	300	25	10	400	92	2	15	2026	15FEB2026	15MAR2026	14	133.333	0
3	Instagram	IMAGE	High	Normal	50	10	25	60	85	3	10	2026	10MAR2026	10APR2026	-9	120.000	0
4	X	TEXT	High	Potential Fraud	500	40	30	600	70	4	5	2026	05APR2026	05MAY2026	-35	120.000	1
5	Linkedin	TEXT	Medium	Normal	80	20	8	100	90	5	8	2026	08MAY2026	08JUN2026	-68	125.000	0
6	Reddit	TEXT	High	Potential Fraud	450	35	28	500	75	6	14	2026	14JUN2026	14JUL2026	-105	111.111	1
7	Snapchat	IMAGE	Medium	Normal	200	18	12	220	88	7	21	2026	21JUL2026	21AUG2026	-142	110.000	0
8	Tiktok	VIDEO	High	Potential Fraud	600	45	35	700	65	8	2	2026	02AUG2026	02SEP2026	-154	116.667	1
9	Pinterest	IMAGE	Medium	Normal	150	12	7	170	93	9	18	2026	18SEP2026	18OCT2026	-201	113.333	0
10	Quora	TEXT	Low	Normal	90	14	5	110	97	10	9	2026	09OCT2026	09NOV2026	-222	122.222	0
11	Threads	TEXT	Medium	Normal	250	22	15	300	89	11	11	2026	11NOV2026	11DEC2026	-255	120.000	0
12	Discord	IMAGE	Medium	Normal	310	28	18	350	87	12	4	2026	04DEC2026	04JAN2027	-278	112.903	0
13	Telegram	TEXT	High	Normal	420	30	20	450	82	1	25	2026	25JAN2026	25FEB2026	35	107.143	0
14	Medium	TEXT	Low	Normal	60	9	4	70	98	2	19	2026	19FEB2026	19MAR2026	10	116.667	0
15	Tumblr	IMAGE	Medium	Normal	180	16	11	210	91	3	30	2026	30MAR2026	30APR2026	-29	116.667	0

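The macro approach is:

· Reusable: the same logic runs against any dataset passed through data=

· Parameter driven: threshold= changes the report cutoff without editing the code

· Scalable: further rules can be added in one place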
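
To illustrate the parameter-driven idea a little further, here is a hypothetical variant of the macro. The name fraud_check2 and the parameters out= and min_accuracy= are assumptions added for this sketch; they are not part of the original macro. The demo dataset copies three rows of values from the article's data.

```sas
/* Tiny demo input so the sketch runs on its own */
data demo;
    input Platform :$20. Reports_Count Accuracy_Score;
datalines;
Facebook 120 96
Telegram 420 82
Reddit 450 75
;
run;

/* Hypothetical variant: output name and accuracy cutoff are also parameters */
%macro fraud_check2(data=, out=fraud_flagged, threshold=400, min_accuracy=80);
    data &out;
        set &data;
        /* A logical expression evaluates to 1 (true) or 0 (false) */
        Fraud_Flag = (Reports_Count > &threshold and Accuracy_Score < &min_accuracy);
    run;
%mend fraud_check2;

%fraud_check2(data=demo, out=demo_default)
%fraud_check2(data=demo, out=demo_strict, min_accuracy=85)
```

With the default cutoff only Reddit is flagged; with min_accuracy=85 Telegram is flagged as well. Keep in mind that SAS treats a missing value as smaller than any number, so a missing Accuracy_Score would satisfy the accuracy condition in both the original macro and this variant.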

4. SET Statement

data combined;
    set content_moderation
        fraud_flagged;
run;

proc print data=combined;
run;

OUTPUT:


Obs	Platform	Content_Type	Risk_Level	Classification	Reports_Count	Review_Time	False_Positive_Rate	Moderator_Load	Accuracy_Score	Month	Day	Year	Review_Date	Next_Review	Review_Age_Days	Utilization	Fraud_Flag
1	Facebook	TEXT	Low	Normal	120	15	5	200	96	1	12	2026	12JAN2026	12FEB2026	48	166.667	.
2	Youtube	VIDEO	Medium	Normal	300	25	10	400	92	2	15	2026	15FEB2026	15MAR2026	14	133.333	.
3	Instagram	IMAGE	High	Normal	50	10	25	60	85	3	10	2026	10MAR2026	10APR2026	-9	120.000	.
4	X	TEXT	High	Potential Fraud	500	40	30	600	70	4	5	2026	05APR2026	05MAY2026	-35	120.000	.
5	Linkedin	TEXT	Medium	Normal	80	20	8	100	90	5	8	2026	08MAY2026	08JUN2026	-68	125.000	.
6	Reddit	TEXT	High	Potential Fraud	450	35	28	500	75	6	14	2026	14JUN2026	14JUL2026	-105	111.111	.
7	Snapchat	IMAGE	Medium	Normal	200	18	12	220	88	7	21	2026	21JUL2026	21AUG2026	-142	110.000	.
8	Tiktok	VIDEO	High	Potential Fraud	600	45	35	700	65	8	2	2026	02AUG2026	02SEP2026	-154	116.667	.
9	Pinterest	IMAGE	Medium	Normal	150	12	7	170	93	9	18	2026	18SEP2026	18OCT2026	-201	113.333	.
10	Quora	TEXT	Low	Normal	90	14	5	110	97	10	9	2026	09OCT2026	09NOV2026	-222	122.222	.
11	Threads	TEXT	Medium	Normal	250	22	15	300	89	11	11	2026	11NOV2026	11DEC2026	-255	120.000	.
12	Discord	IMAGE	Medium	Normal	310	28	18	350	87	12	4	2026	04DEC2026	04JAN2027	-278	112.903	.
13	Telegram	TEXT	High	Normal	420	30	20	450	82	1	25	2026	25JAN2026	25FEB2026	35	107.143	.
14	Medium	TEXT	Low	Normal	60	9	4	70	98	2	19	2026	19FEB2026	19MAR2026	10	116.667	.
15	Tumblr	IMAGE	Medium	Normal	180	16	11	210	91	3	30	2026	30MAR2026	30APR2026	-29	116.667	.
16	Facebook	TEXT	Low	Normal	120	15	5	200	96	1	12	2026	12JAN2026	12FEB2026	48	166.667	0
17	Youtube	VIDEO	Medium	Normal	300	25	10	400	92	2	15	2026	15FEB2026	15MAR2026	14	133.333	0
18	Instagram	IMAGE	High	Normal	50	10	25	60	85	3	10	2026	10MAR2026	10APR2026	-9	120.000	0
19	X	TEXT	High	Potential Fraud	500	40	30	600	70	4	5	2026	05APR2026	05MAY2026	-35	120.000	1
20	Linkedin	TEXT	Medium	Normal	80	20	8	100	90	5	8	2026	08MAY2026	08JUN2026	-68	125.000	0
21	Reddit	TEXT	High	Potential Fraud	450	35	28	500	75	6	14	2026	14JUN2026	14JUL2026	-105	111.111	1
22	Snapchat	IMAGE	Medium	Normal	200	18	12	220	88	7	21	2026	21JUL2026	21AUG2026	-142	110.000	0
23	Tiktok	VIDEO	High	Potential Fraud	600	45	35	700	65	8	2	2026	02AUG2026	02SEP2026	-154	116.667	1
24	Pinterest	IMAGE	Medium	Normal	150	12	7	170	93	9	18	2026	18SEP2026	18OCT2026	-201	113.333	0
25	Quora	TEXT	Low	Normal	90	14	5	110	97	10	9	2026	09OCT2026	09NOV2026	-222	122.222	0
26	Threads	TEXT	Medium	Normal	250	22	15	300	89	11	11	2026	11NOV2026	11DEC2026	-255	120.000	0
27	Discord	IMAGE	Medium	Normal	310	28	18	350	87	12	4	2026	04DEC2026	04JAN2027	-278	112.903	0
28	Telegram	TEXT	High	Normal	420	30	20	450	82	1	25	2026	25JAN2026	25FEB2026	35	107.143	0
29	Medium	TEXT	Low	Normal	60	9	4	70	98	2	19	2026	19FEB2026	19MAR2026	10	116.667	0
30	Tumblr	IMAGE	Medium	Normal	180	16	11	210	91	3	30	2026	30MAR2026	30APR2026	-29	116.667	0

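Used to stack datasets vertically. Rows 1 to 15 come from content_moderation, which has no Fraud_Flag variable, so Fraud_Flag is missing (.) for them; rows 16 to 30 come from fraud_flagged.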
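
When datasets are stacked, it is often useful to record which input each row came from. One way to do that is the IN= dataset option, sketched here with two small made-up datasets (batch_a, batch_b) and a hypothetical Source variable.

```sas
data batch_a;
    input Platform :$20. Reports_Count;
datalines;
Facebook 120
Quora 90
;
run;

data batch_b;
    input Platform :$20. Reports_Count Fraud_Flag;
datalines;
Reddit 450 1
;
run;

data stacked;
    length Source $8;
    /* from_a is a temporary 1/0 variable that is not written to the output */
    set batch_a(in=from_a) batch_b;
    if from_a then Source = 'batch_a';
    else Source = 'batch_b';
run;

proc print data=stacked;
run;
```

The result has three rows; Fraud_Flag is missing for the two rows that came from batch_a.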

5. MERGE Statement

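MERGE with a BY statement requires both inputs to be sorted by the BY variable, so each dataset is sorted by Platform first.

proc sort data=content_moderation;
    by Platform;
run;

proc print data=content_moderation;
run;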

OUTPUT:


Obs	Platform	Content_Type	Risk_Level	Classification	Reports_Count	Review_Time	False_Positive_Rate	Moderator_Load	Accuracy_Score	Month	Day	Year	Review_Date	Next_Review	Review_Age_Days	Utilization
1	Discord	IMAGE	Medium	Normal	310	28	18	350	87	12	4	2026	04DEC2026	04JAN2027	-278	112.903
2	Facebook	TEXT	Low	Normal	120	15	5	200	96	1	12	2026	12JAN2026	12FEB2026	48	166.667
3	Instagram	IMAGE	High	Normal	50	10	25	60	85	3	10	2026	10MAR2026	10APR2026	-9	120.000
4	Linkedin	TEXT	Medium	Normal	80	20	8	100	90	5	8	2026	08MAY2026	08JUN2026	-68	125.000
5	Medium	TEXT	Low	Normal	60	9	4	70	98	2	19	2026	19FEB2026	19MAR2026	10	116.667
6	Pinterest	IMAGE	Medium	Normal	150	12	7	170	93	9	18	2026	18SEP2026	18OCT2026	-201	113.333
7	Quora	TEXT	Low	Normal	90	14	5	110	97	10	9	2026	09OCT2026	09NOV2026	-222	122.222
8	Reddit	TEXT	High	Potential Fraud	450	35	28	500	75	6	14	2026	14JUN2026	14JUL2026	-105	111.111
9	Snapchat	IMAGE	Medium	Normal	200	18	12	220	88	7	21	2026	21JUL2026	21AUG2026	-142	110.000
10	Telegram	TEXT	High	Normal	420	30	20	450	82	1	25	2026	25JAN2026	25FEB2026	35	107.143
11	Threads	TEXT	Medium	Normal	250	22	15	300	89	11	11	2026	11NOV2026	11DEC2026	-255	120.000
12	Tiktok	VIDEO	High	Potential Fraud	600	45	35	700	65	8	2	2026	02AUG2026	02SEP2026	-154	116.667
13	Tumblr	IMAGE	Medium	Normal	180	16	11	210	91	3	30	2026	30MAR2026	30APR2026	-29	116.667
14	X	TEXT	High	Potential Fraud	500	40	30	600	70	4	5	2026	05APR2026	05MAY2026	-35	120.000
15	Youtube	VIDEO	Medium	Normal	300	25	10	400	92	2	15	2026	15FEB2026	15MAR2026	14	133.333

proc sort data=fraud_flagged;
    by Platform;
run;

proc print data=fraud_flagged;
run;

OUTPUT:


Obs	Platform	Content_Type	Risk_Level	Classification	Reports_Count	Review_Time	False_Positive_Rate	Moderator_Load	Accuracy_Score	Month	Day	Year	Review_Date	Next_Review	Review_Age_Days	Utilization	Fraud_Flag
1	Discord	IMAGE	Medium	Normal	310	28	18	350	87	12	4	2026	04DEC2026	04JAN2027	-278	112.903	0
2	Facebook	TEXT	Low	Normal	120	15	5	200	96	1	12	2026	12JAN2026	12FEB2026	48	166.667	0
3	Instagram	IMAGE	High	Normal	50	10	25	60	85	3	10	2026	10MAR2026	10APR2026	-9	120.000	0
4	Linkedin	TEXT	Medium	Normal	80	20	8	100	90	5	8	2026	08MAY2026	08JUN2026	-68	125.000	0
5	Medium	TEXT	Low	Normal	60	9	4	70	98	2	19	2026	19FEB2026	19MAR2026	10	116.667	0
6	Pinterest	IMAGE	Medium	Normal	150	12	7	170	93	9	18	2026	18SEP2026	18OCT2026	-201	113.333	0
7	Quora	TEXT	Low	Normal	90	14	5	110	97	10	9	2026	09OCT2026	09NOV2026	-222	122.222	0
8	Reddit	TEXT	High	Potential Fraud	450	35	28	500	75	6	14	2026	14JUN2026	14JUL2026	-105	111.111	1
9	Snapchat	IMAGE	Medium	Normal	200	18	12	220	88	7	21	2026	21JUL2026	21AUG2026	-142	110.000	0
10	Telegram	TEXT	High	Normal	420	30	20	450	82	1	25	2026	25JAN2026	25FEB2026	35	107.143	0
11	Threads	TEXT	Medium	Normal	250	22	15	300	89	11	11	2026	11NOV2026	11DEC2026	-255	120.000	0
12	Tiktok	VIDEO	High	Potential Fraud	600	45	35	700	65	8	2	2026	02AUG2026	02SEP2026	-154	116.667	1
13	Tumblr	IMAGE	Medium	Normal	180	16	11	210	91	3	30	2026	30MAR2026	30APR2026	-29	116.667	0
14	X	TEXT	High	Potential Fraud	500	40	30	600	70	4	5	2026	05APR2026	05MAY2026	-35	120.000	1
15	Youtube	VIDEO	Medium	Normal	300	25	10	400	92	2	15	2026	15FEB2026	15MAR2026	14	133.333	0

data merged_data;
    merge content_moderation
          fraud_flagged;
    by Platform;
run;

proc print data=merged_data;
run;

OUTPUT:


Obs	Platform	Content_Type	Risk_Level	Classification	Reports_Count	Review_Time	False_Positive_Rate	Moderator_Load	Accuracy_Score	Month	Day	Year	Review_Date	Next_Review	Review_Age_Days	Utilization	Fraud_Flag
1	Discord	IMAGE	Medium	Normal	310	28	18	350	87	12	4	2026	04DEC2026	04JAN2027	-278	112.903	0
2	Facebook	TEXT	Low	Normal	120	15	5	200	96	1	12	2026	12JAN2026	12FEB2026	48	166.667	0
3	Instagram	IMAGE	High	Normal	50	10	25	60	85	3	10	2026	10MAR2026	10APR2026	-9	120.000	0
4	Linkedin	TEXT	Medium	Normal	80	20	8	100	90	5	8	2026	08MAY2026	08JUN2026	-68	125.000	0
5	Medium	TEXT	Low	Normal	60	9	4	70	98	2	19	2026	19FEB2026	19MAR2026	10	116.667	0
6	Pinterest	IMAGE	Medium	Normal	150	12	7	170	93	9	18	2026	18SEP2026	18OCT2026	-201	113.333	0
7	Quora	TEXT	Low	Normal	90	14	5	110	97	10	9	2026	09OCT2026	09NOV2026	-222	122.222	0
8	Reddit	TEXT	High	Potential Fraud	450	35	28	500	75	6	14	2026	14JUN2026	14JUL2026	-105	111.111	1
9	Snapchat	IMAGE	Medium	Normal	200	18	12	220	88	7	21	2026	21JUL2026	21AUG2026	-142	110.000	0
10	Telegram	TEXT	High	Normal	420	30	20	450	82	1	25	2026	25JAN2026	25FEB2026	35	107.143	0
11	Threads	TEXT	Medium	Normal	250	22	15	300	89	11	11	2026	11NOV2026	11DEC2026	-255	120.000	0
12	Tiktok	VIDEO	High	Potential Fraud	600	45	35	700	65	8	2	2026	02AUG2026	02SEP2026	-154	116.667	1
13	Tumblr	IMAGE	Medium	Normal	180	16	11	210	91	3	30	2026	30MAR2026	30APR2026	-29	116.667	0
14	X	TEXT	High	Potential Fraud	500	40	30	600	70	4	5	2026	05APR2026	05MAY2026	-35	120.000	1
15	Youtube	VIDEO	Medium	Normal	300	25	10	400	92	2	15	2026	15FEB2026	15MAR2026	14	133.333	0
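
In the merge above both datasets contain the same platforms, so every row matches. When that is not guaranteed, the IN= option can show which rows found a partner. The sketch below uses small made-up datasets (metrics, flags) and a hypothetical Has_Flag variable.

```sas
data metrics;
    input Platform :$20. Reports_Count;
datalines;
Reddit 450
Facebook 120
Quora 90
;
run;

data flags;
    input Platform :$20. Fraud_Flag;
datalines;
Reddit 1
Facebook 0
;
run;

proc sort data=metrics; by Platform; run;
proc sort data=flags;   by Platform; run;

data metrics_flagged;
    merge metrics(in=in_metrics) flags(in=in_flags);
    by Platform;
    if in_metrics;        /* keep every row that exists in metrics */
    Has_Flag = in_flags;  /* 1 when a matching row exists in flags, else 0 */
run;

proc print data=metrics_flagged;
run;
```

Quora has no row in flags, so it is kept with a missing Fraud_Flag and Has_Flag = 0.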

6. PROC APPEND

proc append base=content_moderation
            data=fraud_flagged force;
run;

proc print data=content_moderation;
run;

OUTPUT:


Obs	Platform	Content_Type	Risk_Level	Classification	Reports_Count	Review_Time	False_Positive_Rate	Moderator_Load	Accuracy_Score	Month	Day	Year	Review_Date	Next_Review	Review_Age_Days	Utilization
1	Discord	IMAGE	Medium	Normal	310	28	18	350	87	12	4	2026	04DEC2026	04JAN2027	-278	112.903
2	Facebook	TEXT	Low	Normal	120	15	5	200	96	1	12	2026	12JAN2026	12FEB2026	48	166.667
3	Instagram	IMAGE	High	Normal	50	10	25	60	85	3	10	2026	10MAR2026	10APR2026	-9	120.000
4	Linkedin	TEXT	Medium	Normal	80	20	8	100	90	5	8	2026	08MAY2026	08JUN2026	-68	125.000
5	Medium	TEXT	Low	Normal	60	9	4	70	98	2	19	2026	19FEB2026	19MAR2026	10	116.667
6	Pinterest	IMAGE	Medium	Normal	150	12	7	170	93	9	18	2026	18SEP2026	18OCT2026	-201	113.333
7	Quora	TEXT	Low	Normal	90	14	5	110	97	10	9	2026	09OCT2026	09NOV2026	-222	122.222
8	Reddit	TEXT	High	Potential Fraud	450	35	28	500	75	6	14	2026	14JUN2026	14JUL2026	-105	111.111
9	Snapchat	IMAGE	Medium	Normal	200	18	12	220	88	7	21	2026	21JUL2026	21AUG2026	-142	110.000
10	Telegram	TEXT	High	Normal	420	30	20	450	82	1	25	2026	25JAN2026	25FEB2026	35	107.143
11	Threads	TEXT	Medium	Normal	250	22	15	300	89	11	11	2026	11NOV2026	11DEC2026	-255	120.000
12	Tiktok	VIDEO	High	Potential Fraud	600	45	35	700	65	8	2	2026	02AUG2026	02SEP2026	-154	116.667
13	Tumblr	IMAGE	Medium	Normal	180	16	11	210	91	3	30	2026	30MAR2026	30APR2026	-29	116.667
14	X	TEXT	High	Potential Fraud	500	40	30	600	70	4	5	2026	05APR2026	05MAY2026	-35	120.000
15	Youtube	VIDEO	Medium	Normal	300	25	10	400	92	2	15	2026	15FEB2026	15MAR2026	14	133.333
16	Discord	IMAGE	Medium	Normal	310	28	18	350	87	12	4	2026	04DEC2026	04JAN2027	-278	112.903
17	Facebook	TEXT	Low	Normal	120	15	5	200	96	1	12	2026	12JAN2026	12FEB2026	48	166.667
18	Instagram	IMAGE	High	Normal	50	10	25	60	85	3	10	2026	10MAR2026	10APR2026	-9	120.000
19	Linkedin	TEXT	Medium	Normal	80	20	8	100	90	5	8	2026	08MAY2026	08JUN2026	-68	125.000
20	Medium	TEXT	Low	Normal	60	9	4	70	98	2	19	2026	19FEB2026	19MAR2026	10	116.667
21	Pinterest	IMAGE	Medium	Normal	150	12	7	170	93	9	18	2026	18SEP2026	18OCT2026	-201	113.333
22	Quora	TEXT	Low	Normal	90	14	5	110	97	10	9	2026	09OCT2026	09NOV2026	-222	122.222
23	Reddit	TEXT	High	Potential Fraud	450	35	28	500	75	6	14	2026	14JUN2026	14JUL2026	-105	111.111
24	Snapchat	IMAGE	Medium	Normal	200	18	12	220	88	7	21	2026	21JUL2026	21AUG2026	-142	110.000
25	Telegram	TEXT	High	Normal	420	30	20	450	82	1	25	2026	25JAN2026	25FEB2026	35	107.143
26	Threads	TEXT	Medium	Normal	250	22	15	300	89	11	11	2026	11NOV2026	11DEC2026	-255	120.000
27	Tiktok	VIDEO	High	Potential Fraud	600	45	35	700	65	8	2	2026	02AUG2026	02SEP2026	-154	116.667
28	Tumblr	IMAGE	Medium	Normal	180	16	11	210	91	3	30	2026	30MAR2026	30APR2026	-29	116.667
29	X	TEXT	High	Potential Fraud	500	40	30	600	70	4	5	2026	05APR2026	05MAY2026	-35	120.000
30	Youtube	VIDEO	Medium	Normal	300	25	10	400	92	2	15	2026	15FEB2026	15MAR2026	14	133.333

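PROC APPEND adds the observations of fraud_flagged to the end of content_moderation without rereading the base dataset. FORCE is required because fraud_flagged has a variable (Fraud_Flag) that the base dataset lacks; that variable is dropped, which is why the output has no Fraud_Flag column. After this step content_moderation holds 30 rows, with each platform appearing twice.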
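
Because PROC APPEND changes the base dataset in place, one cautious pattern is to append to a copy. The sketch below uses small made-up datasets (base_demo, new_rows, history) to show this and the effect of FORCE.

```sas
data base_demo;
    input Platform :$20. Reports_Count;
datalines;
Facebook 120
;
run;

data new_rows;
    input Platform :$20. Reports_Count Fraud_Flag;
datalines;
Reddit 450 1
;
run;

/* Work on a copy so base_demo itself is left unchanged */
data history;
    set base_demo;
run;

/* FORCE is needed because new_rows has Fraud_Flag and history does not.
   Fraud_Flag is dropped and a warning is written to the log. */
proc append base=history data=new_rows force;
run;

proc print data=history;
run;
```

history ends up with two rows and two variables (Platform and Reports_Count), while base_demo still has its single row.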

7. PROC TRANSPOSE

proc transpose data=content_moderation out=transposed_data;
    by Platform NotSorted;
    var Reports_Count Accuracy_Score;
run;

proc print data=transposed_data;
run;

OUTPUT:


Obs	Platform	_NAME_	COL1
1	Discord	Reports_Count	310
2	Discord	Accuracy_Score	87
3	Facebook	Reports_Count	120
4	Facebook	Accuracy_Score	96
5	Instagram	Reports_Count	50
6	Instagram	Accuracy_Score	85
7	Linkedin	Reports_Count	80
8	Linkedin	Accuracy_Score	90
9	Medium	Reports_Count	60
10	Medium	Accuracy_Score	98
11	Pinterest	Reports_Count	150
12	Pinterest	Accuracy_Score	93
13	Quora	Reports_Count	90
14	Quora	Accuracy_Score	97
15	Reddit	Reports_Count	450
16	Reddit	Accuracy_Score	75
17	Snapchat	Reports_Count	200
18	Snapchat	Accuracy_Score	88
19	Telegram	Reports_Count	420
20	Telegram	Accuracy_Score	82
21	Threads	Reports_Count	250
22	Threads	Accuracy_Score	89
23	Tiktok	Reports_Count	600
24	Tiktok	Accuracy_Score	65
25	Tumblr	Reports_Count	180
26	Tumblr	Accuracy_Score	91
27	X	Reports_Count	500
28	X	Accuracy_Score	70
29	Youtube	Reports_Count	300
30	Youtube	Accuracy_Score	92
31	Discord	Reports_Count	310
32	Discord	Accuracy_Score	87
33	Facebook	Reports_Count	120
34	Facebook	Accuracy_Score	96
35	Instagram	Reports_Count	50
36	Instagram	Accuracy_Score	85
37	Linkedin	Reports_Count	80
38	Linkedin	Accuracy_Score	90
39	Medium	Reports_Count	60
40	Medium	Accuracy_Score	98
41	Pinterest	Reports_Count	150
42	Pinterest	Accuracy_Score	93
43	Quora	Reports_Count	90
44	Quora	Accuracy_Score	97
45	Reddit	Reports_Count	450
46	Reddit	Accuracy_Score	75
47	Snapchat	Reports_Count	200
48	Snapchat	Accuracy_Score	88
49	Telegram	Reports_Count	420
50	Telegram	Accuracy_Score	82
51	Threads	Reports_Count	250
52	Threads	Accuracy_Score	89
53	Tiktok	Reports_Count	600
54	Tiktok	Accuracy_Score	65
55	Tumblr	Reports_Count	180
56	Tumblr	Accuracy_Score	91
57	X	Reports_Count	500
58	X	Accuracy_Score	70
59	Youtube	Reports_Count	300
60	Youtube	Accuracy_Score	92

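PROC TRANSPOSE pivots data between rows and columns. Here each BY group holds one observation, so the two VAR variables become two rows per platform (identified by _NAME_, with the value in COL1). Because content_moderation contains 30 rows after PROC APPEND and is no longer sorted by Platform as a whole, NotSorted is used and every platform appears twice, giving 60 rows.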
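
The opposite direction, turning rows into columns, is the more common use. As a sketch, the long-format dataset below (long_metrics, a made-up name with values taken from the article's data) is pivoted so that each metric becomes its own column through the ID statement.

```sas
data long_metrics;
    input Platform :$20. Metric :$32. Value;
datalines;
Facebook Reports_Count 120
Facebook Accuracy_Score 96
Reddit Reports_Count 450
Reddit Accuracy_Score 75
;
run;

/* ID names the new columns from the values of Metric */
proc transpose data=long_metrics out=wide_metrics(drop=_name_);
    by Platform;
    id Metric;
    var Value;
run;

proc print data=wide_metrics;
run;
```

wide_metrics has one row per platform with the columns Platform, Reports_Count, and Accuracy_Score.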

8. PROC DATASETS DELETE

proc datasets library=work;
    delete content_raw transposed_data;
quit;

LOG:

 NOTE: Deleting WORK.CONTENT_RAW (memtype=DATA).

 NOTE: Deleting WORK.TRANSPOSED_DATA (memtype=DATA).

Deletes unwanted datasets.

Used to clean workspace.
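
A small variation, sketched here with throwaway datasets named temp_a and temp_b, adds the NOLIST option to suppress the directory listing and then confirms the deletion with the EXIST function.

```sas
/* Create two throwaway datasets for the demonstration */
data temp_a temp_b;
    x = 1;
run;

proc datasets library=work nolist;
    delete temp_a temp_b;
quit;

/* EXIST returns 1 if the dataset exists and 0 if it does not */
%put temp_a exists: %sysfunc(exist(work.temp_a));
```

After the delete, the %PUT line writes "temp_a exists: 0" to the log.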

UTILIZATION ANALYSIS

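Utilization = Moderator_Load / Reports_Count * 100

> 100% → overload

80–100% → optimal

<50% → underutilized

Every platform in the sample data is above 100% (107.143 to 166.667), so all 15 fall in the overload band.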
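
These bands can be turned into a classification variable. The sketch below assumes that anything above 100% counts as overload and labels the range the article does not define (50% up to 80%) as "Unclassified". The dataset name util_bands, the variable Band, and the DemoA to DemoC rows are made up for illustration.

```sas
data util_bands;
    length Band $14;
    input Platform :$20. Moderator_Load Reports_Count;

    /* Guard against division by zero, as in the corrected program */
    if Reports_Count > 0 then Utilization = Moderator_Load / Reports_Count * 100;
    else Utilization = .;

    if missing(Utilization) then Band = 'Unknown';
    else if Utilization > 100 then Band = 'Overload';
    else if Utilization >= 80 then Band = 'Optimal';
    else if Utilization < 50 then Band = 'Underutilized';
    else Band = 'Unclassified';   /* 50 to under 80 is not defined in the article */
datalines;
Facebook 200 120
DemoA 90 100
DemoB 40 100
DemoC 0 0
;
run;

proc print data=util_bands;
run;
```

Facebook is classified as Overload, DemoA as Optimal, DemoB as Underutilized, and DemoC as Unknown because its Reports_Count is zero.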

FRAUD DETECTION LOGIC EXPLAINED

Fraud signals include:

1. High reports

2. Low accuracy

3. High false positive rate

4. Abnormal utilization

5. Repeated review spikes

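The fraud_check macro automates the first two checks (high reports and low accuracy); the remaining signals would need additional rules.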
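
One possible way to combine several of these signals is a simple count of how many are triggered per record. The sketch below is only an illustration: the cutoffs of 400 reports and 80 accuracy come from the macro, while the cutoffs for false positive rate (20) and utilization (150), the dataset name fraud_scored, and the variable Fraud_Score are assumptions. The fifth signal, repeated review spikes, needs history over time and is left out.

```sas
data fraud_scored;
    input Platform :$20. Reports_Count Accuracy_Score False_Positive_Rate Utilization;

    /* Each comparison is 1 when true and 0 when false, so SUM counts triggered signals */
    Fraud_Score = sum(Reports_Count > 400,
                      Accuracy_Score < 80,
                      False_Positive_Rate > 20,   /* assumed cutoff */
                      Utilization > 150);         /* assumed cutoff */
datalines;
Facebook 120 96 5 166.667
Reddit 450 75 28 111.111
Telegram 420 82 20 107.143
;
run;

proc print data=fraud_scored;
run;
```

With these rows Facebook scores 1 (utilization only), Reddit scores 3, and Telegram scores 1 (reports only). A missing Accuracy_Score would count as a triggered signal because SAS orders missing values below all numbers, so production rules would usually test for missing values first.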

FINAL CLEAN DATASET READY FOR:

· KPI dashboards

· SLA monitoring

· Fraud analytics

· Moderator workload balancing

· Monthly trend analysis

BUSINESS INSIGHTS FROM THIS PROJECT

· Identify overloaded moderators

· Detect fake reporting attacks

· Improve accuracy tracking

· Monitor false positives

· Optimize staffing

· Schedule next review automatically

· Standardize data cleaning

· Automate fraud classification

· Improve reporting transparency

· Reduce compliance risk

20 Key Points About This Project

· This project simulates a real-world content moderation analytics system using structured SAS programming techniques.

· It creates a dataset with 15 observations including operational, quality, and fraud-related variables.

· The dataset includes key metrics such as Reports_Count, Review_Time, False_Positive_Rate, Moderator_Load, and Accuracy_Score.

· Date variables are generated using the MDY function to ensure proper SAS date formatting.

· Time-based monitoring is performed using INTCK to calculate review aging in days.

· Future review scheduling is automated using the INTNX function.

· Character standardization is implemented using the STRIP, TRIM, PROPCASE, and UPCASE functions.

· Numeric stability is ensured by handling division-by-zero scenarios in utilization calculations.

· Moderator utilization percentage is calculated to measure workload efficiency.

· Risk classification logic is applied based on accuracy and false positive thresholds.

· A macro-driven fraud detection rule identifies suspicious patterns dynamically.

· The macro is parameterized to allow reusable fraud threshold tuning.

· SET statements are used for vertical stacking of datasets.

· MERGE statements are applied to combine datasets based on common keys.

· PROC APPEND is utilized for efficient dataset extension without full rewrite.

· PROC TRANSPOSE is used to restructure metrics for analytical flexibility.

· PROC DATASETS DELETE cleans temporary datasets to maintain workspace efficiency.

· Intentional coding errors are introduced and corrected to demonstrate debugging capability.

· The project demonstrates both operational analytics and fraud monitoring integration.

· The final output produces a clean, standardized, scalable moderation analytics dataset ready for reporting and decision-making.

Summary:

This project demonstrates how to design, debug, and optimize a Content Moderation Analytics and Fraud Detection system using SAS in a practical and structured way. We began by creating a dataset with 15 observations that represents real-world moderation metrics such as reports count, review time, false positive rate, moderator workload, and accuracy score. Intentional programming errors were introduced to simulate real development challenges, and each error was identified and corrected.

We applied important SAS concepts including MDY for date creation, INTCK and INTNX for time calculations, character functions like STRIP and PROPCASE for data cleaning, and numeric logic for utilization calculations. A reusable macro was developed to detect potential fraud based on configurable thresholds. Additionally, we demonstrated dataset management techniques such as SET, MERGE, APPEND, PROC TRANSPOSE, and PROC DATASETS DELETE.

Conclusion:

In conclusion, this project mirrors the structure of a production analytics workflow. It integrates data cleaning, classification, fraud detection, and performance monitoring into a single structured system. Such an approach improves operational efficiency, supports data accuracy, and helps platforms make informed decisions while maintaining trust and compliance.

We successfully built a complete Content Moderation Analytics System in SAS while intentionally introducing and correcting multiple programming errors.

We demonstrated:

Dataset creation with 15 observations
Error debugging
Date functions (MDY, INTCK, INTNX)
Character & numeric functions
Utilization metrics
Fraud detection macro
SET, MERGE, APPEND
PROC TRANSPOSE
PROC DATASETS DELETE

This project simulates a real-world analytics workflow similar to those used by global platforms like Facebook, YouTube, Instagram, and X.

SAS INTERVIEW QUESTIONS

1. Difference between COALESCE and CASE?

2. How do you delete a dataset?

3. How do you count records?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.

Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent real content moderation data.

Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.

This project is suitable for:
·  Students learning SAS
·  Data analysts building portfolios
·  Professionals preparing for SAS interviews
·  Bloggers writing about analytics
·  Clinical SAS Programmer
·  Research Data Analyst
·  Regulatory Data Validator
