415. Can We Build, Debug, And Detect Fraud In A Content Moderation Analytics System Using Advanced SAS Programming With Intentional Errors And Corrections?
HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS:
DATA STEP | PROC PRINT | MACROS | PROC APPEND | SET | PROC SORT | MERGE | PROC TRANSPOSE | PROC DATASETS DELETE | DATA FUNCTIONS
Introduction
In today's digital era, content moderation plays a critical role in maintaining platform integrity, user trust, and regulatory compliance. Social media platforms, video-sharing websites, and online forums process millions of user-generated posts daily. As volumes grow, platforms rely on automated moderation systems combined with human reviewers.
However, the analytics systems that support moderation operations can contain data inconsistencies, fraud indicators, incorrect calculations, or programming errors. If these issues are not identified and corrected, they can lead to inaccurate reporting, unfair content blocking, increased operational costs, and regulatory risk.
In this project, we will:
- Create a Content Moderation dataset
- Introduce intentional programming errors
- Identify and fix those errors
- Apply SAS macros for fraud detection logic
- Use:
  - Date functions: MDY, INTCK, INTNX
  - Data combination: SET, MERGE, APPEND
  - PROC TRANSPOSE
  - Numeric and character functions
  - PROC DATASETS DELETE
  - Utilization calculations
  - Classification logic
- Explain every step
This explanation is written in a simple style so that even beginners can follow it.
Table Of Contents
- Business Context
- Variables Definition
- Step 1 – Create Dataset (With Intentional Errors)
- Identify And Explain Errors
- Corrected Full-Length Dataset Code
- Character Functions Usage
- Numeric Functions Usage
- Date Functions (MDY, INTCK, INTNX)
- Utilization Calculation
- Fraud Detection Macro
- SET vs MERGE vs APPEND
- PROC TRANSPOSE
- PROC DATASETS DELETE
- Final Clean Dataset
- Business Insights
- 20 Key Points About This Project
- Summary
- Conclusion
Business Context
Large content platforms like:
- Facebook
- YouTube
- Instagram
- X
receive:
- Millions of user posts
- Spam
- Hate speech
- Fraud attempts
- Fake reporting manipulation
Analytics teams monitor:
- Reports count
- False positive rate
- Moderator workload
- Accuracy score
- Review time
- Fraud signals
Our system will simulate this scenario.
Variables In Our Dataset
| Variable | Type | Description |
|---|---|---|
| Platform | Character | Social platform name |
| Content_Type | Character | Type of content |
| Reports_Count | Numeric | Number of reports |
| Review_Time | Numeric | Time in minutes |
| False_Positive_Rate | Numeric | % incorrect flags |
| Moderator_Load | Numeric | Cases handled |
| Accuracy_Score | Numeric | % correct decisions |
| Review_Date | Date | Date reviewed |
| Utilization | Numeric | Load efficiency |
| Risk_Level | Character | Fraud classification |
1. INTENTIONALLY WRONG CODE
data content_raw;
input Platform:$12. Content_Type $ Reports_Count Review_Time False_Positive_Rate
Moderator_Load Accuracy_Score Month $ Day Year;
format Review_Date date9.;
Review_Date = mdy(Month, Day, Year);
Utilization = Moderator_Load/Reports_Count*100;
length Risk_Level $8.;
if Accuracy_Score > 90 then Risk_Level = "Low";
else if False_Positive_Rate > 20 then Risk_Level = "High";
datalines;
Facebook Text 120 15 5 200 96 1 12 2026
YouTube Video 300 25 10 400 92 2 15 2026
Instagram Image 50 10 25 60 85 3 10 2026
X Audio 500 40 30 600 70 4 5 2026
;
run;
proc print data=content_raw;
run;
OUTPUT:
| Obs | Platform | Content_Type | Reports_Count | Review_Time | False_Positive_Rate | Moderator_Load | Accuracy_Score | Month | Day | Year | Review_Date | Utilization | Risk_Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Facebook | Text | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 166.667 | Low |
| 2 | YouTube | Video | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 133.333 | Low |
| 3 | Instagram | Image | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 120.000 | High |
| 4 | X | Audio | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 120.000 | High |
ERRORS IN ABOVE CODE
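Error 1: Month defined as character
Month $ reads Month as a character variable, but MDY() expects numeric arguments. SAS converts the value automatically and writes a "Character values have been converted to numeric values" note to the log, which is why the dates above still appear. Relying on this implicit conversion can hide data problems.
Error 2: Utilization division risk
If Reports_Count = 0, the expression divides by zero. SAS sets Utilization to missing and writes a "Division by zero detected" note to the log.
Error 3: Incomplete Risk_Level logic
If Accuracy_Score <= 90 and False_Positive_Rate <= 20, neither condition is true and Risk_Level stays blank.
Error 4: No LENGTH statement for Platform and Content_Type
Only Risk_Level has a LENGTH statement. Content_Type is read with a plain $ and gets the default length of 8, so longer values would be truncated.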
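One way to avoid relying on the implicit conversion described in Error 1 is to convert the value explicitly with INPUT(). The sketch below is only an illustration: the dataset month_demo and the variables Month_Char and Month_Num are hypothetical names, not part of the project code.

```sas
/* Hypothetical demo: Month arrives as character text */
data month_demo;
    input Month_Char $ Day Year;
    /* INPUT() converts the character value to a number explicitly, */
    /* so no implicit-conversion note is written to the log         */
    Month_Num   = input(Month_Char, 8.);
    Review_Date = mdy(Month_Num, Day, Year);
    format Review_Date date9.;
    datalines;
1 12 2026
2 15 2026
;
run;

proc print data=month_demo;
run;
```

If the source data can be changed, reading Month as numeric in the INPUT statement, as the corrected code does, is the simpler fix.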
2. CORRECTED FULL-LENGTH DATASET CODE
data content_moderation;
length Platform $20 Content_Type $20 Risk_Level $15 Classification $20;
input Platform $ Content_Type $ Reports_Count Review_Time False_Positive_Rate
Moderator_Load Accuracy_Score Month Day Year;
format Review_Date date9. Next_Review date9.;
Review_Date = mdy(Month, Day, Year);
/* Date Calculations */
Review_Age_Days = intck('day', Review_Date, today());
Next_Review = intnx('month', Review_Date, 1, 'same');
/* Character Cleaning */
Platform = propcase(strip(Platform));
Content_Type = upcase(trim(Content_Type));
/* Utilization Calculation */
if Reports_Count > 0 then
Utilization = (Moderator_Load / Reports_Count) * 100;
else Utilization = .;
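/* Risk Classification */
/* Caution: values that match neither branch, such as an Accuracy_Score of exactly 94 */
/* or a score of 95 or more with False_Positive_Rate >= 10, also fall through to High */
if Accuracy_Score >= 95 and False_Positive_Rate < 10 then
Risk_Level = "Low";
else if 85 < Accuracy_Score < 94 then
Risk_Level = "Medium";
else Risk_Level = "High";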
/* Fraud Classification */
if Reports_Count > 400 and Accuracy_Score < 80 then
Classification = "Potential Fraud";
else Classification = "Normal";
datalines;
Facebook Text 120 15 5 200 96 1 12 2026
YouTube Video 300 25 10 400 92 2 15 2026
Instagram Image 50 10 25 60 85 3 10 2026
X TEXT 500 40 30 600 70 4 5 2026
LinkedIn Text 80 20 8 100 90 5 8 2026
Reddit Text 450 35 28 500 75 6 14 2026
Snapchat Image 200 18 12 220 88 7 21 2026
TikTok Video 600 45 35 700 65 8 2 2026
Pinterest Image 150 12 7 170 93 9 18 2026
Quora Text 90 14 5 110 97 10 9 2026
Threads Text 250 22 15 300 89 11 11 2026
Discord IMAGE 310 28 18 350 87 12 4 2026
Telegram Text 420 30 20 450 82 1 25 2026
Medium Text 60 9 4 70 98 2 19 2026
Tumblr Image 180 16 11 210 91 3 30 2026
;
run;
proc print data=content_moderation;
run;
OUTPUT:
| Obs | Platform | Content_Type | Risk_Level | Classification | Reports_Count | Review_Time | False_Positive_Rate | Moderator_Load | Accuracy_Score | Month | Day | Year | Review_Date | Next_Review | Review_Age_Days | Utilization |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667 |
| 2 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333 |
| 3 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000 |
| 4 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000 |
| 5 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000 |
| 6 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111 |
| 7 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000 |
| 8 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667 |
| 9 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333 |
| 10 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222 |
| 11 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000 |
| 12 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903 |
| 13 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143 |
| 14 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667 |
| 15 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667 |
EXPLANATION OF EACH CODE SECTION
LENGTH Statement
Declares the length of each character variable before it is first used, which prevents truncation.
Important in production datasets.
MDY Function
Creates a SAS date value from numeric month, day, and year values.
Needed for:
· Time analysis
· Aging reports
· SLA tracking
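A small sketch of how MDY behaves, including what happens with an impossible date. The variable names good_date and bad_date are illustrative only.

```sas
data _null_;
    /* A valid date: stored as a number, displayed with a date format */
    good_date = mdy(1, 12, 2026);
    put good_date= date9.;          /* good_date=12JAN2026 */

    /* An impossible date (30 February): MDY returns missing and */
    /* SAS writes an "Invalid argument to function MDY" note     */
    bad_date = mdy(2, 30, 2026);
    put bad_date=;                  /* bad_date=. */
run;
```

Checking for a missing Review_Date after the MDY call is a cheap way to catch bad month, day, or year values in incoming data.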
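INTCK
Counts the number of interval boundaries (here, days) between Review_Date and today's date. The result is negative when Review_Date is in the future, as in most rows of the output above.
Used for:
· Pending case monitoring
· KPI tracking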
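The following sketch reproduces the first Review_Age_Days value from the output, using a fixed as-of date instead of today() so the result is repeatable. The as-of date of 01MAR2026 is an assumption inferred from the value 48 shown for 12JAN2026; the variable names are illustrative.

```sas
data _null_;
    review_date = '12JAN2026'd;
    as_of       = '01MAR2026'd;     /* assumed run date, used instead of today() */

    age_days = intck('day', review_date, as_of);
    put age_days=;                  /* age_days=48 */

    /* INTCK counts interval boundaries crossed, not elapsed time: */
    /* one month boundary lies between 31 January and 1 February   */
    months = intck('month', '31JAN2026'd, '01FEB2026'd);
    put months=;                    /* months=1 */
run;
```

Because today() changes with every run, the Review_Age_Days column in the tables will differ when the program is rerun on another date.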
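INTNX
Advances a date by a given number of intervals. Here it adds one month to Review_Date, and the 'same' alignment keeps the same day of the month, which gives the next review date.
Used in:
· Recurring moderation audits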
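The alignment argument of INTNX changes the result noticeably, so it may help to see the options side by side. This is a minimal sketch with illustrative variable names.

```sas
data _null_;
    review_date = '31JAN2026'd;

    next_begin = intnx('month', review_date, 1);           /* default: beginning of month */
    next_same  = intnx('month', review_date, 1, 'same');   /* same day, capped at month end */
    next_end   = intnx('month', review_date, 1, 'end');    /* last day of month */

    put next_begin= date9.;   /* next_begin=01FEB2026 */
    put next_same=  date9.;   /* next_same=28FEB2026  */
    put next_end=   date9.;   /* next_end=28FEB2026   */
run;
```

With 'same', a date such as 31 January moves to the last valid day of February, which is usually what a recurring audit schedule needs.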
Character Functions
| Function | Purpose |
|---|---|
| strip() | Removes leading/trailing blanks |
| trim() | Removes trailing blanks |
| propcase() | Proper case formatting |
| upcase() | Convert to uppercase |
| lowcase() | Convert to lowercase |
| cat() | Concatenate without separator |
| catx() | Concatenate with separator |
| coalescec() | First non-missing character value |
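A short sketch that chains several of these functions on one messy value. The variable names are illustrative; note that the character version of COALESCE is COALESCEC.

```sas
data _null_;
    length raw $20 combined $40 fallback $10;
    raw = "  youTUBE ";

    /* STRIP removes the blanks, PROPCASE capitalizes the first letter only */
    clean = propcase(strip(raw));

    /* CATX trims each piece and inserts the separator between them */
    combined = catx(" - ", clean, upcase("video"));

    /* COALESCEC returns the first non-blank character argument */
    fallback = coalescec(" ", "Unknown");

    put clean= combined= fallback=;
    /* clean=Youtube combined=Youtube - VIDEO fallback=Unknown */
run;
```

This also explains why the output tables show Youtube, Tiktok, and Linkedin: PROPCASE lowercases every letter after the first one in each word.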
Numeric Functions
· ROUND
· SUM
· MEAN
· MAX
· MIN
· COALESCE
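The main practical difference between these functions and plain arithmetic is how they treat missing values. The sketch below uses made-up values a, b, and c to show it.

```sas
data _null_;
    a = 120;
    b = .;        /* missing */
    c = 300;

    total   = sum(a, b, c);          /* ignores missing */
    average = mean(a, b, c);         /* mean of the non-missing values */
    highest = max(a, b, c);
    lowest  = min(a, b, c);
    first   = coalesce(b, a);        /* first non-missing numeric value */
    rounded = round(166.6667, 0.01);
    plus    = a + b + c;             /* the + operator propagates missing */

    put total= average= highest= lowest= first= rounded= plus=;
    /* total=420 average=210 highest=300 lowest=120 first=120 rounded=166.67 plus=. */
run;
```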
3. MACRO FOR FRAUD DETECTION
%macro fraud_check(data=, threshold=400);
data fraud_flagged;
set &data;
if Reports_Count > &threshold and Accuracy_Score < 80 then Fraud_Flag = 1;
else Fraud_Flag = 0;
run;
proc print data=fraud_flagged;
run;
%mend;
%fraud_check(data=content_moderation, threshold=400);
OUTPUT:
| Obs | Platform | Content_Type | Risk_Level | Classification | Reports_Count | Review_Time | False_Positive_Rate | Moderator_Load | Accuracy_Score | Month | Day | Year | Review_Date | Next_Review | Review_Age_Days | Utilization | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667 | 0 |
| 2 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333 | 0 |
| 3 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000 | 0 |
| 4 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000 | 1 |
| 5 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000 | 0 |
| 6 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111 | 1 |
| 7 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000 | 0 |
| 8 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667 | 1 |
| 9 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333 | 0 |
| 10 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222 | 0 |
| 11 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000 | 0 |
| 12 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903 | 0 |
| 13 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143 | 0 |
| 14 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667 | 0 |
| 15 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667 | 0 |
The macro is:
· Reusable
· Parameter-driven
· Scalable
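To illustrate what parameter-driven can mean in practice, here is a hypothetical variant of the macro. The name fraud_check2 and the parameters out= and min_accuracy= are assumptions added for this sketch; they are not part of the original macro, and the step assumes content_moderation already exists.

```sas
/* Hypothetical variant: out= and min_accuracy= are illustrative parameters */
%macro fraud_check2(data=, out=fraud_flagged, threshold=400, min_accuracy=80);
    data &out;
        set &data;
        /* A logical expression evaluates to 1 (true) or 0 (false).        */
        /* Note: a missing Accuracy_Score compares as less than any number */
        Fraud_Flag = (Reports_Count > &threshold and Accuracy_Score < &min_accuracy);
    run;
%mend fraud_check2;

/* Stricter rule written to a separate dataset, leaving fraud_flagged untouched */
%fraud_check2(data=content_moderation, out=fraud_strict, threshold=300, min_accuracy=85);

proc print data=fraud_strict;
run;
```

Writing to a caller-supplied output dataset lets several threshold settings be compared without overwriting each other.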
4. SET Statement
data combined;
set content_moderation
fraud_flagged;
run;
proc print data=combined;
run;
OUTPUT:
| Obs | Platform | Content_Type | Risk_Level | Classification | Reports_Count | Review_Time | False_Positive_Rate | Moderator_Load | Accuracy_Score | Month | Day | Year | Review_Date | Next_Review | Review_Age_Days | Utilization | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667 | . |
| 2 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333 | . |
| 3 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000 | . |
| 4 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000 | . |
| 5 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000 | . |
| 6 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111 | . |
| 7 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000 | . |
| 8 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667 | . |
| 9 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333 | . |
| 10 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222 | . |
| 11 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000 | . |
| 12 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903 | . |
| 13 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143 | . |
| 14 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667 | . |
| 15 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667 | . |
| 16 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667 | 0 |
| 17 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333 | 0 |
| 18 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000 | 0 |
| 19 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000 | 1 |
| 20 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000 | 0 |
| 21 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111 | 1 |
| 22 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000 | 0 |
| 23 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667 | 1 |
| 24 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333 | 0 |
| 25 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222 | 0 |
| 26 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000 | 0 |
| 27 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903 | 0 |
| 28 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143 | 0 |
| 29 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667 | 0 |
| 30 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667 | 0 |
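SET with two datasets stacks them vertically (concatenation). Fraud_Flag is missing (.) for the first 15 rows because content_moderation does not contain that variable.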
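When datasets are stacked, it is often useful to record where each row came from. The sketch below uses the IN= dataset option for that; the datasets base_demo, flagged_demo, and stacked_demo and the variable Source are hypothetical names used only for this illustration.

```sas
/* Hypothetical two-row demo dataset */
data base_demo;
    input Platform $ Reports_Count;
    datalines;
Quora 90
X 500
;
run;

data flagged_demo;
    set base_demo;
    Fraud_Flag = (Reports_Count > 400);
run;

data stacked_demo;
    length Source $8;
    /* from_base is 1 while the current row is read from base_demo */
    set base_demo(in=from_base) flagged_demo;
    if from_base then Source = "base";
    else Source = "flagged";
run;

proc print data=stacked_demo;
run;
```

The result has four rows, and Fraud_Flag is missing on the two rows that come from base_demo.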
5. MERGE Statement
proc sort data=content_moderation;by platform;run;
proc print data=content_moderation;
run;
OUTPUT:
| Obs | Platform | Content_Type | Risk_Level | Classification | Reports_Count | Review_Time | False_Positive_Rate | Moderator_Load | Accuracy_Score | Month | Day | Year | Review_Date | Next_Review | Review_Age_Days | Utilization |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903 |
| 2 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667 |
| 3 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000 |
| 4 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000 |
| 5 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667 |
| 6 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333 |
| 7 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222 |
| 8 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111 |
| 9 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000 |
| 10 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143 |
| 11 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000 |
| 12 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667 |
| 13 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667 |
| 14 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000 |
| 15 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333 |
proc sort data=fraud_flagged;by platform;run;
proc print data=fraud_flagged;
run;
OUTPUT:
| Obs | Platform | Content_Type | Risk_Level | Classification | Reports_Count | Review_Time | False_Positive_Rate | Moderator_Load | Accuracy_Score | Month | Day | Year | Review_Date | Next_Review | Review_Age_Days | Utilization | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903 | 0 |
| 2 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667 | 0 |
| 3 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000 | 0 |
| 4 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000 | 0 |
| 5 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667 | 0 |
| 6 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333 | 0 |
| 7 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222 | 0 |
| 8 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111 | 1 |
| 9 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000 | 0 |
| 10 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143 | 0 |
| 11 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000 | 0 |
| 12 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667 | 1 |
| 13 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667 | 0 |
| 14 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000 | 1 |
| 15 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333 | 0 |
data merged_data;
merge content_moderation
fraud_flagged;
by Platform;
run;
proc print data=merged_data;
run;
OUTPUT:
| Obs | Platform | Content_Type | Risk_Level | Classification | Reports_Count | Review_Time | False_Positive_Rate | Moderator_Load | Accuracy_Score | Month | Day | Year | Review_Date | Next_Review | Review_Age_Days | Utilization | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903 | 0 |
| 2 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667 | 0 |
| 3 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000 | 0 |
| 4 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000 | 0 |
| 5 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667 | 0 |
| 6 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333 | 0 |
| 7 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222 | 0 |
| 8 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111 | 1 |
| 9 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000 | 0 |
| 10 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143 | 0 |
| 11 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000 | 0 |
| 12 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667 | 1 |
| 13 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667 | 0 |
| 14 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000 | 1 |
| 15 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333 | 0 |
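In the merge above both datasets contain the same platforms, so every row matches. When the keys differ, the IN= option shows which dataset contributed to each row. The following is a minimal sketch with hypothetical datasets (scores_demo, flags_demo, merged_demo) and a hypothetical Has_Flag variable.

```sas
data scores_demo;
    input Platform $ Accuracy_Score;
    datalines;
Quora 97
Reddit 75
X 70
;
run;

data flags_demo;
    input Platform $ Fraud_Flag;
    datalines;
Reddit 1
X 1
;
run;

/* MERGE with BY requires both inputs to be sorted by the key */
proc sort data=scores_demo; by Platform; run;
proc sort data=flags_demo;  by Platform; run;

data merged_demo;
    merge scores_demo(in=in_scores) flags_demo(in=in_flags);
    by Platform;
    if in_scores;          /* keep every platform that has a score */
    Has_Flag = in_flags;   /* 1 if a matching flag row existed, else 0 */
run;

proc print data=merged_demo;
run;
```

Quora has no row in flags_demo, so its Fraud_Flag is missing and Has_Flag is 0.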
6. PROC APPEND
proc append base=content_moderation
data=fraud_flagged force;
run;
proc print data=content_moderation;
run;
OUTPUT:
| Obs | Platform | Content_Type | Risk_Level | Classification | Reports_Count | Review_Time | False_Positive_Rate | Moderator_Load | Accuracy_Score | Month | Day | Year | Review_Date | Next_Review | Review_Age_Days | Utilization |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903 |
| 2 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667 |
| 3 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000 |
| 4 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000 |
| 5 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667 |
| 6 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333 |
| 7 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222 |
| 8 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111 |
| 9 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000 |
| 10 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143 |
| 11 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000 |
| 12 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667 |
| 13 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667 |
| 14 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000 |
| 15 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333 |
| 16 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903 |
| 17 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667 |
| 18 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000 |
| 19 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000 |
| 20 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667 |
| 21 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333 |
| 22 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222 |
| 23 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111 |
| 24 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000 |
| 25 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143 |
| 26 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000 |
| 27 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667 |
| 28 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667 |
| 29 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000 |
| 30 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333 |
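PROC APPEND adds the observations of fraud_flagged to the end of content_moderation without rewriting the base dataset. FORCE is needed because fraud_flagged contains Fraud_Flag, which is not in the base dataset; that variable is dropped, and content_moderation now holds every platform twice.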
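The effect of FORCE is easier to see on a tiny example. In this sketch the datasets append_base_demo and append_new_demo are hypothetical; the second one carries an extra Fraud_Flag variable that the base does not have.

```sas
data append_base_demo;
    input Platform $ Reports_Count;
    datalines;
Quora 90
;
run;

data append_new_demo;
    input Platform $ Reports_Count Fraud_Flag;
    datalines;
X 500 1
;
run;

/* Without FORCE this append would fail because Fraud_Flag is not in the base. */
/* With FORCE the row is added and Fraud_Flag is dropped, with a log warning.  */
proc append base=append_base_demo data=append_new_demo force;
run;

/* SHORT lists only the variable names: Fraud_Flag is not among them */
proc contents data=append_base_demo short;
run;
```

If the extra variable matters, a DATA step with SET keeps it, at the cost of rewriting the whole dataset.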
7. PROC TRANSPOSE
proc transpose data=content_moderation out=transposed_data;
by Platform NotSorted;
var Reports_Count Accuracy_Score;
run;
proc print data=transposed_data;
run;
OUTPUT:
| Obs | Platform | _NAME_ | COL1 |
|---|---|---|---|
| 1 | Discord | Reports_Count | 310 |
| 2 | Discord | Accuracy_Score | 87 |
| 3 | Facebook | Reports_Count | 120 |
| 4 | Facebook | Accuracy_Score | 96 |
| 5 | Instagram | Reports_Count | 50 |
| 6 | Instagram | Accuracy_Score | 85 |
| 7 | Linkedin | Reports_Count | 80 |
| 8 | Linkedin | Accuracy_Score | 90 |
| 9 | Medium | Reports_Count | 60 |
| 10 | Medium | Accuracy_Score | 98 |
| 11 | Pinterest | Reports_Count | 150 |
| 12 | Pinterest | Accuracy_Score | 93 |
| 13 | Quora | Reports_Count | 90 |
| 14 | Quora | Accuracy_Score | 97 |
| 15 | Reddit | Reports_Count | 450 |
| 16 | Reddit | Accuracy_Score | 75 |
| 17 | Snapchat | Reports_Count | 200 |
| 18 | Snapchat | Accuracy_Score | 88 |
| 19 | Telegram | Reports_Count | 420 |
| 20 | Telegram | Accuracy_Score | 82 |
| 21 | Threads | Reports_Count | 250 |
| 22 | Threads | Accuracy_Score | 89 |
| 23 | Tiktok | Reports_Count | 600 |
| 24 | Tiktok | Accuracy_Score | 65 |
| 25 | Tumblr | Reports_Count | 180 |
| 26 | Tumblr | Accuracy_Score | 91 |
| 27 | X | Reports_Count | 500 |
| 28 | X | Accuracy_Score | 70 |
| 29 | Youtube | Reports_Count | 300 |
| 30 | Youtube | Accuracy_Score | 92 |
| 31 | Discord | Reports_Count | 310 |
| 32 | Discord | Accuracy_Score | 87 |
| 33 | Facebook | Reports_Count | 120 |
| 34 | Facebook | Accuracy_Score | 96 |
| 35 | Instagram | Reports_Count | 50 |
| 36 | Instagram | Accuracy_Score | 85 |
| 37 | Linkedin | Reports_Count | 80 |
| 38 | Linkedin | Accuracy_Score | 90 |
| 39 | Medium | Reports_Count | 60 |
| 40 | Medium | Accuracy_Score | 98 |
| 41 | Pinterest | Reports_Count | 150 |
| 42 | Pinterest | Accuracy_Score | 93 |
| 43 | Quora | Reports_Count | 90 |
| 44 | Quora | Accuracy_Score | 97 |
| 45 | Reddit | Reports_Count | 450 |
| 46 | Reddit | Accuracy_Score | 75 |
| 47 | Snapchat | Reports_Count | 200 |
| 48 | Snapchat | Accuracy_Score | 88 |
| 49 | Telegram | Reports_Count | 420 |
| 50 | Telegram | Accuracy_Score | 82 |
| 51 | Threads | Reports_Count | 250 |
| 52 | Threads | Accuracy_Score | 89 |
| 53 | Tiktok | Reports_Count | 600 |
| 54 | Tiktok | Accuracy_Score | 65 |
| 55 | Tumblr | Reports_Count | 180 |
| 56 | Tumblr | Accuracy_Score | 91 |
| 57 | X | Reports_Count | 500 |
| 58 | X | Accuracy_Score | 70 |
| 59 | Youtube | Reports_Count | 300 |
| 60 | Youtube | Accuracy_Score | 92 |
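The output above has 60 rows because the appended content_moderation dataset holds each platform twice and BY Platform produces one row per metric per BY group. A different layout, with one column per platform, can be sketched with the ID statement. The datasets metrics_demo and wide_demo and the column name Metric are hypothetical names for this illustration.

```sas
data metrics_demo;
    input Platform $ Reports_Count Accuracy_Score;
    datalines;
Quora 90 97
Reddit 450 75
X 500 70
;
run;

/* ID turns each Platform value into a column name; */
/* NAME= renames the default _NAME_ column          */
proc transpose data=metrics_demo out=wide_demo name=Metric;
    id Platform;
    var Reports_Count Accuracy_Score;
run;

proc print data=wide_demo;
run;
```

The result has two rows (Reports_Count and Accuracy_Score) and the columns Quora, Reddit, and X. The ID values must be unique within each BY group, so this layout would not work on the duplicated dataset without removing the repeated rows first.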
8. PROC DATASETS DELETE
proc datasets library=work;
delete content_raw transposed_data;
quit;
Deletes the listed datasets from the WORK library.
Used to clean up the workspace.
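A small sketch of the same cleanup with the NOLIST option, which suppresses the directory listing in the log, followed by a check that the dataset is really gone. The dataset temp_demo is a hypothetical name.

```sas
/* Create a throwaway dataset to delete */
data work.temp_demo;
    x = 1;
run;

proc datasets library=work nolist;
    delete temp_demo;
quit;

/* EXIST returns 1 if the dataset exists and 0 otherwise */
%put NOTE: temp_demo exists = %sysfunc(exist(work.temp_demo));
```

After the delete, the %PUT line writes a value of 0 to the log.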
UTILIZATION ANALYSIS
Utilization = Moderator_Load / Reports_Count * 100
> 100% → overload
80–100% → optimal
< 50% → underutilized
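These bands can be turned into a classification step. The sketch below assumes that anything above 100% counts as overload, and it labels the 50% to 80% range as Unclassified because the text does not define that band. The dataset util_demo, the variable Util_Band, and the Demo rows are hypothetical.

```sas
data util_demo;
    length Util_Band $14;
    input Platform $ Moderator_Load Reports_Count;

    /* Guard against division by zero, as in the corrected code */
    if Reports_Count > 0 then
        Utilization = Moderator_Load / Reports_Count * 100;
    else Utilization = .;

    if missing(Utilization)     then Util_Band = "Unknown";
    else if Utilization > 100   then Util_Band = "Overload";
    else if Utilization >= 80   then Util_Band = "Optimal";
    else if Utilization < 50    then Util_Band = "Underutilized";
    else                             Util_Band = "Unclassified";  /* 50-80%: not defined in the text */
    datalines;
Quora 110 90
Demo1 45 50
Demo2 20 100
Demo3 0 0
;
run;

proc print data=util_demo;
run;
```

Every platform in the project dataset has a utilization above 100%, so all of them would land in the Overload band under this reading.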
FRAUD DETECTION LOGIC EXPLAINED
Fraud signals include:
1. High reports
2. Low accuracy
3. High false positive rate
4. Abnormal utilization
5. Repeated review spikes
The macro automates the first two checks: a high report count combined with low accuracy.
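One possible way to bring more of these signals into a single measure is to count how many of them fire for each row. This is only a sketch under stated assumptions: the thresholds of 400 reports and 80 accuracy come from the macro, while the cutoffs of 20 for False_Positive_Rate and 100 for Utilization, the dataset signal_demo, and the variable Signal_Count are assumptions made for illustration.

```sas
data signal_demo;
    input Platform $ Reports_Count Accuracy_Score False_Positive_Rate Utilization;

    /* Each comparison evaluates to 1 or 0, so SUM counts the signals that fire. */
    /* ". < Accuracy_Score" keeps a missing score from counting as low accuracy. */
    Signal_Count = sum(Reports_Count > 400,
                       . < Accuracy_Score < 80,
                       False_Positive_Rate > 20,     /* assumed cutoff */
                       Utilization > 100);           /* assumed cutoff */
    datalines;
X 500 70 30 120
Quora 90 97 5 122.222
;
run;

proc print data=signal_demo;
run;
```

Here X triggers all four signals and Quora triggers only the utilization signal. The fifth signal, repeated review spikes, needs data over time and is not covered by this sketch.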
FINAL CLEAN DATASET READY FOR:
· KPI dashboards
· SLA monitoring
· Fraud analytics
· Moderator workload balancing
· Monthly trend analysis
BUSINESS INSIGHTS FROM THIS PROJECT
· Identify overloaded moderators
· Detect fake reporting attacks
· Improve accuracy tracking
· Monitor false positives
· Optimize staffing
· Schedule next review automatically
· Standardize data cleaning
· Automate fraud classification
· Improve reporting transparency
· Reduce compliance risk
20 Key Points About This Project
· This project simulates a real-world content moderation analytics system using structured SAS programming techniques.
· It creates a dataset with 15 observations including operational, quality, and fraud-related variables.
· The dataset includes key metrics such as Reports_Count, Review_Time, False_Positive_Rate, Moderator_Load, and Accuracy_Score.
· Date variables are generated using the MDY function to ensure proper SAS date values.
· Time-based monitoring is performed using INTCK to calculate review aging in days.
· Future review scheduling is automated using the INTNX function.
· Character standardization is implemented using the STRIP, TRIM, PROPCASE, and UPCASE functions.
· Numeric stability is ensured by handling division-by-zero scenarios in utilization calculations.
· Moderator utilization percentage is calculated to measure workload efficiency.
· Risk classification logic is applied based on accuracy and false positive thresholds.
· A macro-driven fraud detection rule identifies suspicious patterns dynamically.
· The macro is parameterized to allow reusable fraud threshold tuning.
· SET statements are used for vertical stacking of datasets.
· MERGE statements are applied to combine datasets based on common keys.
· PROC APPEND is used for efficient dataset extension without a full rewrite.
· PROC TRANSPOSE is used to restructure metrics for analytical flexibility.
· PROC DATASETS DELETE cleans temporary datasets to keep the workspace tidy.
· Intentional coding errors are introduced and corrected to demonstrate debugging capability.
· The project demonstrates both operational analytics and fraud monitoring integration.
· The final output produces a clean, standardized moderation analytics dataset ready for reporting and decision-making.
Summary
This project demonstrates how to design, debug, and optimize a Content Moderation Analytics and Fraud Detection system using SAS in a practical and structured way. We began by creating a dataset with 15 observations that represents real-world moderation metrics such as reports count, review time, false positive rate, moderator workload, and accuracy score. Intentional programming errors were introduced to simulate real development challenges, and each error was identified and corrected.
We applied important SAS concepts including MDY
for date creation, INTCK and INTNX for time calculations, character functions
like STRIP and PROPCASE for data cleaning, and numeric logic for utilization
calculations. A reusable macro was developed to detect potential fraud based on
configurable thresholds. Additionally, we demonstrated dataset management
techniques such as SET, MERGE, APPEND, PROC TRANSPOSE, and PROC DATASETS
DELETE.
Conclusion:
In conclusion, this project mirrors a production-style analytics workflow. It integrates data cleaning, classification, fraud detection, and performance monitoring into a single structured system. Such an approach improves operational efficiency, supports data accuracy, and helps platforms make informed decisions while maintaining trust and compliance.
We successfully built a complete Content Moderation Analytics System in SAS while intentionally introducing and correcting multiple programming errors.
We demonstrated:
- Dataset creation with 15 observations
- Error debugging
- Date functions (MDY, INTCK, INTNX)
- Character and numeric functions
- Utilization metrics
- Fraud detection macro
- SET, MERGE, APPEND
- PROC TRANSPOSE
- PROC DATASETS DELETE
This project simulates a real-world analytics workflow similar to those used by global platforms like Facebook, YouTube, Instagram, and X.
SAS INTERVIEW QUESTIONS
1. What is the difference between COALESCE and CASE?
2. How do you delete a dataset?
3. How do you count records?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent real content moderation data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics
· Clinical SAS Programmer
· Research Data Analyst
· Regulatory Data Validator