415. Can We Build, Debug, And Detect Fraud In A Content Moderation Analytics System Using Advanced SAS Programming With Intentional Errors And Corrections?


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS:
DATA STEP | PROC PRINT | MACROS | PROC APPEND | SET | PROC SORT | MERGE | PROC TRANSPOSE | PROC DATASETS DELETE | DATA FUNCTIONS

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Introduction

In today’s digital era, content moderation plays a critical role in maintaining platform integrity, user trust, and regulatory compliance. Social media platforms, video-sharing websites, and online forums process millions of user-generated posts daily. As volumes grow, platforms rely on automated moderation systems combined with human reviewers.

However, analytics systems supporting moderation operations can contain data inconsistencies, fraud indicators, incorrect calculations, or programming errors. If not identified and corrected, these issues can lead to inaccurate reporting, unfair content blocking, increased operational costs, and regulatory risks.

In this project, we will:

  • Create a Content Moderation dataset
  • Introduce intentional programming errors
  • Identify and fix those errors
  • Apply SAS Macros for fraud detection logic
  • Use:
    • Date functions: MDY, INTCK, INTNX
    • Data combination: SET, MERGE, APPEND
    • PROC TRANSPOSE
    • Numeric & character functions
    • PROC DATASETS DELETE
    • Utilization calculations
    • Classification logic
  • Provide full explanation for every step

This explanation is written in a simple style so that even beginners can understand.

Table Of Contents

  1. Business Context
  2. Variables Definition
  3. Step 1 – Create Dataset (With Intentional Errors)
  4. Identify And Explain Errors
  5. Corrected Full-Length Dataset Code
  6. Character Functions Usage
  7. Numeric Functions Usage
  8. Date Functions (MDY, INTCK, INTNX)
  9. Utilization Calculation
  10. Fraud Detection Macro
  11. SET vs MERGE vs APPEND
  12. PROC TRANSPOSE
  13. PROC DATASETS DELETE
  14. Final Clean Dataset
  15. Business Insights
  16. 20 Key Points About This Project
  17. Summary
  18. Conclusion

Business Context

Large content platforms like:

  • Facebook
  • YouTube
  • Instagram
  • X

receive:

  • Millions of user posts
  • Spam
  • Hate speech
  • Fraud attempts
  • Fake reporting manipulation

Analytics teams monitor:

  • Reports count
  • False positive rate
  • Moderator workload
  • Accuracy score
  • Review time
  • Fraud signals

Our system will simulate this scenario.

Variables In Our Dataset

Variable | Type | Description
Platform | Character | Social platform name
Content_Type | Character | Type of content
Reports_Count | Numeric | Number of reports
Review_Time | Numeric | Time in minutes
False_Positive_Rate | Numeric | % incorrect flags
Moderator_Load | Numeric | Cases handled
Accuracy_Score | Numeric | % correct decisions
Review_Date | Date | Date reviewed
Utilization | Numeric | Load efficiency
Risk_Level | Character | Fraud classification

1. INTENTIONALLY WRONG CODE

data content_raw;
    input Platform:$12. Content_Type $ Reports_Count Review_Time False_Positive_Rate
          Moderator_Load Accuracy_Score Month $ Day Year;
    format Review_Date date9.;
    Review_Date = mdy(Month, Day, Year);
    Utilization = Moderator_Load/Reports_Count*100;
    length Risk_Level $8.;
    if Accuracy_Score > 90 then Risk_Level = "Low";
    else if False_Positive_Rate > 20 then Risk_Level = "High";
datalines;
Facebook Text 120 15 5 200 96 1 12 2026
YouTube Video 300 25 10 400 92 2 15 2026
Instagram Image 50 10 25 60 85 3 10 2026
X Audio 500 40 30 600 70 4 5 2026
;
run;

proc print data=content_raw;
run;

OUTPUT:

Obs | Platform | Content_Type | Reports_Count | Review_Time | False_Positive_Rate | Moderator_Load | Accuracy_Score | Month | Day | Year | Review_Date | Utilization | Risk_Level
1 | Facebook | Text | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 166.667 | Low
2 | YouTube | Video | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 133.333 | Low
3 | Instagram | Image | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 120.000 | High
4 | X | Audio | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 120.000 | High

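ERRORS IN ABOVE CODE

Error 1: Month defined as Character

Month $ reads the month as character, but MDY() expects numeric values.
SAS converts the value automatically and writes a "Character values have been converted to numeric values" note to the log, which is why the dates above still appear. Relying on that implicit conversion is fragile.

Error 2: Utilization Division Risk

If Reports_Count = 0 → division by zero. SAS sets Utilization to missing and writes a division-by-zero note to the log.

Error 3: Risk_Level incomplete logic

If Accuracy_Score <= 90 and False_Positive_Rate <= 20 → neither condition is true, so Risk_Level is left blank.

Error 4: No LENGTH statement for the input character variables

Content_Type gets the default length of 8 bytes, so longer values would be truncated.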
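To make Error 1 concrete, here is a minimal sketch of converting a character month explicitly instead of relying on automatic conversion. The dataset name month_demo and the variables Month_Char and Month_Num are hypothetical and are not part of the project code.

```sas
/* Hypothetical demo: Month arrives as character text */
data month_demo;
    length Month_Char $2;
    input Month_Char $ Day Year;

    /* INPUT() converts the text to a number explicitly, so the log
       does not carry a character-to-numeric conversion note */
    Month_Num   = input(Month_Char, 2.);
    Review_Date = mdy(Month_Num, Day, Year);
    format Review_Date date9.;
datalines;
1 12 2026
12 4 2026
;
run;

proc print data=month_demo;
run;
```

In the corrected code the simpler fix is used: Month is read as numeric in the first place, so no conversion is needed.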

2. CORRECTED FULL-LENGTH DATASET CODE

data content_moderation;
    /* Declare character lengths first so values are not truncated */
    length Platform $20 Content_Type $20 Risk_Level $15 Classification $20;
    input Platform $ Content_Type $ Reports_Count Review_Time False_Positive_Rate
          Moderator_Load Accuracy_Score Month Day Year;
    format Review_Date date9. Next_Review date9.;

    /* Month, Day, and Year are numeric, as MDY expects */
    Review_Date = mdy(Month, Day, Year);

    /* Date Calculations */
    /* Review_Age_Days is negative when Review_Date is after the run date */
    Review_Age_Days = intck('day', Review_Date, today());
    Next_Review = intnx('month', Review_Date, 1, 'same');

    /* Character Cleaning */
    Platform = propcase(strip(Platform));
    Content_Type = upcase(trim(Content_Type));

    /* Utilization Calculation: guard against division by zero */
    if Reports_Count > 0 then
        Utilization = (Moderator_Load / Reports_Count) * 100;
    else Utilization = .;

    /* Risk Classification */
    /* As written, scores of 85 or below, scores from 94 up to 95, and
       scores of 95+ with a False_Positive_Rate of 10 or more all fall
       through to "High" */
    if Accuracy_Score >= 95 and False_Positive_Rate < 10 then
        Risk_Level = "Low";
    else if 85 < Accuracy_Score < 94 then
        Risk_Level = "Medium";
    else Risk_Level = "High";

    /* Fraud Classification */
    if Reports_Count > 400 and Accuracy_Score < 80 then
        Classification = "Potential Fraud";
    else Classification = "Normal";
datalines;
Facebook Text 120 15 5 200 96 1 12 2026
YouTube Video 300 25 10 400 92 2 15 2026
Instagram Image 50 10 25 60 85 3 10 2026
X TEXT 500 40 30 600 70 4 5 2026
LinkedIn Text 80 20 8 100 90 5 8 2026
Reddit Text 450 35 28 500 75 6 14 2026
Snapchat Image 200 18 12 220 88 7 21 2026
TikTok Video 600 45 35 700 65 8 2 2026
Pinterest Image 150 12 7 170 93 9 18 2026
Quora Text 90 14 5 110 97 10 9 2026
Threads Text 250 22 15 300 89 11 11 2026
Discord IMAGE 310 28 18 350 87 12 4 2026
Telegram Text 420 30 20 450 82 1 25 2026
Medium Text 60 9 4 70 98 2 19 2026
Tumblr Image 180 16 11 210 91 3 30 2026
;
run;

proc print data=content_moderation;
run;

OUTPUT:

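Obs | Platform | Content_Type | Risk_Level | Classification | Reports_Count | Review_Time | False_Positive_Rate | Moderator_Load | Accuracy_Score | Month | Day | Year | Review_Date | Next_Review | Review_Age_Days | Utilization
1 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667
2 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333
3 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000
4 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000
5 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000
6 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111
7 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000
8 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667
9 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333
10 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222
11 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000
12 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903
13 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143
14 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667
15 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667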

EXPLANATION OF EACH CODE SECTION

LENGTH Statement

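Declares character lengths before the INPUT statement so that long values are not truncated (with list input, a character variable otherwise defaults to 8 bytes).
Important in production datasets, where values are often longer than the test data.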

MDY Function

Builds a SAS date value from numeric month, day, and year.
Needed for:

·       Time analysis

·       Aging reports

·       SLA tracking
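As a small illustration of how MDY behaves, the sketch below builds one valid and one impossible date. The variable names are made up for the demo.

```sas
data _null_;
    valid_date   = mdy(2, 15, 2026);   /* a real calendar date */
    invalid_date = mdy(2, 30, 2026);   /* February 30 does not exist */

    put valid_date= date9. invalid_date= date9.;

    /* MDY returns a missing value for impossible dates */
    if missing(invalid_date) then
        put "Invalid date components produced a missing value.";
run;
```

Checking for missing dates after MDY is a cheap way to catch bad month, day, or year values in raw data.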

INTCK

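Counts the interval boundaries between two dates; with 'day' this is the difference in days. Review_Age_Days is negative when Review_Date is later than the run date returned by TODAY(), so the values in the output depend on the day the program is run.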

Used for:

·       Pending case monitoring

·       KPI tracking
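INTCK counts boundaries rather than elapsed time, which matters for intervals longer than a day. A minimal sketch with two dates that are one day apart but sit in different months (variable names are illustrative):

```sas
data _null_;
    start = '31JAN2026'd;
    stop  = '01FEB2026'd;

    days_between     = intck('day',   start, stop);               /* 1 */
    month_boundaries = intck('month', start, stop);               /* 1: one month boundary crossed */
    full_months      = intck('month', start, stop, 'continuous'); /* 0: no full month has elapsed */

    put days_between= month_boundaries= full_months=;
run;
```

For aging reports in days the default behavior is what you want; for month-based aging the 'continuous' method is usually closer to the business meaning.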

INTNX

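Advances a date by a given number of intervals. Here it schedules the next review one month after Review_Date, and the 'same' alignment keeps the same day of the month.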

Used in:

·       Recurring moderation audits
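The alignment argument of INTNX changes the result noticeably. The sketch below compares three alignments for a month-end date (variable names are illustrative):

```sas
data _null_;
    review_date = '31JAN2026'd;

    next_same  = intnx('month', review_date, 1, 'same');       /* 28FEB2026: February has no day 31 */
    next_begin = intnx('month', review_date, 1, 'beginning');  /* 01FEB2026: the default alignment */
    next_end   = intnx('month', review_date, 1, 'end');        /* 28FEB2026 */

    put next_same= date9. next_begin= date9. next_end= date9.;
run;
```

Without the fourth argument, INTNX returns the first day of the target month, which is why the project code passes 'same'.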

Character Functions

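Function | Purpose
strip() | Removes leading and trailing blanks
trim() | Removes trailing blanks
propcase() | Converts to proper case
upcase() | Converts to uppercase
lowcase() | Converts to lowercase
cat() | Concatenates without a separator and without removing blanks
catx() | Concatenates with a separator, stripping blanks from each item
coalescec() | Returns the first non-missing character value (coalesce() is the numeric counterpart)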
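A short sketch of a few of these functions working together. All variable names and sample values are invented for the demo.

```sas
data _null_;
    length raw_platform blank_name clean_name first_name $20 display_label $40;

    raw_platform = "  liNKedIn";
    blank_name   = " ";

    clean_name    = propcase(strip(raw_platform));      /* Linkedin */
    display_label = catx(" - ", clean_name, "TEXT");    /* Linkedin - TEXT */
    first_name    = coalescec(blank_name, clean_name);  /* first non-blank value: Linkedin */

    put clean_name= display_label= first_name=;
run;
```

Note that PROPCASE lowercases everything after the first letter of each word, which is why the project output shows Linkedin, Youtube, and Tiktok rather than the brand spellings.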

Numeric Functions

·       ROUND

·       SUM

·       MEAN

·       MAX

·       MIN

·       COALESCE
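The numeric functions listed above differ from plain arithmetic mainly in how they treat missing values. A minimal sketch with made-up values:

```sas
data _null_;
    a = 166.66667;
    b = .;          /* missing on purpose */
    c = 120;

    rounded    = round(a, 0.01);   /* 166.67 */
    total      = sum(a, b, c);     /* missing b is ignored */
    plus_total = a + b + c;        /* missing: the + operator propagates missing values */
    average    = mean(a, b, c);    /* mean of the two non-missing values */
    largest    = max(a, b, c);     /* 166.66667 */
    smallest   = min(a, b, c);     /* 120: MIN also ignores missing values */
    first_ok   = coalesce(b, c);   /* 120: first non-missing numeric value */

    put rounded= total= plus_total= average= largest= smallest= first_ok=;
run;
```

The plus_total line also writes a "Missing values were generated" note to the log, which is a useful signal when a calculation silently loses rows.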

3. MACRO FOR FRAUD DETECTION

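%macro fraud_check(data=, threshold=400);
    /* Flag rows with many reports and low accuracy */
    data fraud_flagged;
        set &data;
        if Reports_Count > &threshold and Accuracy_Score < 80 then Fraud_Flag = 1;
        else Fraud_Flag = 0;
    run;

    proc print data=fraud_flagged;
    run;
%mend fraud_check;

%fraud_check(data=content_moderation, threshold=400);

OUTPUT:

Obs | Platform | Content_Type | Risk_Level | Classification | Reports_Count | Review_Time | False_Positive_Rate | Moderator_Load | Accuracy_Score | Month | Day | Year | Review_Date | Next_Review | Review_Age_Days | Utilization | Fraud_Flag
1 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667 | 0
2 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333 | 0
3 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000 | 0
4 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000 | 1
5 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000 | 0
6 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111 | 1
7 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000 | 0
8 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667 | 1
9 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333 | 0
10 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222 | 0
11 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000 | 0
12 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903 | 0
13 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143 | 0
14 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667 | 0
15 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667 | 0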

·  Reusable

·  Parameter driven

·  Scalable
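One way the macro could be made more parameter driven is sketched below. The macro name fraud_check2 and the out= and min_accuracy= parameters are hypothetical additions for illustration; the original macro hard-codes the accuracy cutoff of 80 and the output name fraud_flagged.

```sas
%macro fraud_check2(data=, out=fraud_flagged, threshold=400, min_accuracy=80);
    data &out;
        set &data;
        /* A comparison evaluates to 1 (true) or 0 (false).
           A missing Accuracy_Score compares as lower than any number,
           so rows with missing accuracy would also be flagged. */
        Fraud_Flag = (Reports_Count > &threshold and Accuracy_Score < &min_accuracy);
    run;

    title "Fraud check: Reports_Count > &threshold and Accuracy_Score < &min_accuracy";
    proc print data=&out;
        where Fraud_Flag = 1;   /* show only the flagged rows */
    run;
    title;
%mend fraud_check2;

/* Stricter rule written to a separate, hypothetical dataset */
%fraud_check2(data=content_moderation, out=fraud_strict, threshold=300, min_accuracy=85);
```

On the original 15 rows, this stricter call would also flag Telegram (420 reports, accuracy 82) in addition to X, Reddit, and TikTok.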

4. SET Statement

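data combined;
    set content_moderation
        fraud_flagged;
run;

proc print data=combined;
run;

OUTPUT:

Obs | Platform | Content_Type | Risk_Level | Classification | Reports_Count | Review_Time | False_Positive_Rate | Moderator_Load | Accuracy_Score | Month | Day | Year | Review_Date | Next_Review | Review_Age_Days | Utilization | Fraud_Flag
1 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667 | .
2 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333 | .
3 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000 | .
4 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000 | .
5 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000 | .
6 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111 | .
7 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000 | .
8 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667 | .
9 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333 | .
10 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222 | .
11 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000 | .
12 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903 | .
13 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143 | .
14 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667 | .
15 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667 | .
16 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667 | 0
17 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333 | 0
18 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000 | 0
19 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000 | 1
20 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000 | 0
21 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111 | 1
22 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000 | 0
23 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667 | 1
24 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333 | 0
25 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222 | 0
26 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000 | 0
27 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903 | 0
28 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143 | 0
29 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667 | 0
30 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667 | 0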

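SET with two datasets stacks them vertically (concatenation). The first 15 rows come from content_moderation and have a missing Fraud_Flag, because that variable exists only in fraud_flagged.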
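When stacking datasets it often helps to record where each row came from. A minimal sketch using the IN= dataset option; the output name combined_tagged and the variable Source are hypothetical.

```sas
data combined_tagged;
    set content_moderation(in=from_base)
        fraud_flagged(in=from_flagged);

    length Source $20;

    /* IN= variables are 1 when the current row was read from that dataset */
    if from_base then Source = "content_moderation";
    else if from_flagged then Source = "fraud_flagged";
run;

proc print data=combined_tagged;
    var Platform Fraud_Flag Source;
run;
```

The IN= variables themselves are temporary and are not written to the output dataset, which is why Source is stored explicitly.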

5. MERGE Statement

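proc sort data=content_moderation;
    by platform;
run;

proc print data=content_moderation;
run;

OUTPUT:

Obs | Platform | Content_Type | Risk_Level | Classification | Reports_Count | Review_Time | False_Positive_Rate | Moderator_Load | Accuracy_Score | Month | Day | Year | Review_Date | Next_Review | Review_Age_Days | Utilization
1 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903
2 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667
3 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000
4 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000
5 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667
6 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333
7 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222
8 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111
9 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000
10 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143
11 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000
12 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667
13 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667
14 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000
15 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333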

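proc sort data=fraud_flagged;
    by platform;
run;

proc print data=fraud_flagged;
run;

OUTPUT:

Obs | Platform | Content_Type | Risk_Level | Classification | Reports_Count | Review_Time | False_Positive_Rate | Moderator_Load | Accuracy_Score | Month | Day | Year | Review_Date | Next_Review | Review_Age_Days | Utilization | Fraud_Flag
1 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903 | 0
2 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667 | 0
3 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000 | 0
4 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000 | 0
5 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667 | 0
6 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333 | 0
7 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222 | 0
8 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111 | 1
9 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000 | 0
10 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143 | 0
11 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000 | 0
12 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667 | 1
13 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667 | 0
14 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000 | 1
15 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333 | 0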

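data merged_data;
    merge content_moderation
          fraud_flagged;
    by Platform;
run;

proc print data=merged_data;
run;

OUTPUT:

Obs | Platform | Content_Type | Risk_Level | Classification | Reports_Count | Review_Time | False_Positive_Rate | Moderator_Load | Accuracy_Score | Month | Day | Year | Review_Date | Next_Review | Review_Age_Days | Utilization | Fraud_Flag
1 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903 | 0
2 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667 | 0
3 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000 | 0
4 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000 | 0
5 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667 | 0
6 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333 | 0
7 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222 | 0
8 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111 | 1
9 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000 | 0
10 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143 | 0
11 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000 | 0
12 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667 | 1
13 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667 | 0
14 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000 | 1
15 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333 | 0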
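Since both datasets carry the same columns, a merge only needs to bring across the one new variable. The sketch below assumes both datasets are already sorted by Platform with one row per platform, as at this point in the walkthrough; merged_flags and Has_Flag are hypothetical names.

```sas
data merged_flags;
    merge content_moderation(in=in_base)
          fraud_flagged(keep=Platform Fraud_Flag in=in_flag);
    by Platform;

    /* Keep only platforms that exist in the base table */
    if in_base;

    /* 1 when a matching row was found in fraud_flagged, else 0 */
    Has_Flag = in_flag;
run;

proc print data=merged_flags;
    var Platform Reports_Count Accuracy_Score Fraud_Flag Has_Flag;
run;
```

Restricting the second dataset with KEEP= avoids same-named variables from fraud_flagged silently overwriting the values from content_moderation.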

6. PROC APPEND

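proc append base=content_moderation
            data=fraud_flagged force;
run;

proc print data=content_moderation;
run;

OUTPUT:

Obs | Platform | Content_Type | Risk_Level | Classification | Reports_Count | Review_Time | False_Positive_Rate | Moderator_Load | Accuracy_Score | Month | Day | Year | Review_Date | Next_Review | Review_Age_Days | Utilization
1 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903
2 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667
3 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000
4 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000
5 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667
6 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333
7 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222
8 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111
9 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000
10 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143
11 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000
12 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667
13 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667
14 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000
15 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333
16 | Discord | IMAGE | Medium | Normal | 310 | 28 | 18 | 350 | 87 | 12 | 4 | 2026 | 04DEC2026 | 04JAN2027 | -278 | 112.903
17 | Facebook | TEXT | Low | Normal | 120 | 15 | 5 | 200 | 96 | 1 | 12 | 2026 | 12JAN2026 | 12FEB2026 | 48 | 166.667
18 | Instagram | IMAGE | High | Normal | 50 | 10 | 25 | 60 | 85 | 3 | 10 | 2026 | 10MAR2026 | 10APR2026 | -9 | 120.000
19 | Linkedin | TEXT | Medium | Normal | 80 | 20 | 8 | 100 | 90 | 5 | 8 | 2026 | 08MAY2026 | 08JUN2026 | -68 | 125.000
20 | Medium | TEXT | Low | Normal | 60 | 9 | 4 | 70 | 98 | 2 | 19 | 2026 | 19FEB2026 | 19MAR2026 | 10 | 116.667
21 | Pinterest | IMAGE | Medium | Normal | 150 | 12 | 7 | 170 | 93 | 9 | 18 | 2026 | 18SEP2026 | 18OCT2026 | -201 | 113.333
22 | Quora | TEXT | Low | Normal | 90 | 14 | 5 | 110 | 97 | 10 | 9 | 2026 | 09OCT2026 | 09NOV2026 | -222 | 122.222
23 | Reddit | TEXT | High | Potential Fraud | 450 | 35 | 28 | 500 | 75 | 6 | 14 | 2026 | 14JUN2026 | 14JUL2026 | -105 | 111.111
24 | Snapchat | IMAGE | Medium | Normal | 200 | 18 | 12 | 220 | 88 | 7 | 21 | 2026 | 21JUL2026 | 21AUG2026 | -142 | 110.000
25 | Telegram | TEXT | High | Normal | 420 | 30 | 20 | 450 | 82 | 1 | 25 | 2026 | 25JAN2026 | 25FEB2026 | 35 | 107.143
26 | Threads | TEXT | Medium | Normal | 250 | 22 | 15 | 300 | 89 | 11 | 11 | 2026 | 11NOV2026 | 11DEC2026 | -255 | 120.000
27 | Tiktok | VIDEO | High | Potential Fraud | 600 | 45 | 35 | 700 | 65 | 8 | 2 | 2026 | 02AUG2026 | 02SEP2026 | -154 | 116.667
28 | Tumblr | IMAGE | Medium | Normal | 180 | 16 | 11 | 210 | 91 | 3 | 30 | 2026 | 30MAR2026 | 30APR2026 | -29 | 116.667
29 | X | TEXT | High | Potential Fraud | 500 | 40 | 30 | 600 | 70 | 4 | 5 | 2026 | 05APR2026 | 05MAY2026 | -35 | 120.000
30 | Youtube | VIDEO | Medium | Normal | 300 | 25 | 10 | 400 | 92 | 2 | 15 | 2026 | 15FEB2026 | 15MAR2026 | 14 | 133.333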

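PROC APPEND adds the rows of fraud_flagged to the end of content_moderation without rewriting the base dataset. Because Fraud_Flag does not exist in the base, the FORCE option lets the append proceed and drops that variable. The base dataset now holds 30 rows, with every platform appearing twice.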
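If the goal is only to demonstrate PROC APPEND, one option is to append into a copy so that content_moderation itself is not doubled. This is a sketch of an alternative to the step above, not what the project ran; moderation_history is a hypothetical dataset name.

```sas
/* Work on a copy so the main table keeps its original rows */
data moderation_history;
    set content_moderation;
run;

/* Dropping Fraud_Flag makes both structures identical,
   so the FORCE option is not needed */
proc append base=moderation_history
            data=fraud_flagged(drop=Fraud_Flag);
run;

proc print data=moderation_history;
run;
```

Keeping the base table untouched also keeps later steps, such as PROC TRANSPOSE, working on one row per platform.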

7. PROC TRANSPOSE

proc transpose data=content_moderation out=transposed_data;
    by Platform NotSorted;
    var Reports_Count Accuracy_Score;
run;

proc print data=transposed_data;
run;

OUTPUT:

Obs | Platform | _NAME_ | COL1
1 | Discord | Reports_Count | 310
2 | Discord | Accuracy_Score | 87
3 | Facebook | Reports_Count | 120
4 | Facebook | Accuracy_Score | 96
5 | Instagram | Reports_Count | 50
6 | Instagram | Accuracy_Score | 85
7 | Linkedin | Reports_Count | 80
8 | Linkedin | Accuracy_Score | 90
9 | Medium | Reports_Count | 60
10 | Medium | Accuracy_Score | 98
11 | Pinterest | Reports_Count | 150
12 | Pinterest | Accuracy_Score | 93
13 | Quora | Reports_Count | 90
14 | Quora | Accuracy_Score | 97
15 | Reddit | Reports_Count | 450
16 | Reddit | Accuracy_Score | 75
17 | Snapchat | Reports_Count | 200
18 | Snapchat | Accuracy_Score | 88
19 | Telegram | Reports_Count | 420
20 | Telegram | Accuracy_Score | 82
21 | Threads | Reports_Count | 250
22 | Threads | Accuracy_Score | 89
23 | Tiktok | Reports_Count | 600
24 | Tiktok | Accuracy_Score | 65
25 | Tumblr | Reports_Count | 180
26 | Tumblr | Accuracy_Score | 91
27 | X | Reports_Count | 500
28 | X | Accuracy_Score | 70
29 | Youtube | Reports_Count | 300
30 | Youtube | Accuracy_Score | 92
31 | Discord | Reports_Count | 310
32 | Discord | Accuracy_Score | 87
33 | Facebook | Reports_Count | 120
34 | Facebook | Accuracy_Score | 96
35 | Instagram | Reports_Count | 50
36 | Instagram | Accuracy_Score | 85
37 | Linkedin | Reports_Count | 80
38 | Linkedin | Accuracy_Score | 90
39 | Medium | Reports_Count | 60
40 | Medium | Accuracy_Score | 98
41 | Pinterest | Reports_Count | 150
42 | Pinterest | Accuracy_Score | 93
43 | Quora | Reports_Count | 90
44 | Quora | Accuracy_Score | 97
45 | Reddit | Reports_Count | 450
46 | Reddit | Accuracy_Score | 75
47 | Snapchat | Reports_Count | 200
48 | Snapchat | Accuracy_Score | 88
49 | Telegram | Reports_Count | 420
50 | Telegram | Accuracy_Score | 82
51 | Threads | Reports_Count | 250
52 | Threads | Accuracy_Score | 89
53 | Tiktok | Reports_Count | 600
54 | Tiktok | Accuracy_Score | 65
55 | Tumblr | Reports_Count | 180
56 | Tumblr | Accuracy_Score | 91
57 | X | Reports_Count | 500
58 | X | Accuracy_Score | 70
59 | Youtube | Reports_Count | 300
60 | Youtube | Accuracy_Score | 92
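
PROC TRANSPOSE restructures data between rows and columns. With BY Platform and two VAR variables, the Reports_Count and Accuracy_Score columns become two rows per BY group, identified by _NAME_ with the value in COL1. The output has 60 rows because content_moderation holds 30 rows after the PROC APPEND step.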
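A possible cleanup, sketched under the assumption that one row per platform is wanted: remove the duplicates created by the append first, then give the transposed columns readable names. The dataset names moderation_unique and metrics_long, and the column names Metric and Value, are hypothetical.

```sas
/* Keep the first row for each platform */
proc sort data=content_moderation out=moderation_unique nodupkey;
    by Platform;
run;

/* NAME= renames the _NAME_ column; RENAME= gives COL1 a clearer name */
proc transpose data=moderation_unique
               out=metrics_long(rename=(col1=Value))
               name=Metric;
    by Platform;
    var Reports_Count Accuracy_Score;
run;

proc print data=metrics_long;
run;
```

Because the data is sorted by Platform in the first step, the NOTSORTED option is no longer needed on the BY statement.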

8. PROC DATASETS DELETE

proc datasets library=work;
    delete content_raw transposed_data;
quit;

LOG:

NOTE: Deleting WORK.CONTENT_RAW (memtype=DATA).
NOTE: Deleting WORK.TRANSPOSED_DATA (memtype=DATA).

Deletes unwanted datasets.

Used to clean workspace.
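A slightly quieter variant of the same cleanup is sketched below. NOLIST suppresses the directory listing in the log, and NOWARN avoids an error if one of the datasets has already been deleted.

```sas
proc datasets library=work nolist nowarn;
    delete content_raw transposed_data;
quit;
```

This makes the step safe to rerun, which is handy when the whole program is executed more than once in the same session.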

UTILIZATION ANALYSIS

Utilization = Moderator_Load / Reports_Count * 100

>100% → overload

80–100% → optimal

<50% → underutilized
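The bands above can be applied with a user-defined format, sketched here. The format name utilband, the dataset utilization_bands, and the variable Utilization_Band are hypothetical, and the "Below optimal" label for the 50 to 80 range is an assumption, since the article does not name that band.

```sas
proc format;
    value utilband
        .           = "Not available"
        low -< 50   = "Underutilized"
        50 -< 80    = "Below optimal"   /* assumed label: band not defined in the article */
        80 - 100    = "Optimal"
        100 <- high = "Overload";
run;

data utilization_bands;
    set content_moderation;
    length Utilization_Band $15;
    /* PUT applies the format and returns the band label as text */
    Utilization_Band = put(Utilization, utilband.);
run;

proc print data=utilization_bands;
    var Platform Moderator_Load Reports_Count Utilization Utilization_Band;
run;
```

In this sample data every platform has Moderator_Load greater than Reports_Count, so every row lands in the Overload band.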

FRAUD DETECTION LOGIC EXPLAINED

Fraud signals include:

1.     High reports

2.     Low accuracy

3.     High false positive rate

4.     Abnormal utilization

5.     Repeated review spikes

Macro automates this detection.
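The macro in this project checks only the first two signals. A minimal sketch of how several signals could be combined into a simple score follows; the dataset fraud_scored and the Signal_ and Fraud_Score variables are hypothetical, and the cutoffs (400 reports, accuracy 80, false positive rate 20) reuse values that appear elsewhere in this article rather than validated thresholds.

```sas
data fraud_scored;
    set content_moderation;

    /* Each comparison evaluates to 1 (true) or 0 (false) */
    Signal_Reports  = (Reports_Count > 400);
    Signal_Accuracy = (. < Accuracy_Score < 80);   /* excludes missing scores */
    Signal_FPR      = (False_Positive_Rate > 20);

    /* Number of signals raised for this row, from 0 to 3 */
    Fraud_Score = sum(Signal_Reports, Signal_Accuracy, Signal_FPR);
run;

proc print data=fraud_scored;
    where Fraud_Score >= 2;
    var Platform Reports_Count Accuracy_Score False_Positive_Rate Fraud_Score;
run;
```

A score leaves room for review queues by severity, whereas a single 0/1 flag treats a borderline case the same as an extreme one.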

 

FINAL CLEAN DATASET READY FOR:

·  KPI dashboards

·  SLA monitoring

·  Fraud analytics

·  Moderator workload balancing

·  Monthly trend analysis


BUSINESS INSIGHTS FROM THIS PROJECT

·  Identify overloaded moderators

·  Detect fake reporting attacks

·  Improve accuracy tracking

·  Monitor false positives

·  Optimize staffing

·  Schedule next review automatically

·  Standardize data cleaning

·  Automate fraud classification

·  Improve reporting transparency

·  Reduce compliance risk

 20 Key Points About This Project

·  This project simulates a real-world content moderation analytics system using structured SAS programming techniques.

·  It creates a dataset with 15 observations including operational, quality, and fraud-related variables.

·  The dataset includes key metrics such as Reports_Count, Review_Time, False_Positive_Rate, Moderator_Load, and Accuracy_Score.

·  Date variables are generated using the MDY function to ensure proper SAS date formatting.

·  Time-based monitoring is performed using INTCK to calculate review aging in days.

·  Future review scheduling is automated using the INTNX function.

·  Character standardization is implemented using the STRIP, TRIM, PROPCASE, and UPCASE functions.

·  Numeric stability is ensured by handling division-by-zero scenarios in utilization calculations.

·  Moderator utilization percentage is calculated to measure workload efficiency.

·  Risk classification logic is applied based on accuracy and false positive thresholds.

·  A macro-driven fraud detection rule identifies suspicious patterns dynamically.

·  The macro is parameterized to allow reusable fraud threshold tuning.

·  SET statements are used for vertical stacking of datasets.

·  MERGE statements are applied to combine datasets based on common keys.

·  PROC APPEND is utilized for efficient dataset extension without full rewrite.

·  PROC TRANSPOSE is used to restructure metrics for analytical flexibility.

·  PROC DATASETS DELETE cleans temporary datasets to maintain workspace efficiency.

·  Intentional coding errors are introduced and corrected to demonstrate debugging capability.

·  The project demonstrates both operational analytics and fraud monitoring integration.

·  The final output produces a clean, standardized, scalable moderation analytics dataset ready for reporting and decision-making.

Summary:

This project demonstrates how to design, debug, and optimize a Content Moderation Analytics and Fraud Detection system using SAS in a practical and structured way. We began by creating a dataset with 15 observations that represents real-world moderation metrics such as reports count, review time, false positive rate, moderator workload, and accuracy score. Intentional programming errors were introduced to simulate real development challenges, and each error was carefully identified and corrected.

We applied important SAS concepts including MDY for date creation, INTCK and INTNX for time calculations, character functions like STRIP and PROPCASE for data cleaning, and numeric logic for utilization calculations. A reusable macro was developed to detect potential fraud based on configurable thresholds. Additionally, we demonstrated dataset management techniques such as SET, MERGE, APPEND, PROC TRANSPOSE, and PROC DATASETS DELETE.

Conclusion:

In conclusion, this project reflects a real production-level analytics workflow. It integrates data cleaning, classification, fraud detection, and performance monitoring into a single structured system. Such an approach improves operational efficiency, ensures data accuracy, and helps platforms make informed decisions while maintaining trust and compliance.

We successfully built a complete Content Moderation Analytics System in SAS while intentionally introducing and correcting multiple programming errors.

We demonstrated:

  • Dataset creation with 15 observations
  • Error debugging
  • Date functions (MDY, INTCK, INTNX)
  • Character & numeric functions
  • Utilization metrics
  • Fraud detection macro
  • SET, MERGE, APPEND
  • PROC TRANSPOSE
  • PROC DATASETS DELETE

This project simulates a real-world analytics workflow similar to those used by global platforms like Facebook, YouTube, Instagram, and X.


SAS INTERVIEW QUESTIONS

1. What is the difference between COALESCE and CASE?

2. How do you delete a dataset?

3. How do you count records?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent real content moderation data from any platform.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

·  Clinical SAS Programmer

·  Research Data Analyst

·  Regulatory Data Validator

