403.PROC FREQ Says Fraud Is High — Is It Real or a Data Issue? How Do You Validate It?

PROC FREQ Says Fraud Is High — Is It Real or a Data Issue? How Do You Validate It?

TABLE OF CONTENTS

1.     Introduction

2.     Business Context of Online Exam Proctoring

3.     Project Objectives and Analytical Scope

4.     Dataset Design and Variable Rationale

5.     Raw Data Creation with Intentional Data Issue

6.     Identification and Explanation of the Data Error

7.     Data Correction and Standardization

8.     Derived Variables and Date Handling Logic (MDY, INTCK, INTNX)

9.     Integrity and Utilization Classification Using SAS Macros

10.  Fraud Detection Rules and Flagging Logic (Macros)

11.  Descriptive Analytics Using PROC SQL

12.  Statistical Summary Using PROC MEANS and PROC UNIVARIATE

13.  Categorical Analysis Using PROC FREQ

14.  Relationship Analysis Using PROC CORR

15.  Data Visualization Using PROC SGPLOT

16.  Advanced Data Handling Using SET, MERGE, APPEND, and TRANSPOSE

17.  Character and Numeric Function Applications in SAS

18.  Dataset Maintenance and Cleanup Using PROC DATASETS

19.  TLF Overview – Reporting Strategy

20.  Table Outputs – Summary and Performance Tables

21.  Listing Outputs – Fraud and Low Integrity Session Listings

22.  Figure Outputs – Score, Violation, and Integrity Visualizations

23.  5 Key Points Of This Project

24.  Final Conclusion

1. INTRODUCTION

Online examinations have become the backbone of modern education, recruitment, and certification processes.
With this shift, online exam proctoring systems play a critical role in maintaining exam integrity, candidate authenticity, and fair evaluation.

This SAS project simulates real-world proctoring session data and applies advanced analytics, fraud detection rules, and quality control logic using Base SAS + PROC SQL + Macros.

2. BUSINESS CONTEXT

Organizations using online exams face challenges such as:

·       Cheating attempts

·       Network manipulation

·       Multiple violations during exams

·       Low integrity candidates passing exams

·       Inconsistent proctoring enforcement

This project helps:

·       EdTech companies

·       Certification bodies

·       Universities

·       Recruitment platforms

to monitor, classify, and detect suspicious exam behavior using SAS.

3. PROJECT OBJECTIVES

1.Create a realistic proctoring dataset

2. Apply fraud detection logic

3. Classify integrity & utilization

4. Analyze relationships between violations and scores

5. Demonstrate interview-level SAS skills

4.DATASET DESIGN

Core Variables

Variable

Description

Session_ID

Unique proctoring session

Exam_Type

Certification / University / Recruitment

Exam_Date

Exam conducted date

Duration

Exam duration (minutes)

Violations_Detected

Count of suspicious actions

Network_Issues

Network interruptions

Candidate_Score

Final score

Integrity_Level

Derived category

Integrity_Percentage

Calculated metric

Fraud_Flag

Yes/No

Utilization_Class

Low / Medium / High


5. RAW DATA CREATION(INTENTIONAL ERROR INCLUDED)

data proctor_raw;

    length Session_ID $6 Exam_Type $15 Exam_Date $10;

    input Session_ID $ Exam_Type $ Exam_Date $ Duration Violations_Detected

          Network_Issues Candidate_Score;

datalines;

S001 CERT 01-10-2026 120 0 1 85

S002 CERT 05-10-2026 90 2 3 65

S003 UNI 10-10-2026 180 5 6 40

S004 UNI 15-10-2026 150 1 2 78

S005 RECRUIT 20-10-2026 60 0 0 92

S006 RECRUIT 22-10-2026 75 4 5 55

S007 CERT 25-10-2026 120 3 2 68

S008 UNI 28-10-2026 180 6 8 30

S009 CERT 30-10-2026 90 1 1 80

S010 RECRUIT 02-11-2026 60 2 4 60

S011 UNI 05-11-2026 150 7 9 25

S012 CERT 08-11-2026 120 0 0 88

S013 RECRUIT 10-11-2026 90 1 1 82

;

run;

proc print data=proctor_raw;

run;

OUTPUT:

ObsSession_IDExam_TypeExam_DateDurationViolations_DetectedNetwork_IssuesCandidate_Score
1S001CERT01-10-20261200185
2S002CERT05-10-2026902365
3S003UNI10-10-20261805640
4S004UNI15-10-20261501278
5S005RECRUIT20-10-2026600092
6S006RECRUIT22-10-2026754555
7S007CERT25-10-20261203268
8S008UNI28-10-20261806830
9S009CERT30-10-2026901180
10S010RECRUIT02-11-2026602460
11S011UNI05-11-20261507925
12S012CERT08-11-20261200088
13S013RECRUIT10-11-2026901182

6. IDENTIFYING THE ERROR

ERROR EXPLANATION

·  Exam_Date is character, not numeric SAS date

·  Functions like INTCK, INTNX, MDY will fail

·  Date calculations will produce errors or missing values

7. CORRECTING THE ERROR (PROPER DATE HANDLING)

data proctor_clean;

    set proctor_raw;

    Exam_Date_Num = input(Exam_Date, ddmmyy10.);

    format Exam_Date_Num date9.;

    drop Exam_Date;

    rename Exam_Date_Num = Exam_Date;

run;

proc print data=proctor_clean;

run;

OUTPUT:

ObsSession_IDExam_TypeDurationViolations_DetectedNetwork_IssuesCandidate_ScoreExam_Date
1S001CERT120018501OCT2026
2S002CERT90236505OCT2026
3S003UNI180564010OCT2026
4S004UNI150127815OCT2026
5S005RECRUIT60009220OCT2026
6S006RECRUIT75455522OCT2026
7S007CERT120326825OCT2026
8S008UNI180683028OCT2026
9S009CERT90118030OCT2026
10S010RECRUIT60246002NOV2026
11S011UNI150792505NOV2026
12S012CERT120008808NOV2026
13S013RECRUIT90118210NOV2026

Why this works

·  Converts character → numeric date

·  Enables date arithmetic

·  Best practice in clinical & analytics domains

8. DERIVED VARIABLES & DATE LOGIC

data proctor_dates;

    set proctor_clean;

    Exam_Month = intnx('month', Exam_Date, 0, 'b');

    Days_From_Today = intck('day', Exam_Date, today());

    format Exam_Month monyy7.;

run;

proc print data=proctor_dates;

run;

OUTPUT:

ObsSession_IDExam_TypeDurationViolations_DetectedNetwork_IssuesCandidate_ScoreExam_DateExam_MonthDays_From_Today
1S001CERT120018501OCT2026OCT2026-226
2S002CERT90236505OCT2026OCT2026-230
3S003UNI180564010OCT2026OCT2026-235
4S004UNI150127815OCT2026OCT2026-240
5S005RECRUIT60009220OCT2026OCT2026-245
6S006RECRUIT75455522OCT2026OCT2026-247
7S007CERT120326825OCT2026OCT2026-250
8S008UNI180683028OCT2026OCT2026-253
9S009CERT90118030OCT2026OCT2026-255
10S010RECRUIT60246002NOV2026NOV2026-258
11S011UNI150792505NOV2026NOV2026-261
12S012CERT120008808NOV2026NOV2026-264
13S013RECRUIT90118210NOV2026NOV2026-266

9. INTEGRITY & UTILIZATION CLASSIFICATION (MACRO)

%macro classify;

data proctor_class;

    set proctor_dates;

    Integrity_Percentage = max(0, 100 - (Violations_Detected*10) - (Network_Issues*5));

    length Integrity_Level $8.;

    if Integrity_Percentage >= 80 then Integrity_Level = "HIGH";

    else if Integrity_Percentage >= 50 then Integrity_Level = "MEDIUM";

    else Integrity_Level = "LOW";

    length Utilization_Class $8.;

    if Duration >= 150 then Utilization_Class = "HIGH";

    else if Duration >= 90 then Utilization_Class = "MEDIUM";

    else Utilization_Class = "LOW";

run;

proc print data=proctor_class;

run;

%mend;


%classify;

OUTPUT:

ObsSession_IDExam_TypeDurationViolations_DetectedNetwork_IssuesCandidate_ScoreExam_DateExam_MonthDays_From_TodayIntegrity_PercentageIntegrity_LevelUtilization_Class
1S001CERT120018501OCT2026OCT2026-22695HIGHMEDIUM
2S002CERT90236505OCT2026OCT2026-23065MEDIUMMEDIUM
3S003UNI180564010OCT2026OCT2026-23520LOWHIGH
4S004UNI150127815OCT2026OCT2026-24080HIGHHIGH
5S005RECRUIT60009220OCT2026OCT2026-245100HIGHLOW
6S006RECRUIT75455522OCT2026OCT2026-24735LOWLOW
7S007CERT120326825OCT2026OCT2026-25060MEDIUMMEDIUM
8S008UNI180683028OCT2026OCT2026-2530LOWHIGH
9S009CERT90118030OCT2026OCT2026-25585HIGHMEDIUM
10S010RECRUIT60246002NOV2026NOV2026-25860MEDIUMLOW
11S011UNI150792505NOV2026NOV2026-2610LOWHIGH
12S012CERT120008808NOV2026NOV2026-264100HIGHMEDIUM
13S013RECRUIT90118210NOV2026NOV2026-26685HIGHMEDIUM

10. FRAUD DETECTION LOGIC (MACRO)

%macro fraud;

data proctor_fraud;

    set proctor_class;

    if Violations_Detected >=5 or Network_Issues >=6 then Fraud_Flag = "YES";

    else Fraud_Flag = "NO";

run;

proc print data=proctor_fraud;

run;

%mend;


%fraud;

OUTPUT:

ObsSession_IDExam_TypeDurationViolations_DetectedNetwork_IssuesCandidate_ScoreExam_DateExam_MonthDays_From_TodayIntegrity_PercentageIntegrity_LevelUtilization_ClassFraud_Flag
1S001CERT120018501OCT2026OCT2026-22695HIGHMEDIUMNO
2S002CERT90236505OCT2026OCT2026-23065MEDIUMMEDIUMNO
3S003UNI180564010OCT2026OCT2026-23520LOWHIGHYES
4S004UNI150127815OCT2026OCT2026-24080HIGHHIGHNO
5S005RECRUIT60009220OCT2026OCT2026-245100HIGHLOWNO
6S006RECRUIT75455522OCT2026OCT2026-24735LOWLOWNO
7S007CERT120326825OCT2026OCT2026-25060MEDIUMMEDIUMNO
8S008UNI180683028OCT2026OCT2026-2530LOWHIGHYES
9S009CERT90118030OCT2026OCT2026-25585HIGHMEDIUMNO
10S010RECRUIT60246002NOV2026NOV2026-25860MEDIUMLOWNO
11S011UNI150792505NOV2026NOV2026-2610LOWHIGHYES
12S012CERT120008808NOV2026NOV2026-264100HIGHMEDIUMNO
13S013RECRUIT90118210NOV2026NOV2026-26685HIGHMEDIUMNO

11. PROC SQL ANALYSIS

proc sql;

    create table score_summary as

    select Exam_Type,

           avg(Candidate_Score) as Avg_Score,

           avg(Integrity_Percentage) as Avg_Integrity

    from proctor_fraud

    group by Exam_Type;

quit;

proc print data=score_summary;

run;

OUTPUT:

ObsExam_TypeAvg_ScoreAvg_Integrity
1CERT77.2081
2RECRUIT72.2570
3UNI43.2525

12. PROC MEANS & PROC UNIVARIATE

proc means data=proctor_fraud mean min max;

    var Candidate_Score Violations_Detected Network_Issues;

run;

OUTPUT:

The MEANS Procedure

VariableMeanMinimumMaximum
Candidate_Score
Violations_Detected
Network_Issues
65.2307692
2.4615385
3.2307692
25.0000000
0
0
92.0000000
7.0000000
9.0000000

proc univariate data=proctor_fraud;

    var Integrity_Percentage;

run;

OUTPUT:

The UNIVARIATE Procedure

Variable: Integrity_Percentage

Moments
N13Sum Weights13
Mean60.3846154Sum Observations785
Std Deviation35.9665086Variance1293.58974
Skewness-0.6758585Kurtosis-0.886003
Uncorrected SS62925Corrected SS15523.0769
Coeff Variation59.562371Std Error Mean9.9753147
Basic Statistical Measures
LocationVariability
Mean60.38462Std Deviation35.96651
Median65.00000Variance1294
Mode0.00000Range100.00000
  Interquartile Range50.00000

Note: The mode displayed is the smallest of 4 modes with a count of 2.

Tests for Location: Mu0=0
TestStatisticp Value
Student's tt6.053405Pr > |t|<.0001
SignM5.5Pr >= |M|0.0010
Signed RankS33Pr >= |S|0.0010
Quantiles (Definition 5)
LevelQuantile
100% Max100
99%100
95%100
90%100
75% Q385
50% Median65
25% Q135
10%0
5%0
1%0
0% Min0
Extreme Observations
LowestHighest
ValueObsValueObs
011859
088513
203951
3561005
601010012

13. PROC FREQ

proc freq data=proctor_fraud;

    tables Integrity_Level Fraud_Flag / nocum;

run;

OUTPUT:

The FREQ Procedure

Integrity_LevelFrequencyPercent
HIGH646.15
LOW430.77
MEDIUM323.08
Fraud_FlagFrequencyPercent
NO1076.92
YES323.08

14. CORRELATION ANALYSIS

proc corr data=proctor_fraud;

    var Violations_Detected Network_Issues Candidate_Score Integrity_Percentage;

run;

OUTPUT:

The CORR Procedure

4 Variables:Violations_Detected Network_Issues Candidate_Score Integrity_Percentage
Simple Statistics
VariableNMeanStd DevSumMinimumMaximum
Violations_Detected132.461542.3669732.0000007.00000
Network_Issues133.230772.9764042.0000009.00000
Candidate_Score1365.2307722.21919848.0000025.0000092.00000
Integrity_Percentage1360.3846235.96651785.000000100.00000
Pearson Correlation Coefficients, N = 13
Prob > |r| under H0: Rho=0
 Violations_DetectedNetwork_IssuesCandidate_ScoreIntegrity_Percentage
Violations_Detected
1.00000
 
0.96540
<.0001
-0.97984
<.0001
-0.99092
<.0001
Network_Issues
0.96540
<.0001
1.00000
 
-0.99004
<.0001
-0.98174
<.0001
Candidate_Score
-0.97984
<.0001
-0.99004
<.0001
1.00000
 
0.99156
<.0001
Integrity_Percentage
-0.99092
<.0001
-0.98174
<.0001
0.99156
<.0001
1.00000
 

15. VISUALIZATION – PROC SGPLOT

proc sgplot data=proctor_fraud;

    scatter x=Violations_Detected y=Candidate_Score;

run;

OUTPUT:

The SGPlot Procedure

16. TRANSPOSE, MERGE, APPEND

TRANSPOSE

proc transpose data=score_summary out=score_t;

    id Exam_Type;

    var Avg_Score;

run;

proc print data=score_t;

run;

OUTPUT:

Obs_NAME_CERTRECRUITUNI
1Avg_Score77.272.2543.25

APPEND(INTENTIONAL ERROR INCLUDED)

proc append base=proctor_fraud 

            data=proctor_fraud force;

run;

proc print data=proctor_fraud;

run;

OUTPUT:

ObsSession_IDExam_TypeDurationViolations_DetectedNetwork_IssuesCandidate_ScoreExam_DateExam_MonthDays_From_TodayIntegrity_PercentageIntegrity_LevelUtilization_ClassFraud_Flag
1S001CERT120018501OCT2026OCT2026-22695HIGHMEDIUMNO
2S002CERT90236505OCT2026OCT2026-23065MEDIUMMEDIUMNO
3S003UNI180564010OCT2026OCT2026-23520LOWHIGHYES
4S004UNI150127815OCT2026OCT2026-24080HIGHHIGHNO
5S005RECRUIT60009220OCT2026OCT2026-245100HIGHLOWNO
6S006RECRUIT75455522OCT2026OCT2026-24735LOWLOWNO
7S007CERT120326825OCT2026OCT2026-25060MEDIUMMEDIUMNO
8S008UNI180683028OCT2026OCT2026-2530LOWHIGHYES
9S009CERT90118030OCT2026OCT2026-25585HIGHMEDIUMNO
10S010RECRUIT60246002NOV2026NOV2026-25860MEDIUMLOWNO
11S011UNI150792505NOV2026NOV2026-2610LOWHIGHYES
12S012CERT120008808NOV2026NOV2026-264100HIGHMEDIUMNO
13S013RECRUIT90118210NOV2026NOV2026-26685HIGHMEDIUMNO
14S001CERT120018501OCT2026OCT2026-22695HIGHMEDIUMNO
15S002CERT90236505OCT2026OCT2026-23065MEDIUMMEDIUMNO
16S003UNI180564010OCT2026OCT2026-23520LOWHIGHYES
17S004UNI150127815OCT2026OCT2026-24080HIGHHIGHNO
18S005RECRUIT60009220OCT2026OCT2026-245100HIGHLOWNO
19S006RECRUIT75455522OCT2026OCT2026-24735LOWLOWNO
20S007CERT120326825OCT2026OCT2026-25060MEDIUMMEDIUMNO
21S008UNI180683028OCT2026OCT2026-2530LOWHIGHYES
22S009CERT90118030OCT2026OCT2026-25585HIGHMEDIUMNO
23S010RECRUIT60246002NOV2026NOV2026-25860MEDIUMLOWNO
24S011UNI150792505NOV2026NOV2026-2610LOWHIGHYES
25S012CERT120008808NOV2026NOV2026-264100HIGHMEDIUMNO
26S013RECRUIT90118210NOV2026NOV2026-26685HIGHMEDIUMNO

Why this is wrong

·  You are appending the same dataset into itself

·  No new records are added

·  Interviewers will treat this as a logic mistake

·  In real projects, this is never done

CORRECT VERSION OF APPEND

CREATE A NEW DATASET

data proctor_new_sessions;

    length Session_ID $6 Exam_Type $10 Fraud_Flag $3 Integrity_Level $6;

    input Session_ID $ Exam_Type $ Exam_Date : date9. Duration Violations_Detected

          Network_Issues Candidate_Score Integrity_Percentage Integrity_Level $

          Fraud_Flag $;

    format Exam_Date date9.;

datalines;

S014 CERT 12NOV2026 120 1 1 83 85 HIGH NO

S015 UNI 14NOV2026 180 6 7 35 30 LOW YES

;

run;

proc print data=proctor_new_sessions;

run;

OUTPUT:

ObsSession_IDExam_TypeFraud_FlagIntegrity_LevelExam_DateDurationViolations_DetectedNetwork_IssuesCandidate_ScoreIntegrity_Percentage
1S014CERTNOHIGH12NOV2026120118385
2S015UNIYESLOW14NOV2026180673530

APPEND NEW DATA INTO MAIN DATASET

proc append base=proctor_fraud

            data=proctor_new_sessions

            force;

run;

proc print data=proctor_fraud;

run;

OUTPUT:

ObsSession_IDExam_TypeDurationViolations_DetectedNetwork_IssuesCandidate_ScoreExam_DateExam_MonthDays_From_TodayIntegrity_PercentageIntegrity_LevelUtilization_ClassFraud_Flag
1S001CERT120018501OCT2026OCT2026-22695HIGHMEDIUMNO
2S002CERT90236505OCT2026OCT2026-23065MEDIUMMEDIUMNO
3S003UNI180564010OCT2026OCT2026-23520LOWHIGHYES
4S004UNI150127815OCT2026OCT2026-24080HIGHHIGHNO
5S005RECRUIT60009220OCT2026OCT2026-245100HIGHLOWNO
6S006RECRUIT75455522OCT2026OCT2026-24735LOWLOWNO
7S007CERT120326825OCT2026OCT2026-25060MEDIUMMEDIUMNO
8S008UNI180683028OCT2026OCT2026-2530LOWHIGHYES
9S009CERT90118030OCT2026OCT2026-25585HIGHMEDIUMNO
10S010RECRUIT60246002NOV2026NOV2026-25860MEDIUMLOWNO
11S011UNI150792505NOV2026NOV2026-2610LOWHIGHYES
12S012CERT120008808NOV2026NOV2026-264100HIGHMEDIUMNO
13S013RECRUIT90118210NOV2026NOV2026-26685HIGHMEDIUMNO
14S014CERT120118312NOV2026..85HIGH NO
15S015UNI180673514NOV2026..30LOW YES

Logic

·  New exam sessions arrive periodically

·  Existing dataset = historical data

·  New dataset = incremental data

·  PROC APPEND is used for fast row-wise addition

Technical Logic

·  BASE= → target dataset

·  DATA= → incoming dataset

·  FORCE → allows append even if variable order differs

PROC SORT

proc sort data=proctor_fraud;by Exam_Type;run;

proc print data=proctor_fraud;

run;

OUTPUT:

ObsSession_IDExam_TypeDurationViolations_DetectedNetwork_IssuesCandidate_ScoreExam_DateExam_MonthDays_From_TodayIntegrity_PercentageIntegrity_LevelUtilization_ClassFraud_Flag
1S001CERT120018501OCT2026OCT2026-22695HIGHMEDIUMNO
2S002CERT90236505OCT2026OCT2026-23065MEDIUMMEDIUMNO
3S007CERT120326825OCT2026OCT2026-25060MEDIUMMEDIUMNO
4S009CERT90118030OCT2026OCT2026-25585HIGHMEDIUMNO
5S012CERT120008808NOV2026NOV2026-264100HIGHMEDIUMNO
6S014CERT120118312NOV2026..85HIGH NO
7S005RECRUIT60009220OCT2026OCT2026-245100HIGHLOWNO
8S006RECRUIT75455522OCT2026OCT2026-24735LOWLOWNO
9S010RECRUIT60246002NOV2026NOV2026-25860MEDIUMLOWNO
10S013RECRUIT90118210NOV2026NOV2026-26685HIGHMEDIUMNO
11S003UNI180564010OCT2026OCT2026-23520LOWHIGHYES
12S004UNI150127815OCT2026OCT2026-24080HIGHHIGHNO
13S008UNI180683028OCT2026OCT2026-2530LOWHIGHYES
14S011UNI150792505NOV2026NOV2026-2610LOWHIGHYES
15S015UNI180673514NOV2026..30LOW YES

proc sort data=score_summary;by Exam_Type;run;

proc print data=score_summary;

run;

OUTPUT:

ObsExam_TypeAvg_ScoreAvg_Integrity
1CERT77.2081
2RECRUIT72.2570
3UNI43.2525

MERGE

data merged_data;

    merge proctor_fraud 

          score_summary;

    by Exam_Type;

run;

proc print data=merged_data;

run;

OUTPUT:

ObsSession_IDExam_TypeDurationViolations_DetectedNetwork_IssuesCandidate_ScoreExam_DateExam_MonthDays_From_TodayIntegrity_PercentageIntegrity_LevelUtilization_ClassFraud_FlagAvg_ScoreAvg_Integrity
1S001CERT120018501OCT2026OCT2026-22695HIGHMEDIUMNO77.2081
2S002CERT90236505OCT2026OCT2026-23065MEDIUMMEDIUMNO77.2081
3S007CERT120326825OCT2026OCT2026-25060MEDIUMMEDIUMNO77.2081
4S009CERT90118030OCT2026OCT2026-25585HIGHMEDIUMNO77.2081
5S012CERT120008808NOV2026NOV2026-264100HIGHMEDIUMNO77.2081
6S014CERT120118312NOV2026..85HIGH NO77.2081
7S005RECRUIT60009220OCT2026OCT2026-245100HIGHLOWNO72.2570
8S006RECRUIT75455522OCT2026OCT2026-24735LOWLOWNO72.2570
9S010RECRUIT60246002NOV2026NOV2026-25860MEDIUMLOWNO72.2570
10S013RECRUIT90118210NOV2026NOV2026-26685HIGHMEDIUMNO72.2570
11S003UNI180564010OCT2026OCT2026-23520LOWHIGHYES43.2525
12S004UNI150127815OCT2026OCT2026-24080HIGHHIGHNO43.2525
13S008UNI180683028OCT2026OCT2026-2530LOWHIGHYES43.2525
14S011UNI150792505NOV2026NOV2026-2610LOWHIGHYES43.2525
15S015UNI180673514NOV2026..30LOW YES43.2525

17. CHARACTER FUNCTIONS

data function_demo;

    set proctor_fraud;

    Exam_Type_Upper = upcase(Exam_Type);

    Exam_Type_Lower = lowcase(Exam_Type);

    Clean_Type = propcase(strip(Exam_Type));

    Combined_ID = catx("-", Session_ID, Exam_Type);

run;

proc print data=function_demo;

   var Exam_Type Exam_Type_Upper Exam_Type_Lower Clean_Type Session_ID Combined_ID;

run;

OUTPUT:

ObsExam_TypeExam_Type_UpperExam_Type_LowerClean_TypeSession_IDCombined_ID
1CERTCERTcertCertS001S001-CERT
2CERTCERTcertCertS002S002-CERT
3CERTCERTcertCertS007S007-CERT
4CERTCERTcertCertS009S009-CERT
5CERTCERTcertCertS012S012-CERT
6CERTCERTcertCertS014S014-CERT
7RECRUITRECRUITrecruitRecruitS005S005-RECRUIT
8RECRUITRECRUITrecruitRecruitS006S006-RECRUIT
9RECRUITRECRUITrecruitRecruitS010S010-RECRUIT
10RECRUITRECRUITrecruitRecruitS013S013-RECRUIT
11UNIUNIuniUniS003S003-UNI
12UNIUNIuniUniS004S004-UNI
13UNIUNIuniUniS008S008-UNI
14UNIUNIuniUniS011S011-UNI
15UNIUNIuniUniS015S015-UNI

18. PROC DATASETS DELETE

proc datasets library=work nolist;

    delete function_demo;

quit;

LOG:

NOTE: Deleting WORK.FUNCTION_DEMO (memtype=DATA).

19. TLF OVERVIEW (WHY TLF IS IMPORTANT)

In real projects, analysis alone is not enough.
Stakeholders want outputs:

  • Tables → summarized numbers
  • Listings → detailed subject/session-level data
  • Figures → visual understanding

TLF is mandatory in:

  • Analytics reporting
  • Clinical & non-clinical domains
  • Client deliverables
  • SAS programmer interviews

20. TABLE 1 – EXAM TYPE WISE INTEGRITY & SCORE SUMMARY

proc sql;

    create table tlf_table1 as

    select Exam_Type,

           count(Session_ID) as Total_Sessions,

           mean(Candidate_Score) as Avg_Score format=6.2,

           mean(Integrity_Percentage) as Avg_Integrity format=6.2,

           sum(Violations_Detected) as Total_Violations

    from proctor_fraud

    group by Exam_Type;

quit;

proc print data=tlf_table1;

run;

OUTPUT:

ObsExam_TypeTotal_SessionsAvg_ScoreAvg_IntegrityTotal_Violations
1CERT678.1781.677
2RECRUIT472.2570.007
3UNI541.6026.0025

Explanation

·  PROC SQL → clean summary table creation

·  COUNT() → number of sessions per exam type

·  MEAN() → average performance & integrity

·  FORMAT= → presentation-ready output

·  GROUP BY Exam_Type → business-level comparison

21. LISTING 1 – FRAUD SUSPECTED PROCTORING SESSIONS

proc print data=proctor_fraud noobs label;

    where Fraud_Flag = "YES";

    var Session_ID Exam_Type Exam_Date Duration

        Violations_Detected Network_Issues

        Candidate_Score Integrity_Level Fraud_Flag;

    label

        Session_ID = "Session ID"

        Exam_Type = "Exam Type"

        Exam_Date = "Exam Date"

        Violations_Detected = "Violations"

        Network_Issues = "Network Issues"

        Candidate_Score = "Score"

        Integrity_Level = "Integrity";

run;

OUTPUT:

Session IDExam TypeExam DateDurationViolationsNetwork IssuesScoreIntegrityFraud_Flag
S003UNI10OCT20261805640LOWYES
S008UNI28OCT20261806830LOWYES
S011UNI05NOV20261507925LOWYES
S015UNI14NOV20261806735LOWYES

Explanation

·  PROC PRINT → listing-level output

·  WHERE Fraud_Flag="YES" → filters suspicious cases

·  VAR → controls column order

·  LABEL → business-friendly naming

22. LISTING 2 – LOW INTEGRITY SESSIONS (QUALITY RISK)

proc print data=proctor_fraud;

    where Integrity_Level = "LOW";

    var Session_ID Exam_Type Candidate_Score Integrity_Percentage;

run;

OUTPUT:

ObsSession_IDExam_TypeCandidate_ScoreIntegrity_Percentage
8S006RECRUIT5535
11S003UNI4020
13S008UNI300
14S011UNI250
15S015UNI3530

Explanation

·  Identifies high-risk candidates

·  Used by:

·       Academic review boards

·       Compliance teams

·  Simple filter → powerful insight

23. FIGURE 1 – VIOLATIONS vs CANDIDATE SCORE

proc sgplot data=proctor_fraud;

    scatter x=Violations_Detected y=Candidate_Score;

    xaxis label="Violations Detected";

    yaxis label="Candidate Score";

run;

OUTPUT:

The SGPlot Procedure

Explanation

·  PROC SGPLOT → modern SAS graphics

·  SCATTER → relationship analysis

·  Shows negative correlation visually

24. FIGURE 2 – INTEGRITY PERCENTAGE DISTRIBUTION

proc sgplot data=proctor_fraud;

    histogram Integrity_Percentage;

    density Integrity_Percentage;

run;

OUTPUT:

The SGPlot Procedure

Explanation

·  HISTOGRAM → distribution of integrity

·  DENSITY → smooth curve overlay

·  Helps identify:

·       Majority behavior

·       Outliers

·       Risk thresholds

25. TABLE 2 – FRAUD FLAG SUMMARY

proc freq data=proctor_fraud;

    tables Fraud_Flag Integrity_Level / nocum nopercent;

run;

OUTPUT:

The FREQ Procedure

Fraud_FlagFrequency
NO11
YES4
Integrity_LevelFrequency
HIGH7
LOW5
MEDIUM3

Explanation

·  PROC FREQ → categorical summary

·  NOCUM NOPERCENT → clean counts

·  Useful for:

·       Management dashboards

·       Compliance reporting

26. TLF QUALITY CHECK

proc means data=proctor_fraud n nmiss; 

    var Candidate_Score Integrity_Percentage;

run;

OUTPUT:

The MEANS Procedure

VariableNN Miss
Candidate_Score
Integrity_Percentage
15
15
0
0

Explanation

·  Checks missing values

·  Confirms data completeness

·  Shows production readiness

27. 5 KEY POINTS OF THIS PROJECT

1.     This project simulates a online exam proctoring system, not a dummy dataset, making it highly interview-relevant.

2.     It demonstrates how data issues happen in real life by intentionally creating an error and fixing it professionally using SAS.

3.     Fraud detection and integrity classification are built using clear business rules, not assumptions, which mirrors industry expectations.

4.     The project covers the complete analytics lifecycle – data creation, cleaning, analysis, reporting.

5.     The inclusion of TLF (Tables, Listings, Figures) makes this project production-ready and suitable for client or management reporting.

28. FINAL CONCLUSION

This Online Exam Proctoring Analytics project shows how SAS can be used to solve real-world problems related to exam integrity and fraud detection. Instead of focusing only on analysis, the project covers the full journey of data handling—from raw data creation and error identification to correction, analysis, and final reporting using TLFs. The intentional data error highlights a common industry challenge and demonstrates how a SAS programmer should identify and fix such issues confidently. By applying macros, date functions, statistical procedures, and visualization techniques, the project transforms raw proctoring data into meaningful insights. The TLF section ensures the results are easy to understand for non-technical stakeholders such as management, academic boards, or compliance teams. Overall, this project is simple to explain, strong in logic, and highly effective for interviews, training, and real analytics work.


INTERVIEW QUESTIONS FOR YOU

·  How do you optimize a slow-running SAS program?

·  What steps do you take if output is wrong?

·  How do you handle large datasets?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent EXAM data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics and Exams Reviewers and Observers


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.Can we analyze planets using SAS like NASA scientists?

2.Can SAS Compare How 3 Different Families Live in Their Homes?

3.Can SAS Help Us Choose the Most Efficient Cooler?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy



Comments

Popular posts from this blog

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study

383.Which Pharma Distributors Are High-Risk? Can SAS Detect Delays, Temperature Violations, and Fraud?