403.PROC FREQ Says Fraud Is High — Is It Real or a Data Issue? How Do You Validate It?
PROC FREQ Says Fraud Is High — Is It Real or a Data Issue? How Do You Validate It?
TABLE OF CONTENTS
1.
Introduction
2.
Business Context of Online Exam Proctoring
3.
Project Objectives and Analytical Scope
4.
Dataset Design and Variable Rationale
5.
Raw Data Creation with Intentional Data Issue
6.
Identification and Explanation of the Data Error
7.
Data Correction and Standardization
8.
Derived Variables and Date Handling Logic (MDY, INTCK,
INTNX)
9.
Integrity and Utilization Classification Using SAS
Macros
10. Fraud
Detection Rules and Flagging Logic (Macros)
11. Descriptive
Analytics Using PROC SQL
12. Statistical
Summary Using PROC MEANS and PROC UNIVARIATE
13. Categorical
Analysis Using PROC FREQ
14. Relationship
Analysis Using PROC CORR
15. Data
Visualization Using PROC SGPLOT
16. Advanced
Data Handling Using SET, MERGE, APPEND, and TRANSPOSE
17. Character
and Numeric Function Applications in SAS
18. Dataset
Maintenance and Cleanup Using PROC DATASETS
19. TLF
Overview – Reporting Strategy
20. Table
Outputs – Summary and Performance Tables
21. Listing
Outputs – Fraud and Low Integrity Session Listings
22. Figure
Outputs – Score, Violation, and Integrity Visualizations
23. 5 Key
Points Of This Project
24. Final
Conclusion
1. INTRODUCTION
Online examinations have become the backbone
of modern education, recruitment, and certification processes.
With this shift, online exam proctoring
systems play a critical role in maintaining exam integrity, candidate
authenticity, and fair
evaluation.
This SAS project simulates real-world proctoring session data and
applies advanced analytics, fraud detection rules, and quality control logic using Base SAS + PROC SQL + Macros.
2. BUSINESS CONTEXT
Organizations using online exams face
challenges such as:
·
Cheating attempts
·
Network manipulation
·
Multiple violations during exams
·
Low integrity candidates passing exams
·
Inconsistent proctoring enforcement
This project helps:
·
EdTech
companies
·
Certification
bodies
·
Universities
·
Recruitment
platforms
to monitor,
classify, and detect suspicious exam behavior using
SAS.
3. PROJECT OBJECTIVES
1.Create a realistic proctoring dataset
2. Apply fraud detection logic
3. Classify integrity & utilization
4. Analyze relationships between violations and scores
5. Demonstrate interview-level SAS
skills
4.DATASET DESIGN
Core Variables
|
Variable |
Description |
|
Session_ID |
Unique proctoring session |
|
Exam_Type |
Certification / University / Recruitment |
|
Exam_Date |
Exam conducted date |
|
Duration |
Exam duration (minutes) |
|
Violations_Detected |
Count of suspicious actions |
|
Network_Issues |
Network interruptions |
|
Candidate_Score |
Final score |
|
Integrity_Level |
Derived category |
|
Integrity_Percentage |
Calculated metric |
|
Fraud_Flag |
Yes/No |
|
Utilization_Class |
Low / Medium / High |
5. RAW DATA CREATION(INTENTIONAL ERROR INCLUDED)
data proctor_raw;
length Session_ID $6 Exam_Type $15 Exam_Date $10;
input Session_ID $ Exam_Type $ Exam_Date $ Duration Violations_Detected
Network_Issues Candidate_Score;
datalines;
S001 CERT 01-10-2026 120 0 1 85
S002 CERT 05-10-2026 90 2 3 65
S003 UNI 10-10-2026 180 5 6 40
S004 UNI 15-10-2026 150 1 2 78
S005 RECRUIT 20-10-2026 60 0 0 92
S006 RECRUIT 22-10-2026 75 4 5 55
S007 CERT 25-10-2026 120 3 2 68
S008 UNI 28-10-2026 180 6 8 30
S009 CERT 30-10-2026 90 1 1 80
S010 RECRUIT 02-11-2026 60 2 4 60
S011 UNI 05-11-2026 150 7 9 25
S012 CERT 08-11-2026 120 0 0 88
S013 RECRUIT 10-11-2026 90 1 1 82
;
run;
proc print data=proctor_raw;
run;
OUTPUT:
| Obs | Session_ID | Exam_Type | Exam_Date | Duration | Violations_Detected | Network_Issues | Candidate_Score |
|---|---|---|---|---|---|---|---|
| 1 | S001 | CERT | 01-10-2026 | 120 | 0 | 1 | 85 |
| 2 | S002 | CERT | 05-10-2026 | 90 | 2 | 3 | 65 |
| 3 | S003 | UNI | 10-10-2026 | 180 | 5 | 6 | 40 |
| 4 | S004 | UNI | 15-10-2026 | 150 | 1 | 2 | 78 |
| 5 | S005 | RECRUIT | 20-10-2026 | 60 | 0 | 0 | 92 |
| 6 | S006 | RECRUIT | 22-10-2026 | 75 | 4 | 5 | 55 |
| 7 | S007 | CERT | 25-10-2026 | 120 | 3 | 2 | 68 |
| 8 | S008 | UNI | 28-10-2026 | 180 | 6 | 8 | 30 |
| 9 | S009 | CERT | 30-10-2026 | 90 | 1 | 1 | 80 |
| 10 | S010 | RECRUIT | 02-11-2026 | 60 | 2 | 4 | 60 |
| 11 | S011 | UNI | 05-11-2026 | 150 | 7 | 9 | 25 |
| 12 | S012 | CERT | 08-11-2026 | 120 | 0 | 0 | 88 |
| 13 | S013 | RECRUIT | 10-11-2026 | 90 | 1 | 1 | 82 |
6. IDENTIFYING
THE ERROR
ERROR EXPLANATION
·
Exam_Date is character, not numeric SAS date
·
Functions like INTCK, INTNX,
MDY
will fail
·
Date calculations will produce errors or
missing values
7. CORRECTING THE ERROR (PROPER DATE HANDLING)
data proctor_clean;
set proctor_raw;
Exam_Date_Num = input(Exam_Date, ddmmyy10.);
format Exam_Date_Num date9.;
drop Exam_Date;
rename Exam_Date_Num = Exam_Date;
run;
proc print data=proctor_clean;
run;
OUTPUT:
| Obs | Session_ID | Exam_Type | Duration | Violations_Detected | Network_Issues | Candidate_Score | Exam_Date |
|---|---|---|---|---|---|---|---|
| 1 | S001 | CERT | 120 | 0 | 1 | 85 | 01OCT2026 |
| 2 | S002 | CERT | 90 | 2 | 3 | 65 | 05OCT2026 |
| 3 | S003 | UNI | 180 | 5 | 6 | 40 | 10OCT2026 |
| 4 | S004 | UNI | 150 | 1 | 2 | 78 | 15OCT2026 |
| 5 | S005 | RECRUIT | 60 | 0 | 0 | 92 | 20OCT2026 |
| 6 | S006 | RECRUIT | 75 | 4 | 5 | 55 | 22OCT2026 |
| 7 | S007 | CERT | 120 | 3 | 2 | 68 | 25OCT2026 |
| 8 | S008 | UNI | 180 | 6 | 8 | 30 | 28OCT2026 |
| 9 | S009 | CERT | 90 | 1 | 1 | 80 | 30OCT2026 |
| 10 | S010 | RECRUIT | 60 | 2 | 4 | 60 | 02NOV2026 |
| 11 | S011 | UNI | 150 | 7 | 9 | 25 | 05NOV2026 |
| 12 | S012 | CERT | 120 | 0 | 0 | 88 | 08NOV2026 |
| 13 | S013 | RECRUIT | 90 | 1 | 1 | 82 | 10NOV2026 |
Why this works
·
Converts character → numeric date
·
Enables date arithmetic
·
Best practice in clinical &
analytics domains
8. DERIVED VARIABLES & DATE LOGIC
data proctor_dates;
set proctor_clean;
Exam_Month = intnx('month', Exam_Date, 0, 'b');
Days_From_Today = intck('day', Exam_Date, today());
format Exam_Month monyy7.;
run;
proc print data=proctor_dates;
run;
OUTPUT:
| Obs | Session_ID | Exam_Type | Duration | Violations_Detected | Network_Issues | Candidate_Score | Exam_Date | Exam_Month | Days_From_Today |
|---|---|---|---|---|---|---|---|---|---|
| 1 | S001 | CERT | 120 | 0 | 1 | 85 | 01OCT2026 | OCT2026 | -226 |
| 2 | S002 | CERT | 90 | 2 | 3 | 65 | 05OCT2026 | OCT2026 | -230 |
| 3 | S003 | UNI | 180 | 5 | 6 | 40 | 10OCT2026 | OCT2026 | -235 |
| 4 | S004 | UNI | 150 | 1 | 2 | 78 | 15OCT2026 | OCT2026 | -240 |
| 5 | S005 | RECRUIT | 60 | 0 | 0 | 92 | 20OCT2026 | OCT2026 | -245 |
| 6 | S006 | RECRUIT | 75 | 4 | 5 | 55 | 22OCT2026 | OCT2026 | -247 |
| 7 | S007 | CERT | 120 | 3 | 2 | 68 | 25OCT2026 | OCT2026 | -250 |
| 8 | S008 | UNI | 180 | 6 | 8 | 30 | 28OCT2026 | OCT2026 | -253 |
| 9 | S009 | CERT | 90 | 1 | 1 | 80 | 30OCT2026 | OCT2026 | -255 |
| 10 | S010 | RECRUIT | 60 | 2 | 4 | 60 | 02NOV2026 | NOV2026 | -258 |
| 11 | S011 | UNI | 150 | 7 | 9 | 25 | 05NOV2026 | NOV2026 | -261 |
| 12 | S012 | CERT | 120 | 0 | 0 | 88 | 08NOV2026 | NOV2026 | -264 |
| 13 | S013 | RECRUIT | 90 | 1 | 1 | 82 | 10NOV2026 | NOV2026 | -266 |
9. INTEGRITY & UTILIZATION CLASSIFICATION (MACRO)
%macro classify;
data proctor_class;
set proctor_dates;
Integrity_Percentage = max(0, 100 - (Violations_Detected*10) - (Network_Issues*5));
length Integrity_Level $8.;
if Integrity_Percentage >= 80 then Integrity_Level = "HIGH";
else if Integrity_Percentage >= 50 then Integrity_Level = "MEDIUM";
else Integrity_Level = "LOW";
length Utilization_Class $8.;
if Duration >= 150 then Utilization_Class = "HIGH";
else if Duration >= 90 then Utilization_Class = "MEDIUM";
else Utilization_Class = "LOW";
run;
proc print data=proctor_class;
run;
%mend;
%classify;
OUTPUT:
| Obs | Session_ID | Exam_Type | Duration | Violations_Detected | Network_Issues | Candidate_Score | Exam_Date | Exam_Month | Days_From_Today | Integrity_Percentage | Integrity_Level | Utilization_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | S001 | CERT | 120 | 0 | 1 | 85 | 01OCT2026 | OCT2026 | -226 | 95 | HIGH | MEDIUM |
| 2 | S002 | CERT | 90 | 2 | 3 | 65 | 05OCT2026 | OCT2026 | -230 | 65 | MEDIUM | MEDIUM |
| 3 | S003 | UNI | 180 | 5 | 6 | 40 | 10OCT2026 | OCT2026 | -235 | 20 | LOW | HIGH |
| 4 | S004 | UNI | 150 | 1 | 2 | 78 | 15OCT2026 | OCT2026 | -240 | 80 | HIGH | HIGH |
| 5 | S005 | RECRUIT | 60 | 0 | 0 | 92 | 20OCT2026 | OCT2026 | -245 | 100 | HIGH | LOW |
| 6 | S006 | RECRUIT | 75 | 4 | 5 | 55 | 22OCT2026 | OCT2026 | -247 | 35 | LOW | LOW |
| 7 | S007 | CERT | 120 | 3 | 2 | 68 | 25OCT2026 | OCT2026 | -250 | 60 | MEDIUM | MEDIUM |
| 8 | S008 | UNI | 180 | 6 | 8 | 30 | 28OCT2026 | OCT2026 | -253 | 0 | LOW | HIGH |
| 9 | S009 | CERT | 90 | 1 | 1 | 80 | 30OCT2026 | OCT2026 | -255 | 85 | HIGH | MEDIUM |
| 10 | S010 | RECRUIT | 60 | 2 | 4 | 60 | 02NOV2026 | NOV2026 | -258 | 60 | MEDIUM | LOW |
| 11 | S011 | UNI | 150 | 7 | 9 | 25 | 05NOV2026 | NOV2026 | -261 | 0 | LOW | HIGH |
| 12 | S012 | CERT | 120 | 0 | 0 | 88 | 08NOV2026 | NOV2026 | -264 | 100 | HIGH | MEDIUM |
| 13 | S013 | RECRUIT | 90 | 1 | 1 | 82 | 10NOV2026 | NOV2026 | -266 | 85 | HIGH | MEDIUM |
10. FRAUD DETECTION LOGIC (MACRO)
%macro fraud;
data proctor_fraud;
set proctor_class;
if Violations_Detected >=5 or Network_Issues >=6 then Fraud_Flag = "YES";
else Fraud_Flag = "NO";
run;
proc print data=proctor_fraud;
run;
%mend;
%fraud;
OUTPUT:
| Obs | Session_ID | Exam_Type | Duration | Violations_Detected | Network_Issues | Candidate_Score | Exam_Date | Exam_Month | Days_From_Today | Integrity_Percentage | Integrity_Level | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | S001 | CERT | 120 | 0 | 1 | 85 | 01OCT2026 | OCT2026 | -226 | 95 | HIGH | MEDIUM | NO |
| 2 | S002 | CERT | 90 | 2 | 3 | 65 | 05OCT2026 | OCT2026 | -230 | 65 | MEDIUM | MEDIUM | NO |
| 3 | S003 | UNI | 180 | 5 | 6 | 40 | 10OCT2026 | OCT2026 | -235 | 20 | LOW | HIGH | YES |
| 4 | S004 | UNI | 150 | 1 | 2 | 78 | 15OCT2026 | OCT2026 | -240 | 80 | HIGH | HIGH | NO |
| 5 | S005 | RECRUIT | 60 | 0 | 0 | 92 | 20OCT2026 | OCT2026 | -245 | 100 | HIGH | LOW | NO |
| 6 | S006 | RECRUIT | 75 | 4 | 5 | 55 | 22OCT2026 | OCT2026 | -247 | 35 | LOW | LOW | NO |
| 7 | S007 | CERT | 120 | 3 | 2 | 68 | 25OCT2026 | OCT2026 | -250 | 60 | MEDIUM | MEDIUM | NO |
| 8 | S008 | UNI | 180 | 6 | 8 | 30 | 28OCT2026 | OCT2026 | -253 | 0 | LOW | HIGH | YES |
| 9 | S009 | CERT | 90 | 1 | 1 | 80 | 30OCT2026 | OCT2026 | -255 | 85 | HIGH | MEDIUM | NO |
| 10 | S010 | RECRUIT | 60 | 2 | 4 | 60 | 02NOV2026 | NOV2026 | -258 | 60 | MEDIUM | LOW | NO |
| 11 | S011 | UNI | 150 | 7 | 9 | 25 | 05NOV2026 | NOV2026 | -261 | 0 | LOW | HIGH | YES |
| 12 | S012 | CERT | 120 | 0 | 0 | 88 | 08NOV2026 | NOV2026 | -264 | 100 | HIGH | MEDIUM | NO |
| 13 | S013 | RECRUIT | 90 | 1 | 1 | 82 | 10NOV2026 | NOV2026 | -266 | 85 | HIGH | MEDIUM | NO |
11. PROC SQL ANALYSIS
proc sql;
create table score_summary as
select Exam_Type,
avg(Candidate_Score) as Avg_Score,
avg(Integrity_Percentage) as Avg_Integrity
from proctor_fraud
group by Exam_Type;
quit;
proc print data=score_summary;
run;
OUTPUT:
| Obs | Exam_Type | Avg_Score | Avg_Integrity |
|---|---|---|---|
| 1 | CERT | 77.20 | 81 |
| 2 | RECRUIT | 72.25 | 70 |
| 3 | UNI | 43.25 | 25 |
12. PROC MEANS & PROC UNIVARIATE
proc means data=proctor_fraud mean min max;
var Candidate_Score Violations_Detected Network_Issues;
run;
OUTPUT:
The MEANS Procedure
| Variable | Mean | Minimum | Maximum |
|---|---|---|---|
Candidate_Score Violations_Detected Network_Issues | 65.2307692 2.4615385 3.2307692 | 25.0000000 0 0 | 92.0000000 7.0000000 9.0000000 |
proc univariate data=proctor_fraud;
var Integrity_Percentage;
run;
OUTPUT:
The UNIVARIATE Procedure
Variable: Integrity_Percentage
| Moments | |||
|---|---|---|---|
| N | 13 | Sum Weights | 13 |
| Mean | 60.3846154 | Sum Observations | 785 |
| Std Deviation | 35.9665086 | Variance | 1293.58974 |
| Skewness | -0.6758585 | Kurtosis | -0.886003 |
| Uncorrected SS | 62925 | Corrected SS | 15523.0769 |
| Coeff Variation | 59.562371 | Std Error Mean | 9.9753147 |
| Basic Statistical Measures | |||
|---|---|---|---|
| Location | Variability | ||
| Mean | 60.38462 | Std Deviation | 35.96651 |
| Median | 65.00000 | Variance | 1294 |
| Mode | 0.00000 | Range | 100.00000 |
| Interquartile Range | 50.00000 | ||
Note: The mode displayed is the smallest of 4 modes with a count of 2.
| Tests for Location: Mu0=0 | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Student's t | t | 6.053405 | Pr > |t| | <.0001 |
| Sign | M | 5.5 | Pr >= |M| | 0.0010 |
| Signed Rank | S | 33 | Pr >= |S| | 0.0010 |
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile |
| 100% Max | 100 |
| 99% | 100 |
| 95% | 100 |
| 90% | 100 |
| 75% Q3 | 85 |
| 50% Median | 65 |
| 25% Q1 | 35 |
| 10% | 0 |
| 5% | 0 |
| 1% | 0 |
| 0% Min | 0 |
| Extreme Observations | |||
|---|---|---|---|
| Lowest | Highest | ||
| Value | Obs | Value | Obs |
| 0 | 11 | 85 | 9 |
| 0 | 8 | 85 | 13 |
| 20 | 3 | 95 | 1 |
| 35 | 6 | 100 | 5 |
| 60 | 10 | 100 | 12 |
13. PROC FREQ
proc freq data=proctor_fraud;
tables Integrity_Level Fraud_Flag / nocum;
run;
OUTPUT:
The FREQ Procedure
| Integrity_Level | Frequency | Percent |
|---|---|---|
| HIGH | 6 | 46.15 |
| LOW | 4 | 30.77 |
| MEDIUM | 3 | 23.08 |
| Fraud_Flag | Frequency | Percent |
|---|---|---|
| NO | 10 | 76.92 |
| YES | 3 | 23.08 |
14. CORRELATION ANALYSIS
proc corr data=proctor_fraud;
var Violations_Detected Network_Issues Candidate_Score Integrity_Percentage;
run;
OUTPUT:
The CORR Procedure
| 4 Variables: | Violations_Detected Network_Issues Candidate_Score Integrity_Percentage |
|---|
| Simple Statistics | ||||||
|---|---|---|---|---|---|---|
| Variable | N | Mean | Std Dev | Sum | Minimum | Maximum |
| Violations_Detected | 13 | 2.46154 | 2.36697 | 32.00000 | 0 | 7.00000 |
| Network_Issues | 13 | 3.23077 | 2.97640 | 42.00000 | 0 | 9.00000 |
| Candidate_Score | 13 | 65.23077 | 22.21919 | 848.00000 | 25.00000 | 92.00000 |
| Integrity_Percentage | 13 | 60.38462 | 35.96651 | 785.00000 | 0 | 100.00000 |
| Pearson Correlation Coefficients, N = 13 Prob > |r| under H0: Rho=0 | ||||
|---|---|---|---|---|
| Violations_Detected | Network_Issues | Candidate_Score | Integrity_Percentage | |
| Violations_Detected | 1.00000 | 0.96540 <.0001 | -0.97984 <.0001 | -0.99092 <.0001 |
| Network_Issues | 0.96540 <.0001 | 1.00000 | -0.99004 <.0001 | -0.98174 <.0001 |
| Candidate_Score | -0.97984 <.0001 | -0.99004 <.0001 | 1.00000 | 0.99156 <.0001 |
| Integrity_Percentage | -0.99092 <.0001 | -0.98174 <.0001 | 0.99156 <.0001 | 1.00000 |
15. VISUALIZATION – PROC SGPLOT
proc sgplot data=proctor_fraud;
scatter x=Violations_Detected y=Candidate_Score;
run;
OUTPUT:
16. TRANSPOSE, MERGE, APPEND
TRANSPOSE
proc transpose data=score_summary out=score_t;
id Exam_Type;
var Avg_Score;
run;
proc print data=score_t;
run;
OUTPUT:
| Obs | _NAME_ | CERT | RECRUIT | UNI |
|---|---|---|---|---|
| 1 | Avg_Score | 77.2 | 72.25 | 43.25 |
APPEND(INTENTIONAL ERROR INCLUDED)
proc append base=proctor_fraud
data=proctor_fraud force;
run;
proc print data=proctor_fraud;
run;
OUTPUT:
| Obs | Session_ID | Exam_Type | Duration | Violations_Detected | Network_Issues | Candidate_Score | Exam_Date | Exam_Month | Days_From_Today | Integrity_Percentage | Integrity_Level | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | S001 | CERT | 120 | 0 | 1 | 85 | 01OCT2026 | OCT2026 | -226 | 95 | HIGH | MEDIUM | NO |
| 2 | S002 | CERT | 90 | 2 | 3 | 65 | 05OCT2026 | OCT2026 | -230 | 65 | MEDIUM | MEDIUM | NO |
| 3 | S003 | UNI | 180 | 5 | 6 | 40 | 10OCT2026 | OCT2026 | -235 | 20 | LOW | HIGH | YES |
| 4 | S004 | UNI | 150 | 1 | 2 | 78 | 15OCT2026 | OCT2026 | -240 | 80 | HIGH | HIGH | NO |
| 5 | S005 | RECRUIT | 60 | 0 | 0 | 92 | 20OCT2026 | OCT2026 | -245 | 100 | HIGH | LOW | NO |
| 6 | S006 | RECRUIT | 75 | 4 | 5 | 55 | 22OCT2026 | OCT2026 | -247 | 35 | LOW | LOW | NO |
| 7 | S007 | CERT | 120 | 3 | 2 | 68 | 25OCT2026 | OCT2026 | -250 | 60 | MEDIUM | MEDIUM | NO |
| 8 | S008 | UNI | 180 | 6 | 8 | 30 | 28OCT2026 | OCT2026 | -253 | 0 | LOW | HIGH | YES |
| 9 | S009 | CERT | 90 | 1 | 1 | 80 | 30OCT2026 | OCT2026 | -255 | 85 | HIGH | MEDIUM | NO |
| 10 | S010 | RECRUIT | 60 | 2 | 4 | 60 | 02NOV2026 | NOV2026 | -258 | 60 | MEDIUM | LOW | NO |
| 11 | S011 | UNI | 150 | 7 | 9 | 25 | 05NOV2026 | NOV2026 | -261 | 0 | LOW | HIGH | YES |
| 12 | S012 | CERT | 120 | 0 | 0 | 88 | 08NOV2026 | NOV2026 | -264 | 100 | HIGH | MEDIUM | NO |
| 13 | S013 | RECRUIT | 90 | 1 | 1 | 82 | 10NOV2026 | NOV2026 | -266 | 85 | HIGH | MEDIUM | NO |
| 14 | S001 | CERT | 120 | 0 | 1 | 85 | 01OCT2026 | OCT2026 | -226 | 95 | HIGH | MEDIUM | NO |
| 15 | S002 | CERT | 90 | 2 | 3 | 65 | 05OCT2026 | OCT2026 | -230 | 65 | MEDIUM | MEDIUM | NO |
| 16 | S003 | UNI | 180 | 5 | 6 | 40 | 10OCT2026 | OCT2026 | -235 | 20 | LOW | HIGH | YES |
| 17 | S004 | UNI | 150 | 1 | 2 | 78 | 15OCT2026 | OCT2026 | -240 | 80 | HIGH | HIGH | NO |
| 18 | S005 | RECRUIT | 60 | 0 | 0 | 92 | 20OCT2026 | OCT2026 | -245 | 100 | HIGH | LOW | NO |
| 19 | S006 | RECRUIT | 75 | 4 | 5 | 55 | 22OCT2026 | OCT2026 | -247 | 35 | LOW | LOW | NO |
| 20 | S007 | CERT | 120 | 3 | 2 | 68 | 25OCT2026 | OCT2026 | -250 | 60 | MEDIUM | MEDIUM | NO |
| 21 | S008 | UNI | 180 | 6 | 8 | 30 | 28OCT2026 | OCT2026 | -253 | 0 | LOW | HIGH | YES |
| 22 | S009 | CERT | 90 | 1 | 1 | 80 | 30OCT2026 | OCT2026 | -255 | 85 | HIGH | MEDIUM | NO |
| 23 | S010 | RECRUIT | 60 | 2 | 4 | 60 | 02NOV2026 | NOV2026 | -258 | 60 | MEDIUM | LOW | NO |
| 24 | S011 | UNI | 150 | 7 | 9 | 25 | 05NOV2026 | NOV2026 | -261 | 0 | LOW | HIGH | YES |
| 25 | S012 | CERT | 120 | 0 | 0 | 88 | 08NOV2026 | NOV2026 | -264 | 100 | HIGH | MEDIUM | NO |
| 26 | S013 | RECRUIT | 90 | 1 | 1 | 82 | 10NOV2026 | NOV2026 | -266 | 85 | HIGH | MEDIUM | NO |
Why this is wrong
·
You are appending the same dataset into itself
·
No new records are added
·
Interviewers will treat this as a logic mistake
·
In real projects, this is never done
CREATE A NEW DATASET
data proctor_new_sessions;
length Session_ID $6 Exam_Type $10 Fraud_Flag $3 Integrity_Level $6;
input Session_ID $ Exam_Type $ Exam_Date : date9. Duration Violations_Detected
Network_Issues Candidate_Score Integrity_Percentage Integrity_Level $
Fraud_Flag $;
format Exam_Date date9.;
datalines;
S014 CERT 12NOV2026 120 1 1 83 85 HIGH NO
S015 UNI 14NOV2026 180 6 7 35 30 LOW YES
;
run;
proc print data=proctor_new_sessions;
run;
OUTPUT:
| Obs | Session_ID | Exam_Type | Fraud_Flag | Integrity_Level | Exam_Date | Duration | Violations_Detected | Network_Issues | Candidate_Score | Integrity_Percentage |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | S014 | CERT | NO | HIGH | 12NOV2026 | 120 | 1 | 1 | 83 | 85 |
| 2 | S015 | UNI | YES | LOW | 14NOV2026 | 180 | 6 | 7 | 35 | 30 |
APPEND NEW DATA INTO MAIN DATASET
proc append base=proctor_fraud
data=proctor_new_sessions
force;
run;
proc print data=proctor_fraud;
run;
OUTPUT:
| Obs | Session_ID | Exam_Type | Duration | Violations_Detected | Network_Issues | Candidate_Score | Exam_Date | Exam_Month | Days_From_Today | Integrity_Percentage | Integrity_Level | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | S001 | CERT | 120 | 0 | 1 | 85 | 01OCT2026 | OCT2026 | -226 | 95 | HIGH | MEDIUM | NO |
| 2 | S002 | CERT | 90 | 2 | 3 | 65 | 05OCT2026 | OCT2026 | -230 | 65 | MEDIUM | MEDIUM | NO |
| 3 | S003 | UNI | 180 | 5 | 6 | 40 | 10OCT2026 | OCT2026 | -235 | 20 | LOW | HIGH | YES |
| 4 | S004 | UNI | 150 | 1 | 2 | 78 | 15OCT2026 | OCT2026 | -240 | 80 | HIGH | HIGH | NO |
| 5 | S005 | RECRUIT | 60 | 0 | 0 | 92 | 20OCT2026 | OCT2026 | -245 | 100 | HIGH | LOW | NO |
| 6 | S006 | RECRUIT | 75 | 4 | 5 | 55 | 22OCT2026 | OCT2026 | -247 | 35 | LOW | LOW | NO |
| 7 | S007 | CERT | 120 | 3 | 2 | 68 | 25OCT2026 | OCT2026 | -250 | 60 | MEDIUM | MEDIUM | NO |
| 8 | S008 | UNI | 180 | 6 | 8 | 30 | 28OCT2026 | OCT2026 | -253 | 0 | LOW | HIGH | YES |
| 9 | S009 | CERT | 90 | 1 | 1 | 80 | 30OCT2026 | OCT2026 | -255 | 85 | HIGH | MEDIUM | NO |
| 10 | S010 | RECRUIT | 60 | 2 | 4 | 60 | 02NOV2026 | NOV2026 | -258 | 60 | MEDIUM | LOW | NO |
| 11 | S011 | UNI | 150 | 7 | 9 | 25 | 05NOV2026 | NOV2026 | -261 | 0 | LOW | HIGH | YES |
| 12 | S012 | CERT | 120 | 0 | 0 | 88 | 08NOV2026 | NOV2026 | -264 | 100 | HIGH | MEDIUM | NO |
| 13 | S013 | RECRUIT | 90 | 1 | 1 | 82 | 10NOV2026 | NOV2026 | -266 | 85 | HIGH | MEDIUM | NO |
| 14 | S014 | CERT | 120 | 1 | 1 | 83 | 12NOV2026 | . | . | 85 | HIGH | NO | |
| 15 | S015 | UNI | 180 | 6 | 7 | 35 | 14NOV2026 | . | . | 30 | LOW | YES |
Logic
·
New exam sessions arrive periodically
·
Existing dataset = historical data
·
New dataset = incremental data
·
PROC APPEND is used for fast row-wise addition
Technical
Logic
·
BASE= → target dataset
·
DATA= → incoming dataset
·
FORCE → allows append even if variable
order differs
PROC SORT
proc sort data=proctor_fraud;by Exam_Type;run;
proc print data=proctor_fraud;
run;
OUTPUT:
| Obs | Session_ID | Exam_Type | Duration | Violations_Detected | Network_Issues | Candidate_Score | Exam_Date | Exam_Month | Days_From_Today | Integrity_Percentage | Integrity_Level | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | S001 | CERT | 120 | 0 | 1 | 85 | 01OCT2026 | OCT2026 | -226 | 95 | HIGH | MEDIUM | NO |
| 2 | S002 | CERT | 90 | 2 | 3 | 65 | 05OCT2026 | OCT2026 | -230 | 65 | MEDIUM | MEDIUM | NO |
| 3 | S007 | CERT | 120 | 3 | 2 | 68 | 25OCT2026 | OCT2026 | -250 | 60 | MEDIUM | MEDIUM | NO |
| 4 | S009 | CERT | 90 | 1 | 1 | 80 | 30OCT2026 | OCT2026 | -255 | 85 | HIGH | MEDIUM | NO |
| 5 | S012 | CERT | 120 | 0 | 0 | 88 | 08NOV2026 | NOV2026 | -264 | 100 | HIGH | MEDIUM | NO |
| 6 | S014 | CERT | 120 | 1 | 1 | 83 | 12NOV2026 | . | . | 85 | HIGH | NO | |
| 7 | S005 | RECRUIT | 60 | 0 | 0 | 92 | 20OCT2026 | OCT2026 | -245 | 100 | HIGH | LOW | NO |
| 8 | S006 | RECRUIT | 75 | 4 | 5 | 55 | 22OCT2026 | OCT2026 | -247 | 35 | LOW | LOW | NO |
| 9 | S010 | RECRUIT | 60 | 2 | 4 | 60 | 02NOV2026 | NOV2026 | -258 | 60 | MEDIUM | LOW | NO |
| 10 | S013 | RECRUIT | 90 | 1 | 1 | 82 | 10NOV2026 | NOV2026 | -266 | 85 | HIGH | MEDIUM | NO |
| 11 | S003 | UNI | 180 | 5 | 6 | 40 | 10OCT2026 | OCT2026 | -235 | 20 | LOW | HIGH | YES |
| 12 | S004 | UNI | 150 | 1 | 2 | 78 | 15OCT2026 | OCT2026 | -240 | 80 | HIGH | HIGH | NO |
| 13 | S008 | UNI | 180 | 6 | 8 | 30 | 28OCT2026 | OCT2026 | -253 | 0 | LOW | HIGH | YES |
| 14 | S011 | UNI | 150 | 7 | 9 | 25 | 05NOV2026 | NOV2026 | -261 | 0 | LOW | HIGH | YES |
| 15 | S015 | UNI | 180 | 6 | 7 | 35 | 14NOV2026 | . | . | 30 | LOW | YES |
proc sort data=score_summary;by Exam_Type;run;
proc print data=score_summary;
run;
OUTPUT:
| Obs | Exam_Type | Avg_Score | Avg_Integrity |
|---|---|---|---|
| 1 | CERT | 77.20 | 81 |
| 2 | RECRUIT | 72.25 | 70 |
| 3 | UNI | 43.25 | 25 |
MERGE
data merged_data;
merge proctor_fraud
score_summary;
by Exam_Type;
run;
proc print data=merged_data;
run;
OUTPUT:
| Obs | Session_ID | Exam_Type | Duration | Violations_Detected | Network_Issues | Candidate_Score | Exam_Date | Exam_Month | Days_From_Today | Integrity_Percentage | Integrity_Level | Utilization_Class | Fraud_Flag | Avg_Score | Avg_Integrity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | S001 | CERT | 120 | 0 | 1 | 85 | 01OCT2026 | OCT2026 | -226 | 95 | HIGH | MEDIUM | NO | 77.20 | 81 |
| 2 | S002 | CERT | 90 | 2 | 3 | 65 | 05OCT2026 | OCT2026 | -230 | 65 | MEDIUM | MEDIUM | NO | 77.20 | 81 |
| 3 | S007 | CERT | 120 | 3 | 2 | 68 | 25OCT2026 | OCT2026 | -250 | 60 | MEDIUM | MEDIUM | NO | 77.20 | 81 |
| 4 | S009 | CERT | 90 | 1 | 1 | 80 | 30OCT2026 | OCT2026 | -255 | 85 | HIGH | MEDIUM | NO | 77.20 | 81 |
| 5 | S012 | CERT | 120 | 0 | 0 | 88 | 08NOV2026 | NOV2026 | -264 | 100 | HIGH | MEDIUM | NO | 77.20 | 81 |
| 6 | S014 | CERT | 120 | 1 | 1 | 83 | 12NOV2026 | . | . | 85 | HIGH | NO | 77.20 | 81 | |
| 7 | S005 | RECRUIT | 60 | 0 | 0 | 92 | 20OCT2026 | OCT2026 | -245 | 100 | HIGH | LOW | NO | 72.25 | 70 |
| 8 | S006 | RECRUIT | 75 | 4 | 5 | 55 | 22OCT2026 | OCT2026 | -247 | 35 | LOW | LOW | NO | 72.25 | 70 |
| 9 | S010 | RECRUIT | 60 | 2 | 4 | 60 | 02NOV2026 | NOV2026 | -258 | 60 | MEDIUM | LOW | NO | 72.25 | 70 |
| 10 | S013 | RECRUIT | 90 | 1 | 1 | 82 | 10NOV2026 | NOV2026 | -266 | 85 | HIGH | MEDIUM | NO | 72.25 | 70 |
| 11 | S003 | UNI | 180 | 5 | 6 | 40 | 10OCT2026 | OCT2026 | -235 | 20 | LOW | HIGH | YES | 43.25 | 25 |
| 12 | S004 | UNI | 150 | 1 | 2 | 78 | 15OCT2026 | OCT2026 | -240 | 80 | HIGH | HIGH | NO | 43.25 | 25 |
| 13 | S008 | UNI | 180 | 6 | 8 | 30 | 28OCT2026 | OCT2026 | -253 | 0 | LOW | HIGH | YES | 43.25 | 25 |
| 14 | S011 | UNI | 150 | 7 | 9 | 25 | 05NOV2026 | NOV2026 | -261 | 0 | LOW | HIGH | YES | 43.25 | 25 |
| 15 | S015 | UNI | 180 | 6 | 7 | 35 | 14NOV2026 | . | . | 30 | LOW | YES | 43.25 | 25 |
17. CHARACTER FUNCTIONS
data function_demo;
set proctor_fraud;
Exam_Type_Upper = upcase(Exam_Type);
Exam_Type_Lower = lowcase(Exam_Type);
Clean_Type = propcase(strip(Exam_Type));
Combined_ID = catx("-", Session_ID, Exam_Type);
run;
proc print data=function_demo;
var Exam_Type Exam_Type_Upper Exam_Type_Lower Clean_Type Session_ID Combined_ID;
run;
OUTPUT:
| Obs | Exam_Type | Exam_Type_Upper | Exam_Type_Lower | Clean_Type | Session_ID | Combined_ID |
|---|---|---|---|---|---|---|
| 1 | CERT | CERT | cert | Cert | S001 | S001-CERT |
| 2 | CERT | CERT | cert | Cert | S002 | S002-CERT |
| 3 | CERT | CERT | cert | Cert | S007 | S007-CERT |
| 4 | CERT | CERT | cert | Cert | S009 | S009-CERT |
| 5 | CERT | CERT | cert | Cert | S012 | S012-CERT |
| 6 | CERT | CERT | cert | Cert | S014 | S014-CERT |
| 7 | RECRUIT | RECRUIT | recruit | Recruit | S005 | S005-RECRUIT |
| 8 | RECRUIT | RECRUIT | recruit | Recruit | S006 | S006-RECRUIT |
| 9 | RECRUIT | RECRUIT | recruit | Recruit | S010 | S010-RECRUIT |
| 10 | RECRUIT | RECRUIT | recruit | Recruit | S013 | S013-RECRUIT |
| 11 | UNI | UNI | uni | Uni | S003 | S003-UNI |
| 12 | UNI | UNI | uni | Uni | S004 | S004-UNI |
| 13 | UNI | UNI | uni | Uni | S008 | S008-UNI |
| 14 | UNI | UNI | uni | Uni | S011 | S011-UNI |
| 15 | UNI | UNI | uni | Uni | S015 | S015-UNI |
18. PROC DATASETS DELETE
proc datasets library=work nolist;
delete function_demo;
quit;
LOG:
NOTE: Deleting WORK.FUNCTION_DEMO (memtype=DATA).
19. TLF OVERVIEW (WHY TLF IS IMPORTANT)
In real
projects, analysis alone is not enough.
Stakeholders want outputs:
- Tables → summarized numbers
- Listings → detailed
subject/session-level data
- Figures → visual understanding
TLF is
mandatory in:
- Analytics reporting
- Clinical & non-clinical
domains
- Client deliverables
- SAS programmer interviews
20. TABLE 1 – EXAM TYPE WISE INTEGRITY & SCORE SUMMARY
proc sql;
create table tlf_table1 as
select Exam_Type,
count(Session_ID) as Total_Sessions,
mean(Candidate_Score) as Avg_Score format=6.2,
mean(Integrity_Percentage) as Avg_Integrity format=6.2,
sum(Violations_Detected) as Total_Violations
from proctor_fraud
group by Exam_Type;
quit;
proc print data=tlf_table1;
run;
OUTPUT:
| Obs | Exam_Type | Total_Sessions | Avg_Score | Avg_Integrity | Total_Violations |
|---|---|---|---|---|---|
| 1 | CERT | 6 | 78.17 | 81.67 | 7 |
| 2 | RECRUIT | 4 | 72.25 | 70.00 | 7 |
| 3 | UNI | 5 | 41.60 | 26.00 | 25 |
Explanation
·
PROC SQL → clean summary table creation
·
COUNT() → number of sessions per exam
type
·
MEAN() → average performance &
integrity
·
FORMAT= → presentation-ready output
·
GROUP BY
Exam_Type →
business-level comparison
21. LISTING 1 – FRAUD SUSPECTED PROCTORING SESSIONS
proc print data=proctor_fraud noobs label;
where Fraud_Flag = "YES";
var Session_ID Exam_Type Exam_Date Duration
Violations_Detected Network_Issues
Candidate_Score Integrity_Level Fraud_Flag;
label
Session_ID = "Session ID"
Exam_Type = "Exam Type"
Exam_Date = "Exam Date"
Violations_Detected = "Violations"
Network_Issues = "Network Issues"
Candidate_Score = "Score"
Integrity_Level = "Integrity";
run;
OUTPUT:
| Session ID | Exam Type | Exam Date | Duration | Violations | Network Issues | Score | Integrity | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|
| S003 | UNI | 10OCT2026 | 180 | 5 | 6 | 40 | LOW | YES |
| S008 | UNI | 28OCT2026 | 180 | 6 | 8 | 30 | LOW | YES |
| S011 | UNI | 05NOV2026 | 150 | 7 | 9 | 25 | LOW | YES |
| S015 | UNI | 14NOV2026 | 180 | 6 | 7 | 35 | LOW | YES |
Explanation
·
PROC
PRINT →
listing-level output
·
WHERE
Fraud_Flag="YES" → filters suspicious cases
·
VAR → controls column order
·
LABEL → business-friendly naming
22. LISTING 2 – LOW INTEGRITY SESSIONS (QUALITY RISK)
proc print data=proctor_fraud;
where Integrity_Level = "LOW";
var Session_ID Exam_Type Candidate_Score Integrity_Percentage;
run;
OUTPUT:
| Obs | Session_ID | Exam_Type | Candidate_Score | Integrity_Percentage |
|---|---|---|---|---|
| 8 | S006 | RECRUIT | 55 | 35 |
| 11 | S003 | UNI | 40 | 20 |
| 13 | S008 | UNI | 30 | 0 |
| 14 | S011 | UNI | 25 | 0 |
| 15 | S015 | UNI | 35 | 30 |
Explanation
·
Identifies high-risk candidates
·
Used by:
·
Academic review boards
·
Compliance teams
·
Simple filter → powerful insight
23. FIGURE 1 – VIOLATIONS vs CANDIDATE SCORE
proc sgplot data=proctor_fraud;
scatter x=Violations_Detected y=Candidate_Score;
xaxis label="Violations Detected";
yaxis label="Candidate Score";
run;
OUTPUT:
Explanation
·
PROC
SGPLOT → modern
SAS graphics
· SCATTER →
relationship analysis
·
Shows negative correlation visually
24. FIGURE 2 – INTEGRITY PERCENTAGE DISTRIBUTION
proc sgplot data=proctor_fraud;
histogram Integrity_Percentage;
density Integrity_Percentage;
run;
OUTPUT:
Explanation
·
HISTOGRAM → distribution of integrity
·
DENSITY → smooth curve overlay
·
Helps identify:
·
Majority behavior
·
Outliers
·
Risk thresholds
25. TABLE 2 – FRAUD FLAG SUMMARY
proc freq data=proctor_fraud;
tables Fraud_Flag Integrity_Level / nocum nopercent;
run;
OUTPUT:
The FREQ Procedure
| Fraud_Flag | Frequency |
|---|---|
| NO | 11 |
| YES | 4 |
| Integrity_Level | Frequency |
|---|---|
| HIGH | 7 |
| LOW | 5 |
| MEDIUM | 3 |
Explanation
·
PROC FREQ → categorical summary
·
NOCUM NOPERCENT → clean counts
·
Useful for:
·
Management dashboards
·
Compliance reporting
26. TLF QUALITY CHECK
proc means data=proctor_fraud n nmiss;
var Candidate_Score Integrity_Percentage;
run;
OUTPUT:
The MEANS Procedure
| Variable | N | N Miss |
|---|---|---|
Candidate_Score Integrity_Percentage | 15 15 | 0 0 |
Explanation
·
Checks missing values
·
Confirms data completeness
· Shows production readiness
27. 5 KEY POINTS OF THIS PROJECT
1.
This project simulates a online exam proctoring system, not a dummy dataset,
making it highly interview-relevant.
2.
It demonstrates how data issues happen in real life by intentionally
creating an error and fixing it professionally using SAS.
3.
Fraud detection and integrity classification are built
using clear business rules, not
assumptions, which mirrors industry expectations.
4.
The project covers the complete analytics lifecycle – data creation, cleaning,
analysis, reporting.
5.
The inclusion of TLF
(Tables, Listings, Figures) makes this project production-ready and
suitable for client or management reporting.
28. FINAL CONCLUSION
This Online Exam Proctoring Analytics project
shows how SAS can be used to solve real-world problems related to exam
integrity and fraud detection. Instead of focusing only on analysis, the
project covers the full journey of data handling—from raw data creation and error
identification to correction, analysis, and final reporting using TLFs. The
intentional data error highlights a common industry challenge and demonstrates
how a SAS programmer should identify and fix such issues confidently. By
applying macros, date functions, statistical procedures, and visualization
techniques, the project transforms raw proctoring data into meaningful
insights. The TLF section ensures the results are easy to understand for
non-technical stakeholders such as management, academic boards, or compliance
teams. Overall, this project is simple to explain, strong in logic, and highly
effective for interviews, training, and real analytics work.
INTERVIEW QUESTIONS FOR YOU
·
How do you optimize a slow-running SAS program?
·
What steps do you take if output is wrong?
·
How do you handle large datasets?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
·
How do you optimize a slow-running SAS program?
·
What steps do you take if output is wrong?
·
How do you handle large datasets?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent EXAM data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics and Exams Reviewers and Observers
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment