PROC FREQ Says Fraud Is High — Is It Real or a Data Issue? How Do You Validate It?

1. Introduction

2. Business Context of Online Exam Proctoring

3. Project Objectives and Analytical Scope

4. Dataset Design and Variable Rationale

5. Raw Data Creation with Intentional Data Issue

6. Identification and Explanation of the Data Error

7. Data Correction and Standardization

8. Derived Variables and Date Handling Logic (MDY, INTCK, INTNX)

9. Integrity and Utilization Classification Using SAS Macros

10. Fraud Detection Rules and Flagging Logic (Macros)

11. Descriptive Analytics Using PROC SQL

12. Statistical Summary Using PROC MEANS and PROC UNIVARIATE

13. Categorical Analysis Using PROC FREQ

14. Relationship Analysis Using PROC CORR

15. Data Visualization Using PROC SGPLOT

16. Advanced Data Handling Using SET, MERGE, APPEND, and TRANSPOSE

17. Character and Numeric Function Applications in SAS

18. Dataset Maintenance and Cleanup Using PROC DATASETS

19. TLF Overview – Reporting Strategy

20. Table Outputs – Summary and Performance Tables

21. Listing Outputs – Fraud and Low Integrity Session Listings

22. Figure Outputs – Score, Violation, and Integrity Visualizations

23. 5 Key Points Of This Project

24. Final Conclusion

1. INTRODUCTION

Online examinations have become the backbone of modern education, recruitment, and certification processes.
With this shift, online exam proctoring systems play a critical role in maintaining exam integrity, candidate authenticity, and fair evaluation.

This SAS project simulates real-world proctoring session data and applies advanced analytics, fraud detection rules, and quality control logic using Base SAS + PROC SQL + Macros.

2. BUSINESS CONTEXT

Organizations using online exams face challenges such as:

· Cheating attempts

· Network manipulation

· Multiple violations during exams

· Low integrity candidates passing exams

· Inconsistent proctoring enforcement

This project helps:

· EdTech companies

· Certification bodies

· Universities

· Recruitment platforms

to monitor, classify, and detect suspicious exam behavior using SAS.

3. PROJECT OBJECTIVES

1.Create a realistic proctoring dataset

2. Apply fraud detection logic

3. Classify integrity & utilization

4. Analyze relationships between violations and scores

5. Demonstrate interview-level SAS skills

4.DATASET DESIGN

Core Variables

Variable	Description
Session_ID	Unique proctoring session
Exam_Type	Certification / University / Recruitment
Exam_Date	Exam conducted date
Duration	Exam duration (minutes)
Violations_Detected	Count of suspicious actions
Network_Issues	Network interruptions
Candidate_Score	Final score
Integrity_Level	Derived category
Integrity_Percentage	Calculated metric
Fraud_Flag	Yes/No
Utilization_Class	Low / Medium / High

5. RAW DATA CREATION(INTENTIONAL ERROR INCLUDED)

data proctor_raw;

length Session_ID $6 Exam_Type $15 Exam_Date $10;

input Session_ID $ Exam_Type $ Exam_Date $ Duration Violations_Detected

Network_Issues Candidate_Score;

datalines;

S001 CERT 01-10-2026 120 0 1 85

S002 CERT 05-10-2026 90 2 3 65

S003 UNI 10-10-2026 180 5 6 40

S004 UNI 15-10-2026 150 1 2 78

S005 RECRUIT 20-10-2026 60 0 0 92

S006 RECRUIT 22-10-2026 75 4 5 55

S007 CERT 25-10-2026 120 3 2 68

S008 UNI 28-10-2026 180 6 8 30

S009 CERT 30-10-2026 90 1 1 80

S010 RECRUIT 02-11-2026 60 2 4 60

S011 UNI 05-11-2026 150 7 9 25

S012 CERT 08-11-2026 120 0 0 88

S013 RECRUIT 10-11-2026 90 1 1 82

;

run;

proc print data=proctor_raw;

run;

OUTPUT:


Obs	Session_ID	Exam_Type	Exam_Date	Duration	Violations_Detected	Network_Issues	Candidate_Score
1	S001	CERT	01-10-2026	120	0	1	85
2	S002	CERT	05-10-2026	90	2	3	65
3	S003	UNI	10-10-2026	180	5	6	40
4	S004	UNI	15-10-2026	150	1	2	78
5	S005	RECRUIT	20-10-2026	60	0	0	92
6	S006	RECRUIT	22-10-2026	75	4	5	55
7	S007	CERT	25-10-2026	120	3	2	68
8	S008	UNI	28-10-2026	180	6	8	30
9	S009	CERT	30-10-2026	90	1	1	80
10	S010	RECRUIT	02-11-2026	60	2	4	60
11	S011	UNI	05-11-2026	150	7	9	25
12	S012	CERT	08-11-2026	120	0	0	88
13	S013	RECRUIT	10-11-2026	90	1	1	82

6. IDENTIFYING THE ERROR

ERROR EXPLANATION

· Exam_Date is character, not numeric SAS date

· Functions like INTCK, INTNX, MDY will fail

· Date calculations will produce errors or missing values

7. CORRECTING THE ERROR (PROPER DATE HANDLING)

data proctor_clean;

set proctor_raw;

Exam_Date_Num = input(Exam_Date, ddmmyy10.);

format Exam_Date_Num date9.;

drop Exam_Date;

rename Exam_Date_Num = Exam_Date;

run;

proc print data=proctor_clean;

run;

OUTPUT:


Obs	Session_ID	Exam_Type	Duration	Violations_Detected	Network_Issues	Candidate_Score	Exam_Date
1	S001	CERT	120	0	1	85	01OCT2026
2	S002	CERT	90	2	3	65	05OCT2026
3	S003	UNI	180	5	6	40	10OCT2026
4	S004	UNI	150	1	2	78	15OCT2026
5	S005	RECRUIT	60	0	0	92	20OCT2026
6	S006	RECRUIT	75	4	5	55	22OCT2026
7	S007	CERT	120	3	2	68	25OCT2026
8	S008	UNI	180	6	8	30	28OCT2026
9	S009	CERT	90	1	1	80	30OCT2026
10	S010	RECRUIT	60	2	4	60	02NOV2026
11	S011	UNI	150	7	9	25	05NOV2026
12	S012	CERT	120	0	0	88	08NOV2026
13	S013	RECRUIT	90	1	1	82	10NOV2026

Why this works

· Converts character → numeric date

· Enables date arithmetic

· Best practice in clinical & analytics domains

8. DERIVED VARIABLES & DATE LOGIC

data proctor_dates;

set proctor_clean;

Exam_Month = intnx('month', Exam_Date, 0, 'b');

Days_From_Today = intck('day', Exam_Date, today());

format Exam_Month monyy7.;

run;

proc print data=proctor_dates;

run;

OUTPUT:


Obs	Session_ID	Exam_Type	Duration	Violations_Detected	Network_Issues	Candidate_Score	Exam_Date	Exam_Month	Days_From_Today
1	S001	CERT	120	0	1	85	01OCT2026	OCT2026	-226
2	S002	CERT	90	2	3	65	05OCT2026	OCT2026	-230
3	S003	UNI	180	5	6	40	10OCT2026	OCT2026	-235
4	S004	UNI	150	1	2	78	15OCT2026	OCT2026	-240
5	S005	RECRUIT	60	0	0	92	20OCT2026	OCT2026	-245
6	S006	RECRUIT	75	4	5	55	22OCT2026	OCT2026	-247
7	S007	CERT	120	3	2	68	25OCT2026	OCT2026	-250
8	S008	UNI	180	6	8	30	28OCT2026	OCT2026	-253
9	S009	CERT	90	1	1	80	30OCT2026	OCT2026	-255
10	S010	RECRUIT	60	2	4	60	02NOV2026	NOV2026	-258
11	S011	UNI	150	7	9	25	05NOV2026	NOV2026	-261
12	S012	CERT	120	0	0	88	08NOV2026	NOV2026	-264
13	S013	RECRUIT	90	1	1	82	10NOV2026	NOV2026	-266

9. INTEGRITY & UTILIZATION CLASSIFICATION (MACRO)

%macro classify;

data proctor_class;

set proctor_dates;

Integrity_Percentage = max(0, 100 - (Violations_Detected*10) - (Network_Issues*5));

length Integrity_Level $8.;

if Integrity_Percentage >= 80 then Integrity_Level = "HIGH";

else if Integrity_Percentage >= 50 then Integrity_Level = "MEDIUM";

else Integrity_Level = "LOW";

length Utilization_Class $8.;

if Duration >= 150 then Utilization_Class = "HIGH";

else if Duration >= 90 then Utilization_Class = "MEDIUM";

else Utilization_Class = "LOW";

run;

proc print data=proctor_class;

run;

%mend;

%classify;

OUTPUT:


Obs	Session_ID	Exam_Type	Duration	Violations_Detected	Network_Issues	Candidate_Score	Exam_Date	Exam_Month	Days_From_Today	Integrity_Percentage	Integrity_Level	Utilization_Class
1	S001	CERT	120	0	1	85	01OCT2026	OCT2026	-226	95	HIGH	MEDIUM
2	S002	CERT	90	2	3	65	05OCT2026	OCT2026	-230	65	MEDIUM	MEDIUM
3	S003	UNI	180	5	6	40	10OCT2026	OCT2026	-235	20	LOW	HIGH
4	S004	UNI	150	1	2	78	15OCT2026	OCT2026	-240	80	HIGH	HIGH
5	S005	RECRUIT	60	0	0	92	20OCT2026	OCT2026	-245	100	HIGH	LOW
6	S006	RECRUIT	75	4	5	55	22OCT2026	OCT2026	-247	35	LOW	LOW
7	S007	CERT	120	3	2	68	25OCT2026	OCT2026	-250	60	MEDIUM	MEDIUM
8	S008	UNI	180	6	8	30	28OCT2026	OCT2026	-253	0	LOW	HIGH
9	S009	CERT	90	1	1	80	30OCT2026	OCT2026	-255	85	HIGH	MEDIUM
10	S010	RECRUIT	60	2	4	60	02NOV2026	NOV2026	-258	60	MEDIUM	LOW
11	S011	UNI	150	7	9	25	05NOV2026	NOV2026	-261	0	LOW	HIGH
12	S012	CERT	120	0	0	88	08NOV2026	NOV2026	-264	100	HIGH	MEDIUM
13	S013	RECRUIT	90	1	1	82	10NOV2026	NOV2026	-266	85	HIGH	MEDIUM

10. FRAUD DETECTION LOGIC (MACRO)

%macro fraud;

data proctor_fraud;

set proctor_class;

if Violations_Detected >=5 or Network_Issues >=6 then Fraud_Flag = "YES";

else Fraud_Flag = "NO";

run;

proc print data=proctor_fraud;

run;

%mend;

%fraud;

OUTPUT:


Obs	Session_ID	Exam_Type	Duration	Violations_Detected	Network_Issues	Candidate_Score	Exam_Date	Exam_Month	Days_From_Today	Integrity_Percentage	Integrity_Level	Utilization_Class	Fraud_Flag
1	S001	CERT	120	0	1	85	01OCT2026	OCT2026	-226	95	HIGH	MEDIUM	NO
2	S002	CERT	90	2	3	65	05OCT2026	OCT2026	-230	65	MEDIUM	MEDIUM	NO
3	S003	UNI	180	5	6	40	10OCT2026	OCT2026	-235	20	LOW	HIGH	YES
4	S004	UNI	150	1	2	78	15OCT2026	OCT2026	-240	80	HIGH	HIGH	NO
5	S005	RECRUIT	60	0	0	92	20OCT2026	OCT2026	-245	100	HIGH	LOW	NO
6	S006	RECRUIT	75	4	5	55	22OCT2026	OCT2026	-247	35	LOW	LOW	NO
7	S007	CERT	120	3	2	68	25OCT2026	OCT2026	-250	60	MEDIUM	MEDIUM	NO
8	S008	UNI	180	6	8	30	28OCT2026	OCT2026	-253	0	LOW	HIGH	YES
9	S009	CERT	90	1	1	80	30OCT2026	OCT2026	-255	85	HIGH	MEDIUM	NO
10	S010	RECRUIT	60	2	4	60	02NOV2026	NOV2026	-258	60	MEDIUM	LOW	NO
11	S011	UNI	150	7	9	25	05NOV2026	NOV2026	-261	0	LOW	HIGH	YES
12	S012	CERT	120	0	0	88	08NOV2026	NOV2026	-264	100	HIGH	MEDIUM	NO
13	S013	RECRUIT	90	1	1	82	10NOV2026	NOV2026	-266	85	HIGH	MEDIUM	NO

11. PROC SQL ANALYSIS

proc sql;

create table score_summary as

select Exam_Type,

avg(Candidate_Score) as Avg_Score,

avg(Integrity_Percentage) as Avg_Integrity

from proctor_fraud

group by Exam_Type;

quit;

proc print data=score_summary;

run;

OUTPUT:


Obs	Exam_Type	Avg_Score	Avg_Integrity
1	CERT	77.20	81
2	RECRUIT	72.25	70
3	UNI	43.25	25

12. PROC MEANS & PROC UNIVARIATE

proc means data=proctor_fraud mean min max;

var Candidate_Score Violations_Detected Network_Issues;

run;

OUTPUT:

The MEANS Procedure


Variable	Mean	Minimum	Maximum
Candidate_Score Violations_Detected Network_Issues	65.2307692 2.4615385 3.2307692	25.0000000 0 0	92.0000000 7.0000000 9.0000000

proc univariate data=proctor_fraud;

var Integrity_Percentage;

run;

OUTPUT:

The UNIVARIATE Procedure

Variable: Integrity_Percentage


Moments
N	13	Sum Weights	13
Mean	60.3846154	Sum Observations	785
Std Deviation	35.9665086	Variance	1293.58974
Skewness	-0.6758585	Kurtosis	-0.886003
Uncorrected SS	62925	Corrected SS	15523.0769
Coeff Variation	59.562371	Std Error Mean	9.9753147


Basic Statistical Measures
Location		Variability
Mean	60.38462	Std Deviation	35.96651
Median	65.00000	Variance	1294
Mode	0.00000	Range	100.00000
		Interquartile Range	50.00000

Note: The mode displayed is the smallest of 4 modes with a count of 2.


Tests for Location: Mu0=0
Test	Statistic		p Value
Student's t	t	6.053405	Pr > \|t\|	<.0001
Sign	M	5.5	Pr >= \|M\|	0.0010
Signed Rank	S	33	Pr >= \|S\|	0.0010


Quantiles (Definition 5)
Level	Quantile
100% Max	100
99%	100
95%	100
90%	100
75% Q3	85
50% Median	65
25% Q1	35
10%	0
5%	0
1%	0
0% Min	0


Extreme Observations
Lowest		Highest
Value	Obs	Value	Obs
0	11	85	9
0	8	85	13
20	3	95	1
35	6	100	5
60	10	100	12

13. PROC FREQ

proc freq data=proctor_fraud;

tables Integrity_Level Fraud_Flag / nocum;

run;

OUTPUT:

The FREQ Procedure


Integrity_Level	Frequency	Percent
HIGH	6	46.15
LOW	4	30.77
MEDIUM	3	23.08


Fraud_Flag	Frequency	Percent
NO	10	76.92
YES	3	23.08

14. CORRELATION ANALYSIS

proc corr data=proctor_fraud;

var Violations_Detected Network_Issues Candidate_Score Integrity_Percentage;

run;

OUTPUT:

The CORR Procedure


4 Variables:	Violations_Detected Network_Issues Candidate_Score Integrity_Percentage


Simple Statistics
Variable	N	Mean	Std Dev	Sum	Minimum	Maximum
Violations_Detected	13	2.46154	2.36697	32.00000	0	7.00000
Network_Issues	13	3.23077	2.97640	42.00000	0	9.00000
Candidate_Score	13	65.23077	22.21919	848.00000	25.00000	92.00000
Integrity_Percentage	13	60.38462	35.96651	785.00000	0	100.00000


Pearson Correlation Coefficients, N = 13 Prob > \|r\| under H0: Rho=0
	Violations_Detected	Network_Issues	Candidate_Score	Integrity_Percentage
Violations_Detected	1.00000	0.96540 <.0001	-0.97984 <.0001	-0.99092 <.0001
Network_Issues	0.96540 <.0001	1.00000	-0.99004 <.0001	-0.98174 <.0001
Candidate_Score	-0.97984 <.0001	-0.99004 <.0001	1.00000	0.99156 <.0001
Integrity_Percentage	-0.99092 <.0001	-0.98174 <.0001	0.99156 <.0001	1.00000

15. VISUALIZATION – PROC SGPLOT

proc sgplot data=proctor_fraud;

scatter x=Violations_Detected y=Candidate_Score;

run;

OUTPUT:

16. TRANSPOSE, MERGE, APPEND

TRANSPOSE

proc transpose data=score_summary out=score_t;

id Exam_Type;

var Avg_Score;

run;

proc print data=score_t;

run;

OUTPUT:


Obs	_NAME_	CERT	RECRUIT	UNI
1	Avg_Score	77.2	72.25	43.25

APPEND(INTENTIONAL ERROR INCLUDED)

proc append base=proctor_fraud

data=proctor_fraud force;

run;

proc print data=proctor_fraud;

run;

OUTPUT:


Obs	Session_ID	Exam_Type	Duration	Violations_Detected	Network_Issues	Candidate_Score	Exam_Date	Exam_Month	Days_From_Today	Integrity_Percentage	Integrity_Level	Utilization_Class	Fraud_Flag
1	S001	CERT	120	0	1	85	01OCT2026	OCT2026	-226	95	HIGH	MEDIUM	NO
2	S002	CERT	90	2	3	65	05OCT2026	OCT2026	-230	65	MEDIUM	MEDIUM	NO
3	S003	UNI	180	5	6	40	10OCT2026	OCT2026	-235	20	LOW	HIGH	YES
4	S004	UNI	150	1	2	78	15OCT2026	OCT2026	-240	80	HIGH	HIGH	NO
5	S005	RECRUIT	60	0	0	92	20OCT2026	OCT2026	-245	100	HIGH	LOW	NO
6	S006	RECRUIT	75	4	5	55	22OCT2026	OCT2026	-247	35	LOW	LOW	NO
7	S007	CERT	120	3	2	68	25OCT2026	OCT2026	-250	60	MEDIUM	MEDIUM	NO
8	S008	UNI	180	6	8	30	28OCT2026	OCT2026	-253	0	LOW	HIGH	YES
9	S009	CERT	90	1	1	80	30OCT2026	OCT2026	-255	85	HIGH	MEDIUM	NO
10	S010	RECRUIT	60	2	4	60	02NOV2026	NOV2026	-258	60	MEDIUM	LOW	NO
11	S011	UNI	150	7	9	25	05NOV2026	NOV2026	-261	0	LOW	HIGH	YES
12	S012	CERT	120	0	0	88	08NOV2026	NOV2026	-264	100	HIGH	MEDIUM	NO
13	S013	RECRUIT	90	1	1	82	10NOV2026	NOV2026	-266	85	HIGH	MEDIUM	NO
14	S001	CERT	120	0	1	85	01OCT2026	OCT2026	-226	95	HIGH	MEDIUM	NO
15	S002	CERT	90	2	3	65	05OCT2026	OCT2026	-230	65	MEDIUM	MEDIUM	NO
16	S003	UNI	180	5	6	40	10OCT2026	OCT2026	-235	20	LOW	HIGH	YES
17	S004	UNI	150	1	2	78	15OCT2026	OCT2026	-240	80	HIGH	HIGH	NO
18	S005	RECRUIT	60	0	0	92	20OCT2026	OCT2026	-245	100	HIGH	LOW	NO
19	S006	RECRUIT	75	4	5	55	22OCT2026	OCT2026	-247	35	LOW	LOW	NO
20	S007	CERT	120	3	2	68	25OCT2026	OCT2026	-250	60	MEDIUM	MEDIUM	NO
21	S008	UNI	180	6	8	30	28OCT2026	OCT2026	-253	0	LOW	HIGH	YES
22	S009	CERT	90	1	1	80	30OCT2026	OCT2026	-255	85	HIGH	MEDIUM	NO
23	S010	RECRUIT	60	2	4	60	02NOV2026	NOV2026	-258	60	MEDIUM	LOW	NO
24	S011	UNI	150	7	9	25	05NOV2026	NOV2026	-261	0	LOW	HIGH	YES
25	S012	CERT	120	0	0	88	08NOV2026	NOV2026	-264	100	HIGH	MEDIUM	NO
26	S013	RECRUIT	90	1	1	82	10NOV2026	NOV2026	-266	85	HIGH	MEDIUM	NO

Why this is wrong

· You are appending the same dataset into itself

· No new records are added

· Interviewers will treat this as a logic mistake

· In real projects, this is never done

CORRECT VERSION OF APPEND

CREATE A NEW DATASET

data proctor_new_sessions;

length Session_ID $6 Exam_Type $10 Fraud_Flag $3 Integrity_Level $6;

input Session_ID $ Exam_Type $ Exam_Date : date9. Duration Violations_Detected

Network_Issues Candidate_Score Integrity_Percentage Integrity_Level $

Fraud_Flag $;

format Exam_Date date9.;

datalines;

S014 CERT 12NOV2026 120 1 1 83 85 HIGH NO

S015 UNI 14NOV2026 180 6 7 35 30 LOW YES

;

run;

proc print data=proctor_new_sessions;

run;

OUTPUT:


Obs	Session_ID	Exam_Type	Fraud_Flag	Integrity_Level	Exam_Date	Duration	Violations_Detected	Network_Issues	Candidate_Score	Integrity_Percentage
1	S014	CERT	NO	HIGH	12NOV2026	120	1	1	83	85
2	S015	UNI	YES	LOW	14NOV2026	180	6	7	35	30

APPEND NEW DATA INTO MAIN DATASET

proc append base=proctor_fraud

data=proctor_new_sessions

force;

run;

proc print data=proctor_fraud;

run;

OUTPUT:


Obs	Session_ID	Exam_Type	Duration	Violations_Detected	Network_Issues	Candidate_Score	Exam_Date	Exam_Month	Days_From_Today	Integrity_Percentage	Integrity_Level	Utilization_Class	Fraud_Flag
1	S001	CERT	120	0	1	85	01OCT2026	OCT2026	-226	95	HIGH	MEDIUM	NO
2	S002	CERT	90	2	3	65	05OCT2026	OCT2026	-230	65	MEDIUM	MEDIUM	NO
3	S003	UNI	180	5	6	40	10OCT2026	OCT2026	-235	20	LOW	HIGH	YES
4	S004	UNI	150	1	2	78	15OCT2026	OCT2026	-240	80	HIGH	HIGH	NO
5	S005	RECRUIT	60	0	0	92	20OCT2026	OCT2026	-245	100	HIGH	LOW	NO
6	S006	RECRUIT	75	4	5	55	22OCT2026	OCT2026	-247	35	LOW	LOW	NO
7	S007	CERT	120	3	2	68	25OCT2026	OCT2026	-250	60	MEDIUM	MEDIUM	NO
8	S008	UNI	180	6	8	30	28OCT2026	OCT2026	-253	0	LOW	HIGH	YES
9	S009	CERT	90	1	1	80	30OCT2026	OCT2026	-255	85	HIGH	MEDIUM	NO
10	S010	RECRUIT	60	2	4	60	02NOV2026	NOV2026	-258	60	MEDIUM	LOW	NO
11	S011	UNI	150	7	9	25	05NOV2026	NOV2026	-261	0	LOW	HIGH	YES
12	S012	CERT	120	0	0	88	08NOV2026	NOV2026	-264	100	HIGH	MEDIUM	NO
13	S013	RECRUIT	90	1	1	82	10NOV2026	NOV2026	-266	85	HIGH	MEDIUM	NO
14	S014	CERT	120	1	1	83	12NOV2026	.	.	85	HIGH		NO
15	S015	UNI	180	6	7	35	14NOV2026	.	.	30	LOW		YES

Logic

· New exam sessions arrive periodically

· Existing dataset = historical data

· New dataset = incremental data

· PROC APPEND is used for fast row-wise addition

Technical Logic

· BASE= → target dataset

· DATA= → incoming dataset

· FORCE → allows append even if variable order differs

PROC SORT

proc sort data=proctor_fraud;by Exam_Type;run;

proc print data=proctor_fraud;

run;

OUTPUT:


Obs	Session_ID	Exam_Type	Duration	Violations_Detected	Network_Issues	Candidate_Score	Exam_Date	Exam_Month	Days_From_Today	Integrity_Percentage	Integrity_Level	Utilization_Class	Fraud_Flag
1	S001	CERT	120	0	1	85	01OCT2026	OCT2026	-226	95	HIGH	MEDIUM	NO
2	S002	CERT	90	2	3	65	05OCT2026	OCT2026	-230	65	MEDIUM	MEDIUM	NO
3	S007	CERT	120	3	2	68	25OCT2026	OCT2026	-250	60	MEDIUM	MEDIUM	NO
4	S009	CERT	90	1	1	80	30OCT2026	OCT2026	-255	85	HIGH	MEDIUM	NO
5	S012	CERT	120	0	0	88	08NOV2026	NOV2026	-264	100	HIGH	MEDIUM	NO
6	S014	CERT	120	1	1	83	12NOV2026	.	.	85	HIGH		NO
7	S005	RECRUIT	60	0	0	92	20OCT2026	OCT2026	-245	100	HIGH	LOW	NO
8	S006	RECRUIT	75	4	5	55	22OCT2026	OCT2026	-247	35	LOW	LOW	NO
9	S010	RECRUIT	60	2	4	60	02NOV2026	NOV2026	-258	60	MEDIUM	LOW	NO
10	S013	RECRUIT	90	1	1	82	10NOV2026	NOV2026	-266	85	HIGH	MEDIUM	NO
11	S003	UNI	180	5	6	40	10OCT2026	OCT2026	-235	20	LOW	HIGH	YES
12	S004	UNI	150	1	2	78	15OCT2026	OCT2026	-240	80	HIGH	HIGH	NO
13	S008	UNI	180	6	8	30	28OCT2026	OCT2026	-253	0	LOW	HIGH	YES
14	S011	UNI	150	7	9	25	05NOV2026	NOV2026	-261	0	LOW	HIGH	YES
15	S015	UNI	180	6	7	35	14NOV2026	.	.	30	LOW		YES

proc sort data=score_summary;by Exam_Type;run;

proc print data=score_summary;

run;

OUTPUT:


Obs	Exam_Type	Avg_Score	Avg_Integrity
1	CERT	77.20	81
2	RECRUIT	72.25	70
3	UNI	43.25	25

MERGE

data merged_data;

merge proctor_fraud

score_summary;

by Exam_Type;

run;

proc print data=merged_data;

run;

OUTPUT:


Obs	Session_ID	Exam_Type	Duration	Violations_Detected	Network_Issues	Candidate_Score	Exam_Date	Exam_Month	Days_From_Today	Integrity_Percentage	Integrity_Level	Utilization_Class	Fraud_Flag	Avg_Score	Avg_Integrity
1	S001	CERT	120	0	1	85	01OCT2026	OCT2026	-226	95	HIGH	MEDIUM	NO	77.20	81
2	S002	CERT	90	2	3	65	05OCT2026	OCT2026	-230	65	MEDIUM	MEDIUM	NO	77.20	81
3	S007	CERT	120	3	2	68	25OCT2026	OCT2026	-250	60	MEDIUM	MEDIUM	NO	77.20	81
4	S009	CERT	90	1	1	80	30OCT2026	OCT2026	-255	85	HIGH	MEDIUM	NO	77.20	81
5	S012	CERT	120	0	0	88	08NOV2026	NOV2026	-264	100	HIGH	MEDIUM	NO	77.20	81
6	S014	CERT	120	1	1	83	12NOV2026	.	.	85	HIGH		NO	77.20	81
7	S005	RECRUIT	60	0	0	92	20OCT2026	OCT2026	-245	100	HIGH	LOW	NO	72.25	70
8	S006	RECRUIT	75	4	5	55	22OCT2026	OCT2026	-247	35	LOW	LOW	NO	72.25	70
9	S010	RECRUIT	60	2	4	60	02NOV2026	NOV2026	-258	60	MEDIUM	LOW	NO	72.25	70
10	S013	RECRUIT	90	1	1	82	10NOV2026	NOV2026	-266	85	HIGH	MEDIUM	NO	72.25	70
11	S003	UNI	180	5	6	40	10OCT2026	OCT2026	-235	20	LOW	HIGH	YES	43.25	25
12	S004	UNI	150	1	2	78	15OCT2026	OCT2026	-240	80	HIGH	HIGH	NO	43.25	25
13	S008	UNI	180	6	8	30	28OCT2026	OCT2026	-253	0	LOW	HIGH	YES	43.25	25
14	S011	UNI	150	7	9	25	05NOV2026	NOV2026	-261	0	LOW	HIGH	YES	43.25	25
15	S015	UNI	180	6	7	35	14NOV2026	.	.	30	LOW		YES	43.25	25

17. CHARACTER FUNCTIONS

data function_demo;

set proctor_fraud;

Exam_Type_Upper = upcase(Exam_Type);

Exam_Type_Lower = lowcase(Exam_Type);

Clean_Type = propcase(strip(Exam_Type));

Combined_ID = catx("-", Session_ID, Exam_Type);

run;

proc print data=function_demo;

var Exam_Type Exam_Type_Upper Exam_Type_Lower Clean_Type Session_ID Combined_ID;

run;

OUTPUT:


Obs	Exam_Type	Exam_Type_Upper	Exam_Type_Lower	Clean_Type	Session_ID	Combined_ID
1	CERT	CERT	cert	Cert	S001	S001-CERT
2	CERT	CERT	cert	Cert	S002	S002-CERT
3	CERT	CERT	cert	Cert	S007	S007-CERT
4	CERT	CERT	cert	Cert	S009	S009-CERT
5	CERT	CERT	cert	Cert	S012	S012-CERT
6	CERT	CERT	cert	Cert	S014	S014-CERT
7	RECRUIT	RECRUIT	recruit	Recruit	S005	S005-RECRUIT
8	RECRUIT	RECRUIT	recruit	Recruit	S006	S006-RECRUIT
9	RECRUIT	RECRUIT	recruit	Recruit	S010	S010-RECRUIT
10	RECRUIT	RECRUIT	recruit	Recruit	S013	S013-RECRUIT
11	UNI	UNI	uni	Uni	S003	S003-UNI
12	UNI	UNI	uni	Uni	S004	S004-UNI
13	UNI	UNI	uni	Uni	S008	S008-UNI
14	UNI	UNI	uni	Uni	S011	S011-UNI
15	UNI	UNI	uni	Uni	S015	S015-UNI

18. PROC DATASETS DELETE

proc datasets library=work nolist;

delete function_demo;

quit;

LOG:

NOTE: Deleting WORK.FUNCTION_DEMO (memtype=DATA).

19. TLF OVERVIEW (WHY TLF IS IMPORTANT)

In real projects, analysis alone is not enough.
Stakeholders want outputs:

Tables → summarized numbers
Listings → detailed subject/session-level data
Figures → visual understanding

TLF is mandatory in:

Analytics reporting
Clinical & non-clinical domains
Client deliverables
SAS programmer interviews

20. TABLE 1 – EXAM TYPE WISE INTEGRITY & SCORE SUMMARY

proc sql;

create table tlf_table1 as

select Exam_Type,

count(Session_ID) as Total_Sessions,

mean(Candidate_Score) as Avg_Score format=6.2,

mean(Integrity_Percentage) as Avg_Integrity format=6.2,

sum(Violations_Detected) as Total_Violations

from proctor_fraud

group by Exam_Type;

quit;

proc print data=tlf_table1;

run;

OUTPUT:


Obs	Exam_Type	Total_Sessions	Avg_Score	Avg_Integrity	Total_Violations
1	CERT	6	78.17	81.67	7
2	RECRUIT	4	72.25	70.00	7
3	UNI	5	41.60	26.00	25

Explanation

· PROC SQL → clean summary table creation

· COUNT() → number of sessions per exam type

· MEAN() → average performance & integrity

· FORMAT= → presentation-ready output

· GROUP BY Exam_Type → business-level comparison

21. LISTING 1 – FRAUD SUSPECTED PROCTORING SESSIONS

proc print data=proctor_fraud noobs label;

where Fraud_Flag = "YES";

var Session_ID Exam_Type Exam_Date Duration

Violations_Detected Network_Issues

Candidate_Score Integrity_Level Fraud_Flag;

label

Session_ID = "Session ID"

Exam_Type = "Exam Type"

Exam_Date = "Exam Date"

Violations_Detected = "Violations"

Network_Issues = "Network Issues"

Candidate_Score = "Score"

Integrity_Level = "Integrity";

run;

OUTPUT:


Session ID	Exam Type	Exam Date	Duration	Violations	Network Issues	Score	Integrity	Fraud_Flag
S003	UNI	10OCT2026	180	5	6	40	LOW	YES
S008	UNI	28OCT2026	180	6	8	30	LOW	YES
S011	UNI	05NOV2026	150	7	9	25	LOW	YES
S015	UNI	14NOV2026	180	6	7	35	LOW	YES

Explanation

· PROC PRINT → listing-level output

· WHERE Fraud_Flag="YES" → filters suspicious cases

· VAR → controls column order

· LABEL → business-friendly naming

22. LISTING 2 – LOW INTEGRITY SESSIONS (QUALITY RISK)

proc print data=proctor_fraud;

where Integrity_Level = "LOW";

var Session_ID Exam_Type Candidate_Score Integrity_Percentage;

run;

OUTPUT:


Obs	Session_ID	Exam_Type	Candidate_Score	Integrity_Percentage
8	S006	RECRUIT	55	35
11	S003	UNI	40	20
13	S008	UNI	30	0
14	S011	UNI	25	0
15	S015	UNI	35	30

Explanation

· Identifies high-risk candidates

· Used by:

· Academic review boards

· Compliance teams

· Simple filter → powerful insight

23. FIGURE 1 – VIOLATIONS vs CANDIDATE SCORE

proc sgplot data=proctor_fraud;

scatter x=Violations_Detected y=Candidate_Score;

xaxis label="Violations Detected";

yaxis label="Candidate Score";

run;

OUTPUT:

Explanation

· PROC SGPLOT → modern SAS graphics

· SCATTER → relationship analysis

· Shows negative correlation visually

24. FIGURE 2 – INTEGRITY PERCENTAGE DISTRIBUTION

proc sgplot data=proctor_fraud;

histogram Integrity_Percentage;

density Integrity_Percentage;

run;

OUTPUT:

Explanation

· HISTOGRAM → distribution of integrity

· DENSITY → smooth curve overlay

· Helps identify:

· Majority behavior

· Outliers

· Risk thresholds

25. TABLE 2 – FRAUD FLAG SUMMARY

proc freq data=proctor_fraud;

tables Fraud_Flag Integrity_Level / nocum nopercent;

run;

OUTPUT:

The FREQ Procedure


Fraud_Flag	Frequency
NO	11
YES	4


Integrity_Level	Frequency
HIGH	7
LOW	5
MEDIUM	3

Explanation

· PROC FREQ → categorical summary

· NOCUM NOPERCENT → clean counts

· Useful for:

· Management dashboards

· Compliance reporting

26. TLF QUALITY CHECK

proc means data=proctor_fraud n nmiss;

var Candidate_Score Integrity_Percentage;

run;

OUTPUT:

The MEANS Procedure


Variable	N	N Miss
Candidate_Score Integrity_Percentage	15 15	0 0

Explanation

· Checks missing values

· Confirms data completeness

· Shows production readiness

27. 5 KEY POINTS OF THIS PROJECT

1. This project simulates a online exam proctoring system, not a dummy dataset, making it highly interview-relevant.

2. It demonstrates how data issues happen in real life by intentionally creating an error and fixing it professionally using SAS.

3. Fraud detection and integrity classification are built using clear business rules, not assumptions, which mirrors industry expectations.

4. The project covers the complete analytics lifecycle – data creation, cleaning, analysis, reporting.

5. The inclusion of TLF (Tables, Listings, Figures) makes this project production-ready and suitable for client or management reporting.

28. FINAL CONCLUSION

This Online Exam Proctoring Analytics project shows how SAS can be used to solve real-world problems related to exam integrity and fraud detection. Instead of focusing only on analysis, the project covers the full journey of data handling—from raw data creation and error identification to correction, analysis, and final reporting using TLFs. The intentional data error highlights a common industry challenge and demonstrates how a SAS programmer should identify and fix such issues confidently. By applying macros, date functions, statistical procedures, and visualization techniques, the project transforms raw proctoring data into meaningful insights. The TLF section ensures the results are easy to understand for non-technical stakeholders such as management, academic boards, or compliance teams. Overall, this project is simple to explain, strong in logic, and highly effective for interviews, training, and real analytics work.

INTERVIEW QUESTIONS FOR YOU

· How do you optimize a slow-running SAS program?
· What steps do you take if output is wrong?

· How do you handle large datasets?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.

Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent EXAM data.

Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.

This project is suitable for:
·  Students learning SAS
·  Data analysts building portfolios
·  Professionals preparing for SAS interviews
·  Bloggers writing about analytics and Exams Reviewers and Observers

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On :

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.Can we analyze planets using SAS like NASA scientists?
2.Can SAS Compare How 3 Different Families Live in Their Homes?
3.Can SAS Help Us Choose the Most Efficient Cooler?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact | Privacy Policy

Search This Blog

SAS Learning Hub

403.PROC FREQ Says Fraud Is High — Is It Real or a Data Issue? How Do You Validate It?

PROC FREQ Says Fraud Is High — Is It Real or a Data Issue? How Do You Validate It?

TABLE OF CONTENTS

1. INTRODUCTION

2. BUSINESS CONTEXT

3. PROJECT OBJECTIVES

4.DATASET DESIGN

Core Variables

27. 5 KEY POINTS OF THIS PROJECT

28. FINAL CONCLUSION

INTERVIEW QUESTIONS FOR YOU

Follow Us On :

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

About Us | Contact | Privacy Policy

Comments

Post a Comment

Popular posts from this blog

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study

383.Which Pharma Distributors Are High-Risk? Can SAS Detect Delays, Temperature Violations, and Fraud?