719.Can We Identify Suspicious Exam Behavior and Correct Data Issues Using Advanced SAS Techniques?
Smart Fraud Detection and Data Error Correction in Online Exams with SAS
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP | LENGTH | INPUT | DATALINES | FORMAT | SET | IF-THEN- KEEP | DROP | MERGE | PROC SORT | PROC PRINT | PROC APPEND | PROC TRANSPOSE | PROC DATASETS | MACRO
Introduction
In the modern digital era, online assessments
have become an essential component of education, recruitment, and professional
certification systems. Universities, corporate training platforms, government
examination boards, and online learning providers rely heavily on digital
examinations to evaluate knowledge, skills, and competencies. These systems
allow thousands or even millions of candidates to take exams remotely, making
assessment processes faster, scalable, and cost-effective.
However, the rapid adoption of online
assessments has also introduced several challenges. Since the entire testing
process occurs digitally, large volumes of data are generated every day. This
data includes information such as assessment identifiers, subjects, average
scores, completion times, difficulty levels, reliability indices, and flags
indicating possible cheating behavior. Managing and analyzing this data
efficiently requires robust data processing tools and strong data quality
management practices.
One of the biggest concerns in online
assessments is data accuracy and fraud
detection. Candidates may attempt to cheat by using unauthorized
resources, collaborating with others, or manipulating the system to obtain
higher scores. Additionally, technical issues or data entry problems can
introduce errors such as missing values, incorrect scores, invalid completion
times, or inconsistent text formats. If these errors remain undetected, they
can lead to incorrect conclusions, unfair evaluations, and unreliable
reporting.
To address these challenges, organizations
rely on advanced data analytics tools such as SAS (Statistical Analysis System). SAS is widely used in
data management, statistical analysis, fraud detection, and large-scale data
processing across industries such as healthcare, banking, pharmaceuticals,
education, and government agencies.
Using SAS programming techniques, analysts can
perform several important tasks, including:
·
Creating structured datasets from raw
data
·
Detecting and correcting data quality
issues
·
Standardizing inconsistent text and
numeric values
·
Performing date calculations and
time-based analysis
·
Combining data from multiple sources
·
Automating fraud detection logic using
macros
·
Transforming data structures for
reporting and analysis
This project demonstrates a practical SAS-based workflow for managing online
assessment data. A simulated dataset is created containing several variables
such as assessment identifiers, subject names, average scores, completion
times, cheating indicators, difficulty levels, reliability indices,
percentages, fees, and examination dates. To replicate real-world scenarios,
multiple intentional errors are introduced into the dataset, including invalid
scores, missing values, incorrect date formats, and inconsistent subject names.
The project then walks through the process of
identifying these errors and correcting them using various SAS programming
techniques. Functions for handling character strings and numeric values are
applied to clean the dataset, while date functions such as MDY, INTCK, and INTNX are used to
perform time-based calculations. Dataset manipulation techniques such as SET, MERGE, APPEND, and PROC TRANSPOSE
are used to combine and restructure the data.
Furthermore, fraud detection logic is implemented
using SAS macros. This allows the system to automatically identify suspicious
assessment patterns, such as unusually high scores combined with extremely
short completion times. Such automated detection systems are commonly used in
modern examination platforms to ensure fairness and maintain the integrity of
the evaluation process.
By the end of this project, readers will gain
a clear understanding of how SAS can be used to perform data cleaning, fraud detection, dataset
management, and automated analytics in online assessment systems.The
project also demonstrates how real-world data issues can be systematically
identified and resolved using structured programming approaches.
Table Of Contents
1.
Creating Dataset With Intentional Errors
2.
Errors Present In The Dataset
3.
Corrected Dataset Creation
4.
Utilization Classification
5.
Date Calculations Using INTCK and INTNX
6.
Fraud Detection Macro
7.
Combining Datasets Using MERGE
8.
Appending New Data Using PROC APPEND
9.
Data Restructuring Using PROC TRANSPOSE
10. Character
Functions Implementation
11. KEEP
and DROP Statements
12. Deleting
Temporary Dataset Using PROC DATASETS
13. Creating
Second Dataset (Assessment Monitoring)
14. SET
Statement For Data Concatenation
15. MERGE
Statement For Data Integration
16. Data
Standardization Using Character Functions
17. Numeric
Functions Implementation
18. Date
Handling Using MDY, INTCK, and INTNX
19. Fraud
Detection Macro Automation
20. Risk
Classification
21. PROC
TRANSPOSE For Data Transformation
22. PROC
DATASETS DELETE For Workspace Cleanup
23. Final
Analytical Dataset
24. Best
Practices For SAS Data Management
Business Context
Consider an online examination platform used by universities and
certification agencies. Thousands of students take assessments every day across
different subjects such as Mathematics, Programming, Statistics, and Business
Analytics.
Each assessment generates a record containing
several attributes such as:
·
Assessment ID
·
Subject name
·
Average score
·
Completion time
·
Difficulty level
·
Reliability index
·
Cheating flag
·
Fees paid
·
Assessment date
Online learning platforms and digital examinations have changed how organizations
conduct assessments. Traditional pen-and-paper exams are gradually being
replaced by online systems that allow candidates to take tests remotely using
computers or mobile devices. Universities, certification bodies, recruitment
agencies, and corporate training programs widely use these platforms to
evaluate candidates efficiently.
Online assessments generate large amounts of
data such as scores, completion time, subject difficulty, reliability index,
and cheating indicators. This data helps organizations analyze exam quality and
detect suspicious behavior. However, raw data often contains issues like
missing values, incorrect formats, or inconsistent entries.
To
ensure accurate analysis, tools like SAS
are used to clean data, standardize variables, merge datasets, and implement
fraud detection logic. In this project, a simulated online assessment dataset
with intentional errors is used to demonstrate how SAS techniques can identify
and correct data issues while preparing the dataset for reliable analysis.
The analytics team must analyze this dataset to answer several questions:
·
Which assessments have suspicious
completion times?
·
Are there abnormal score patterns
indicating cheating?
·
Are the data values valid and correctly
formatted?
·
What percentage of students completed
the exam successfully?
Dataset Structure
The
dataset will contain the following variables.
|
Variable |
Description |
|
Assessment_ID |
Unique
ID for each assessment |
|
Subject |
Subject
name |
|
Avg_Score |
Average
score obtained |
|
Completion_Time |
Time
taken to complete exam |
|
Cheating_Flag |
Indicates
suspicious behavior |
|
Difficulty_Level |
Easy
Medium Hard |
|
Reliability_Index |
Reliability
measure of assessment |
|
Percentage |
Score
percentage |
|
Fees |
Assessment
fee |
|
Utilization_Class |
Performance
classification |
|
Exam_Date |
Assessment
date |
1. Creating Dataset With Intentional Errors
data online_assessment_raw;
length Assessment_ID $10 Subject $20 Difficulty_Level $10 Cheating_Flag $5;
input Assessment_ID $ Subject $ Avg_Score Completion_Time Cheating_Flag $
Difficulty_Level $ Reliability_Index Percentage Fees Exam_Date:ddmmyy10.;
format Exam_Date date9.;
datalines;
A001 math 85 35 no easy 0.95 85 500 01-01-2025
A002 science 78 40 no medium 0.90 78 600 02-01-2025
A003 statistics 92 20 yes hard 0.88 92 700 03-01-2025
A004 programming 66 50 no medium 0.75 66 800 04-01-2025
A005 mathematics 110 10 yes easy 0.96 110 500 05-01-2025
A006 analytics . 45 no medium 0.91 70 650 06-01-2025
A007 data_science 89 -10 no hard 0.87 89 900 07-01-2025
A008 sql 77 30 no medium 0.93 77 750 08-01-2025
A009 python 84 25 no hard 0.92 84 800 09-01-2025
A010 machine_learning 91 15 yes hard 0.85 91 950 10-01-2025
A011 ai 88 28 no medium 0.90 88 900 11-01-2025
A012 deep_learning 95 22 yes hard 0.86 95 1000 12-01-2025
A013 statistics 82 35 no medium 0.91 82 700 13-01-2025
A014 business 74 40 no easy 0.89 74 600 14-01-2025
A015 finance 69 48 no medium 0.88 69 650 15-01-2025
;
run;
proc print data=online_assessment_raw;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | math | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 |
| 2 | A002 | science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 |
| 3 | A003 | statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 |
| 4 | A004 | programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 |
| 5 | A005 | mathematics | easy | yes | 110 | 10 | 0.96 | 110 | 500 | 05JAN2025 |
| 6 | A006 | analytics | medium | no | . | 45 | 0.91 | 70 | 650 | 06JAN2025 |
| 7 | A007 | data_science | hard | no | 89 | -10 | 0.87 | 89 | 900 | 07JAN2025 |
| 8 | A008 | sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 |
| 9 | A009 | python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 |
| 10 | A010 | machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 |
| 11 | A011 | ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 |
| 12 | A012 | deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 |
| 13 | A013 | statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 |
| 14 | A014 | business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 |
| 15 | A015 | finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 |
Errors Present In The
Dataset
Intentional
errors exist in this dataset.
Error 1 – Incorrect Date Format
The
dataset uses 01-01-2025 while SAS expects dates using DATE9 or numeric SAS date values.
This will
produce incorrect date interpretation.
Error 2 – Impossible
Score
A005 mathematics 110
Score
cannot exceed 100.
Error 3 – Missing Value
Avg_Score .
Missing
score must be handled properly.
Error 4 – Negative
Completion Time
Completion_Time -10
This is
logically impossible.
Error 5 – Inconsistent
Subject Names
Examples:
math
mathematics
statistics
data_science
These
should be standardized.
2. Corrected Dataset Creation
data online_assessment_clean;
set online_assessment_raw;
Subject = propcase(Subject);
if Subject="Math" then Subject="Mathematics";
Completion_Time = abs(Completion_Time);
if Avg_Score > 100 then Avg_Score = 100;
Avg_Score = coalesce(Avg_Score,70);
run;
proc print data=online_assessment_clean;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 |
DATA Step
data online_assessment_clean;
Creates a
new cleaned dataset.
SET Statement
set online_assessment_raw;
Reads
observations from raw dataset.
PROPCASE Function
Subject = propcase(Subject);
Converts
text to proper case.
Example:
math → Math
statistics → Statistics
ABS Function
Completion_Time =
abs(Completion_Time);
Converts negative
numbers into positive numbers.
COALESCE Function
Avg_Score = coalesce(Avg_Score,70);
Replaces
missing score with default value.
Corrected
the dataset containing several intentional errors.
The corrections included:
·
Converting subject names to proper case
·
Handling missing scores
·
Fixing negative completion time
·
Correcting invalid scores above 100
·
Converting date values properly
3. Utilization Classification
data assessment_utilization;
set online_assessment_clean;
length Utilization_Class $20;
if Percentage >=90 then Utilization_Class="Excellent";
else if Percentage >=75 then Utilization_Class="Good";
else if Percentage >=60 then Utilization_Class="Average";
else Utilization_Class="Poor";
run;
proc print data=assessment_utilization;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date | Utilization_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average |
4. Date Calculations Using INTCK,INTNX
data assessment_duration;
set assessment_utilization;
Months_Since_Exam = intck('month',Exam_Date,today());
Next_Assessment = intnx('month',Exam_Date,6,'same');
run;
proc print data=assessment_duration;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date | Utilization_Class | Months_Since_Exam | Next_Assessment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good | 14 | 23923 |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good | 14 | 23924 |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent | 14 | 23925 |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average | 14 | 23926 |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent | 14 | 23927 |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average | 14 | 23928 |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good | 14 | 23929 |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good | 14 | 23930 |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good | 14 | 23931 |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent | 14 | 23932 |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good | 14 | 23933 |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent | 14 | 23934 |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good | 14 | 23935 |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average | 14 | 23936 |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average | 14 | 23937 |
INTCK
calculates the number of months between exam date and current date.
Predicts
next exam after 6 months.
5. Fraud Detection Macro
%macro detect_fraud;
data fraud_cases;
set assessment_duration;
length Fraud_Flag $12;
if Completion_Time < 15 and Avg_Score > 90 then Fraud_Flag="High Risk";
else if Cheating_Flag="yes" then Fraud_Flag="Medium Risk";
else Fraud_Flag="Low Risk";
run;
proc print data=fraud_cases;
run;
%mend;
%detect_fraud;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date | Utilization_Class | Months_Since_Exam | Next_Assessment | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good | 14 | 23923 | Low Risk |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good | 14 | 23924 | Low Risk |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent | 14 | 23925 | Medium Risk |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average | 14 | 23926 | Low Risk |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent | 14 | 23927 | High Risk |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average | 14 | 23928 | Low Risk |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good | 14 | 23929 | Low Risk |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good | 14 | 23930 | Low Risk |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good | 14 | 23931 | Low Risk |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent | 14 | 23932 | Medium Risk |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good | 14 | 23933 | Low Risk |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent | 14 | 23934 | Medium Risk |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good | 14 | 23935 | Low Risk |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average | 14 | 23936 | Low Risk |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average | 14 | 23937 | Low Risk |
Fraud is detected using rules:
Rule 1
Very high score + very short completion time.
Rule 2
Cheating flag already marked.
Rule 3
Otherwise considered normal.
6. Combining Datasets Using MERGE
proc sort data=fraud_cases;by Assessment_ID;run;
proc print data=fraud_cases;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date | Utilization_Class | Months_Since_Exam | Next_Assessment | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good | 14 | 23923 | Low Risk |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good | 14 | 23924 | Low Risk |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent | 14 | 23925 | Medium Risk |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average | 14 | 23926 | Low Risk |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent | 14 | 23927 | High Risk |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average | 14 | 23928 | Low Risk |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good | 14 | 23929 | Low Risk |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good | 14 | 23930 | Low Risk |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good | 14 | 23931 | Low Risk |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent | 14 | 23932 | Medium Risk |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good | 14 | 23933 | Low Risk |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent | 14 | 23934 | Medium Risk |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good | 14 | 23935 | Low Risk |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average | 14 | 23936 | Low Risk |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average | 14 | 23937 | Low Risk |
proc sort data=assessment_duration;by Assessment_ID;run;
proc print data=assessment_duration;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date | Utilization_Class | Months_Since_Exam | Next_Assessment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good | 14 | 23923 |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good | 14 | 23924 |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent | 14 | 23925 |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average | 14 | 23926 |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent | 14 | 23927 |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average | 14 | 23928 |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good | 14 | 23929 |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good | 14 | 23930 |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good | 14 | 23931 |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent | 14 | 23932 |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good | 14 | 23933 |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent | 14 | 23934 |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good | 14 | 23935 |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average | 14 | 23936 |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average | 14 | 23937 |
data combined_dataset;
merge fraud_cases
assessment_duration;
by Assessment_ID;
run;
proc print data=combined_dataset;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date | Utilization_Class | Months_Since_Exam | Next_Assessment | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good | 14 | 23923 | Low Risk |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good | 14 | 23924 | Low Risk |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent | 14 | 23925 | Medium Risk |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average | 14 | 23926 | Low Risk |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent | 14 | 23927 | High Risk |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average | 14 | 23928 | Low Risk |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good | 14 | 23929 | Low Risk |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good | 14 | 23930 | Low Risk |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good | 14 | 23931 | Low Risk |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent | 14 | 23932 | Medium Risk |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good | 14 | 23933 | Low Risk |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent | 14 | 23934 | Medium Risk |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good | 14 | 23935 | Low Risk |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average | 14 | 23936 | Low Risk |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average | 14 | 23937 | Low Risk |
MERGE combines datasets based on Assessment_ID.
7. Appending New Data
proc append base=combined_dataset
data=assessment_duration;
run;
proc print data=combined_dataset;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date | Utilization_Class | Months_Since_Exam | Next_Assessment | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good | 14 | 23923 | Low Risk |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good | 14 | 23924 | Low Risk |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent | 14 | 23925 | Medium Risk |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average | 14 | 23926 | Low Risk |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent | 14 | 23927 | High Risk |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average | 14 | 23928 | Low Risk |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good | 14 | 23929 | Low Risk |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good | 14 | 23930 | Low Risk |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good | 14 | 23931 | Low Risk |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent | 14 | 23932 | Medium Risk |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good | 14 | 23933 | Low Risk |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent | 14 | 23934 | Medium Risk |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good | 14 | 23935 | Low Risk |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average | 14 | 23936 | Low Risk |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average | 14 | 23937 | Low Risk |
| 16 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good | 14 | 23923 | |
| 17 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good | 14 | 23924 | |
| 18 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent | 14 | 23925 | |
| 19 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average | 14 | 23926 | |
| 20 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent | 14 | 23927 | |
| 21 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average | 14 | 23928 | |
| 22 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good | 14 | 23929 | |
| 23 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good | 14 | 23930 | |
| 24 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good | 14 | 23931 | |
| 25 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent | 14 | 23932 | |
| 26 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good | 14 | 23933 | |
| 27 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent | 14 | 23934 | |
| 28 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good | 14 | 23935 | |
| 29 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average | 14 | 23936 | |
| 30 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average | 14 | 23937 |
Adds additional records to dataset.
8. Using PROC TRANSPOSE
proc transpose data=combined_dataset out=transpose_scores;
var Avg_Score Percentage;
run;
proc print data=transpose_scores;
run;
OUTPUT:
| Obs | _NAME_ | COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 | COL8 | COL9 | COL10 | COL11 | COL12 | COL13 | COL14 | COL15 | COL16 | COL17 | COL18 | COL19 | COL20 | COL21 | COL22 | COL23 | COL24 | COL25 | COL26 | COL27 | COL28 | COL29 | COL30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Avg_Score | 85 | 78 | 92 | 66 | 100 | 70 | 89 | 77 | 84 | 91 | 88 | 95 | 82 | 74 | 69 | 85 | 78 | 92 | 66 | 100 | 70 | 89 | 77 | 84 | 91 | 88 | 95 | 82 | 74 | 69 |
| 2 | Percentage | 85 | 78 | 92 | 66 | 110 | 70 | 89 | 77 | 84 | 91 | 88 | 95 | 82 | 74 | 69 | 85 | 78 | 92 | 66 | 110 | 70 | 89 | 77 | 84 | 91 | 88 | 95 | 82 | 74 | 69 |
Restructures rows into columns.
9. Character Functions
data assessment_char;
set assessment_utilization;
Subject_Clean = strip(trim(Subject));
Subject_Code = catx("-",Subject,Assessment_ID);
run;
proc print data=assessment_char;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date | Utilization_Class | Subject_Clean | Subject_Code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good | Mathematics | Mathematics-A001 |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good | Science | Science-A002 |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent | Statistics | Statistics-A003 |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average | Programming | Programming-A004 |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent | Mathematics | Mathematics-A005 |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average | Analytics | Analytics-A006 |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good | Data_science | Data_science-A007 |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good | Sql | Sql-A008 |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good | Python | Python-A009 |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent | Machine_learning | Machine_learning-A010 |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good | Ai | Ai-A011 |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent | Deep_learning | Deep_learning-A012 |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good | Statistics | Statistics-A013 |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average | Business | Business-A014 |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average | Finance | Finance-A015 |
Functions
used:
|
Function |
Purpose |
|
STRIP |
Remove
leading/trailing spaces |
|
TRIM |
Remove
trailing spaces |
|
CAT |
Concatenate
strings |
|
CATX |
Concatenate
with delimiter |
|
UPCASE |
Convert
to uppercase |
|
LOWCASE |
Convert
to lowercase |
10. Keep And Drop
data assessment_K;
set assessment_char;
keep Assessment_ID Subject Percentage Fees Exam_Date Utilization_Class
Subject_Code;
run;
proc print data=assessment_K;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Percentage | Fees | Exam_Date | Utilization_Class | Subject_Code |
|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | 85 | 500 | 01JAN2025 | Good | Mathematics-A001 |
| 2 | A002 | Science | 78 | 600 | 02JAN2025 | Good | Science-A002 |
| 3 | A003 | Statistics | 92 | 700 | 03JAN2025 | Excellent | Statistics-A003 |
| 4 | A004 | Programming | 66 | 800 | 04JAN2025 | Average | Programming-A004 |
| 5 | A005 | Mathematics | 110 | 500 | 05JAN2025 | Excellent | Mathematics-A005 |
| 6 | A006 | Analytics | 70 | 650 | 06JAN2025 | Average | Analytics-A006 |
| 7 | A007 | Data_science | 89 | 900 | 07JAN2025 | Good | Data_science-A007 |
| 8 | A008 | Sql | 77 | 750 | 08JAN2025 | Good | Sql-A008 |
| 9 | A009 | Python | 84 | 800 | 09JAN2025 | Good | Python-A009 |
| 10 | A010 | Machine_learning | 91 | 950 | 10JAN2025 | Excellent | Machine_learning-A010 |
| 11 | A011 | Ai | 88 | 900 | 11JAN2025 | Good | Ai-A011 |
| 12 | A012 | Deep_learning | 95 | 1000 | 12JAN2025 | Excellent | Deep_learning-A012 |
| 13 | A013 | Statistics | 82 | 700 | 13JAN2025 | Good | Statistics-A013 |
| 14 | A014 | Business | 74 | 600 | 14JAN2025 | Average | Business-A014 |
| 15 | A015 | Finance | 69 | 650 | 15JAN2025 | Average | Finance-A015 |
data assessment_D;
set assessment_char;
Drop Subject Difficulty_Level Cheating_Flag Avg_Score Completion_Time
Reliability_Index Percentage Fees ;
run;
proc print data=assessment_D;
run;
OUTPUT:
| Obs | Assessment_ID | Exam_Date | Utilization_Class | Subject_Clean | Subject_Code |
|---|---|---|---|---|---|
| 1 | A001 | 01JAN2025 | Good | Mathematics | Mathematics-A001 |
| 2 | A002 | 02JAN2025 | Good | Science | Science-A002 |
| 3 | A003 | 03JAN2025 | Excellent | Statistics | Statistics-A003 |
| 4 | A004 | 04JAN2025 | Average | Programming | Programming-A004 |
| 5 | A005 | 05JAN2025 | Excellent | Mathematics | Mathematics-A005 |
| 6 | A006 | 06JAN2025 | Average | Analytics | Analytics-A006 |
| 7 | A007 | 07JAN2025 | Good | Data_science | Data_science-A007 |
| 8 | A008 | 08JAN2025 | Good | Sql | Sql-A008 |
| 9 | A009 | 09JAN2025 | Good | Python | Python-A009 |
| 10 | A010 | 10JAN2025 | Excellent | Machine_learning | Machine_learning-A010 |
| 11 | A011 | 11JAN2025 | Good | Ai | Ai-A011 |
| 12 | A012 | 12JAN2025 | Excellent | Deep_learning | Deep_learning-A012 |
| 13 | A013 | 13JAN2025 | Good | Statistics | Statistics-A013 |
| 14 | A014 | 14JAN2025 | Average | Business | Business-A014 |
| 15 | A015 | 15JAN2025 | Average | Finance | Finance-A015 |
11. Deleting Temporary Dataset
proc datasets library=work;
delete assessment_D transpose_scores;
quit;
LOG:
12. Creating Second Dataset
data assessment_monitoring;
length Assessment_ID $10 Invigilator $20 Location $20;
input Assessment_ID $ Invigilator $ Location $;
datalines;
A001 Rahul Hyderabad
A002 Anita Delhi
A003 Kiran Bangalore
A004 Mohan Chennai
A005 Ritu Mumbai
A006 Arjun Pune
A007 Suresh Hyderabad
A008 Kavya Delhi
A009 Meena Bangalore
A010 Rajesh Chennai
A011 Sunil Hyderabad
A012 Priya Mumbai
A013 Pooja Delhi
A014 Ajay Bangalore
A015 Deepak Chennai
;
run;
proc print data=assessment_monitoring;
run;
OUTPUT:
| Obs | Assessment_ID | Invigilator | Location |
|---|---|---|---|
| 1 | A001 | Rahul | Hyderabad |
| 2 | A002 | Anita | Delhi |
| 3 | A003 | Kiran | Bangalore |
| 4 | A004 | Mohan | Chennai |
| 5 | A005 | Ritu | Mumbai |
| 6 | A006 | Arjun | Pune |
| 7 | A007 | Suresh | Hyderabad |
| 8 | A008 | Kavya | Delhi |
| 9 | A009 | Meena | Bangalore |
| 10 | A010 | Rajesh | Chennai |
| 11 | A011 | Sunil | Hyderabad |
| 12 | A012 | Priya | Mumbai |
| 13 | A013 | Pooja | Delhi |
| 14 | A014 | Ajay | Bangalore |
| 15 | A015 | Deepak | Chennai |
Explanation Of Monitoring Dataset
This dataset represents invigilation or monitoring information
collected during the assessment.
Variables included:
|
Variable |
Meaning |
|
Assessment_ID |
Unique assessment identifier |
|
Invigilator |
Supervisor responsible |
|
Location |
Examination location |
Such data is useful for identifying location-based fraud patterns.
13. SET Statement For Data Concatenation
data assessment_combined;
set online_assessment_clean
assessment_utilization;
run;
proc print data=assessment_combined;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date | Utilization_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | |
| 16 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good |
| 17 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good |
| 18 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent |
| 19 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average |
| 20 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent |
| 21 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average |
| 22 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good |
| 23 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good |
| 24 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good |
| 25 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent |
| 26 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good |
| 27 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent |
| 28 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good |
| 29 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average |
| 30 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average |
Explanation Of SET Statement
The SET
statement reads observations sequentially from datasets and combines
them.
Key characteristics:
·
Observations from first dataset appear first
·
Then observations from next dataset
·
Variables are aligned automatically
Example
workflow:
Dataset1
A001
A002
Dataset2
A003
A004
After
SET:
A001
A002
A003
A004
14. MERGE Statement For Data Integration
proc sort data=assessment_utilization;
by Assessment_ID;
run;
proc print data=assessment_utilization;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date | Utilization_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average |
proc sort data=assessment_monitoring;
by Assessment_ID;
run;
proc print data=assessment_monitoring;
run;
OUTPUT:
| Obs | Assessment_ID | Invigilator | Location |
|---|---|---|---|
| 1 | A001 | Rahul | Hyderabad |
| 2 | A002 | Anita | Delhi |
| 3 | A003 | Kiran | Bangalore |
| 4 | A004 | Mohan | Chennai |
| 5 | A005 | Ritu | Mumbai |
| 6 | A006 | Arjun | Pune |
| 7 | A007 | Suresh | Hyderabad |
| 8 | A008 | Kavya | Delhi |
| 9 | A009 | Meena | Bangalore |
| 10 | A010 | Rajesh | Chennai |
| 11 | A011 | Sunil | Hyderabad |
| 12 | A012 | Priya | Mumbai |
| 13 | A013 | Pooja | Delhi |
| 14 | A014 | Ajay | Bangalore |
| 15 | A015 | Deepak | Chennai |
SAS requires datasets to be sorted before merging. Sorting
ensures correct matching of observations.
data assessment_merged;
merge assessment_utilization
assessment_monitoring;
by Assessment_ID;
run;
proc print data=assessment_merged;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date | Utilization_Class | Invigilator | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good | Rahul | Hyderabad |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good | Anita | Delhi |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent | Kiran | Bangalore |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average | Mohan | Chennai |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent | Ritu | Mumbai |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average | Arjun | Pune |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good | Suresh | Hyderabad |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good | Kavya | Delhi |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good | Meena | Bangalore |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent | Rajesh | Chennai |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good | Sunil | Hyderabad |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent | Priya | Mumbai |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good | Pooja | Delhi |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average | Ajay | Bangalore |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average | Deepak | Chennai |
Explanation Of MERGE
MERGE combines datasets horizontally.
Meaning:
Variables from multiple datasets are combined into a single observation.
Example:
Dataset1
ID | Score |
A001 | 85 |
Dataset2
ID | Location |
A001 | Hyderabad |
After MERGE
ID | Score | Location |
A001 | 85 | Hyderabad |
15. Data Standardization Using Character Functions
data assessment_text_clean;
set assessment_merged;
Subject = strip(trim(Subject));
Subject = propcase(Subject);
Subject_Upper = upcase(Subject);
Subject_Lower = lowcase(Subject);
Subject_Code = catx("_",Subject,Assessment_ID);
run;
proc print data=assessment_text_clean;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date | Utilization_Class | Invigilator | Location | Subject_Upper | Subject_Lower | Subject_Code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good | Rahul | Hyderabad | MATHEMATICS | mathematics | Mathematics_A001 |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good | Anita | Delhi | SCIENCE | science | Science_A002 |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent | Kiran | Bangalore | STATISTICS | statistics | Statistics_A003 |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average | Mohan | Chennai | PROGRAMMING | programming | Programming_A004 |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent | Ritu | Mumbai | MATHEMATICS | mathematics | Mathematics_A005 |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average | Arjun | Pune | ANALYTICS | analytics | Analytics_A006 |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good | Suresh | Hyderabad | DATA_SCIENCE | data_science | Data_science_A007 |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good | Kavya | Delhi | SQL | sql | Sql_A008 |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good | Meena | Bangalore | PYTHON | python | Python_A009 |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent | Rajesh | Chennai | MACHINE_LEARNING | machine_learning | Machine_learning_A010 |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good | Sunil | Hyderabad | AI | ai | Ai_A011 |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent | Priya | Mumbai | DEEP_LEARNING | deep_learning | Deep_learning_A012 |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good | Pooja | Delhi | STATISTICS | statistics | Statistics_A013 |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average | Ajay | Bangalore | BUSINESS | business | Business_A014 |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average | Deepak | Chennai | FINANCE | finance | Finance_A015 |
Explanation Of Character Functions
STRIP
Removes leading and trailing spaces.
Example:
" Mathematics "Becomes
"Mathematics"TRIM
Removes trailing spaces.
PROPCASE
Converts text into proper case.
Example
statistics → StatisticsUPCASE
Converts all characters to uppercase.
Statistics → STATISTICSLOWCASE
Converts text to lowercase.
STATISTICS → statisticsCATX
Concatenates multiple strings using a delimiter.
Example:
Subject_Code = catx("_",Subject,Assessment_ID);Output:
Mathematics_A00116. Numeric Functions
data assessment_numeric;
set assessment_text_clean;
Rounded_Score = round(Avg_Score,1);
Score_Difference = abs(Avg_Score - Percentage);
Adjusted_Fees = round(Fees * 1.05);
run;
proc print data=assessment_numeric;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date | Utilization_Class | Invigilator | Location | Subject_Upper | Subject_Lower | Subject_Code | Rounded_Score | Score_Difference | Adjusted_Fees |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good | Rahul | Hyderabad | MATHEMATICS | mathematics | Mathematics_A001 | 85 | 0 | 525 |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good | Anita | Delhi | SCIENCE | science | Science_A002 | 78 | 0 | 630 |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent | Kiran | Bangalore | STATISTICS | statistics | Statistics_A003 | 92 | 0 | 735 |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average | Mohan | Chennai | PROGRAMMING | programming | Programming_A004 | 66 | 0 | 840 |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent | Ritu | Mumbai | MATHEMATICS | mathematics | Mathematics_A005 | 100 | 10 | 525 |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average | Arjun | Pune | ANALYTICS | analytics | Analytics_A006 | 70 | 0 | 683 |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good | Suresh | Hyderabad | DATA_SCIENCE | data_science | Data_science_A007 | 89 | 0 | 945 |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good | Kavya | Delhi | SQL | sql | Sql_A008 | 77 | 0 | 788 |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good | Meena | Bangalore | PYTHON | python | Python_A009 | 84 | 0 | 840 |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent | Rajesh | Chennai | MACHINE_LEARNING | machine_learning | Machine_learning_A010 | 91 | 0 | 998 |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good | Sunil | Hyderabad | AI | ai | Ai_A011 | 88 | 0 | 945 |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent | Priya | Mumbai | DEEP_LEARNING | deep_learning | Deep_learning_A012 | 95 | 0 | 1050 |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good | Pooja | Delhi | STATISTICS | statistics | Statistics_A013 | 82 | 0 | 735 |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average | Ajay | Bangalore | BUSINESS | business | Business_A014 | 74 | 0 | 630 |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average | Deepak | Chennai | FINANCE | finance | Finance_A015 | 69 | 0 | 683 |
Explanation
ROUND
Rounds numeric value.
88.456 → 88.5
ABS
Absolute value function.
-10 → 10
Used earlier for fixing negative completion
time.
ROUND For Fees
Sometimes organizations increase fees by
certain percentage.
Fees * 1.05
Represents 5% increase.
17. Date Handling
data assessment_dates;
set assessment_numeric;
Next_Assessment_Date = intnx('month',Exam_Date,6,'same');
Months_Since_Exam = intck('month',Exam_Date,today());
Exam_Start_Date = mdy(1,1,2025);
Days_Between = intck('day',Exam_Date,today());
run;
proc print data=assessment_dates;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date | Utilization_Class | Invigilator | Location | Subject_Upper | Subject_Lower | Subject_Code | Rounded_Score | Score_Difference | Adjusted_Fees | Next_Assessment_Date | Months_Since_Exam | Exam_Start_Date | Days_Between |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good | Rahul | Hyderabad | MATHEMATICS | mathematics | Mathematics_A001 | 85 | 0 | 525 | 23923 | 14 | 23742 | 427 |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good | Anita | Delhi | SCIENCE | science | Science_A002 | 78 | 0 | 630 | 23924 | 14 | 23742 | 426 |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent | Kiran | Bangalore | STATISTICS | statistics | Statistics_A003 | 92 | 0 | 735 | 23925 | 14 | 23742 | 425 |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average | Mohan | Chennai | PROGRAMMING | programming | Programming_A004 | 66 | 0 | 840 | 23926 | 14 | 23742 | 424 |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent | Ritu | Mumbai | MATHEMATICS | mathematics | Mathematics_A005 | 100 | 10 | 525 | 23927 | 14 | 23742 | 423 |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average | Arjun | Pune | ANALYTICS | analytics | Analytics_A006 | 70 | 0 | 683 | 23928 | 14 | 23742 | 422 |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good | Suresh | Hyderabad | DATA_SCIENCE | data_science | Data_science_A007 | 89 | 0 | 945 | 23929 | 14 | 23742 | 421 |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good | Kavya | Delhi | SQL | sql | Sql_A008 | 77 | 0 | 788 | 23930 | 14 | 23742 | 420 |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good | Meena | Bangalore | PYTHON | python | Python_A009 | 84 | 0 | 840 | 23931 | 14 | 23742 | 419 |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent | Rajesh | Chennai | MACHINE_LEARNING | machine_learning | Machine_learning_A010 | 91 | 0 | 998 | 23932 | 14 | 23742 | 418 |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good | Sunil | Hyderabad | AI | ai | Ai_A011 | 88 | 0 | 945 | 23933 | 14 | 23742 | 417 |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent | Priya | Mumbai | DEEP_LEARNING | deep_learning | Deep_learning_A012 | 95 | 0 | 1050 | 23934 | 14 | 23742 | 416 |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good | Pooja | Delhi | STATISTICS | statistics | Statistics_A013 | 82 | 0 | 735 | 23935 | 14 | 23742 | 415 |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average | Ajay | Bangalore | BUSINESS | business | Business_A014 | 74 | 0 | 630 | 23936 | 14 | 23742 | 414 |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average | Deepak | Chennai | FINANCE | finance | Finance_A015 | 69 | 0 | 683 | 23937 | 14 | 23742 | 413 |
Explanation Of Date Functions
MDY
Creates a SAS date value.
MDY(month,day,year)
Example
MDY(1,1,2025)
Represents January 1 2025.
INTCK
Counts intervals between two dates.
Example:
intck('month',Exam_Date,today())
Returns number of months.
INTNX
Calculates future or past date.
Example:
intnx('month',Exam_Date,6)
Adds six months.
18. Fraud Detection Macro Automation
%macro fraud_detection;
data fraud_analysis;
set assessment_dates;
length Fraud_Level $20;
if Completion_Time < 15 and Avg_Score > 90 then Fraud_Level="High Risk";
else if Cheating_Flag="yes" then Fraud_Level="Medium Risk";
else if Completion_Time > 60 then Fraud_Level="Suspicious Delay";
else Fraud_Level="Normal";
run;
proc print data=fraud_analysis;
run;
%mend fraud_detection;
%fraud_detection;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date | Utilization_Class | Invigilator | Location | Subject_Upper | Subject_Lower | Subject_Code | Rounded_Score | Score_Difference | Adjusted_Fees | Next_Assessment_Date | Months_Since_Exam | Exam_Start_Date | Days_Between | Fraud_Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good | Rahul | Hyderabad | MATHEMATICS | mathematics | Mathematics_A001 | 85 | 0 | 525 | 23923 | 14 | 23742 | 427 | Normal |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good | Anita | Delhi | SCIENCE | science | Science_A002 | 78 | 0 | 630 | 23924 | 14 | 23742 | 426 | Normal |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent | Kiran | Bangalore | STATISTICS | statistics | Statistics_A003 | 92 | 0 | 735 | 23925 | 14 | 23742 | 425 | Medium Risk |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average | Mohan | Chennai | PROGRAMMING | programming | Programming_A004 | 66 | 0 | 840 | 23926 | 14 | 23742 | 424 | Normal |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent | Ritu | Mumbai | MATHEMATICS | mathematics | Mathematics_A005 | 100 | 10 | 525 | 23927 | 14 | 23742 | 423 | High Risk |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average | Arjun | Pune | ANALYTICS | analytics | Analytics_A006 | 70 | 0 | 683 | 23928 | 14 | 23742 | 422 | Normal |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good | Suresh | Hyderabad | DATA_SCIENCE | data_science | Data_science_A007 | 89 | 0 | 945 | 23929 | 14 | 23742 | 421 | Normal |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good | Kavya | Delhi | SQL | sql | Sql_A008 | 77 | 0 | 788 | 23930 | 14 | 23742 | 420 | Normal |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good | Meena | Bangalore | PYTHON | python | Python_A009 | 84 | 0 | 840 | 23931 | 14 | 23742 | 419 | Normal |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent | Rajesh | Chennai | MACHINE_LEARNING | machine_learning | Machine_learning_A010 | 91 | 0 | 998 | 23932 | 14 | 23742 | 418 | Medium Risk |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good | Sunil | Hyderabad | AI | ai | Ai_A011 | 88 | 0 | 945 | 23933 | 14 | 23742 | 417 | Normal |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent | Priya | Mumbai | DEEP_LEARNING | deep_learning | Deep_learning_A012 | 95 | 0 | 1050 | 23934 | 14 | 23742 | 416 | Medium Risk |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good | Pooja | Delhi | STATISTICS | statistics | Statistics_A013 | 82 | 0 | 735 | 23935 | 14 | 23742 | 415 | Normal |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average | Ajay | Bangalore | BUSINESS | business | Business_A014 | 74 | 0 | 630 | 23936 | 14 | 23742 | 414 | Normal |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average | Deepak | Chennai | FINANCE | finance | Finance_A015 | 69 | 0 | 683 | 23937 | 14 | 23742 | 413 | Normal |
Macros
automate repetitive tasks.
Advantages:
Macros automate repetitive tasks.
·
Reduce code duplication
·
Improve automation
·
Increase efficiency
·
Enable parameterized analysis
In large
datasets containing millions of rows, macros help create automated fraud
detection systems.
19. Risk Classification
data fraud_risk_classification;
set fraud_analysis;
length Risk_Category $20;
if Fraud_Level="High Risk" then Risk_Category="Critical";
else if Fraud_Level="Medium Risk" then Risk_Category="Warning";
else if Fraud_Level="Suspicious Delay" then Risk_Category="Review";
else Risk_Category="Safe";
run;
proc print data=fraud_risk_classification;
run;
OUTPUT:
| Obs | Assessment_ID | Subject | Difficulty_Level | Cheating_Flag | Avg_Score | Completion_Time | Reliability_Index | Percentage | Fees | Exam_Date | Utilization_Class | Invigilator | Location | Subject_Upper | Subject_Lower | Subject_Code | Rounded_Score | Score_Difference | Adjusted_Fees | Next_Assessment_Date | Months_Since_Exam | Exam_Start_Date | Days_Between | Fraud_Level | Risk_Category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A001 | Mathematics | easy | no | 85 | 35 | 0.95 | 85 | 500 | 01JAN2025 | Good | Rahul | Hyderabad | MATHEMATICS | mathematics | Mathematics_A001 | 85 | 0 | 525 | 23923 | 14 | 23742 | 427 | Normal | Safe |
| 2 | A002 | Science | medium | no | 78 | 40 | 0.90 | 78 | 600 | 02JAN2025 | Good | Anita | Delhi | SCIENCE | science | Science_A002 | 78 | 0 | 630 | 23924 | 14 | 23742 | 426 | Normal | Safe |
| 3 | A003 | Statistics | hard | yes | 92 | 20 | 0.88 | 92 | 700 | 03JAN2025 | Excellent | Kiran | Bangalore | STATISTICS | statistics | Statistics_A003 | 92 | 0 | 735 | 23925 | 14 | 23742 | 425 | Medium Risk | Warning |
| 4 | A004 | Programming | medium | no | 66 | 50 | 0.75 | 66 | 800 | 04JAN2025 | Average | Mohan | Chennai | PROGRAMMING | programming | Programming_A004 | 66 | 0 | 840 | 23926 | 14 | 23742 | 424 | Normal | Safe |
| 5 | A005 | Mathematics | easy | yes | 100 | 10 | 0.96 | 110 | 500 | 05JAN2025 | Excellent | Ritu | Mumbai | MATHEMATICS | mathematics | Mathematics_A005 | 100 | 10 | 525 | 23927 | 14 | 23742 | 423 | High Risk | Critical |
| 6 | A006 | Analytics | medium | no | 70 | 45 | 0.91 | 70 | 650 | 06JAN2025 | Average | Arjun | Pune | ANALYTICS | analytics | Analytics_A006 | 70 | 0 | 683 | 23928 | 14 | 23742 | 422 | Normal | Safe |
| 7 | A007 | Data_science | hard | no | 89 | 10 | 0.87 | 89 | 900 | 07JAN2025 | Good | Suresh | Hyderabad | DATA_SCIENCE | data_science | Data_science_A007 | 89 | 0 | 945 | 23929 | 14 | 23742 | 421 | Normal | Safe |
| 8 | A008 | Sql | medium | no | 77 | 30 | 0.93 | 77 | 750 | 08JAN2025 | Good | Kavya | Delhi | SQL | sql | Sql_A008 | 77 | 0 | 788 | 23930 | 14 | 23742 | 420 | Normal | Safe |
| 9 | A009 | Python | hard | no | 84 | 25 | 0.92 | 84 | 800 | 09JAN2025 | Good | Meena | Bangalore | PYTHON | python | Python_A009 | 84 | 0 | 840 | 23931 | 14 | 23742 | 419 | Normal | Safe |
| 10 | A010 | Machine_learning | hard | yes | 91 | 15 | 0.85 | 91 | 950 | 10JAN2025 | Excellent | Rajesh | Chennai | MACHINE_LEARNING | machine_learning | Machine_learning_A010 | 91 | 0 | 998 | 23932 | 14 | 23742 | 418 | Medium Risk | Warning |
| 11 | A011 | Ai | medium | no | 88 | 28 | 0.90 | 88 | 900 | 11JAN2025 | Good | Sunil | Hyderabad | AI | ai | Ai_A011 | 88 | 0 | 945 | 23933 | 14 | 23742 | 417 | Normal | Safe |
| 12 | A012 | Deep_learning | hard | yes | 95 | 22 | 0.86 | 95 | 1000 | 12JAN2025 | Excellent | Priya | Mumbai | DEEP_LEARNING | deep_learning | Deep_learning_A012 | 95 | 0 | 1050 | 23934 | 14 | 23742 | 416 | Medium Risk | Warning |
| 13 | A013 | Statistics | medium | no | 82 | 35 | 0.91 | 82 | 700 | 13JAN2025 | Good | Pooja | Delhi | STATISTICS | statistics | Statistics_A013 | 82 | 0 | 735 | 23935 | 14 | 23742 | 415 | Normal | Safe |
| 14 | A014 | Business | easy | no | 74 | 40 | 0.89 | 74 | 600 | 14JAN2025 | Average | Ajay | Bangalore | BUSINESS | business | Business_A014 | 74 | 0 | 630 | 23936 | 14 | 23742 | 414 | Normal | Safe |
| 15 | A015 | Finance | medium | no | 69 | 48 | 0.88 | 69 | 650 | 15JAN2025 | Average | Deepak | Chennai | FINANCE | finance | Finance_A015 | 69 | 0 | 683 | 23937 | 14 | 23742 | 413 | Normal | Safe |
20. PROC TRANSPOSE
proc transpose data=fraud_risk_classification out=assessment_transpose;
var Avg_Score Percentage Completion_Time;
run;
proc print data=assessment_transpose;
run;
OUTPUT:
| Obs | _NAME_ | COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 | COL8 | COL9 | COL10 | COL11 | COL12 | COL13 | COL14 | COL15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Avg_Score | 85 | 78 | 92 | 66 | 100 | 70 | 89 | 77 | 84 | 91 | 88 | 95 | 82 | 74 | 69 |
| 2 | Percentage | 85 | 78 | 92 | 66 | 110 | 70 | 89 | 77 | 84 | 91 | 88 | 95 | 82 | 74 | 69 |
| 3 | Completion_Time | 35 | 40 | 20 | 50 | 10 | 45 | 10 | 30 | 25 | 15 | 28 | 22 | 35 | 40 | 48 |
Explanation
PROC TRANSPOSE converts columns into rows.
Example:
Original
|
ID |
Score |
Percentage |
|
A001 |
85 |
85 |
After transpose
|
Variable |
Value |
|
Score |
85 |
|
Percentage |
85 |
21. PROC DATASETS DELETE
proc datasets library=work;
delete assessment_combined
assessment_numeric
assessment_dates;
quit;
LOG:
PROC DATASETS manages SAS libraries.
DELETE removes unwanted datasets.
Benefits:
· Reduces memory usage
· Keeps workspace organized
· Improves performance
Final Analytical DatasetThe
final dataset now includes:
·
Clean subject names
·
Corrected scores
·
Fraud detection flags
·
Risk classification
·
Monitoring information
·
Date analytics
·
Fee adjustments
This
dataset is ready for:
·
Reporting
·
Dashboard visualization
·
Fraud investigation
Best Practices
For SAS
·
Always validate raw data
·
Identify missing values
·
Standardize text variables
·
Validate numeric ranges
·
Correct date formats
·
Use macros for automation
·
Document code clearly
·
Remove temporary datasets
Conclusion
This project demonstrates a SAS workflow for
managing and analyzing online assessment data with intentional errors such as
invalid scores, missing values, negative completion times, and inconsistent
subject names. Using SAS programming techniques, these issues were detected and
corrected through character functions (PROPCASE, STRIP, CATX), numeric
functions (ABS, COALESCE), and date functions (MDY, INTCK, INTNX).
Dataset
operations such as SET, MERGE, PROC APPEND, and PROC TRANSPOSE were used to
combine and restructure data. A SAS macro was implemented to automate fraud
detection based on suspicious patterns like high scores with very short
completion times. The final dataset is clean, validated, and ready for
reporting, analysis, and fraud monitoring in online assessment systems.
SAS INTERVIEW QUESTIONS
1. What is the RETAIN statement
used for?
The RETAIN
statement in SAS is used to hold or
preserve the value of a variable from one iteration of the DATA step to the
next. Normally, SAS resets variables to missing at the beginning of
each iteration, but RETAIN prevents that behavior.
It is commonly used for:
·
Running totals
·
Carrying previous values forward
·
Creating cumulative calculations
Example:
data example;set sales;retain total_sales 0;total_sales + amount;run;Explanation:
·
retain
total_sales 0; initializes and keeps the value across rows.
·
total_sales
+ amount; keeps adding the amount for each observation.
·
This creates a cumulative total.
2. What is the difference between DROP
and KEEP?
Both DROP
and KEEP control which variables
appear in the output dataset.
|
Feature |
DROP |
KEEP |
|
Purpose |
Removes variables from output dataset |
Keeps only specified variables |
|
Logic |
Excludes listed variables |
Includes only listed variables |
|
Usage |
When most variables are needed |
When only a few variables are needed |
Example
using DROP:
data newdata;set olddata;drop temp_var;run; temp_var will not appear in
the output dataset.
Example
using KEEP:
data newdata;set olddata;keep id name salary;run;Only id, name, and salary will appear in the
dataset.
3. What are automatic variables in
SAS?
Automatic variables are variables created automatically by SAS during
DATA step execution. They are not
written to the output dataset, but they are available for processing.
The two most common automatic variables are:
|
Variable |
Purpose |
|
|
Counts the number of times the DATA step executes |
|
|
Indicates if an error occurred (1 = error, 0 = no error) |
Example:
data test;set sales;obs_number = _N_;run;Explanation:
·
_N_ increases by 1 for every observation
processed.
·
It helps track row numbers or iterations.
Example output:
|
N |
obs_number |
|
1 |
1 |
|
2 |
2 |
|
3 |
3 |
Key
Points
·
Created automatically by SAS
·
Not saved in the dataset
·
Useful for debugging and tracking iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent ONLINE EXAMS data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics
· Clinical SAS Programmer
· Research Data Analyst
· Regulatory Data Validator
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment