719.Can We Identify Suspicious Exam Behavior and Correct Data Issues Using Advanced SAS Techniques?

Smart Fraud Detection and Data Error Correction in Online Exams with SAS

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP | LENGTH | INPUT | DATALINES | FORMAT | SET | IF-THEN- KEEP | DROP | MERGE | PROC SORT | PROC PRINT | PROC APPEND | PROC TRANSPOSE | PROC DATASETS | MACRO

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Introduction

In the modern digital era, online assessments have become an essential component of education, recruitment, and professional certification systems. Universities, corporate training platforms, government examination boards, and online learning providers rely heavily on digital examinations to evaluate knowledge, skills, and competencies. These systems allow thousands or even millions of candidates to take exams remotely, making assessment processes faster, scalable, and cost-effective.

However, the rapid adoption of online assessments has also introduced several challenges. Since the entire testing process occurs digitally, large volumes of data are generated every day. This data includes information such as assessment identifiers, subjects, average scores, completion times, difficulty levels, reliability indices, and flags indicating possible cheating behavior. Managing and analyzing this data efficiently requires robust data processing tools and strong data quality management practices.

One of the biggest concerns in online assessments is data accuracy and fraud detection. Candidates may attempt to cheat by using unauthorized resources, collaborating with others, or manipulating the system to obtain higher scores. Additionally, technical issues or data entry problems can introduce errors such as missing values, incorrect scores, invalid completion times, or inconsistent text formats. If these errors remain undetected, they can lead to incorrect conclusions, unfair evaluations, and unreliable reporting.

To address these challenges, organizations rely on advanced data analytics tools such as SAS (Statistical Analysis System). SAS is widely used in data management, statistical analysis, fraud detection, and large-scale data processing across industries such as healthcare, banking, pharmaceuticals, education, and government agencies.

Using SAS programming techniques, analysts can perform several important tasks, including:

·  Creating structured datasets from raw data

·  Detecting and correcting data quality issues

·  Standardizing inconsistent text and numeric values

·  Performing date calculations and time-based analysis

·  Combining data from multiple sources

·  Automating fraud detection logic using macros

·  Transforming data structures for reporting and analysis

This project demonstrates a practical SAS-based workflow for managing online assessment data. A simulated dataset is created containing several variables such as assessment identifiers, subject names, average scores, completion times, cheating indicators, difficulty levels, reliability indices, percentages, fees, and examination dates. To replicate real-world scenarios, multiple intentional errors are introduced into the dataset, including invalid scores, missing values, incorrect date formats, and inconsistent subject names.

The project then walks through the process of identifying these errors and correcting them using various SAS programming techniques. Functions for handling character strings and numeric values are applied to clean the dataset, while date functions such as MDY, INTCK, and INTNX are used to perform time-based calculations. Dataset manipulation techniques such as SET, MERGE, APPEND, and PROC TRANSPOSE are used to combine and restructure the data.

Furthermore, fraud detection logic is implemented using SAS macros. This allows the system to automatically identify suspicious assessment patterns, such as unusually high scores combined with extremely short completion times. Such automated detection systems are commonly used in modern examination platforms to ensure fairness and maintain the integrity of the evaluation process.

By the end of this project, readers will gain a clear understanding of how SAS can be used to perform data cleaning, fraud detection, dataset management, and automated analytics in online assessment systems.The project also demonstrates how real-world data issues can be systematically identified and resolved using structured programming approaches.

Table Of Contents

1.     Creating Dataset With Intentional Errors

2.     Errors Present In The Dataset

3.     Corrected Dataset Creation

4.     Utilization Classification

5.     Date Calculations Using INTCK and INTNX

6.     Fraud Detection Macro

7.     Combining Datasets Using MERGE

8.     Appending New Data Using PROC APPEND

9.     Data Restructuring Using PROC TRANSPOSE

10.  Character Functions Implementation

11.  KEEP and DROP Statements

12.  Deleting Temporary Dataset Using PROC DATASETS

13.  Creating Second Dataset (Assessment Monitoring)

14.  SET Statement For Data Concatenation

15.  MERGE Statement For Data Integration

16.  Data Standardization Using Character Functions

17.  Numeric Functions Implementation

18.  Date Handling Using MDY, INTCK, and INTNX

19.  Fraud Detection Macro Automation

20.  Risk Classification

21.  PROC TRANSPOSE For Data Transformation

22.  PROC DATASETS DELETE For Workspace Cleanup

23.  Final Analytical Dataset

24.  Best Practices For SAS Data Management

Business Context

Consider an online examination platform used by universities and certification agencies. Thousands of students take assessments every day across different subjects such as Mathematics, Programming, Statistics, and Business Analytics.

Each assessment generates a record containing several attributes such as:

·  Assessment ID

·  Subject name

·  Average score

·  Completion time

·  Difficulty level

·  Reliability index

·  Cheating flag

·  Fees paid

·  Assessment date

Online learning platforms and digital examinations have changed how organizations conduct assessments. Traditional pen-and-paper exams are gradually being replaced by online systems that allow candidates to take tests remotely using computers or mobile devices. Universities, certification bodies, recruitment agencies, and corporate training programs widely use these platforms to evaluate candidates efficiently.

Online assessments generate large amounts of data such as scores, completion time, subject difficulty, reliability index, and cheating indicators. This data helps organizations analyze exam quality and detect suspicious behavior. However, raw data often contains issues like missing values, incorrect formats, or inconsistent entries.

To ensure accurate analysis, tools like SAS are used to clean data, standardize variables, merge datasets, and implement fraud detection logic. In this project, a simulated online assessment dataset with intentional errors is used to demonstrate how SAS techniques can identify and correct data issues while preparing the dataset for reliable analysis.

The analytics team must analyze this dataset to answer several questions:

·  Which assessments have suspicious completion times?

·  Are there abnormal score patterns indicating cheating?

·  Are the data values valid and correctly formatted?

·  What percentage of students completed the exam successfully?

Dataset Structure

The dataset will contain the following variables.

Variable

Description

Assessment_ID

Unique ID for each assessment

Subject

Subject name

Avg_Score

Average score obtained

Completion_Time

Time taken to complete exam

Cheating_Flag

Indicates suspicious behavior

Difficulty_Level

Easy Medium Hard

Reliability_Index

Reliability measure of assessment

Percentage

Score percentage

Fees

Assessment fee

Utilization_Class

Performance classification

Exam_Date

Assessment date

1. Creating Dataset With Intentional Errors

data online_assessment_raw;

length Assessment_ID $10 Subject $20 Difficulty_Level $10 Cheating_Flag $5;

input Assessment_ID $ Subject $ Avg_Score Completion_Time Cheating_Flag $

      Difficulty_Level $ Reliability_Index Percentage Fees Exam_Date:ddmmyy10.;

format Exam_Date date9.;

datalines;

A001 math 85 35 no easy 0.95 85 500 01-01-2025

A002 science 78 40 no medium 0.90 78 600 02-01-2025

A003 statistics 92 20 yes hard 0.88 92 700 03-01-2025

A004 programming 66 50 no medium 0.75 66 800 04-01-2025

A005 mathematics 110 10 yes easy 0.96 110 500 05-01-2025

A006 analytics . 45 no medium 0.91 70 650 06-01-2025

A007 data_science 89 -10 no hard 0.87 89 900 07-01-2025

A008 sql 77 30 no medium 0.93 77 750 08-01-2025

A009 python 84 25 no hard 0.92 84 800 09-01-2025

A010 machine_learning 91 15 yes hard 0.85 91 950 10-01-2025

A011 ai 88 28 no medium 0.90 88 900 11-01-2025

A012 deep_learning 95 22 yes hard 0.86 95 1000 12-01-2025

A013 statistics 82 35 no medium 0.91 82 700 13-01-2025

A014 business 74 40 no easy 0.89 74 600 14-01-2025

A015 finance 69 48 no medium 0.88 69 650 15-01-2025

;

run;

proc print data=online_assessment_raw;

run;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_Date
1A001matheasyno85350.958550001JAN2025
2A002sciencemediumno78400.907860002JAN2025
3A003statisticshardyes92200.889270003JAN2025
4A004programmingmediumno66500.756680004JAN2025
5A005mathematicseasyyes110100.9611050005JAN2025
6A006analyticsmediumno.450.917065006JAN2025
7A007data_sciencehardno89-100.878990007JAN2025
8A008sqlmediumno77300.937775008JAN2025
9A009pythonhardno84250.928480009JAN2025
10A010machine_learninghardyes91150.859195010JAN2025
11A011aimediumno88280.908890011JAN2025
12A012deep_learninghardyes95220.8695100012JAN2025
13A013statisticsmediumno82350.918270013JAN2025
14A014businesseasyno74400.897460014JAN2025
15A015financemediumno69480.886965015JAN2025

Errors Present In The Dataset

Intentional errors exist in this dataset.

Error 1 – Incorrect Date Format

The dataset uses 01-01-2025 while SAS expects dates using DATE9 or numeric SAS date values.

This will produce incorrect date interpretation.

Error 2 – Impossible Score

A005 mathematics 110

Score cannot exceed 100.

Error 3 – Missing Value

Avg_Score .

Missing score must be handled properly.

Error 4 – Negative Completion Time

Completion_Time -10

This is logically impossible.

Error 5 – Inconsistent Subject Names

Examples:

math

mathematics

statistics

data_science

These should be standardized.

2. Corrected Dataset Creation

data online_assessment_clean;

set online_assessment_raw;

Subject = propcase(Subject);

if Subject="Math" then Subject="Mathematics";

Completion_Time = abs(Completion_Time);

if Avg_Score > 100 then Avg_Score = 100;

Avg_Score = coalesce(Avg_Score,70);

run;

proc print data=online_assessment_clean;

run;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_Date
1A001Mathematicseasyno85350.958550001JAN2025
2A002Sciencemediumno78400.907860002JAN2025
3A003Statisticshardyes92200.889270003JAN2025
4A004Programmingmediumno66500.756680004JAN2025
5A005Mathematicseasyyes100100.9611050005JAN2025
6A006Analyticsmediumno70450.917065006JAN2025
7A007Data_sciencehardno89100.878990007JAN2025
8A008Sqlmediumno77300.937775008JAN2025
9A009Pythonhardno84250.928480009JAN2025
10A010Machine_learninghardyes91150.859195010JAN2025
11A011Aimediumno88280.908890011JAN2025
12A012Deep_learninghardyes95220.8695100012JAN2025
13A013Statisticsmediumno82350.918270013JAN2025
14A014Businesseasyno74400.897460014JAN2025
15A015Financemediumno69480.886965015JAN2025

DATA Step

data online_assessment_clean;

Creates a new cleaned dataset.

SET Statement

set online_assessment_raw;

Reads observations from raw dataset.

PROPCASE Function

Subject = propcase(Subject);

Converts text to proper case.

Example:

math → Math

statistics → Statistics

ABS Function

Completion_Time = abs(Completion_Time);

Converts negative numbers into positive numbers.

COALESCE Function

Avg_Score = coalesce(Avg_Score,70);

Replaces missing score with default value.

Corrected the dataset containing several intentional errors.
The corrections included:

·  Converting subject names to proper case

·  Handling missing scores

·  Fixing negative completion time

·  Correcting invalid scores above 100

·  Converting date values properly

3. Utilization Classification

data assessment_utilization;

set online_assessment_clean;

length Utilization_Class $20;

if Percentage >=90 then Utilization_Class="Excellent";

else if Percentage >=75 then Utilization_Class="Good";

else if Percentage >=60 then Utilization_Class="Average";

else Utilization_Class="Poor";

run;

proc print data=assessment_utilization;

run;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_DateUtilization_Class
1A001Mathematicseasyno85350.958550001JAN2025Good
2A002Sciencemediumno78400.907860002JAN2025Good
3A003Statisticshardyes92200.889270003JAN2025Excellent
4A004Programmingmediumno66500.756680004JAN2025Average
5A005Mathematicseasyyes100100.9611050005JAN2025Excellent
6A006Analyticsmediumno70450.917065006JAN2025Average
7A007Data_sciencehardno89100.878990007JAN2025Good
8A008Sqlmediumno77300.937775008JAN2025Good
9A009Pythonhardno84250.928480009JAN2025Good
10A010Machine_learninghardyes91150.859195010JAN2025Excellent
11A011Aimediumno88280.908890011JAN2025Good
12A012Deep_learninghardyes95220.8695100012JAN2025Excellent
13A013Statisticsmediumno82350.918270013JAN2025Good
14A014Businesseasyno74400.897460014JAN2025Average
15A015Financemediumno69480.886965015JAN2025Average

4. Date Calculations Using INTCK,INTNX

data assessment_duration;

set assessment_utilization;

Months_Since_Exam = intck('month',Exam_Date,today());

Next_Assessment = intnx('month',Exam_Date,6,'same');

run;

proc print data=assessment_duration;

run;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_DateUtilization_ClassMonths_Since_ExamNext_Assessment
1A001Mathematicseasyno85350.958550001JAN2025Good1423923
2A002Sciencemediumno78400.907860002JAN2025Good1423924
3A003Statisticshardyes92200.889270003JAN2025Excellent1423925
4A004Programmingmediumno66500.756680004JAN2025Average1423926
5A005Mathematicseasyyes100100.9611050005JAN2025Excellent1423927
6A006Analyticsmediumno70450.917065006JAN2025Average1423928
7A007Data_sciencehardno89100.878990007JAN2025Good1423929
8A008Sqlmediumno77300.937775008JAN2025Good1423930
9A009Pythonhardno84250.928480009JAN2025Good1423931
10A010Machine_learninghardyes91150.859195010JAN2025Excellent1423932
11A011Aimediumno88280.908890011JAN2025Good1423933
12A012Deep_learninghardyes95220.8695100012JAN2025Excellent1423934
13A013Statisticsmediumno82350.918270013JAN2025Good1423935
14A014Businesseasyno74400.897460014JAN2025Average1423936
15A015Financemediumno69480.886965015JAN2025Average1423937

INTCK calculates the number of months between exam date and current date.

Predicts next exam after 6 months.

5. Fraud Detection Macro

%macro detect_fraud;

data fraud_cases;

set assessment_duration;

length Fraud_Flag $12;

if Completion_Time < 15 and Avg_Score > 90 then Fraud_Flag="High Risk";

else if Cheating_Flag="yes" then Fraud_Flag="Medium Risk";

else Fraud_Flag="Low Risk";

run;

proc print data=fraud_cases;

run;

%mend;


%detect_fraud;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_DateUtilization_ClassMonths_Since_ExamNext_AssessmentFraud_Flag
1A001Mathematicseasyno85350.958550001JAN2025Good1423923Low Risk
2A002Sciencemediumno78400.907860002JAN2025Good1423924Low Risk
3A003Statisticshardyes92200.889270003JAN2025Excellent1423925Medium Risk
4A004Programmingmediumno66500.756680004JAN2025Average1423926Low Risk
5A005Mathematicseasyyes100100.9611050005JAN2025Excellent1423927High Risk
6A006Analyticsmediumno70450.917065006JAN2025Average1423928Low Risk
7A007Data_sciencehardno89100.878990007JAN2025Good1423929Low Risk
8A008Sqlmediumno77300.937775008JAN2025Good1423930Low Risk
9A009Pythonhardno84250.928480009JAN2025Good1423931Low Risk
10A010Machine_learninghardyes91150.859195010JAN2025Excellent1423932Medium Risk
11A011Aimediumno88280.908890011JAN2025Good1423933Low Risk
12A012Deep_learninghardyes95220.8695100012JAN2025Excellent1423934Medium Risk
13A013Statisticsmediumno82350.918270013JAN2025Good1423935Low Risk
14A014Businesseasyno74400.897460014JAN2025Average1423936Low Risk
15A015Financemediumno69480.886965015JAN2025Average1423937Low Risk
Explanation Of Fraud Logic

Fraud is detected using rules:

Rule 1
Very high score + very short completion time.

Rule 2
Cheating flag already marked.

Rule 3
Otherwise considered normal.

6. Combining Datasets Using MERGE

proc sort data=fraud_cases;by Assessment_ID;run;

proc print data=fraud_cases;

run;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_DateUtilization_ClassMonths_Since_ExamNext_AssessmentFraud_Flag
1A001Mathematicseasyno85350.958550001JAN2025Good1423923Low Risk
2A002Sciencemediumno78400.907860002JAN2025Good1423924Low Risk
3A003Statisticshardyes92200.889270003JAN2025Excellent1423925Medium Risk
4A004Programmingmediumno66500.756680004JAN2025Average1423926Low Risk
5A005Mathematicseasyyes100100.9611050005JAN2025Excellent1423927High Risk
6A006Analyticsmediumno70450.917065006JAN2025Average1423928Low Risk
7A007Data_sciencehardno89100.878990007JAN2025Good1423929Low Risk
8A008Sqlmediumno77300.937775008JAN2025Good1423930Low Risk
9A009Pythonhardno84250.928480009JAN2025Good1423931Low Risk
10A010Machine_learninghardyes91150.859195010JAN2025Excellent1423932Medium Risk
11A011Aimediumno88280.908890011JAN2025Good1423933Low Risk
12A012Deep_learninghardyes95220.8695100012JAN2025Excellent1423934Medium Risk
13A013Statisticsmediumno82350.918270013JAN2025Good1423935Low Risk
14A014Businesseasyno74400.897460014JAN2025Average1423936Low Risk
15A015Financemediumno69480.886965015JAN2025Average1423937Low Risk

proc sort data=assessment_duration;by Assessment_ID;run;

proc print data=assessment_duration;

run;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_DateUtilization_ClassMonths_Since_ExamNext_Assessment
1A001Mathematicseasyno85350.958550001JAN2025Good1423923
2A002Sciencemediumno78400.907860002JAN2025Good1423924
3A003Statisticshardyes92200.889270003JAN2025Excellent1423925
4A004Programmingmediumno66500.756680004JAN2025Average1423926
5A005Mathematicseasyyes100100.9611050005JAN2025Excellent1423927
6A006Analyticsmediumno70450.917065006JAN2025Average1423928
7A007Data_sciencehardno89100.878990007JAN2025Good1423929
8A008Sqlmediumno77300.937775008JAN2025Good1423930
9A009Pythonhardno84250.928480009JAN2025Good1423931
10A010Machine_learninghardyes91150.859195010JAN2025Excellent1423932
11A011Aimediumno88280.908890011JAN2025Good1423933
12A012Deep_learninghardyes95220.8695100012JAN2025Excellent1423934
13A013Statisticsmediumno82350.918270013JAN2025Good1423935
14A014Businesseasyno74400.897460014JAN2025Average1423936
15A015Financemediumno69480.886965015JAN2025Average1423937

data combined_dataset;

merge fraud_cases 

      assessment_duration;

by Assessment_ID;

run;

proc print data=combined_dataset;

run;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_DateUtilization_ClassMonths_Since_ExamNext_AssessmentFraud_Flag
1A001Mathematicseasyno85350.958550001JAN2025Good1423923Low Risk
2A002Sciencemediumno78400.907860002JAN2025Good1423924Low Risk
3A003Statisticshardyes92200.889270003JAN2025Excellent1423925Medium Risk
4A004Programmingmediumno66500.756680004JAN2025Average1423926Low Risk
5A005Mathematicseasyyes100100.9611050005JAN2025Excellent1423927High Risk
6A006Analyticsmediumno70450.917065006JAN2025Average1423928Low Risk
7A007Data_sciencehardno89100.878990007JAN2025Good1423929Low Risk
8A008Sqlmediumno77300.937775008JAN2025Good1423930Low Risk
9A009Pythonhardno84250.928480009JAN2025Good1423931Low Risk
10A010Machine_learninghardyes91150.859195010JAN2025Excellent1423932Medium Risk
11A011Aimediumno88280.908890011JAN2025Good1423933Low Risk
12A012Deep_learninghardyes95220.8695100012JAN2025Excellent1423934Medium Risk
13A013Statisticsmediumno82350.918270013JAN2025Good1423935Low Risk
14A014Businesseasyno74400.897460014JAN2025Average1423936Low Risk
15A015Financemediumno69480.886965015JAN2025Average1423937Low Risk

MERGE combines datasets based on Assessment_ID.

7. Appending New Data

proc append base=combined_dataset

            data=assessment_duration;

run;

proc print data=combined_dataset;

run;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_DateUtilization_ClassMonths_Since_ExamNext_AssessmentFraud_Flag
1A001Mathematicseasyno85350.958550001JAN2025Good1423923Low Risk
2A002Sciencemediumno78400.907860002JAN2025Good1423924Low Risk
3A003Statisticshardyes92200.889270003JAN2025Excellent1423925Medium Risk
4A004Programmingmediumno66500.756680004JAN2025Average1423926Low Risk
5A005Mathematicseasyyes100100.9611050005JAN2025Excellent1423927High Risk
6A006Analyticsmediumno70450.917065006JAN2025Average1423928Low Risk
7A007Data_sciencehardno89100.878990007JAN2025Good1423929Low Risk
8A008Sqlmediumno77300.937775008JAN2025Good1423930Low Risk
9A009Pythonhardno84250.928480009JAN2025Good1423931Low Risk
10A010Machine_learninghardyes91150.859195010JAN2025Excellent1423932Medium Risk
11A011Aimediumno88280.908890011JAN2025Good1423933Low Risk
12A012Deep_learninghardyes95220.8695100012JAN2025Excellent1423934Medium Risk
13A013Statisticsmediumno82350.918270013JAN2025Good1423935Low Risk
14A014Businesseasyno74400.897460014JAN2025Average1423936Low Risk
15A015Financemediumno69480.886965015JAN2025Average1423937Low Risk
16A001Mathematicseasyno85350.958550001JAN2025Good1423923 
17A002Sciencemediumno78400.907860002JAN2025Good1423924 
18A003Statisticshardyes92200.889270003JAN2025Excellent1423925 
19A004Programmingmediumno66500.756680004JAN2025Average1423926 
20A005Mathematicseasyyes100100.9611050005JAN2025Excellent1423927 
21A006Analyticsmediumno70450.917065006JAN2025Average1423928 
22A007Data_sciencehardno89100.878990007JAN2025Good1423929 
23A008Sqlmediumno77300.937775008JAN2025Good1423930 
24A009Pythonhardno84250.928480009JAN2025Good1423931 
25A010Machine_learninghardyes91150.859195010JAN2025Excellent1423932 
26A011Aimediumno88280.908890011JAN2025Good1423933 
27A012Deep_learninghardyes95220.8695100012JAN2025Excellent1423934 
28A013Statisticsmediumno82350.918270013JAN2025Good1423935 
29A014Businesseasyno74400.897460014JAN2025Average1423936 
30A015Financemediumno69480.886965015JAN2025Average1423937 

Adds additional records to dataset.

8. Using PROC TRANSPOSE

proc transpose data=combined_dataset out=transpose_scores;

var Avg_Score Percentage;

run;

proc print data=transpose_scores;

run;

OUTPUT:

Obs_NAME_COL1COL2COL3COL4COL5COL6COL7COL8COL9COL10COL11COL12COL13COL14COL15COL16COL17COL18COL19COL20COL21COL22COL23COL24COL25COL26COL27COL28COL29COL30
1Avg_Score85789266100708977849188958274698578926610070897784918895827469
2Percentage85789266110708977849188958274698578926611070897784918895827469

Restructures rows into columns.

9. Character Functions 

data assessment_char;

set assessment_utilization;

Subject_Clean = strip(trim(Subject));

Subject_Code = catx("-",Subject,Assessment_ID);

run;

proc print data=assessment_char;

run;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_DateUtilization_ClassSubject_CleanSubject_Code
1A001Mathematicseasyno85350.958550001JAN2025GoodMathematicsMathematics-A001
2A002Sciencemediumno78400.907860002JAN2025GoodScienceScience-A002
3A003Statisticshardyes92200.889270003JAN2025ExcellentStatisticsStatistics-A003
4A004Programmingmediumno66500.756680004JAN2025AverageProgrammingProgramming-A004
5A005Mathematicseasyyes100100.9611050005JAN2025ExcellentMathematicsMathematics-A005
6A006Analyticsmediumno70450.917065006JAN2025AverageAnalyticsAnalytics-A006
7A007Data_sciencehardno89100.878990007JAN2025GoodData_scienceData_science-A007
8A008Sqlmediumno77300.937775008JAN2025GoodSqlSql-A008
9A009Pythonhardno84250.928480009JAN2025GoodPythonPython-A009
10A010Machine_learninghardyes91150.859195010JAN2025ExcellentMachine_learningMachine_learning-A010
11A011Aimediumno88280.908890011JAN2025GoodAiAi-A011
12A012Deep_learninghardyes95220.8695100012JAN2025ExcellentDeep_learningDeep_learning-A012
13A013Statisticsmediumno82350.918270013JAN2025GoodStatisticsStatistics-A013
14A014Businesseasyno74400.897460014JAN2025AverageBusinessBusiness-A014
15A015Financemediumno69480.886965015JAN2025AverageFinanceFinance-A015

Functions used:

Function

Purpose

STRIP

Remove leading/trailing spaces

TRIM

Remove trailing spaces

CAT

Concatenate strings

CATX

Concatenate with delimiter

UPCASE

Convert to uppercase

LOWCASE

Convert to lowercase

10. Keep And Drop 

data assessment_K;

set assessment_char;

keep Assessment_ID Subject Percentage Fees Exam_Date Utilization_Class

     Subject_Code;

run;

proc print data=assessment_K;

run;

OUTPUT:

ObsAssessment_IDSubjectPercentageFeesExam_DateUtilization_ClassSubject_Code
1A001Mathematics8550001JAN2025GoodMathematics-A001
2A002Science7860002JAN2025GoodScience-A002
3A003Statistics9270003JAN2025ExcellentStatistics-A003
4A004Programming6680004JAN2025AverageProgramming-A004
5A005Mathematics11050005JAN2025ExcellentMathematics-A005
6A006Analytics7065006JAN2025AverageAnalytics-A006
7A007Data_science8990007JAN2025GoodData_science-A007
8A008Sql7775008JAN2025GoodSql-A008
9A009Python8480009JAN2025GoodPython-A009
10A010Machine_learning9195010JAN2025ExcellentMachine_learning-A010
11A011Ai8890011JAN2025GoodAi-A011
12A012Deep_learning95100012JAN2025ExcellentDeep_learning-A012
13A013Statistics8270013JAN2025GoodStatistics-A013
14A014Business7460014JAN2025AverageBusiness-A014
15A015Finance6965015JAN2025AverageFinance-A015

data assessment_D;

set assessment_char;

Drop Subject Difficulty_Level Cheating_Flag Avg_Score Completion_Time 

     Reliability_Index Percentage Fees ;

run;

proc print data=assessment_D;

run;

OUTPUT:

ObsAssessment_IDExam_DateUtilization_ClassSubject_CleanSubject_Code
1A00101JAN2025GoodMathematicsMathematics-A001
2A00202JAN2025GoodScienceScience-A002
3A00303JAN2025ExcellentStatisticsStatistics-A003
4A00404JAN2025AverageProgrammingProgramming-A004
5A00505JAN2025ExcellentMathematicsMathematics-A005
6A00606JAN2025AverageAnalyticsAnalytics-A006
7A00707JAN2025GoodData_scienceData_science-A007
8A00808JAN2025GoodSqlSql-A008
9A00909JAN2025GoodPythonPython-A009
10A01010JAN2025ExcellentMachine_learningMachine_learning-A010
11A01111JAN2025GoodAiAi-A011
12A01212JAN2025ExcellentDeep_learningDeep_learning-A012
13A01313JAN2025GoodStatisticsStatistics-A013
14A01414JAN2025AverageBusinessBusiness-A014
15A01515JAN2025AverageFinanceFinance-A015

11. Deleting Temporary Dataset

proc datasets library=work;

delete assessment_D transpose_scores;

quit;

LOG:

NOTE: Deleting WORK.ASSESSMENT_D (memtype=DATA).
NOTE: Deleting WORK.TRANSPOSE_SCORES (memtype=DATA).
Deletes unwanted datasets.

12. Creating Second Dataset

data assessment_monitoring;

length Assessment_ID $10 Invigilator $20 Location $20;

input Assessment_ID $ Invigilator $ Location $;

datalines;

A001 Rahul Hyderabad

A002 Anita Delhi

A003 Kiran Bangalore

A004 Mohan Chennai

A005 Ritu Mumbai

A006 Arjun Pune

A007 Suresh Hyderabad

A008 Kavya Delhi

A009 Meena Bangalore

A010 Rajesh Chennai

A011 Sunil Hyderabad

A012 Priya Mumbai

A013 Pooja Delhi

A014 Ajay Bangalore

A015 Deepak Chennai

;

run;

proc print data=assessment_monitoring;

run;

OUTPUT:

ObsAssessment_IDInvigilatorLocation
1A001RahulHyderabad
2A002AnitaDelhi
3A003KiranBangalore
4A004MohanChennai
5A005RituMumbai
6A006ArjunPune
7A007SureshHyderabad
8A008KavyaDelhi
9A009MeenaBangalore
10A010RajeshChennai
11A011SunilHyderabad
12A012PriyaMumbai
13A013PoojaDelhi
14A014AjayBangalore
15A015DeepakChennai

Explanation Of Monitoring Dataset

This dataset represents invigilation or monitoring information collected during the assessment.

Variables included:

Variable

Meaning

Assessment_ID

Unique assessment identifier

Invigilator

Supervisor responsible

Location

Examination location

Such data is useful for identifying location-based fraud patterns.

13. SET Statement For Data Concatenation

data assessment_combined;

set online_assessment_clean

    assessment_utilization;

run;

proc print data=assessment_combined;

run;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_DateUtilization_Class
1A001Mathematicseasyno85350.958550001JAN2025 
2A002Sciencemediumno78400.907860002JAN2025 
3A003Statisticshardyes92200.889270003JAN2025 
4A004Programmingmediumno66500.756680004JAN2025 
5A005Mathematicseasyyes100100.9611050005JAN2025 
6A006Analyticsmediumno70450.917065006JAN2025 
7A007Data_sciencehardno89100.878990007JAN2025 
8A008Sqlmediumno77300.937775008JAN2025 
9A009Pythonhardno84250.928480009JAN2025 
10A010Machine_learninghardyes91150.859195010JAN2025 
11A011Aimediumno88280.908890011JAN2025 
12A012Deep_learninghardyes95220.8695100012JAN2025 
13A013Statisticsmediumno82350.918270013JAN2025 
14A014Businesseasyno74400.897460014JAN2025 
15A015Financemediumno69480.886965015JAN2025 
16A001Mathematicseasyno85350.958550001JAN2025Good
17A002Sciencemediumno78400.907860002JAN2025Good
18A003Statisticshardyes92200.889270003JAN2025Excellent
19A004Programmingmediumno66500.756680004JAN2025Average
20A005Mathematicseasyyes100100.9611050005JAN2025Excellent
21A006Analyticsmediumno70450.917065006JAN2025Average
22A007Data_sciencehardno89100.878990007JAN2025Good
23A008Sqlmediumno77300.937775008JAN2025Good
24A009Pythonhardno84250.928480009JAN2025Good
25A010Machine_learninghardyes91150.859195010JAN2025Excellent
26A011Aimediumno88280.908890011JAN2025Good
27A012Deep_learninghardyes95220.8695100012JAN2025Excellent
28A013Statisticsmediumno82350.918270013JAN2025Good
29A014Businesseasyno74400.897460014JAN2025Average
30A015Financemediumno69480.886965015JAN2025Average

Explanation Of SET Statement

The SET statement reads observations sequentially from datasets and combines them.

Key characteristics:

·  Observations from first dataset appear first

·  Then observations from next dataset

·  Variables are aligned automatically

Example workflow:

Dataset1

A001

A002

Dataset2

A003

A004

After SET:

A001

A002

A003

A004

14. MERGE Statement For Data Integration

proc sort data=assessment_utilization;

by Assessment_ID;

run;

proc print data=assessment_utilization;

run;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_DateUtilization_Class
1A001Mathematicseasyno85350.958550001JAN2025Good
2A002Sciencemediumno78400.907860002JAN2025Good
3A003Statisticshardyes92200.889270003JAN2025Excellent
4A004Programmingmediumno66500.756680004JAN2025Average
5A005Mathematicseasyyes100100.9611050005JAN2025Excellent
6A006Analyticsmediumno70450.917065006JAN2025Average
7A007Data_sciencehardno89100.878990007JAN2025Good
8A008Sqlmediumno77300.937775008JAN2025Good
9A009Pythonhardno84250.928480009JAN2025Good
10A010Machine_learninghardyes91150.859195010JAN2025Excellent
11A011Aimediumno88280.908890011JAN2025Good
12A012Deep_learninghardyes95220.8695100012JAN2025Excellent
13A013Statisticsmediumno82350.918270013JAN2025Good
14A014Businesseasyno74400.897460014JAN2025Average
15A015Financemediumno69480.886965015JAN2025Average

proc sort data=assessment_monitoring;

by Assessment_ID;

run;

proc print data=assessment_monitoring;

run;

OUTPUT:

ObsAssessment_IDInvigilatorLocation
1A001RahulHyderabad
2A002AnitaDelhi
3A003KiranBangalore
4A004MohanChennai
5A005RituMumbai
6A006ArjunPune
7A007SureshHyderabad
8A008KavyaDelhi
9A009MeenaBangalore
10A010RajeshChennai
11A011SunilHyderabad
12A012PriyaMumbai
13A013PoojaDelhi
14A014AjayBangalore
15A015DeepakChennai

SAS requires datasets to be sorted before merging. Sorting ensures correct matching of observations.

data assessment_merged;

merge assessment_utilization

      assessment_monitoring;

by Assessment_ID;

run;

proc print data=assessment_merged;

run;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_DateUtilization_ClassInvigilatorLocation
1A001Mathematicseasyno85350.958550001JAN2025GoodRahulHyderabad
2A002Sciencemediumno78400.907860002JAN2025GoodAnitaDelhi
3A003Statisticshardyes92200.889270003JAN2025ExcellentKiranBangalore
4A004Programmingmediumno66500.756680004JAN2025AverageMohanChennai
5A005Mathematicseasyyes100100.9611050005JAN2025ExcellentRituMumbai
6A006Analyticsmediumno70450.917065006JAN2025AverageArjunPune
7A007Data_sciencehardno89100.878990007JAN2025GoodSureshHyderabad
8A008Sqlmediumno77300.937775008JAN2025GoodKavyaDelhi
9A009Pythonhardno84250.928480009JAN2025GoodMeenaBangalore
10A010Machine_learninghardyes91150.859195010JAN2025ExcellentRajeshChennai
11A011Aimediumno88280.908890011JAN2025GoodSunilHyderabad
12A012Deep_learninghardyes95220.8695100012JAN2025ExcellentPriyaMumbai
13A013Statisticsmediumno82350.918270013JAN2025GoodPoojaDelhi
14A014Businesseasyno74400.897460014JAN2025AverageAjayBangalore
15A015Financemediumno69480.886965015JAN2025AverageDeepakChennai

Explanation Of MERGE

MERGE combines datasets horizontally.

Meaning:

Variables from multiple datasets are combined into a single observation.

Example:

Dataset1

ID

Score

A001

85

Dataset2

ID

Location

A001

Hyderabad

After MERGE

ID

Score

Location

A001

85

Hyderabad

15. Data Standardization Using Character Functions

data assessment_text_clean;

set assessment_merged;

Subject = strip(trim(Subject));

Subject = propcase(Subject);

Subject_Upper = upcase(Subject);

Subject_Lower = lowcase(Subject);

Subject_Code = catx("_",Subject,Assessment_ID);

run;

proc print data=assessment_text_clean;

run;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_DateUtilization_ClassInvigilatorLocationSubject_UpperSubject_LowerSubject_Code
1A001Mathematicseasyno85350.958550001JAN2025GoodRahulHyderabadMATHEMATICSmathematicsMathematics_A001
2A002Sciencemediumno78400.907860002JAN2025GoodAnitaDelhiSCIENCEscienceScience_A002
3A003Statisticshardyes92200.889270003JAN2025ExcellentKiranBangaloreSTATISTICSstatisticsStatistics_A003
4A004Programmingmediumno66500.756680004JAN2025AverageMohanChennaiPROGRAMMINGprogrammingProgramming_A004
5A005Mathematicseasyyes100100.9611050005JAN2025ExcellentRituMumbaiMATHEMATICSmathematicsMathematics_A005
6A006Analyticsmediumno70450.917065006JAN2025AverageArjunPuneANALYTICSanalyticsAnalytics_A006
7A007Data_sciencehardno89100.878990007JAN2025GoodSureshHyderabadDATA_SCIENCEdata_scienceData_science_A007
8A008Sqlmediumno77300.937775008JAN2025GoodKavyaDelhiSQLsqlSql_A008
9A009Pythonhardno84250.928480009JAN2025GoodMeenaBangalorePYTHONpythonPython_A009
10A010Machine_learninghardyes91150.859195010JAN2025ExcellentRajeshChennaiMACHINE_LEARNINGmachine_learningMachine_learning_A010
11A011Aimediumno88280.908890011JAN2025GoodSunilHyderabadAIaiAi_A011
12A012Deep_learninghardyes95220.8695100012JAN2025ExcellentPriyaMumbaiDEEP_LEARNINGdeep_learningDeep_learning_A012
13A013Statisticsmediumno82350.918270013JAN2025GoodPoojaDelhiSTATISTICSstatisticsStatistics_A013
14A014Businesseasyno74400.897460014JAN2025AverageAjayBangaloreBUSINESSbusinessBusiness_A014
15A015Financemediumno69480.886965015JAN2025AverageDeepakChennaiFINANCEfinanceFinance_A015

Explanation Of Character Functions

STRIP

Removes leading and trailing spaces.

Example:

"  Mathematics  "

Becomes

"Mathematics"

TRIM

Removes trailing spaces.

PROPCASE

Converts text into proper case.

Example

statisticsStatistics

UPCASE

Converts all characters to uppercase.

StatisticsSTATISTICS

LOWCASE

Converts text to lowercase.

STATISTICSstatistics

CATX

Concatenates multiple strings using a delimiter.

Example:

Subject_Code = catx("_",Subject,Assessment_ID);

Output:

Mathematics_A001

16. Numeric Functions

data assessment_numeric;

set assessment_text_clean;

Rounded_Score = round(Avg_Score,1);

Score_Difference = abs(Avg_Score - Percentage);

Adjusted_Fees = round(Fees * 1.05);

run;

proc print data=assessment_numeric;

run;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_DateUtilization_ClassInvigilatorLocationSubject_UpperSubject_LowerSubject_CodeRounded_ScoreScore_DifferenceAdjusted_Fees
1A001Mathematicseasyno85350.958550001JAN2025GoodRahulHyderabadMATHEMATICSmathematicsMathematics_A001850525
2A002Sciencemediumno78400.907860002JAN2025GoodAnitaDelhiSCIENCEscienceScience_A002780630
3A003Statisticshardyes92200.889270003JAN2025ExcellentKiranBangaloreSTATISTICSstatisticsStatistics_A003920735
4A004Programmingmediumno66500.756680004JAN2025AverageMohanChennaiPROGRAMMINGprogrammingProgramming_A004660840
5A005Mathematicseasyyes100100.9611050005JAN2025ExcellentRituMumbaiMATHEMATICSmathematicsMathematics_A00510010525
6A006Analyticsmediumno70450.917065006JAN2025AverageArjunPuneANALYTICSanalyticsAnalytics_A006700683
7A007Data_sciencehardno89100.878990007JAN2025GoodSureshHyderabadDATA_SCIENCEdata_scienceData_science_A007890945
8A008Sqlmediumno77300.937775008JAN2025GoodKavyaDelhiSQLsqlSql_A008770788
9A009Pythonhardno84250.928480009JAN2025GoodMeenaBangalorePYTHONpythonPython_A009840840
10A010Machine_learninghardyes91150.859195010JAN2025ExcellentRajeshChennaiMACHINE_LEARNINGmachine_learningMachine_learning_A010910998
11A011Aimediumno88280.908890011JAN2025GoodSunilHyderabadAIaiAi_A011880945
12A012Deep_learninghardyes95220.8695100012JAN2025ExcellentPriyaMumbaiDEEP_LEARNINGdeep_learningDeep_learning_A0129501050
13A013Statisticsmediumno82350.918270013JAN2025GoodPoojaDelhiSTATISTICSstatisticsStatistics_A013820735
14A014Businesseasyno74400.897460014JAN2025AverageAjayBangaloreBUSINESSbusinessBusiness_A014740630
15A015Financemediumno69480.886965015JAN2025AverageDeepakChennaiFINANCEfinanceFinance_A015690683

Explanation

ROUND

Rounds numeric value.

88.456 → 88.5

ABS

Absolute value function.

-10 → 10

Used earlier for fixing negative completion time.

ROUND For Fees

Sometimes organizations increase fees by certain percentage.

Fees * 1.05

Represents 5% increase.

17. Date Handling

data assessment_dates;

set assessment_numeric;

Next_Assessment_Date = intnx('month',Exam_Date,6,'same');

Months_Since_Exam = intck('month',Exam_Date,today());

Exam_Start_Date = mdy(1,1,2025);

Days_Between = intck('day',Exam_Date,today());

run;

proc print data=assessment_dates;

run;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_DateUtilization_ClassInvigilatorLocationSubject_UpperSubject_LowerSubject_CodeRounded_ScoreScore_DifferenceAdjusted_FeesNext_Assessment_DateMonths_Since_ExamExam_Start_DateDays_Between
1A001Mathematicseasyno85350.958550001JAN2025GoodRahulHyderabadMATHEMATICSmathematicsMathematics_A001850525239231423742427
2A002Sciencemediumno78400.907860002JAN2025GoodAnitaDelhiSCIENCEscienceScience_A002780630239241423742426
3A003Statisticshardyes92200.889270003JAN2025ExcellentKiranBangaloreSTATISTICSstatisticsStatistics_A003920735239251423742425
4A004Programmingmediumno66500.756680004JAN2025AverageMohanChennaiPROGRAMMINGprogrammingProgramming_A004660840239261423742424
5A005Mathematicseasyyes100100.9611050005JAN2025ExcellentRituMumbaiMATHEMATICSmathematicsMathematics_A00510010525239271423742423
6A006Analyticsmediumno70450.917065006JAN2025AverageArjunPuneANALYTICSanalyticsAnalytics_A006700683239281423742422
7A007Data_sciencehardno89100.878990007JAN2025GoodSureshHyderabadDATA_SCIENCEdata_scienceData_science_A007890945239291423742421
8A008Sqlmediumno77300.937775008JAN2025GoodKavyaDelhiSQLsqlSql_A008770788239301423742420
9A009Pythonhardno84250.928480009JAN2025GoodMeenaBangalorePYTHONpythonPython_A009840840239311423742419
10A010Machine_learninghardyes91150.859195010JAN2025ExcellentRajeshChennaiMACHINE_LEARNINGmachine_learningMachine_learning_A010910998239321423742418
11A011Aimediumno88280.908890011JAN2025GoodSunilHyderabadAIaiAi_A011880945239331423742417
12A012Deep_learninghardyes95220.8695100012JAN2025ExcellentPriyaMumbaiDEEP_LEARNINGdeep_learningDeep_learning_A0129501050239341423742416
13A013Statisticsmediumno82350.918270013JAN2025GoodPoojaDelhiSTATISTICSstatisticsStatistics_A013820735239351423742415
14A014Businesseasyno74400.897460014JAN2025AverageAjayBangaloreBUSINESSbusinessBusiness_A014740630239361423742414
15A015Financemediumno69480.886965015JAN2025AverageDeepakChennaiFINANCEfinanceFinance_A015690683239371423742413

Explanation Of Date Functions

MDY

Creates a SAS date value.

MDY(month,day,year)

Example

MDY(1,1,2025)

Represents January 1 2025.

INTCK

Counts intervals between two dates.

Example:

intck('month',Exam_Date,today())

Returns number of months.

INTNX

Calculates future or past date.

Example:

intnx('month',Exam_Date,6)

Adds six months.

18. Fraud Detection Macro Automation

%macro fraud_detection;

data fraud_analysis;

set assessment_dates;

length Fraud_Level $20;

if Completion_Time < 15 and Avg_Score > 90 then Fraud_Level="High Risk";

else if Cheating_Flag="yes" then Fraud_Level="Medium Risk";

else if Completion_Time > 60 then Fraud_Level="Suspicious Delay";

else Fraud_Level="Normal";

run;

proc print data=fraud_analysis;

run;

%mend fraud_detection;


%fraud_detection;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_DateUtilization_ClassInvigilatorLocationSubject_UpperSubject_LowerSubject_CodeRounded_ScoreScore_DifferenceAdjusted_FeesNext_Assessment_DateMonths_Since_ExamExam_Start_DateDays_BetweenFraud_Level
1A001Mathematicseasyno85350.958550001JAN2025GoodRahulHyderabadMATHEMATICSmathematicsMathematics_A001850525239231423742427Normal
2A002Sciencemediumno78400.907860002JAN2025GoodAnitaDelhiSCIENCEscienceScience_A002780630239241423742426Normal
3A003Statisticshardyes92200.889270003JAN2025ExcellentKiranBangaloreSTATISTICSstatisticsStatistics_A003920735239251423742425Medium Risk
4A004Programmingmediumno66500.756680004JAN2025AverageMohanChennaiPROGRAMMINGprogrammingProgramming_A004660840239261423742424Normal
5A005Mathematicseasyyes100100.9611050005JAN2025ExcellentRituMumbaiMATHEMATICSmathematicsMathematics_A00510010525239271423742423High Risk
6A006Analyticsmediumno70450.917065006JAN2025AverageArjunPuneANALYTICSanalyticsAnalytics_A006700683239281423742422Normal
7A007Data_sciencehardno89100.878990007JAN2025GoodSureshHyderabadDATA_SCIENCEdata_scienceData_science_A007890945239291423742421Normal
8A008Sqlmediumno77300.937775008JAN2025GoodKavyaDelhiSQLsqlSql_A008770788239301423742420Normal
9A009Pythonhardno84250.928480009JAN2025GoodMeenaBangalorePYTHONpythonPython_A009840840239311423742419Normal
10A010Machine_learninghardyes91150.859195010JAN2025ExcellentRajeshChennaiMACHINE_LEARNINGmachine_learningMachine_learning_A010910998239321423742418Medium Risk
11A011Aimediumno88280.908890011JAN2025GoodSunilHyderabadAIaiAi_A011880945239331423742417Normal
12A012Deep_learninghardyes95220.8695100012JAN2025ExcellentPriyaMumbaiDEEP_LEARNINGdeep_learningDeep_learning_A0129501050239341423742416Medium Risk
13A013Statisticsmediumno82350.918270013JAN2025GoodPoojaDelhiSTATISTICSstatisticsStatistics_A013820735239351423742415Normal
14A014Businesseasyno74400.897460014JAN2025AverageAjayBangaloreBUSINESSbusinessBusiness_A014740630239361423742414Normal
15A015Financemediumno69480.886965015JAN2025AverageDeepakChennaiFINANCEfinanceFinance_A015690683239371423742413Normal

Macros automate repetitive tasks.

Advantages:

Macros automate repetitive tasks.

·  Reduce code duplication

·  Improve automation

·  Increase efficiency

·  Enable parameterized analysis

In large datasets containing millions of rows, macros help create automated fraud detection systems.

19. Risk Classification

data fraud_risk_classification;

set fraud_analysis;

length Risk_Category $20;

if Fraud_Level="High Risk" then Risk_Category="Critical";

else if Fraud_Level="Medium Risk" then Risk_Category="Warning";

else if Fraud_Level="Suspicious Delay" then Risk_Category="Review";

else Risk_Category="Safe";

run;

proc print data=fraud_risk_classification;

run;

OUTPUT:

ObsAssessment_IDSubjectDifficulty_LevelCheating_FlagAvg_ScoreCompletion_TimeReliability_IndexPercentageFeesExam_DateUtilization_ClassInvigilatorLocationSubject_UpperSubject_LowerSubject_CodeRounded_ScoreScore_DifferenceAdjusted_FeesNext_Assessment_DateMonths_Since_ExamExam_Start_DateDays_BetweenFraud_LevelRisk_Category
1A001Mathematicseasyno85350.958550001JAN2025GoodRahulHyderabadMATHEMATICSmathematicsMathematics_A001850525239231423742427NormalSafe
2A002Sciencemediumno78400.907860002JAN2025GoodAnitaDelhiSCIENCEscienceScience_A002780630239241423742426NormalSafe
3A003Statisticshardyes92200.889270003JAN2025ExcellentKiranBangaloreSTATISTICSstatisticsStatistics_A003920735239251423742425Medium RiskWarning
4A004Programmingmediumno66500.756680004JAN2025AverageMohanChennaiPROGRAMMINGprogrammingProgramming_A004660840239261423742424NormalSafe
5A005Mathematicseasyyes100100.9611050005JAN2025ExcellentRituMumbaiMATHEMATICSmathematicsMathematics_A00510010525239271423742423High RiskCritical
6A006Analyticsmediumno70450.917065006JAN2025AverageArjunPuneANALYTICSanalyticsAnalytics_A006700683239281423742422NormalSafe
7A007Data_sciencehardno89100.878990007JAN2025GoodSureshHyderabadDATA_SCIENCEdata_scienceData_science_A007890945239291423742421NormalSafe
8A008Sqlmediumno77300.937775008JAN2025GoodKavyaDelhiSQLsqlSql_A008770788239301423742420NormalSafe
9A009Pythonhardno84250.928480009JAN2025GoodMeenaBangalorePYTHONpythonPython_A009840840239311423742419NormalSafe
10A010Machine_learninghardyes91150.859195010JAN2025ExcellentRajeshChennaiMACHINE_LEARNINGmachine_learningMachine_learning_A010910998239321423742418Medium RiskWarning
11A011Aimediumno88280.908890011JAN2025GoodSunilHyderabadAIaiAi_A011880945239331423742417NormalSafe
12A012Deep_learninghardyes95220.8695100012JAN2025ExcellentPriyaMumbaiDEEP_LEARNINGdeep_learningDeep_learning_A0129501050239341423742416Medium RiskWarning
13A013Statisticsmediumno82350.918270013JAN2025GoodPoojaDelhiSTATISTICSstatisticsStatistics_A013820735239351423742415NormalSafe
14A014Businesseasyno74400.897460014JAN2025AverageAjayBangaloreBUSINESSbusinessBusiness_A014740630239361423742414NormalSafe
15A015Financemediumno69480.886965015JAN2025AverageDeepakChennaiFINANCEfinanceFinance_A015690683239371423742413NormalSafe

20. PROC TRANSPOSE

proc transpose data=fraud_risk_classification out=assessment_transpose;

var Avg_Score Percentage Completion_Time;

run;

proc print data=assessment_transpose;

run;

OUTPUT:

Obs_NAME_COL1COL2COL3COL4COL5COL6COL7COL8COL9COL10COL11COL12COL13COL14COL15
1Avg_Score8578926610070897784918895827469
2Percentage8578926611070897784918895827469
3Completion_Time354020501045103025152822354048

Explanation

PROC TRANSPOSE converts columns into rows.

Example:

Original

ID

Score

Percentage

A001

85

85

After transpose

Variable

Value

Score

85

Percentage

85

21. PROC DATASETS DELETE

proc datasets library=work;

delete assessment_combined

       assessment_numeric

       assessment_dates;

quit;

LOG:

NOTE: Deleting WORK.ASSESSMENT_COMBINED (memtype=DATA).
NOTE: Deleting WORK.ASSESSMENT_NUMERIC (memtype=DATA).
NOTE: Deleting WORK.ASSESSMENT_DATES (memtype=DATA).
Explanation

PROC DATASETS manages SAS libraries.

DELETE removes unwanted datasets.

Benefits:

· Reduces memory usage

· Keeps workspace organized

· Improves performance

Final Analytical Dataset

The final dataset now includes:

·  Clean subject names

·  Corrected scores

·  Fraud detection flags

·  Risk classification

·  Monitoring information

·  Date analytics

·  Fee adjustments

This dataset is ready for:

·  Reporting

·  Dashboard visualization

·  Fraud investigation

Best Practices For SAS

·  Always validate raw data

·  Identify missing values

·  Standardize text variables

·  Validate numeric ranges

·  Correct date formats

·  Use macros for automation

·  Document code clearly

·  Remove temporary datasets

Conclusion

This project demonstrates a SAS workflow for managing and analyzing online assessment data with intentional errors such as invalid scores, missing values, negative completion times, and inconsistent subject names. Using SAS programming techniques, these issues were detected and corrected through character functions (PROPCASE, STRIP, CATX), numeric functions (ABS, COALESCE), and date functions (MDY, INTCK, INTNX).

Dataset operations such as SET, MERGE, PROC APPEND, and PROC TRANSPOSE were used to combine and restructure data. A SAS macro was implemented to automate fraud detection based on suspicious patterns like high scores with very short completion times. The final dataset is clean, validated, and ready for reporting, analysis, and fraud monitoring in online assessment systems.


SAS INTERVIEW QUESTIONS

1. What is the RETAIN statement used for?

The RETAIN statement in SAS is used to hold or preserve the value of a variable from one iteration of the DATA step to the next. Normally, SAS resets variables to missing at the beginning of each iteration, but RETAIN prevents that behavior.

It is commonly used for:

·       Running totals

·       Carrying previous values forward

·       Creating cumulative calculations

Example:

data example;
set sales;
retain total_sales 0;
total_sales + amount;
run;

Explanation:

·       retain total_sales 0; initializes and keeps the value across rows.

·       total_sales + amount; keeps adding the amount for each observation.

·       This creates a cumulative total.

2. What is the difference between DROP and KEEP?

Both DROP and KEEP control which variables appear in the output dataset.

Feature

DROP

KEEP

Purpose

Removes variables from output dataset

Keeps only specified variables

Logic

Excludes listed variables

Includes only listed variables

Usage

When most variables are needed

When only a few variables are needed

Example using DROP:

data newdata;
set olddata;
drop temp_var;
run;

 temp_var will not appear in the output dataset.

Example using KEEP:

data newdata;
set olddata;
keep id name salary;
run;

Only id, name, and salary will appear in the dataset.

3. What are automatic variables in SAS?

Automatic variables are variables created automatically by SAS during DATA step execution. They are not written to the output dataset, but they are available for processing.

The two most common automatic variables are:

Variable

Purpose

_N_

Counts the number of times the DATA step executes

_ERROR_

Indicates if an error occurred (1 = error, 0 = no error)

Example:

data test;
set sales;
obs_number = _N_;
run;

Explanation:

·       _N_ increases by 1 for every observation processed.

·       It helps track row numbers or iterations.

Example output:

N

 obs_number

1

1

2

2

3

3

Key Points

·       Created automatically by SAS

·       Not saved in the dataset

·       Useful for debugging and tracking iterations

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent ONLINE EXAMS data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

·  Clinical SAS Programmer

·  Research Data Analyst

·  Regulatory Data Validator

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy





Comments

Popular posts from this blog

412.Can We Build And Clean A University Course Analytics & Fraud Detection System In Sas Using Only Macros While Intentionally Creating And Fixing Errors?

718.Can We Design, Debug, Detect Fraud, And Optimize A Smart Parking System Using Advanced SAS Programming Techniques?

416.Can We Design, Debug, Detect Fraud, and Build an ADaM-Ready Autonomous Drone Flight Analytics System Using Advanced SAS Programming Techniques?