413.Can We Build and Debug a Digital Identity Verification Fraud Detection System in SAS Using Advanced Data Engineering and Macros?
Digital Identity Verification Fraud Detection System Using SAS
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP | SET | MERGE | PROC SORT | PROC SQL | PROC APPEND | PROC TRANSPOSE | PROC DATASETS | DELETE | INPUT | DATALINES | LENGTH | FORMAT | IF-THEN-ELSE | ABS | ROUND | COALESCE | STRIP | TRIM | CAT | CATX | PROPCASE | UPCASE | LOWCASE | MDY | INTCK | INTNX | TODAY | PROC PRINT | RUN | QUIT | %MACRO | %MEND
Introduction
In
today’s digital economy, identity verification systems play a critical role in
banking, fintech, healthcare, e-commerce, and government services. Every time a
user uploads an Aadhaar card, passport, selfie, or biometric scan, a backend
system validates authenticity. However, digital identity systems are vulnerable
to fraud attempts such as spoofing, synthetic identity creation, and document
tampering.
In this
project, we will:
- Create a Digital Identity
Verification Dataset
- Introduce intentional
errors
- Detect and correct those
errors
- Apply macros for fraud
detection logic
- Use date functions (MDY,
INTCK, INTNX)
- Apply APPEND, TRANSPOSE,
SET, MERGE
- Use numeric and character
functions
- Demonstrate PROC DATASETS
DELETE
- Implement Utilization
Classification
- Apply Percentage
Calculations
Table Of Contents
- Business Context
- Dataset Structure
- Raw Dataset Creation With
Intentional Errors
- Error Identification
- Corrected Dataset Creation
- Date Handling Using MDY,
INTCK, INTNX
- Fraud Detection Macro
- Utilization Classification
Logic
- SET vs MERGE vs APPEND
- PROC TRANSPOSE
- Numeric Functions
- Character Functions
- COALESCE Usage
- PROC DATASETS DELETE
- Final Fraud Detection Output
- 15 Key Points About This
Project
- Benefits
- Summary
- Conclusion
Business Context
Imagine
you are working as a SAS Programmer in a fintech company in Hyderabad. The
company processes:
- Aadhaar eKYC
- Biometric verification
- Video KYC
- OTP-based authentication
Your role
is to:
- Identify high-risk
verification attempts
- Calculate fraud attempt
percentages
- Track processing delays
- Classify system utilization
- Correct dirty data
- Automate fraud detection
using macros
1. Raw Dataset Creation (With Intentional Errors)
title "Raw Digital Identity Verification Data With Errors";
data digital_identity_raw;
input Verification_ID $ Method:$12. Processing_Time Failure_Rate Fraud_Attempts
Confidence_Score Outcome $ Month $ Day Year;
format Verification_Date date9.;
Verification_Date = mdy(Month, Day, Year);
datalines;
V001 Aadhaar 35 0.02 1 95 Approved 01 15 2025
V002 biometric 50 0.10 3 88 Approved 02 10 2025
V003 OTP 20 . 0 90 Rejected 03 05 2025
V004 aadhaar 70 0.30 5 60 Rejected 13 20 2025
V005 VideoKYC -15 0.05 2 85 Approved 04 25 2025
V006 OTP 30 0.01 0 98 approved 05 11 2025
V007 Biometric 45 0.20 4 75 Rejected 06 18 2025
V008 Aadhaar 60 0.25 6 55 Rejected 07 21 2025
V009 VideoKYC 40 0.03 1 92 Approved 08 09 2025
V010 OTP 25 0.00 0 99 Approved 09 13 2025
V011 Aadhaar 90 0.50 8 40 Rejected 10 05 2025
V012 Biometric 55 0.15 3 80 Approved 11 22 2025
V013 OTP 15 0.02 0 96 Approved 12 02 2025
V014 Aadhaar 100 0.60 9 35 Rejected 01 17 2026
V015 VideoKYC 48 0.07 2 89 Approved 02 28 2026
;
run;
proc print data=digital_identity_raw;
run;
title;
OUTPUT:
| Obs | Verification_ID | Method | Processing_Time | Failure_Rate | Fraud_Attempts | Confidence_Score | Outcome | Month | Day | Year | Verification_Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | Aadhaar | 35 | 0.02 | 1 | 95 | Approved | 01 | 15 | 2025 | 15JAN2025 |
| 2 | V002 | biometric | 50 | 0.10 | 3 | 88 | Approved | 02 | 10 | 2025 | 10FEB2025 |
| 3 | V003 | OTP | 20 | . | 0 | 90 | Rejected | 03 | 5 | 2025 | 05MAR2025 |
| 4 | V004 | aadhaar | 70 | 0.30 | 5 | 60 | Rejected | 13 | 20 | 2025 | . |
| 5 | V005 | VideoKYC | -15 | 0.05 | 2 | 85 | Approved | 04 | 25 | 2025 | 25APR2025 |
| 6 | V006 | OTP | 30 | 0.01 | 0 | 98 | approved | 05 | 11 | 2025 | 11MAY2025 |
| 7 | V007 | Biometric | 45 | 0.20 | 4 | 75 | Rejected | 06 | 18 | 2025 | 18JUN2025 |
| 8 | V008 | Aadhaar | 60 | 0.25 | 6 | 55 | Rejected | 07 | 21 | 2025 | 21JUL2025 |
| 9 | V009 | VideoKYC | 40 | 0.03 | 1 | 92 | Approved | 08 | 9 | 2025 | 09AUG2025 |
| 10 | V010 | OTP | 25 | 0.00 | 0 | 99 | Approved | 09 | 13 | 2025 | 13SEP2025 |
| 11 | V011 | Aadhaar | 90 | 0.50 | 8 | 40 | Rejected | 10 | 5 | 2025 | 05OCT2025 |
| 12 | V012 | Biometric | 55 | 0.15 | 3 | 80 | Approved | 11 | 22 | 2025 | 22NOV2025 |
| 13 | V013 | OTP | 15 | 0.02 | 0 | 96 | Approved | 12 | 2 | 2025 | 02DEC2025 |
| 14 | V014 | Aadhaar | 100 | 0.60 | 9 | 35 | Rejected | 01 | 17 | 2026 | 17JAN2026 |
| 15 | V015 | VideoKYC | 48 | 0.07 | 2 | 89 | Approved | 02 | 28 | 2026 | 28FEB2026 |
❗ Intentional Errors
·
Month = 13 (Invalid date)
·
Negative Processing_Time
·
Missing Failure_Rate
·
Mixed case in Method
·
Mixed case in Outcome
·
Inconsistent naming: Aadhaar vs aadhaar
2. Corrected Dataset Creation
title "Corrected Digital Identity Dataset";
data digital_identity_clean;
set digital_identity_raw;
Method = propcase(strip(Method));
Outcome = upcase(strip(Outcome));
if Processing_Time < 0 then Processing_Time = abs(Processing_Time);
if Failure_Rate = . then Failure_Rate = 0;
if Month > 12 then Month = 12
Verification_Date = mdy(Month, Day, Year);
Fraud_Percentage = (Fraud_Attempts / 10) * 100;
run;
proc print data=digital_identity_clean;
run;
title;
OUTPUT:
| Obs | Verification_ID | Method | Processing_Time | Failure_Rate | Fraud_Attempts | Confidence_Score | Outcome | Month | Day | Year | Verification_Date | Fraud_Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | Aadhaar | 35 | 0.02 | 1 | 95 | APPROVED | 01 | 15 | 2025 | 15JAN2025 | 10 |
| 2 | V002 | Biometric | 50 | 0.10 | 3 | 88 | APPROVED | 02 | 10 | 2025 | 10FEB2025 | 30 |
| 3 | V003 | Otp | 20 | 0.00 | 0 | 90 | REJECTED | 03 | 5 | 2025 | 05MAR2025 | 0 |
| 4 | V004 | Aadhaar | 70 | 0.30 | 5 | 60 | REJECTED | 12 | 20 | 2025 | 20DEC2025 | 50 |
| 5 | V005 | Videokyc | 15 | 0.05 | 2 | 85 | APPROVED | 04 | 25 | 2025 | 25APR2025 | 20 |
| 6 | V006 | Otp | 30 | 0.01 | 0 | 98 | APPROVED | 05 | 11 | 2025 | 11MAY2025 | 0 |
| 7 | V007 | Biometric | 45 | 0.20 | 4 | 75 | REJECTED | 06 | 18 | 2025 | 18JUN2025 | 40 |
| 8 | V008 | Aadhaar | 60 | 0.25 | 6 | 55 | REJECTED | 07 | 21 | 2025 | 21JUL2025 | 60 |
| 9 | V009 | Videokyc | 40 | 0.03 | 1 | 92 | APPROVED | 08 | 9 | 2025 | 09AUG2025 | 10 |
| 10 | V010 | Otp | 25 | 0.00 | 0 | 99 | APPROVED | 09 | 13 | 2025 | 13SEP2025 | 0 |
| 11 | V011 | Aadhaar | 90 | 0.50 | 8 | 40 | REJECTED | 10 | 5 | 2025 | 05OCT2025 | 80 |
| 12 | V012 | Biometric | 55 | 0.15 | 3 | 80 | APPROVED | 11 | 22 | 2025 | 22NOV2025 | 30 |
| 13 | V013 | Otp | 15 | 0.02 | 0 | 96 | APPROVED | 12 | 2 | 2025 | 02DEC2025 | 0 |
| 14 | V014 | Aadhaar | 100 | 0.60 | 9 | 35 | REJECTED | 01 | 17 | 2026 | 17JAN2026 | 90 |
| 15 | V015 | Videokyc | 48 | 0.07 | 2 | 89 | APPROVED | 02 | 28 | 2026 | 28FEB2026 | 20 |
Code Explanation
SET
·
Reads dataset into new dataset
·
Used for transformation
PROPCASE
·
Converts Aadhaar → Aadhaar (proper case)
STRIP
·
Removes leading/trailing spaces
UPCASE
·
Standardizes Approved vs approved
ABS
·
Converts negative to positive
MDY
·
Creates SAS date value
3. Date Calculations Using INTCK and INTNX
data digital_identity_dates;
set digital_identity_clean;
Today_Date = today();
Days_Since_Verification = intck('day', Verification_Date, Today_Date);
Next_Review_Date = intnx('month', Verification_Date, 3, 'same');
format Today_Date Next_Review_Date date9.;
run;
proc print data=digital_identity_dates;
run;
OUTPUT:
| Obs | Verification_ID | Method | Processing_Time | Failure_Rate | Fraud_Attempts | Confidence_Score | Outcome | Month | Day | Year | Verification_Date | Fraud_Percentage | Today_Date | Days_Since_Verification | Next_Review_Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | Aadhaar | 35 | 0.02 | 1 | 95 | APPROVED | 01 | 15 | 2025 | 15JAN2025 | 10 | 27FEB2026 | 408 | 15APR2025 |
| 2 | V002 | Biometric | 50 | 0.10 | 3 | 88 | APPROVED | 02 | 10 | 2025 | 10FEB2025 | 30 | 27FEB2026 | 382 | 10MAY2025 |
| 3 | V003 | Otp | 20 | 0.00 | 0 | 90 | REJECTED | 03 | 5 | 2025 | 05MAR2025 | 0 | 27FEB2026 | 359 | 05JUN2025 |
| 4 | V004 | Aadhaar | 70 | 0.30 | 5 | 60 | REJECTED | 12 | 20 | 2025 | 20DEC2025 | 50 | 27FEB2026 | 69 | 20MAR2026 |
| 5 | V005 | Videokyc | 15 | 0.05 | 2 | 85 | APPROVED | 04 | 25 | 2025 | 25APR2025 | 20 | 27FEB2026 | 308 | 25JUL2025 |
| 6 | V006 | Otp | 30 | 0.01 | 0 | 98 | APPROVED | 05 | 11 | 2025 | 11MAY2025 | 0 | 27FEB2026 | 292 | 11AUG2025 |
| 7 | V007 | Biometric | 45 | 0.20 | 4 | 75 | REJECTED | 06 | 18 | 2025 | 18JUN2025 | 40 | 27FEB2026 | 254 | 18SEP2025 |
| 8 | V008 | Aadhaar | 60 | 0.25 | 6 | 55 | REJECTED | 07 | 21 | 2025 | 21JUL2025 | 60 | 27FEB2026 | 221 | 21OCT2025 |
| 9 | V009 | Videokyc | 40 | 0.03 | 1 | 92 | APPROVED | 08 | 9 | 2025 | 09AUG2025 | 10 | 27FEB2026 | 202 | 09NOV2025 |
| 10 | V010 | Otp | 25 | 0.00 | 0 | 99 | APPROVED | 09 | 13 | 2025 | 13SEP2025 | 0 | 27FEB2026 | 167 | 13DEC2025 |
| 11 | V011 | Aadhaar | 90 | 0.50 | 8 | 40 | REJECTED | 10 | 5 | 2025 | 05OCT2025 | 80 | 27FEB2026 | 145 | 05JAN2026 |
| 12 | V012 | Biometric | 55 | 0.15 | 3 | 80 | APPROVED | 11 | 22 | 2025 | 22NOV2025 | 30 | 27FEB2026 | 97 | 22FEB2026 |
| 13 | V013 | Otp | 15 | 0.02 | 0 | 96 | APPROVED | 12 | 2 | 2025 | 02DEC2025 | 0 | 27FEB2026 | 87 | 02MAR2026 |
| 14 | V014 | Aadhaar | 100 | 0.60 | 9 | 35 | REJECTED | 01 | 17 | 2026 | 17JAN2026 | 90 | 27FEB2026 | 41 | 17APR2026 |
| 15 | V015 | Videokyc | 48 | 0.07 | 2 | 89 | APPROVED | 02 | 28 | 2026 | 28FEB2026 | 20 | 27FEB2026 | -1 | 28MAY2026 |
INTCK
Calculates difference between two dates.
INTNX
Moves
date forward by interval.
4. Utilization Classification
data digital_identity_utilization;
set digital_identity_dates;
length Utilization_Class $12.;
if Processing_Time <= 30 then Utilization_Class="Fast";
else if Processing_Time <= 60 then Utilization_Class="Moderate";
else Utilization_Class="Slow";
run;
proc print data=digital_identity_utilization;
run;
OUTPUT:
| Obs | Verification_ID | Method | Processing_Time | Failure_Rate | Fraud_Attempts | Confidence_Score | Outcome | Month | Day | Year | Verification_Date | Fraud_Percentage | Today_Date | Days_Since_Verification | Next_Review_Date | Utilization_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | Aadhaar | 35 | 0.02 | 1 | 95 | APPROVED | 01 | 15 | 2025 | 15JAN2025 | 10 | 27FEB2026 | 408 | 15APR2025 | Moderate |
| 2 | V002 | Biometric | 50 | 0.10 | 3 | 88 | APPROVED | 02 | 10 | 2025 | 10FEB2025 | 30 | 27FEB2026 | 382 | 10MAY2025 | Moderate |
| 3 | V003 | Otp | 20 | 0.00 | 0 | 90 | REJECTED | 03 | 5 | 2025 | 05MAR2025 | 0 | 27FEB2026 | 359 | 05JUN2025 | Fast |
| 4 | V004 | Aadhaar | 70 | 0.30 | 5 | 60 | REJECTED | 12 | 20 | 2025 | 20DEC2025 | 50 | 27FEB2026 | 69 | 20MAR2026 | Slow |
| 5 | V005 | Videokyc | 15 | 0.05 | 2 | 85 | APPROVED | 04 | 25 | 2025 | 25APR2025 | 20 | 27FEB2026 | 308 | 25JUL2025 | Fast |
| 6 | V006 | Otp | 30 | 0.01 | 0 | 98 | APPROVED | 05 | 11 | 2025 | 11MAY2025 | 0 | 27FEB2026 | 292 | 11AUG2025 | Fast |
| 7 | V007 | Biometric | 45 | 0.20 | 4 | 75 | REJECTED | 06 | 18 | 2025 | 18JUN2025 | 40 | 27FEB2026 | 254 | 18SEP2025 | Moderate |
| 8 | V008 | Aadhaar | 60 | 0.25 | 6 | 55 | REJECTED | 07 | 21 | 2025 | 21JUL2025 | 60 | 27FEB2026 | 221 | 21OCT2025 | Moderate |
| 9 | V009 | Videokyc | 40 | 0.03 | 1 | 92 | APPROVED | 08 | 9 | 2025 | 09AUG2025 | 10 | 27FEB2026 | 202 | 09NOV2025 | Moderate |
| 10 | V010 | Otp | 25 | 0.00 | 0 | 99 | APPROVED | 09 | 13 | 2025 | 13SEP2025 | 0 | 27FEB2026 | 167 | 13DEC2025 | Fast |
| 11 | V011 | Aadhaar | 90 | 0.50 | 8 | 40 | REJECTED | 10 | 5 | 2025 | 05OCT2025 | 80 | 27FEB2026 | 145 | 05JAN2026 | Slow |
| 12 | V012 | Biometric | 55 | 0.15 | 3 | 80 | APPROVED | 11 | 22 | 2025 | 22NOV2025 | 30 | 27FEB2026 | 97 | 22FEB2026 | Moderate |
| 13 | V013 | Otp | 15 | 0.02 | 0 | 96 | APPROVED | 12 | 2 | 2025 | 02DEC2025 | 0 | 27FEB2026 | 87 | 02MAR2026 | Fast |
| 14 | V014 | Aadhaar | 100 | 0.60 | 9 | 35 | REJECTED | 01 | 17 | 2026 | 17JAN2026 | 90 | 27FEB2026 | 41 | 17APR2026 | Slow |
| 15 | V015 | Videokyc | 48 | 0.07 | 2 | 89 | APPROVED | 02 | 28 | 2026 | 28FEB2026 | 20 | 27FEB2026 | -1 | 28MAY2026 | Moderate |
5. Fraud Detection Macro
%macro fraud_logic(input=, output=);
data &output;
set &input;
if Fraud_Attempts >= 5 and Confidence_Score < 70 then Fraud_Flag=1;
else if Failure_Rate > 0.25 then Fraud_Flag=1;
else Fraud_Flag=0;
run;
proc print data=&output;
run;
%mend;
%fraud_logic(input=digital_identity_utilization,
output=digital_identity_fraud);
OUTPUT:
| Obs | Verification_ID | Method | Processing_Time | Failure_Rate | Fraud_Attempts | Confidence_Score | Outcome | Month | Day | Year | Verification_Date | Fraud_Percentage | Today_Date | Days_Since_Verification | Next_Review_Date | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | Aadhaar | 35 | 0.02 | 1 | 95 | APPROVED | 01 | 15 | 2025 | 15JAN2025 | 10 | 27FEB2026 | 408 | 15APR2025 | Moderate | 0 |
| 2 | V002 | Biometric | 50 | 0.10 | 3 | 88 | APPROVED | 02 | 10 | 2025 | 10FEB2025 | 30 | 27FEB2026 | 382 | 10MAY2025 | Moderate | 0 |
| 3 | V003 | Otp | 20 | 0.00 | 0 | 90 | REJECTED | 03 | 5 | 2025 | 05MAR2025 | 0 | 27FEB2026 | 359 | 05JUN2025 | Fast | 0 |
| 4 | V004 | Aadhaar | 70 | 0.30 | 5 | 60 | REJECTED | 12 | 20 | 2025 | 20DEC2025 | 50 | 27FEB2026 | 69 | 20MAR2026 | Slow | 1 |
| 5 | V005 | Videokyc | 15 | 0.05 | 2 | 85 | APPROVED | 04 | 25 | 2025 | 25APR2025 | 20 | 27FEB2026 | 308 | 25JUL2025 | Fast | 0 |
| 6 | V006 | Otp | 30 | 0.01 | 0 | 98 | APPROVED | 05 | 11 | 2025 | 11MAY2025 | 0 | 27FEB2026 | 292 | 11AUG2025 | Fast | 0 |
| 7 | V007 | Biometric | 45 | 0.20 | 4 | 75 | REJECTED | 06 | 18 | 2025 | 18JUN2025 | 40 | 27FEB2026 | 254 | 18SEP2025 | Moderate | 0 |
| 8 | V008 | Aadhaar | 60 | 0.25 | 6 | 55 | REJECTED | 07 | 21 | 2025 | 21JUL2025 | 60 | 27FEB2026 | 221 | 21OCT2025 | Moderate | 1 |
| 9 | V009 | Videokyc | 40 | 0.03 | 1 | 92 | APPROVED | 08 | 9 | 2025 | 09AUG2025 | 10 | 27FEB2026 | 202 | 09NOV2025 | Moderate | 0 |
| 10 | V010 | Otp | 25 | 0.00 | 0 | 99 | APPROVED | 09 | 13 | 2025 | 13SEP2025 | 0 | 27FEB2026 | 167 | 13DEC2025 | Fast | 0 |
| 11 | V011 | Aadhaar | 90 | 0.50 | 8 | 40 | REJECTED | 10 | 5 | 2025 | 05OCT2025 | 80 | 27FEB2026 | 145 | 05JAN2026 | Slow | 1 |
| 12 | V012 | Biometric | 55 | 0.15 | 3 | 80 | APPROVED | 11 | 22 | 2025 | 22NOV2025 | 30 | 27FEB2026 | 97 | 22FEB2026 | Moderate | 0 |
| 13 | V013 | Otp | 15 | 0.02 | 0 | 96 | APPROVED | 12 | 2 | 2025 | 02DEC2025 | 0 | 27FEB2026 | 87 | 02MAR2026 | Fast | 0 |
| 14 | V014 | Aadhaar | 100 | 0.60 | 9 | 35 | REJECTED | 01 | 17 | 2026 | 17JAN2026 | 90 | 27FEB2026 | 41 | 17APR2026 | Slow | 1 |
| 15 | V015 | Videokyc | 48 | 0.07 | 2 | 89 | APPROVED | 02 | 28 | 2026 | 28FEB2026 | 20 | 27FEB2026 | -1 | 28MAY2026 | Moderate | 0 |
·
Reusable
·
Parameter driven
·
Scalable for multiple datasets
6. Using MERGE
data method_risk;
input Method:$12. Risk_Level $;
datalines;
Aadhaar Medium
Biometric High
Otp Low
Videokyc Medium
;
run;
proc print data=method_risk;
run;
OUTPUT:
| Obs | Method | Risk_Level |
|---|---|---|
| 1 | Aadhaar | Medium |
| 2 | Biometric | High |
| 3 | Otp | Low |
| 4 | Videokyc | Medium |
proc sort data=digital_identity_fraud;
by Method;
run;
proc print data=digital_identity_fraud;
run;
OUTPUT:
| Obs | Verification_ID | Method | Processing_Time | Failure_Rate | Fraud_Attempts | Confidence_Score | Outcome | Month | Day | Year | Verification_Date | Fraud_Percentage | Today_Date | Days_Since_Verification | Next_Review_Date | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | Aadhaar | 35 | 0.02 | 1 | 95 | APPROVED | 01 | 15 | 2025 | 15JAN2025 | 10 | 27FEB2026 | 408 | 15APR2025 | Moderate | 0 |
| 2 | V004 | Aadhaar | 70 | 0.30 | 5 | 60 | REJECTED | 12 | 20 | 2025 | 20DEC2025 | 50 | 27FEB2026 | 69 | 20MAR2026 | Slow | 1 |
| 3 | V008 | Aadhaar | 60 | 0.25 | 6 | 55 | REJECTED | 07 | 21 | 2025 | 21JUL2025 | 60 | 27FEB2026 | 221 | 21OCT2025 | Moderate | 1 |
| 4 | V011 | Aadhaar | 90 | 0.50 | 8 | 40 | REJECTED | 10 | 5 | 2025 | 05OCT2025 | 80 | 27FEB2026 | 145 | 05JAN2026 | Slow | 1 |
| 5 | V014 | Aadhaar | 100 | 0.60 | 9 | 35 | REJECTED | 01 | 17 | 2026 | 17JAN2026 | 90 | 27FEB2026 | 41 | 17APR2026 | Slow | 1 |
| 6 | V002 | Biometric | 50 | 0.10 | 3 | 88 | APPROVED | 02 | 10 | 2025 | 10FEB2025 | 30 | 27FEB2026 | 382 | 10MAY2025 | Moderate | 0 |
| 7 | V007 | Biometric | 45 | 0.20 | 4 | 75 | REJECTED | 06 | 18 | 2025 | 18JUN2025 | 40 | 27FEB2026 | 254 | 18SEP2025 | Moderate | 0 |
| 8 | V012 | Biometric | 55 | 0.15 | 3 | 80 | APPROVED | 11 | 22 | 2025 | 22NOV2025 | 30 | 27FEB2026 | 97 | 22FEB2026 | Moderate | 0 |
| 9 | V003 | Otp | 20 | 0.00 | 0 | 90 | REJECTED | 03 | 5 | 2025 | 05MAR2025 | 0 | 27FEB2026 | 359 | 05JUN2025 | Fast | 0 |
| 10 | V006 | Otp | 30 | 0.01 | 0 | 98 | APPROVED | 05 | 11 | 2025 | 11MAY2025 | 0 | 27FEB2026 | 292 | 11AUG2025 | Fast | 0 |
| 11 | V010 | Otp | 25 | 0.00 | 0 | 99 | APPROVED | 09 | 13 | 2025 | 13SEP2025 | 0 | 27FEB2026 | 167 | 13DEC2025 | Fast | 0 |
| 12 | V013 | Otp | 15 | 0.02 | 0 | 96 | APPROVED | 12 | 2 | 2025 | 02DEC2025 | 0 | 27FEB2026 | 87 | 02MAR2026 | Fast | 0 |
| 13 | V005 | Videokyc | 15 | 0.05 | 2 | 85 | APPROVED | 04 | 25 | 2025 | 25APR2025 | 20 | 27FEB2026 | 308 | 25JUL2025 | Fast | 0 |
| 14 | V009 | Videokyc | 40 | 0.03 | 1 | 92 | APPROVED | 08 | 9 | 2025 | 09AUG2025 | 10 | 27FEB2026 | 202 | 09NOV2025 | Moderate | 0 |
| 15 | V015 | Videokyc | 48 | 0.07 | 2 | 89 | APPROVED | 02 | 28 | 2026 | 28FEB2026 | 20 | 27FEB2026 | -1 | 28MAY2026 | Moderate | 0 |
proc sort data=method_risk;
by Method;
run;
proc print data=method_risk;
run;
OUTPUT:
| Obs | Method | Risk_Level |
|---|---|---|
| 1 | Aadhaar | Medium |
| 2 | Biometric | High |
| 3 | Otp | Low |
| 4 | Videokyc | Medium |
data digital_identity_merged;
merge digital_identity_fraud method_risk;
by Method;
run;
proc print data=digital_identity_merged;
run;
OUTPUT:
| Obs | Verification_ID | Method | Processing_Time | Failure_Rate | Fraud_Attempts | Confidence_Score | Outcome | Month | Day | Year | Verification_Date | Fraud_Percentage | Today_Date | Days_Since_Verification | Next_Review_Date | Utilization_Class | Fraud_Flag | Risk_Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | Aadhaar | 35 | 0.02 | 1 | 95 | APPROVED | 01 | 15 | 2025 | 15JAN2025 | 10 | 27FEB2026 | 408 | 15APR2025 | Moderate | 0 | Medium |
| 2 | V004 | Aadhaar | 70 | 0.30 | 5 | 60 | REJECTED | 12 | 20 | 2025 | 20DEC2025 | 50 | 27FEB2026 | 69 | 20MAR2026 | Slow | 1 | Medium |
| 3 | V008 | Aadhaar | 60 | 0.25 | 6 | 55 | REJECTED | 07 | 21 | 2025 | 21JUL2025 | 60 | 27FEB2026 | 221 | 21OCT2025 | Moderate | 1 | Medium |
| 4 | V011 | Aadhaar | 90 | 0.50 | 8 | 40 | REJECTED | 10 | 5 | 2025 | 05OCT2025 | 80 | 27FEB2026 | 145 | 05JAN2026 | Slow | 1 | Medium |
| 5 | V014 | Aadhaar | 100 | 0.60 | 9 | 35 | REJECTED | 01 | 17 | 2026 | 17JAN2026 | 90 | 27FEB2026 | 41 | 17APR2026 | Slow | 1 | Medium |
| 6 | Biometric | . | . | . | . | . | . | . | . | . | . | . | . | High | ||||
| 7 | V002 | Biometric | 50 | 0.10 | 3 | 88 | APPROVED | 02 | 10 | 2025 | 10FEB2025 | 30 | 27FEB2026 | 382 | 10MAY2025 | Moderate | 0 | |
| 8 | V007 | Biometric | 45 | 0.20 | 4 | 75 | REJECTED | 06 | 18 | 2025 | 18JUN2025 | 40 | 27FEB2026 | 254 | 18SEP2025 | Moderate | 0 | |
| 9 | V012 | Biometric | 55 | 0.15 | 3 | 80 | APPROVED | 11 | 22 | 2025 | 22NOV2025 | 30 | 27FEB2026 | 97 | 22FEB2026 | Moderate | 0 | |
| 10 | V003 | Otp | 20 | 0.00 | 0 | 90 | REJECTED | 03 | 5 | 2025 | 05MAR2025 | 0 | 27FEB2026 | 359 | 05JUN2025 | Fast | 0 | Low |
| 11 | V006 | Otp | 30 | 0.01 | 0 | 98 | APPROVED | 05 | 11 | 2025 | 11MAY2025 | 0 | 27FEB2026 | 292 | 11AUG2025 | Fast | 0 | Low |
| 12 | V010 | Otp | 25 | 0.00 | 0 | 99 | APPROVED | 09 | 13 | 2025 | 13SEP2025 | 0 | 27FEB2026 | 167 | 13DEC2025 | Fast | 0 | Low |
| 13 | V013 | Otp | 15 | 0.02 | 0 | 96 | APPROVED | 12 | 2 | 2025 | 02DEC2025 | 0 | 27FEB2026 | 87 | 02MAR2026 | Fast | 0 | Low |
| 14 | V005 | Videokyc | 15 | 0.05 | 2 | 85 | APPROVED | 04 | 25 | 2025 | 25APR2025 | 20 | 27FEB2026 | 308 | 25JUL2025 | Fast | 0 | Medium |
| 15 | V009 | Videokyc | 40 | 0.03 | 1 | 92 | APPROVED | 08 | 9 | 2025 | 09AUG2025 | 10 | 27FEB2026 | 202 | 09NOV2025 | Moderate | 0 | Medium |
| 16 | V015 | Videokyc | 48 | 0.07 | 2 | 89 | APPROVED | 02 | 28 | 2026 | 28FEB2026 | 20 | 27FEB2026 | -1 | 28MAY2026 | Moderate | 0 | Medium |
MERGE
Combines datasets horizontally.
7. Using Same Datasets For APPEND
proc append base=digital_identity_merged
data=digital_identity_merged force;
run;
proc print data=digital_identity_merged;
run;
OUTPUT:
| Obs | Verification_ID | Method | Processing_Time | Failure_Rate | Fraud_Attempts | Confidence_Score | Outcome | Month | Day | Year | Verification_Date | Fraud_Percentage | Today_Date | Days_Since_Verification | Next_Review_Date | Utilization_Class | Fraud_Flag | Risk_Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | Aadhaar | 35 | 0.02 | 1 | 95 | APPROVED | 01 | 15 | 2025 | 15JAN2025 | 10 | 27FEB2026 | 408 | 15APR2025 | Moderate | 0 | Medium |
| 2 | V004 | Aadhaar | 70 | 0.30 | 5 | 60 | REJECTED | 12 | 20 | 2025 | 20DEC2025 | 50 | 27FEB2026 | 69 | 20MAR2026 | Slow | 1 | Medium |
| 3 | V008 | Aadhaar | 60 | 0.25 | 6 | 55 | REJECTED | 07 | 21 | 2025 | 21JUL2025 | 60 | 27FEB2026 | 221 | 21OCT2025 | Moderate | 1 | Medium |
| 4 | V011 | Aadhaar | 90 | 0.50 | 8 | 40 | REJECTED | 10 | 5 | 2025 | 05OCT2025 | 80 | 27FEB2026 | 145 | 05JAN2026 | Slow | 1 | Medium |
| 5 | V014 | Aadhaar | 100 | 0.60 | 9 | 35 | REJECTED | 01 | 17 | 2026 | 17JAN2026 | 90 | 27FEB2026 | 41 | 17APR2026 | Slow | 1 | Medium |
| 6 | Biometri | . | . | . | . | . | . | . | . | . | . | . | . | High | ||||
| 7 | V002 | Biometric | 50 | 0.10 | 3 | 88 | APPROVED | 02 | 10 | 2025 | 10FEB2025 | 30 | 27FEB2026 | 382 | 10MAY2025 | Moderate | 0 | |
| 8 | V007 | Biometric | 45 | 0.20 | 4 | 75 | REJECTED | 06 | 18 | 2025 | 18JUN2025 | 40 | 27FEB2026 | 254 | 18SEP2025 | Moderate | 0 | |
| 9 | V012 | Biometric | 55 | 0.15 | 3 | 80 | APPROVED | 11 | 22 | 2025 | 22NOV2025 | 30 | 27FEB2026 | 97 | 22FEB2026 | Moderate | 0 | |
| 10 | V003 | Otp | 20 | 0.00 | 0 | 90 | REJECTED | 03 | 5 | 2025 | 05MAR2025 | 0 | 27FEB2026 | 359 | 05JUN2025 | Fast | 0 | Low |
| 11 | V006 | Otp | 30 | 0.01 | 0 | 98 | APPROVED | 05 | 11 | 2025 | 11MAY2025 | 0 | 27FEB2026 | 292 | 11AUG2025 | Fast | 0 | Low |
| 12 | V010 | Otp | 25 | 0.00 | 0 | 99 | APPROVED | 09 | 13 | 2025 | 13SEP2025 | 0 | 27FEB2026 | 167 | 13DEC2025 | Fast | 0 | Low |
| 13 | V013 | Otp | 15 | 0.02 | 0 | 96 | APPROVED | 12 | 2 | 2025 | 02DEC2025 | 0 | 27FEB2026 | 87 | 02MAR2026 | Fast | 0 | Low |
| 14 | V005 | Videokyc | 15 | 0.05 | 2 | 85 | APPROVED | 04 | 25 | 2025 | 25APR2025 | 20 | 27FEB2026 | 308 | 25JUL2025 | Fast | 0 | Medium |
| 15 | V009 | Videokyc | 40 | 0.03 | 1 | 92 | APPROVED | 08 | 9 | 2025 | 09AUG2025 | 10 | 27FEB2026 | 202 | 09NOV2025 | Moderate | 0 | Medium |
| 16 | V015 | Videokyc | 48 | 0.07 | 2 | 89 | APPROVED | 02 | 28 | 2026 | 28FEB2026 | 20 | 27FEB2026 | -1 | 28MAY2026 | Moderate | 0 | Medium |
| 17 | V001 | Aadhaar | 35 | 0.02 | 1 | 95 | APPROVED | 01 | 15 | 2025 | 15JAN2025 | 10 | 27FEB2026 | 408 | 15APR2025 | Moderate | 0 | Medium |
| 18 | V004 | Aadhaar | 70 | 0.30 | 5 | 60 | REJECTED | 12 | 20 | 2025 | 20DEC2025 | 50 | 27FEB2026 | 69 | 20MAR2026 | Slow | 1 | Medium |
| 19 | V008 | Aadhaar | 60 | 0.25 | 6 | 55 | REJECTED | 07 | 21 | 2025 | 21JUL2025 | 60 | 27FEB2026 | 221 | 21OCT2025 | Moderate | 1 | Medium |
| 20 | V011 | Aadhaar | 90 | 0.50 | 8 | 40 | REJECTED | 10 | 5 | 2025 | 05OCT2025 | 80 | 27FEB2026 | 145 | 05JAN2026 | Slow | 1 | Medium |
| 21 | V014 | Aadhaar | 100 | 0.60 | 9 | 35 | REJECTED | 01 | 17 | 2026 | 17JAN2026 | 90 | 27FEB2026 | 41 | 17APR2026 | Slow | 1 | Medium |
| 22 | Biometri | . | . | . | . | . | . | . | . | . | . | . | . | High | ||||
| 23 | V002 | Biometric | 50 | 0.10 | 3 | 88 | APPROVED | 02 | 10 | 2025 | 10FEB2025 | 30 | 27FEB2026 | 382 | 10MAY2025 | Moderate | 0 | |
| 24 | V007 | Biometric | 45 | 0.20 | 4 | 75 | REJECTED | 06 | 18 | 2025 | 18JUN2025 | 40 | 27FEB2026 | 254 | 18SEP2025 | Moderate | 0 | |
| 25 | V012 | Biometric | 55 | 0.15 | 3 | 80 | APPROVED | 11 | 22 | 2025 | 22NOV2025 | 30 | 27FEB2026 | 97 | 22FEB2026 | Moderate | 0 | |
| 26 | V003 | Otp | 20 | 0.00 | 0 | 90 | REJECTED | 03 | 5 | 2025 | 05MAR2025 | 0 | 27FEB2026 | 359 | 05JUN2025 | Fast | 0 | Low |
| 27 | V006 | Otp | 30 | 0.01 | 0 | 98 | APPROVED | 05 | 11 | 2025 | 11MAY2025 | 0 | 27FEB2026 | 292 | 11AUG2025 | Fast | 0 | Low |
| 28 | V010 | Otp | 25 | 0.00 | 0 | 99 | APPROVED | 09 | 13 | 2025 | 13SEP2025 | 0 | 27FEB2026 | 167 | 13DEC2025 | Fast | 0 | Low |
| 29 | V013 | Otp | 15 | 0.02 | 0 | 96 | APPROVED | 12 | 2 | 2025 | 02DEC2025 | 0 | 27FEB2026 | 87 | 02MAR2026 | Fast | 0 | Low |
| 30 | V005 | Videokyc | 15 | 0.05 | 2 | 85 | APPROVED | 04 | 25 | 2025 | 25APR2025 | 20 | 27FEB2026 | 308 | 25JUL2025 | Fast | 0 | Medium |
| 31 | V009 | Videokyc | 40 | 0.03 | 1 | 92 | APPROVED | 08 | 9 | 2025 | 09AUG2025 | 10 | 27FEB2026 | 202 | 09NOV2025 | Moderate | 0 | Medium |
| 32 | V015 | Videokyc | 48 | 0.07 | 2 | 89 | APPROVED | 02 | 28 | 2026 | 28FEB2026 | 20 | 27FEB2026 | -1 | 28MAY2026 | Moderate | 0 | Medium |
·
You are appending the dataset to itself
·
It will duplicate all observations
·
It is logically incorrect in a real project
·
It may cause unintended data inflation
8. Append Different Datasets
Step 1: Create New Dataset
title "New Incoming Digital Identity Records";
data digital_identity_new;
input Verification_ID $ Method:$12. Processing_Time Failure_Rate
Fraud_Attempts Confidence_Score Outcome $ Month Day Year;
format Verification_Date date9.;
Verification_Date = mdy(Month, Day, Year);
datalines;
V016 Aadhaar 42 0.04 1 93 Approved 03 10 2026
V017 Biometric 75 0.35 6 58 Rejected 04 15 2026
V018 OTP 28 0.01 0 97 Approved 05 21 2026
;
run;
proc print data=digital_identity_new;
run;
title;
OUTPUT:
| Obs | Verification_ID | Method | Processing_Time | Failure_Rate | Fraud_Attempts | Confidence_Score | Outcome | Month | Day | Year | Verification_Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V016 | Aadhaar | 42 | 0.04 | 1 | 93 | Approved | 3 | 10 | 2026 | 10MAR2026 |
| 2 | V017 | Biometric | 75 | 0.35 | 6 | 58 | Rejected | 4 | 15 | 2026 | 15APR2026 |
| 3 | V018 | OTP | 28 | 0.01 | 0 | 97 | Approved | 5 | 21 | 2026 | 21MAY2026 |
Step 2: Standardize Structure Before Append
data digital_identity_new_clean;
set digital_identity_new;
Method = propcase(strip(Method));
Outcome = upcase(strip(Outcome));
if Processing_Time < 0 then Processing_Time = abs(Processing_Time);
if Failure_Rate = . then Failure_Rate = 0;
Fraud_Percentage = (Fraud_Attempts/10)*100;
run;
proc print data=digital_identity_new_clean;
run;
OUTPUT:
| Obs | Verification_ID | Method | Processing_Time | Failure_Rate | Fraud_Attempts | Confidence_Score | Outcome | Month | Day | Year | Verification_Date | Fraud_Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V016 | Aadhaar | 42 | 0.04 | 1 | 93 | APPROVED | 3 | 10 | 2026 | 10MAR2026 | 10 |
| 2 | V017 | Biometric | 75 | 0.35 | 6 | 58 | REJECTED | 4 | 15 | 2026 | 15APR2026 | 60 |
| 3 | V018 | Otp | 28 | 0.01 | 0 | 97 | APPROVED | 5 | 21 | 2026 | 21MAY2026 | 0 |
· Before APPEND, variable structure must match.
Step 3: Correct PROC APPEND Usage
title "Appending New Records To Main Dataset";
proc append base=digital_identity_merged
data=digital_identity_new_clean
force;
run;
proc print data=digital_identity_merged;
run;
OUTPUT:
| Obs | Verification_ID | Method | Processing_Time | Failure_Rate | Fraud_Attempts | Confidence_Score | Outcome | Month | Day | Year | Verification_Date | Fraud_Percentage | Today_Date | Days_Since_Verification | Next_Review_Date | Utilization_Class | Fraud_Flag | Risk_Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | Aadhaar | 35 | 0.02 | 1 | 95 | APPROVED | 01 | 15 | 2025 | 15JAN2025 | 10 | 27FEB2026 | 408 | 15APR2025 | Moderate | 0 | Medium |
| 2 | V004 | Aadhaar | 70 | 0.30 | 5 | 60 | REJECTED | 12 | 20 | 2025 | 20DEC2025 | 50 | 27FEB2026 | 69 | 20MAR2026 | Slow | 1 | Medium |
| 3 | V008 | Aadhaar | 60 | 0.25 | 6 | 55 | REJECTED | 07 | 21 | 2025 | 21JUL2025 | 60 | 27FEB2026 | 221 | 21OCT2025 | Moderate | 1 | Medium |
| 4 | V011 | Aadhaar | 90 | 0.50 | 8 | 40 | REJECTED | 10 | 5 | 2025 | 05OCT2025 | 80 | 27FEB2026 | 145 | 05JAN2026 | Slow | 1 | Medium |
| 5 | V014 | Aadhaar | 100 | 0.60 | 9 | 35 | REJECTED | 01 | 17 | 2026 | 17JAN2026 | 90 | 27FEB2026 | 41 | 17APR2026 | Slow | 1 | Medium |
| 6 | V002 | Biometric | 50 | 0.10 | 3 | 88 | APPROVED | 02 | 10 | 2025 | 10FEB2025 | 30 | 27FEB2026 | 382 | 10MAY2025 | Moderate | 0 | High |
| 7 | V007 | Biometric | 45 | 0.20 | 4 | 75 | REJECTED | 06 | 18 | 2025 | 18JUN2025 | 40 | 27FEB2026 | 254 | 18SEP2025 | Moderate | 0 | High |
| 8 | V012 | Biometric | 55 | 0.15 | 3 | 80 | APPROVED | 11 | 22 | 2025 | 22NOV2025 | 30 | 27FEB2026 | 97 | 22FEB2026 | Moderate | 0 | High |
| 9 | V003 | Otp | 20 | 0.00 | 0 | 90 | REJECTED | 03 | 5 | 2025 | 05MAR2025 | 0 | 27FEB2026 | 359 | 05JUN2025 | Fast | 0 | Low |
| 10 | V006 | Otp | 30 | 0.01 | 0 | 98 | APPROVED | 05 | 11 | 2025 | 11MAY2025 | 0 | 27FEB2026 | 292 | 11AUG2025 | Fast | 0 | Low |
| 11 | V010 | Otp | 25 | 0.00 | 0 | 99 | APPROVED | 09 | 13 | 2025 | 13SEP2025 | 0 | 27FEB2026 | 167 | 13DEC2025 | Fast | 0 | Low |
| 12 | V013 | Otp | 15 | 0.02 | 0 | 96 | APPROVED | 12 | 2 | 2025 | 02DEC2025 | 0 | 27FEB2026 | 87 | 02MAR2026 | Fast | 0 | Low |
| 13 | V005 | Videokyc | 15 | 0.05 | 2 | 85 | APPROVED | 04 | 25 | 2025 | 25APR2025 | 20 | 27FEB2026 | 308 | 25JUL2025 | Fast | 0 | Medium |
| 14 | V009 | Videokyc | 40 | 0.03 | 1 | 92 | APPROVED | 08 | 9 | 2025 | 09AUG2025 | 10 | 27FEB2026 | 202 | 09NOV2025 | Moderate | 0 | Medium |
| 15 | V015 | Videokyc | 48 | 0.07 | 2 | 89 | APPROVED | 02 | 28 | 2026 | 28FEB2026 | 20 | 27FEB2026 | -1 | 28MAY2026 | Moderate | 0 | Medium |
| 16 | V016 | Aadhaar | 42 | 0.04 | 1 | 93 | APPROVED | 10 | 2026 | 10MAR2026 | 10 | . | . | . | . | |||
| 17 | V017 | Biometric | 75 | 0.35 | 6 | 58 | REJECTED | 15 | 2026 | 15APR2026 | 60 | . | . | . | . | |||
| 18 | V018 | Otp | 28 | 0.01 | 0 | 97 | APPROVED | 21 | 2026 | 21MAY2026 | 0 | . | . | . | . |
BASE
·
Target dataset
·
Existing master dataset
DATA
· New dataset to be added
FORCE
·
Allows append even if:
·
Variable lengths differ
·
Minor attribute mismatch
·
Should be used carefully in production
9. TRANSPOSE
proc transpose data=digital_identity_merged out=transposed_data;
var Processing_Time Fraud_Attempts Confidence_Score;
run;
proc print data=transposed_data;
run;
OUTPUT:
| Obs | _NAME_ | COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 | COL8 | COL9 | COL10 | COL11 | COL12 | COL13 | COL14 | COL15 | COL16 | COL17 | COL18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Processing_Time | 35 | 70 | 60 | 90 | 100 | 50 | 45 | 55 | 20 | 30 | 25 | 15 | 15 | 40 | 48 | 42 | 75 | 28 |
| 2 | Fraud_Attempts | 1 | 5 | 6 | 8 | 9 | 3 | 4 | 3 | 0 | 0 | 0 | 0 | 2 | 1 | 2 | 1 | 6 | 0 |
| 3 | Confidence_Score | 95 | 60 | 55 | 40 | 35 | 88 | 75 | 80 | 90 | 98 | 99 | 96 | 85 | 92 | 89 | 93 | 58 | 97 |
10. Numeric Functions Used
data numeric_example;
set digital_identity_merged;
Rounded_Score = round(Confidence_Score);
Total_Risk = sum(Fraud_Attempts, Failure_Rate*100);
run;
proc print data=numeric_example;
run;
OUTPUT:
| Obs | Verification_ID | Method | Processing_Time | Failure_Rate | Fraud_Attempts | Confidence_Score | Outcome | Month | Day | Year | Verification_Date | Fraud_Percentage | Today_Date | Days_Since_Verification | Next_Review_Date | Utilization_Class | Fraud_Flag | Risk_Level | Rounded_Score | Total_Risk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | Aadhaar | 35 | 0.02 | 1 | 95 | APPROVED | 01 | 15 | 2025 | 15JAN2025 | 10 | 27FEB2026 | 408 | 15APR2025 | Moderate | 0 | Medium | 95 | 3 |
| 2 | V004 | Aadhaar | 70 | 0.30 | 5 | 60 | REJECTED | 12 | 20 | 2025 | 20DEC2025 | 50 | 27FEB2026 | 69 | 20MAR2026 | Slow | 1 | Medium | 60 | 35 |
| 3 | V008 | Aadhaar | 60 | 0.25 | 6 | 55 | REJECTED | 07 | 21 | 2025 | 21JUL2025 | 60 | 27FEB2026 | 221 | 21OCT2025 | Moderate | 1 | Medium | 55 | 31 |
| 4 | V011 | Aadhaar | 90 | 0.50 | 8 | 40 | REJECTED | 10 | 5 | 2025 | 05OCT2025 | 80 | 27FEB2026 | 145 | 05JAN2026 | Slow | 1 | Medium | 40 | 58 |
| 5 | V014 | Aadhaar | 100 | 0.60 | 9 | 35 | REJECTED | 01 | 17 | 2026 | 17JAN2026 | 90 | 27FEB2026 | 41 | 17APR2026 | Slow | 1 | Medium | 35 | 69 |
| 6 | V002 | Biometric | 50 | 0.10 | 3 | 88 | APPROVED | 02 | 10 | 2025 | 10FEB2025 | 30 | 27FEB2026 | 382 | 10MAY2025 | Moderate | 0 | High | 88 | 13 |
| 7 | V007 | Biometric | 45 | 0.20 | 4 | 75 | REJECTED | 06 | 18 | 2025 | 18JUN2025 | 40 | 27FEB2026 | 254 | 18SEP2025 | Moderate | 0 | High | 75 | 24 |
| 8 | V012 | Biometric | 55 | 0.15 | 3 | 80 | APPROVED | 11 | 22 | 2025 | 22NOV2025 | 30 | 27FEB2026 | 97 | 22FEB2026 | Moderate | 0 | High | 80 | 18 |
| 9 | V003 | Otp | 20 | 0.00 | 0 | 90 | REJECTED | 03 | 5 | 2025 | 05MAR2025 | 0 | 27FEB2026 | 359 | 05JUN2025 | Fast | 0 | Low | 90 | 0 |
| 10 | V006 | Otp | 30 | 0.01 | 0 | 98 | APPROVED | 05 | 11 | 2025 | 11MAY2025 | 0 | 27FEB2026 | 292 | 11AUG2025 | Fast | 0 | Low | 98 | 1 |
| 11 | V010 | Otp | 25 | 0.00 | 0 | 99 | APPROVED | 09 | 13 | 2025 | 13SEP2025 | 0 | 27FEB2026 | 167 | 13DEC2025 | Fast | 0 | Low | 99 | 0 |
| 12 | V013 | Otp | 15 | 0.02 | 0 | 96 | APPROVED | 12 | 2 | 2025 | 02DEC2025 | 0 | 27FEB2026 | 87 | 02MAR2026 | Fast | 0 | Low | 96 | 2 |
| 13 | V005 | Videokyc | 15 | 0.05 | 2 | 85 | APPROVED | 04 | 25 | 2025 | 25APR2025 | 20 | 27FEB2026 | 308 | 25JUL2025 | Fast | 0 | Medium | 85 | 7 |
| 14 | V009 | Videokyc | 40 | 0.03 | 1 | 92 | APPROVED | 08 | 9 | 2025 | 09AUG2025 | 10 | 27FEB2026 | 202 | 09NOV2025 | Moderate | 0 | Medium | 92 | 4 |
| 15 | V015 | Videokyc | 48 | 0.07 | 2 | 89 | APPROVED | 02 | 28 | 2026 | 28FEB2026 | 20 | 27FEB2026 | -1 | 28MAY2026 | Moderate | 0 | Medium | 89 | 9 |
| 16 | V016 | Aadhaar | 42 | 0.04 | 1 | 93 | APPROVED | 10 | 2026 | 10MAR2026 | 10 | . | . | . | . | 93 | 5 | |||
| 17 | V017 | Biometric | 75 | 0.35 | 6 | 58 | REJECTED | 15 | 2026 | 15APR2026 | 60 | . | . | . | . | 58 | 41 | |||
| 18 | V018 | Otp | 28 | 0.01 | 0 | 97 | APPROVED | 21 | 2026 | 21MAY2026 | 0 | . | . | . | . | 97 | 1 |
11. Character Functions Used
data char_example;
set digital_identity_merged;
Full_Info = catx("-", Verification_ID, Method, Outcome);
Lower_Method = lowcase(Method);
Trimmed_Method = trim(Method);
Clean_Method = strip(Method);
run;
proc print data=char_example;
run;
OUTPUT:
| Obs | Verification_ID | Method | Processing_Time | Failure_Rate | Fraud_Attempts | Confidence_Score | Outcome | Month | Day | Year | Verification_Date | Fraud_Percentage | Today_Date | Days_Since_Verification | Next_Review_Date | Utilization_Class | Fraud_Flag | Risk_Level | Full_Info | Lower_Method | Trimmed_Method | Clean_Method |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | Aadhaar | 35 | 0.02 | 1 | 95 | APPROVED | 01 | 15 | 2025 | 15JAN2025 | 10 | 27FEB2026 | 408 | 15APR2025 | Moderate | 0 | Medium | V001-Aadhaar-APPROVED | aadhaar | Aadhaar | Aadhaar |
| 2 | V004 | Aadhaar | 70 | 0.30 | 5 | 60 | REJECTED | 12 | 20 | 2025 | 20DEC2025 | 50 | 27FEB2026 | 69 | 20MAR2026 | Slow | 1 | Medium | V004-Aadhaar-REJECTED | aadhaar | Aadhaar | Aadhaar |
| 3 | V008 | Aadhaar | 60 | 0.25 | 6 | 55 | REJECTED | 07 | 21 | 2025 | 21JUL2025 | 60 | 27FEB2026 | 221 | 21OCT2025 | Moderate | 1 | Medium | V008-Aadhaar-REJECTED | aadhaar | Aadhaar | Aadhaar |
| 4 | V011 | Aadhaar | 90 | 0.50 | 8 | 40 | REJECTED | 10 | 5 | 2025 | 05OCT2025 | 80 | 27FEB2026 | 145 | 05JAN2026 | Slow | 1 | Medium | V011-Aadhaar-REJECTED | aadhaar | Aadhaar | Aadhaar |
| 5 | V014 | Aadhaar | 100 | 0.60 | 9 | 35 | REJECTED | 01 | 17 | 2026 | 17JAN2026 | 90 | 27FEB2026 | 41 | 17APR2026 | Slow | 1 | Medium | V014-Aadhaar-REJECTED | aadhaar | Aadhaar | Aadhaar |
| 6 | V002 | Biometric | 50 | 0.10 | 3 | 88 | APPROVED | 02 | 10 | 2025 | 10FEB2025 | 30 | 27FEB2026 | 382 | 10MAY2025 | Moderate | 0 | High | V002-Biometric-APPROVED | biometric | Biometric | Biometric |
| 7 | V007 | Biometric | 45 | 0.20 | 4 | 75 | REJECTED | 06 | 18 | 2025 | 18JUN2025 | 40 | 27FEB2026 | 254 | 18SEP2025 | Moderate | 0 | High | V007-Biometric-REJECTED | biometric | Biometric | Biometric |
| 8 | V012 | Biometric | 55 | 0.15 | 3 | 80 | APPROVED | 11 | 22 | 2025 | 22NOV2025 | 30 | 27FEB2026 | 97 | 22FEB2026 | Moderate | 0 | High | V012-Biometric-APPROVED | biometric | Biometric | Biometric |
| 9 | V003 | Otp | 20 | 0.00 | 0 | 90 | REJECTED | 03 | 5 | 2025 | 05MAR2025 | 0 | 27FEB2026 | 359 | 05JUN2025 | Fast | 0 | Low | V003-Otp-REJECTED | otp | Otp | Otp |
| 10 | V006 | Otp | 30 | 0.01 | 0 | 98 | APPROVED | 05 | 11 | 2025 | 11MAY2025 | 0 | 27FEB2026 | 292 | 11AUG2025 | Fast | 0 | Low | V006-Otp-APPROVED | otp | Otp | Otp |
| 11 | V010 | Otp | 25 | 0.00 | 0 | 99 | APPROVED | 09 | 13 | 2025 | 13SEP2025 | 0 | 27FEB2026 | 167 | 13DEC2025 | Fast | 0 | Low | V010-Otp-APPROVED | otp | Otp | Otp |
| 12 | V013 | Otp | 15 | 0.02 | 0 | 96 | APPROVED | 12 | 2 | 2025 | 02DEC2025 | 0 | 27FEB2026 | 87 | 02MAR2026 | Fast | 0 | Low | V013-Otp-APPROVED | otp | Otp | Otp |
| 13 | V005 | Videokyc | 15 | 0.05 | 2 | 85 | APPROVED | 04 | 25 | 2025 | 25APR2025 | 20 | 27FEB2026 | 308 | 25JUL2025 | Fast | 0 | Medium | V005-Videokyc-APPROVED | videokyc | Videokyc | Videokyc |
| 14 | V009 | Videokyc | 40 | 0.03 | 1 | 92 | APPROVED | 08 | 9 | 2025 | 09AUG2025 | 10 | 27FEB2026 | 202 | 09NOV2025 | Moderate | 0 | Medium | V009-Videokyc-APPROVED | videokyc | Videokyc | Videokyc |
| 15 | V015 | Videokyc | 48 | 0.07 | 2 | 89 | APPROVED | 02 | 28 | 2026 | 28FEB2026 | 20 | 27FEB2026 | -1 | 28MAY2026 | Moderate | 0 | Medium | V015-Videokyc-APPROVED | videokyc | Videokyc | Videokyc |
| 16 | V016 | Aadhaar | 42 | 0.04 | 1 | 93 | APPROVED | 10 | 2026 | 10MAR2026 | 10 | . | . | . | . | V016-Aadhaar-APPROVED | aadhaar | Aadhaar | Aadhaar | |||
| 17 | V017 | Biometric | 75 | 0.35 | 6 | 58 | REJECTED | 15 | 2026 | 15APR2026 | 60 | . | . | . | . | V017-Biometric-REJECTED | biometric | Biometric | Biometric | |||
| 18 | V018 | Otp | 28 | 0.01 | 0 | 97 | APPROVED | 21 | 2026 | 21MAY2026 | 0 | . | . | . | . | V018-Otp-APPROVED | otp | Otp | Otp |
CAT
Concatenates without separator
CATX
Concatenates with separator
12. PROC DATASETS DELETE
proc datasets library=work nolist;
delete digital_identity_raw;
quit;
LOG:
· Used to clean temporary datasets.
13. Final Corrected Full Code
title "Final Cleaned Digital Identity Fraud Detection System";
data digital_identity_final;
set digital_identity_raw;
Method = propcase(strip(Method));
Outcome = upcase(strip(Outcome));
if Processing_Time < 0 then Processing_Time = abs(Processing_Time);
if Failure_Rate = . then Failure_Rate = 0;
if Month > 12 then Month = 12;
Verification_Date = mdy(Month, Day, Year);
Days_Since_Verification = intck('day', Verification_Date, today());
Next_Review_Date = intnx('month', Verification_Date, 3, 'same');
if Processing_Time <= 30 then Utilization_Class="Fast";
else if Processing_Time <= 60 then Utilization_Class="Moderate";
else Utilization_Class="Slow";
if Fraud_Attempts >= 5 and Confidence_Score < 70 then Fraud_Flag=1;
else if Failure_Rate > 0.25 then Fraud_Flag=1;
else Fraud_Flag=0;
format Verification_Date Next_Review_Date date9.;
run;
proc print data=digital_identity_final;
run;
| Obs | Verification_ID | Method | Processing_Time | Failure_Rate | Fraud_Attempts | Confidence_Score | Outcome | Month | Day | Year | Verification_Date | Days_Since_Verification | Next_Review_Date | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | Aadhaar | 35 | 0.02 | 1 | 95 | APPROVED | 01 | 15 | 2025 | 15JAN2025 | 408 | 15APR2025 | Mode | 0 |
| 2 | V002 | Biometric | 50 | 0.10 | 3 | 88 | APPROVED | 02 | 10 | 2025 | 10FEB2025 | 382 | 10MAY2025 | Mode | 0 |
| 3 | V003 | Otp | 20 | 0.00 | 0 | 90 | REJECTED | 03 | 5 | 2025 | 05MAR2025 | 359 | 05JUN2025 | Fast | 0 |
| 4 | V004 | Aadhaar | 70 | 0.30 | 5 | 60 | REJECTED | 12 | 20 | 2025 | 20DEC2025 | 69 | 20MAR2026 | Slow | 1 |
| 5 | V005 | Videokyc | 15 | 0.05 | 2 | 85 | APPROVED | 04 | 25 | 2025 | 25APR2025 | 308 | 25JUL2025 | Fast | 0 |
| 6 | V006 | Otp | 30 | 0.01 | 0 | 98 | APPROVED | 05 | 11 | 2025 | 11MAY2025 | 292 | 11AUG2025 | Fast | 0 |
| 7 | V007 | Biometric | 45 | 0.20 | 4 | 75 | REJECTED | 06 | 18 | 2025 | 18JUN2025 | 254 | 18SEP2025 | Mode | 0 |
| 8 | V008 | Aadhaar | 60 | 0.25 | 6 | 55 | REJECTED | 07 | 21 | 2025 | 21JUL2025 | 221 | 21OCT2025 | Mode | 1 |
| 9 | V009 | Videokyc | 40 | 0.03 | 1 | 92 | APPROVED | 08 | 9 | 2025 | 09AUG2025 | 202 | 09NOV2025 | Mode | 0 |
| 10 | V010 | Otp | 25 | 0.00 | 0 | 99 | APPROVED | 09 | 13 | 2025 | 13SEP2025 | 167 | 13DEC2025 | Fast | 0 |
| 11 | V011 | Aadhaar | 90 | 0.50 | 8 | 40 | REJECTED | 10 | 5 | 2025 | 05OCT2025 | 145 | 05JAN2026 | Slow | 1 |
| 12 | V012 | Biometric | 55 | 0.15 | 3 | 80 | APPROVED | 11 | 22 | 2025 | 22NOV2025 | 97 | 22FEB2026 | Mode | 0 |
| 13 | V013 | Otp | 15 | 0.02 | 0 | 96 | APPROVED | 12 | 2 | 2025 | 02DEC2025 | 87 | 02MAR2026 | Fast | 0 |
| 14 | V014 | Aadhaar | 100 | 0.60 | 9 | 35 | REJECTED | 01 | 17 | 2026 | 17JAN2026 | 41 | 17APR2026 | Slow | 1 |
| 15 | V015 | Videokyc | 48 | 0.07 | 2 | 89 | APPROVED | 02 | 28 | 2026 | 28FEB2026 | -1 | 28MAY2026 | Mode | 0 |
15 Key Points About This Project:
1.This project simulates a real-world digital identity verification system used in fintech and banking environments.
2.A
structured dataset with 15+ observations was created including verification,
fraud, performance, and date variables.
3.Intentional
data quality errors were introduced to mimic real production data issues.
4. The
SET statement was used to read and transform raw datasets during cleaning.
5.Character
functions like PROPCASE, UPCASE, LOWCASE, STRIP, and TRIM were used for
standardization.
6.Numeric
functions like ABS, SUM, ROUND, and INT were used to correct and derive numeric
values.
7.The MDY
function was used to construct proper SAS date values from separate month, day,
and year fields.
8.INTCK
was applied to calculate date differences such as days since verification.
9.INTNX
was used to generate future review dates based on verification timelines.
10.Utilization classification was created based on processing time
thresholds (Fast, Moderate, Slow).
11. Fraud
detection logic was implemented using business rules combining Fraud_Attempts,
Failure_Rate, and Confidence_Score.
12.A
reusable macro was built to automate fraud detection logic across datasets.
13.MERGE,
SET, and PROC APPEND were used to combine datasets based on business
requirements.
14.PROC
TRANSPOSE was applied to reshape data for reporting and dashboard purposes.
15.PROC
DATASETS DELETE was used to remove temporary datasets and maintain a clean
working environment.
1.Improves Fraud Detection
It helps identify suspicious verification attempts early, reducing financial
loss and identity misuse.
2.Improves
Data Quality
By fixing errors like missing values, wrong dates, and inconsistent text, the
system becomes more reliable and accurate.
3.Automates
Fraud Logic
Using macros, fraud detection rules can be reused and applied automatically to
new incoming data.
4.Enhances
Operational Efficiency
Utilization classification (Fast, Moderate, Slow) helps measure processing
performance and improve turnaround time.
5.Supports
Business Decision Making
Fraud percentage, failure rate, and confidence score analysis provide management
with clear insights for risk control and policy improvement.
Conclusion:
In this complete digital identity verification
analytics system, we:
·
Created raw dataset with intentional data
quality issues
·
Identified and corrected errors
·
Standardized character fields
·
Applied date engineering
·
Calculated fraud percentages
·
Implemented utilization classification
·
Designed macro-based fraud detection
·
Used SET, MERGE, APPEND, TRANSPOSE
·
Applied numeric and character functions
·
Cleaned datasets using PROC DATASETS DELETE
This mirrors real-world fintech fraud
detection pipelines.
If you attend a SAS interview at companies
like TCS, Cognizant, or Syneos, explaining this end-to-end workflow will
demonstrate:
·
Data cleaning capability
·
Business logic understanding
·
SAS technical strength
·
Macro programming expertise
·
Fraud analytics mindset
SAS INTERVIEW QUESTIONS
1.How do you append datasets in PROC SQL?
2.Difference between UNION and UNION ALL?
3.How to detect fraud using SAS?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent DIGITAL IDENTITY data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics
· Clinical SAS Programmer
· Research Data Analyst
· Regulatory Data Validator
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment