413.Can We Build and Debug a Digital Identity Verification Fraud Detection System in SAS Using Advanced Data Engineering and Macros?

Digital Identity Verification Fraud Detection System Using SAS

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP | SET | MERGE | PROC SORT | PROC SQL | PROC APPEND | PROC TRANSPOSE | PROC DATASETS | DELETE | INPUT | DATALINES | LENGTH | FORMAT | IF-THEN-ELSE | ABS | ROUND | COALESCE | STRIP | TRIM | CAT | CATX | PROPCASE | UPCASE | LOWCASE | MDY | INTCK | INTNX | TODAY | PROC PRINT | RUN | QUIT | %MACRO | %MEND 

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Introduction

In today’s digital economy, identity verification systems play a critical role in banking, fintech, healthcare, e-commerce, and government services. Every time a user uploads an Aadhaar card, passport, selfie, or biometric scan, a backend system validates authenticity. However, digital identity systems are vulnerable to fraud attempts such as spoofing, synthetic identity creation, and document tampering.

In this project, we will:

  • Create a Digital Identity Verification Dataset
  • Introduce intentional errors
  • Detect and correct those errors
  • Apply macros for fraud detection logic
  • Use date functions (MDY, INTCK, INTNX)
  • Apply APPEND, TRANSPOSE, SET, MERGE
  • Use numeric and character functions
  • Demonstrate PROC DATASETS DELETE
  • Implement Utilization Classification
  • Apply Percentage Calculations

Table Of Contents

  1. Business Context
  2. Dataset Structure
  3. Raw Dataset Creation With Intentional Errors
  4. Error Identification
  5. Corrected Dataset Creation
  6. Date Handling Using MDY, INTCK, INTNX
  7. Fraud Detection Macro
  8. Utilization Classification Logic
  9. SET vs MERGE vs APPEND
  10. PROC TRANSPOSE
  11. Numeric Functions
  12. Character Functions
  13. COALESCE Usage
  14. PROC DATASETS DELETE
  15. Final Fraud Detection Output
  16. 15 Key Points About This Project
  17. Benefits
  18. Summary
  19. Conclusion

Business Context

Imagine you are working as a SAS Programmer in a fintech company in Hyderabad. The company processes:

  • Aadhaar eKYC
  • Biometric verification
  • Video KYC
  • OTP-based authentication

Your role is to:

  • Identify high-risk verification attempts
  • Calculate fraud attempt percentages
  • Track processing delays
  • Classify system utilization
  • Correct dirty data
  • Automate fraud detection using macros

1. Raw Dataset Creation (With Intentional Errors)

title "Raw Digital Identity Verification Data With Errors";

data digital_identity_raw;

    input Verification_ID $ Method:$12. Processing_Time Failure_Rate Fraud_Attempts

          Confidence_Score Outcome $ Month $ Day Year;

    format Verification_Date date9.;

    Verification_Date = mdy(Month, Day, Year);

    datalines;

V001 Aadhaar 35 0.02 1 95 Approved 01 15 2025

V002 biometric 50 0.10 3 88 Approved 02 10 2025

V003 OTP 20 . 0 90 Rejected 03 05 2025

V004 aadhaar 70 0.30 5 60 Rejected 13 20 2025

V005 VideoKYC -15 0.05 2 85 Approved 04 25 2025

V006 OTP 30 0.01 0 98 approved 05 11 2025

V007 Biometric 45 0.20 4 75 Rejected 06 18 2025

V008 Aadhaar 60 0.25 6 55 Rejected 07 21 2025

V009 VideoKYC 40 0.03 1 92 Approved 08 09 2025

V010 OTP 25 0.00 0 99 Approved 09 13 2025

V011 Aadhaar 90 0.50 8 40 Rejected 10 05 2025

V012 Biometric 55 0.15 3 80 Approved 11 22 2025

V013 OTP 15 0.02 0 96 Approved 12 02 2025

V014 Aadhaar 100 0.60 9 35 Rejected 01 17 2026

V015 VideoKYC 48 0.07 2 89 Approved 02 28 2026

;

run;

proc print data=digital_identity_raw;

run;

title;

OUTPUT:

Raw Digital Identity Verification Data With Errors

ObsVerification_IDMethodProcessing_TimeFailure_RateFraud_AttemptsConfidence_ScoreOutcomeMonthDayYearVerification_Date
1V001Aadhaar350.02195Approved0115202515JAN2025
2V002biometric500.10388Approved0210202510FEB2025
3V003OTP20.090Rejected035202505MAR2025
4V004aadhaar700.30560Rejected13202025.
5V005VideoKYC-150.05285Approved0425202525APR2025
6V006OTP300.01098approved0511202511MAY2025
7V007Biometric450.20475Rejected0618202518JUN2025
8V008Aadhaar600.25655Rejected0721202521JUL2025
9V009VideoKYC400.03192Approved089202509AUG2025
10V010OTP250.00099Approved0913202513SEP2025
11V011Aadhaar900.50840Rejected105202505OCT2025
12V012Biometric550.15380Approved1122202522NOV2025
13V013OTP150.02096Approved122202502DEC2025
14V014Aadhaar1000.60935Rejected0117202617JAN2026
15V015VideoKYC480.07289Approved0228202628FEB2026

Intentional Errors

·  Month = 13 (Invalid date)

·  Negative Processing_Time

·  Missing Failure_Rate

·  Mixed case in Method

·  Mixed case in Outcome

·  Inconsistent naming: Aadhaar vs aadhaar

2. Corrected Dataset Creation

title "Corrected Digital Identity Dataset";

data digital_identity_clean;

    set digital_identity_raw;

    Method = propcase(strip(Method));

    Outcome = upcase(strip(Outcome));

    if Processing_Time < 0 then Processing_Time = abs(Processing_Time);

    if Failure_Rate = . then Failure_Rate = 0;

    if Month > 12 then Month = 12

    Verification_Date = mdy(Month, Day, Year);

    Fraud_Percentage = (Fraud_Attempts / 10) * 100;

run;

proc print data=digital_identity_clean;

run;

title;

OUTPUT:

Corrected Digital Identity Dataset

ObsVerification_IDMethodProcessing_TimeFailure_RateFraud_AttemptsConfidence_ScoreOutcomeMonthDayYearVerification_DateFraud_Percentage
1V001Aadhaar350.02195APPROVED0115202515JAN202510
2V002Biometric500.10388APPROVED0210202510FEB202530
3V003Otp200.00090REJECTED035202505MAR20250
4V004Aadhaar700.30560REJECTED1220202520DEC202550
5V005Videokyc150.05285APPROVED0425202525APR202520
6V006Otp300.01098APPROVED0511202511MAY20250
7V007Biometric450.20475REJECTED0618202518JUN202540
8V008Aadhaar600.25655REJECTED0721202521JUL202560
9V009Videokyc400.03192APPROVED089202509AUG202510
10V010Otp250.00099APPROVED0913202513SEP20250
11V011Aadhaar900.50840REJECTED105202505OCT202580
12V012Biometric550.15380APPROVED1122202522NOV202530
13V013Otp150.02096APPROVED122202502DEC20250
14V014Aadhaar1000.60935REJECTED0117202617JAN202690
15V015Videokyc480.07289APPROVED0228202628FEB202620

Code Explanation

SET

·       Reads dataset into new dataset

·       Used for transformation

PROPCASE

·       Converts Aadhaar → Aadhaar (proper case)

STRIP

·       Removes leading/trailing spaces

UPCASE

·       Standardizes Approved vs approved

ABS

·       Converts negative to positive

MDY

·       Creates SAS date value

3. Date Calculations Using INTCK and INTNX

data digital_identity_dates;

    set digital_identity_clean;

    Today_Date = today();

    Days_Since_Verification = intck('day', Verification_Date, Today_Date);

    Next_Review_Date = intnx('month', Verification_Date, 3, 'same');

    format Today_Date Next_Review_Date date9.;

run;

proc print data=digital_identity_dates;

run;

OUTPUT:

ObsVerification_IDMethodProcessing_TimeFailure_RateFraud_AttemptsConfidence_ScoreOutcomeMonthDayYearVerification_DateFraud_PercentageToday_DateDays_Since_VerificationNext_Review_Date
1V001Aadhaar350.02195APPROVED0115202515JAN20251027FEB202640815APR2025
2V002Biometric500.10388APPROVED0210202510FEB20253027FEB202638210MAY2025
3V003Otp200.00090REJECTED035202505MAR2025027FEB202635905JUN2025
4V004Aadhaar700.30560REJECTED1220202520DEC20255027FEB20266920MAR2026
5V005Videokyc150.05285APPROVED0425202525APR20252027FEB202630825JUL2025
6V006Otp300.01098APPROVED0511202511MAY2025027FEB202629211AUG2025
7V007Biometric450.20475REJECTED0618202518JUN20254027FEB202625418SEP2025
8V008Aadhaar600.25655REJECTED0721202521JUL20256027FEB202622121OCT2025
9V009Videokyc400.03192APPROVED089202509AUG20251027FEB202620209NOV2025
10V010Otp250.00099APPROVED0913202513SEP2025027FEB202616713DEC2025
11V011Aadhaar900.50840REJECTED105202505OCT20258027FEB202614505JAN2026
12V012Biometric550.15380APPROVED1122202522NOV20253027FEB20269722FEB2026
13V013Otp150.02096APPROVED122202502DEC2025027FEB20268702MAR2026
14V014Aadhaar1000.60935REJECTED0117202617JAN20269027FEB20264117APR2026
15V015Videokyc480.07289APPROVED0228202628FEB20262027FEB2026-128MAY2026

INTCK

Calculates difference between two dates.

INTNX

Moves date forward by interval.

4. Utilization Classification

data digital_identity_utilization;

    set digital_identity_dates;

    length Utilization_Class $12.;

    if Processing_Time <= 30 then Utilization_Class="Fast";

    else if Processing_Time <= 60 then Utilization_Class="Moderate";

    else Utilization_Class="Slow";

run;

proc print data=digital_identity_utilization;

run;

OUTPUT:

ObsVerification_IDMethodProcessing_TimeFailure_RateFraud_AttemptsConfidence_ScoreOutcomeMonthDayYearVerification_DateFraud_PercentageToday_DateDays_Since_VerificationNext_Review_DateUtilization_Class
1V001Aadhaar350.02195APPROVED0115202515JAN20251027FEB202640815APR2025Moderate
2V002Biometric500.10388APPROVED0210202510FEB20253027FEB202638210MAY2025Moderate
3V003Otp200.00090REJECTED035202505MAR2025027FEB202635905JUN2025Fast
4V004Aadhaar700.30560REJECTED1220202520DEC20255027FEB20266920MAR2026Slow
5V005Videokyc150.05285APPROVED0425202525APR20252027FEB202630825JUL2025Fast
6V006Otp300.01098APPROVED0511202511MAY2025027FEB202629211AUG2025Fast
7V007Biometric450.20475REJECTED0618202518JUN20254027FEB202625418SEP2025Moderate
8V008Aadhaar600.25655REJECTED0721202521JUL20256027FEB202622121OCT2025Moderate
9V009Videokyc400.03192APPROVED089202509AUG20251027FEB202620209NOV2025Moderate
10V010Otp250.00099APPROVED0913202513SEP2025027FEB202616713DEC2025Fast
11V011Aadhaar900.50840REJECTED105202505OCT20258027FEB202614505JAN2026Slow
12V012Biometric550.15380APPROVED1122202522NOV20253027FEB20269722FEB2026Moderate
13V013Otp150.02096APPROVED122202502DEC2025027FEB20268702MAR2026Fast
14V014Aadhaar1000.60935REJECTED0117202617JAN20269027FEB20264117APR2026Slow
15V015Videokyc480.07289APPROVED0228202628FEB20262027FEB2026-128MAY2026Moderate

5. Fraud Detection Macro

%macro fraud_logic(input=, output=);

data &output;

    set &input;

    if Fraud_Attempts >= 5 and Confidence_Score < 70 then Fraud_Flag=1;

    else if Failure_Rate > 0.25 then Fraud_Flag=1;

    else Fraud_Flag=0;

run;

proc print data=&output;

run;

%mend;


%fraud_logic(input=digital_identity_utilization,

             output=digital_identity_fraud);

OUTPUT:

ObsVerification_IDMethodProcessing_TimeFailure_RateFraud_AttemptsConfidence_ScoreOutcomeMonthDayYearVerification_DateFraud_PercentageToday_DateDays_Since_VerificationNext_Review_DateUtilization_ClassFraud_Flag
1V001Aadhaar350.02195APPROVED0115202515JAN20251027FEB202640815APR2025Moderate0
2V002Biometric500.10388APPROVED0210202510FEB20253027FEB202638210MAY2025Moderate0
3V003Otp200.00090REJECTED035202505MAR2025027FEB202635905JUN2025Fast0
4V004Aadhaar700.30560REJECTED1220202520DEC20255027FEB20266920MAR2026Slow1
5V005Videokyc150.05285APPROVED0425202525APR20252027FEB202630825JUL2025Fast0
6V006Otp300.01098APPROVED0511202511MAY2025027FEB202629211AUG2025Fast0
7V007Biometric450.20475REJECTED0618202518JUN20254027FEB202625418SEP2025Moderate0
8V008Aadhaar600.25655REJECTED0721202521JUL20256027FEB202622121OCT2025Moderate1
9V009Videokyc400.03192APPROVED089202509AUG20251027FEB202620209NOV2025Moderate0
10V010Otp250.00099APPROVED0913202513SEP2025027FEB202616713DEC2025Fast0
11V011Aadhaar900.50840REJECTED105202505OCT20258027FEB202614505JAN2026Slow1
12V012Biometric550.15380APPROVED1122202522NOV20253027FEB20269722FEB2026Moderate0
13V013Otp150.02096APPROVED122202502DEC2025027FEB20268702MAR2026Fast0
14V014Aadhaar1000.60935REJECTED0117202617JAN20269027FEB20264117APR2026Slow1
15V015Videokyc480.07289APPROVED0228202628FEB20262027FEB2026-128MAY2026Moderate0

·  Reusable

·  Parameter driven

·  Scalable for multiple datasets

6. Using MERGE

data method_risk;

 input Method:$12. Risk_Level $;

 datalines;

Aadhaar Medium

Biometric High

Otp Low

Videokyc Medium

;

run;

proc print data=method_risk;

run;

OUTPUT:

ObsMethodRisk_Level
1AadhaarMedium
2BiometricHigh
3OtpLow
4VideokycMedium

proc sort data=digital_identity_fraud;

    by Method;

run;

proc print data=digital_identity_fraud;

run;

OUTPUT:

ObsVerification_IDMethodProcessing_TimeFailure_RateFraud_AttemptsConfidence_ScoreOutcomeMonthDayYearVerification_DateFraud_PercentageToday_DateDays_Since_VerificationNext_Review_DateUtilization_ClassFraud_Flag
1V001Aadhaar350.02195APPROVED0115202515JAN20251027FEB202640815APR2025Moderate0
2V004Aadhaar700.30560REJECTED1220202520DEC20255027FEB20266920MAR2026Slow1
3V008Aadhaar600.25655REJECTED0721202521JUL20256027FEB202622121OCT2025Moderate1
4V011Aadhaar900.50840REJECTED105202505OCT20258027FEB202614505JAN2026Slow1
5V014Aadhaar1000.60935REJECTED0117202617JAN20269027FEB20264117APR2026Slow1
6V002Biometric500.10388APPROVED0210202510FEB20253027FEB202638210MAY2025Moderate0
7V007Biometric450.20475REJECTED0618202518JUN20254027FEB202625418SEP2025Moderate0
8V012Biometric550.15380APPROVED1122202522NOV20253027FEB20269722FEB2026Moderate0
9V003Otp200.00090REJECTED035202505MAR2025027FEB202635905JUN2025Fast0
10V006Otp300.01098APPROVED0511202511MAY2025027FEB202629211AUG2025Fast0
11V010Otp250.00099APPROVED0913202513SEP2025027FEB202616713DEC2025Fast0
12V013Otp150.02096APPROVED122202502DEC2025027FEB20268702MAR2026Fast0
13V005Videokyc150.05285APPROVED0425202525APR20252027FEB202630825JUL2025Fast0
14V009Videokyc400.03192APPROVED089202509AUG20251027FEB202620209NOV2025Moderate0
15V015Videokyc480.07289APPROVED0228202628FEB20262027FEB2026-128MAY2026Moderate0

proc sort data=method_risk;

    by Method;

run;

proc print data=method_risk;

run;

OUTPUT:

ObsMethodRisk_Level
1AadhaarMedium
2BiometricHigh
3OtpLow
4VideokycMedium

data digital_identity_merged;

    merge digital_identity_fraud method_risk;

    by Method;

run;

proc print data=digital_identity_merged;

run;

OUTPUT:

ObsVerification_IDMethodProcessing_TimeFailure_RateFraud_AttemptsConfidence_ScoreOutcomeMonthDayYearVerification_DateFraud_PercentageToday_DateDays_Since_VerificationNext_Review_DateUtilization_ClassFraud_FlagRisk_Level
1V001Aadhaar350.02195APPROVED0115202515JAN20251027FEB202640815APR2025Moderate0Medium
2V004Aadhaar700.30560REJECTED1220202520DEC20255027FEB20266920MAR2026Slow1Medium
3V008Aadhaar600.25655REJECTED0721202521JUL20256027FEB202622121OCT2025Moderate1Medium
4V011Aadhaar900.50840REJECTED105202505OCT20258027FEB202614505JAN2026Slow1Medium
5V014Aadhaar1000.60935REJECTED0117202617JAN20269027FEB20264117APR2026Slow1Medium
6 Biometric....  ....... .High
7V002Biometric500.10388APPROVED0210202510FEB20253027FEB202638210MAY2025Moderate0 
8V007Biometric450.20475REJECTED0618202518JUN20254027FEB202625418SEP2025Moderate0 
9V012Biometric550.15380APPROVED1122202522NOV20253027FEB20269722FEB2026Moderate0 
10V003Otp200.00090REJECTED035202505MAR2025027FEB202635905JUN2025Fast0Low
11V006Otp300.01098APPROVED0511202511MAY2025027FEB202629211AUG2025Fast0Low
12V010Otp250.00099APPROVED0913202513SEP2025027FEB202616713DEC2025Fast0Low
13V013Otp150.02096APPROVED122202502DEC2025027FEB20268702MAR2026Fast0Low
14V005Videokyc150.05285APPROVED0425202525APR20252027FEB202630825JUL2025Fast0Medium
15V009Videokyc400.03192APPROVED089202509AUG20251027FEB202620209NOV2025Moderate0Medium
16V015Videokyc480.07289APPROVED0228202628FEB20262027FEB2026-128MAY2026Moderate0Medium

MERGE

Combines datasets horizontally.

7. Using Same Datasets For APPEND

proc append base=digital_identity_merged

            data=digital_identity_merged force;

run;

proc print data=digital_identity_merged;

run;

OUTPUT:

ObsVerification_IDMethodProcessing_TimeFailure_RateFraud_AttemptsConfidence_ScoreOutcomeMonthDayYearVerification_DateFraud_PercentageToday_DateDays_Since_VerificationNext_Review_DateUtilization_ClassFraud_FlagRisk_Level
1V001Aadhaar350.02195APPROVED0115202515JAN20251027FEB202640815APR2025Moderate0Medium
2V004Aadhaar700.30560REJECTED1220202520DEC20255027FEB20266920MAR2026Slow1Medium
3V008Aadhaar600.25655REJECTED0721202521JUL20256027FEB202622121OCT2025Moderate1Medium
4V011Aadhaar900.50840REJECTED105202505OCT20258027FEB202614505JAN2026Slow1Medium
5V014Aadhaar1000.60935REJECTED0117202617JAN20269027FEB20264117APR2026Slow1Medium
6 Biometri....  ....... .High
7V002Biometric500.10388APPROVED0210202510FEB20253027FEB202638210MAY2025Moderate0 
8V007Biometric450.20475REJECTED0618202518JUN20254027FEB202625418SEP2025Moderate0 
9V012Biometric550.15380APPROVED1122202522NOV20253027FEB20269722FEB2026Moderate0 
10V003Otp200.00090REJECTED035202505MAR2025027FEB202635905JUN2025Fast0Low
11V006Otp300.01098APPROVED0511202511MAY2025027FEB202629211AUG2025Fast0Low
12V010Otp250.00099APPROVED0913202513SEP2025027FEB202616713DEC2025Fast0Low
13V013Otp150.02096APPROVED122202502DEC2025027FEB20268702MAR2026Fast0Low
14V005Videokyc150.05285APPROVED0425202525APR20252027FEB202630825JUL2025Fast0Medium
15V009Videokyc400.03192APPROVED089202509AUG20251027FEB202620209NOV2025Moderate0Medium
16V015Videokyc480.07289APPROVED0228202628FEB20262027FEB2026-128MAY2026Moderate0Medium
17V001Aadhaar350.02195APPROVED0115202515JAN20251027FEB202640815APR2025Moderate0Medium
18V004Aadhaar700.30560REJECTED1220202520DEC20255027FEB20266920MAR2026Slow1Medium
19V008Aadhaar600.25655REJECTED0721202521JUL20256027FEB202622121OCT2025Moderate1Medium
20V011Aadhaar900.50840REJECTED105202505OCT20258027FEB202614505JAN2026Slow1Medium
21V014Aadhaar1000.60935REJECTED0117202617JAN20269027FEB20264117APR2026Slow1Medium
22 Biometri....  ....... .High
23V002Biometric500.10388APPROVED0210202510FEB20253027FEB202638210MAY2025Moderate0 
24V007Biometric450.20475REJECTED0618202518JUN20254027FEB202625418SEP2025Moderate0 
25V012Biometric550.15380APPROVED1122202522NOV20253027FEB20269722FEB2026Moderate0 
26V003Otp200.00090REJECTED035202505MAR2025027FEB202635905JUN2025Fast0Low
27V006Otp300.01098APPROVED0511202511MAY2025027FEB202629211AUG2025Fast0Low
28V010Otp250.00099APPROVED0913202513SEP2025027FEB202616713DEC2025Fast0Low
29V013Otp150.02096APPROVED122202502DEC2025027FEB20268702MAR2026Fast0Low
30V005Videokyc150.05285APPROVED0425202525APR20252027FEB202630825JUL2025Fast0Medium
31V009Videokyc400.03192APPROVED089202509AUG20251027FEB202620209NOV2025Moderate0Medium
32V015Videokyc480.07289APPROVED0228202628FEB20262027FEB2026-128MAY2026Moderate0Medium

·  You are appending the dataset to itself

·  It will duplicate all observations

·  It is logically incorrect in a real project

·  It may cause unintended data inflation

8. Append Different Datasets

Step 1: Create New Dataset

title "New Incoming Digital Identity Records";

data digital_identity_new;

    input Verification_ID $ Method:$12. Processing_Time Failure_Rate 

          Fraud_Attempts Confidence_Score Outcome $ Month Day Year;

    format Verification_Date date9.;

    Verification_Date = mdy(Month, Day, Year);

    datalines;

V016 Aadhaar 42 0.04 1 93 Approved 03 10 2026

V017 Biometric 75 0.35 6 58 Rejected 04 15 2026

V018 OTP 28 0.01 0 97 Approved 05 21 2026

;

run;

proc print data=digital_identity_new;

run;

title;

OUTPUT:

New Incoming Digital Identity Records

ObsVerification_IDMethodProcessing_TimeFailure_RateFraud_AttemptsConfidence_ScoreOutcomeMonthDayYearVerification_Date
1V016Aadhaar420.04193Approved310202610MAR2026
2V017Biometric750.35658Rejected415202615APR2026
3V018OTP280.01097Approved521202621MAY2026

Step 2: Standardize Structure Before Append

data digital_identity_new_clean;

    set digital_identity_new;

    Method = propcase(strip(Method));

    Outcome = upcase(strip(Outcome));

    if Processing_Time < 0 then Processing_Time = abs(Processing_Time);

    if Failure_Rate = . then Failure_Rate = 0;

    Fraud_Percentage = (Fraud_Attempts/10)*100;

run;

proc print data=digital_identity_new_clean;

run;

OUTPUT:

ObsVerification_IDMethodProcessing_TimeFailure_RateFraud_AttemptsConfidence_ScoreOutcomeMonthDayYearVerification_DateFraud_Percentage
1V016Aadhaar420.04193APPROVED310202610MAR202610
2V017Biometric750.35658REJECTED415202615APR202660
3V018Otp280.01097APPROVED521202621MAY20260

·  Before APPEND, variable structure must match.

Step 3: Correct PROC APPEND Usage

title "Appending New Records To Main Dataset";

proc append base=digital_identity_merged

            data=digital_identity_new_clean

            force;

run;

proc print data=digital_identity_merged;

run;

OUTPUT:

Appending New Records To Main Dataset

ObsVerification_IDMethodProcessing_TimeFailure_RateFraud_AttemptsConfidence_ScoreOutcomeMonthDayYearVerification_DateFraud_PercentageToday_DateDays_Since_VerificationNext_Review_DateUtilization_ClassFraud_FlagRisk_Level
1V001Aadhaar350.02195APPROVED0115202515JAN20251027FEB202640815APR2025Moderate0Medium
2V004Aadhaar700.30560REJECTED1220202520DEC20255027FEB20266920MAR2026Slow1Medium
3V008Aadhaar600.25655REJECTED0721202521JUL20256027FEB202622121OCT2025Moderate1Medium
4V011Aadhaar900.50840REJECTED105202505OCT20258027FEB202614505JAN2026Slow1Medium
5V014Aadhaar1000.60935REJECTED0117202617JAN20269027FEB20264117APR2026Slow1Medium
6V002Biometric500.10388APPROVED0210202510FEB20253027FEB202638210MAY2025Moderate0High
7V007Biometric450.20475REJECTED0618202518JUN20254027FEB202625418SEP2025Moderate0High
8V012Biometric550.15380APPROVED1122202522NOV20253027FEB20269722FEB2026Moderate0High
9V003Otp200.00090REJECTED035202505MAR2025027FEB202635905JUN2025Fast0Low
10V006Otp300.01098APPROVED0511202511MAY2025027FEB202629211AUG2025Fast0Low
11V010Otp250.00099APPROVED0913202513SEP2025027FEB202616713DEC2025Fast0Low
12V013Otp150.02096APPROVED122202502DEC2025027FEB20268702MAR2026Fast0Low
13V005Videokyc150.05285APPROVED0425202525APR20252027FEB202630825JUL2025Fast0Medium
14V009Videokyc400.03192APPROVED089202509AUG20251027FEB202620209NOV2025Moderate0Medium
15V015Videokyc480.07289APPROVED0228202628FEB20262027FEB2026-128MAY2026Moderate0Medium
16V016Aadhaar420.04193APPROVED 10202610MAR202610... . 
17V017Biometric750.35658REJECTED 15202615APR202660... . 
18V018Otp280.01097APPROVED 21202621MAY20260... . 

BASE

·  Target dataset

·  Existing master dataset

DATA

·  New dataset to be added

FORCE

·  Allows append even if:

·       Variable lengths differ

·       Minor attribute mismatch

·  Should be used carefully in production

9. TRANSPOSE

proc transpose data=digital_identity_merged out=transposed_data;

    var Processing_Time Fraud_Attempts Confidence_Score;

run;

proc print data=transposed_data;

run;

OUTPUT:

Obs_NAME_COL1COL2COL3COL4COL5COL6COL7COL8COL9COL10COL11COL12COL13COL14COL15COL16COL17COL18
1Processing_Time3570609010050455520302515154048427528
2Fraud_Attempts156893430000212160
3Confidence_Score956055403588758090989996859289935897

10. Numeric Functions Used

data numeric_example;

    set digital_identity_merged;

    Rounded_Score = round(Confidence_Score);

    Total_Risk = sum(Fraud_Attempts, Failure_Rate*100);

run;

proc print data=numeric_example;

run;

OUTPUT:

ObsVerification_IDMethodProcessing_TimeFailure_RateFraud_AttemptsConfidence_ScoreOutcomeMonthDayYearVerification_DateFraud_PercentageToday_DateDays_Since_VerificationNext_Review_DateUtilization_ClassFraud_FlagRisk_LevelRounded_ScoreTotal_Risk
1V001Aadhaar350.02195APPROVED0115202515JAN20251027FEB202640815APR2025Moderate0Medium953
2V004Aadhaar700.30560REJECTED1220202520DEC20255027FEB20266920MAR2026Slow1Medium6035
3V008Aadhaar600.25655REJECTED0721202521JUL20256027FEB202622121OCT2025Moderate1Medium5531
4V011Aadhaar900.50840REJECTED105202505OCT20258027FEB202614505JAN2026Slow1Medium4058
5V014Aadhaar1000.60935REJECTED0117202617JAN20269027FEB20264117APR2026Slow1Medium3569
6V002Biometric500.10388APPROVED0210202510FEB20253027FEB202638210MAY2025Moderate0High8813
7V007Biometric450.20475REJECTED0618202518JUN20254027FEB202625418SEP2025Moderate0High7524
8V012Biometric550.15380APPROVED1122202522NOV20253027FEB20269722FEB2026Moderate0High8018
9V003Otp200.00090REJECTED035202505MAR2025027FEB202635905JUN2025Fast0Low900
10V006Otp300.01098APPROVED0511202511MAY2025027FEB202629211AUG2025Fast0Low981
11V010Otp250.00099APPROVED0913202513SEP2025027FEB202616713DEC2025Fast0Low990
12V013Otp150.02096APPROVED122202502DEC2025027FEB20268702MAR2026Fast0Low962
13V005Videokyc150.05285APPROVED0425202525APR20252027FEB202630825JUL2025Fast0Medium857
14V009Videokyc400.03192APPROVED089202509AUG20251027FEB202620209NOV2025Moderate0Medium924
15V015Videokyc480.07289APPROVED0228202628FEB20262027FEB2026-128MAY2026Moderate0Medium899
16V016Aadhaar420.04193APPROVED 10202610MAR202610... . 935
17V017Biometric750.35658REJECTED 15202615APR202660... . 5841
18V018Otp280.01097APPROVED 21202621MAY20260... . 971

11. Character Functions Used

data char_example;

    set digital_identity_merged;

    Full_Info = catx("-", Verification_ID, Method, Outcome);

    Lower_Method = lowcase(Method);

    Trimmed_Method = trim(Method);

    Clean_Method = strip(Method);

run;

proc print data=char_example;

run;

OUTPUT:

ObsVerification_IDMethodProcessing_TimeFailure_RateFraud_AttemptsConfidence_ScoreOutcomeMonthDayYearVerification_DateFraud_PercentageToday_DateDays_Since_VerificationNext_Review_DateUtilization_ClassFraud_FlagRisk_LevelFull_InfoLower_MethodTrimmed_MethodClean_Method
1V001Aadhaar350.02195APPROVED0115202515JAN20251027FEB202640815APR2025Moderate0MediumV001-Aadhaar-APPROVEDaadhaarAadhaarAadhaar
2V004Aadhaar700.30560REJECTED1220202520DEC20255027FEB20266920MAR2026Slow1MediumV004-Aadhaar-REJECTEDaadhaarAadhaarAadhaar
3V008Aadhaar600.25655REJECTED0721202521JUL20256027FEB202622121OCT2025Moderate1MediumV008-Aadhaar-REJECTEDaadhaarAadhaarAadhaar
4V011Aadhaar900.50840REJECTED105202505OCT20258027FEB202614505JAN2026Slow1MediumV011-Aadhaar-REJECTEDaadhaarAadhaarAadhaar
5V014Aadhaar1000.60935REJECTED0117202617JAN20269027FEB20264117APR2026Slow1MediumV014-Aadhaar-REJECTEDaadhaarAadhaarAadhaar
6V002Biometric500.10388APPROVED0210202510FEB20253027FEB202638210MAY2025Moderate0HighV002-Biometric-APPROVEDbiometricBiometricBiometric
7V007Biometric450.20475REJECTED0618202518JUN20254027FEB202625418SEP2025Moderate0HighV007-Biometric-REJECTEDbiometricBiometricBiometric
8V012Biometric550.15380APPROVED1122202522NOV20253027FEB20269722FEB2026Moderate0HighV012-Biometric-APPROVEDbiometricBiometricBiometric
9V003Otp200.00090REJECTED035202505MAR2025027FEB202635905JUN2025Fast0LowV003-Otp-REJECTEDotpOtpOtp
10V006Otp300.01098APPROVED0511202511MAY2025027FEB202629211AUG2025Fast0LowV006-Otp-APPROVEDotpOtpOtp
11V010Otp250.00099APPROVED0913202513SEP2025027FEB202616713DEC2025Fast0LowV010-Otp-APPROVEDotpOtpOtp
12V013Otp150.02096APPROVED122202502DEC2025027FEB20268702MAR2026Fast0LowV013-Otp-APPROVEDotpOtpOtp
13V005Videokyc150.05285APPROVED0425202525APR20252027FEB202630825JUL2025Fast0MediumV005-Videokyc-APPROVEDvideokycVideokycVideokyc
14V009Videokyc400.03192APPROVED089202509AUG20251027FEB202620209NOV2025Moderate0MediumV009-Videokyc-APPROVEDvideokycVideokycVideokyc
15V015Videokyc480.07289APPROVED0228202628FEB20262027FEB2026-128MAY2026Moderate0MediumV015-Videokyc-APPROVEDvideokycVideokycVideokyc
16V016Aadhaar420.04193APPROVED 10202610MAR202610... . V016-Aadhaar-APPROVEDaadhaarAadhaarAadhaar
17V017Biometric750.35658REJECTED 15202615APR202660... . V017-Biometric-REJECTEDbiometricBiometricBiometric
18V018Otp280.01097APPROVED 21202621MAY20260... . V018-Otp-APPROVEDotpOtpOtp

CAT

Concatenates without separator

CATX

Concatenates with separator

12. PROC DATASETS DELETE

proc datasets library=work nolist;

    delete digital_identity_raw;

quit;

LOG:

NOTE: Deleting WORK.DIGITAL_IDENTITY_RAW (memtype=DATA).

·  Used to clean temporary datasets.

13. Final Corrected Full Code

title "Final Cleaned Digital Identity Fraud Detection System";

data digital_identity_final;

    set digital_identity_raw;

    Method = propcase(strip(Method));

    Outcome = upcase(strip(Outcome));

    if Processing_Time < 0 then Processing_Time = abs(Processing_Time);

    if Failure_Rate = . then Failure_Rate = 0;

    if Month > 12 then Month = 12;

    Verification_Date = mdy(Month, Day, Year);

    Days_Since_Verification = intck('day', Verification_Date, today());

    Next_Review_Date = intnx('month', Verification_Date, 3, 'same');

    if Processing_Time <= 30 then Utilization_Class="Fast";

    else if Processing_Time <= 60 then Utilization_Class="Moderate";

    else Utilization_Class="Slow";

    if Fraud_Attempts >= 5 and Confidence_Score < 70 then Fraud_Flag=1;

    else if Failure_Rate > 0.25 then Fraud_Flag=1;

    else Fraud_Flag=0;

    format Verification_Date Next_Review_Date date9.;

run;

proc print data=digital_identity_final;

run;

OUTPUT:

Final Cleaned Digital Identity Fraud Detection System

ObsVerification_IDMethodProcessing_TimeFailure_RateFraud_AttemptsConfidence_ScoreOutcomeMonthDayYearVerification_DateDays_Since_VerificationNext_Review_DateUtilization_ClassFraud_Flag
1V001Aadhaar350.02195APPROVED0115202515JAN202540815APR2025Mode0
2V002Biometric500.10388APPROVED0210202510FEB202538210MAY2025Mode0
3V003Otp200.00090REJECTED035202505MAR202535905JUN2025Fast0
4V004Aadhaar700.30560REJECTED1220202520DEC20256920MAR2026Slow1
5V005Videokyc150.05285APPROVED0425202525APR202530825JUL2025Fast0
6V006Otp300.01098APPROVED0511202511MAY202529211AUG2025Fast0
7V007Biometric450.20475REJECTED0618202518JUN202525418SEP2025Mode0
8V008Aadhaar600.25655REJECTED0721202521JUL202522121OCT2025Mode1
9V009Videokyc400.03192APPROVED089202509AUG202520209NOV2025Mode0
10V010Otp250.00099APPROVED0913202513SEP202516713DEC2025Fast0
11V011Aadhaar900.50840REJECTED105202505OCT202514505JAN2026Slow1
12V012Biometric550.15380APPROVED1122202522NOV20259722FEB2026Mode0
13V013Otp150.02096APPROVED122202502DEC20258702MAR2026Fast0
14V014Aadhaar1000.60935REJECTED0117202617JAN20264117APR2026Slow1
15V015Videokyc480.07289APPROVED0228202628FEB2026-128MAY2026Mode0

15 Key Points About This Project:

1.This project simulates a real-world digital identity verification system used in fintech and banking environments. 

2.A structured dataset with 15+ observations was created including verification, fraud, performance, and date variables.

3.Intentional data quality errors were introduced to mimic real production data issues.

4. The SET statement was used to read and transform raw datasets during cleaning.

5.Character functions like PROPCASE, UPCASE, LOWCASE, STRIP, and TRIM were used for standardization.

6.Numeric functions like ABS, SUM, ROUND, and INT were used to correct and derive numeric values.

7.The MDY function was used to construct proper SAS date values from separate month, day, and year fields.

8.INTCK was applied to calculate date differences such as days since verification.

9.INTNX was used to generate future review dates based on verification timelines.

10.Utilization classification was created based on processing time thresholds (Fast, Moderate, Slow).

11. Fraud detection logic was implemented using business rules combining Fraud_Attempts, Failure_Rate, and Confidence_Score.

12.A reusable macro was built to automate fraud detection logic across datasets.

13.MERGE, SET, and PROC APPEND were used to combine datasets based on business requirements.

14.PROC TRANSPOSE was applied to reshape data for reporting and dashboard purposes.

15.PROC DATASETS DELETE was used to remove temporary datasets and maintain a clean working environment.

 

Benefits:

1.Improves Fraud Detection
It helps identify suspicious verification attempts early, reducing financial loss and identity misuse.

2.Improves Data Quality
By fixing errors like missing values, wrong dates, and inconsistent text, the system becomes more reliable and accurate.

3.Automates Fraud Logic
Using macros, fraud detection rules can be reused and applied automatically to new incoming data.

4.Enhances Operational Efficiency
Utilization classification (Fast, Moderate, Slow) helps measure processing performance and improve turnaround time.

5.Supports Business Decision Making
Fraud percentage, failure rate, and confidence score analysis provide management with clear insights for risk control and policy improvement.


Summary:
This project builds a digital identity verification system in SAS that detects fraud and fixes data errors.It creates a dataset with verification details, cleans incorrect values, standardizes text fields, and corrects date formats.Fraud detection rules are applied using logical conditions and macros to automatically flag risky cases.Processing time is classified to measure system performance.Overall, the project improves data accuracy, automates fraud detection, and supports better business decision-making.

Conclusion:

In this complete digital identity verification analytics system, we:

·       Created raw dataset with intentional data quality issues

·       Identified and corrected errors

·       Standardized character fields

·       Applied date engineering

·       Calculated fraud percentages

·       Implemented utilization classification

·       Designed macro-based fraud detection

·       Used SET, MERGE, APPEND, TRANSPOSE

·       Applied numeric and character functions

·       Cleaned datasets using PROC DATASETS DELETE

This mirrors real-world fintech fraud detection pipelines.

If you attend a SAS interview at companies like TCS, Cognizant, or Syneos, explaining this end-to-end workflow will demonstrate:

·       Data cleaning capability

·       Business logic understanding

·       SAS technical strength

·       Macro programming expertise

·       Fraud analytics mindset

SAS INTERVIEW QUESTIONS

1.How do you append datasets in PROC SQL?

2.Difference between UNION and UNION ALL?

3.How to detect fraud using SAS?


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent DIGITAL IDENTITY data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

·  Clinical SAS Programmer

·  Research Data Analyst

·  Regulatory Data Validator


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.Which Country Truly Dominates the Olympics? – A Complete SAS Medal Efficiency Analytics Project

2.Which Airports Are Really the Busiest? – An End-to-End SAS Airport Traffic Analytics Project

3.Can Data Predict Election Outcomes? – A Complete SAS Voting Analytics Project

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy

Comments

Popular posts from this blog

412.Can We Build And Clean A University Course Analytics & Fraud Detection System In Sas Using Only Macros While Intentionally Creating And Fixing Errors?

409.Can We Build a Reliable Emergency Services Analytics & Fraud Detection System in SAS While Identifying and Fixing Intentional Errors?

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study