378.CAN SAS REALLY DETECT INSURANCE FRAUD AND CONTROL CLAIM LOSSES USING REAL DATA ANALYTICS?

CAN SAS REALLY DETECT INSURANCE FRAUD AND CONTROL CLAIM LOSSES USING REAL DATA ANALYTICS?


HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS ---DATA STEP | PROC SQL | PROC FREQ | PROC MEANS | PROC UNIVARIATE | MACROS | MDY | INTCK | INTNX | SET | MERGE | APPEND | TRANSPOSE | SAS FUNCTIONS


1.  BUSINESS CONTEXT – Why Insurance Claim Analytics Matters

Insurance companies process millions of claims every year.
Each claim is a potential financial risk.

Insurance analytics teams focus on:

• Detecting fraud
• Reducing claim leakage
• Improving settlement turnaround
• Identifying risky policies
• Ensuring regulatory compliance

Fraud can occur in:

• Health insurance
• Motor insurance
• Property insurance
• Life insurance
• Travel insurance

Even 1–2% fraud can cause crores of loss annually.

This SAS project simulates a real-world claim processing system where:

• Claims arrive
• Risk is calculated
• Fraud is flagged
• Settlement is decided


2. Project Learning Objectives

By completing this project, you will master:

• DATA step programming
• PROC SQL joins & calculations
• Risk scoring logic
• Fraud flagging with macros
• Date difference calculations
• Claim settlement analysis
• Production-style SAS coding


3. Creating Raw Insurance Claims Dataset

data claims_raw;

    format Claim_Date Submission_Date date9.;

    input Claim_ID $ Policy_Type $ Claim_Amount Approval_Time Fraud_Flag$ Settlement_Status $ 

          Claim_Date:date9. Submission_Date:date9.;

    datalines;

C001 Health 120000 5 N Approved 01JAN2024 28DEC2023

C002 Motor 450000 12 Y Pending 05JAN2024 20DEC2023

C003 Travel 60000 3 N Approved 10JAN2024 08JAN2024

C004 Property 950000 25 Y Rejected 12JAN2024 01JAN2024

C005 Health 180000 7 N Approved 15JAN2024 10JAN2024

C006 Motor 320000 15 Y Pending 18JAN2024 02JAN2024

C007 Health 75000 4 N Approved 20JAN2024 17JAN2024

C008 Property 1500000 30 Y Rejected 25JAN2024 05JAN2024

C009 Travel 45000 2 N Approved 28JAN2024 27JAN2024

C010 Motor 280000 10 N Approved 30JAN2024 20JAN2024

C011 Health 220000 9 N Approved 02FEB2024 25JAN2024

C012 Property 680000 18 Y Pending 05FEB2024 15JAN2024

C013 Travel 55000 3 N Approved 07FEB2024 06FEB2024

C014 Motor 410000 14 Y Pending 10FEB2024 28JAN2024

C015 Health 195000 8 N Approved 12FEB2024 05FEB2024

;

run;

proc print data=claims_raw;

run;

OUTPUT:

ObsClaim_DateSubmission_DateClaim_IDPolicy_TypeClaim_AmountApproval_TimeFraud_FlagSettlement_Status
101JAN202428DEC2023C001Health1200005NApproved
205JAN202420DEC2023C002Motor45000012YPending
310JAN202408JAN2024C003Travel600003NApproved
412JAN202401JAN2024C004Property95000025YRejected
515JAN202410JAN2024C005Health1800007NApproved
618JAN202402JAN2024C006Motor32000015YPending
720JAN202417JAN2024C007Health750004NApproved
825JAN202405JAN2024C008Property150000030YRejected
928JAN202427JAN2024C009Travel450002NApproved
1030JAN202420JAN2024C010Motor28000010NApproved
1102FEB202425JAN2024C011Health2200009NApproved
1205FEB202415JAN2024C012Property68000018YPending
1307FEB202406FEB2024C013Travel550003NApproved
1410FEB202428JAN2024C014Motor41000014YPending
1512FEB202405FEB2024C015Health1950008NApproved


4. Date Calculations using INTCK

data claims_dates;

    set claims_raw;

    Days_To_Submit = intck('day', Submission_Date, Claim_Date);

run;

proc print data=claims_dates;

run;

OUTPUT:

ObsClaim_DateSubmission_DateClaim_IDPolicy_TypeClaim_AmountApproval_TimeFraud_FlagSettlement_StatusDays_To_Submit
101JAN202428DEC2023C001Health1200005NApproved4
205JAN202420DEC2023C002Motor45000012YPending16
310JAN202408JAN2024C003Travel600003NApproved2
412JAN202401JAN2024C004Property95000025YRejected11
515JAN202410JAN2024C005Health1800007NApproved5
618JAN202402JAN2024C006Motor32000015YPending16
720JAN202417JAN2024C007Health750004NApproved3
825JAN202405JAN2024C008Property150000030YRejected20
928JAN202427JAN2024C009Travel450002NApproved1
1030JAN202420JAN2024C010Motor28000010NApproved10
1102FEB202425JAN2024C011Health2200009NApproved8
1205FEB202415JAN2024C012Property68000018YPending21
1307FEB202406FEB2024C013Travel550003NApproved1
1410FEB202428JAN2024C014Motor41000014YPending13
1512FEB202405FEB2024C015Health1950008NApproved7

claims_dates (INTCK)

• Calculates submission delay, a key fraud indicator.
• Helps identify rushed or delayed claims.
• Used in regulatory turnaround tracking.


5. Risk Score Calculation

data claims_risk;

    set claims_dates;

    Risk_Score = (Claim_Amount/10000) + Approval_Time + (Days_To_Submit*2);

run;

proc print data=claims_risk;

run;

OUTPUT:

ObsClaim_DateSubmission_DateClaim_IDPolicy_TypeClaim_AmountApproval_TimeFraud_FlagSettlement_StatusDays_To_SubmitRisk_Score
101JAN202428DEC2023C001Health1200005NApproved425.0
205JAN202420DEC2023C002Motor45000012YPending1689.0
310JAN202408JAN2024C003Travel600003NApproved213.0
412JAN202401JAN2024C004Property95000025YRejected11142.0
515JAN202410JAN2024C005Health1800007NApproved535.0
618JAN202402JAN2024C006Motor32000015YPending1679.0
720JAN202417JAN2024C007Health750004NApproved317.5
825JAN202405JAN2024C008Property150000030YRejected20220.0
928JAN202427JAN2024C009Travel450002NApproved18.5
1030JAN202420JAN2024C010Motor28000010NApproved1058.0
1102FEB202425JAN2024C011Health2200009NApproved847.0
1205FEB202415JAN2024C012Property68000018YPending21128.0
1307FEB202406FEB2024C013Travel550003NApproved110.5
1410FEB202428JAN2024C014Motor41000014YPending1381.0
1512FEB202405FEB2024C015Health1950008NApproved741.5

claims_risk

• Converts multiple claim factors into one numeric score.
• Enables ranking of claims by financial risk.
• Supports automated fraud logic.


6. Character Cleaning

data claims_clean;

    set claims_risk;

    Policy_Type = propcase(strip(Policy_Type));

    Settlement_Status = upcase(trim(Settlement_Status));

run;

proc print data=claims_clean;

run;

OUTPUT:

ObsClaim_DateSubmission_DateClaim_IDPolicy_TypeClaim_AmountApproval_TimeFraud_FlagSettlement_StatusDays_To_SubmitRisk_Score
101JAN202428DEC2023C001Health1200005NAPPROVED425.0
205JAN202420DEC2023C002Motor45000012YPENDING1689.0
310JAN202408JAN2024C003Travel600003NAPPROVED213.0
412JAN202401JAN2024C004Property95000025YREJECTED11142.0
515JAN202410JAN2024C005Health1800007NAPPROVED535.0
618JAN202402JAN2024C006Motor32000015YPENDING1679.0
720JAN202417JAN2024C007Health750004NAPPROVED317.5
825JAN202405JAN2024C008Property150000030YREJECTED20220.0
928JAN202427JAN2024C009Travel450002NAPPROVED18.5
1030JAN202420JAN2024C010Motor28000010NAPPROVED1058.0
1102FEB202425JAN2024C011Health2200009NAPPROVED847.0
1205FEB202415JAN2024C012Property68000018YPENDING21128.0
1307FEB202406FEB2024C013Travel550003NAPPROVED110.5
1410FEB202428JAN2024C014Motor41000014YPENDING1381.0
1512FEB202405FEB2024C015Health1950008NAPPROVED741.5

claims_clean (Character Functions)

• Standardizes text values for accurate grouping.
• Prevents duplicate categories caused by case or spaces.
• Improves PROC SQL and PROC FREQ reliability.


7. Fraud Detection Macro

%macro fraud_flag(input=, output=);

data &output;

    set &input;

    length Fraud_Category $12;

    if Risk_Score > 80 then Fraud_Category = "HIGH";

    else if Risk_Score > 50 then Fraud_Category = "MEDIUM";

    else Fraud_Category = "LOW";

run;

proc print data=&output;

run;

%mend;


%fraud_flag(input=claims_clean, output=claims_flagged);

OUTPUT:

ObsClaim_DateSubmission_DateClaim_IDPolicy_TypeClaim_AmountApproval_TimeFraud_FlagSettlement_StatusDays_To_SubmitRisk_ScoreFraud_Category
101JAN202428DEC2023C001Health1200005NAPPROVED425.0LOW
205JAN202420DEC2023C002Motor45000012YPENDING1689.0HIGH
310JAN202408JAN2024C003Travel600003NAPPROVED213.0LOW
412JAN202401JAN2024C004Property95000025YREJECTED11142.0HIGH
515JAN202410JAN2024C005Health1800007NAPPROVED535.0LOW
618JAN202402JAN2024C006Motor32000015YPENDING1679.0MEDIUM
720JAN202417JAN2024C007Health750004NAPPROVED317.5LOW
825JAN202405JAN2024C008Property150000030YREJECTED20220.0HIGH
928JAN202427JAN2024C009Travel450002NAPPROVED18.5LOW
1030JAN202420JAN2024C010Motor28000010NAPPROVED1058.0MEDIUM
1102FEB202425JAN2024C011Health2200009NAPPROVED847.0LOW
1205FEB202415JAN2024C012Property68000018YPENDING21128.0HIGH
1307FEB202406FEB2024C013Travel550003NAPPROVED110.5LOW
1410FEB202428JAN2024C014Motor41000014YPENDING1381.0HIGH
1512FEB202405FEB2024C015Health1950008NAPPROVED741.5LOW

fraud_flag Macro

• Automates fraud classification logic.
• Allows consistent risk grading across datasets.
• Supports production-level reuse.


8. PROC FREQ – Fraud Patterns

proc freq data=claims_flagged;

    tables Fraud_Category*Settlement_Status;

run;

OUTPUT:

The FREQ Procedure

Frequency
Percent
Row Pct
Col Pct
Table of Fraud_Category by Settlement_Status
Fraud_CategorySettlement_Status
APPROVEDPENDINGREJECTEDTotal
HIGH
0
0.00
0.00
0.00
3
20.00
60.00
75.00
2
13.33
40.00
100.00
5
33.33
 
 
LOW
8
53.33
100.00
88.89
0
0.00
0.00
0.00
0
0.00
0.00
0.00
8
53.33
 
 
MEDIUM
1
6.67
50.00
11.11
1
6.67
50.00
25.00
0
0.00
0.00
0.00
2
13.33
 
 
Total
9
60.00
4
26.67
2
13.33
15
100.00

PROC FREQ

• Detects fraud patterns by settlement status.
• Identifies abnormal approval behavior.
• Used in compliance audits.

9. PROC MEANS – Financial Risk

proc means data=claims_flagged mean sum max;

    class Fraud_Category;

    var Claim_Amount Risk_Score;

run;

OUTPUT:

The MEANS Procedure

Fraud_CategoryN ObsVariableMeanSumMaximum
HIGH5
Claim_Amount
Risk_Score
798000.00
132.0000000
3990000.00
660.0000000
1500000.00
220.0000000
LOW8
Claim_Amount
Risk_Score
118750.00
24.7500000
950000.00
198.0000000
220000.00
47.0000000
MEDIUM2
Claim_Amount
Risk_Score
300000.00
68.5000000
600000.00
137.0000000
320000.00
79.0000000

PROC MEANS

• Summarizes claim exposure by risk level.
• Measures financial impact of fraud.
• Supports management reporting.

10. PROC UNIVARIATE – Risk Distribution

proc univariate data=claims_flagged;

    var Risk_Score;

run;

OUTPUT:

The UNIVARIATE Procedure

Variable: Risk_Score

Moments
N15Sum Weights15
Mean66.3333333Sum Observations995
Std Deviation59.4452529Variance3533.7381
Skewness1.39401055Kurtosis1.90037472
Uncorrected SS115474Corrected SS49472.3333
Coeff Variation89.6159592Std Error Mean15.3486983
Basic Statistical Measures
LocationVariability
Mean66.33333Std Deviation59.44525
Median47.00000Variance3534
Mode.Range211.50000
  Interquartile Range71.50000
Tests for Location: Mu0=0
TestStatisticp Value
Student's tt4.321756Pr > |t|0.0007
SignM7.5Pr >= |M|<.0001
Signed RankS60Pr >= |S|<.0001
Quantiles (Definition 5)
LevelQuantile
100% Max220.0
99%220.0
95%220.0
90%142.0
75% Q389.0
50% Median47.0
25% Q117.5
10%10.5
5%8.5
1%8.5
0% Min8.5
Extreme Observations
LowestHighest
ValueObsValueObs
8.598114
10.513892
13.0312812
17.571424
25.012208

PROC UNIVARIATE

• Analyzes risk score distribution.
• Detects extreme or suspicious values.
• Supports fraud threshold tuning.

11. PROC SQL – High Risk Claims

proc sql;

    create table high_risk as

    select Claim_ID, Policy_Type, Claim_Amount, Risk_Score

    from claims_flagged

    where Risk_Score > 70;

quit;

proc print data=high_risk;

run;

OUTPUT:

ObsClaim_IDPolicy_TypeClaim_AmountRisk_Score
1C002Motor45000089
2C004Property950000142
3C006Motor32000079
4C008Property1500000220
5C012Property680000128
6C014Motor41000081

PROC SQL

• Filters and extracts high-risk claims.
• Enables audit-ready fraud listings.
• Supports investigator review.


12. PROC TRANSPOSE – Reporting

proc transpose data=claims_flagged out=claims_transposed;

    by Policy_Type NotSorted;

    id Fraud_Category;

    var Claim_Amount;

run;

proc print data=claims_transposed;

run;

OUTPUT:

ObsPolicy_Type_NAME_LOWHIGHMEDIUM
1HealthClaim_Amount120000..
2MotorClaim_Amount.450000.
3TravelClaim_Amount60000..
4PropertyClaim_Amount.950000.
5HealthClaim_Amount180000..
6MotorClaim_Amount..320000
7HealthClaim_Amount75000..
8PropertyClaim_Amount.1500000.
9TravelClaim_Amount45000..
10MotorClaim_Amount..280000
11HealthClaim_Amount220000..
12PropertyClaim_Amount.680000.
13TravelClaim_Amount55000..
14MotorClaim_Amount.410000.
15HealthClaim_Amount195000..

PROC TRANSPOSE

• Converts risk categories into report-ready columns.
• Used for dashboard and executive summaries.
• Improves visualization structure.


13. PROC APPEND

proc append base=claims_flagged 

            data=high_risk;

run;

proc print data=claims_flagged;

run;

OUTPUT:

ObsClaim_DateSubmission_DateClaim_IDPolicy_TypeClaim_AmountApproval_TimeFraud_FlagSettlement_StatusDays_To_SubmitRisk_ScoreFraud_Category
101JAN202428DEC2023C001Health1200005NAPPROVED425.0LOW
205JAN202420DEC2023C002Motor45000012YPENDING1689.0HIGH
310JAN202408JAN2024C003Travel600003NAPPROVED213.0LOW
412JAN202401JAN2024C004Property95000025YREJECTED11142.0HIGH
515JAN202410JAN2024C005Health1800007NAPPROVED535.0LOW
618JAN202402JAN2024C006Motor32000015YPENDING1679.0MEDIUM
720JAN202417JAN2024C007Health750004NAPPROVED317.5LOW
825JAN202405JAN2024C008Property150000030YREJECTED20220.0HIGH
928JAN202427JAN2024C009Travel450002NAPPROVED18.5LOW
1030JAN202420JAN2024C010Motor28000010NAPPROVED1058.0MEDIUM
1102FEB202425JAN2024C011Health2200009NAPPROVED847.0LOW
1205FEB202415JAN2024C012Property68000018YPENDING21128.0HIGH
1307FEB202406FEB2024C013Travel550003NAPPROVED110.5LOW
1410FEB202428JAN2024C014Motor41000014YPENDING1381.0HIGH
1512FEB202405FEB2024C015Health1950008NAPPROVED741.5LOW
16..C002Motor450000.  .89.0 
17..C004Property950000.  .142.0 
18..C006Motor320000.  .79.0 
19..C008Property1500000.  .220.0 
20..C012Property680000.  .128.0 
21..C014Motor410000.  .81.0 

PROC APPEND

• Merges high-risk cases back into master data.
• Preserves fraud tracking history.
• Used in risk escalation pipelines.


14. PROC DATASETS – Cleanup

proc datasets library=work;

    delete claims_transposed;

quit;

LOG: 
NOTE: Deleting WORK.CLAIMS_TRANSPOSED (memtype=DATA).

PROC DATASETS DELETE

• Removes temporary working tables.
• Reduces memory and workspace clutter.
• Maintains production data hygiene.


15. WHAT THIS PROJECT TEACH US

This is exactly how insurance companies build:

• Fraud detection engines
• Claim risk scoring models
• Settlement decision systems
• Regulatory reports
• Financial risk dashboards

We practiced:

• Real SAS data engineering
• Real business rules
• Real fraud logic
• Real insurance workflows

This project alone is strong enough for:

• Insurance domain interviews
• Banking analytics roles
• SAS Programmer positions
• Risk & compliance jobs

16. CONCLUSION

This project demonstrated how SAS can be used to build a complete insurance claims fraud detection and risk scoring system. By combining DATA steps, PROC SQL, statistical procedures, and macros, we transformed raw claim data into actionable risk intelligence. The model identified high-risk claims, optimized settlement decisions, and improved fraud visibility. This approach mirrors how real insurance companies protect revenue and ensure regulatory compliance.



INTERVIEW QUESTIONS FOR YOU

1.What is the difference between INTCK and INTNX in SAS date handling?
2.How does PROC TRANSPOSE work and when is it used in reporting?
3.What is the role of PROC DATASETS compared to deleting datasets using a DATA step?

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent Insurance Fraud data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

SAS Programmer Interviews

SAS Programmer Job Seekers

SAS Analysts


Follow Us On : 


 


--->Follow our blog for more SAS-based analytics projects and industry data models.

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:







Comments

Popular posts from this blog

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study

383.Which Pharma Distributors Are High-Risk? Can SAS Detect Delays, Temperature Violations, and Fraud?