If fraud exists in laboratory experiments, would your SAS analytics detect it in time?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE WE USED THESE SAS STATEMENTS AND FUNCTIONS FOR THIS PROJECT:

1. Introduction

2. Business Context

3. Project Objectives

4. Dataset Design

5. Raw Data Creation (With Intentional Errors)

6. Identifying & Explaining Errors

7. Corrected Dataset – Full Code

8. Data Cleaning & Transformation

9. Utilization Classification Macro

10. Fraud Detection Macro

11. SQL Analysis

12. Statistical Analysis (MEANS, UNIVARIATE)

13. Correlation Study

14. Visualization (PROC SGPLOT)

15. Transpose & Append Operations

16. Dataset Cleanup

17. 5 Key Points About The Project

18. Summary

19. Conclusion

1. INTRODUCTION

Laboratory experiments are the backbone of pharmaceutical research, biotechnology innovation, chemical analysis, and material science validation.

Every experiment has:

· Duration

· Cost

· Error count

· Success rate

· Validation status

· Resource utilization

From an analytics perspective, we must answer:

· Which domain performs better?

· Which experiments show fraud-like patterns?

· Are long durations increasing cost?

· Does higher error count reduce success rate?

· Which experiments need re-validation?

2. BUSINESS CONTEXT

Imagine you are working as:

· Clinical Data Analyst

· Research Analytics Lead

· QA Validation Officer

· Lab Operations Manager

Your responsibilities:

· Monitor cost efficiency

· Identify suspicious experiments

· Validate success rate claims

· Track experiment timelines

· Improve operational utilization

Incorrect data may result in:

· Financial loss

· Regulatory failure

· Scientific misinterpretation

· Audit penalties

Therefore, data validation + fraud detection = critical.

3. PROJECT OBJECTIVES

1. Create 15+ experiment dataset
2. Introduce multiple intentional data errors
3. Clean & correct dataset
4. Classify utilization levels
5. Detect fraud patterns
6. Perform statistical analysis
7. Visualize trends
8. Explain each step clearly

4. DATASET DESIGN

Variables:

Variable	Type	Description
Experiment_ID	Char	Unique experiment code
Domain	Char	Pharma / Biotech / Chemical
Start_Date	Date	Experiment start
End_Date	Date	Experiment end
Duration_Days	Num	INTCK calculation
Cost	Num	Total experiment cost
Success_Rate	Num	% success
Error_Count	Num	Number of operational errors
Validation_Status	Char	Validated / Pending
Scientist_Name	Char	Lead scientist
Equipment_Utilization	Num	% usage
Fraud_Flag	Char	Derived flag

5. RAW DATA CREATION (WITH INTENTIONAL ERRORS)

data lab_raw;

length Experiment_ID $8 Domain $15 Validation_Status $12 Scientist_Name $20;

format Start_Date End_Date date9.;

input Experiment_ID $ Domain $ Start_Date :date9. End_Date :date9. Cost Success_Rate

Error_Count Validation_Status $ Scientist_Name $ Equipment_Utilization;

datalines;

EXP001 Pharma 01JAN2025 10JAN2025 50000 92 2 Validated Dr.Ravi 85

EXP002 Biotech 05JAN2025 02JAN2025 45000 105 3 Pending Dr.Sita 78

EXP003 Chemical 12JAN2025 20JAN2025 -20000 88 1 Validated Dr.Khan 90

EXP004 Pharma 15JAN2025 25JAN2025 60000 75 -2 Validated Dr.Ravi 88

EXP005 Biotech 20JAN2025 28JAN2025 52000 68 5 Pending Dr.Meera 110

EXP006 Chemical 01FEB2025 10FEB2025 48000 81 0 Validated Dr.Ajay 70

EXP007 Pharma 05FEB2025 18FEB2025 75000 95 2 Validated Dr.Ravi 92

EXP008 Biotech 10FEB2025 15FEB2025 30000 45 8 Pending Dr.Sita 65

EXP009 Chemical 12FEB2025 25FEB2025 90000 98 1 Validated Dr.Khan 97

EXP010 Pharma 20FEB2025 01MAR2025 100000 99 0 Validated Dr.Meera 99

EXP011 Biotech 22FEB2025 28FEB2025 55000 60 4 Pending Dr.Sita 75

EXP012 Chemical 01MAR2025 15MAR2025 65000 85 2 Validated Dr.Ajay 83

EXP013 Pharma 05MAR2025 20MAR2025 72000 88 3 Pending Dr.Ravi 89

EXP014 Biotech 10MAR2025 18MAR2025 58000 91 1 Validated Dr.Meera 82

EXP015 Chemical 12MAR2025 22MAR2025 61000 77 2 Validated Dr.Khan 84

;

run;

proc print data=lab_raw;

run;

OUTPUT:


Obs	Experiment_ID	Domain	Validation_Status	Scientist_Name	Start_Date	End_Date	Cost	Success_Rate	Error_Count	Equipment_Utilization
1	EXP001	Pharma	Validated	Dr.Ravi	01JAN2025	10JAN2025	50000	92	2	85
2	EXP002	Biotech	Pending	Dr.Sita	05JAN2025	02JAN2025	45000	105	3	78
3	EXP003	Chemical	Validated	Dr.Khan	12JAN2025	20JAN2025	-20000	88	1	90
4	EXP004	Pharma	Validated	Dr.Ravi	15JAN2025	25JAN2025	60000	75	-2	88
5	EXP005	Biotech	Pending	Dr.Meera	20JAN2025	28JAN2025	52000	68	5	110
6	EXP006	Chemical	Validated	Dr.Ajay	01FEB2025	10FEB2025	48000	81	0	70
7	EXP007	Pharma	Validated	Dr.Ravi	05FEB2025	18FEB2025	75000	95	2	92
8	EXP008	Biotech	Pending	Dr.Sita	10FEB2025	15FEB2025	30000	45	8	65
9	EXP009	Chemical	Validated	Dr.Khan	12FEB2025	25FEB2025	90000	98	1	97
10	EXP010	Pharma	Validated	Dr.Meera	20FEB2025	01MAR2025	100000	99	0	99
11	EXP011	Biotech	Pending	Dr.Sita	22FEB2025	28FEB2025	55000	60	4	75
12	EXP012	Chemical	Validated	Dr.Ajay	01MAR2025	15MAR2025	65000	85	2	83
13	EXP013	Pharma	Pending	Dr.Ravi	05MAR2025	20MAR2025	72000	88	3	89
14	EXP014	Biotech	Validated	Dr.Meera	10MAR2025	18MAR2025	58000	91	1	82
15	EXP015	Chemical	Validated	Dr.Khan	12MAR2025	22MAR2025	61000	77	2	84

· Dates may be reversed

· Percentages exceed logical bounds

· Negative values appear due to system bugs

· Data entry mistakes occur

· Missing or inconsistent formats exist

6. INTENTIONAL ERRORS IDENTIFIED

❌ Error 1

EXP002 End_Date < Start_Date

❌ Error 2

Success_Rate = 105 (cannot exceed 100)

❌ Error 3

Cost = -20000 (negative impossible)

❌ Error 4

Error_Count = -2

❌ Error 5

Equipment_Utilization = 110 (>100 impossible)

7. CORRECTED DATASET – FULL LENGTH CODE

data lab_clean;

set lab_raw;

Duration_Days = intck('day', Start_Date, End_Date);

if End_Date < Start_Date then do;

End_Date = intnx('day', Start_Date, 5);

end;

if Success_Rate > 100 then Success_Rate = 100;

if Success_Rate < 0 then Success_Rate = 0;

if Cost < 0 then Cost = abs(Cost);

if Error_Count < 0 then Error_Count = 0;

if Equipment_Utilization > 100 then Equipment_Utilization = 100;

Domain = propcase(strip(Domain));

Validation_Status = upcase(strip(Validation_Status));

Scientist_Name = propcase(Scientist_Name);

Experiment_Full = catx('-', Experiment_ID, Domain);

format Duration_Days 8.;

run;

proc print data=lab_clean;

run;

OUTPUT:


Obs	Experiment_ID	Domain	Validation_Status	Scientist_Name	Start_Date	End_Date	Cost	Success_Rate	Error_Count	Equipment_Utilization	Duration_Days	Experiment_Full
1	EXP001	Pharma	VALIDATED	Dr.Ravi	01JAN2025	10JAN2025	50000	92	2	85	9	EXP001-Pharma
2	EXP002	Biotech	PENDING	Dr.Sita	05JAN2025	10JAN2025	45000	100	3	78	-3	EXP002-Biotech
3	EXP003	Chemical	VALIDATED	Dr.Khan	12JAN2025	20JAN2025	20000	88	1	90	8	EXP003-Chemical
4	EXP004	Pharma	VALIDATED	Dr.Ravi	15JAN2025	25JAN2025	60000	75	0	88	10	EXP004-Pharma
5	EXP005	Biotech	PENDING	Dr.Meera	20JAN2025	28JAN2025	52000	68	5	100	8	EXP005-Biotech
6	EXP006	Chemical	VALIDATED	Dr.Ajay	01FEB2025	10FEB2025	48000	81	0	70	9	EXP006-Chemical
7	EXP007	Pharma	VALIDATED	Dr.Ravi	05FEB2025	18FEB2025	75000	95	2	92	13	EXP007-Pharma
8	EXP008	Biotech	PENDING	Dr.Sita	10FEB2025	15FEB2025	30000	45	8	65	5	EXP008-Biotech
9	EXP009	Chemical	VALIDATED	Dr.Khan	12FEB2025	25FEB2025	90000	98	1	97	13	EXP009-Chemical
10	EXP010	Pharma	VALIDATED	Dr.Meera	20FEB2025	01MAR2025	100000	99	0	99	9	EXP010-Pharma
11	EXP011	Biotech	PENDING	Dr.Sita	22FEB2025	28FEB2025	55000	60	4	75	6	EXP011-Biotech
12	EXP012	Chemical	VALIDATED	Dr.Ajay	01MAR2025	15MAR2025	65000	85	2	83	14	EXP012-Chemical
13	EXP013	Pharma	PENDING	Dr.Ravi	05MAR2025	20MAR2025	72000	88	3	89	15	EXP013-Pharma
14	EXP014	Biotech	VALIDATED	Dr.Meera	10MAR2025	18MAR2025	58000	91	1	82	8	EXP014-Biotech
15	EXP015	Chemical	VALIDATED	Dr.Khan	12MAR2025	22MAR2025	61000	77	2	84	10	EXP015-Chemical

· Date correction using INTNX

· Duration calculation using INTCK

· ABS() for negative cost

· Logical boundary correction for percentages

· Character normalization using STRIP and PROPCASE

· STRIP() → removes extra spaces

· PROPCASE( ) → proper capitalization

· UPCASE( ) → standard format

· LOWCASE( ) → lower conversion

· CAT( ) / CATX( ) → combine strings

ABS ( ) : Converts negative to positive.

8. UTILIZATION CLASSIFICATION MACRO

%macro utilization_flag;

data lab_clean;

set lab_clean;

length Utilization_Level $10;

if Equipment_Utilization >= 90 then Utilization_Level="High";

else if Equipment_Utilization >= 75 then Utilization_Level="Medium";

else Utilization_Level="Low";

run;

proc print data=lab_clean;

run;

%mend;

%utilization_flag;

OUTPUT:


Obs	Experiment_ID	Domain	Validation_Status	Scientist_Name	Start_Date	End_Date	Cost	Success_Rate	Error_Count	Equipment_Utilization	Duration_Days	Experiment_Full	Utilization_Level
1	EXP001	Pharma	VALIDATED	Dr.Ravi	01JAN2025	10JAN2025	50000	92	2	85	9	EXP001-Pharma	Medium
2	EXP002	Biotech	PENDING	Dr.Sita	05JAN2025	10JAN2025	45000	100	3	78	-3	EXP002-Biotech	Medium
3	EXP003	Chemical	VALIDATED	Dr.Khan	12JAN2025	20JAN2025	20000	88	1	90	8	EXP003-Chemical	High
4	EXP004	Pharma	VALIDATED	Dr.Ravi	15JAN2025	25JAN2025	60000	75	0	88	10	EXP004-Pharma	Medium
5	EXP005	Biotech	PENDING	Dr.Meera	20JAN2025	28JAN2025	52000	68	5	100	8	EXP005-Biotech	High
6	EXP006	Chemical	VALIDATED	Dr.Ajay	01FEB2025	10FEB2025	48000	81	0	70	9	EXP006-Chemical	Low
7	EXP007	Pharma	VALIDATED	Dr.Ravi	05FEB2025	18FEB2025	75000	95	2	92	13	EXP007-Pharma	High
8	EXP008	Biotech	PENDING	Dr.Sita	10FEB2025	15FEB2025	30000	45	8	65	5	EXP008-Biotech	Low
9	EXP009	Chemical	VALIDATED	Dr.Khan	12FEB2025	25FEB2025	90000	98	1	97	13	EXP009-Chemical	High
10	EXP010	Pharma	VALIDATED	Dr.Meera	20FEB2025	01MAR2025	100000	99	0	99	9	EXP010-Pharma	High
11	EXP011	Biotech	PENDING	Dr.Sita	22FEB2025	28FEB2025	55000	60	4	75	6	EXP011-Biotech	Medium
12	EXP012	Chemical	VALIDATED	Dr.Ajay	01MAR2025	15MAR2025	65000	85	2	83	14	EXP012-Chemical	Medium
13	EXP013	Pharma	PENDING	Dr.Ravi	05MAR2025	20MAR2025	72000	88	3	89	15	EXP013-Pharma	Medium
14	EXP014	Biotech	VALIDATED	Dr.Meera	10MAR2025	18MAR2025	58000	91	1	82	8	EXP014-Biotech	Medium
15	EXP015	Chemical	VALIDATED	Dr.Khan	12MAR2025	22MAR2025	61000	77	2	84	10	EXP015-Chemical	Medium

· High (>=90%)

· Medium (75–89%)

· Low (<75%)

Macro automates classification.

· Code reuse

· Scalability

· Automation

· Efficiency

9. FRAUD DETECTION MACRO

%macro fraud_check;

data fraud_check;

set lab_clean;

length Fraud_Flag $12;

if Success_Rate=100 and Error_Count>2 then Fraud_Flag="Suspicious";

else if Cost>90000 and Duration_Days<5 then Fraud_Flag="Suspicious";

else Fraud_Flag="Normal";

run;

proc print data=fraud_check;

run;

%mend;

%fraud_check;

OUTPUT:


Obs	Experiment_ID	Domain	Validation_Status	Scientist_Name	Start_Date	End_Date	Cost	Success_Rate	Error_Count	Equipment_Utilization	Duration_Days	Experiment_Full	Utilization_Level	Fraud_Flag
1	EXP001	Pharma	VALIDATED	Dr.Ravi	01JAN2025	10JAN2025	50000	92	2	85	9	EXP001-Pharma	Medium	Normal
2	EXP002	Biotech	PENDING	Dr.Sita	05JAN2025	10JAN2025	45000	100	3	78	-3	EXP002-Biotech	Medium	Suspicious
3	EXP003	Chemical	VALIDATED	Dr.Khan	12JAN2025	20JAN2025	20000	88	1	90	8	EXP003-Chemical	High	Normal
4	EXP004	Pharma	VALIDATED	Dr.Ravi	15JAN2025	25JAN2025	60000	75	0	88	10	EXP004-Pharma	Medium	Normal
5	EXP005	Biotech	PENDING	Dr.Meera	20JAN2025	28JAN2025	52000	68	5	100	8	EXP005-Biotech	High	Normal
6	EXP006	Chemical	VALIDATED	Dr.Ajay	01FEB2025	10FEB2025	48000	81	0	70	9	EXP006-Chemical	Low	Normal
7	EXP007	Pharma	VALIDATED	Dr.Ravi	05FEB2025	18FEB2025	75000	95	2	92	13	EXP007-Pharma	High	Normal
8	EXP008	Biotech	PENDING	Dr.Sita	10FEB2025	15FEB2025	30000	45	8	65	5	EXP008-Biotech	Low	Normal
9	EXP009	Chemical	VALIDATED	Dr.Khan	12FEB2025	25FEB2025	90000	98	1	97	13	EXP009-Chemical	High	Normal
10	EXP010	Pharma	VALIDATED	Dr.Meera	20FEB2025	01MAR2025	100000	99	0	99	9	EXP010-Pharma	High	Normal
11	EXP011	Biotech	PENDING	Dr.Sita	22FEB2025	28FEB2025	55000	60	4	75	6	EXP011-Biotech	Medium	Normal
12	EXP012	Chemical	VALIDATED	Dr.Ajay	01MAR2025	15MAR2025	65000	85	2	83	14	EXP012-Chemical	Medium	Normal
13	EXP013	Pharma	PENDING	Dr.Ravi	05MAR2025	20MAR2025	72000	88	3	89	15	EXP013-Pharma	Medium	Normal
14	EXP014	Biotech	VALIDATED	Dr.Meera	10MAR2025	18MAR2025	58000	91	1	82	8	EXP014-Biotech	Medium	Normal
15	EXP015	Chemical	VALIDATED	Dr.Khan	12MAR2025	22MAR2025	61000	77	2	84	10	EXP015-Chemical	Medium	Normal

Fraud indicators used:

· 100% success with high errors

· High cost with extremely short duration

These patterns are statistically suspicious.

10. PROC SQL ANALYSIS

proc sql;

create table domain_summary as

select Domain,

count(*) as Total_Experiments,

avg(Cost) as Avg_Cost,

avg(Success_Rate) as Avg_Success

from fraud_check

group by Domain;

quit;

proc print data=domain_summary;

run;

OUTPUT:


Obs	Domain	Total_Experiments	Avg_Cost	Avg_Success
1	Biotech	5	48000	72.8
2	Chemical	5	56800	85.8
3	Pharma	5	71400	89.8

SQL allows:

· Aggregation

· Filtering

· Grouping

· Joining

11. PROC MEANS

proc means data=fraud_check mean median min max;

var Cost Success_Rate Duration_Days Error_Count;

run;

OUTPUT:

The MEANS Procedure


Variable	Mean	Median	Minimum	Maximum
Cost Success_Rate Duration_Days Error_Count	58733.33 82.8000000 8.9333333 2.2666667	58000.00 88.0000000 9.0000000 2.0000000	20000.00 45.0000000 -3.0000000 0	100000.00 100.0000000 15.0000000 8.0000000

Provides:

· Mean

· Median

· Minimum

· Maximum

· Standard deviation

12. PROC UNIVARIATE

proc univariate data=fraud_check;

var Cost;

histogram Cost;

run;

OUTPUT:

The UNIVARIATE Procedure

Variable: Cost


Moments
N	15	Sum Weights	15
Mean	58733.3333	Sum Observations	881000
Std Deviation	20585.9406	Variance	423780952
Skewness	0.19892832	Kurtosis	0.48197115
Uncorrected SS	5.7677E10	Corrected SS	5932933333
Coeff Variation	35.0498422	Std Error Mean	5315.26702


Basic Statistical Measures
Location		Variability
Mean	58733.33	Std Deviation	20586
Median	58000.00	Variance	423780952
Mode	.	Range	80000
		Interquartile Range	24000


Tests for Location: Mu0=0
Test	Statistic		p Value
Student's t	t	11.04993	Pr > \|t\|	<.0001
Sign	M	7.5	Pr >= \|M\|	<.0001
Signed Rank	S	60	Pr >= \|S\|	<.0001


Quantiles (Definition 5)
Level	Quantile
100% Max	100000
99%	100000
95%	100000
90%	90000
75% Q3	72000
50% Median	58000
25% Q1	48000
10%	30000
5%	20000
1%	20000
0% Min	20000


Extreme Observations
Lowest		Highest
Value	Obs	Value	Obs
20000	3	65000	12
30000	8	72000	13
45000	2	75000	7
48000	6	90000	9
50000	1	100000	10

The UNIVARIATE Procedure

Gives:

· Skewness

· Kurtosis

· Histogram

· Normality tests

13. PROC FREQ

proc freq data=fraud_check;

tables Domain*Fraud_Flag / chisq;

run;

OUTPUT:

The FREQ Procedure

Frequency Percent Row Pct Col Pct


Table of Domain by Fraud_Flag
Domain	Fraud_Flag
Domain	Normal	Suspicious	Total
Biotech	4 26.67 80.00 28.57	1 6.67 20.00 100.00	5 33.33
Chemical	5 33.33 100.00 35.71	0 0.00 0.00 0.00	5 33.33
Pharma	5 33.33 100.00 35.71	0 0.00 0.00 0.00	5 33.33
Total	14 93.33	1 6.67	15 100.00

Statistics for Table of Domain by Fraud_Flag


Statistic	DF	Value	Prob
WARNING: 100% of the cells have expected counts less than 5. Chi-Square may not be a valid test.
Chi-Square	2	2.1429	0.3425
Likelihood Ratio Chi-Square	2	2.3439	0.3098
Mantel-Haenszel Chi-Square	1	1.5000	0.2207
Phi Coefficient		0.3780
Contingency Coefficient		0.3536
Cramer's V		0.3780

Sample Size = 15

Used for:

· Domain distribution

· Fraud flag counts

· Validation status breakdown

With CHISQ option:

We test statistical association between domain and fraud.

14. PROC CORR

proc corr data=fraud_check;

var Cost Duration_Days Success_Rate Error_Count;

run;

OUTPUT:

The CORR Procedure


4 Variables:	Cost Duration_Days Success_Rate Error_Count


Simple Statistics
Variable	N	Mean	Std Dev	Sum	Minimum	Maximum
Cost	15	58733	20586	881000	20000	100000
Duration_Days	15	8.93333	4.36654	134.00000	-3.00000	15.00000
Success_Rate	15	82.80000	15.61684	1242	45.00000	100.00000
Error_Count	15	2.26667	2.15362	34.00000	0	8.00000


Pearson Correlation Coefficients, N = 15 Prob > \|r\| under H0: Rho=0
	Cost	Duration_Days	Success_Rate	Error_Count
Cost	1.00000	0.50994 0.0521	0.48173 0.0690	-0.41234 0.1267
Duration_Days	0.50994 0.0521	1.00000	0.16005 0.5688	-0.32459 0.2379
Success_Rate	0.48173 0.0690	0.16005 0.5688	1.00000	-0.71401 0.0028
Error_Count	-0.41234 0.1267	-0.32459 0.2379	-0.71401 0.0028	1.00000

Checks linear relationships:

· Duration vs Cost

· Error_Count vs Success_Rate

If correlation is strong:

We understand dependency.

15. PROC SGPLOT

proc sgplot data=fraud_check;

scatter x=Duration_Days y=Cost;

run;

OUTPUT:

Scatter plot:

Duration vs Cost.

Visual pattern reveals:

· Linear trend

· Outliers

· Clusters

Executives understand visuals faster than tables.

16. PROC TRANSPOSE

proc transpose data=fraud_check out=cost_transpose;

var Cost Success_Rate;

id Experiment_ID;

run;

proc print data=cost_transpose;

run;

OUTPUT:


Obs	_NAME_	EXP001	EXP002	EXP003	EXP004	EXP005	EXP006	EXP007	EXP008	EXP009	EXP010	EXP011	EXP012	EXP013	EXP014	EXP015
1	Cost	50000	45000	20000	60000	52000	48000	75000	30000	90000	100000	55000	65000	72000	58000	61000
2	Success_Rate	92	100	88	75	68	81	95	45	98	99	60	85	88	91	77

Converts rows to columns.

17. APPEND

data additional;

set fraud_check(obs=3);

Cost = Cost + 5000;

run;

proc print data=additional;

run;

OUTPUT:


Obs	Experiment_ID	Domain	Validation_Status	Scientist_Name	Start_Date	End_Date	Cost	Success_Rate	Error_Count	Equipment_Utilization	Duration_Days	Experiment_Full	Utilization_Level	Fraud_Flag
1	EXP001	Pharma	VALIDATED	Dr.Ravi	01JAN2025	10JAN2025	55000	92	2	85	9	EXP001-Pharma	Medium	Normal
2	EXP002	Biotech	PENDING	Dr.Sita	05JAN2025	10JAN2025	50000	100	3	78	-3	EXP002-Biotech	Medium	Suspicious
3	EXP003	Chemical	VALIDATED	Dr.Khan	12JAN2025	20JAN2025	25000	88	1	90	8	EXP003-Chemical	High	Normal

proc append base=fraud_check

data=additional force;

run;

proc print data=fraud_check;

run;

OUTPUT:


Obs	Experiment_ID	Domain	Validation_Status	Scientist_Name	Start_Date	End_Date	Cost	Success_Rate	Error_Count	Equipment_Utilization	Duration_Days	Experiment_Full	Utilization_Level	Fraud_Flag
1	EXP001	Pharma	VALIDATED	Dr.Ravi	01JAN2025	10JAN2025	50000	92	2	85	9	EXP001-Pharma	Medium	Normal
2	EXP002	Biotech	PENDING	Dr.Sita	05JAN2025	10JAN2025	45000	100	3	78	-3	EXP002-Biotech	Medium	Suspicious
3	EXP003	Chemical	VALIDATED	Dr.Khan	12JAN2025	20JAN2025	20000	88	1	90	8	EXP003-Chemical	High	Normal
4	EXP004	Pharma	VALIDATED	Dr.Ravi	15JAN2025	25JAN2025	60000	75	0	88	10	EXP004-Pharma	Medium	Normal
5	EXP005	Biotech	PENDING	Dr.Meera	20JAN2025	28JAN2025	52000	68	5	100	8	EXP005-Biotech	High	Normal
6	EXP006	Chemical	VALIDATED	Dr.Ajay	01FEB2025	10FEB2025	48000	81	0	70	9	EXP006-Chemical	Low	Normal
7	EXP007	Pharma	VALIDATED	Dr.Ravi	05FEB2025	18FEB2025	75000	95	2	92	13	EXP007-Pharma	High	Normal
8	EXP008	Biotech	PENDING	Dr.Sita	10FEB2025	15FEB2025	30000	45	8	65	5	EXP008-Biotech	Low	Normal
9	EXP009	Chemical	VALIDATED	Dr.Khan	12FEB2025	25FEB2025	90000	98	1	97	13	EXP009-Chemical	High	Normal
10	EXP010	Pharma	VALIDATED	Dr.Meera	20FEB2025	01MAR2025	100000	99	0	99	9	EXP010-Pharma	High	Normal
11	EXP011	Biotech	PENDING	Dr.Sita	22FEB2025	28FEB2025	55000	60	4	75	6	EXP011-Biotech	Medium	Normal
12	EXP012	Chemical	VALIDATED	Dr.Ajay	01MAR2025	15MAR2025	65000	85	2	83	14	EXP012-Chemical	Medium	Normal
13	EXP013	Pharma	PENDING	Dr.Ravi	05MAR2025	20MAR2025	72000	88	3	89	15	EXP013-Pharma	Medium	Normal
14	EXP014	Biotech	VALIDATED	Dr.Meera	10MAR2025	18MAR2025	58000	91	1	82	8	EXP014-Biotech	Medium	Normal
15	EXP015	Chemical	VALIDATED	Dr.Khan	12MAR2025	22MAR2025	61000	77	2	84	10	EXP015-Chemical	Medium	Normal
16	EXP001	Pharma	VALIDATED	Dr.Ravi	01JAN2025	10JAN2025	55000	92	2	85	9	EXP001-Pharma	Medium	Normal
17	EXP002	Biotech	PENDING	Dr.Sita	05JAN2025	10JAN2025	50000	100	3	78	-3	EXP002-Biotech	Medium	Suspicious
18	EXP003	Chemical	VALIDATED	Dr.Khan	12JAN2025	20JAN2025	25000	88	1	90	8	EXP003-Chemical	High	Normal

Adds new records to base dataset.

Used when incremental data arrives.

18. PROC DATASETS DELETE

proc datasets lib=work;

delete lab_clean fraud_check ;

run;

LOG:

 NOTE: Deleting WORK.LAB_CLEAN (memtype=DATA).

 NOTE: Deleting WORK.FRAUD_CHECK (memtype=DATA).

Deletes unwanted datasets.

Keeps WORK library clean.

Important for:

· Memory optimization

· Preventing dataset confusion

· Maintaining reproducibility

19. 5 Key Points About the Project

1. Comprehensive Data Cleaning & Validation
Implemented logical checks for dates, percentages, negative values, and utilization limits to ensure data integrity.

2. Automation Using Macros
Built reusable macros for utilization classification and fraud detection, improving scalability and efficiency.

3. Advanced Statistical Analysis
Applied PROC MEANS, PROC UNIVARIATE, and PROC CORR to understand distribution, variability, and relationships between cost, duration, and success rate.

4. Business Intelligence via PROC SQL
Generated domain-level summaries and performance metrics to support management decision-making.

5. Fraud & Risk Analytics Framework
Designed rule-based detection logic to flag suspicious experiments based on cost, duration, and error patterns.

20. Project Summary

Laboratory experiment performance monitoring, data validation, utilization optimization, and fraud analytics using structured SAS programming.

21. Conclusion

This project demonstrates a complete analytical lifecycle for laboratory experiment performance monitoring using SAS. Starting from raw data containing intentional logical errors, we applied systematic validation techniques to correct inconsistencies in dates, percentages, costs, and error counts. Through structured DATA step processing, SQL aggregation, statistical procedures, and macro automation, the dataset was transformed into a reliable analytical model.

The utilization classification macro provided operational insights into equipment efficiency, while the fraud detection logic identified suspicious experiment patterns based on abnormal success rates and cost-duration inconsistencies. Statistical procedures such as PROC MEANS, PROC UNIVARIATE, and PROC CORR enabled deeper understanding of cost distribution and variable relationships.

Overall, the project reflects real-world industry practices in pharmaceutical and research environments, combining data quality control, operational intelligence, and risk analytics into a single integrated SAS-based framework suitable for audit, regulatory review, and strategic decision-making.

SAS INTERVIEW QUESTIONS

1.Why use INTCK vs INTNX?

2.How do macros help operational classification?

3.How do you detect data quality issues in SAS?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.

Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent LABORATORY data.

Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.

This project is suitable for:
·  Students learning SAS
·  Data analysts building portfolios
·  Professionals preparing for SAS interviews
·  Bloggers writing about analytics and smart cities
· Clinical SAS Programmer
· Research Data Analyst
· Quality Control Officer
· Regulatory Data Validator

· Lab Operations Analyst
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On :

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.Can DATA STEP and PROC SQL uncover which Virtual Reality (VR) applications are truly driving user engagement and revenue growth?

2.Can PROC SQL and PROC SORT uncover which space inventions truly transformed global technology and innovation?

3.How does PROC FORMAT transform raw Indian music data into meaningful cultural categories for deeper analytics?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact | Privacy Policy

Search This Blog

SAS Learning Hub

406.If fraud exists in laboratory experiments, would your SAS analytics detect it in time?

If fraud exists in laboratory experiments, would your SAS analytics detect it in time?

TABLE OF CONTENTS

1. INTRODUCTION

2. BUSINESS CONTEXT

3. PROJECT OBJECTIVES

4. DATASET DESIGN

ABS ( ) : Converts negative to positive.

19. 5 Key Points About the Project

20. Project Summary

Follow Us On :

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

About Us | Contact | Privacy Policy

Comments

Post a Comment

Popular posts from this blog

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study

383.Which Pharma Distributors Are High-Risk? Can SAS Detect Delays, Temperature Violations, and Fraud?