406.If fraud exists in laboratory experiments, would your SAS analytics detect it in time?

If fraud exists in laboratory experiments, would your SAS analytics detect it in time?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE WE USED THESE SAS STATEMENTS AND FUNCTIONS FOR THIS PROJECT:

DATA STEP | SET | INPUT | DATALINES | LENGTH | FORMAT | IF–THEN–ELSE | DO–END | INTCK | INTNX | MDY | ABS | ROUND | SUM | COALESCE | STRIP | TRIM | CAT | CATX | PROPCASE | UPCASE | LOWCASE | MERGE | PROC DATASETS | DELETE | PROC SQL | CREATE TABLE | SELECT | GROUP BY | PROC MEANS | PROC UNIVARIATE | PROC FREQ | PROC CORR | PROC SGPLOT | PROC TRANSPOSE | PROC APPEND | MACRO | %MACRO | %MEND

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

TABLE OF CONTENTS

1.     Introduction

2.     Business Context

3.     Project Objectives

4.     Dataset Design

5.     Raw Data Creation (With Intentional Errors)

6.     Identifying & Explaining Errors

7.     Corrected Dataset – Full Code

8.     Data Cleaning & Transformation

9.     Utilization Classification Macro

10.  Fraud Detection Macro

11.  SQL Analysis

12.  Statistical Analysis (MEANS, UNIVARIATE)

13.  Correlation Study

14.  Visualization (PROC SGPLOT)

15.  Transpose & Append Operations

16.  Dataset Cleanup

17.  5 Key Points About The Project

18. Summary

19. Conclusion

1. INTRODUCTION

Laboratory experiments are the backbone of pharmaceutical research, biotechnology innovation, chemical analysis, and material science validation.

Every experiment has:

·       Duration

·       Cost

·       Error count

·       Success rate

·       Validation status

·       Resource utilization

From an analytics perspective, we must answer:

·       Which domain performs better?

·       Which experiments show fraud-like patterns?

·       Are long durations increasing cost?

·       Does higher error count reduce success rate?

·       Which experiments need re-validation?

2. BUSINESS CONTEXT

Imagine you are working as:

·       Clinical Data Analyst

·       Research Analytics Lead

·       QA Validation Officer

·       Lab Operations Manager

Your responsibilities:

·       Monitor cost efficiency

·       Identify suspicious experiments

·       Validate success rate claims

·       Track experiment timelines

·       Improve operational utilization

Incorrect data may result in:

·       Financial loss

·       Regulatory failure

·       Scientific misinterpretation

·       Audit penalties

Therefore, data validation + fraud detection = critical.

3. PROJECT OBJECTIVES

1. Create 15+ experiment dataset
2. Introduce multiple intentional data errors
3. Clean & correct dataset
4. Classify utilization levels
5. Detect fraud patterns
6. Perform statistical analysis
7. Visualize trends
8. Explain each step clearly

4. DATASET DESIGN

Variables:

Variable

Type

Description

Experiment_ID

Char

Unique experiment code

Domain

Char

Pharma / Biotech / Chemical

Start_Date

Date

Experiment start

End_Date

Date

Experiment end

Duration_Days

Num

INTCK calculation

Cost

Num

Total experiment cost

Success_Rate

Num

% success

Error_Count

Num

Number of operational errors

Validation_Status

Char

Validated / Pending

Scientist_Name

Char

Lead scientist

Equipment_Utilization

Num

% usage

Fraud_Flag

Char

Derived flag

5. RAW DATA CREATION (WITH INTENTIONAL ERRORS)

data lab_raw;

length Experiment_ID $8 Domain $15 Validation_Status $12 Scientist_Name $20;

format Start_Date End_Date date9.;

input Experiment_ID $ Domain $ Start_Date :date9. End_Date :date9. Cost Success_Rate

       Error_Count Validation_Status $ Scientist_Name $ Equipment_Utilization;

datalines;

EXP001 Pharma 01JAN2025 10JAN2025 50000 92 2 Validated Dr.Ravi 85

EXP002 Biotech 05JAN2025 02JAN2025 45000 105 3 Pending Dr.Sita 78

EXP003 Chemical 12JAN2025 20JAN2025 -20000 88 1 Validated Dr.Khan 90

EXP004 Pharma 15JAN2025 25JAN2025 60000 75 -2 Validated Dr.Ravi 88

EXP005 Biotech 20JAN2025 28JAN2025 52000 68 5 Pending Dr.Meera 110

EXP006 Chemical 01FEB2025 10FEB2025 48000 81 0 Validated Dr.Ajay 70

EXP007 Pharma 05FEB2025 18FEB2025 75000 95 2 Validated Dr.Ravi 92

EXP008 Biotech 10FEB2025 15FEB2025 30000 45 8 Pending Dr.Sita 65

EXP009 Chemical 12FEB2025 25FEB2025 90000 98 1 Validated Dr.Khan 97

EXP010 Pharma 20FEB2025 01MAR2025 100000 99 0 Validated Dr.Meera 99

EXP011 Biotech 22FEB2025 28FEB2025 55000 60 4 Pending Dr.Sita 75

EXP012 Chemical 01MAR2025 15MAR2025 65000 85 2 Validated Dr.Ajay 83

EXP013 Pharma 05MAR2025 20MAR2025 72000 88 3 Pending Dr.Ravi 89

EXP014 Biotech 10MAR2025 18MAR2025 58000 91 1 Validated Dr.Meera 82

EXP015 Chemical 12MAR2025 22MAR2025 61000 77 2 Validated Dr.Khan 84

;

run;

proc print data=lab_raw;

run;

OUTPUT:

ObsExperiment_IDDomainValidation_StatusScientist_NameStart_DateEnd_DateCostSuccess_RateError_CountEquipment_Utilization
1EXP001PharmaValidatedDr.Ravi01JAN202510JAN20255000092285
2EXP002BiotechPendingDr.Sita05JAN202502JAN202545000105378
3EXP003ChemicalValidatedDr.Khan12JAN202520JAN2025-2000088190
4EXP004PharmaValidatedDr.Ravi15JAN202525JAN20256000075-288
5EXP005BiotechPendingDr.Meera20JAN202528JAN202552000685110
6EXP006ChemicalValidatedDr.Ajay01FEB202510FEB20254800081070
7EXP007PharmaValidatedDr.Ravi05FEB202518FEB20257500095292
8EXP008BiotechPendingDr.Sita10FEB202515FEB20253000045865
9EXP009ChemicalValidatedDr.Khan12FEB202525FEB20259000098197
10EXP010PharmaValidatedDr.Meera20FEB202501MAR202510000099099
11EXP011BiotechPendingDr.Sita22FEB202528FEB20255500060475
12EXP012ChemicalValidatedDr.Ajay01MAR202515MAR20256500085283
13EXP013PharmaPendingDr.Ravi05MAR202520MAR20257200088389
14EXP014BiotechValidatedDr.Meera10MAR202518MAR20255800091182
15EXP015ChemicalValidatedDr.Khan12MAR202522MAR20256100077284

·  Dates may be reversed

·  Percentages exceed logical bounds

·  Negative values appear due to system bugs

·  Data entry mistakes occur

·  Missing or inconsistent formats exist

6. INTENTIONAL ERRORS IDENTIFIED

 Error 1

EXP002 End_Date < Start_Date

❌ Error 2

Success_Rate = 105 (cannot exceed 100)

❌ Error 3

Cost = -20000 (negative impossible)

❌ Error 4

Error_Count = -2

❌ Error 5

Equipment_Utilization = 110 (>100 impossible)

7. CORRECTED DATASET – FULL LENGTH CODE

data lab_clean;

set lab_raw;


Duration_Days = intck('day', Start_Date, End_Date);

if End_Date < Start_Date then do;

    End_Date = intnx('day', Start_Date, 5);

end;

if Success_Rate > 100 then Success_Rate = 100;

if Success_Rate < 0 then Success_Rate = 0;

if Cost < 0 then Cost = abs(Cost);

if Error_Count < 0 then Error_Count = 0;

if Equipment_Utilization > 100 then Equipment_Utilization = 100;


Domain = propcase(strip(Domain));

Validation_Status = upcase(strip(Validation_Status));

Scientist_Name = propcase(Scientist_Name);

Experiment_Full = catx('-', Experiment_ID, Domain);

format Duration_Days 8.;

run;

proc print data=lab_clean;

run;

OUTPUT:

ObsExperiment_IDDomainValidation_StatusScientist_NameStart_DateEnd_DateCostSuccess_RateError_CountEquipment_UtilizationDuration_DaysExperiment_Full
1EXP001PharmaVALIDATEDDr.Ravi01JAN202510JAN202550000922859EXP001-Pharma
2EXP002BiotechPENDINGDr.Sita05JAN202510JAN202545000100378-3EXP002-Biotech
3EXP003ChemicalVALIDATEDDr.Khan12JAN202520JAN202520000881908EXP003-Chemical
4EXP004PharmaVALIDATEDDr.Ravi15JAN202525JAN2025600007508810EXP004-Pharma
5EXP005BiotechPENDINGDr.Meera20JAN202528JAN2025520006851008EXP005-Biotech
6EXP006ChemicalVALIDATEDDr.Ajay01FEB202510FEB202548000810709EXP006-Chemical
7EXP007PharmaVALIDATEDDr.Ravi05FEB202518FEB2025750009529213EXP007-Pharma
8EXP008BiotechPENDINGDr.Sita10FEB202515FEB202530000458655EXP008-Biotech
9EXP009ChemicalVALIDATEDDr.Khan12FEB202525FEB2025900009819713EXP009-Chemical
10EXP010PharmaVALIDATEDDr.Meera20FEB202501MAR2025100000990999EXP010-Pharma
11EXP011BiotechPENDINGDr.Sita22FEB202528FEB202555000604756EXP011-Biotech
12EXP012ChemicalVALIDATEDDr.Ajay01MAR202515MAR2025650008528314EXP012-Chemical
13EXP013PharmaPENDINGDr.Ravi05MAR202520MAR2025720008838915EXP013-Pharma
14EXP014BiotechVALIDATEDDr.Meera10MAR202518MAR202558000911828EXP014-Biotech
15EXP015ChemicalVALIDATEDDr.Khan12MAR202522MAR2025610007728410EXP015-Chemical

·  Date correction using INTNX

·  Duration calculation using INTCK

·  ABS() for negative cost

·  Logical boundary correction for percentages

·  Character normalization using STRIP and PROPCASE

·  STRIP() → removes extra spaces

·  PROPCASE( ) → proper capitalization

·  UPCASE( ) → standard format

·  LOWCASE( ) → lower conversion

·  CAT( ) / CATX( ) → combine strings

ABS ( ) : Converts negative to positive.

8. UTILIZATION CLASSIFICATION MACRO

%macro utilization_flag;

data lab_clean;

set lab_clean;

length Utilization_Level $10;

if Equipment_Utilization >= 90 then Utilization_Level="High";

else if Equipment_Utilization >= 75 then Utilization_Level="Medium";

else Utilization_Level="Low";

run;

proc print data=lab_clean;

run;

%mend;


%utilization_flag;

OUTPUT:

ObsExperiment_IDDomainValidation_StatusScientist_NameStart_DateEnd_DateCostSuccess_RateError_CountEquipment_UtilizationDuration_DaysExperiment_FullUtilization_Level
1EXP001PharmaVALIDATEDDr.Ravi01JAN202510JAN202550000922859EXP001-PharmaMedium
2EXP002BiotechPENDINGDr.Sita05JAN202510JAN202545000100378-3EXP002-BiotechMedium
3EXP003ChemicalVALIDATEDDr.Khan12JAN202520JAN202520000881908EXP003-ChemicalHigh
4EXP004PharmaVALIDATEDDr.Ravi15JAN202525JAN2025600007508810EXP004-PharmaMedium
5EXP005BiotechPENDINGDr.Meera20JAN202528JAN2025520006851008EXP005-BiotechHigh
6EXP006ChemicalVALIDATEDDr.Ajay01FEB202510FEB202548000810709EXP006-ChemicalLow
7EXP007PharmaVALIDATEDDr.Ravi05FEB202518FEB2025750009529213EXP007-PharmaHigh
8EXP008BiotechPENDINGDr.Sita10FEB202515FEB202530000458655EXP008-BiotechLow
9EXP009ChemicalVALIDATEDDr.Khan12FEB202525FEB2025900009819713EXP009-ChemicalHigh
10EXP010PharmaVALIDATEDDr.Meera20FEB202501MAR2025100000990999EXP010-PharmaHigh
11EXP011BiotechPENDINGDr.Sita22FEB202528FEB202555000604756EXP011-BiotechMedium
12EXP012ChemicalVALIDATEDDr.Ajay01MAR202515MAR2025650008528314EXP012-ChemicalMedium
13EXP013PharmaPENDINGDr.Ravi05MAR202520MAR2025720008838915EXP013-PharmaMedium
14EXP014BiotechVALIDATEDDr.Meera10MAR202518MAR202558000911828EXP014-BiotechMedium
15EXP015ChemicalVALIDATEDDr.Khan12MAR202522MAR2025610007728410EXP015-ChemicalMedium

·  High (>=90%)

·  Medium (75–89%)

·  Low (<75%)

Macro automates classification.

·  Code reuse

·  Scalability

·  Automation

·  Efficiency

9. FRAUD DETECTION MACRO

%macro fraud_check;

data fraud_check;

set lab_clean;

length Fraud_Flag $12;

if Success_Rate=100 and Error_Count>2 then Fraud_Flag="Suspicious";

else if Cost>90000 and Duration_Days<5 then Fraud_Flag="Suspicious";

else Fraud_Flag="Normal";

run;

proc print data=fraud_check;

run;

%mend;


%fraud_check;

OUTPUT:

ObsExperiment_IDDomainValidation_StatusScientist_NameStart_DateEnd_DateCostSuccess_RateError_CountEquipment_UtilizationDuration_DaysExperiment_FullUtilization_LevelFraud_Flag
1EXP001PharmaVALIDATEDDr.Ravi01JAN202510JAN202550000922859EXP001-PharmaMediumNormal
2EXP002BiotechPENDINGDr.Sita05JAN202510JAN202545000100378-3EXP002-BiotechMediumSuspicious
3EXP003ChemicalVALIDATEDDr.Khan12JAN202520JAN202520000881908EXP003-ChemicalHighNormal
4EXP004PharmaVALIDATEDDr.Ravi15JAN202525JAN2025600007508810EXP004-PharmaMediumNormal
5EXP005BiotechPENDINGDr.Meera20JAN202528JAN2025520006851008EXP005-BiotechHighNormal
6EXP006ChemicalVALIDATEDDr.Ajay01FEB202510FEB202548000810709EXP006-ChemicalLowNormal
7EXP007PharmaVALIDATEDDr.Ravi05FEB202518FEB2025750009529213EXP007-PharmaHighNormal
8EXP008BiotechPENDINGDr.Sita10FEB202515FEB202530000458655EXP008-BiotechLowNormal
9EXP009ChemicalVALIDATEDDr.Khan12FEB202525FEB2025900009819713EXP009-ChemicalHighNormal
10EXP010PharmaVALIDATEDDr.Meera20FEB202501MAR2025100000990999EXP010-PharmaHighNormal
11EXP011BiotechPENDINGDr.Sita22FEB202528FEB202555000604756EXP011-BiotechMediumNormal
12EXP012ChemicalVALIDATEDDr.Ajay01MAR202515MAR2025650008528314EXP012-ChemicalMediumNormal
13EXP013PharmaPENDINGDr.Ravi05MAR202520MAR2025720008838915EXP013-PharmaMediumNormal
14EXP014BiotechVALIDATEDDr.Meera10MAR202518MAR202558000911828EXP014-BiotechMediumNormal
15EXP015ChemicalVALIDATEDDr.Khan12MAR202522MAR2025610007728410EXP015-ChemicalMediumNormal

Fraud indicators used:

·  100% success with high errors

·  High cost with extremely short duration

These patterns are statistically suspicious.

10. PROC SQL ANALYSIS

proc sql;

create table domain_summary as

select Domain,

       count(*) as Total_Experiments,

       avg(Cost) as Avg_Cost,

       avg(Success_Rate) as Avg_Success

from fraud_check

group by Domain;

quit;

proc print data=domain_summary;

run;

OUTPUT:

ObsDomainTotal_ExperimentsAvg_CostAvg_Success
1Biotech54800072.8
2Chemical55680085.8
3Pharma57140089.8

SQL allows:

·  Aggregation

·  Filtering

·  Grouping

·  Joining

11. PROC MEANS

proc means data=fraud_check mean median min max;

   var Cost Success_Rate Duration_Days Error_Count;

run;

OUTPUT:

The MEANS Procedure

VariableMeanMedianMinimumMaximum
Cost
Success_Rate
Duration_Days
Error_Count
58733.33
82.8000000
8.9333333
2.2666667
58000.00
88.0000000
9.0000000
2.0000000
20000.00
45.0000000
-3.0000000
0
100000.00
100.0000000
15.0000000
8.0000000

Provides:

·  Mean

·  Median

·  Minimum

·  Maximum

·  Standard deviation

12. PROC UNIVARIATE

proc univariate data=fraud_check;

  var Cost;

  histogram Cost;

run;

OUTPUT:

The UNIVARIATE Procedure

Variable: Cost

Moments
N15Sum Weights15
Mean58733.3333Sum Observations881000
Std Deviation20585.9406Variance423780952
Skewness0.19892832Kurtosis0.48197115
Uncorrected SS5.7677E10Corrected SS5932933333
Coeff Variation35.0498422Std Error Mean5315.26702
Basic Statistical Measures
LocationVariability
Mean58733.33Std Deviation20586
Median58000.00Variance423780952
Mode.Range80000
  Interquartile Range24000
Tests for Location: Mu0=0
TestStatisticp Value
Student's tt11.04993Pr > |t|<.0001
SignM7.5Pr >= |M|<.0001
Signed RankS60Pr >= |S|<.0001
Quantiles (Definition 5)
LevelQuantile
100% Max100000
99%100000
95%100000
90%90000
75% Q372000
50% Median58000
25% Q148000
10%30000
5%20000
1%20000
0% Min20000
Extreme Observations
LowestHighest
ValueObsValueObs
2000036500012
3000087200013
450002750007
480006900009
50000110000010

The UNIVARIATE Procedure

Histogram for Cost

Gives:

·  Skewness

·  Kurtosis

·  Histogram

·  Normality tests

13. PROC FREQ

proc freq data=fraud_check;

   tables Domain*Fraud_Flag / chisq;

run;

OUTPUT:

The FREQ Procedure

Frequency
Percent
Row Pct
Col Pct
Table of Domain by Fraud_Flag
DomainFraud_Flag
NormalSuspiciousTotal
Biotech
4
26.67
80.00
28.57
1
6.67
20.00
100.00
5
33.33
 
 
Chemical
5
33.33
100.00
35.71
0
0.00
0.00
0.00
5
33.33
 
 
Pharma
5
33.33
100.00
35.71
0
0.00
0.00
0.00
5
33.33
 
 
Total
14
93.33
1
6.67
15
100.00

Statistics for Table of Domain by Fraud_Flag

StatisticDFValueProb
WARNING: 100% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Chi-Square22.14290.3425
Likelihood Ratio Chi-Square22.34390.3098
Mantel-Haenszel Chi-Square11.50000.2207
Phi Coefficient 0.3780 
Contingency Coefficient 0.3536 
Cramer's V 0.3780 

Sample Size = 15


Used for:

·  Domain distribution

·  Fraud flag counts

·  Validation status breakdown

With CHISQ option:

We test statistical association between domain and fraud.

14. PROC CORR

proc corr data=fraud_check;

  var Cost Duration_Days Success_Rate Error_Count;

run;

OUTPUT:

The CORR Procedure

4 Variables:Cost Duration_Days Success_Rate Error_Count
Simple Statistics
VariableNMeanStd DevSumMinimumMaximum
Cost15587332058688100020000100000
Duration_Days158.933334.36654134.00000-3.0000015.00000
Success_Rate1582.8000015.61684124245.00000100.00000
Error_Count152.266672.1536234.0000008.00000
Pearson Correlation Coefficients, N = 15
Prob > |r| under H0: Rho=0
 CostDuration_DaysSuccess_RateError_Count
Cost
1.00000
 
0.50994
0.0521
0.48173
0.0690
-0.41234
0.1267
Duration_Days
0.50994
0.0521
1.00000
 
0.16005
0.5688
-0.32459
0.2379
Success_Rate
0.48173
0.0690
0.16005
0.5688
1.00000
 
-0.71401
0.0028
Error_Count
-0.41234
0.1267
-0.32459
0.2379
-0.71401
0.0028
1.00000
 

Checks linear relationships:

·  Duration vs Cost

·  Error_Count vs Success_Rate

If correlation is strong:

We understand dependency.

15. PROC SGPLOT

proc sgplot data=fraud_check;

  scatter x=Duration_Days y=Cost;

run;

OUTPUT:

The SGPlot Procedure

Scatter plot:

Duration vs Cost.

Visual pattern reveals:

·  Linear trend

·  Outliers

·  Clusters

Executives understand visuals faster than tables.

16. PROC TRANSPOSE

proc transpose data=fraud_check out=cost_transpose;

  var Cost Success_Rate;

  id Experiment_ID;

run;

proc print data=cost_transpose;

run;

OUTPUT:

Obs_NAME_EXP001EXP002EXP003EXP004EXP005EXP006EXP007EXP008EXP009EXP010EXP011EXP012EXP013EXP014EXP015
1Cost5000045000200006000052000480007500030000900001000005500065000720005800061000
2Success_Rate9210088756881954598996085889177

Converts rows to columns.

17. APPEND

data additional;

  set fraud_check(obs=3);

  Cost = Cost + 5000;

run;

proc print data=additional;

run;

OUTPUT:

ObsExperiment_IDDomainValidation_StatusScientist_NameStart_DateEnd_DateCostSuccess_RateError_CountEquipment_UtilizationDuration_DaysExperiment_FullUtilization_LevelFraud_Flag
1EXP001PharmaVALIDATEDDr.Ravi01JAN202510JAN202555000922859EXP001-PharmaMediumNormal
2EXP002BiotechPENDINGDr.Sita05JAN202510JAN202550000100378-3EXP002-BiotechMediumSuspicious
3EXP003ChemicalVALIDATEDDr.Khan12JAN202520JAN202525000881908EXP003-ChemicalHighNormal

proc append base=fraud_check 

            data=additional force;

run;

proc print data=fraud_check;

run;

OUTPUT:

ObsExperiment_IDDomainValidation_StatusScientist_NameStart_DateEnd_DateCostSuccess_RateError_CountEquipment_UtilizationDuration_DaysExperiment_FullUtilization_LevelFraud_Flag
1EXP001PharmaVALIDATEDDr.Ravi01JAN202510JAN202550000922859EXP001-PharmaMediumNormal
2EXP002BiotechPENDINGDr.Sita05JAN202510JAN202545000100378-3EXP002-BiotechMediumSuspicious
3EXP003ChemicalVALIDATEDDr.Khan12JAN202520JAN202520000881908EXP003-ChemicalHighNormal
4EXP004PharmaVALIDATEDDr.Ravi15JAN202525JAN2025600007508810EXP004-PharmaMediumNormal
5EXP005BiotechPENDINGDr.Meera20JAN202528JAN2025520006851008EXP005-BiotechHighNormal
6EXP006ChemicalVALIDATEDDr.Ajay01FEB202510FEB202548000810709EXP006-ChemicalLowNormal
7EXP007PharmaVALIDATEDDr.Ravi05FEB202518FEB2025750009529213EXP007-PharmaHighNormal
8EXP008BiotechPENDINGDr.Sita10FEB202515FEB202530000458655EXP008-BiotechLowNormal
9EXP009ChemicalVALIDATEDDr.Khan12FEB202525FEB2025900009819713EXP009-ChemicalHighNormal
10EXP010PharmaVALIDATEDDr.Meera20FEB202501MAR2025100000990999EXP010-PharmaHighNormal
11EXP011BiotechPENDINGDr.Sita22FEB202528FEB202555000604756EXP011-BiotechMediumNormal
12EXP012ChemicalVALIDATEDDr.Ajay01MAR202515MAR2025650008528314EXP012-ChemicalMediumNormal
13EXP013PharmaPENDINGDr.Ravi05MAR202520MAR2025720008838915EXP013-PharmaMediumNormal
14EXP014BiotechVALIDATEDDr.Meera10MAR202518MAR202558000911828EXP014-BiotechMediumNormal
15EXP015ChemicalVALIDATEDDr.Khan12MAR202522MAR2025610007728410EXP015-ChemicalMediumNormal
16EXP001PharmaVALIDATEDDr.Ravi01JAN202510JAN202555000922859EXP001-PharmaMediumNormal
17EXP002BiotechPENDINGDr.Sita05JAN202510JAN202550000100378-3EXP002-BiotechMediumSuspicious
18EXP003ChemicalVALIDATEDDr.Khan12JAN202520JAN202525000881908EXP003-ChemicalHighNormal

Adds new records to base dataset.

Used when incremental data arrives.

18. PROC DATASETS DELETE

proc datasets lib=work;

  delete lab_clean fraud_check ;

run;

LOG:

NOTE: Deleting WORK.LAB_CLEAN (memtype=DATA).
NOTE: Deleting WORK.FRAUD_CHECK (memtype=DATA).

Deletes unwanted datasets.

Keeps WORK library clean.

Important for:

·  Memory optimization

·  Preventing dataset confusion

·  Maintaining reproducibility

19. 5 Key Points About the Project

1.    Comprehensive Data Cleaning & Validation
Implemented logical checks for dates, percentages, negative values, and utilization limits to ensure data integrity.

2.    Automation Using Macros
Built reusable macros for utilization classification and fraud detection, improving scalability and efficiency.

3.    Advanced Statistical Analysis
Applied PROC MEANS, PROC UNIVARIATE, and PROC CORR to understand distribution, variability, and relationships between cost, duration, and success rate.

4.    Business Intelligence via PROC SQL
Generated domain-level summaries and performance metrics to support management decision-making.

5.    Fraud & Risk Analytics Framework
Designed rule-based detection logic to flag suspicious experiments based on cost, duration, and error patterns.

20. Project Summary

Laboratory experiment performance monitoring, data validation, utilization optimization, and fraud analytics using structured SAS programming.

21. Conclusion

This project demonstrates a complete analytical lifecycle for laboratory experiment performance monitoring using SAS. Starting from raw data containing intentional logical errors, we applied systematic validation techniques to correct inconsistencies in dates, percentages, costs, and error counts. Through structured DATA step processing, SQL aggregation, statistical procedures, and macro automation, the dataset was transformed into a reliable analytical model.

The utilization classification macro provided operational insights into equipment efficiency, while the fraud detection logic identified suspicious experiment patterns based on abnormal success rates and cost-duration inconsistencies. Statistical procedures such as PROC MEANS, PROC UNIVARIATE, and PROC CORR enabled deeper understanding of cost distribution and variable relationships.

Overall, the project reflects real-world industry practices in pharmaceutical and research environments, combining data quality control, operational intelligence, and risk analytics into a single integrated SAS-based framework suitable for audit, regulatory review, and strategic decision-making.

 

SAS INTERVIEW QUESTIONS

1.Why use INTCK vs INTNX?

2.How do macros help operational classification?

3.How do you detect data quality issues in SAS?


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent LABORATORY data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics and smart cities

·  Clinical SAS Programmer

·  Research Data Analyst

·  Quality Control Officer

·  Regulatory Data Validator

·  Lab Operations Analyst

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

About Us | Contact Privacy Policy

Comments

Popular posts from this blog

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study

383.Which Pharma Distributors Are High-Risk? Can SAS Detect Delays, Temperature Violations, and Fraud?