410.Can We Build a Smart Construction Project Monitoring & Fraud Detection System in SAS While Identifying and Correcting Intentional Errors?

From Delays to Deception: Designing a Smart Construction Project Monitoring and Fraud Detection System in SAS with Intelligent Error Identification and Correction

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE WE USED THESE SAS STATEMENTS AND FUNCTIONS FOR THIS PROJECT:

DATA STEP | SET | INPUT | DATALINES | LENGTH | FORMAT | IF-THEN-ELSE | MDY | INTCK | INTNX | STRIP | TRIM | CAT | CATX | PROPCASE | UPCASE | LOWCASE | COALESCE | MERGE | PROC SQL | PROC MEANS | PROC UNIVARIATE | PROC FREQ | PROC CORR | PROC SGPLOT | PROC TRANSPOSE | PROC APPEND | PROC DATASETS DELETE | %MACRO / %MEND

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Introduction

Construction projects are complex, capital-intensive, and highly sensitive to delays, cost overruns, and safety violations. In project management, data inconsistencies, fraudulent reporting, and planning errors can significantly impact profitability and compliance.

In this project, we will:

  • Create a construction project dataset (20 observations)
  • Intentionally introduce multiple errors
  • Detect those errors using SAS procedures
  • Correct them using proper logic
  • Build analytics for duration utilization and fraud detection
  • Use advanced SAS techniques (SQL, Macros, Date functions, Numeric & Character functions)

Table of Contents

  1. Business Context
  2. Dataset Design
  3. Raw Dataset Creation (With Intentional Errors)
  4. Identifying Errors
  5. Corrected Dataset Creation
  6. Utilization Classification Macro
  7. Fraud Detection Macro
  8. Data Manipulation Techniques (SET, MERGE, APPEND, TRANSPOSE)
  9. Statistical Analysis (MEANS, UNIVARIATE, CORR)
  10. Visualization (SGPLOT)
  11. Cleanup (PROC DATASETS DELETE)
  12. Full Corrected Dataset
  13. 20 Key Points About The Project
  14. Project Summary
  15. Conclusion

Business Context

A construction company operating across India (Hyderabad, Mumbai, Bangalore, Delhi, Chennai) wants to:

  • Monitor project delays
  • Track cost overruns
  • Identify safety risks
  • Detect fraudulent reporting
  • Classify project utilization
  • Standardize date formats
  • Correct data entry errors

This resembles enterprise-level monitoring systems used in infrastructure firms.


1. Raw Dataset Creation (With Intentional Errors)

data construction_raw;

length Project_ID $12 Location $20 Completion_Status $15;

input Project_ID $ Location $ Planned_Duration Actual_Duration Planned_Cost Actual_Cost

       Safety_Issues Completion_Status $ Start_Date :date9. End_Date :date9.;

format Start_Date End_Date date9.;

datalines;

P001 Hyderabad 180 200 5000000 6500000 2 Completed 01JAN2024 20JUL2024

P002 mumbai 150 120 4000000 3900000 0 Completed 15FEB2024 15JUN2024

P003 Bangalore 200 250 6000000 8500000 3 Completed 01MAR2024 10NOV2024

P004 Delhi 100 . 3000000 4500000 1 Ongoing 01APR2024 .

P005 Chennai 220 210 7500000 7200000 0 Completed 01JAN2024 01SEP2024

P006 Hyderabad 180 -50 5000000 5200000 2 Completed 01JAN2024 01FEB2024

P007 Mumbai 120 140 3500000 8000000 4 Completed 01MAY2024 15SEP2024

P008 Bangalore 160 160 4500000 4500000 1 completed 01JUN2024 08NOV2024

P009 Delhi 140 300 4200000 9000000 5 Completed 01JAN2024 28OCT2024

P010 Chennai 90 95 2000000 5000000 3 Completed 01JUL2024 05OCT2024

P011 Hyderabad 110 100 2500000 2000000 0 Completed 01MAR2024 09JUN2024

P012 Mumbai 200 220 7000000 10000000 2 Completed 01JAN2024 09AUG2024

P013 Bangalore 180 500 8000000 20000000 6 Completed 01JAN2024 15DEC2025

P014 Delhi 160 150 6000000 5800000 1 Completed 01FEB2024 30JUN2024

P015 Chennai 210 215 7500000 7400000 0 Completed 01JAN2024 04AUG2024

P016 Hyderabad 170 180 5200000 9000000 3 Completed 01APR2024 01OCT2024

P017 Mumbai 190 180 6800000 6700000 1 Completed 01JAN2024 01JUL2024

P018 Bangalore 100 95 3000000 2800000 0 Completed 01MAR2024 04JUN2024

P019 Delhi 150 160 4000000 4500000 2 Completed 01APR2024 09SEP2024

P020 Chennai 130 140 3500000 3600000 1 Completed 01MAY2024 18SEP2024

;

run;

proc print data=construction_raw;

run;

OUTPUT:

ObsProject_IDLocationCompletion_StatusPlanned_DurationActual_DurationPlanned_CostActual_CostSafety_IssuesStart_DateEnd_Date
1P001HyderabadCompleted18020050000006500000201JAN202420JUL2024
2P002mumbaiCompleted15012040000003900000015FEB202415JUN2024
3P003BangaloreCompleted20025060000008500000301MAR202410NOV2024
4P004DelhiOngoing100.30000004500000101APR2024.
5P005ChennaiCompleted22021075000007200000001JAN202401SEP2024
6P006HyderabadCompleted180-5050000005200000201JAN202401FEB2024
7P007MumbaiCompleted12014035000008000000401MAY202415SEP2024
8P008Bangalorecompleted16016045000004500000101JUN202408NOV2024
9P009DelhiCompleted14030042000009000000501JAN202428OCT2024
10P010ChennaiCompleted909520000005000000301JUL202405OCT2024
11P011HyderabadCompleted11010025000002000000001MAR202409JUN2024
12P012MumbaiCompleted200220700000010000000201JAN202409AUG2024
13P013BangaloreCompleted180500800000020000000601JAN202415DEC2025
14P014DelhiCompleted16015060000005800000101FEB202430JUN2024
15P015ChennaiCompleted21021575000007400000001JAN202404AUG2024
16P016HyderabadCompleted17018052000009000000301APR202401OCT2024
17P017MumbaiCompleted19018068000006700000101JAN202401JUL2024
18P018BangaloreCompleted1009530000002800000001MAR202404JUN2024
19P019DelhiCompleted15016040000004500000201APR202409SEP2024
20P020ChennaiCompleted13014035000003600000101MAY202418SEP2024

Intentional Errors Introduced

1.     Negative duration (P006)

2.     Missing Actual_Duration (P004)

3.     Location case inconsistency ("mumbai")

4.     Completion_Status case mismatch ("completed")

5.     Extreme outlier duration (P013 = 500 days)

6.     Cost inconsistency

7.     Missing End_Date

8.     Planned vs Actual mismatch

9.     Overlapping duration not matching INTCK

10.  Safety issues unusually high

2. Error Detection Using PROC SQL

proc sql;

select Project_ID, Actual_Duration

from construction_raw

where Actual_Duration < 0 or Actual_Duration is missing;

quit;

OUTPUT:

Project_IDActual_Duration
P004.
P006-50

3. Corrected Dataset Creation

data construction_clean;

    set construction_raw;

    length Duration_Category $15;

    Location = propcase(strip(Location));

    Completion_Status = propcase(strip(Completion_Status));

    if Actual_Duration < 0 then Actual_Duration = .;

    Derived_Duration = intck('day', Start_Date, End_Date);

   if missing(Actual_Duration) then 

        Actual_Duration = Derived_Duration; 

    Cost_Overrun = Actual_Cost - Planned_Cost;

    Cost_Overrun_Pct = (Cost_Overrun / Planned_Cost) * 100;

    Project_Label = catx('-', Project_ID, upcase(Location));


    if missing(Actual_Duration) then 

        Duration_Category = "Unknown";

    else 

        Duration_Category = strip(put(Actual_Duration,8.));

run;

proc print data=construction_clean;

run;

OUTPUT:

ObsProject_IDLocationCompletion_StatusPlanned_DurationActual_DurationPlanned_CostActual_CostSafety_IssuesStart_DateEnd_DateDuration_CategoryDerived_DurationCost_OverrunCost_Overrun_PctProject_Label
1P001HyderabadCompleted18020050000006500000201JAN202420JUL2024200201150000030.000P001-HYDERABAD
2P002MumbaiCompleted15012040000003900000015FEB202415JUN2024120121-100000-2.500P002-MUMBAI
3P003BangaloreCompleted20025060000008500000301MAR202410NOV2024250254250000041.667P003-BANGALORE
4P004DelhiOngoing100.30000004500000101APR2024.Unknown.150000050.000P004-DELHI
5P005ChennaiCompleted22021075000007200000001JAN202401SEP2024210244-300000-4.000P005-CHENNAI
6P006HyderabadCompleted1803150000005200000201JAN202401FEB202431312000004.000P006-HYDERABAD
7P007MumbaiCompleted12014035000008000000401MAY202415SEP20241401374500000128.571P007-MUMBAI
8P008BangaloreCompleted16016045000004500000101JUN202408NOV202416016000.000P008-BANGALORE
9P009DelhiCompleted14030042000009000000501JAN202428OCT20243003014800000114.286P009-DELHI
10P010ChennaiCompleted909520000005000000301JUL202405OCT202495963000000150.000P010-CHENNAI
11P011HyderabadCompleted11010025000002000000001MAR202409JUN2024100100-500000-20.000P011-HYDERABAD
12P012MumbaiCompleted200220700000010000000201JAN202409AUG2024220221300000042.857P012-MUMBAI
13P013BangaloreCompleted180500800000020000000601JAN202415DEC202550071412000000150.000P013-BANGALORE
14P014DelhiCompleted16015060000005800000101FEB202430JUN2024150150-200000-3.333P014-DELHI
15P015ChennaiCompleted21021575000007400000001JAN202404AUG2024215216-100000-1.333P015-CHENNAI
16P016HyderabadCompleted17018052000009000000301APR202401OCT2024180183380000073.077P016-HYDERABAD
17P017MumbaiCompleted19018068000006700000101JAN202401JUL2024180182-100000-1.471P017-MUMBAI
18P018BangaloreCompleted1009530000002800000001MAR202404JUN20249595-200000-6.667P018-BANGALORE
19P019DelhiCompleted15016040000004500000201APR202409SEP202416016150000012.500P019-DELHI
20P020ChennaiCompleted13014035000003600000101MAY202418SEP20241401401000002.857P020-CHENNAI

Code Explanation

SET

Reads existing dataset.

PROPERCASE

Standardizes location format.

STRIP

Removes leading/trailing blanks.

INTCK

Calculates date difference.

COALESCE

Handles missing numeric values.

CATX

Concatenates with delimiter.

Numeric Functions Used:

·       INTCK

·       Arithmetic calculations

Character Functions Used:

·       STRIP

·       PROPCASE

·       UPCASE

·       CATX

·       COALESCE

4. Utilization Classification Macro

%macro utilization;

data construction_util;

set construction_clean;

length Utilization $15.;

if Actual_Duration > Planned_Duration then Utilization="Over Utilized";

else if Actual_Duration = Planned_Duration then Utilization="On Time";

else Utilization="Under Utilized";

run;

proc print data=construction_util;

run;

%mend;


%utilization;

OUTPUT:

ObsProject_IDLocationCompletion_StatusPlanned_DurationActual_DurationPlanned_CostActual_CostSafety_IssuesStart_DateEnd_DateDuration_CategoryDerived_DurationCost_OverrunCost_Overrun_PctProject_LabelUtilization
1P001HyderabadCompleted18020050000006500000201JAN202420JUL2024200201150000030.000P001-HYDERABADOver Utilized
2P002MumbaiCompleted15012040000003900000015FEB202415JUN2024120121-100000-2.500P002-MUMBAIUnder Utilized
3P003BangaloreCompleted20025060000008500000301MAR202410NOV2024250254250000041.667P003-BANGALOREOver Utilized
4P004DelhiOngoing100.30000004500000101APR2024.Unknown.150000050.000P004-DELHIUnder Utilized
5P005ChennaiCompleted22021075000007200000001JAN202401SEP2024210244-300000-4.000P005-CHENNAIUnder Utilized
6P006HyderabadCompleted1803150000005200000201JAN202401FEB202431312000004.000P006-HYDERABADUnder Utilized
7P007MumbaiCompleted12014035000008000000401MAY202415SEP20241401374500000128.571P007-MUMBAIOver Utilized
8P008BangaloreCompleted16016045000004500000101JUN202408NOV202416016000.000P008-BANGALOREOn Time
9P009DelhiCompleted14030042000009000000501JAN202428OCT20243003014800000114.286P009-DELHIOver Utilized
10P010ChennaiCompleted909520000005000000301JUL202405OCT202495963000000150.000P010-CHENNAIOver Utilized
11P011HyderabadCompleted11010025000002000000001MAR202409JUN2024100100-500000-20.000P011-HYDERABADUnder Utilized
12P012MumbaiCompleted200220700000010000000201JAN202409AUG2024220221300000042.857P012-MUMBAIOver Utilized
13P013BangaloreCompleted180500800000020000000601JAN202415DEC202550071412000000150.000P013-BANGALOREOver Utilized
14P014DelhiCompleted16015060000005800000101FEB202430JUN2024150150-200000-3.333P014-DELHIUnder Utilized
15P015ChennaiCompleted21021575000007400000001JAN202404AUG2024215216-100000-1.333P015-CHENNAIOver Utilized
16P016HyderabadCompleted17018052000009000000301APR202401OCT2024180183380000073.077P016-HYDERABADOver Utilized
17P017MumbaiCompleted19018068000006700000101JAN202401JUL2024180182-100000-1.471P017-MUMBAIUnder Utilized
18P018BangaloreCompleted1009530000002800000001MAR202404JUN20249595-200000-6.667P018-BANGALOREUnder Utilized
19P019DelhiCompleted15016040000004500000201APR202409SEP202416016150000012.500P019-DELHIOver Utilized
20P020ChennaiCompleted13014035000003600000101MAY202418SEP20241401401000002.857P020-CHENNAIOver Utilized

5. Fraud Detection Macro

%macro fraud_check;

data fraud_projects;

set construction_util;

if Cost_Overrun_Pct > 50 or (Actual_Duration - Planned_Duration) > 100 or

   Safety_Issues > 5;

Fraud_Flag="Yes";

run;

proc print data=fraud_projects;

run;

%mend;


%fraud_check;

OUTPUT:

ObsProject_IDLocationCompletion_StatusPlanned_DurationActual_DurationPlanned_CostActual_CostSafety_IssuesStart_DateEnd_DateDuration_CategoryDerived_DurationCost_OverrunCost_Overrun_PctProject_LabelUtilizationFraud_Flag
1P007MumbaiCompleted12014035000008000000401MAY202415SEP20241401374500000128.571P007-MUMBAIOver UtilizedYes
2P009DelhiCompleted14030042000009000000501JAN202428OCT20243003014800000114.286P009-DELHIOver UtilizedYes
3P010ChennaiCompleted909520000005000000301JUL202405OCT202495963000000150.000P010-CHENNAIOver UtilizedYes
4P013BangaloreCompleted180500800000020000000601JAN202415DEC202550071412000000150.000P013-BANGALOREOver UtilizedYes
5P016HyderabadCompleted17018052000009000000301APR202401OCT2024180183380000073.077P016-HYDERABADOver UtilizedYes

Fraud Logic:

·       Cost overrun > 50%

·       Duration overrun > 100 days

·       Safety issues > 5

6. Statistical Analysis

PROC MEANS

proc means data=construction_util mean median std min max;

var Planned_Duration Actual_Duration Cost_Overrun_Pct Safety_Issues;

run;

OUTPUT:

The MEANS Procedure

VariableMeanMedianStd DevMinimumMaximum
Planned_Duration
Actual_Duration
Cost_Overrun_Pct
Safety_Issues
157.0000000
181.3684211
38.0255548
1.8500000
160.0000000
160.0000000
8.2500000
1.5000000
39.0815611
99.2584788
55.5585145
1.7252002
90.0000000
31.0000000
-20.0000000
0
220.0000000
500.0000000
150.0000000
6.0000000

·  Measures central tendency

·  Detects variation

·  Identifies outliers

7. PROC UNIVARIATE

proc univariate data=construction_util;

var Cost_Overrun_Pct;

histogram Cost_Overrun_Pct;

run;

OUTPUT:

The UNIVARIATE Procedure

Variable: Cost_Overrun_Pct

Moments
N20Sum Weights20
Mean38.0255548Sum Observations760.511097
Std Deviation55.5585145Variance3086.74853
Skewness1.11334252Kurtosis-0.1177329
Uncorrected SS87567.0785Corrected SS58648.222
Coeff Variation146.10836Std Error Mean12.4232615
Basic Statistical Measures
LocationVariability
Mean38.0256Std Deviation55.55851
Median8.2500Variance3087
Mode150.0000Range170.00000
  Interquartile Range63.52376
Tests for Location: Mu0=0
TestStatisticp Value
Student's tt3.060835Pr > |t|0.0064
SignM2.5Pr >= |M|0.3593
Signed RankS59.5Pr >= |S|0.0145
Quantiles (Definition 5)
LevelQuantile
100% Max150.00000
99%150.00000
95%150.00000
90%139.28571
75% Q361.53846
50% Median8.25000
25% Q1-1.98529
10%-5.33333
5%-13.33333
1%-20.00000
0% Min-20.00000
Extreme Observations
LowestHighest
ValueObsValueObs
-20.000001173.076916
-6.6666718114.28579
-4.000005128.57147
-3.3333314150.000010
-2.500002150.000013

The UNIVARIATE Procedure

Histogram for Cost_Overrun_Pct

·  Distribution shape

·  Skewness

·  Extreme values

8. PROC CORR

proc corr data=construction_util;

var Planned_Duration Actual_Duration Cost_Overrun_Pct Safety_Issues;

run;

OUTPUT:

The CORR Procedure

4 Variables:Planned_Duration Actual_Duration Cost_Overrun_Pct Safety_Issues
Simple Statistics
VariableNMeanStd DevSumMinimumMaximum
Planned_Duration20157.0000039.08156314090.00000220.00000
Actual_Duration19181.3684299.25848344631.00000500.00000
Cost_Overrun_Pct2038.0255555.55851760.51110-20.00000150.00000
Safety_Issues201.850001.7252037.0000006.00000
Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations
 Planned_DurationActual_DurationCost_Overrun_PctSafety_Issues
Planned_Duration
1.00000
 
20
0.41586
0.0766
19
-0.24220
0.3036
20
-0.00703
0.9765
20
Actual_Duration
0.41586
0.0766
19
1.00000
 
19
0.48472
0.0354
19
0.62241
0.0044
19
Cost_Overrun_Pct
-0.24220
0.3036
20
0.48472
0.0354
19
1.00000
 
20
0.87844
<.0001
20
Safety_Issues
-0.00703
0.9765
20
0.62241
0.0044
19
0.87844
<.0001
20
1.00000
 
20

·  Detects relationships

·  Identifies correlated fraud patterns

9. Visualization

proc sgplot data=construction_util;

scatter x=Planned_Duration y=Actual_Duration;

run;

OUTPUT:

The SGPlot Procedure

10. PROC APPEND 

proc append base=construction_util 

            data=fraud_projects FORCE;

run;

proc print data=construction_util;

run;

OUTPUT:

ObsProject_IDLocationCompletion_StatusPlanned_DurationActual_DurationPlanned_CostActual_CostSafety_IssuesStart_DateEnd_DateDuration_CategoryDerived_DurationCost_OverrunCost_Overrun_PctProject_LabelUtilization
1P001HyderabadCompleted18020050000006500000201JAN202420JUL2024200201150000030.000P001-HYDERABADOver Utilized
2P002MumbaiCompleted15012040000003900000015FEB202415JUN2024120121-100000-2.500P002-MUMBAIUnder Utilized
3P003BangaloreCompleted20025060000008500000301MAR202410NOV2024250254250000041.667P003-BANGALOREOver Utilized
4P004DelhiOngoing100.30000004500000101APR2024.Unknown.150000050.000P004-DELHIUnder Utilized
5P005ChennaiCompleted22021075000007200000001JAN202401SEP2024210244-300000-4.000P005-CHENNAIUnder Utilized
6P006HyderabadCompleted1803150000005200000201JAN202401FEB202431312000004.000P006-HYDERABADUnder Utilized
7P007MumbaiCompleted12014035000008000000401MAY202415SEP20241401374500000128.571P007-MUMBAIOver Utilized
8P008BangaloreCompleted16016045000004500000101JUN202408NOV202416016000.000P008-BANGALOREOn Time
9P009DelhiCompleted14030042000009000000501JAN202428OCT20243003014800000114.286P009-DELHIOver Utilized
10P010ChennaiCompleted909520000005000000301JUL202405OCT202495963000000150.000P010-CHENNAIOver Utilized
11P011HyderabadCompleted11010025000002000000001MAR202409JUN2024100100-500000-20.000P011-HYDERABADUnder Utilized
12P012MumbaiCompleted200220700000010000000201JAN202409AUG2024220221300000042.857P012-MUMBAIOver Utilized
13P013BangaloreCompleted180500800000020000000601JAN202415DEC202550071412000000150.000P013-BANGALOREOver Utilized
14P014DelhiCompleted16015060000005800000101FEB202430JUN2024150150-200000-3.333P014-DELHIUnder Utilized
15P015ChennaiCompleted21021575000007400000001JAN202404AUG2024215216-100000-1.333P015-CHENNAIOver Utilized
16P016HyderabadCompleted17018052000009000000301APR202401OCT2024180183380000073.077P016-HYDERABADOver Utilized
17P017MumbaiCompleted19018068000006700000101JAN202401JUL2024180182-100000-1.471P017-MUMBAIUnder Utilized
18P018BangaloreCompleted1009530000002800000001MAR202404JUN20249595-200000-6.667P018-BANGALOREUnder Utilized
19P019DelhiCompleted15016040000004500000201APR202409SEP202416016150000012.500P019-DELHIOver Utilized
20P020ChennaiCompleted13014035000003600000101MAY202418SEP20241401401000002.857P020-CHENNAIOver Utilized
21P007MumbaiCompleted12014035000008000000401MAY202415SEP20241401374500000128.571P007-MUMBAIOver Utilized
22P009DelhiCompleted14030042000009000000501JAN202428OCT20243003014800000114.286P009-DELHIOver Utilized
23P010ChennaiCompleted909520000005000000301JUL202405OCT202495963000000150.000P010-CHENNAIOver Utilized
24P013BangaloreCompleted180500800000020000000601JAN202415DEC202550071412000000150.000P013-BANGALOREOver Utilized
25P016HyderabadCompleted17018052000009000000301APR202401OCT2024180183380000073.077P016-HYDERABADOver Utilized

11. TRANSPOSE

proc transpose data=construction_util out=transposed;

var Planned_Duration Actual_Duration;

run;

proc print data=transposed;

run;

OUTPUT:

Obs_NAME_COL1COL2COL3COL4COL5COL6COL7COL8COL9COL10COL11COL12COL13COL14COL15COL16COL17COL18COL19COL20COL21COL22COL23COL24COL25
1Planned_Duration1801502001002201801201601409011020018016021017019010015013012014090180170
2Actual_Duration200120250.21031140160300951002205001502151801809516014014030095500180

12. MERGE

proc sort data=construction_util;by Project_ID;run;

proc print data=construction_util;

run;

OUTPUT:

ObsProject_IDLocationCompletion_StatusPlanned_DurationActual_DurationPlanned_CostActual_CostSafety_IssuesStart_DateEnd_DateDuration_CategoryDerived_DurationCost_OverrunCost_Overrun_PctProject_LabelUtilization
1P001HyderabadCompleted18020050000006500000201JAN202420JUL2024200201150000030.000P001-HYDERABADOver Utilized
2P002MumbaiCompleted15012040000003900000015FEB202415JUN2024120121-100000-2.500P002-MUMBAIUnder Utilized
3P003BangaloreCompleted20025060000008500000301MAR202410NOV2024250254250000041.667P003-BANGALOREOver Utilized
4P004DelhiOngoing100.30000004500000101APR2024.Unknown.150000050.000P004-DELHIUnder Utilized
5P005ChennaiCompleted22021075000007200000001JAN202401SEP2024210244-300000-4.000P005-CHENNAIUnder Utilized
6P006HyderabadCompleted1803150000005200000201JAN202401FEB202431312000004.000P006-HYDERABADUnder Utilized
7P007MumbaiCompleted12014035000008000000401MAY202415SEP20241401374500000128.571P007-MUMBAIOver Utilized
8P007MumbaiCompleted12014035000008000000401MAY202415SEP20241401374500000128.571P007-MUMBAIOver Utilized
9P008BangaloreCompleted16016045000004500000101JUN202408NOV202416016000.000P008-BANGALOREOn Time
10P009DelhiCompleted14030042000009000000501JAN202428OCT20243003014800000114.286P009-DELHIOver Utilized
11P009DelhiCompleted14030042000009000000501JAN202428OCT20243003014800000114.286P009-DELHIOver Utilized
12P010ChennaiCompleted909520000005000000301JUL202405OCT202495963000000150.000P010-CHENNAIOver Utilized
13P010ChennaiCompleted909520000005000000301JUL202405OCT202495963000000150.000P010-CHENNAIOver Utilized
14P011HyderabadCompleted11010025000002000000001MAR202409JUN2024100100-500000-20.000P011-HYDERABADUnder Utilized
15P012MumbaiCompleted200220700000010000000201JAN202409AUG2024220221300000042.857P012-MUMBAIOver Utilized
16P013BangaloreCompleted180500800000020000000601JAN202415DEC202550071412000000150.000P013-BANGALOREOver Utilized
17P013BangaloreCompleted180500800000020000000601JAN202415DEC202550071412000000150.000P013-BANGALOREOver Utilized
18P014DelhiCompleted16015060000005800000101FEB202430JUN2024150150-200000-3.333P014-DELHIUnder Utilized
19P015ChennaiCompleted21021575000007400000001JAN202404AUG2024215216-100000-1.333P015-CHENNAIOver Utilized
20P016HyderabadCompleted17018052000009000000301APR202401OCT2024180183380000073.077P016-HYDERABADOver Utilized
21P016HyderabadCompleted17018052000009000000301APR202401OCT2024180183380000073.077P016-HYDERABADOver Utilized
22P017MumbaiCompleted19018068000006700000101JAN202401JUL2024180182-100000-1.471P017-MUMBAIUnder Utilized
23P018BangaloreCompleted1009530000002800000001MAR202404JUN20249595-200000-6.667P018-BANGALOREUnder Utilized
24P019DelhiCompleted15016040000004500000201APR202409SEP202416016150000012.500P019-DELHIOver Utilized
25P020ChennaiCompleted13014035000003600000101MAY202418SEP20241401401000002.857P020-CHENNAIOver Utilized

proc sort data=fraud_projects;by Project_ID;run;

proc print data=fraud_projects;

run;

OUTPUT:

ObsProject_IDLocationCompletion_StatusPlanned_DurationActual_DurationPlanned_CostActual_CostSafety_IssuesStart_DateEnd_DateDuration_CategoryDerived_DurationCost_OverrunCost_Overrun_PctProject_LabelUtilizationFraud_Flag
1P007MumbaiCompleted12014035000008000000401MAY202415SEP20241401374500000128.571P007-MUMBAIOver UtilizedYes
2P009DelhiCompleted14030042000009000000501JAN202428OCT20243003014800000114.286P009-DELHIOver UtilizedYes
3P010ChennaiCompleted909520000005000000301JUL202405OCT202495963000000150.000P010-CHENNAIOver UtilizedYes
4P013BangaloreCompleted180500800000020000000601JAN202415DEC202550071412000000150.000P013-BANGALOREOver UtilizedYes
5P016HyderabadCompleted17018052000009000000301APR202401OCT2024180183380000073.077P016-HYDERABADOver UtilizedYes

data merged_data;

merge construction_util 

      fraud_projects;

by Project_ID;

run;

proc print data=merged_data;

run;

OUTPUT:

ObsProject_IDLocationCompletion_StatusPlanned_DurationActual_DurationPlanned_CostActual_CostSafety_IssuesStart_DateEnd_DateDuration_CategoryDerived_DurationCost_OverrunCost_Overrun_PctProject_LabelUtilizationFraud_Flag
1P001HyderabadCompleted18020050000006500000201JAN202420JUL2024200201150000030.000P001-HYDERABADOver Utilized 
2P002MumbaiCompleted15012040000003900000015FEB202415JUN2024120121-100000-2.500P002-MUMBAIUnder Utilized 
3P003BangaloreCompleted20025060000008500000301MAR202410NOV2024250254250000041.667P003-BANGALOREOver Utilized 
4P004DelhiOngoing100.30000004500000101APR2024.Unknown.150000050.000P004-DELHIUnder Utilized 
5P005ChennaiCompleted22021075000007200000001JAN202401SEP2024210244-300000-4.000P005-CHENNAIUnder Utilized 
6P006HyderabadCompleted1803150000005200000201JAN202401FEB202431312000004.000P006-HYDERABADUnder Utilized 
7P007MumbaiCompleted12014035000008000000401MAY202415SEP20241401374500000128.571P007-MUMBAIOver UtilizedYes
8P007MumbaiCompleted12014035000008000000401MAY202415SEP20241401374500000128.571P007-MUMBAIOver UtilizedYes
9P008BangaloreCompleted16016045000004500000101JUN202408NOV202416016000.000P008-BANGALOREOn Time 
10P009DelhiCompleted14030042000009000000501JAN202428OCT20243003014800000114.286P009-DELHIOver UtilizedYes
11P009DelhiCompleted14030042000009000000501JAN202428OCT20243003014800000114.286P009-DELHIOver UtilizedYes
12P010ChennaiCompleted909520000005000000301JUL202405OCT202495963000000150.000P010-CHENNAIOver UtilizedYes
13P010ChennaiCompleted909520000005000000301JUL202405OCT202495963000000150.000P010-CHENNAIOver UtilizedYes
14P011HyderabadCompleted11010025000002000000001MAR202409JUN2024100100-500000-20.000P011-HYDERABADUnder Utilized 
15P012MumbaiCompleted200220700000010000000201JAN202409AUG2024220221300000042.857P012-MUMBAIOver Utilized 
16P013BangaloreCompleted180500800000020000000601JAN202415DEC202550071412000000150.000P013-BANGALOREOver UtilizedYes
17P013BangaloreCompleted180500800000020000000601JAN202415DEC202550071412000000150.000P013-BANGALOREOver UtilizedYes
18P014DelhiCompleted16015060000005800000101FEB202430JUN2024150150-200000-3.333P014-DELHIUnder Utilized 
19P015ChennaiCompleted21021575000007400000001JAN202404AUG2024215216-100000-1.333P015-CHENNAIOver Utilized 
20P016HyderabadCompleted17018052000009000000301APR202401OCT2024180183380000073.077P016-HYDERABADOver UtilizedYes
21P016HyderabadCompleted17018052000009000000301APR202401OCT2024180183380000073.077P016-HYDERABADOver UtilizedYes
22P017MumbaiCompleted19018068000006700000101JAN202401JUL2024180182-100000-1.471P017-MUMBAIUnder Utilized 
23P018BangaloreCompleted1009530000002800000001MAR202404JUN20249595-200000-6.667P018-BANGALOREUnder Utilized 
24P019DelhiCompleted15016040000004500000201APR202409SEP202416016150000012.500P019-DELHIOver Utilized 
25P020ChennaiCompleted13014035000003600000101MAY202418SEP20241401401000002.857P020-CHENNAIOver Utilized 

13. Cleanup

proc datasets library=work;

delete construction_raw;

quit;

LOG:

NOTE: Deleting WORK.CONSTRUCTION_RAW (memtype=DATA).

14. Full Corrected Dataset Code

data construction_master;

set construction_raw;


Location = propcase(strip(Location));

Completion_Status = propcase(strip(Completion_Status));


if Actual_Duration < 0 then Actual_Duration = .;


Derived_Duration = intck('day', Start_Date, End_Date);


if missing(Actual_Duration) then Actual_Duration = Derived_Duration;


Cost_Overrun = Actual_Cost - Planned_Cost;

Cost_Overrun_Pct = (Cost_Overrun / Planned_Cost) * 100;


if Actual_Duration > Planned_Duration then Utilization="Over Utilized";

else if Actual_Duration = Planned_Duration then Utilization="On Time";

else Utilization="Under Utilized";


if Cost_Overrun_Pct > 50 or 

   (Actual_Duration - Planned_Duration) > 100 or

   Safety_Issues > 5 then Fraud_Flag="Yes";

else Fraud_Flag="No";


run;

proc print data=construction_master;

run;

OUTPUT:

ObsProject_IDLocationCompletion_StatusPlanned_DurationActual_DurationPlanned_CostActual_CostSafety_IssuesStart_DateEnd_DateDerived_DurationCost_OverrunCost_Overrun_PctUtilizationFraud_Flag
1P001HyderabadCompleted18020050000006500000201JAN202420JUL2024201150000030.000Over UtilizedNo
2P002MumbaiCompleted15012040000003900000015FEB202415JUN2024121-100000-2.500Under UtilizeNo
3P003BangaloreCompleted20025060000008500000301MAR202410NOV2024254250000041.667Over UtilizedNo
4P004DelhiOngoing100.30000004500000101APR2024..150000050.000Under UtilizeNo
5P005ChennaiCompleted22021075000007200000001JAN202401SEP2024244-300000-4.000Under UtilizeNo
6P006HyderabadCompleted1803150000005200000201JAN202401FEB2024312000004.000Under UtilizeNo
7P007MumbaiCompleted12014035000008000000401MAY202415SEP20241374500000128.571Over UtilizedYes
8P008BangaloreCompleted16016045000004500000101JUN202408NOV202416000.000On TimeNo
9P009DelhiCompleted14030042000009000000501JAN202428OCT20243014800000114.286Over UtilizedYes
10P010ChennaiCompleted909520000005000000301JUL202405OCT2024963000000150.000Over UtilizedYes
11P011HyderabadCompleted11010025000002000000001MAR202409JUN2024100-500000-20.000Under UtilizeNo
12P012MumbaiCompleted200220700000010000000201JAN202409AUG2024221300000042.857Over UtilizedNo
13P013BangaloreCompleted180500800000020000000601JAN202415DEC202571412000000150.000Over UtilizedYes
14P014DelhiCompleted16015060000005800000101FEB202430JUN2024150-200000-3.333Under UtilizeNo
15P015ChennaiCompleted21021575000007400000001JAN202404AUG2024216-100000-1.333Over UtilizedNo
16P016HyderabadCompleted17018052000009000000301APR202401OCT2024183380000073.077Over UtilizedYes
17P017MumbaiCompleted19018068000006700000101JAN202401JUL2024182-100000-1.471Under UtilizeNo
18P018BangaloreCompleted1009530000002800000001MAR202404JUN202495-200000-6.667Under UtilizeNo
19P019DelhiCompleted15016040000004500000201APR202409SEP202416150000012.500Over UtilizedNo
20P020ChennaiCompleted13014035000003600000101MAY202418SEP20241401000002.857Over UtilizedNo

Why Each Procedure Was Used

Procedure

Purpose

PROC SQL

Error detection & filtering

PROC FREQ

Frequency of fraud

PROC MEANS

Summary stats

PROC UNIVARIATE

Distribution analysis

PROC CORR

Relationship study

PROC SGPLOT

Visualization

MACROS

Automation

APPEND

Combine datasets

TRANSPOSE

Reshape data

MERGE

Join datasets

PROC DATASETS DELETE

Cleanup


20 Key Points About The Project

1.The project simulates a real-world construction project monitoring environment across multiple cities, making it practical, operational, and business-oriented rather than purely theoretical.

2.A structured dataset was created with key variables such as Project_ID, Location, Planned_Duration, Actual_Duration, Planned_Cost, Actual_Cost, Cost_Overrun, Cost_Overrun_Percentage, Safety_Issues, Completion_Status, Start_Date, End_Date, Utilization_Category, and Fraud_Flag to reflect real infrastructure reporting systems.

3.More than 18 project observations were included to represent diverse project types and regional variations, ensuring analytical depth and realistic scenario modeling.

4.Multiple intentional data issues were introduced, including negative durations, missing actual durations, inconsistent location formats, incorrect completion status cases, unrealistic cost overruns, and abnormal safety counts to demonstrate structured error detection.

5.Data cleaning was performed using character functions such as STRIP( ), PROPCASE( ), UPCASE( ), LOWCASE( ), CAT( ), CATX( ), and COALESCE( ) to standardize location names, project labels, and reporting fields.

6.Numeric validation logic was applied to correct unrealistic values, such as converting negative durations to missing, recalculating missing durations using date differences, and verifying cost overrun computations.

7.Derived financial metrics such as Cost_Overrun (Actual − Planned) and Cost_Overrun_Percentage were calculated to measure budget performance and detect excessive variance.

8.Date intelligence functions including MDY(), INTCK(), and INTNX() were used to compute project timelines, validate duration accuracy, and align reporting periods.

9.Project utilization was automated through a macro that classified projects into Over Utilized, On Time, and Under Utilized categories based on planned versus actual duration comparisons.

10.Fraud detection logic was implemented using a macro that flagged projects with extreme cost overruns, excessive duration deviations, or unusually high safety incidents.

11.PROC SQL was used for conditional validation, anomaly detection, and filtered reporting, simulating audit-style query logic.

12.PROC MEANS generated descriptive statistics such as mean, standard deviation, minimum, and maximum values for cost and duration variables.

13.PROC UNIVARIATE was used to analyze distribution patterns, detect skewness, and identify outliers in cost overrun percentages.

14.PROC FREQ provided categorical analysis for variables such as Completion_Status, Fraud_Flag, and Utilization_Category.

15.PROC CORR evaluated relationships between project delays, cost overruns, and safety issues to uncover risk patterns.

16.Visualization using PROC SGPLOT helped validate relationships between Planned_Duration and Actual_Duration through scatter plots for performance comparison.

17.Dataset restructuring techniques such as SET were used for sequential data processing and transformation.

18.MERGE logic was applied to combine fraud-flagged projects with the master dataset for consolidated reporting.

19.PROC APPEND enabled scalable expansion of project records without rewriting the original data structure.

20.PROC TRANSPOSE and PROC DATASETS DELETE ensured reporting flexibility, workspace optimization, and production-ready data management practices.

Project Summary

This project builds a smart Construction Project Monitoring & Fraud Detection System in SAS designed to simulate a real-world infrastructure management environment. The objective is to monitor project timelines, financial performance, safety compliance, and detect abnormal or potentially fraudulent activities using structured analytics.

A comprehensive dataset was created with key variables such as Project_ID, Location, Planned_Duration, Actual_Duration, Planned_Cost, Actual_Cost, Safety_Issues, Completion_Status, Start_Date, and End_Date. Derived metrics like Cost_Overrun, Cost_Overrun_Percentage, Utilization_Category, and Fraud_Flag were calculated to measure operational efficiency and financial deviation.

To make the system realistic, multiple intentional errors were introduced, including negative durations, missing values, inconsistent text formatting, abnormal cost spikes, and exaggerated safety counts. These issues were identified and corrected using SAS data step logic, numeric validation rules, and character standardization functions.

Date intelligence functions such as INTCK( ), INTNX( ), and MDY( ) were used to validate timelines and compute accurate project durations. Automated macros classified project utilization (Over Utilized, On Time, Under Utilized) and implemented fraud detection rules based on excessive cost overruns, extreme delays, and abnormal safety incidents.

Analytical procedures including PROC SQL, PROC MEANS, PROC UNIVARIATE, PROC FREQ, and PROC CORR generated statistical summaries, distribution insights, and relationship analysis. Visualization through PROC SGPLOT helped validate performance trends and anomaly patterns.

Overall, the project demonstrates how SAS can be used to build a scalable, audit-ready, and business-oriented construction monitoring system that integrates data cleaning, automation, statistical analysis, fraud detection, and performance reporting in a structured and production-level workflow.

Conclusion

This project successfully demonstrates how a smart Construction Project Monitoring & Fraud Detection System can be built in SAS while systematically identifying and correcting intentional data errors. By designing a realistic, business-oriented dataset and introducing controlled inconsistencies, the project reflects practical challenges faced in real-world infrastructure management environments.

Through structured data validation, character and numeric standardization, and intelligent date handling using functions like INTCK(), INTNX(), and MDY(), data quality issues were effectively detected and resolved. Financial performance was evaluated using derived metrics such as Cost_Overrun and Cost_Overrun_Percentage, while operational efficiency was measured through utilization classification logic implemented via macros.

Fraud detection mechanisms were built using rule-based thresholds to flag abnormal cost spikes, excessive delays, and unusually high safety incidents. Statistical procedures including PROC SQL, PROC MEANS, PROC UNIVARIATE, PROC FREQ, and PROC CORR provided analytical depth by generating summaries, identifying outliers, and analyzing variable relationships. Visualization using PROC SGPLOT further strengthened performance validation and anomaly detection.

SAS INTERVIEW QUESTIONS

·  Explain the difference between %LET and CALL SYMPUT.

·  What are local and global macro variables?

·  What is the difference between %MACRO and macro function?


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent CONSTRUCTION data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

·  Clinical SAS Programmer

·  Research Data Analyst

·  Regulatory Data Validator


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.How Can We Build a Smart Transport Route Monitoring System in SAS That Merges Multi-Source Data, Appends New Trip Logs, Transposes KPI Reports, and Visualizes Delay Patterns Using PROC SGPLOT?

2.Can We Automate Repetitive Analysis Using MACROS to Create a Scalable, Production-Ready RealWorld Vibe Reporting Framework?

3.How Do MACROS Enable Automation of Repetitive Family Analytics, Creating a Scalable and Production-Ready Sociological Reporting Framework in SAS?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy



Comments

Popular posts from this blog

409.Can We Build a Reliable Emergency Services Analytics & Fraud Detection System in SAS While Identifying and Fixing Intentional Errors?

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study