410.Can We Build a Smart Construction Project Monitoring & Fraud Detection System in SAS While Identifying and Correcting Intentional Errors?
From Delays to Deception: Designing a Smart Construction Project Monitoring and Fraud Detection System in SAS with Intelligent Error Identification and Correction
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HERE WE USED THESE SAS STATEMENTS AND FUNCTIONS FOR THIS PROJECT:
DATA STEP | SET | INPUT | DATALINES | LENGTH | FORMAT | IF-THEN-ELSE | MDY | INTCK | INTNX | STRIP | TRIM | CAT | CATX | PROPCASE | UPCASE | LOWCASE | COALESCE | MERGE | PROC SQL | PROC MEANS | PROC UNIVARIATE | PROC FREQ | PROC CORR | PROC SGPLOT | PROC TRANSPOSE | PROC APPEND | PROC DATASETS DELETE | %MACRO / %MEND
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Introduction
Construction
projects are complex, capital-intensive, and highly sensitive to delays, cost
overruns, and safety violations. In project management, data inconsistencies,
fraudulent reporting, and planning errors can significantly impact profitability
and compliance.
In this
project, we will:
- Create a construction
project dataset (20 observations)
- Intentionally introduce
multiple errors
- Detect those errors using
SAS procedures
- Correct them using proper
logic
- Build analytics for duration
utilization and fraud detection
- Use advanced SAS techniques
(SQL, Macros, Date functions, Numeric & Character functions)
Table of Contents
- Business Context
- Dataset Design
- Raw Dataset Creation (With
Intentional Errors)
- Identifying Errors
- Corrected Dataset Creation
- Utilization Classification
Macro
- Fraud Detection Macro
- Data Manipulation Techniques
(SET, MERGE, APPEND, TRANSPOSE)
- Statistical Analysis (MEANS,
UNIVARIATE, CORR)
- Visualization (SGPLOT)
- Cleanup (PROC DATASETS
DELETE)
- Full Corrected Dataset
- 20 Key Points About The Project
- Project Summary
- Conclusion
Business Context
A
construction company operating across India (Hyderabad, Mumbai, Bangalore,
Delhi, Chennai) wants to:
- Monitor project delays
- Track cost overruns
- Identify safety risks
- Detect fraudulent reporting
- Classify project utilization
- Standardize date formats
- Correct data entry errors
This
resembles enterprise-level monitoring systems used in infrastructure firms.
1. Raw Dataset Creation (With Intentional Errors)
data construction_raw;
length Project_ID $12 Location $20 Completion_Status $15;
input Project_ID $ Location $ Planned_Duration Actual_Duration Planned_Cost Actual_Cost
Safety_Issues Completion_Status $ Start_Date :date9. End_Date :date9.;
format Start_Date End_Date date9.;
datalines;
P001 Hyderabad 180 200 5000000 6500000 2 Completed 01JAN2024 20JUL2024
P002 mumbai 150 120 4000000 3900000 0 Completed 15FEB2024 15JUN2024
P003 Bangalore 200 250 6000000 8500000 3 Completed 01MAR2024 10NOV2024
P004 Delhi 100 . 3000000 4500000 1 Ongoing 01APR2024 .
P005 Chennai 220 210 7500000 7200000 0 Completed 01JAN2024 01SEP2024
P006 Hyderabad 180 -50 5000000 5200000 2 Completed 01JAN2024 01FEB2024
P007 Mumbai 120 140 3500000 8000000 4 Completed 01MAY2024 15SEP2024
P008 Bangalore 160 160 4500000 4500000 1 completed 01JUN2024 08NOV2024
P009 Delhi 140 300 4200000 9000000 5 Completed 01JAN2024 28OCT2024
P010 Chennai 90 95 2000000 5000000 3 Completed 01JUL2024 05OCT2024
P011 Hyderabad 110 100 2500000 2000000 0 Completed 01MAR2024 09JUN2024
P012 Mumbai 200 220 7000000 10000000 2 Completed 01JAN2024 09AUG2024
P013 Bangalore 180 500 8000000 20000000 6 Completed 01JAN2024 15DEC2025
P014 Delhi 160 150 6000000 5800000 1 Completed 01FEB2024 30JUN2024
P015 Chennai 210 215 7500000 7400000 0 Completed 01JAN2024 04AUG2024
P016 Hyderabad 170 180 5200000 9000000 3 Completed 01APR2024 01OCT2024
P017 Mumbai 190 180 6800000 6700000 1 Completed 01JAN2024 01JUL2024
P018 Bangalore 100 95 3000000 2800000 0 Completed 01MAR2024 04JUN2024
P019 Delhi 150 160 4000000 4500000 2 Completed 01APR2024 09SEP2024
P020 Chennai 130 140 3500000 3600000 1 Completed 01MAY2024 18SEP2024
;
run;
proc print data=construction_raw;
run;
OUTPUT:
| Obs | Project_ID | Location | Completion_Status | Planned_Duration | Actual_Duration | Planned_Cost | Actual_Cost | Safety_Issues | Start_Date | End_Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | P001 | Hyderabad | Completed | 180 | 200 | 5000000 | 6500000 | 2 | 01JAN2024 | 20JUL2024 |
| 2 | P002 | mumbai | Completed | 150 | 120 | 4000000 | 3900000 | 0 | 15FEB2024 | 15JUN2024 |
| 3 | P003 | Bangalore | Completed | 200 | 250 | 6000000 | 8500000 | 3 | 01MAR2024 | 10NOV2024 |
| 4 | P004 | Delhi | Ongoing | 100 | . | 3000000 | 4500000 | 1 | 01APR2024 | . |
| 5 | P005 | Chennai | Completed | 220 | 210 | 7500000 | 7200000 | 0 | 01JAN2024 | 01SEP2024 |
| 6 | P006 | Hyderabad | Completed | 180 | -50 | 5000000 | 5200000 | 2 | 01JAN2024 | 01FEB2024 |
| 7 | P007 | Mumbai | Completed | 120 | 140 | 3500000 | 8000000 | 4 | 01MAY2024 | 15SEP2024 |
| 8 | P008 | Bangalore | completed | 160 | 160 | 4500000 | 4500000 | 1 | 01JUN2024 | 08NOV2024 |
| 9 | P009 | Delhi | Completed | 140 | 300 | 4200000 | 9000000 | 5 | 01JAN2024 | 28OCT2024 |
| 10 | P010 | Chennai | Completed | 90 | 95 | 2000000 | 5000000 | 3 | 01JUL2024 | 05OCT2024 |
| 11 | P011 | Hyderabad | Completed | 110 | 100 | 2500000 | 2000000 | 0 | 01MAR2024 | 09JUN2024 |
| 12 | P012 | Mumbai | Completed | 200 | 220 | 7000000 | 10000000 | 2 | 01JAN2024 | 09AUG2024 |
| 13 | P013 | Bangalore | Completed | 180 | 500 | 8000000 | 20000000 | 6 | 01JAN2024 | 15DEC2025 |
| 14 | P014 | Delhi | Completed | 160 | 150 | 6000000 | 5800000 | 1 | 01FEB2024 | 30JUN2024 |
| 15 | P015 | Chennai | Completed | 210 | 215 | 7500000 | 7400000 | 0 | 01JAN2024 | 04AUG2024 |
| 16 | P016 | Hyderabad | Completed | 170 | 180 | 5200000 | 9000000 | 3 | 01APR2024 | 01OCT2024 |
| 17 | P017 | Mumbai | Completed | 190 | 180 | 6800000 | 6700000 | 1 | 01JAN2024 | 01JUL2024 |
| 18 | P018 | Bangalore | Completed | 100 | 95 | 3000000 | 2800000 | 0 | 01MAR2024 | 04JUN2024 |
| 19 | P019 | Delhi | Completed | 150 | 160 | 4000000 | 4500000 | 2 | 01APR2024 | 09SEP2024 |
| 20 | P020 | Chennai | Completed | 130 | 140 | 3500000 | 3600000 | 1 | 01MAY2024 | 18SEP2024 |
Intentional Errors Introduced
1.
Negative duration (P006)
2.
Missing Actual_Duration (P004)
3.
Location case inconsistency ("mumbai")
4.
Completion_Status case mismatch ("completed")
5.
Extreme outlier duration (P013 = 500 days)
6.
Cost inconsistency
7.
Missing End_Date
8.
Planned vs Actual mismatch
9.
Overlapping duration not matching INTCK
10. Safety
issues unusually high
2. Error Detection Using PROC SQL
proc sql;
select Project_ID, Actual_Duration
from construction_raw
where Actual_Duration < 0 or Actual_Duration is missing;
quit;
OUTPUT:
| Project_ID | Actual_Duration |
|---|---|
| P004 | . |
| P006 | -50 |
3. Corrected Dataset Creation
data construction_clean;
set construction_raw;
length Duration_Category $15;
Location = propcase(strip(Location));
Completion_Status = propcase(strip(Completion_Status));
if Actual_Duration < 0 then Actual_Duration = .;
Derived_Duration = intck('day', Start_Date, End_Date);
if missing(Actual_Duration) then
Actual_Duration = Derived_Duration;
Cost_Overrun = Actual_Cost - Planned_Cost;
Cost_Overrun_Pct = (Cost_Overrun / Planned_Cost) * 100;
Project_Label = catx('-', Project_ID, upcase(Location));
if missing(Actual_Duration) then
Duration_Category = "Unknown";
else
Duration_Category = strip(put(Actual_Duration,8.));
run;
proc print data=construction_clean;
run;
OUTPUT:
| Obs | Project_ID | Location | Completion_Status | Planned_Duration | Actual_Duration | Planned_Cost | Actual_Cost | Safety_Issues | Start_Date | End_Date | Duration_Category | Derived_Duration | Cost_Overrun | Cost_Overrun_Pct | Project_Label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | P001 | Hyderabad | Completed | 180 | 200 | 5000000 | 6500000 | 2 | 01JAN2024 | 20JUL2024 | 200 | 201 | 1500000 | 30.000 | P001-HYDERABAD |
| 2 | P002 | Mumbai | Completed | 150 | 120 | 4000000 | 3900000 | 0 | 15FEB2024 | 15JUN2024 | 120 | 121 | -100000 | -2.500 | P002-MUMBAI |
| 3 | P003 | Bangalore | Completed | 200 | 250 | 6000000 | 8500000 | 3 | 01MAR2024 | 10NOV2024 | 250 | 254 | 2500000 | 41.667 | P003-BANGALORE |
| 4 | P004 | Delhi | Ongoing | 100 | . | 3000000 | 4500000 | 1 | 01APR2024 | . | Unknown | . | 1500000 | 50.000 | P004-DELHI |
| 5 | P005 | Chennai | Completed | 220 | 210 | 7500000 | 7200000 | 0 | 01JAN2024 | 01SEP2024 | 210 | 244 | -300000 | -4.000 | P005-CHENNAI |
| 6 | P006 | Hyderabad | Completed | 180 | 31 | 5000000 | 5200000 | 2 | 01JAN2024 | 01FEB2024 | 31 | 31 | 200000 | 4.000 | P006-HYDERABAD |
| 7 | P007 | Mumbai | Completed | 120 | 140 | 3500000 | 8000000 | 4 | 01MAY2024 | 15SEP2024 | 140 | 137 | 4500000 | 128.571 | P007-MUMBAI |
| 8 | P008 | Bangalore | Completed | 160 | 160 | 4500000 | 4500000 | 1 | 01JUN2024 | 08NOV2024 | 160 | 160 | 0 | 0.000 | P008-BANGALORE |
| 9 | P009 | Delhi | Completed | 140 | 300 | 4200000 | 9000000 | 5 | 01JAN2024 | 28OCT2024 | 300 | 301 | 4800000 | 114.286 | P009-DELHI |
| 10 | P010 | Chennai | Completed | 90 | 95 | 2000000 | 5000000 | 3 | 01JUL2024 | 05OCT2024 | 95 | 96 | 3000000 | 150.000 | P010-CHENNAI |
| 11 | P011 | Hyderabad | Completed | 110 | 100 | 2500000 | 2000000 | 0 | 01MAR2024 | 09JUN2024 | 100 | 100 | -500000 | -20.000 | P011-HYDERABAD |
| 12 | P012 | Mumbai | Completed | 200 | 220 | 7000000 | 10000000 | 2 | 01JAN2024 | 09AUG2024 | 220 | 221 | 3000000 | 42.857 | P012-MUMBAI |
| 13 | P013 | Bangalore | Completed | 180 | 500 | 8000000 | 20000000 | 6 | 01JAN2024 | 15DEC2025 | 500 | 714 | 12000000 | 150.000 | P013-BANGALORE |
| 14 | P014 | Delhi | Completed | 160 | 150 | 6000000 | 5800000 | 1 | 01FEB2024 | 30JUN2024 | 150 | 150 | -200000 | -3.333 | P014-DELHI |
| 15 | P015 | Chennai | Completed | 210 | 215 | 7500000 | 7400000 | 0 | 01JAN2024 | 04AUG2024 | 215 | 216 | -100000 | -1.333 | P015-CHENNAI |
| 16 | P016 | Hyderabad | Completed | 170 | 180 | 5200000 | 9000000 | 3 | 01APR2024 | 01OCT2024 | 180 | 183 | 3800000 | 73.077 | P016-HYDERABAD |
| 17 | P017 | Mumbai | Completed | 190 | 180 | 6800000 | 6700000 | 1 | 01JAN2024 | 01JUL2024 | 180 | 182 | -100000 | -1.471 | P017-MUMBAI |
| 18 | P018 | Bangalore | Completed | 100 | 95 | 3000000 | 2800000 | 0 | 01MAR2024 | 04JUN2024 | 95 | 95 | -200000 | -6.667 | P018-BANGALORE |
| 19 | P019 | Delhi | Completed | 150 | 160 | 4000000 | 4500000 | 2 | 01APR2024 | 09SEP2024 | 160 | 161 | 500000 | 12.500 | P019-DELHI |
| 20 | P020 | Chennai | Completed | 130 | 140 | 3500000 | 3600000 | 1 | 01MAY2024 | 18SEP2024 | 140 | 140 | 100000 | 2.857 | P020-CHENNAI |
Code Explanation
SET
Reads existing dataset.
PROPERCASE
Standardizes location format.
STRIP
Removes leading/trailing blanks.
INTCK
Calculates date difference.
COALESCE
Handles missing numeric values.
CATX
Concatenates with delimiter.
Numeric Functions Used:
·
INTCK
·
Arithmetic calculations
Character Functions Used:
·
STRIP
·
PROPCASE
·
UPCASE
·
CATX
·
COALESCE
4. Utilization Classification Macro
%macro utilization;
data construction_util;
set construction_clean;
length Utilization $15.;
if Actual_Duration > Planned_Duration then Utilization="Over Utilized";
else if Actual_Duration = Planned_Duration then Utilization="On Time";
else Utilization="Under Utilized";
run;
proc print data=construction_util;
run;
%mend;
%utilization;
OUTPUT:
| Obs | Project_ID | Location | Completion_Status | Planned_Duration | Actual_Duration | Planned_Cost | Actual_Cost | Safety_Issues | Start_Date | End_Date | Duration_Category | Derived_Duration | Cost_Overrun | Cost_Overrun_Pct | Project_Label | Utilization |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | P001 | Hyderabad | Completed | 180 | 200 | 5000000 | 6500000 | 2 | 01JAN2024 | 20JUL2024 | 200 | 201 | 1500000 | 30.000 | P001-HYDERABAD | Over Utilized |
| 2 | P002 | Mumbai | Completed | 150 | 120 | 4000000 | 3900000 | 0 | 15FEB2024 | 15JUN2024 | 120 | 121 | -100000 | -2.500 | P002-MUMBAI | Under Utilized |
| 3 | P003 | Bangalore | Completed | 200 | 250 | 6000000 | 8500000 | 3 | 01MAR2024 | 10NOV2024 | 250 | 254 | 2500000 | 41.667 | P003-BANGALORE | Over Utilized |
| 4 | P004 | Delhi | Ongoing | 100 | . | 3000000 | 4500000 | 1 | 01APR2024 | . | Unknown | . | 1500000 | 50.000 | P004-DELHI | Under Utilized |
| 5 | P005 | Chennai | Completed | 220 | 210 | 7500000 | 7200000 | 0 | 01JAN2024 | 01SEP2024 | 210 | 244 | -300000 | -4.000 | P005-CHENNAI | Under Utilized |
| 6 | P006 | Hyderabad | Completed | 180 | 31 | 5000000 | 5200000 | 2 | 01JAN2024 | 01FEB2024 | 31 | 31 | 200000 | 4.000 | P006-HYDERABAD | Under Utilized |
| 7 | P007 | Mumbai | Completed | 120 | 140 | 3500000 | 8000000 | 4 | 01MAY2024 | 15SEP2024 | 140 | 137 | 4500000 | 128.571 | P007-MUMBAI | Over Utilized |
| 8 | P008 | Bangalore | Completed | 160 | 160 | 4500000 | 4500000 | 1 | 01JUN2024 | 08NOV2024 | 160 | 160 | 0 | 0.000 | P008-BANGALORE | On Time |
| 9 | P009 | Delhi | Completed | 140 | 300 | 4200000 | 9000000 | 5 | 01JAN2024 | 28OCT2024 | 300 | 301 | 4800000 | 114.286 | P009-DELHI | Over Utilized |
| 10 | P010 | Chennai | Completed | 90 | 95 | 2000000 | 5000000 | 3 | 01JUL2024 | 05OCT2024 | 95 | 96 | 3000000 | 150.000 | P010-CHENNAI | Over Utilized |
| 11 | P011 | Hyderabad | Completed | 110 | 100 | 2500000 | 2000000 | 0 | 01MAR2024 | 09JUN2024 | 100 | 100 | -500000 | -20.000 | P011-HYDERABAD | Under Utilized |
| 12 | P012 | Mumbai | Completed | 200 | 220 | 7000000 | 10000000 | 2 | 01JAN2024 | 09AUG2024 | 220 | 221 | 3000000 | 42.857 | P012-MUMBAI | Over Utilized |
| 13 | P013 | Bangalore | Completed | 180 | 500 | 8000000 | 20000000 | 6 | 01JAN2024 | 15DEC2025 | 500 | 714 | 12000000 | 150.000 | P013-BANGALORE | Over Utilized |
| 14 | P014 | Delhi | Completed | 160 | 150 | 6000000 | 5800000 | 1 | 01FEB2024 | 30JUN2024 | 150 | 150 | -200000 | -3.333 | P014-DELHI | Under Utilized |
| 15 | P015 | Chennai | Completed | 210 | 215 | 7500000 | 7400000 | 0 | 01JAN2024 | 04AUG2024 | 215 | 216 | -100000 | -1.333 | P015-CHENNAI | Over Utilized |
| 16 | P016 | Hyderabad | Completed | 170 | 180 | 5200000 | 9000000 | 3 | 01APR2024 | 01OCT2024 | 180 | 183 | 3800000 | 73.077 | P016-HYDERABAD | Over Utilized |
| 17 | P017 | Mumbai | Completed | 190 | 180 | 6800000 | 6700000 | 1 | 01JAN2024 | 01JUL2024 | 180 | 182 | -100000 | -1.471 | P017-MUMBAI | Under Utilized |
| 18 | P018 | Bangalore | Completed | 100 | 95 | 3000000 | 2800000 | 0 | 01MAR2024 | 04JUN2024 | 95 | 95 | -200000 | -6.667 | P018-BANGALORE | Under Utilized |
| 19 | P019 | Delhi | Completed | 150 | 160 | 4000000 | 4500000 | 2 | 01APR2024 | 09SEP2024 | 160 | 161 | 500000 | 12.500 | P019-DELHI | Over Utilized |
| 20 | P020 | Chennai | Completed | 130 | 140 | 3500000 | 3600000 | 1 | 01MAY2024 | 18SEP2024 | 140 | 140 | 100000 | 2.857 | P020-CHENNAI | Over Utilized |
5. Fraud Detection Macro
%macro fraud_check;
data fraud_projects;
set construction_util;
if Cost_Overrun_Pct > 50 or (Actual_Duration - Planned_Duration) > 100 or
Safety_Issues > 5;
Fraud_Flag="Yes";
run;
proc print data=fraud_projects;
run;
%mend;
%fraud_check;
OUTPUT:
| Obs | Project_ID | Location | Completion_Status | Planned_Duration | Actual_Duration | Planned_Cost | Actual_Cost | Safety_Issues | Start_Date | End_Date | Duration_Category | Derived_Duration | Cost_Overrun | Cost_Overrun_Pct | Project_Label | Utilization | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | P007 | Mumbai | Completed | 120 | 140 | 3500000 | 8000000 | 4 | 01MAY2024 | 15SEP2024 | 140 | 137 | 4500000 | 128.571 | P007-MUMBAI | Over Utilized | Yes |
| 2 | P009 | Delhi | Completed | 140 | 300 | 4200000 | 9000000 | 5 | 01JAN2024 | 28OCT2024 | 300 | 301 | 4800000 | 114.286 | P009-DELHI | Over Utilized | Yes |
| 3 | P010 | Chennai | Completed | 90 | 95 | 2000000 | 5000000 | 3 | 01JUL2024 | 05OCT2024 | 95 | 96 | 3000000 | 150.000 | P010-CHENNAI | Over Utilized | Yes |
| 4 | P013 | Bangalore | Completed | 180 | 500 | 8000000 | 20000000 | 6 | 01JAN2024 | 15DEC2025 | 500 | 714 | 12000000 | 150.000 | P013-BANGALORE | Over Utilized | Yes |
| 5 | P016 | Hyderabad | Completed | 170 | 180 | 5200000 | 9000000 | 3 | 01APR2024 | 01OCT2024 | 180 | 183 | 3800000 | 73.077 | P016-HYDERABAD | Over Utilized | Yes |
Fraud Logic:
·
Cost overrun > 50%
·
Duration overrun > 100 days
·
Safety issues > 5
6. Statistical Analysis
PROC MEANS
proc means data=construction_util mean median std min max;
var Planned_Duration Actual_Duration Cost_Overrun_Pct Safety_Issues;
run;
OUTPUT:
The MEANS Procedure
| Variable | Mean | Median | Std Dev | Minimum | Maximum |
|---|---|---|---|---|---|
Planned_Duration Actual_Duration Cost_Overrun_Pct Safety_Issues | 157.0000000 181.3684211 38.0255548 1.8500000 | 160.0000000 160.0000000 8.2500000 1.5000000 | 39.0815611 99.2584788 55.5585145 1.7252002 | 90.0000000 31.0000000 -20.0000000 0 | 220.0000000 500.0000000 150.0000000 6.0000000 |
·
Measures central tendency
·
Detects variation
·
Identifies outliers
7. PROC UNIVARIATE
proc univariate data=construction_util;
var Cost_Overrun_Pct;
histogram Cost_Overrun_Pct;
run;
OUTPUT:
The UNIVARIATE Procedure
Variable: Cost_Overrun_Pct
| Moments | |||
|---|---|---|---|
| N | 20 | Sum Weights | 20 |
| Mean | 38.0255548 | Sum Observations | 760.511097 |
| Std Deviation | 55.5585145 | Variance | 3086.74853 |
| Skewness | 1.11334252 | Kurtosis | -0.1177329 |
| Uncorrected SS | 87567.0785 | Corrected SS | 58648.222 |
| Coeff Variation | 146.10836 | Std Error Mean | 12.4232615 |
| Basic Statistical Measures | |||
|---|---|---|---|
| Location | Variability | ||
| Mean | 38.0256 | Std Deviation | 55.55851 |
| Median | 8.2500 | Variance | 3087 |
| Mode | 150.0000 | Range | 170.00000 |
| Interquartile Range | 63.52376 | ||
| Tests for Location: Mu0=0 | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Student's t | t | 3.060835 | Pr > |t| | 0.0064 |
| Sign | M | 2.5 | Pr >= |M| | 0.3593 |
| Signed Rank | S | 59.5 | Pr >= |S| | 0.0145 |
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile |
| 100% Max | 150.00000 |
| 99% | 150.00000 |
| 95% | 150.00000 |
| 90% | 139.28571 |
| 75% Q3 | 61.53846 |
| 50% Median | 8.25000 |
| 25% Q1 | -1.98529 |
| 10% | -5.33333 |
| 5% | -13.33333 |
| 1% | -20.00000 |
| 0% Min | -20.00000 |
| Extreme Observations | |||
|---|---|---|---|
| Lowest | Highest | ||
| Value | Obs | Value | Obs |
| -20.00000 | 11 | 73.0769 | 16 |
| -6.66667 | 18 | 114.2857 | 9 |
| -4.00000 | 5 | 128.5714 | 7 |
| -3.33333 | 14 | 150.0000 | 10 |
| -2.50000 | 2 | 150.0000 | 13 |
The UNIVARIATE Procedure
·
Distribution shape
·
Skewness
·
Extreme values
8. PROC CORR
proc corr data=construction_util;
var Planned_Duration Actual_Duration Cost_Overrun_Pct Safety_Issues;
run;
OUTPUT:
The CORR Procedure
| 4 Variables: | Planned_Duration Actual_Duration Cost_Overrun_Pct Safety_Issues |
|---|
| Simple Statistics | ||||||
|---|---|---|---|---|---|---|
| Variable | N | Mean | Std Dev | Sum | Minimum | Maximum |
| Planned_Duration | 20 | 157.00000 | 39.08156 | 3140 | 90.00000 | 220.00000 |
| Actual_Duration | 19 | 181.36842 | 99.25848 | 3446 | 31.00000 | 500.00000 |
| Cost_Overrun_Pct | 20 | 38.02555 | 55.55851 | 760.51110 | -20.00000 | 150.00000 |
| Safety_Issues | 20 | 1.85000 | 1.72520 | 37.00000 | 0 | 6.00000 |
| Pearson Correlation Coefficients Prob > |r| under H0: Rho=0 Number of Observations | ||||
|---|---|---|---|---|
| Planned_Duration | Actual_Duration | Cost_Overrun_Pct | Safety_Issues | |
| Planned_Duration | 1.00000 20 | 0.41586 0.0766 19 | -0.24220 0.3036 20 | -0.00703 0.9765 20 |
| Actual_Duration | 0.41586 0.0766 19 | 1.00000 19 | 0.48472 0.0354 19 | 0.62241 0.0044 19 |
| Cost_Overrun_Pct | -0.24220 0.3036 20 | 0.48472 0.0354 19 | 1.00000 20 | 0.87844 <.0001 20 |
| Safety_Issues | -0.00703 0.9765 20 | 0.62241 0.0044 19 | 0.87844 <.0001 20 | 1.00000 20 |
·
Detects relationships
·
Identifies correlated fraud patterns
9. Visualization
proc sgplot data=construction_util;
scatter x=Planned_Duration y=Actual_Duration;
run;
OUTPUT:
10. PROC APPEND
proc append base=construction_util
data=fraud_projects FORCE;
run;
proc print data=construction_util;
run;
OUTPUT:
| Obs | Project_ID | Location | Completion_Status | Planned_Duration | Actual_Duration | Planned_Cost | Actual_Cost | Safety_Issues | Start_Date | End_Date | Duration_Category | Derived_Duration | Cost_Overrun | Cost_Overrun_Pct | Project_Label | Utilization |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | P001 | Hyderabad | Completed | 180 | 200 | 5000000 | 6500000 | 2 | 01JAN2024 | 20JUL2024 | 200 | 201 | 1500000 | 30.000 | P001-HYDERABAD | Over Utilized |
| 2 | P002 | Mumbai | Completed | 150 | 120 | 4000000 | 3900000 | 0 | 15FEB2024 | 15JUN2024 | 120 | 121 | -100000 | -2.500 | P002-MUMBAI | Under Utilized |
| 3 | P003 | Bangalore | Completed | 200 | 250 | 6000000 | 8500000 | 3 | 01MAR2024 | 10NOV2024 | 250 | 254 | 2500000 | 41.667 | P003-BANGALORE | Over Utilized |
| 4 | P004 | Delhi | Ongoing | 100 | . | 3000000 | 4500000 | 1 | 01APR2024 | . | Unknown | . | 1500000 | 50.000 | P004-DELHI | Under Utilized |
| 5 | P005 | Chennai | Completed | 220 | 210 | 7500000 | 7200000 | 0 | 01JAN2024 | 01SEP2024 | 210 | 244 | -300000 | -4.000 | P005-CHENNAI | Under Utilized |
| 6 | P006 | Hyderabad | Completed | 180 | 31 | 5000000 | 5200000 | 2 | 01JAN2024 | 01FEB2024 | 31 | 31 | 200000 | 4.000 | P006-HYDERABAD | Under Utilized |
| 7 | P007 | Mumbai | Completed | 120 | 140 | 3500000 | 8000000 | 4 | 01MAY2024 | 15SEP2024 | 140 | 137 | 4500000 | 128.571 | P007-MUMBAI | Over Utilized |
| 8 | P008 | Bangalore | Completed | 160 | 160 | 4500000 | 4500000 | 1 | 01JUN2024 | 08NOV2024 | 160 | 160 | 0 | 0.000 | P008-BANGALORE | On Time |
| 9 | P009 | Delhi | Completed | 140 | 300 | 4200000 | 9000000 | 5 | 01JAN2024 | 28OCT2024 | 300 | 301 | 4800000 | 114.286 | P009-DELHI | Over Utilized |
| 10 | P010 | Chennai | Completed | 90 | 95 | 2000000 | 5000000 | 3 | 01JUL2024 | 05OCT2024 | 95 | 96 | 3000000 | 150.000 | P010-CHENNAI | Over Utilized |
| 11 | P011 | Hyderabad | Completed | 110 | 100 | 2500000 | 2000000 | 0 | 01MAR2024 | 09JUN2024 | 100 | 100 | -500000 | -20.000 | P011-HYDERABAD | Under Utilized |
| 12 | P012 | Mumbai | Completed | 200 | 220 | 7000000 | 10000000 | 2 | 01JAN2024 | 09AUG2024 | 220 | 221 | 3000000 | 42.857 | P012-MUMBAI | Over Utilized |
| 13 | P013 | Bangalore | Completed | 180 | 500 | 8000000 | 20000000 | 6 | 01JAN2024 | 15DEC2025 | 500 | 714 | 12000000 | 150.000 | P013-BANGALORE | Over Utilized |
| 14 | P014 | Delhi | Completed | 160 | 150 | 6000000 | 5800000 | 1 | 01FEB2024 | 30JUN2024 | 150 | 150 | -200000 | -3.333 | P014-DELHI | Under Utilized |
| 15 | P015 | Chennai | Completed | 210 | 215 | 7500000 | 7400000 | 0 | 01JAN2024 | 04AUG2024 | 215 | 216 | -100000 | -1.333 | P015-CHENNAI | Over Utilized |
| 16 | P016 | Hyderabad | Completed | 170 | 180 | 5200000 | 9000000 | 3 | 01APR2024 | 01OCT2024 | 180 | 183 | 3800000 | 73.077 | P016-HYDERABAD | Over Utilized |
| 17 | P017 | Mumbai | Completed | 190 | 180 | 6800000 | 6700000 | 1 | 01JAN2024 | 01JUL2024 | 180 | 182 | -100000 | -1.471 | P017-MUMBAI | Under Utilized |
| 18 | P018 | Bangalore | Completed | 100 | 95 | 3000000 | 2800000 | 0 | 01MAR2024 | 04JUN2024 | 95 | 95 | -200000 | -6.667 | P018-BANGALORE | Under Utilized |
| 19 | P019 | Delhi | Completed | 150 | 160 | 4000000 | 4500000 | 2 | 01APR2024 | 09SEP2024 | 160 | 161 | 500000 | 12.500 | P019-DELHI | Over Utilized |
| 20 | P020 | Chennai | Completed | 130 | 140 | 3500000 | 3600000 | 1 | 01MAY2024 | 18SEP2024 | 140 | 140 | 100000 | 2.857 | P020-CHENNAI | Over Utilized |
| 21 | P007 | Mumbai | Completed | 120 | 140 | 3500000 | 8000000 | 4 | 01MAY2024 | 15SEP2024 | 140 | 137 | 4500000 | 128.571 | P007-MUMBAI | Over Utilized |
| 22 | P009 | Delhi | Completed | 140 | 300 | 4200000 | 9000000 | 5 | 01JAN2024 | 28OCT2024 | 300 | 301 | 4800000 | 114.286 | P009-DELHI | Over Utilized |
| 23 | P010 | Chennai | Completed | 90 | 95 | 2000000 | 5000000 | 3 | 01JUL2024 | 05OCT2024 | 95 | 96 | 3000000 | 150.000 | P010-CHENNAI | Over Utilized |
| 24 | P013 | Bangalore | Completed | 180 | 500 | 8000000 | 20000000 | 6 | 01JAN2024 | 15DEC2025 | 500 | 714 | 12000000 | 150.000 | P013-BANGALORE | Over Utilized |
| 25 | P016 | Hyderabad | Completed | 170 | 180 | 5200000 | 9000000 | 3 | 01APR2024 | 01OCT2024 | 180 | 183 | 3800000 | 73.077 | P016-HYDERABAD | Over Utilized |
11. TRANSPOSE
proc transpose data=construction_util out=transposed;
var Planned_Duration Actual_Duration;
run;
proc print data=transposed;
run;
OUTPUT:
| Obs | _NAME_ | COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 | COL8 | COL9 | COL10 | COL11 | COL12 | COL13 | COL14 | COL15 | COL16 | COL17 | COL18 | COL19 | COL20 | COL21 | COL22 | COL23 | COL24 | COL25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Planned_Duration | 180 | 150 | 200 | 100 | 220 | 180 | 120 | 160 | 140 | 90 | 110 | 200 | 180 | 160 | 210 | 170 | 190 | 100 | 150 | 130 | 120 | 140 | 90 | 180 | 170 |
| 2 | Actual_Duration | 200 | 120 | 250 | . | 210 | 31 | 140 | 160 | 300 | 95 | 100 | 220 | 500 | 150 | 215 | 180 | 180 | 95 | 160 | 140 | 140 | 300 | 95 | 500 | 180 |
12. MERGE
proc sort data=construction_util;by Project_ID;run;
proc print data=construction_util;
run;
OUTPUT:
| Obs | Project_ID | Location | Completion_Status | Planned_Duration | Actual_Duration | Planned_Cost | Actual_Cost | Safety_Issues | Start_Date | End_Date | Duration_Category | Derived_Duration | Cost_Overrun | Cost_Overrun_Pct | Project_Label | Utilization |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | P001 | Hyderabad | Completed | 180 | 200 | 5000000 | 6500000 | 2 | 01JAN2024 | 20JUL2024 | 200 | 201 | 1500000 | 30.000 | P001-HYDERABAD | Over Utilized |
| 2 | P002 | Mumbai | Completed | 150 | 120 | 4000000 | 3900000 | 0 | 15FEB2024 | 15JUN2024 | 120 | 121 | -100000 | -2.500 | P002-MUMBAI | Under Utilized |
| 3 | P003 | Bangalore | Completed | 200 | 250 | 6000000 | 8500000 | 3 | 01MAR2024 | 10NOV2024 | 250 | 254 | 2500000 | 41.667 | P003-BANGALORE | Over Utilized |
| 4 | P004 | Delhi | Ongoing | 100 | . | 3000000 | 4500000 | 1 | 01APR2024 | . | Unknown | . | 1500000 | 50.000 | P004-DELHI | Under Utilized |
| 5 | P005 | Chennai | Completed | 220 | 210 | 7500000 | 7200000 | 0 | 01JAN2024 | 01SEP2024 | 210 | 244 | -300000 | -4.000 | P005-CHENNAI | Under Utilized |
| 6 | P006 | Hyderabad | Completed | 180 | 31 | 5000000 | 5200000 | 2 | 01JAN2024 | 01FEB2024 | 31 | 31 | 200000 | 4.000 | P006-HYDERABAD | Under Utilized |
| 7 | P007 | Mumbai | Completed | 120 | 140 | 3500000 | 8000000 | 4 | 01MAY2024 | 15SEP2024 | 140 | 137 | 4500000 | 128.571 | P007-MUMBAI | Over Utilized |
| 8 | P007 | Mumbai | Completed | 120 | 140 | 3500000 | 8000000 | 4 | 01MAY2024 | 15SEP2024 | 140 | 137 | 4500000 | 128.571 | P007-MUMBAI | Over Utilized |
| 9 | P008 | Bangalore | Completed | 160 | 160 | 4500000 | 4500000 | 1 | 01JUN2024 | 08NOV2024 | 160 | 160 | 0 | 0.000 | P008-BANGALORE | On Time |
| 10 | P009 | Delhi | Completed | 140 | 300 | 4200000 | 9000000 | 5 | 01JAN2024 | 28OCT2024 | 300 | 301 | 4800000 | 114.286 | P009-DELHI | Over Utilized |
| 11 | P009 | Delhi | Completed | 140 | 300 | 4200000 | 9000000 | 5 | 01JAN2024 | 28OCT2024 | 300 | 301 | 4800000 | 114.286 | P009-DELHI | Over Utilized |
| 12 | P010 | Chennai | Completed | 90 | 95 | 2000000 | 5000000 | 3 | 01JUL2024 | 05OCT2024 | 95 | 96 | 3000000 | 150.000 | P010-CHENNAI | Over Utilized |
| 13 | P010 | Chennai | Completed | 90 | 95 | 2000000 | 5000000 | 3 | 01JUL2024 | 05OCT2024 | 95 | 96 | 3000000 | 150.000 | P010-CHENNAI | Over Utilized |
| 14 | P011 | Hyderabad | Completed | 110 | 100 | 2500000 | 2000000 | 0 | 01MAR2024 | 09JUN2024 | 100 | 100 | -500000 | -20.000 | P011-HYDERABAD | Under Utilized |
| 15 | P012 | Mumbai | Completed | 200 | 220 | 7000000 | 10000000 | 2 | 01JAN2024 | 09AUG2024 | 220 | 221 | 3000000 | 42.857 | P012-MUMBAI | Over Utilized |
| 16 | P013 | Bangalore | Completed | 180 | 500 | 8000000 | 20000000 | 6 | 01JAN2024 | 15DEC2025 | 500 | 714 | 12000000 | 150.000 | P013-BANGALORE | Over Utilized |
| 17 | P013 | Bangalore | Completed | 180 | 500 | 8000000 | 20000000 | 6 | 01JAN2024 | 15DEC2025 | 500 | 714 | 12000000 | 150.000 | P013-BANGALORE | Over Utilized |
| 18 | P014 | Delhi | Completed | 160 | 150 | 6000000 | 5800000 | 1 | 01FEB2024 | 30JUN2024 | 150 | 150 | -200000 | -3.333 | P014-DELHI | Under Utilized |
| 19 | P015 | Chennai | Completed | 210 | 215 | 7500000 | 7400000 | 0 | 01JAN2024 | 04AUG2024 | 215 | 216 | -100000 | -1.333 | P015-CHENNAI | Over Utilized |
| 20 | P016 | Hyderabad | Completed | 170 | 180 | 5200000 | 9000000 | 3 | 01APR2024 | 01OCT2024 | 180 | 183 | 3800000 | 73.077 | P016-HYDERABAD | Over Utilized |
| 21 | P016 | Hyderabad | Completed | 170 | 180 | 5200000 | 9000000 | 3 | 01APR2024 | 01OCT2024 | 180 | 183 | 3800000 | 73.077 | P016-HYDERABAD | Over Utilized |
| 22 | P017 | Mumbai | Completed | 190 | 180 | 6800000 | 6700000 | 1 | 01JAN2024 | 01JUL2024 | 180 | 182 | -100000 | -1.471 | P017-MUMBAI | Under Utilized |
| 23 | P018 | Bangalore | Completed | 100 | 95 | 3000000 | 2800000 | 0 | 01MAR2024 | 04JUN2024 | 95 | 95 | -200000 | -6.667 | P018-BANGALORE | Under Utilized |
| 24 | P019 | Delhi | Completed | 150 | 160 | 4000000 | 4500000 | 2 | 01APR2024 | 09SEP2024 | 160 | 161 | 500000 | 12.500 | P019-DELHI | Over Utilized |
| 25 | P020 | Chennai | Completed | 130 | 140 | 3500000 | 3600000 | 1 | 01MAY2024 | 18SEP2024 | 140 | 140 | 100000 | 2.857 | P020-CHENNAI | Over Utilized |
proc sort data=fraud_projects;by Project_ID;run;
proc print data=fraud_projects;
run;
OUTPUT:
| Obs | Project_ID | Location | Completion_Status | Planned_Duration | Actual_Duration | Planned_Cost | Actual_Cost | Safety_Issues | Start_Date | End_Date | Duration_Category | Derived_Duration | Cost_Overrun | Cost_Overrun_Pct | Project_Label | Utilization | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | P007 | Mumbai | Completed | 120 | 140 | 3500000 | 8000000 | 4 | 01MAY2024 | 15SEP2024 | 140 | 137 | 4500000 | 128.571 | P007-MUMBAI | Over Utilized | Yes |
| 2 | P009 | Delhi | Completed | 140 | 300 | 4200000 | 9000000 | 5 | 01JAN2024 | 28OCT2024 | 300 | 301 | 4800000 | 114.286 | P009-DELHI | Over Utilized | Yes |
| 3 | P010 | Chennai | Completed | 90 | 95 | 2000000 | 5000000 | 3 | 01JUL2024 | 05OCT2024 | 95 | 96 | 3000000 | 150.000 | P010-CHENNAI | Over Utilized | Yes |
| 4 | P013 | Bangalore | Completed | 180 | 500 | 8000000 | 20000000 | 6 | 01JAN2024 | 15DEC2025 | 500 | 714 | 12000000 | 150.000 | P013-BANGALORE | Over Utilized | Yes |
| 5 | P016 | Hyderabad | Completed | 170 | 180 | 5200000 | 9000000 | 3 | 01APR2024 | 01OCT2024 | 180 | 183 | 3800000 | 73.077 | P016-HYDERABAD | Over Utilized | Yes |
data merged_data;
merge construction_util
fraud_projects;
by Project_ID;
run;
proc print data=merged_data;
run;
OUTPUT:
| Obs | Project_ID | Location | Completion_Status | Planned_Duration | Actual_Duration | Planned_Cost | Actual_Cost | Safety_Issues | Start_Date | End_Date | Duration_Category | Derived_Duration | Cost_Overrun | Cost_Overrun_Pct | Project_Label | Utilization | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | P001 | Hyderabad | Completed | 180 | 200 | 5000000 | 6500000 | 2 | 01JAN2024 | 20JUL2024 | 200 | 201 | 1500000 | 30.000 | P001-HYDERABAD | Over Utilized | |
| 2 | P002 | Mumbai | Completed | 150 | 120 | 4000000 | 3900000 | 0 | 15FEB2024 | 15JUN2024 | 120 | 121 | -100000 | -2.500 | P002-MUMBAI | Under Utilized | |
| 3 | P003 | Bangalore | Completed | 200 | 250 | 6000000 | 8500000 | 3 | 01MAR2024 | 10NOV2024 | 250 | 254 | 2500000 | 41.667 | P003-BANGALORE | Over Utilized | |
| 4 | P004 | Delhi | Ongoing | 100 | . | 3000000 | 4500000 | 1 | 01APR2024 | . | Unknown | . | 1500000 | 50.000 | P004-DELHI | Under Utilized | |
| 5 | P005 | Chennai | Completed | 220 | 210 | 7500000 | 7200000 | 0 | 01JAN2024 | 01SEP2024 | 210 | 244 | -300000 | -4.000 | P005-CHENNAI | Under Utilized | |
| 6 | P006 | Hyderabad | Completed | 180 | 31 | 5000000 | 5200000 | 2 | 01JAN2024 | 01FEB2024 | 31 | 31 | 200000 | 4.000 | P006-HYDERABAD | Under Utilized | |
| 7 | P007 | Mumbai | Completed | 120 | 140 | 3500000 | 8000000 | 4 | 01MAY2024 | 15SEP2024 | 140 | 137 | 4500000 | 128.571 | P007-MUMBAI | Over Utilized | Yes |
| 8 | P007 | Mumbai | Completed | 120 | 140 | 3500000 | 8000000 | 4 | 01MAY2024 | 15SEP2024 | 140 | 137 | 4500000 | 128.571 | P007-MUMBAI | Over Utilized | Yes |
| 9 | P008 | Bangalore | Completed | 160 | 160 | 4500000 | 4500000 | 1 | 01JUN2024 | 08NOV2024 | 160 | 160 | 0 | 0.000 | P008-BANGALORE | On Time | |
| 10 | P009 | Delhi | Completed | 140 | 300 | 4200000 | 9000000 | 5 | 01JAN2024 | 28OCT2024 | 300 | 301 | 4800000 | 114.286 | P009-DELHI | Over Utilized | Yes |
| 11 | P009 | Delhi | Completed | 140 | 300 | 4200000 | 9000000 | 5 | 01JAN2024 | 28OCT2024 | 300 | 301 | 4800000 | 114.286 | P009-DELHI | Over Utilized | Yes |
| 12 | P010 | Chennai | Completed | 90 | 95 | 2000000 | 5000000 | 3 | 01JUL2024 | 05OCT2024 | 95 | 96 | 3000000 | 150.000 | P010-CHENNAI | Over Utilized | Yes |
| 13 | P010 | Chennai | Completed | 90 | 95 | 2000000 | 5000000 | 3 | 01JUL2024 | 05OCT2024 | 95 | 96 | 3000000 | 150.000 | P010-CHENNAI | Over Utilized | Yes |
| 14 | P011 | Hyderabad | Completed | 110 | 100 | 2500000 | 2000000 | 0 | 01MAR2024 | 09JUN2024 | 100 | 100 | -500000 | -20.000 | P011-HYDERABAD | Under Utilized | |
| 15 | P012 | Mumbai | Completed | 200 | 220 | 7000000 | 10000000 | 2 | 01JAN2024 | 09AUG2024 | 220 | 221 | 3000000 | 42.857 | P012-MUMBAI | Over Utilized | |
| 16 | P013 | Bangalore | Completed | 180 | 500 | 8000000 | 20000000 | 6 | 01JAN2024 | 15DEC2025 | 500 | 714 | 12000000 | 150.000 | P013-BANGALORE | Over Utilized | Yes |
| 17 | P013 | Bangalore | Completed | 180 | 500 | 8000000 | 20000000 | 6 | 01JAN2024 | 15DEC2025 | 500 | 714 | 12000000 | 150.000 | P013-BANGALORE | Over Utilized | Yes |
| 18 | P014 | Delhi | Completed | 160 | 150 | 6000000 | 5800000 | 1 | 01FEB2024 | 30JUN2024 | 150 | 150 | -200000 | -3.333 | P014-DELHI | Under Utilized | |
| 19 | P015 | Chennai | Completed | 210 | 215 | 7500000 | 7400000 | 0 | 01JAN2024 | 04AUG2024 | 215 | 216 | -100000 | -1.333 | P015-CHENNAI | Over Utilized | |
| 20 | P016 | Hyderabad | Completed | 170 | 180 | 5200000 | 9000000 | 3 | 01APR2024 | 01OCT2024 | 180 | 183 | 3800000 | 73.077 | P016-HYDERABAD | Over Utilized | Yes |
| 21 | P016 | Hyderabad | Completed | 170 | 180 | 5200000 | 9000000 | 3 | 01APR2024 | 01OCT2024 | 180 | 183 | 3800000 | 73.077 | P016-HYDERABAD | Over Utilized | Yes |
| 22 | P017 | Mumbai | Completed | 190 | 180 | 6800000 | 6700000 | 1 | 01JAN2024 | 01JUL2024 | 180 | 182 | -100000 | -1.471 | P017-MUMBAI | Under Utilized | |
| 23 | P018 | Bangalore | Completed | 100 | 95 | 3000000 | 2800000 | 0 | 01MAR2024 | 04JUN2024 | 95 | 95 | -200000 | -6.667 | P018-BANGALORE | Under Utilized | |
| 24 | P019 | Delhi | Completed | 150 | 160 | 4000000 | 4500000 | 2 | 01APR2024 | 09SEP2024 | 160 | 161 | 500000 | 12.500 | P019-DELHI | Over Utilized | |
| 25 | P020 | Chennai | Completed | 130 | 140 | 3500000 | 3600000 | 1 | 01MAY2024 | 18SEP2024 | 140 | 140 | 100000 | 2.857 | P020-CHENNAI | Over Utilized |
13. Cleanup
proc datasets library=work;
delete construction_raw;
quit;
LOG:
14. Full Corrected Dataset Code
data construction_master;
set construction_raw;
Location = propcase(strip(Location));
Completion_Status = propcase(strip(Completion_Status));
if Actual_Duration < 0 then Actual_Duration = .;
Derived_Duration = intck('day', Start_Date, End_Date);
if missing(Actual_Duration) then Actual_Duration = Derived_Duration;
Cost_Overrun = Actual_Cost - Planned_Cost;
Cost_Overrun_Pct = (Cost_Overrun / Planned_Cost) * 100;
if Actual_Duration > Planned_Duration then Utilization="Over Utilized";
else if Actual_Duration = Planned_Duration then Utilization="On Time";
else Utilization="Under Utilized";
if Cost_Overrun_Pct > 50 or
(Actual_Duration - Planned_Duration) > 100 or
Safety_Issues > 5 then Fraud_Flag="Yes";
else Fraud_Flag="No";
run;
proc print data=construction_master;
run;
OUTPUT:
| Obs | Project_ID | Location | Completion_Status | Planned_Duration | Actual_Duration | Planned_Cost | Actual_Cost | Safety_Issues | Start_Date | End_Date | Derived_Duration | Cost_Overrun | Cost_Overrun_Pct | Utilization | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | P001 | Hyderabad | Completed | 180 | 200 | 5000000 | 6500000 | 2 | 01JAN2024 | 20JUL2024 | 201 | 1500000 | 30.000 | Over Utilized | No |
| 2 | P002 | Mumbai | Completed | 150 | 120 | 4000000 | 3900000 | 0 | 15FEB2024 | 15JUN2024 | 121 | -100000 | -2.500 | Under Utilize | No |
| 3 | P003 | Bangalore | Completed | 200 | 250 | 6000000 | 8500000 | 3 | 01MAR2024 | 10NOV2024 | 254 | 2500000 | 41.667 | Over Utilized | No |
| 4 | P004 | Delhi | Ongoing | 100 | . | 3000000 | 4500000 | 1 | 01APR2024 | . | . | 1500000 | 50.000 | Under Utilize | No |
| 5 | P005 | Chennai | Completed | 220 | 210 | 7500000 | 7200000 | 0 | 01JAN2024 | 01SEP2024 | 244 | -300000 | -4.000 | Under Utilize | No |
| 6 | P006 | Hyderabad | Completed | 180 | 31 | 5000000 | 5200000 | 2 | 01JAN2024 | 01FEB2024 | 31 | 200000 | 4.000 | Under Utilize | No |
| 7 | P007 | Mumbai | Completed | 120 | 140 | 3500000 | 8000000 | 4 | 01MAY2024 | 15SEP2024 | 137 | 4500000 | 128.571 | Over Utilized | Yes |
| 8 | P008 | Bangalore | Completed | 160 | 160 | 4500000 | 4500000 | 1 | 01JUN2024 | 08NOV2024 | 160 | 0 | 0.000 | On Time | No |
| 9 | P009 | Delhi | Completed | 140 | 300 | 4200000 | 9000000 | 5 | 01JAN2024 | 28OCT2024 | 301 | 4800000 | 114.286 | Over Utilized | Yes |
| 10 | P010 | Chennai | Completed | 90 | 95 | 2000000 | 5000000 | 3 | 01JUL2024 | 05OCT2024 | 96 | 3000000 | 150.000 | Over Utilized | Yes |
| 11 | P011 | Hyderabad | Completed | 110 | 100 | 2500000 | 2000000 | 0 | 01MAR2024 | 09JUN2024 | 100 | -500000 | -20.000 | Under Utilize | No |
| 12 | P012 | Mumbai | Completed | 200 | 220 | 7000000 | 10000000 | 2 | 01JAN2024 | 09AUG2024 | 221 | 3000000 | 42.857 | Over Utilized | No |
| 13 | P013 | Bangalore | Completed | 180 | 500 | 8000000 | 20000000 | 6 | 01JAN2024 | 15DEC2025 | 714 | 12000000 | 150.000 | Over Utilized | Yes |
| 14 | P014 | Delhi | Completed | 160 | 150 | 6000000 | 5800000 | 1 | 01FEB2024 | 30JUN2024 | 150 | -200000 | -3.333 | Under Utilize | No |
| 15 | P015 | Chennai | Completed | 210 | 215 | 7500000 | 7400000 | 0 | 01JAN2024 | 04AUG2024 | 216 | -100000 | -1.333 | Over Utilized | No |
| 16 | P016 | Hyderabad | Completed | 170 | 180 | 5200000 | 9000000 | 3 | 01APR2024 | 01OCT2024 | 183 | 3800000 | 73.077 | Over Utilized | Yes |
| 17 | P017 | Mumbai | Completed | 190 | 180 | 6800000 | 6700000 | 1 | 01JAN2024 | 01JUL2024 | 182 | -100000 | -1.471 | Under Utilize | No |
| 18 | P018 | Bangalore | Completed | 100 | 95 | 3000000 | 2800000 | 0 | 01MAR2024 | 04JUN2024 | 95 | -200000 | -6.667 | Under Utilize | No |
| 19 | P019 | Delhi | Completed | 150 | 160 | 4000000 | 4500000 | 2 | 01APR2024 | 09SEP2024 | 161 | 500000 | 12.500 | Over Utilized | No |
| 20 | P020 | Chennai | Completed | 130 | 140 | 3500000 | 3600000 | 1 | 01MAY2024 | 18SEP2024 | 140 | 100000 | 2.857 | Over Utilized | No |
Why Each Procedure Was Used
|
Procedure |
Purpose |
|
PROC SQL |
Error detection & filtering |
|
PROC FREQ |
Frequency of fraud |
|
PROC MEANS |
Summary stats |
|
PROC UNIVARIATE |
Distribution analysis |
|
PROC CORR |
Relationship study |
|
PROC SGPLOT |
Visualization |
|
MACROS |
Automation |
|
APPEND |
Combine datasets |
|
TRANSPOSE |
Reshape data |
|
MERGE |
Join datasets |
|
PROC DATASETS DELETE |
Cleanup |
20 Key Points About The Project
1.The
project simulates a real-world construction project monitoring environment
across multiple cities, making it practical, operational, and business-oriented
rather than purely theoretical.
2.A
structured dataset was created with key variables such as Project_ID, Location,
Planned_Duration, Actual_Duration, Planned_Cost, Actual_Cost, Cost_Overrun,
Cost_Overrun_Percentage, Safety_Issues, Completion_Status, Start_Date,
End_Date, Utilization_Category, and Fraud_Flag to reflect real infrastructure
reporting systems.
3.More
than 18 project observations were included to represent diverse project types
and regional variations, ensuring analytical depth and realistic scenario
modeling.
4.Multiple
intentional data issues were introduced, including negative durations, missing
actual durations, inconsistent location formats, incorrect completion status
cases, unrealistic cost overruns, and abnormal safety counts to demonstrate
structured error detection.
5.Data cleaning
was performed using character functions such as STRIP( ), PROPCASE( ), UPCASE( ),
LOWCASE( ), CAT( ), CATX( ), and COALESCE( ) to standardize location names,
project labels, and reporting fields.
6.Numeric
validation logic was applied to correct unrealistic values, such as converting
negative durations to missing, recalculating missing durations using date
differences, and verifying cost overrun computations.
7.Derived
financial metrics such as Cost_Overrun (Actual − Planned) and
Cost_Overrun_Percentage were calculated to measure budget performance and
detect excessive variance.
8.Date
intelligence functions including MDY(), INTCK(), and INTNX() were used to
compute project timelines, validate duration accuracy, and align reporting
periods.
9.Project
utilization was automated through a macro that classified projects into Over
Utilized, On Time, and Under Utilized categories based on planned versus actual
duration comparisons.
10.Fraud
detection logic was implemented using a macro that flagged projects with
extreme cost overruns, excessive duration deviations, or unusually high safety
incidents.
11.PROC
SQL was used for conditional validation, anomaly detection, and filtered
reporting, simulating audit-style query logic.
12.PROC
MEANS generated descriptive statistics such as mean, standard deviation,
minimum, and maximum values for cost and duration variables.
13.PROC
UNIVARIATE was used to analyze distribution patterns, detect skewness, and
identify outliers in cost overrun percentages.
14.PROC
FREQ provided categorical analysis for variables such as Completion_Status,
Fraud_Flag, and Utilization_Category.
15.PROC
CORR evaluated relationships between project delays, cost overruns, and safety
issues to uncover risk patterns.
16.Visualization
using PROC SGPLOT helped validate relationships between Planned_Duration and
Actual_Duration through scatter plots for performance comparison.
17.Dataset
restructuring techniques such as SET were used for sequential data processing
and transformation.
18.MERGE
logic was applied to combine fraud-flagged projects with the master dataset for
consolidated reporting.
19.PROC
APPEND enabled scalable expansion of project records without rewriting the
original data structure.
20.PROC
TRANSPOSE and PROC DATASETS DELETE ensured reporting flexibility, workspace
optimization, and production-ready data management practices.
Project Summary
This project builds a smart Construction Project Monitoring & Fraud Detection System
in SAS designed to simulate a real-world infrastructure management
environment. The objective is to monitor project timelines, financial
performance, safety compliance, and detect abnormal or potentially fraudulent
activities using structured analytics.
A comprehensive dataset was created with key variables
such as Project_ID, Location, Planned_Duration, Actual_Duration, Planned_Cost,
Actual_Cost, Safety_Issues, Completion_Status, Start_Date, and End_Date.
Derived metrics like Cost_Overrun, Cost_Overrun_Percentage,
Utilization_Category, and Fraud_Flag were calculated to measure operational
efficiency and financial deviation.
To make the system realistic, multiple
intentional errors were introduced, including negative durations, missing
values, inconsistent text formatting, abnormal cost spikes, and exaggerated
safety counts. These issues were identified and corrected using SAS data step
logic, numeric validation rules, and character standardization functions.
Date intelligence functions such as INTCK( ),
INTNX( ), and MDY( ) were used to validate timelines and compute accurate
project durations. Automated macros classified project utilization (Over
Utilized, On Time, Under Utilized) and implemented fraud detection rules based
on excessive cost overruns, extreme delays, and abnormal safety incidents.
Analytical procedures including PROC SQL, PROC
MEANS, PROC UNIVARIATE, PROC FREQ, and PROC CORR generated statistical
summaries, distribution insights, and relationship analysis. Visualization
through PROC SGPLOT helped validate performance trends and anomaly patterns.
Overall,
the project demonstrates how SAS can be used to build a scalable, audit-ready,
and business-oriented construction monitoring system that integrates data
cleaning, automation, statistical analysis, fraud detection, and performance
reporting in a structured and production-level workflow.
Conclusion
This project successfully demonstrates how a
smart Construction Project Monitoring & Fraud Detection System can be built
in SAS while systematically identifying and correcting intentional data errors.
By designing a realistic, business-oriented dataset and introducing controlled
inconsistencies, the project reflects practical challenges faced in real-world
infrastructure management environments.
Through structured data validation, character
and numeric standardization, and intelligent date handling using functions like
INTCK(), INTNX(), and MDY(), data quality issues were effectively detected and
resolved. Financial performance was evaluated using derived metrics such as
Cost_Overrun and Cost_Overrun_Percentage, while operational efficiency was
measured through utilization classification logic implemented via macros.
Fraud detection mechanisms were built using
rule-based thresholds to flag abnormal cost spikes, excessive delays, and
unusually high safety incidents. Statistical procedures including PROC SQL,
PROC MEANS, PROC UNIVARIATE, PROC FREQ, and PROC CORR provided analytical depth
by generating summaries, identifying outliers, and analyzing variable
relationships. Visualization using PROC SGPLOT further strengthened performance
validation and anomaly detection.
SAS INTERVIEW QUESTIONS
·
Explain the difference between %LET and
CALL SYMPUT.
·
What are local and global macro
variables?
·
What is the difference between %MACRO
and macro function?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent CONSTRUCTION data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics
· Clinical SAS Programmer
· Research Data Analyst
· Regulatory Data Validator
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment