423.Can We Detect and Correct Intentional Errors in Industrial Equipment Maintenance Data Using Advanced SAS Programming?

Detecting and Correcting Intentional Errors in Industrial Equipment Maintenance Data Using Advanced SAS Programming

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP | LENGTH | INPUT | DATALINES | SET | MERGE  | PROC SORT | PROC PRINT | PROC TRANSPOSE | PROC APPEND | PROC DATASETS DELETE | MACRO / %MACRO / %MEND | NUMERIC AND CHARACTER FUNCTIONS

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1. Introduction

In modern industrial environments, maintenance analytics plays a critical role in ensuring operational efficiency, equipment reliability, and cost optimization. Manufacturing plants, oil refineries, pharmaceutical facilities, and logistics warehouses all rely heavily on machinery that must operate continuously with minimal downtime.

However, industrial maintenance datasets often contain numerous data quality issues such as:

·  Incorrect maintenance costs

·  Invalid equipment IDs

·  Missing dates

·  Incorrect percentages

·  Fraudulent maintenance fees

·  Duplicate maintenance records

·  Incorrect downtime calculations

Data engineers and SAS programmers are responsible for detecting and correcting these errors using structured data validation techniques.

In this project, we simulate an Industrial Equipment Maintenance dataset containing:

·  equipment usage data

·  maintenance records

·  operational failures

·  downtime analytics

·  cost tracking

·  fraud detection indicators

The dataset intentionally includes multiple errors, which will later be identified and corrected using advanced SAS programming techniques.

2. Table of Contents

  1. Introduction
  2. Business Context
  3. Dataset Structure
  4. Raw Dataset Creation (with errors)
  5. Intentional Error Injection
  6. Data Cleaning Process
  7. SAS Macro for Fraud Detection
  8. Date Handling Functions
  9. Dataset Combination Techniques
  10. Data Transformation using PROC TRANSPOSE
  11. Numeric Function Applications
  12. Character Function Applications
  13. Utilization Classification
  14. Dataset Appending and Merging
  15. Corrected Dataset Generation
  16. 20 Key Points About This Project
  17. Conclusion

3. Business Context

Industrial companies track maintenance performance using key variables such as:

Variable

Description

Equipment_ID

Unique machine identifier

Usage_Hours

Total operating hours

Failure_Count

Number of breakdown events

Maintenance_Cost

Cost of repair

Downtime_Hours

Machine downtime

Condition_Index

Equipment health score

Remaining_Life

Estimated life remaining

Percentage

Maintenance completion %

Fees

Service fees

Utilization_Class

Equipment utilization category

Maintenance_Date

Date of service

Maintenance fraud or errors may occur in cases such as:

• Excessive maintenance cost compared to usage
• Fake downtime reporting
• Repeated maintenance billing
• Missing equipment identifiers
• Incorrect life estimation

Detecting such anomalies is an important task for SAS programmers working in industrial analytics.

1. Raw Dataset Creation (With Intentional Errors)

title "Industrial Equipment Maintenance Dataset With Intentional Errors";

data equipment_raw;

length Equipment_ID $10 Equipment_Type $20 Location $20 Technician $20;

input Equipment_ID $ Equipment_Type $ Usage_Hours

      Failure_Count Maintenance_Cost Downtime_Hours

      Condition_Index Remaining_Life Percentage

      Fees Location $ Technician $ Maintenance_Date :ddmmyy10.;

format Maintenance_Date date9.;

datalines;

EQ001 Pump 1200 2 5000 10 85 24 90 250 Hyderabad Ravi 12-01-2025

EQ002 Motor 900 5 12000 30 60 18 85 500 Chennai Kiran 14-02-2025

EQ003 Compressor 1500 3 8000 15 75 20 88 350 Mumbai Suresh 21-03-2025

EQ004 Pump 2000 7 20000 45 40 10 95 800 Pune Ramesh 11-04-2025

EQ005 Turbine 500 1 3000 5 92 30 101 200 Delhi Ajay 22-05-2025

EQ006 Pump 700 2 -500 8 88 26 85 150 Hyderabad Ravi 15-06-2025

EQ007 Motor 1100 0 2500 0 95 35 80 100 Chennai Kiran 18-07-2025

EQ008 Compressor 1600 6 15000 40 55 15 92 650 Mumbai Suresh 25-08-2025

EQ009 Pump . 3 6000 12 70 20 88 300 Pune Ramesh 19-09-2025

EQ010 Turbine 1300 4 9000 22 65 18 89 400 Delhi Ajay 02-10-2025

EQ011 Pump 1000 2 5500 11 80 23 87 250 Hyderabad Ravi 17-11-2025

EQ012 Motor 1400 3 9500 20 72 19 84 450 Chennai Kiran 09-12-2025

EQ013 Compressor 1700 8 21000 60 35 8 99 900 Mumbai Suresh 01-01-2026

EQ014 Pump 850 2 4500 9 82 24 86 230 Pune Ramesh 14-02-2026

EQ015 Turbine 600 1 3200 6 90 28 83 180 Delhi Ajay 05-03-2026

;

run;

proc print data=equipment_raw;

run;

title;

OUTPUT:

Industrial Equipment Maintenance Dataset With Intentional Errors

ObsEquipment_IDEquipment_TypeLocationTechnicianUsage_HoursFailure_CountMaintenance_CostDowntime_HoursCondition_IndexRemaining_LifePercentageFeesMaintenance_Date
1EQ001PumpHyderabadRavi1200250001085249025012JAN2025
2EQ002MotorChennaiKiran9005120003060188550014FEB2025
3EQ003CompressorMumbaiSuresh1500380001575208835021MAR2025
4EQ004PumpPuneRamesh20007200004540109580011APR2025
5EQ005TurbineDelhiAjay500130005923010120022MAY2025
6EQ006PumpHyderabadRavi7002-500888268515015JUN2025
7EQ007MotorChennaiKiran110002500095358010018JUL2025
8EQ008CompressorMumbaiSuresh16006150004055159265025AUG2025
9EQ009PumpPuneRamesh.360001270208830019SEP2025
10EQ010TurbineDelhiAjay1300490002265188940002OCT2025
11EQ011PumpHyderabadRavi1000255001180238725017NOV2025
12EQ012MotorChennaiKiran1400395002072198445009DEC2025
13EQ013CompressorMumbaiSuresh1700821000603589990001JAN2026
14EQ014PumpPuneRamesh85024500982248623014FEB2026
15EQ015TurbineDelhiAjay60013200690288318005MAR2026

Intentional Errors Introduced

This dataset deliberately contains multiple issues:

Error 1 — Invalid Percentage

Percentage = 101

Percentage cannot exceed 100.

Error 2 — Negative Maintenance Cost

Maintenance_Cost = -500

Maintenance costs cannot be negative.

Error 3 — Missing Usage Hours

Usage_Hours = .

This affects utilization calculations.

Error 4 — Abnormally High Maintenance Cost

21000

Possible fraud.

2. Fraud Detection Macro

%macro fraud_detection;

data fraud_flag;

set equipment_raw;

if Maintenance_Cost > 20000 then Fraud_Flag = "YES";

else Fraud_Flag = "NO";

run;

proc print data=fraud_flag;

run;

%mend;


%fraud_detection;

OUTPUT:

ObsEquipment_IDEquipment_TypeLocationTechnicianUsage_HoursFailure_CountMaintenance_CostDowntime_HoursCondition_IndexRemaining_LifePercentageFeesMaintenance_DateFraud_Flag
1EQ001PumpHyderabadRavi1200250001085249025012JAN2025NO
2EQ002MotorChennaiKiran9005120003060188550014FEB2025NO
3EQ003CompressorMumbaiSuresh1500380001575208835021MAR2025NO
4EQ004PumpPuneRamesh20007200004540109580011APR2025NO
5EQ005TurbineDelhiAjay500130005923010120022MAY2025NO
6EQ006PumpHyderabadRavi7002-500888268515015JUN2025NO
7EQ007MotorChennaiKiran110002500095358010018JUL2025NO
8EQ008CompressorMumbaiSuresh16006150004055159265025AUG2025NO
9EQ009PumpPuneRamesh.360001270208830019SEP2025NO
10EQ010TurbineDelhiAjay1300490002265188940002OCT2025NO
11EQ011PumpHyderabadRavi1000255001180238725017NOV2025NO
12EQ012MotorChennaiKiran1400395002072198445009DEC2025NO
13EQ013CompressorMumbaiSuresh1700821000603589990001JAN2026YES
14EQ014PumpPuneRamesh85024500982248623014FEB2026NO
15EQ015TurbineDelhiAjay60013200690288318005MAR2026NO

3. Utilization Classification

data utilization_class;

set equipment_raw;

length Utilization_Class $8.;

if Usage_Hours >=1500 then Utilization_Class="HIGH";

else if Usage_Hours >=800 then Utilization_Class="MEDIUM";

else Utilization_Class="LOW";

run;

proc print data=utilization_class;

run;

OUTPUT:

ObsEquipment_IDEquipment_TypeLocationTechnicianUsage_HoursFailure_CountMaintenance_CostDowntime_HoursCondition_IndexRemaining_LifePercentageFeesMaintenance_DateUtilization_Class
1EQ001PumpHyderabadRavi1200250001085249025012JAN2025MEDIUM
2EQ002MotorChennaiKiran9005120003060188550014FEB2025MEDIUM
3EQ003CompressorMumbaiSuresh1500380001575208835021MAR2025HIGH
4EQ004PumpPuneRamesh20007200004540109580011APR2025HIGH
5EQ005TurbineDelhiAjay500130005923010120022MAY2025LOW
6EQ006PumpHyderabadRavi7002-500888268515015JUN2025LOW
7EQ007MotorChennaiKiran110002500095358010018JUL2025MEDIUM
8EQ008CompressorMumbaiSuresh16006150004055159265025AUG2025HIGH
9EQ009PumpPuneRamesh.360001270208830019SEP2025LOW
10EQ010TurbineDelhiAjay1300490002265188940002OCT2025MEDIUM
11EQ011PumpHyderabadRavi1000255001180238725017NOV2025MEDIUM
12EQ012MotorChennaiKiran1400395002072198445009DEC2025MEDIUM
13EQ013CompressorMumbaiSuresh1700821000603589990001JAN2026HIGH
14EQ014PumpPuneRamesh85024500982248623014FEB2026MEDIUM
15EQ015TurbineDelhiAjay60013200690288318005MAR2026LOW

4. Date Functions

data MDY;

set equipment_raw;

Next_Service_Date = mdy(month(Maintenance_Date),day(Maintenance_Date)

                    ,year(Maintenance_Date)+1);

run;

proc print data=MDY;

run;

OUTPUT:

ObsEquipment_IDEquipment_TypeLocationTechnicianUsage_HoursFailure_CountMaintenance_CostDowntime_HoursCondition_IndexRemaining_LifePercentageFeesMaintenance_DateNext_Service_Date
1EQ001PumpHyderabadRavi1200250001085249025012JAN202524118
2EQ002MotorChennaiKiran9005120003060188550014FEB202524151
3EQ003CompressorMumbaiSuresh1500380001575208835021MAR202524186
4EQ004PumpPuneRamesh20007200004540109580011APR202524207
5EQ005TurbineDelhiAjay500130005923010120022MAY202524248
6EQ006PumpHyderabadRavi7002-500888268515015JUN202524272
7EQ007MotorChennaiKiran110002500095358010018JUL202524305
8EQ008CompressorMumbaiSuresh16006150004055159265025AUG202524343
9EQ009PumpPuneRamesh.360001270208830019SEP202524368
10EQ010TurbineDelhiAjay1300490002265188940002OCT202524381
11EQ011PumpHyderabadRavi1000255001180238725017NOV202524427
12EQ012MotorChennaiKiran1400395002072198445009DEC202524449
13EQ013CompressorMumbaiSuresh1700821000603589990001JAN202624472
14EQ014PumpPuneRamesh85024500982248623014FEB202624516
15EQ015TurbineDelhiAjay60013200690288318005MAR202624535

Creates next service schedule.

data INTCK;

set equipment_raw;

Service_Age = intck('month',Maintenance_Date,today());

run;

proc print data=INTCK;

run;

OUTPUT:

ObsEquipment_IDEquipment_TypeLocationTechnicianUsage_HoursFailure_CountMaintenance_CostDowntime_HoursCondition_IndexRemaining_LifePercentageFeesMaintenance_DateService_Age
1EQ001PumpHyderabadRavi1200250001085249025012JAN202514
2EQ002MotorChennaiKiran9005120003060188550014FEB202513
3EQ003CompressorMumbaiSuresh1500380001575208835021MAR202512
4EQ004PumpPuneRamesh20007200004540109580011APR202511
5EQ005TurbineDelhiAjay500130005923010120022MAY202510
6EQ006PumpHyderabadRavi7002-500888268515015JUN20259
7EQ007MotorChennaiKiran110002500095358010018JUL20258
8EQ008CompressorMumbaiSuresh16006150004055159265025AUG20257
9EQ009PumpPuneRamesh.360001270208830019SEP20256
10EQ010TurbineDelhiAjay1300490002265188940002OCT20255
11EQ011PumpHyderabadRavi1000255001180238725017NOV20254
12EQ012MotorChennaiKiran1400395002072198445009DEC20253
13EQ013CompressorMumbaiSuresh1700821000603589990001JAN20262
14EQ014PumpPuneRamesh85024500982248623014FEB20261
15EQ015TurbineDelhiAjay60013200690288318005MAR20260

Counts months since last service.

data INTNX;

set equipment_raw;

Service_Age = intck('month',Maintenance_Date,today());

run;

proc print data=INTNX;

run;

OUTPUT:

ObsEquipment_IDEquipment_TypeLocationTechnicianUsage_HoursFailure_CountMaintenance_CostDowntime_HoursCondition_IndexRemaining_LifePercentageFeesMaintenance_DateService_Age
1EQ001PumpHyderabadRavi1200250001085249025012JAN202514
2EQ002MotorChennaiKiran9005120003060188550014FEB202513
3EQ003CompressorMumbaiSuresh1500380001575208835021MAR202512
4EQ004PumpPuneRamesh20007200004540109580011APR202511
5EQ005TurbineDelhiAjay500130005923010120022MAY202510
6EQ006PumpHyderabadRavi7002-500888268515015JUN20259
7EQ007MotorChennaiKiran110002500095358010018JUL20258
8EQ008CompressorMumbaiSuresh16006150004055159265025AUG20257
9EQ009PumpPuneRamesh.360001270208830019SEP20256
10EQ010TurbineDelhiAjay1300490002265188940002OCT20255
11EQ011PumpHyderabadRavi1000255001180238725017NOV20254
12EQ012MotorChennaiKiran1400395002072198445009DEC20253
13EQ013CompressorMumbaiSuresh1700821000603589990001JAN20262
14EQ014PumpPuneRamesh85024500982248623014FEB20261
15EQ015TurbineDelhiAjay60013200690288318005MAR20260

Calculates next month service.

5. Character Functions

data Char;

set equipment_raw;

Technician_Name = propcase(Technician);

Location_Upper = upcase(Location);

Location_Lower = lowcase(Location);

Clean_ID = strip(trim(Equipment_ID));

Full_ID = catx("-",Equipment_ID,Equipment_Type);

run;

proc print data=Char;

run;

OUTPUT:

ObsEquipment_IDEquipment_TypeLocationTechnicianUsage_HoursFailure_CountMaintenance_CostDowntime_HoursCondition_IndexRemaining_LifePercentageFeesMaintenance_DateTechnician_NameLocation_UpperLocation_LowerClean_IDFull_ID
1EQ001PumpHyderabadRavi1200250001085249025012JAN2025RaviHYDERABADhyderabadEQ001EQ001-Pump
2EQ002MotorChennaiKiran9005120003060188550014FEB2025KiranCHENNAIchennaiEQ002EQ002-Motor
3EQ003CompressorMumbaiSuresh1500380001575208835021MAR2025SureshMUMBAImumbaiEQ003EQ003-Compressor
4EQ004PumpPuneRamesh20007200004540109580011APR2025RameshPUNEpuneEQ004EQ004-Pump
5EQ005TurbineDelhiAjay500130005923010120022MAY2025AjayDELHIdelhiEQ005EQ005-Turbine
6EQ006PumpHyderabadRavi7002-500888268515015JUN2025RaviHYDERABADhyderabadEQ006EQ006-Pump
7EQ007MotorChennaiKiran110002500095358010018JUL2025KiranCHENNAIchennaiEQ007EQ007-Motor
8EQ008CompressorMumbaiSuresh16006150004055159265025AUG2025SureshMUMBAImumbaiEQ008EQ008-Compressor
9EQ009PumpPuneRamesh.360001270208830019SEP2025RameshPUNEpuneEQ009EQ009-Pump
10EQ010TurbineDelhiAjay1300490002265188940002OCT2025AjayDELHIdelhiEQ010EQ010-Turbine
11EQ011PumpHyderabadRavi1000255001180238725017NOV2025RaviHYDERABADhyderabadEQ011EQ011-Pump
12EQ012MotorChennaiKiran1400395002072198445009DEC2025KiranCHENNAIchennaiEQ012EQ012-Motor
13EQ013CompressorMumbaiSuresh1700821000603589990001JAN2026SureshMUMBAImumbaiEQ013EQ013-Compressor
14EQ014PumpPuneRamesh85024500982248623014FEB2026RameshPUNEpuneEQ014EQ014-Pump
15EQ015TurbineDelhiAjay60013200690288318005MAR2026AjayDELHIdelhiEQ015EQ015-Turbine

6. Numeric Functions

data Num;

set equipment_raw;

Cost_per_Hour = Maintenance_Cost / Usage_Hours;

Rounded_Cost = round(Maintenance_Cost);

Log_Cost = log(Maintenance_Cost);

run;

proc print data=Num;

run;

OUTPUT:

ObsEquipment_IDEquipment_TypeLocationTechnicianUsage_HoursFailure_CountMaintenance_CostDowntime_HoursCondition_IndexRemaining_LifePercentageFeesMaintenance_DateCost_per_HourRounded_CostLog_Cost
1EQ001PumpHyderabadRavi1200250001085249025012JAN20254.166750008.51719
2EQ002MotorChennaiKiran9005120003060188550014FEB202513.3333120009.39266
3EQ003CompressorMumbaiSuresh1500380001575208835021MAR20255.333380008.98720
4EQ004PumpPuneRamesh20007200004540109580011APR202510.0000200009.90349
5EQ005TurbineDelhiAjay500130005923010120022MAY20256.000030008.00637
6EQ006PumpHyderabadRavi7002-500888268515015JUN2025-0.7143-500.
7EQ007MotorChennaiKiran110002500095358010018JUL20252.272725007.82405
8EQ008CompressorMumbaiSuresh16006150004055159265025AUG20259.3750150009.61581
9EQ009PumpPuneRamesh.360001270208830019SEP2025.60008.69951
10EQ010TurbineDelhiAjay1300490002265188940002OCT20256.923190009.10498
11EQ011PumpHyderabadRavi1000255001180238725017NOV20255.500055008.61250
12EQ012MotorChennaiKiran1400395002072198445009DEC20256.785795009.15905
13EQ013CompressorMumbaiSuresh1700821000603589990001JAN202612.3529210009.95228
14EQ014PumpPuneRamesh85024500982248623014FEB20265.294145008.41183
15EQ015TurbineDelhiAjay60013200690288318005MAR20265.333332008.07091

7. PROC TRANSPOSE

proc transpose data=equipment_raw out=maintenance_transpose;

var Usage_Hours Maintenance_Cost Downtime_Hours;

run;

proc print data=maintenance_transpose;

run;

OUTPUT:

Obs_NAME_COL1COL2COL3COL4COL5COL6COL7COL8COL9COL10COL11COL12COL13COL14COL15
1Usage_Hours12009001500200050070011001600.1300100014001700850600
2Maintenance_Cost5000120008000200003000-50025001500060009000550095002100045003200
3Downtime_Hours1030154558040122211206096

8. Dataset Append

proc append base=equipment_raw

            data=utilization_class force;

run;

proc print data=equipment_raw;

run;

OUTPUT:

ObsEquipment_IDEquipment_TypeLocationTechnicianUsage_HoursFailure_CountMaintenance_CostDowntime_HoursCondition_IndexRemaining_LifePercentageFeesMaintenance_Date
1EQ001PumpHyderabadRavi1200250001085249025012JAN2025
2EQ002MotorChennaiKiran9005120003060188550014FEB2025
3EQ003CompressorMumbaiSuresh1500380001575208835021MAR2025
4EQ004PumpPuneRamesh20007200004540109580011APR2025
5EQ005TurbineDelhiAjay500130005923010120022MAY2025
6EQ006PumpHyderabadRavi7002-500888268515015JUN2025
7EQ007MotorChennaiKiran110002500095358010018JUL2025
8EQ008CompressorMumbaiSuresh16006150004055159265025AUG2025
9EQ009PumpPuneRamesh.360001270208830019SEP2025
10EQ010TurbineDelhiAjay1300490002265188940002OCT2025
11EQ011PumpHyderabadRavi1000255001180238725017NOV2025
12EQ012MotorChennaiKiran1400395002072198445009DEC2025
13EQ013CompressorMumbaiSuresh1700821000603589990001JAN2026
14EQ014PumpPuneRamesh85024500982248623014FEB2026
15EQ015TurbineDelhiAjay60013200690288318005MAR2026
16EQ001PumpHyderabadRavi1200250001085249025012JAN2025
17EQ002MotorChennaiKiran9005120003060188550014FEB2025
18EQ003CompressorMumbaiSuresh1500380001575208835021MAR2025
19EQ004PumpPuneRamesh20007200004540109580011APR2025
20EQ005TurbineDelhiAjay500130005923010120022MAY2025
21EQ006PumpHyderabadRavi7002-500888268515015JUN2025
22EQ007MotorChennaiKiran110002500095358010018JUL2025
23EQ008CompressorMumbaiSuresh16006150004055159265025AUG2025
24EQ009PumpPuneRamesh.360001270208830019SEP2025
25EQ010TurbineDelhiAjay1300490002265188940002OCT2025
26EQ011PumpHyderabadRavi1000255001180238725017NOV2025
27EQ012MotorChennaiKiran1400395002072198445009DEC2025
28EQ013CompressorMumbaiSuresh1700821000603589990001JAN2026
29EQ014PumpPuneRamesh85024500982248623014FEB2026
30EQ015TurbineDelhiAjay60013200690288318005MAR2026

9. Dataset Merge

proc sort data=equipment_raw;by Equipment_ID;run;

proc print data=equipment_raw;

run;

OUTPUT:

ObsEquipment_IDEquipment_TypeLocationTechnicianUsage_HoursFailure_CountMaintenance_CostDowntime_HoursCondition_IndexRemaining_LifePercentageFeesMaintenance_Date
1EQ001PumpHyderabadRavi1200250001085249025012JAN2025
2EQ001PumpHyderabadRavi1200250001085249025012JAN2025
3EQ002MotorChennaiKiran9005120003060188550014FEB2025
4EQ002MotorChennaiKiran9005120003060188550014FEB2025
5EQ003CompressorMumbaiSuresh1500380001575208835021MAR2025
6EQ003CompressorMumbaiSuresh1500380001575208835021MAR2025
7EQ004PumpPuneRamesh20007200004540109580011APR2025
8EQ004PumpPuneRamesh20007200004540109580011APR2025
9EQ005TurbineDelhiAjay500130005923010120022MAY2025
10EQ005TurbineDelhiAjay500130005923010120022MAY2025
11EQ006PumpHyderabadRavi7002-500888268515015JUN2025
12EQ006PumpHyderabadRavi7002-500888268515015JUN2025
13EQ007MotorChennaiKiran110002500095358010018JUL2025
14EQ007MotorChennaiKiran110002500095358010018JUL2025
15EQ008CompressorMumbaiSuresh16006150004055159265025AUG2025
16EQ008CompressorMumbaiSuresh16006150004055159265025AUG2025
17EQ009PumpPuneRamesh.360001270208830019SEP2025
18EQ009PumpPuneRamesh.360001270208830019SEP2025
19EQ010TurbineDelhiAjay1300490002265188940002OCT2025
20EQ010TurbineDelhiAjay1300490002265188940002OCT2025
21EQ011PumpHyderabadRavi1000255001180238725017NOV2025
22EQ011PumpHyderabadRavi1000255001180238725017NOV2025
23EQ012MotorChennaiKiran1400395002072198445009DEC2025
24EQ012MotorChennaiKiran1400395002072198445009DEC2025
25EQ013CompressorMumbaiSuresh1700821000603589990001JAN2026
26EQ013CompressorMumbaiSuresh1700821000603589990001JAN2026
27EQ014PumpPuneRamesh85024500982248623014FEB2026
28EQ014PumpPuneRamesh85024500982248623014FEB2026
29EQ015TurbineDelhiAjay60013200690288318005MAR2026
30EQ015TurbineDelhiAjay60013200690288318005MAR2026

proc sort data=fraud_flag;by Equipment_ID;run;

proc print data=fraud_flag;

run;

OUTPUT:

ObsEquipment_IDEquipment_TypeLocationTechnicianUsage_HoursFailure_CountMaintenance_CostDowntime_HoursCondition_IndexRemaining_LifePercentageFeesMaintenance_DateFraud_Flag
1EQ001PumpHyderabadRavi1200250001085249025012JAN2025NO
2EQ002MotorChennaiKiran9005120003060188550014FEB2025NO
3EQ003CompressorMumbaiSuresh1500380001575208835021MAR2025NO
4EQ004PumpPuneRamesh20007200004540109580011APR2025NO
5EQ005TurbineDelhiAjay500130005923010120022MAY2025NO
6EQ006PumpHyderabadRavi7002-500888268515015JUN2025NO
7EQ007MotorChennaiKiran110002500095358010018JUL2025NO
8EQ008CompressorMumbaiSuresh16006150004055159265025AUG2025NO
9EQ009PumpPuneRamesh.360001270208830019SEP2025NO
10EQ010TurbineDelhiAjay1300490002265188940002OCT2025NO
11EQ011PumpHyderabadRavi1000255001180238725017NOV2025NO
12EQ012MotorChennaiKiran1400395002072198445009DEC2025NO
13EQ013CompressorMumbaiSuresh1700821000603589990001JAN2026YES
14EQ014PumpPuneRamesh85024500982248623014FEB2026NO
15EQ015TurbineDelhiAjay60013200690288318005MAR2026NO

data final_maintenance;

merge equipment_raw

      fraud_flag;

by Equipment_ID;

run;

proc print data=final_maintenance;

run;

OUTPUT:

ObsEquipment_IDEquipment_TypeLocationTechnicianUsage_HoursFailure_CountMaintenance_CostDowntime_HoursCondition_IndexRemaining_LifePercentageFeesMaintenance_DateFraud_Flag
1EQ001PumpHyderabadRavi1200250001085249025012JAN2025NO
2EQ001PumpHyderabadRavi1200250001085249025012JAN2025NO
3EQ002MotorChennaiKiran9005120003060188550014FEB2025NO
4EQ002MotorChennaiKiran9005120003060188550014FEB2025NO
5EQ003CompressorMumbaiSuresh1500380001575208835021MAR2025NO
6EQ003CompressorMumbaiSuresh1500380001575208835021MAR2025NO
7EQ004PumpPuneRamesh20007200004540109580011APR2025NO
8EQ004PumpPuneRamesh20007200004540109580011APR2025NO
9EQ005TurbineDelhiAjay500130005923010120022MAY2025NO
10EQ005TurbineDelhiAjay500130005923010120022MAY2025NO
11EQ006PumpHyderabadRavi7002-500888268515015JUN2025NO
12EQ006PumpHyderabadRavi7002-500888268515015JUN2025NO
13EQ007MotorChennaiKiran110002500095358010018JUL2025NO
14EQ007MotorChennaiKiran110002500095358010018JUL2025NO
15EQ008CompressorMumbaiSuresh16006150004055159265025AUG2025NO
16EQ008CompressorMumbaiSuresh16006150004055159265025AUG2025NO
17EQ009PumpPuneRamesh.360001270208830019SEP2025NO
18EQ009PumpPuneRamesh.360001270208830019SEP2025NO
19EQ010TurbineDelhiAjay1300490002265188940002OCT2025NO
20EQ010TurbineDelhiAjay1300490002265188940002OCT2025NO
21EQ011PumpHyderabadRavi1000255001180238725017NOV2025NO
22EQ011PumpHyderabadRavi1000255001180238725017NOV2025NO
23EQ012MotorChennaiKiran1400395002072198445009DEC2025NO
24EQ012MotorChennaiKiran1400395002072198445009DEC2025NO
25EQ013CompressorMumbaiSuresh1700821000603589990001JAN2026YES
26EQ013CompressorMumbaiSuresh1700821000603589990001JAN2026YES
27EQ014PumpPuneRamesh85024500982248623014FEB2026NO
28EQ014PumpPuneRamesh85024500982248623014FEB2026NO
29EQ015TurbineDelhiAjay60013200690288318005MAR2026NO
30EQ015TurbineDelhiAjay60013200690288318005MAR2026NO

10. PROC DATASETS Delete

proc datasets library=work;

delete maintenance_transpose;

quit;

LOG:

NOTE: Deleting WORK.MAINTENANCE_TRANSPOSE (memtype=DATA).

11. Corrected Dataset Code (Full Program)

data equipment_clean;

set equipment_raw;

if Percentage>100 then Percentage=100;

if Maintenance_Cost<0 then Maintenance_Cost=abs(Maintenance_Cost);

if Usage_Hours=. then Usage_Hours=1000;

if Maintenance_Cost>20000 then Fraud_Flag="YES";

else Fraud_Flag="NO";

Cost_per_Hour = Maintenance_Cost / Usage_Hours;

Next_Service_Date = intnx('year',Maintenance_Date,1,'same');

format Next_Service_Date date9.;

run;

proc print data=equipment_clean;

run;

OUTPUT:
ObsEquipment_IDEquipment_TypeLocationTechnicianUsage_HoursFailure_CountMaintenance_CostDowntime_HoursCondition_IndexRemaining_LifePercentageFeesMaintenance_DateFraud_FlagCost_per_HourNext_Service_Date
1EQ001PumpHyderabadRavi1200250001085249025012JAN2025NO4.166712JAN2026
2EQ001PumpHyderabadRavi1200250001085249025012JAN2025NO4.166712JAN2026
3EQ002MotorChennaiKiran9005120003060188550014FEB2025NO13.333314FEB2026
4EQ002MotorChennaiKiran9005120003060188550014FEB2025NO13.333314FEB2026
5EQ003CompressorMumbaiSuresh1500380001575208835021MAR2025NO5.333321MAR2026
6EQ003CompressorMumbaiSuresh1500380001575208835021MAR2025NO5.333321MAR2026
7EQ004PumpPuneRamesh20007200004540109580011APR2025NO10.000011APR2026
8EQ004PumpPuneRamesh20007200004540109580011APR2025NO10.000011APR2026
9EQ005TurbineDelhiAjay500130005923010020022MAY2025NO6.000022MAY2026
10EQ005TurbineDelhiAjay500130005923010020022MAY2025NO6.000022MAY2026
11EQ006PumpHyderabadRavi7002500888268515015JUN2025NO0.714315JUN2026
12EQ006PumpHyderabadRavi7002500888268515015JUN2025NO0.714315JUN2026
13EQ007MotorChennaiKiran110002500095358010018JUL2025NO2.272718JUL2026
14EQ007MotorChennaiKiran110002500095358010018JUL2025NO2.272718JUL2026
15EQ008CompressorMumbaiSuresh16006150004055159265025AUG2025NO9.375025AUG2026
16EQ008CompressorMumbaiSuresh16006150004055159265025AUG2025NO9.375025AUG2026
17EQ009PumpPuneRamesh1000360001270208830019SEP2025NO6.000019SEP2026
18EQ009PumpPuneRamesh1000360001270208830019SEP2025NO6.000019SEP2026
19EQ010TurbineDelhiAjay1300490002265188940002OCT2025NO6.923102OCT2026
20EQ010TurbineDelhiAjay1300490002265188940002OCT2025NO6.923102OCT2026
21EQ011PumpHyderabadRavi1000255001180238725017NOV2025NO5.500017NOV2026
22EQ011PumpHyderabadRavi1000255001180238725017NOV2025NO5.500017NOV2026
23EQ012MotorChennaiKiran1400395002072198445009DEC2025NO6.785709DEC2026
24EQ012MotorChennaiKiran1400395002072198445009DEC2025NO6.785709DEC2026
25EQ013CompressorMumbaiSuresh1700821000603589990001JAN2026YES12.352901JAN2027
26EQ013CompressorMumbaiSuresh1700821000603589990001JAN2026YES12.352901JAN2027
27EQ014PumpPuneRamesh85024500982248623014FEB2026NO5.294114FEB2027
28EQ014PumpPuneRamesh85024500982248623014FEB2026NO5.294114FEB2027
29EQ015TurbineDelhiAjay60013200690288318005MAR2026NO5.333305MAR2027
30EQ015TurbineDelhiAjay60013200690288318005MAR2026NO5.333305MAR2027

20 Key Points About This Project

1.     The project begins by designing a realistic industrial equipment maintenance dataset.

2.     It includes operational variables such as Usage Hours, Failure Count, and Downtime.

3.     Financial metrics such as Maintenance Cost and Service Fees are incorporated.

4.     Equipment health is represented using Condition Index and Remaining Life.

5.     The dataset intentionally introduces multiple data quality errors.

6.     Examples include negative maintenance cost and missing usage hours.

7.     Invalid percentages greater than 100% were deliberately inserted.

8.     SAS DATA step programming was used to build the dataset.

9.     Character cleaning functions such as STRIP and TRIM were applied.

10.  Formatting improvements were performed using PROPCASE, UPCASE, and LOWCASE.

11.  Numeric functions were used to derive metrics like cost per hour.

12.  Date calculations were performed using MDY, INTCK, and INTNX.

13.  A fraud detection macro was implemented to identify abnormal maintenance costs.

14.  Equipment utilization was categorized as LOW, MEDIUM, or HIGH.

15.  PROC TRANSPOSE was used to restructure maintenance variables.

16.  Multiple datasets were combined using SET, MERGE, and APPEND.

17.  Incorrect values were corrected using conditional logic in DATA steps.

18.  Temporary datasets were cleaned using PROC DATASETS DELETE.

19.  The final dataset supports maintenance analytics and fraud detection.

20.  The completed project demonstrates advanced SAS data engineering techniques used in real industrial analytics environments.

Project Summary

This project demonstrates how SAS programming can be used to manage and analyze industrial equipment maintenance data while detecting and correcting data quality issues. A simulated dataset was created containing variables such as Equipment_ID, Usage_Hours, Failure_Count, Maintenance_Cost, Downtime_Hours, Condition_Index, Remaining_Life, Percentage, Fees, and Maintenance_Date. Several intentional errors were introduced into the dataset, including negative maintenance costs, missing usage hours, invalid percentages greater than 100, and suspiciously high maintenance expenses. These errors represent common real-world data issues encountered in industrial maintenance analytics.

Using SAS DATA step programming, macros, character functions, numeric functions, and date functions such as MDY, INTCK, and INTNX, the dataset was cleaned and transformed. Fraud detection logic was implemented to flag abnormal maintenance costs, and utilization classification was created to categorize equipment usage levels. Additional dataset manipulation techniques such as SET, MERGE, APPEND, and PROC TRANSPOSE were applied. The final cleaned dataset provides reliable maintenance insights for operational decision-making and predictive equipment management.

Conclusion

This project demonstrates a complete industrial equipment maintenance analytics workflow using SAS programming. The program begins with the creation of a raw dataset that intentionally contains multiple data errors, such as missing values, invalid percentages, negative maintenance costs, and suspiciously high repair expenses.

Through the use of SAS DATA steps, macros, dataset manipulation techniques, and built-in functions, the program systematically detects these issues, flags fraudulent transactions, and corrects invalid data.

Key SAS techniques include:

·  Dataset creation using DATA step

·  Fraud detection using macros

·  Data cleaning and validation

·  Date calculations using MDY, INTCK, and INTNX

·  Character manipulation using STRIP, TRIM, CATX, PROPCASE

·  Numeric analysis using mathematical functions

·  Dataset transformation using PROC TRANSPOSE

·  Dataset merging and appending

·  Dataset management using PROC DATASETS

The final cleaned dataset provides reliable information for industrial maintenance decision-making, enabling organizations to reduce downtime, detect fraudulent maintenance activities, and improve equipment life cycle management.

SAS INTERVIEW QUESTIONS

1. Explain ARRAY processing in SAS?

An ARRAY in SAS is used to process multiple variables together using a single loop instead of writing repetitive code.

Example use:

·  Apply calculations to many variables

·  Data cleaning across multiple columns

Example:

array scores score1-score5;

do i=1 to 5;

   scores{i}=scores{i}+10;

end;

Short: ARRAY allows efficient processing of multiple variables using loops.

 2. What is the purpose of DO UNTIL and DO WHILE?

Both are looping statements used to repeat a block of code.

Statement

Condition Check

DO WHILE

Condition checked before execution

DO UNTIL

Condition checked after execution

Short:

 ·  DO WHILE runs while condition is true.

 ·  DO UNTIL runs until condition becomes true.

3. How do you remove duplicates in SAS?

Duplicates are usually removed using PROC SORT with NODUPKEY.

Example:

proc sort data=dataset nodupkey;

by id;

run;

Short: NODUPKEY removes observations with duplicate values of the BY variable.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent EQUIPMENT MAINTENANCE data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

·  Clinical SAS Programmer

·  Research Data Analyst

·  Regulatory Data Validator

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.Which Country Truly Dominates the Olympics? – A Complete SAS Medal Efficiency Analytics Project

2.Which Airports Are Really the Busiest? – An End-to-End SAS Airport Traffic Analytics Project

3.Can Data Predict Election Outcomes? – A Complete SAS Voting Analytics Project

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy


Comments

Popular posts from this blog

458.Data Cleaning Secrets Using Famous Food Dataset:Handling Duplicate Records in SAS

453.Global AI Trends Unlocked Through SCAN and SUBSTR Precision in SAS

441.Fixing Negative Data Errors Like A Pro Using SAS ABS Function