404.What happens if maintenance cost is accidentally stored as character data, and how does SAS expose this error?
What happens if maintenance cost is accidentally stored as character data, and how does SAS expose this error?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HERE WE USED THESE SAS STATEMENTS AND
FUNCTIONS FOR THIS PROJECT:
DATA STEP | SET | INPUT | DATALINES | FORMAT | KEEP |
IF-THEN-ELSE | LENGTH | PUT | PROC CONTENTS | PROC DATASETS | PROC APPEND |
PROC SQL | PROC MEANS | PROC UNIVARIATE | PROC FREQ | PROC CORR | PROC
TRANSPOSE | PROC SGPLOT | MACRO | %MACRO / %MEND | INTCK | INTNX | MDY | CAT |
CATX | STRIP | TRIM | UPCASE | LOWCASE | PROPCASE | COALESCE
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TABLE OF CONTENTS
1.
Introduction
2.
Business Context
3.
Project Objectives
4.
Dataset Design & Variables
5.
Raw Dataset Creation (with Intentional Error)
6.
Identifying & Explaining the Error
7.
Corrected Dataset – Full SAS Code
8.
Data Preparation & Formatting
9.
Date Handling (MDY, INTCK, INTNX)
10. Character
& Numeric Functions Usage
11. Reliability
& Utilization Classification (Macros)
12. Fraud
Detection Logic (Macros)
13. Descriptive
Analytics (PROC MEANS, UNIVARIATE)
14. Frequency
Analysis (PROC FREQ)
15. Correlation
Analysis (PROC CORR)
16. SQL-Based
Analytics
17. Data
Reshaping (SET, MERGE, APPEND, TRANSPOSE)
18. Visualization
(PROC SGPLOT)
19. Dataset
Cleanup (PROC DATASETS DELETE)
20. Business
Insights
21. 21
Key Points About This Project
22. Conclusion
1.INTRODUCTION
Lighthouses are critical maritime safety infrastructure.
They guide ships, prevent accidents, and ensure smooth navigation during
storms, fog, and night operations.
In this project, we treat lighthouses like operational assets, similar to data centers, telecom towers, or hospitals.
We analyze:
·
Reliability
·
Maintenance costs
·
Failure incidents
·
Power usage
·
Fraud or abnormal behavior
·
Operational efficiency
2. BUSINESS CONTEXT
Maritime authorities face challenges like:
·
High maintenance expenses
·
Power failures
·
Equipment aging
·
False maintenance claims
·
Inconsistent reliability
Using SAS, we can:
·
Detect risky lighthouses
·
Identify fraud-prone maintenance records
·
Optimize maintenance planning
·
Improve safety compliance
3. PROJECT OBJECTIVES
1. Create a
lighthouse dataset (15+ observations)
2. Introduce intentional data error
3. Detect and
fix the error
4. Perform full analytics using SAS
5. Use macros, SQL, dates, functions
6. Generate visual insights
7. Prepare interview-grade explanations
4. DATASET DESIGN
Variables Used
|
Variable |
Description |
|
Lighthouse_ID |
Unique ID |
|
Lighthouse_Name |
Name |
|
Coastline |
East / West / South |
|
Visibility_Range |
Nautical miles |
|
Maintenance_Cost |
Annual cost |
|
Failure_Incidents |
Yearly failures |
|
Power_Source |
Solar / Electric / Hybrid |
|
Installation_Date |
Commission date |
|
Last_Service_Date |
Last maintenance |
|
Reliability_Index |
Calculated percentage |
|
Utilization_Class |
High / Medium / Low |
|
Fraud_Flag |
Yes / No |
5. RAW DATASET (WITH INTENTIONAL ERROR)
Intentional Error:
Maintenance_Cost entered as character ("$45000") instead of numeric
data lighthouse_raw;
input Lighthouse_ID Lighthouse_Name:$25. Coastline:$10.
Visibility_Range Maintenance_Cost $ Failure_Incidents
Power_Source:$10. Installation_Date : date9.
Last_Service_Date : date9.;
format Installation_Date Last_Service_Date date9.;
datalines;
101 Alpha_Point East 25 $45000 2 Solar 15JAN2015 12JAN2025
102 Beacon_Rock West 30 $52000 1 Electric 10FEB2016 05JAN2025
103 Coral_Light South 18 $38000 4 Hybrid 22MAR2014 20DEC2024
;
run;
proc print data=lighthouse_raw;
run;
OUTPUT:
| Obs | Lighthouse_ID | Lighthouse_Name | Coastline | Visibility_Range | Maintenance_Cost | Failure_Incidents | Power_Source | Installation_Date | Last_Service_Date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 101 | Alpha_Point | East | 25 | $45000 | 2 | Solar | 15JAN2015 | 12JAN2025 |
| 2 | 102 | Beacon_Rock | West | 30 | $52000 | 1 | Electric | 10FEB2016 | 05JAN2025 |
| 3 | 103 | Coral_Light | South | 18 | $38000 | 4 | Hybrid | 22MAR2014 | 20DEC2024 |
6.IDENTIFYING THE ERROR
·
Maintenance_Cost should be numeric
·
Stored as character
·
Causes:
·
PROC MEANS failure
·
Incorrect calculations
·
SQL aggregation errors
Detection
proc contents data=lighthouse_raw;
run;
OUTPUT:
The CONTENTS Procedure
| Data Set Name | WORK.LIGHTHOUSE_RAW | Observations | 3 |
|---|---|---|---|
| Member Type | DATA | Variables | 9 |
| Engine | V9 | Indexes | 0 |
| Created | 02/17/2026 20:00:36 | Observation Length | 96 |
| Last Modified | 02/17/2026 20:00:36 | Deleted Observations | 0 |
| Protection | Compressed | NO | |
| Data Set Type | Sorted | NO | |
| Label | |||
| Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 | ||
| Encoding | utf-8 Unicode (UTF-8) |
| Engine/Host Dependent Information | |
|---|---|
| Data Set Page Size | 131072 |
| Number of Data Set Pages | 1 |
| First Data Page | 1 |
| Max Obs per Page | 1363 |
| Obs in First Data Page | 3 |
| Number of Data Set Repairs | 0 |
| Filename | /saswork/SAS_workFFCC00019F1A_odaws02-apse1-2.oda.sas.com/SAS_work313F00019F1A_odaws02-apse1-2.oda.sas.com/lighthouse_raw.sas7bdat |
| Release Created | 9.0401M8 |
| Host Created | Linux |
| Inode Number | 67111648 |
| Access Permission | rw-r--r-- |
| Owner Name | u63247146 |
| File Size | 256KB |
| File Size (bytes) | 262144 |
| Alphabetic List of Variables and Attributes | ||||
|---|---|---|---|---|
| # | Variable | Type | Len | Format |
| 3 | Coastline | Char | 10 | |
| 6 | Failure_Incidents | Num | 8 | |
| 8 | Installation_Date | Num | 8 | DATE9. |
| 9 | Last_Service_Date | Num | 8 | DATE9. |
| 1 | Lighthouse_ID | Num | 8 | |
| 2 | Lighthouse_Name | Char | 25 | |
| 5 | Maintenance_Cost | Char | 8 | |
| 7 | Power_Source | Char | 10 | |
| 4 | Visibility_Range | Num | 8 | |
data lighthouse;
input Lighthouse_ID Lighthouse_Name:$25. Coastline:$10.
Visibility_Range Maintenance_Cost Failure_Incidents
Power_Source:$10. Installation_Date : date9.
Last_Service_Date : date9.;
format Installation_Date Last_Service_Date date9.;
datalines;
101 Alpha_Point East 25 45000 2 Solar 15JAN2015 12JAN2025
102 Beacon_Rock West 30 52000 1 Electric 10FEB2016 05JAN2025
103 Coral_Light South 18 38000 4 Hybrid 22MAR2014 20DEC2024
104 Delta_Beam East 22 41000 3 Solar 11APR2017 18JAN2025
105 Eagle_Eye West 35 61000 0 Electric 09MAY2018 01JAN2025
106 Falcon_Tower South 20 39500 2 Hybrid 15JUN2016 10DEC2024
107 Gulf_Guide East 28 47000 1 Solar 12JUL2019 02JAN2025
108 Harbor_Light West 32 58000 0 Electric 18AUG2020 05JAN2025
109 Island_Beam South 16 36000 5 Hybrid 21SEP2013 22DEC2024
110 Jetty_Light East 26 45500 2 Solar 10OCT2015 12JAN2025
111 Keystone West 34 60000 1 Electric 15NOV2017 05JAN2025
112 Lagoon_Light South 19 39000 3 Hybrid 01DEC2016 20DEC2024
113 Marina_Beam East 27 46500 1 Solar 25JAN2019 02JAN2025
114 Nautical_Point West 33 59000 0 Electric 11FEB2021 05JAN2025
115 Ocean_Guard South 21 40500 4 Hybrid 18MAR2014 18DEC2024
;
run;
proc print data=lighthouse;
run;
OUTPUT:
| Obs | Lighthouse_ID | Lighthouse_Name | Coastline | Visibility_Range | Maintenance_Cost | Failure_Incidents | Power_Source | Installation_Date | Last_Service_Date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 101 | Alpha_Point | East | 25 | 45000 | 2 | Solar | 15JAN2015 | 12JAN2025 |
| 2 | 102 | Beacon_Rock | West | 30 | 52000 | 1 | Electric | 10FEB2016 | 05JAN2025 |
| 3 | 103 | Coral_Light | South | 18 | 38000 | 4 | Hybrid | 22MAR2014 | 20DEC2024 |
| 4 | 104 | Delta_Beam | East | 22 | 41000 | 3 | Solar | 11APR2017 | 18JAN2025 |
| 5 | 105 | Eagle_Eye | West | 35 | 61000 | 0 | Electric | 09MAY2018 | 01JAN2025 |
| 6 | 106 | Falcon_Tower | South | 20 | 39500 | 2 | Hybrid | 15JUN2016 | 10DEC2024 |
| 7 | 107 | Gulf_Guide | East | 28 | 47000 | 1 | Solar | 12JUL2019 | 02JAN2025 |
| 8 | 108 | Harbor_Light | West | 32 | 58000 | 0 | Electric | 18AUG2020 | 05JAN2025 |
| 9 | 109 | Island_Beam | South | 16 | 36000 | 5 | Hybrid | 21SEP2013 | 22DEC2024 |
| 10 | 110 | Jetty_Light | East | 26 | 45500 | 2 | Solar | 10OCT2015 | 12JAN2025 |
| 11 | 111 | Keystone | West | 34 | 60000 | 1 | Electric | 15NOV2017 | 05JAN2025 |
| 12 | 112 | Lagoon_Light | South | 19 | 39000 | 3 | Hybrid | 01DEC2016 | 20DEC2024 |
| 13 | 113 | Marina_Beam | East | 27 | 46500 | 1 | Solar | 25JAN2019 | 02JAN2025 |
| 14 | 114 | Nautical_Point | West | 33 | 59000 | 0 | Electric | 11FEB2021 | 05JAN2025 |
| 15 | 115 | Ocean_Guard | South | 21 | 40500 | 4 | Hybrid | 18MAR2014 | 18DEC2024 |
8. RELIABILITY INDEX CALCULATION
data lighthouse_enriched;
set lighthouse;
Reliability_Index = (1 - (Failure_Incidents/10)) * 100;
run;
proc print data=lighthouse_enriched;
run;
OUTPUT:
| Obs | Lighthouse_ID | Lighthouse_Name | Coastline | Visibility_Range | Maintenance_Cost | Failure_Incidents | Power_Source | Installation_Date | Last_Service_Date | Reliability_Index |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 101 | Alpha_Point | East | 25 | 45000 | 2 | Solar | 15JAN2015 | 12JAN2025 | 80 |
| 2 | 102 | Beacon_Rock | West | 30 | 52000 | 1 | Electric | 10FEB2016 | 05JAN2025 | 90 |
| 3 | 103 | Coral_Light | South | 18 | 38000 | 4 | Hybrid | 22MAR2014 | 20DEC2024 | 60 |
| 4 | 104 | Delta_Beam | East | 22 | 41000 | 3 | Solar | 11APR2017 | 18JAN2025 | 70 |
| 5 | 105 | Eagle_Eye | West | 35 | 61000 | 0 | Electric | 09MAY2018 | 01JAN2025 | 100 |
| 6 | 106 | Falcon_Tower | South | 20 | 39500 | 2 | Hybrid | 15JUN2016 | 10DEC2024 | 80 |
| 7 | 107 | Gulf_Guide | East | 28 | 47000 | 1 | Solar | 12JUL2019 | 02JAN2025 | 90 |
| 8 | 108 | Harbor_Light | West | 32 | 58000 | 0 | Electric | 18AUG2020 | 05JAN2025 | 100 |
| 9 | 109 | Island_Beam | South | 16 | 36000 | 5 | Hybrid | 21SEP2013 | 22DEC2024 | 50 |
| 10 | 110 | Jetty_Light | East | 26 | 45500 | 2 | Solar | 10OCT2015 | 12JAN2025 | 80 |
| 11 | 111 | Keystone | West | 34 | 60000 | 1 | Electric | 15NOV2017 | 05JAN2025 | 90 |
| 12 | 112 | Lagoon_Light | South | 19 | 39000 | 3 | Hybrid | 01DEC2016 | 20DEC2024 | 70 |
| 13 | 113 | Marina_Beam | East | 27 | 46500 | 1 | Solar | 25JAN2019 | 02JAN2025 | 90 |
| 14 | 114 | Nautical_Point | West | 33 | 59000 | 0 | Electric | 11FEB2021 | 05JAN2025 | 100 |
| 15 | 115 | Ocean_Guard | South | 21 | 40500 | 4 | Hybrid | 18MAR2014 | 18DEC2024 | 60 |
· Converts failures into percentage
reliability
· Standard KPI used in asset management
9. DATE FUNCTIONS (MDY, INTCK, INTNX)
data lighthouse_dates;
set lighthouse_enriched;
Next_Service_Date = intnx('month', Last_Service_Date, 6);
Years_In_Service = intck('year', Installation_Date, today());
format Next_Service_Date date9.;
run;
proc print data=lighthouse_dates;
var Lighthouse_ID Lighthouse_Name Coastline Next_Service_Date Years_In_Service
Installation_Date;
run;
OUTPUT:
| Obs | Lighthouse_ID | Lighthouse_Name | Coastline | Next_Service_Date | Years_In_Service | Installation_Date |
|---|---|---|---|---|---|---|
| 1 | 101 | Alpha_Point | East | 01JUL2025 | 11 | 15JAN2015 |
| 2 | 102 | Beacon_Rock | West | 01JUL2025 | 10 | 10FEB2016 |
| 3 | 103 | Coral_Light | South | 01JUN2025 | 12 | 22MAR2014 |
| 4 | 104 | Delta_Beam | East | 01JUL2025 | 9 | 11APR2017 |
| 5 | 105 | Eagle_Eye | West | 01JUL2025 | 8 | 09MAY2018 |
| 6 | 106 | Falcon_Tower | South | 01JUN2025 | 10 | 15JUN2016 |
| 7 | 107 | Gulf_Guide | East | 01JUL2025 | 7 | 12JUL2019 |
| 8 | 108 | Harbor_Light | West | 01JUL2025 | 6 | 18AUG2020 |
| 9 | 109 | Island_Beam | South | 01JUN2025 | 13 | 21SEP2013 |
| 10 | 110 | Jetty_Light | East | 01JUL2025 | 11 | 10OCT2015 |
| 11 | 111 | Keystone | West | 01JUL2025 | 9 | 15NOV2017 |
| 12 | 112 | Lagoon_Light | South | 01JUN2025 | 10 | 01DEC2016 |
| 13 | 113 | Marina_Beam | East | 01JUL2025 | 7 | 25JAN2019 |
| 14 | 114 | Nautical_Point | West | 01JUL2025 | 5 | 11FEB2021 |
| 15 | 115 | Ocean_Guard | South | 01JUN2025 | 12 | 18MAR2014 |
10. CHARACTER & NUMERIC FUNCTIONS
data lighthouse_clean;
set lighthouse_dates;
Lighthouse_Name = propcase(strip(Lighthouse_Name));
Coastline = upcase(Coastline);
Power_Source = lowcase(Power_Source);
Full_Name = catx(' - ', Lighthouse_Name, Coastline);
Cost_Filled = coalesce(Maintenance_Cost,0);
run;
proc print data=lighthouse_clean;
var Lighthouse_ID Lighthouse_Name Coastline Power_Source Full_Name Cost_Filled
Maintenance_Cost;
run;
OUTPUT:
| Obs | Lighthouse_ID | Lighthouse_Name | Coastline | Power_Source | Full_Name | Cost_Filled | Maintenance_Cost |
|---|---|---|---|---|---|---|---|
| 1 | 101 | Alpha_point | EAST | solar | Alpha_point - EAST | 45000 | 45000 |
| 2 | 102 | Beacon_rock | WEST | electric | Beacon_rock - WEST | 52000 | 52000 |
| 3 | 103 | Coral_light | SOUTH | hybrid | Coral_light - SOUTH | 38000 | 38000 |
| 4 | 104 | Delta_beam | EAST | solar | Delta_beam - EAST | 41000 | 41000 |
| 5 | 105 | Eagle_eye | WEST | electric | Eagle_eye - WEST | 61000 | 61000 |
| 6 | 106 | Falcon_tower | SOUTH | hybrid | Falcon_tower - SOUTH | 39500 | 39500 |
| 7 | 107 | Gulf_guide | EAST | solar | Gulf_guide - EAST | 47000 | 47000 |
| 8 | 108 | Harbor_light | WEST | electric | Harbor_light - WEST | 58000 | 58000 |
| 9 | 109 | Island_beam | SOUTH | hybrid | Island_beam - SOUTH | 36000 | 36000 |
| 10 | 110 | Jetty_light | EAST | solar | Jetty_light - EAST | 45500 | 45500 |
| 11 | 111 | Keystone | WEST | electric | Keystone - WEST | 60000 | 60000 |
| 12 | 112 | Lagoon_light | SOUTH | hybrid | Lagoon_light - SOUTH | 39000 | 39000 |
| 13 | 113 | Marina_beam | EAST | solar | Marina_beam - EAST | 46500 | 46500 |
| 14 | 114 | Nautical_point | WEST | electric | Nautical_point - WEST | 59000 | 59000 |
| 15 | 115 | Ocean_guard | SOUTH | hybrid | Ocean_guard - SOUTH | 40500 | 40500 |
11. UTILIZATION CLASSIFICATION (MACRO)
%macro utilization;
data lighthouse_util;
set lighthouse_clean;
length Utilization_Class $8.;
if Reliability_Index >= 90 then Utilization_Class='HIGH';
else if Reliability_Index >= 75 then Utilization_Class='MEDIUM';
else Utilization_Class='LOW';
run;
proc print data=lighthouse_util;
run;
%mend;
%utilization;
OUTPUT:
| Obs | Lighthouse_ID | Lighthouse_Name | Coastline | Visibility_Range | Maintenance_Cost | Failure_Incidents | Power_Source | Installation_Date | Last_Service_Date | Reliability_Index | Next_Service_Date | Years_In_Service | Full_Name | Cost_Filled | Utilization_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 101 | Alpha_point | EAST | 25 | 45000 | 2 | solar | 15JAN2015 | 12JAN2025 | 80 | 01JUL2025 | 11 | Alpha_point - EAST | 45000 | MEDIUM |
| 2 | 102 | Beacon_rock | WEST | 30 | 52000 | 1 | electric | 10FEB2016 | 05JAN2025 | 90 | 01JUL2025 | 10 | Beacon_rock - WEST | 52000 | HIGH |
| 3 | 103 | Coral_light | SOUTH | 18 | 38000 | 4 | hybrid | 22MAR2014 | 20DEC2024 | 60 | 01JUN2025 | 12 | Coral_light - SOUTH | 38000 | LOW |
| 4 | 104 | Delta_beam | EAST | 22 | 41000 | 3 | solar | 11APR2017 | 18JAN2025 | 70 | 01JUL2025 | 9 | Delta_beam - EAST | 41000 | LOW |
| 5 | 105 | Eagle_eye | WEST | 35 | 61000 | 0 | electric | 09MAY2018 | 01JAN2025 | 100 | 01JUL2025 | 8 | Eagle_eye - WEST | 61000 | HIGH |
| 6 | 106 | Falcon_tower | SOUTH | 20 | 39500 | 2 | hybrid | 15JUN2016 | 10DEC2024 | 80 | 01JUN2025 | 10 | Falcon_tower - SOUTH | 39500 | MEDIUM |
| 7 | 107 | Gulf_guide | EAST | 28 | 47000 | 1 | solar | 12JUL2019 | 02JAN2025 | 90 | 01JUL2025 | 7 | Gulf_guide - EAST | 47000 | HIGH |
| 8 | 108 | Harbor_light | WEST | 32 | 58000 | 0 | electric | 18AUG2020 | 05JAN2025 | 100 | 01JUL2025 | 6 | Harbor_light - WEST | 58000 | HIGH |
| 9 | 109 | Island_beam | SOUTH | 16 | 36000 | 5 | hybrid | 21SEP2013 | 22DEC2024 | 50 | 01JUN2025 | 13 | Island_beam - SOUTH | 36000 | LOW |
| 10 | 110 | Jetty_light | EAST | 26 | 45500 | 2 | solar | 10OCT2015 | 12JAN2025 | 80 | 01JUL2025 | 11 | Jetty_light - EAST | 45500 | MEDIUM |
| 11 | 111 | Keystone | WEST | 34 | 60000 | 1 | electric | 15NOV2017 | 05JAN2025 | 90 | 01JUL2025 | 9 | Keystone - WEST | 60000 | HIGH |
| 12 | 112 | Lagoon_light | SOUTH | 19 | 39000 | 3 | hybrid | 01DEC2016 | 20DEC2024 | 70 | 01JUN2025 | 10 | Lagoon_light - SOUTH | 39000 | LOW |
| 13 | 113 | Marina_beam | EAST | 27 | 46500 | 1 | solar | 25JAN2019 | 02JAN2025 | 90 | 01JUL2025 | 7 | Marina_beam - EAST | 46500 | HIGH |
| 14 | 114 | Nautical_point | WEST | 33 | 59000 | 0 | electric | 11FEB2021 | 05JAN2025 | 100 | 01JUL2025 | 5 | Nautical_point - WEST | 59000 | HIGH |
| 15 | 115 | Ocean_guard | SOUTH | 21 | 40500 | 4 | hybrid | 18MAR2014 | 18DEC2024 | 60 | 01JUN2025 | 12 | Ocean_guard - SOUTH | 40500 | LOW |
12. FRAUD DETECTION LOGIC (MACRO)
%macro fraud_check;
data lighthouse_fraud;
set lighthouse_util;
if Maintenance_Cost > 55000 and Failure_Incidents < 2 then Fraud_Flag='YES';
else Fraud_Flag='NO';
run;
proc print data=lighthouse_fraud;
run;
%mend;
%fraud_check;
OUTPUT:
| Obs | Lighthouse_ID | Lighthouse_Name | Coastline | Visibility_Range | Maintenance_Cost | Failure_Incidents | Power_Source | Installation_Date | Last_Service_Date | Reliability_Index | Next_Service_Date | Years_In_Service | Full_Name | Cost_Filled | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 101 | Alpha_point | EAST | 25 | 45000 | 2 | solar | 15JAN2015 | 12JAN2025 | 80 | 01JUL2025 | 11 | Alpha_point - EAST | 45000 | MEDIUM | NO |
| 2 | 102 | Beacon_rock | WEST | 30 | 52000 | 1 | electric | 10FEB2016 | 05JAN2025 | 90 | 01JUL2025 | 10 | Beacon_rock - WEST | 52000 | HIGH | NO |
| 3 | 103 | Coral_light | SOUTH | 18 | 38000 | 4 | hybrid | 22MAR2014 | 20DEC2024 | 60 | 01JUN2025 | 12 | Coral_light - SOUTH | 38000 | LOW | NO |
| 4 | 104 | Delta_beam | EAST | 22 | 41000 | 3 | solar | 11APR2017 | 18JAN2025 | 70 | 01JUL2025 | 9 | Delta_beam - EAST | 41000 | LOW | NO |
| 5 | 105 | Eagle_eye | WEST | 35 | 61000 | 0 | electric | 09MAY2018 | 01JAN2025 | 100 | 01JUL2025 | 8 | Eagle_eye - WEST | 61000 | HIGH | YES |
| 6 | 106 | Falcon_tower | SOUTH | 20 | 39500 | 2 | hybrid | 15JUN2016 | 10DEC2024 | 80 | 01JUN2025 | 10 | Falcon_tower - SOUTH | 39500 | MEDIUM | NO |
| 7 | 107 | Gulf_guide | EAST | 28 | 47000 | 1 | solar | 12JUL2019 | 02JAN2025 | 90 | 01JUL2025 | 7 | Gulf_guide - EAST | 47000 | HIGH | NO |
| 8 | 108 | Harbor_light | WEST | 32 | 58000 | 0 | electric | 18AUG2020 | 05JAN2025 | 100 | 01JUL2025 | 6 | Harbor_light - WEST | 58000 | HIGH | YES |
| 9 | 109 | Island_beam | SOUTH | 16 | 36000 | 5 | hybrid | 21SEP2013 | 22DEC2024 | 50 | 01JUN2025 | 13 | Island_beam - SOUTH | 36000 | LOW | NO |
| 10 | 110 | Jetty_light | EAST | 26 | 45500 | 2 | solar | 10OCT2015 | 12JAN2025 | 80 | 01JUL2025 | 11 | Jetty_light - EAST | 45500 | MEDIUM | NO |
| 11 | 111 | Keystone | WEST | 34 | 60000 | 1 | electric | 15NOV2017 | 05JAN2025 | 90 | 01JUL2025 | 9 | Keystone - WEST | 60000 | HIGH | YES |
| 12 | 112 | Lagoon_light | SOUTH | 19 | 39000 | 3 | hybrid | 01DEC2016 | 20DEC2024 | 70 | 01JUN2025 | 10 | Lagoon_light - SOUTH | 39000 | LOW | NO |
| 13 | 113 | Marina_beam | EAST | 27 | 46500 | 1 | solar | 25JAN2019 | 02JAN2025 | 90 | 01JUL2025 | 7 | Marina_beam - EAST | 46500 | HIGH | NO |
| 14 | 114 | Nautical_point | WEST | 33 | 59000 | 0 | electric | 11FEB2021 | 05JAN2025 | 100 | 01JUL2025 | 5 | Nautical_point - WEST | 59000 | HIGH | YES |
| 15 | 115 | Ocean_guard | SOUTH | 21 | 40500 | 4 | hybrid | 18MAR2014 | 18DEC2024 | 60 | 01JUN2025 | 12 | Ocean_guard - SOUTH | 40500 | LOW | NO |
13. DESCRIPTIVE STATISTICS
proc means data=lighthouse_fraud mean min max;
var Maintenance_Cost Reliability_Index Failure_Incidents;
run;
OUTPUT:
The MEANS Procedure
| Variable | Mean | Minimum | Maximum |
|---|---|---|---|
Maintenance_Cost Reliability_Index Failure_Incidents | 47200.00 80.6666667 1.9333333 | 36000.00 50.0000000 0 | 61000.00 100.0000000 5.0000000 |
proc univariate data=lighthouse_fraud;
var Reliability_Index;
run;
OUTPUT:
The UNIVARIATE Procedure
Variable: Reliability_Index
| Moments | |||
|---|---|---|---|
| N | 15 | Sum Weights | 15 |
| Mean | 80.6666667 | Sum Observations | 1210 |
| Std Deviation | 15.7963227 | Variance | 249.52381 |
| Skewness | -0.5016516 | Kurtosis | -0.7252805 |
| Uncorrected SS | 101100 | Corrected SS | 3493.33333 |
| Coeff Variation | 19.5822182 | Std Error Mean | 4.07859297 |
| Basic Statistical Measures | |||
|---|---|---|---|
| Location | Variability | ||
| Mean | 80.66667 | Std Deviation | 15.79632 |
| Median | 80.00000 | Variance | 249.52381 |
| Mode | 90.00000 | Range | 50.00000 |
| Interquartile Range | 20.00000 | ||
| Tests for Location: Mu0=0 | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Student's t | t | 19.77806 | Pr > |t| | <.0001 |
| Sign | M | 7.5 | Pr >= |M| | <.0001 |
| Signed Rank | S | 60 | Pr >= |S| | <.0001 |
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile |
| 100% Max | 100 |
| 99% | 100 |
| 95% | 100 |
| 90% | 100 |
| 75% Q3 | 90 |
| 50% Median | 80 |
| 25% Q1 | 70 |
| 10% | 60 |
| 5% | 50 |
| 1% | 50 |
| 0% Min | 50 |
| Extreme Observations | |||
|---|---|---|---|
| Lowest | Highest | ||
| Value | Obs | Value | Obs |
| 50 | 9 | 90 | 11 |
| 60 | 15 | 90 | 13 |
| 60 | 3 | 100 | 5 |
| 70 | 12 | 100 | 8 |
| 70 | 4 | 100 | 14 |
14. FREQUENCY ANALYSIS
proc freq data=lighthouse_fraud;
tables Coastline Power_Source Utilization_Class Fraud_Flag;
run;
OUTPUT:
The FREQ Procedure
| Coastline | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| EAST | 5 | 33.33 | 5 | 33.33 |
| SOUTH | 5 | 33.33 | 10 | 66.67 |
| WEST | 5 | 33.33 | 15 | 100.00 |
| Power_Source | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| electric | 5 | 33.33 | 5 | 33.33 |
| hybrid | 5 | 33.33 | 10 | 66.67 |
| solar | 5 | 33.33 | 15 | 100.00 |
| Utilization_Class | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| HIGH | 7 | 46.67 | 7 | 46.67 |
| LOW | 5 | 33.33 | 12 | 80.00 |
| MEDIUM | 3 | 20.00 | 15 | 100.00 |
| Fraud_Flag | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| NO | 11 | 73.33 | 11 | 73.33 |
| YES | 4 | 26.67 | 15 | 100.00 |
15. CORRELATION
proc corr data=lighthouse_fraud;
var Maintenance_Cost Failure_Incidents Reliability_Index;
run;
OUTPUT:
The CORR Procedure
| 3 Variables: | Maintenance_Cost Failure_Incidents Reliability_Index |
|---|
| Simple Statistics | ||||||
|---|---|---|---|---|---|---|
| Variable | N | Mean | Std Dev | Sum | Minimum | Maximum |
| Maintenance_Cost | 15 | 47200 | 8707 | 708000 | 36000 | 61000 |
| Failure_Incidents | 15 | 1.93333 | 1.57963 | 29.00000 | 0 | 5.00000 |
| Reliability_Index | 15 | 80.66667 | 15.79632 | 1210 | 50.00000 | 100.00000 |
| Pearson Correlation Coefficients, N = 15 Prob > |r| under H0: Rho=0 | |||
|---|---|---|---|
| Maintenance_Cost | Failure_Incidents | Reliability_Index | |
| Maintenance_Cost | 1.00000 | -0.87922 <.0001 | 0.87922 <.0001 |
| Failure_Incidents | -0.87922 <.0001 | 1.00000 | -1.00000 <.0001 |
| Reliability_Index | 0.87922 <.0001 | -1.00000 <.0001 | 1.00000 |
16. SQL ANALYTICS
proc sql;
select Coastline,
avg(Maintenance_Cost) as Avg_Cost,
avg(Reliability_Index) as Avg_Reliability
from lighthouse_fraud
group by Coastline;
quit;
OUTPUT:
| Coastline | Avg_Cost | Avg_Reliability |
|---|---|---|
| EAST | 45000 | 82 |
| SOUTH | 38600 | 64 |
| WEST | 58000 | 96 |
17. APPEND, TRANSPOSE
WHAT IS WRONG WITH THIS CODE?
Why this is incorrect
|
Issue |
Explanation |
|
Same
BASE & DATA |
You are
appending a dataset into itself |
|
Redundant |
No new
observations are added |
|
Risky |
In some
cases, causes duplicate inflation |
|
Interview
red flag |
Shows
lack of data-flow understanding |
**PROC APPEND is meant to combine TWO DIFFERENT datasets
· Monthly → Yearly
· Current → Historical
· New batch → Master table
proc append base=lighthouse_fraud
data=lighthouse_fraud force;
run;
proc print data=lighthouse_fraud;
run;
OUTPUT:
| Obs | Lighthouse_ID | Lighthouse_Name | Coastline | Visibility_Range | Maintenance_Cost | Failure_Incidents | Power_Source | Installation_Date | Last_Service_Date | Reliability_Index | Next_Service_Date | Years_In_Service | Full_Name | Cost_Filled | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 101 | Alpha_point | EAST | 25 | 45000 | 2 | solar | 15JAN2015 | 12JAN2025 | 80 | 01JUL2025 | 11 | Alpha_point - EAST | 45000 | MEDIUM | NO |
| 2 | 102 | Beacon_rock | WEST | 30 | 52000 | 1 | electric | 10FEB2016 | 05JAN2025 | 90 | 01JUL2025 | 10 | Beacon_rock - WEST | 52000 | HIGH | NO |
| 3 | 103 | Coral_light | SOUTH | 18 | 38000 | 4 | hybrid | 22MAR2014 | 20DEC2024 | 60 | 01JUN2025 | 12 | Coral_light - SOUTH | 38000 | LOW | NO |
| 4 | 104 | Delta_beam | EAST | 22 | 41000 | 3 | solar | 11APR2017 | 18JAN2025 | 70 | 01JUL2025 | 9 | Delta_beam - EAST | 41000 | LOW | NO |
| 5 | 105 | Eagle_eye | WEST | 35 | 61000 | 0 | electric | 09MAY2018 | 01JAN2025 | 100 | 01JUL2025 | 8 | Eagle_eye - WEST | 61000 | HIGH | YES |
| 6 | 106 | Falcon_tower | SOUTH | 20 | 39500 | 2 | hybrid | 15JUN2016 | 10DEC2024 | 80 | 01JUN2025 | 10 | Falcon_tower - SOUTH | 39500 | MEDIUM | NO |
| 7 | 107 | Gulf_guide | EAST | 28 | 47000 | 1 | solar | 12JUL2019 | 02JAN2025 | 90 | 01JUL2025 | 7 | Gulf_guide - EAST | 47000 | HIGH | NO |
| 8 | 108 | Harbor_light | WEST | 32 | 58000 | 0 | electric | 18AUG2020 | 05JAN2025 | 100 | 01JUL2025 | 6 | Harbor_light - WEST | 58000 | HIGH | YES |
| 9 | 109 | Island_beam | SOUTH | 16 | 36000 | 5 | hybrid | 21SEP2013 | 22DEC2024 | 50 | 01JUN2025 | 13 | Island_beam - SOUTH | 36000 | LOW | NO |
| 10 | 110 | Jetty_light | EAST | 26 | 45500 | 2 | solar | 10OCT2015 | 12JAN2025 | 80 | 01JUL2025 | 11 | Jetty_light - EAST | 45500 | MEDIUM | NO |
| 11 | 111 | Keystone | WEST | 34 | 60000 | 1 | electric | 15NOV2017 | 05JAN2025 | 90 | 01JUL2025 | 9 | Keystone - WEST | 60000 | HIGH | YES |
| 12 | 112 | Lagoon_light | SOUTH | 19 | 39000 | 3 | hybrid | 01DEC2016 | 20DEC2024 | 70 | 01JUN2025 | 10 | Lagoon_light - SOUTH | 39000 | LOW | NO |
| 13 | 113 | Marina_beam | EAST | 27 | 46500 | 1 | solar | 25JAN2019 | 02JAN2025 | 90 | 01JUL2025 | 7 | Marina_beam - EAST | 46500 | HIGH | NO |
| 14 | 114 | Nautical_point | WEST | 33 | 59000 | 0 | electric | 11FEB2021 | 05JAN2025 | 100 | 01JUL2025 | 5 | Nautical_point - WEST | 59000 | HIGH | YES |
| 15 | 115 | Ocean_guard | SOUTH | 21 | 40500 | 4 | hybrid | 18MAR2014 | 18DEC2024 | 60 | 01JUN2025 | 12 | Ocean_guard - SOUTH | 40500 | LOW | NO |
| 16 | 101 | Alpha_point | EAST | 25 | 45000 | 2 | solar | 15JAN2015 | 12JAN2025 | 80 | 01JUL2025 | 11 | Alpha_point - EAST | 45000 | MEDIUM | NO |
| 17 | 102 | Beacon_rock | WEST | 30 | 52000 | 1 | electric | 10FEB2016 | 05JAN2025 | 90 | 01JUL2025 | 10 | Beacon_rock - WEST | 52000 | HIGH | NO |
| 18 | 103 | Coral_light | SOUTH | 18 | 38000 | 4 | hybrid | 22MAR2014 | 20DEC2024 | 60 | 01JUN2025 | 12 | Coral_light - SOUTH | 38000 | LOW | NO |
| 19 | 104 | Delta_beam | EAST | 22 | 41000 | 3 | solar | 11APR2017 | 18JAN2025 | 70 | 01JUL2025 | 9 | Delta_beam - EAST | 41000 | LOW | NO |
| 20 | 105 | Eagle_eye | WEST | 35 | 61000 | 0 | electric | 09MAY2018 | 01JAN2025 | 100 | 01JUL2025 | 8 | Eagle_eye - WEST | 61000 | HIGH | YES |
| 21 | 106 | Falcon_tower | SOUTH | 20 | 39500 | 2 | hybrid | 15JUN2016 | 10DEC2024 | 80 | 01JUN2025 | 10 | Falcon_tower - SOUTH | 39500 | MEDIUM | NO |
| 22 | 107 | Gulf_guide | EAST | 28 | 47000 | 1 | solar | 12JUL2019 | 02JAN2025 | 90 | 01JUL2025 | 7 | Gulf_guide - EAST | 47000 | HIGH | NO |
| 23 | 108 | Harbor_light | WEST | 32 | 58000 | 0 | electric | 18AUG2020 | 05JAN2025 | 100 | 01JUL2025 | 6 | Harbor_light - WEST | 58000 | HIGH | YES |
| 24 | 109 | Island_beam | SOUTH | 16 | 36000 | 5 | hybrid | 21SEP2013 | 22DEC2024 | 50 | 01JUN2025 | 13 | Island_beam - SOUTH | 36000 | LOW | NO |
| 25 | 110 | Jetty_light | EAST | 26 | 45500 | 2 | solar | 10OCT2015 | 12JAN2025 | 80 | 01JUL2025 | 11 | Jetty_light - EAST | 45500 | MEDIUM | NO |
| 26 | 111 | Keystone | WEST | 34 | 60000 | 1 | electric | 15NOV2017 | 05JAN2025 | 90 | 01JUL2025 | 9 | Keystone - WEST | 60000 | HIGH | YES |
| 27 | 112 | Lagoon_light | SOUTH | 19 | 39000 | 3 | hybrid | 01DEC2016 | 20DEC2024 | 70 | 01JUN2025 | 10 | Lagoon_light - SOUTH | 39000 | LOW | NO |
| 28 | 113 | Marina_beam | EAST | 27 | 46500 | 1 | solar | 25JAN2019 | 02JAN2025 | 90 | 01JUL2025 | 7 | Marina_beam - EAST | 46500 | HIGH | NO |
| 29 | 114 | Nautical_point | WEST | 33 | 59000 | 0 | electric | 11FEB2021 | 05JAN2025 | 100 | 01JUL2025 | 5 | Nautical_point - WEST | 59000 | HIGH | YES |
| 30 | 115 | Ocean_guard | SOUTH | 21 | 40500 | 4 | hybrid | 18MAR2014 | 18DEC2024 | 60 | 01JUN2025 | 12 | Ocean_guard - SOUTH | 40500 | LOW | NO |
Approach 1:APPEND NEW MAINTENANCE DATA
Why
this is correct
· base=
→ existing master dataset
· data=
→ new incoming records
· force
→ allows structure mismatch
Step 1: Create a NEW dataset
data lighthouse_new;
input Lighthouse_ID Lighthouse_Name:$25. Coastline:$10. Visibility_Range
Maintenance_Cost Failure_Incidents Power_Source:$10.
Installation_Date : date9. Last_Service_Date : date9.
Reliability_Index Utilization_Class:$6. Fraud_Flag:$3.;
format Installation_Date Last_Service_Date date9.;
datalines;
116 Pearl_Light East 29 48000 1 solar 20APR2019 10JAN2026 90 HIGH NO
117 Reef_Beam West 36 62000 4 electric 15MAY2018 12JAN2026 60 LOW YES
;
run;
proc print data=lighthouse_new;
run;
OUTPUT:
| Obs | Lighthouse_ID | Lighthouse_Name | Coastline | Visibility_Range | Maintenance_Cost | Failure_Incidents | Power_Source | Installation_Date | Last_Service_Date | Reliability_Index | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 116 | Pearl_Light | East | 29 | 48000 | 1 | solar | 20APR2019 | 10JAN2026 | 90 | HIGH | NO |
| 2 | 117 | Reef_Beam | West | 36 | 62000 | 4 | electric | 15MAY2018 | 12JAN2026 | 60 | LOW | YES |
Step 2: Append CORRECTLY
proc append base=lighthouse_fraud
data=lighthouse_new force;
run;
proc print data=lighthouse_fraud;
run;
OUTPUT:
| Obs | Lighthouse_ID | Lighthouse_Name | Coastline | Visibility_Range | Maintenance_Cost | Failure_Incidents | Power_Source | Installation_Date | Last_Service_Date | Reliability_Index | Next_Service_Date | Years_In_Service | Full_Name | Cost_Filled | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 101 | Alpha_point | EAST | 25 | 45000 | 2 | solar | 15JAN2015 | 12JAN2025 | 80 | 01JUL2025 | 11 | Alpha_point - EAST | 45000 | MEDIUM | NO |
| 2 | 102 | Beacon_rock | WEST | 30 | 52000 | 1 | electric | 10FEB2016 | 05JAN2025 | 90 | 01JUL2025 | 10 | Beacon_rock - WEST | 52000 | HIGH | NO |
| 3 | 103 | Coral_light | SOUTH | 18 | 38000 | 4 | hybrid | 22MAR2014 | 20DEC2024 | 60 | 01JUN2025 | 12 | Coral_light - SOUTH | 38000 | LOW | NO |
| 4 | 104 | Delta_beam | EAST | 22 | 41000 | 3 | solar | 11APR2017 | 18JAN2025 | 70 | 01JUL2025 | 9 | Delta_beam - EAST | 41000 | LOW | NO |
| 5 | 105 | Eagle_eye | WEST | 35 | 61000 | 0 | electric | 09MAY2018 | 01JAN2025 | 100 | 01JUL2025 | 8 | Eagle_eye - WEST | 61000 | HIGH | YES |
| 6 | 106 | Falcon_tower | SOUTH | 20 | 39500 | 2 | hybrid | 15JUN2016 | 10DEC2024 | 80 | 01JUN2025 | 10 | Falcon_tower - SOUTH | 39500 | MEDIUM | NO |
| 7 | 107 | Gulf_guide | EAST | 28 | 47000 | 1 | solar | 12JUL2019 | 02JAN2025 | 90 | 01JUL2025 | 7 | Gulf_guide - EAST | 47000 | HIGH | NO |
| 8 | 108 | Harbor_light | WEST | 32 | 58000 | 0 | electric | 18AUG2020 | 05JAN2025 | 100 | 01JUL2025 | 6 | Harbor_light - WEST | 58000 | HIGH | YES |
| 9 | 109 | Island_beam | SOUTH | 16 | 36000 | 5 | hybrid | 21SEP2013 | 22DEC2024 | 50 | 01JUN2025 | 13 | Island_beam - SOUTH | 36000 | LOW | NO |
| 10 | 110 | Jetty_light | EAST | 26 | 45500 | 2 | solar | 10OCT2015 | 12JAN2025 | 80 | 01JUL2025 | 11 | Jetty_light - EAST | 45500 | MEDIUM | NO |
| 11 | 111 | Keystone | WEST | 34 | 60000 | 1 | electric | 15NOV2017 | 05JAN2025 | 90 | 01JUL2025 | 9 | Keystone - WEST | 60000 | HIGH | YES |
| 12 | 112 | Lagoon_light | SOUTH | 19 | 39000 | 3 | hybrid | 01DEC2016 | 20DEC2024 | 70 | 01JUN2025 | 10 | Lagoon_light - SOUTH | 39000 | LOW | NO |
| 13 | 113 | Marina_beam | EAST | 27 | 46500 | 1 | solar | 25JAN2019 | 02JAN2025 | 90 | 01JUL2025 | 7 | Marina_beam - EAST | 46500 | HIGH | NO |
| 14 | 114 | Nautical_point | WEST | 33 | 59000 | 0 | electric | 11FEB2021 | 05JAN2025 | 100 | 01JUL2025 | 5 | Nautical_point - WEST | 59000 | HIGH | YES |
| 15 | 115 | Ocean_guard | SOUTH | 21 | 40500 | 4 | hybrid | 18MAR2014 | 18DEC2024 | 60 | 01JUN2025 | 12 | Ocean_guard - SOUTH | 40500 | LOW | NO |
| 16 | 116 | Pearl_Light | East | 29 | 48000 | 1 | solar | 20APR2019 | 10JAN2026 | 90 | . | . | . | HIGH | NO | |
| 17 | 117 | Reef_Beam | West | 36 | 62000 | 4 | electric | 15MAY2018 | 12JAN2026 | 60 | . | . | . | LOW | YES |
APPROACH 2: SPLIT → APPEND
Step 1: Split
data east west south;
set lighthouse_fraud;
if Coastline='EAST' then output east;
else if Coastline='WEST' then output west;
else if Coastline='SOUTH' then output south;
run;
proc print data=east;
run;
OUTPUT:
| Obs | Lighthouse_ID | Lighthouse_Name | Coastline | Visibility_Range | Maintenance_Cost | Failure_Incidents | Power_Source | Installation_Date | Last_Service_Date | Reliability_Index | Next_Service_Date | Years_In_Service | Full_Name | Cost_Filled | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 101 | Alpha_point | EAST | 25 | 45000 | 2 | solar | 15JAN2015 | 12JAN2025 | 80 | 01JUL2025 | 11 | Alpha_point - EAST | 45000 | MEDIUM | NO |
| 2 | 104 | Delta_beam | EAST | 22 | 41000 | 3 | solar | 11APR2017 | 18JAN2025 | 70 | 01JUL2025 | 9 | Delta_beam - EAST | 41000 | LOW | NO |
| 3 | 107 | Gulf_guide | EAST | 28 | 47000 | 1 | solar | 12JUL2019 | 02JAN2025 | 90 | 01JUL2025 | 7 | Gulf_guide - EAST | 47000 | HIGH | NO |
| 4 | 110 | Jetty_light | EAST | 26 | 45500 | 2 | solar | 10OCT2015 | 12JAN2025 | 80 | 01JUL2025 | 11 | Jetty_light - EAST | 45500 | MEDIUM | NO |
| 5 | 113 | Marina_beam | EAST | 27 | 46500 | 1 | solar | 25JAN2019 | 02JAN2025 | 90 | 01JUL2025 | 7 | Marina_beam - EAST | 46500 | HIGH | NO |
proc print data=west;
run;
OUTPUT:
| Obs | Lighthouse_ID | Lighthouse_Name | Coastline | Visibility_Range | Maintenance_Cost | Failure_Incidents | Power_Source | Installation_Date | Last_Service_Date | Reliability_Index | Next_Service_Date | Years_In_Service | Full_Name | Cost_Filled | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 102 | Beacon_rock | WEST | 30 | 52000 | 1 | electric | 10FEB2016 | 05JAN2025 | 90 | 01JUL2025 | 10 | Beacon_rock - WEST | 52000 | HIGH | NO |
| 2 | 105 | Eagle_eye | WEST | 35 | 61000 | 0 | electric | 09MAY2018 | 01JAN2025 | 100 | 01JUL2025 | 8 | Eagle_eye - WEST | 61000 | HIGH | YES |
| 3 | 108 | Harbor_light | WEST | 32 | 58000 | 0 | electric | 18AUG2020 | 05JAN2025 | 100 | 01JUL2025 | 6 | Harbor_light - WEST | 58000 | HIGH | YES |
| 4 | 111 | Keystone | WEST | 34 | 60000 | 1 | electric | 15NOV2017 | 05JAN2025 | 90 | 01JUL2025 | 9 | Keystone - WEST | 60000 | HIGH | YES |
| 5 | 114 | Nautical_point | WEST | 33 | 59000 | 0 | electric | 11FEB2021 | 05JAN2025 | 100 | 01JUL2025 | 5 | Nautical_point - WEST | 59000 | HIGH | YES |
proc print data=south;
run;
OUTPUT:
| Obs | Lighthouse_ID | Lighthouse_Name | Coastline | Visibility_Range | Maintenance_Cost | Failure_Incidents | Power_Source | Installation_Date | Last_Service_Date | Reliability_Index | Next_Service_Date | Years_In_Service | Full_Name | Cost_Filled | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 103 | Coral_light | SOUTH | 18 | 38000 | 4 | hybrid | 22MAR2014 | 20DEC2024 | 60 | 01JUN2025 | 12 | Coral_light - SOUTH | 38000 | LOW | NO |
| 2 | 106 | Falcon_tower | SOUTH | 20 | 39500 | 2 | hybrid | 15JUN2016 | 10DEC2024 | 80 | 01JUN2025 | 10 | Falcon_tower - SOUTH | 39500 | MEDIUM | NO |
| 3 | 109 | Island_beam | SOUTH | 16 | 36000 | 5 | hybrid | 21SEP2013 | 22DEC2024 | 50 | 01JUN2025 | 13 | Island_beam - SOUTH | 36000 | LOW | NO |
| 4 | 112 | Lagoon_light | SOUTH | 19 | 39000 | 3 | hybrid | 01DEC2016 | 20DEC2024 | 70 | 01JUN2025 | 10 | Lagoon_light - SOUTH | 39000 | LOW | NO |
| 5 | 115 | Ocean_guard | SOUTH | 21 | 40500 | 4 | hybrid | 18MAR2014 | 18DEC2024 | 60 | 01JUN2025 | 12 | Ocean_guard - SOUTH | 40500 | LOW | NO |
Step 2: Append Back
proc append base=east
data=west force;
run;
proc print data=east;
run;
OUTPUT:
| Obs | Lighthouse_ID | Lighthouse_Name | Coastline | Visibility_Range | Maintenance_Cost | Failure_Incidents | Power_Source | Installation_Date | Last_Service_Date | Reliability_Index | Next_Service_Date | Years_In_Service | Full_Name | Cost_Filled | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 101 | Alpha_point | EAST | 25 | 45000 | 2 | solar | 15JAN2015 | 12JAN2025 | 80 | 01JUL2025 | 11 | Alpha_point - EAST | 45000 | MEDIUM | NO |
| 2 | 104 | Delta_beam | EAST | 22 | 41000 | 3 | solar | 11APR2017 | 18JAN2025 | 70 | 01JUL2025 | 9 | Delta_beam - EAST | 41000 | LOW | NO |
| 3 | 107 | Gulf_guide | EAST | 28 | 47000 | 1 | solar | 12JUL2019 | 02JAN2025 | 90 | 01JUL2025 | 7 | Gulf_guide - EAST | 47000 | HIGH | NO |
| 4 | 110 | Jetty_light | EAST | 26 | 45500 | 2 | solar | 10OCT2015 | 12JAN2025 | 80 | 01JUL2025 | 11 | Jetty_light - EAST | 45500 | MEDIUM | NO |
| 5 | 113 | Marina_beam | EAST | 27 | 46500 | 1 | solar | 25JAN2019 | 02JAN2025 | 90 | 01JUL2025 | 7 | Marina_beam - EAST | 46500 | HIGH | NO |
| 6 | 102 | Beacon_rock | WEST | 30 | 52000 | 1 | electric | 10FEB2016 | 05JAN2025 | 90 | 01JUL2025 | 10 | Beacon_rock - WEST | 52000 | HIGH | NO |
| 7 | 105 | Eagle_eye | WEST | 35 | 61000 | 0 | electric | 09MAY2018 | 01JAN2025 | 100 | 01JUL2025 | 8 | Eagle_eye - WEST | 61000 | HIGH | YES |
| 8 | 108 | Harbor_light | WEST | 32 | 58000 | 0 | electric | 18AUG2020 | 05JAN2025 | 100 | 01JUL2025 | 6 | Harbor_light - WEST | 58000 | HIGH | YES |
| 9 | 111 | Keystone | WEST | 34 | 60000 | 1 | electric | 15NOV2017 | 05JAN2025 | 90 | 01JUL2025 | 9 | Keystone - WEST | 60000 | HIGH | YES |
| 10 | 114 | Nautical_point | WEST | 33 | 59000 | 0 | electric | 11FEB2021 | 05JAN2025 | 100 | 01JUL2025 | 5 | Nautical_point - WEST | 59000 | HIGH | YES |
proc append base=east
data=south force;
run;
proc print data=east;
run;
OUTPUT:
| Obs | Lighthouse_ID | Lighthouse_Name | Coastline | Visibility_Range | Maintenance_Cost | Failure_Incidents | Power_Source | Installation_Date | Last_Service_Date | Reliability_Index | Next_Service_Date | Years_In_Service | Full_Name | Cost_Filled | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 101 | Alpha_point | EAST | 25 | 45000 | 2 | solar | 15JAN2015 | 12JAN2025 | 80 | 01JUL2025 | 11 | Alpha_point - EAST | 45000 | MEDIUM | NO |
| 2 | 104 | Delta_beam | EAST | 22 | 41000 | 3 | solar | 11APR2017 | 18JAN2025 | 70 | 01JUL2025 | 9 | Delta_beam - EAST | 41000 | LOW | NO |
| 3 | 107 | Gulf_guide | EAST | 28 | 47000 | 1 | solar | 12JUL2019 | 02JAN2025 | 90 | 01JUL2025 | 7 | Gulf_guide - EAST | 47000 | HIGH | NO |
| 4 | 110 | Jetty_light | EAST | 26 | 45500 | 2 | solar | 10OCT2015 | 12JAN2025 | 80 | 01JUL2025 | 11 | Jetty_light - EAST | 45500 | MEDIUM | NO |
| 5 | 113 | Marina_beam | EAST | 27 | 46500 | 1 | solar | 25JAN2019 | 02JAN2025 | 90 | 01JUL2025 | 7 | Marina_beam - EAST | 46500 | HIGH | NO |
| 6 | 102 | Beacon_rock | WEST | 30 | 52000 | 1 | electric | 10FEB2016 | 05JAN2025 | 90 | 01JUL2025 | 10 | Beacon_rock - WEST | 52000 | HIGH | NO |
| 7 | 105 | Eagle_eye | WEST | 35 | 61000 | 0 | electric | 09MAY2018 | 01JAN2025 | 100 | 01JUL2025 | 8 | Eagle_eye - WEST | 61000 | HIGH | YES |
| 8 | 108 | Harbor_light | WEST | 32 | 58000 | 0 | electric | 18AUG2020 | 05JAN2025 | 100 | 01JUL2025 | 6 | Harbor_light - WEST | 58000 | HIGH | YES |
| 9 | 111 | Keystone | WEST | 34 | 60000 | 1 | electric | 15NOV2017 | 05JAN2025 | 90 | 01JUL2025 | 9 | Keystone - WEST | 60000 | HIGH | YES |
| 10 | 114 | Nautical_point | WEST | 33 | 59000 | 0 | electric | 11FEB2021 | 05JAN2025 | 100 | 01JUL2025 | 5 | Nautical_point - WEST | 59000 | HIGH | YES |
| 11 | 103 | Coral_light | SOUTH | 18 | 38000 | 4 | hybrid | 22MAR2014 | 20DEC2024 | 60 | 01JUN2025 | 12 | Coral_light - SOUTH | 38000 | LOW | NO |
| 12 | 106 | Falcon_tower | SOUTH | 20 | 39500 | 2 | hybrid | 15JUN2016 | 10DEC2024 | 80 | 01JUN2025 | 10 | Falcon_tower - SOUTH | 39500 | MEDIUM | NO |
| 13 | 109 | Island_beam | SOUTH | 16 | 36000 | 5 | hybrid | 21SEP2013 | 22DEC2024 | 50 | 01JUN2025 | 13 | Island_beam - SOUTH | 36000 | LOW | NO |
| 14 | 112 | Lagoon_light | SOUTH | 19 | 39000 | 3 | hybrid | 01DEC2016 | 20DEC2024 | 70 | 01JUN2025 | 10 | Lagoon_light - SOUTH | 39000 | LOW | NO |
| 15 | 115 | Ocean_guard | SOUTH | 21 | 40500 | 4 | hybrid | 18MAR2014 | 18DEC2024 | 60 | 01JUN2025 | 12 | Ocean_guard - SOUTH | 40500 | LOW | NO |
Final combined dataset is east
TRANSPOSE
proc transpose data=lighthouse_fraud out=lighthouse_t;
var Maintenance_Cost Reliability_Index;
run;
proc print data=lighthouse_t;
run;
OUTPUT:
| Obs | _NAME_ | COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 | COL8 | COL9 | COL10 | COL11 | COL12 | COL13 | COL14 | COL15 | COL16 | COL17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Maintenance_Cost | 45000 | 52000 | 38000 | 41000 | 61000 | 39500 | 47000 | 58000 | 36000 | 45500 | 60000 | 39000 | 46500 | 59000 | 40500 | 48000 | 62000 |
| 2 | Reliability_Index | 80 | 90 | 60 | 70 | 100 | 80 | 90 | 100 | 50 | 80 | 90 | 70 | 90 | 100 | 60 | 90 | 60 |
18. VISUALIZATION
proc sgplot data=lighthouse_fraud;
vbar Coastline / response=Maintenance_Cost stat=mean;
run;
OUTPUT:
19. DATASET CLEANUP
proc datasets lib=work;
delete lighthouse_raw lighthouse_dates lighthouse_clean;
quit;
LOG:
20.
BUSINESS INSIGHTS
· West coast lighthouses cost more but are more
reliable
· High maintenance cost with frequent failures
indicates fraud risk
· Solar-powered units show balanced
cost-efficiency
· Aging lighthouses require proactive servicing
21. 21 KEY POINTS ABOUT THIS PROJECT
1. This
project treats lighthouses as operational assets, similar to real-world
infrastructure like data centers or telecom towers.
2.
A realistic dataset with more than 15 lighthouse
records was created instead of using dummy or textbook data.
3.
Each lighthouse is uniquely identified, enabling
tracking across multiple analytical steps.
4.
Operational variables such as visibility range, maintenance
cost, power source, and failure incidents were carefully designed.
5.
Reliability Index was derived as a percentage to
reflect operational health in a business-friendly metric.
6.
Intentional data errors were introduced to simulate
real-world data quality issues.
7.
The project demonstrates how such errors are identified
using PROC
CONTENTS and logic checks.
8.
Error correction is explained step by step, mirroring
actual industry practice.
9.
SAS Data Step programming is used extensively for data
transformation and derivations.
10. PROC SQL
is applied for grouping, aggregation, and management-level summaries.
11. Date
functions such as MDY, INTCK, and INTNX are used to
calculate service cycles and asset age.
12. Character
functions like STRIP,
TRIM,
UPCASE,
LOWCASE,
and CATX
ensure clean and standardized data.
13. Numeric
functions such as COALESCE help manage missing and inconsistent values.
14. Macros
are used to automate utilization classification, improving code reusability and
scalability.
15. A
rule-based fraud detection macro flags abnormal cost and failure combinations.
16. PROC FREQ
provides insight into coastline-wise and power-source-wise distributions.
17. PROC MEANS
and PROC
UNIVARIATE support descriptive and distribution analysis.
18. PROC CORR
identifies relationships between maintenance cost, failures, and reliability.
19. Data
reshaping techniques using SET, MERGE, APPEND,
and TRANSPOSE
reflect real project workflows.
20. Visualization
using PROC
SGPLOT converts analytics into decision-ready insights.
21. The entire project is structured in an interview-ready, business-oriented format.
22. Conclusion
The Lighthouse
Operations, Reliability & Fraud Analytics Using SAS project
demonstrates a complete, real-world analytics lifecycle using SAS in a simple
and understandable manner. Instead of focusing only on coding syntax, the
project emphasizes business thinking, data quality awareness, and analytical
reasoning. By modeling lighthouses as operational assets, the project mirrors
how organizations monitor and manage critical infrastructure in industries such
as transportation, energy, and public safety.
One of the strongest aspects of this project
is the intentional inclusion of data errors. Real-world datasets are rarely
clean, and the ability to identify, explain, and correct such issues is a key skill
expected from a SAS programmer. The project clearly shows how incorrect data
types or logic errors can affect analysis and how SAS tools can be used to
resolve them effectively.
The use of macros adds automation and
scalability, making the analysis efficient and reusable. Fraud detection logic
highlights how analytics can go beyond reporting and support governance and
compliance. Date functions, character handling, and numeric transformations
further strengthen the project by addressing common data preparation
challenges.
Overall,
this project is not just a technical exercise but a practical demonstration of
how SAS can be used to convert raw operational data into meaningful insights.
It is especially valuable for interview preparation, as it showcases both
programming proficiency and real-world problem-solving ability in a structured
and professional way.
SAS INTERVIEW QUESTIONS
1.Why use INTCK vs INTNX?
2.How do macros help operational classification?
3.How do you detect data quality issues in SAS?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent LIGHT HOUSE data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics and smart cities
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment