387.Can SAS Predict Railway Signal Failures Before Disaster Strikes?
Can SAS Predict Railway Signal Failures Before Disaster Strikes?
HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —
DATA STEP | PROC SQL | PROC PRINT | PROC SGPLOT | MACROS | PROC CORR | PROC MEANS | PROC FREQ | PROC UNIVARIATE | APPEND | PROC DATASETS DELETE | DATA FUNCTIONS
INTRODUCTION
Railway signals are one of the most critical safety systems in any railway network.
Every green or red signal decides whether a train moves or stops. A single wrong signal can lead to massive delays, accidents, or even loss of lives.
But in real life, railway departments manage thousands of signal systems across multiple routes. It is impossible to manually monitor:
· Which signals fail frequently
· Which routes are risky
· Which systems cost too much but fail too often
· And which maintenance records look suspicious
So the real question becomes:
Can data analytics help railway authorities predict failures, improve safety, and even detect fraud?
This project answers that question using SAS analytics.
We create a realistic railway signal dataset and apply:
· Statistical analysis
· SQL reporting
· Visual dashboards
· Automation macros
· Fraud detection logic
BUSINESS CONTEXT
Railway signaling systems are the backbone of railway safety.
A single signal failure can cause:
- Train delays
- Collisions
- Passenger safety risk
- Huge financial losses
Railway authorities maintain thousands of signal systems across routes.
They track:
- How often signals fail
- How long repairs take
- Which routes are most risky
- Which systems show suspicious patterns (possible fraud / negligence)
This project simulates how a Railway Operations Analytics Team uses SAS to:
- Monitor system reliability
- Identify high-risk signals
- Detect abnormal maintenance behavior
- Generate safety dashboards
TABLE OF CONTENTS
- Dataset Creation
- Data Engineering (dates, functions)
- PROC SQL Analysis
- PROC MEANS
- PROC UNIVARIATE
- PROC FREQ
- PROC CORR
- PROC SGPLOT
- Macros
- Fraud Detection Logic
- SET / MERGE / APPEND / TRANSPOSE
- Character & Numeric Functions
- PROC DATASETS DELETE
1. DATASET CREATION
data railway_signals;
input Signal_ID $ Route:$15. Failure_Count Downtime_Minutes
Safety_Impact $ Maintenance_Cost Reliability_Index
Install_Date : date9. Last_Service_Date : date9.;
format Install_Date Last_Service_Date date9.;
datalines;
SIG001 SouthLine 5 180 High 12000 82 01JAN2020 10DEC2023
SIG002 EastLine 2 60 Low 4000 95 15MAR2019 05NOV2023
SIG003 NorthLine 9 300 Critical 20000 65 10FEB2018 20OCT2023
SIG004 WestLine 1 20 Low 2000 98 05MAY2021 01JAN2024
SIG005 SouthLine 7 240 High 15000 70 12JUL2017 15SEP2023
SIG006 EastLine 4 120 Medium 8000 85 09AUG2019 30NOV2023
SIG007 NorthLine 10 360 Critical 25000 60 01JAN2016 05OCT2023
SIG008 WestLine 3 90 Medium 6000 88 18DEC2020 12DEC2023
SIG009 CentralLine 6 210 High 13000 75 02FEB2018 22NOV2023
SIG010 CentralLine 1 30 Low 3000 97 15JUN2022 05JAN2024
SIG011 MetroLine 8 260 High 17000 68 20MAR2017 18SEP2023
SIG012 MetroLine 2 70 Medium 5000 90 10JAN2021 02DEC2023
SIG013 FreightLine 11 400 Critical 30000 55 01JAN2015 15OCT2023
SIG014 FreightLine 3 100 Medium 7000 86 05MAY2020 01DEC2023
SIG015 ExpressLine 4 110 Medium 9000 83 12APR2019 28NOV2023
;
run;
proc print data=railway_signals;
run;
OUTPUT:
| Obs | Signal_ID | Route | Failure_Count | Downtime_Minutes | Safety_Impact | Maintenance_Cost | Reliability_Index | Install_Date | Last_Service_Date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | SIG001 | SouthLine | 5 | 180 | High | 12000 | 82 | 01JAN2020 | 10DEC2023 |
| 2 | SIG002 | EastLine | 2 | 60 | Low | 4000 | 95 | 15MAR2019 | 05NOV2023 |
| 3 | SIG003 | NorthLine | 9 | 300 | Critical | 20000 | 65 | 10FEB2018 | 20OCT2023 |
| 4 | SIG004 | WestLine | 1 | 20 | Low | 2000 | 98 | 05MAY2021 | 01JAN2024 |
| 5 | SIG005 | SouthLine | 7 | 240 | High | 15000 | 70 | 12JUL2017 | 15SEP2023 |
| 6 | SIG006 | EastLine | 4 | 120 | Medium | 8000 | 85 | 09AUG2019 | 30NOV2023 |
| 7 | SIG007 | NorthLine | 10 | 360 | Critical | 25000 | 60 | 01JAN2016 | 05OCT2023 |
| 8 | SIG008 | WestLine | 3 | 90 | Medium | 6000 | 88 | 18DEC2020 | 12DEC2023 |
| 9 | SIG009 | CentralLine | 6 | 210 | High | 13000 | 75 | 02FEB2018 | 22NOV2023 |
| 10 | SIG010 | CentralLine | 1 | 30 | Low | 3000 | 97 | 15JUN2022 | 05JAN2024 |
| 11 | SIG011 | MetroLine | 8 | 260 | High | 17000 | 68 | 20MAR2017 | 18SEP2023 |
| 12 | SIG012 | MetroLine | 2 | 70 | Medium | 5000 | 90 | 10JAN2021 | 02DEC2023 |
| 13 | SIG013 | FreightLine | 11 | 400 | Critical | 30000 | 55 | 01JAN2015 | 15OCT2023 |
| 14 | SIG014 | FreightLine | 3 | 100 | Medium | 7000 | 86 | 05MAY2020 | 01DEC2023 |
| 15 | SIG015 | ExpressLine | 4 | 110 | Medium | 9000 | 83 | 12APR2019 | 28NOV2023 |
Used for raw data ingestion from operational systems.
2. DATE ENGINEERING (MDY, INTCK, INTNX)
data railway_dates;
set railway_signals;
Years_In_Service = intck('year', Install_Date, today());
Next_Service_Date = intnx('month', Last_Service_Date, 6);
run;
proc print data=railway_dates;
var Signal_ID Route Install_Date Last_Service_Date Years_In_Service Next_Service_Date;
run;
OUTPUT:
| Obs | Signal_ID | Route | Install_Date | Last_Service_Date | Years_In_Service | Next_Service_Date |
|---|---|---|---|---|---|---|
| 1 | SIG001 | SouthLine | 01JAN2020 | 10DEC2023 | 6 | 23528 |
| 2 | SIG002 | EastLine | 15MAR2019 | 05NOV2023 | 7 | 23497 |
| 3 | SIG003 | NorthLine | 10FEB2018 | 20OCT2023 | 8 | 23467 |
| 4 | SIG004 | WestLine | 05MAY2021 | 01JAN2024 | 5 | 23558 |
| 5 | SIG005 | SouthLine | 12JUL2017 | 15SEP2023 | 9 | 23436 |
| 6 | SIG006 | EastLine | 09AUG2019 | 30NOV2023 | 7 | 23497 |
| 7 | SIG007 | NorthLine | 01JAN2016 | 05OCT2023 | 10 | 23467 |
| 8 | SIG008 | WestLine | 18DEC2020 | 12DEC2023 | 6 | 23528 |
| 9 | SIG009 | CentralLine | 02FEB2018 | 22NOV2023 | 8 | 23497 |
| 10 | SIG010 | CentralLine | 15JUN2022 | 05JAN2024 | 4 | 23558 |
| 11 | SIG011 | MetroLine | 20MAR2017 | 18SEP2023 | 9 | 23436 |
| 12 | SIG012 | MetroLine | 10JAN2021 | 02DEC2023 | 5 | 23528 |
| 13 | SIG013 | FreightLine | 01JAN2015 | 15OCT2023 | 11 | 23467 |
| 14 | SIG014 | FreightLine | 05MAY2020 | 01DEC2023 | 6 | 23528 |
| 15 | SIG015 | ExpressLine | 12APR2019 | 28NOV2023 | 7 | 23497 |
Used in real projects to:
· Calculate service life
· Predict next maintenance
· Build SLA reports
3. PROC SQL
proc sql;
create table route_summary as
select Route,
count(Signal_ID) as Total_Signals,
avg(Failure_Count) as Avg_Failures,
sum(Maintenance_Cost) as Total_Cost
from railway_dates
group by Route;
quit;
proc print data=route_summary;
var Route Total_Signals Avg_Failures Total_Cost;
run;
OUTPUT:
| Obs | Route | Total_Signals | Avg_Failures | Total_Cost |
|---|---|---|---|---|
| 1 | CentralLine | 2 | 3.5 | 16000 |
| 2 | EastLine | 2 | 3.0 | 12000 |
| 3 | ExpressLine | 1 | 4.0 | 9000 |
| 4 | FreightLine | 2 | 7.0 | 37000 |
| 5 | MetroLine | 2 | 5.0 | 22000 |
| 6 | NorthLine | 2 | 9.5 | 45000 |
| 7 | SouthLine | 2 | 6.0 | 27000 |
| 8 | WestLine | 2 | 2.0 | 8000 |
Used for:
· Aggregations
· Reporting dashboards
· KPI summaries
4. PROC MEANS
proc means data=railway_dates mean min max;
var Failure_Count Downtime_Minutes Maintenance_Cost Reliability_Index;
run;
OUTPUT:
The MEANS Procedure
| Variable | Mean | Minimum | Maximum |
|---|---|---|---|
Failure_Count Downtime_Minutes Maintenance_Cost Reliability_Index | 5.0666667 170.0000000 11733.33 79.8000000 | 1.0000000 20.0000000 2000.00 55.0000000 | 11.0000000 400.0000000 30000.00 98.0000000 |
Used by managers to:
· Understand average failures
· Identify worst-performing systems
5. PROC UNIVARIATE
proc univariate data=railway_dates;
var Downtime_Minutes;
histogram Downtime_Minutes;
run;
OUTPUT:
The UNIVARIATE Procedure
Variable: Downtime_Minutes
| Moments | |||
|---|---|---|---|
| N | 15 | Sum Weights | 15 |
| Mean | 170 | Sum Observations | 2550 |
| Std Deviation | 120.178439 | Variance | 14442.8571 |
| Skewness | 0.60370171 | Kurtosis | -0.7662038 |
| Uncorrected SS | 635700 | Corrected SS | 202200 |
| Coeff Variation | 70.6931993 | Std Error Mean | 31.0299395 |
| Basic Statistical Measures | |||
|---|---|---|---|
| Location | Variability | ||
| Mean | 170.0000 | Std Deviation | 120.17844 |
| Median | 120.0000 | Variance | 14443 |
| Mode | . | Range | 380.00000 |
| Interquartile Range | 190.00000 | ||
| Tests for Location: Mu0=0 | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Student's t | t | 5.47858 | Pr > |t| | <.0001 |
| Sign | M | 7.5 | Pr >= |M| | <.0001 |
| Signed Rank | S | 60 | Pr >= |S| | <.0001 |
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile |
| 100% Max | 400 |
| 99% | 400 |
| 95% | 400 |
| 90% | 360 |
| 75% Q3 | 260 |
| 50% Median | 120 |
| 25% Q1 | 70 |
| 10% | 30 |
| 5% | 20 |
| 1% | 20 |
| 0% Min | 20 |
| Extreme Observations | |||
|---|---|---|---|
| Lowest | Highest | ||
| Value | Obs | Value | Obs |
| 20 | 4 | 240 | 5 |
| 30 | 10 | 260 | 11 |
| 60 | 2 | 300 | 3 |
| 70 | 12 | 360 | 7 |
| 90 | 8 | 400 | 13 |
The UNIVARIATE Procedure
Used by QA teams to:
· Detect extreme downtime
· Identify outliers
6. PROC FREQ
proc freq data=railway_dates;
tables Safety_Impact Route;
run;
OUTPUT:
The FREQ Procedure
| Safety_Impact | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| Critical | 3 | 20.00 | 3 | 20.00 |
| High | 4 | 26.67 | 7 | 46.67 |
| Low | 3 | 20.00 | 10 | 66.67 |
| Medium | 5 | 33.33 | 15 | 100.00 |
| Route | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| CentralLine | 2 | 13.33 | 2 | 13.33 |
| EastLine | 2 | 13.33 | 4 | 26.67 |
| ExpressLine | 1 | 6.67 | 5 | 33.33 |
| FreightLine | 2 | 13.33 | 7 | 46.67 |
| MetroLine | 2 | 13.33 | 9 | 60.00 |
| NorthLine | 2 | 13.33 | 11 | 73.33 |
| SouthLine | 2 | 13.33 | 13 | 86.67 |
| WestLine | 2 | 13.33 | 15 | 100.00 |
Used to:
· Count safety risks
· Compare routes
7. PROC CORR
proc corr data=railway_dates;
var Failure_Count Downtime_Minutes Maintenance_Cost Reliability_Index;
run;
OUTPUT:
The CORR Procedure
| 4 Variables: | Failure_Count Downtime_Minutes Maintenance_Cost Reliability_Index |
|---|
| Simple Statistics | ||||||
|---|---|---|---|---|---|---|
| Variable | N | Mean | Std Dev | Sum | Minimum | Maximum |
| Failure_Count | 15 | 5.06667 | 3.28344 | 76.00000 | 1.00000 | 11.00000 |
| Downtime_Minutes | 15 | 170.00000 | 120.17844 | 2550 | 20.00000 | 400.00000 |
| Maintenance_Cost | 15 | 11733 | 8328 | 176000 | 2000 | 30000 |
| Reliability_Index | 15 | 79.80000 | 13.61302 | 1197 | 55.00000 | 98.00000 |
| Pearson Correlation Coefficients, N = 15 Prob > |r| under H0: Rho=0 | ||||
|---|---|---|---|---|
| Failure_Count | Downtime_Minutes | Maintenance_Cost | Reliability_Index | |
| Failure_Count | 1.00000 | 0.99559 <.0001 | 0.98551 <.0001 | -0.99366 <.0001 |
| Downtime_Minutes | 0.99559 <.0001 | 1.00000 | 0.99204 <.0001 | -0.99023 <.0001 |
| Maintenance_Cost | 0.98551 <.0001 | 0.99204 <.0001 | 1.00000 | -0.98026 <.0001 |
| Reliability_Index | -0.99366 <.0001 | -0.99023 <.0001 | -0.98026 <.0001 | 1.00000 |
Used to:
· Check if more failures → lower reliability
· Predict future risk
8. PROC SGPLOT
proc sgplot data=railway_dates;
scatter x=Failure_Count y=Reliability_Index;
run;
OUTPUT:
Used for:
· Executive dashboards
· Visual inspection
9. UTILIZATION MACRO
%macro classify;
data utilization;
set railway_dates;
if Failure_Count > 7 then Utilization = "Overloaded";
else if 4 < Failure_Count < 7 then Utilization = "Moderate";
else Utilization = "Stable";
run;
proc print data=utilization;
run;
%mend;
%classify;
OUTPUT:
| Obs | Signal_ID | Route | Failure_Count | Downtime_Minutes | Safety_Impact | Maintenance_Cost | Reliability_Index | Install_Date | Last_Service_Date | Years_In_Service | Next_Service_Date | Utilization |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SIG001 | SouthLine | 5 | 180 | High | 12000 | 82 | 01JAN2020 | 10DEC2023 | 6 | 23528 | Moderate |
| 2 | SIG002 | EastLine | 2 | 60 | Low | 4000 | 95 | 15MAR2019 | 05NOV2023 | 7 | 23497 | Stable |
| 3 | SIG003 | NorthLine | 9 | 300 | Critical | 20000 | 65 | 10FEB2018 | 20OCT2023 | 8 | 23467 | Overloaded |
| 4 | SIG004 | WestLine | 1 | 20 | Low | 2000 | 98 | 05MAY2021 | 01JAN2024 | 5 | 23558 | Stable |
| 5 | SIG005 | SouthLine | 7 | 240 | High | 15000 | 70 | 12JUL2017 | 15SEP2023 | 9 | 23436 | Stable |
| 6 | SIG006 | EastLine | 4 | 120 | Medium | 8000 | 85 | 09AUG2019 | 30NOV2023 | 7 | 23497 | Stable |
| 7 | SIG007 | NorthLine | 10 | 360 | Critical | 25000 | 60 | 01JAN2016 | 05OCT2023 | 10 | 23467 | Overloaded |
| 8 | SIG008 | WestLine | 3 | 90 | Medium | 6000 | 88 | 18DEC2020 | 12DEC2023 | 6 | 23528 | Stable |
| 9 | SIG009 | CentralLine | 6 | 210 | High | 13000 | 75 | 02FEB2018 | 22NOV2023 | 8 | 23497 | Moderate |
| 10 | SIG010 | CentralLine | 1 | 30 | Low | 3000 | 97 | 15JUN2022 | 05JAN2024 | 4 | 23558 | Stable |
| 11 | SIG011 | MetroLine | 8 | 260 | High | 17000 | 68 | 20MAR2017 | 18SEP2023 | 9 | 23436 | Overloaded |
| 12 | SIG012 | MetroLine | 2 | 70 | Medium | 5000 | 90 | 10JAN2021 | 02DEC2023 | 5 | 23528 | Stable |
| 13 | SIG013 | FreightLine | 11 | 400 | Critical | 30000 | 55 | 01JAN2015 | 15OCT2023 | 11 | 23467 | Overloaded |
| 14 | SIG014 | FreightLine | 3 | 100 | Medium | 7000 | 86 | 05MAY2020 | 01DEC2023 | 6 | 23528 | Stable |
| 15 | SIG015 | ExpressLine | 4 | 110 | Medium | 9000 | 83 | 12APR2019 | 28NOV2023 | 7 | 23497 | Stable |
Used in industry for:
· Reusable logic
· Automation
· Batch processing
10. FRAUD DETECTION MACRO
%macro fraud;
data fraud_flags;
set utilization;
if Maintenance_Cost > 20000 and Failure_Count < 3 then Fraud_Flag = "Yes";
else Fraud_Flag = "No";
run;
proc print data=fraud_flags;
run;
%mend;
%fraud;
OUTPUT:
| Obs | Signal_ID | Route | Failure_Count | Downtime_Minutes | Safety_Impact | Maintenance_Cost | Reliability_Index | Install_Date | Last_Service_Date | Years_In_Service | Next_Service_Date | Utilization | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SIG001 | SouthLine | 5 | 180 | High | 12000 | 82 | 01JAN2020 | 10DEC2023 | 6 | 23528 | Moderate | No |
| 2 | SIG002 | EastLine | 2 | 60 | Low | 4000 | 95 | 15MAR2019 | 05NOV2023 | 7 | 23497 | Stable | No |
| 3 | SIG003 | NorthLine | 9 | 300 | Critical | 20000 | 65 | 10FEB2018 | 20OCT2023 | 8 | 23467 | Overloaded | No |
| 4 | SIG004 | WestLine | 1 | 20 | Low | 2000 | 98 | 05MAY2021 | 01JAN2024 | 5 | 23558 | Stable | No |
| 5 | SIG005 | SouthLine | 7 | 240 | High | 15000 | 70 | 12JUL2017 | 15SEP2023 | 9 | 23436 | Stable | No |
| 6 | SIG006 | EastLine | 4 | 120 | Medium | 8000 | 85 | 09AUG2019 | 30NOV2023 | 7 | 23497 | Stable | No |
| 7 | SIG007 | NorthLine | 10 | 360 | Critical | 25000 | 60 | 01JAN2016 | 05OCT2023 | 10 | 23467 | Overloaded | No |
| 8 | SIG008 | WestLine | 3 | 90 | Medium | 6000 | 88 | 18DEC2020 | 12DEC2023 | 6 | 23528 | Stable | No |
| 9 | SIG009 | CentralLine | 6 | 210 | High | 13000 | 75 | 02FEB2018 | 22NOV2023 | 8 | 23497 | Moderate | No |
| 10 | SIG010 | CentralLine | 1 | 30 | Low | 3000 | 97 | 15JUN2022 | 05JAN2024 | 4 | 23558 | Stable | No |
| 11 | SIG011 | MetroLine | 8 | 260 | High | 17000 | 68 | 20MAR2017 | 18SEP2023 | 9 | 23436 | Overloaded | No |
| 12 | SIG012 | MetroLine | 2 | 70 | Medium | 5000 | 90 | 10JAN2021 | 02DEC2023 | 5 | 23528 | Stable | No |
| 13 | SIG013 | FreightLine | 11 | 400 | Critical | 30000 | 55 | 01JAN2015 | 15OCT2023 | 11 | 23467 | Overloaded | No |
| 14 | SIG014 | FreightLine | 3 | 100 | Medium | 7000 | 86 | 05MAY2020 | 01DEC2023 | 6 | 23528 | Stable | No |
| 15 | SIG015 | ExpressLine | 4 | 110 | Medium | 9000 | 83 | 12APR2019 | 28NOV2023 | 7 | 23497 | Stable | No |
Very common in:
· Finance
· Maintenance
· Insurance analytics
Flags cases like:
Low failures but very high cost → possible fake billing.
11. SET / APPEND
data backup;
set fraud_flags;
run;
proc print data=backup;
run;
OUTPUT:
| Obs | Signal_ID | Route | Failure_Count | Downtime_Minutes | Safety_Impact | Maintenance_Cost | Reliability_Index | Install_Date | Last_Service_Date | Years_In_Service | Next_Service_Date | Utilization | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SIG001 | SouthLine | 5 | 180 | High | 12000 | 82 | 01JAN2020 | 10DEC2023 | 6 | 23528 | Moderate | No |
| 2 | SIG002 | EastLine | 2 | 60 | Low | 4000 | 95 | 15MAR2019 | 05NOV2023 | 7 | 23497 | Stable | No |
| 3 | SIG003 | NorthLine | 9 | 300 | Critical | 20000 | 65 | 10FEB2018 | 20OCT2023 | 8 | 23467 | Overloaded | No |
| 4 | SIG004 | WestLine | 1 | 20 | Low | 2000 | 98 | 05MAY2021 | 01JAN2024 | 5 | 23558 | Stable | No |
| 5 | SIG005 | SouthLine | 7 | 240 | High | 15000 | 70 | 12JUL2017 | 15SEP2023 | 9 | 23436 | Stable | No |
| 6 | SIG006 | EastLine | 4 | 120 | Medium | 8000 | 85 | 09AUG2019 | 30NOV2023 | 7 | 23497 | Stable | No |
| 7 | SIG007 | NorthLine | 10 | 360 | Critical | 25000 | 60 | 01JAN2016 | 05OCT2023 | 10 | 23467 | Overloaded | No |
| 8 | SIG008 | WestLine | 3 | 90 | Medium | 6000 | 88 | 18DEC2020 | 12DEC2023 | 6 | 23528 | Stable | No |
| 9 | SIG009 | CentralLine | 6 | 210 | High | 13000 | 75 | 02FEB2018 | 22NOV2023 | 8 | 23497 | Moderate | No |
| 10 | SIG010 | CentralLine | 1 | 30 | Low | 3000 | 97 | 15JUN2022 | 05JAN2024 | 4 | 23558 | Stable | No |
| 11 | SIG011 | MetroLine | 8 | 260 | High | 17000 | 68 | 20MAR2017 | 18SEP2023 | 9 | 23436 | Overloaded | No |
| 12 | SIG012 | MetroLine | 2 | 70 | Medium | 5000 | 90 | 10JAN2021 | 02DEC2023 | 5 | 23528 | Stable | No |
| 13 | SIG013 | FreightLine | 11 | 400 | Critical | 30000 | 55 | 01JAN2015 | 15OCT2023 | 11 | 23467 | Overloaded | No |
| 14 | SIG014 | FreightLine | 3 | 100 | Medium | 7000 | 86 | 05MAY2020 | 01DEC2023 | 6 | 23528 | Stable | No |
| 15 | SIG015 | ExpressLine | 4 | 110 | Medium | 9000 | 83 | 12APR2019 | 28NOV2023 | 7 | 23497 | Stable | No |
proc append base=railway_dates
data=backup force;
run;
proc print data=railway_dates;
run;
OUTPUT:
| Obs | Signal_ID | Route | Failure_Count | Downtime_Minutes | Safety_Impact | Maintenance_Cost | Reliability_Index | Install_Date | Last_Service_Date | Years_In_Service | Next_Service_Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SIG001 | SouthLine | 5 | 180 | High | 12000 | 82 | 01JAN2020 | 10DEC2023 | 6 | 23528 |
| 2 | SIG002 | EastLine | 2 | 60 | Low | 4000 | 95 | 15MAR2019 | 05NOV2023 | 7 | 23497 |
| 3 | SIG003 | NorthLine | 9 | 300 | Critical | 20000 | 65 | 10FEB2018 | 20OCT2023 | 8 | 23467 |
| 4 | SIG004 | WestLine | 1 | 20 | Low | 2000 | 98 | 05MAY2021 | 01JAN2024 | 5 | 23558 |
| 5 | SIG005 | SouthLine | 7 | 240 | High | 15000 | 70 | 12JUL2017 | 15SEP2023 | 9 | 23436 |
| 6 | SIG006 | EastLine | 4 | 120 | Medium | 8000 | 85 | 09AUG2019 | 30NOV2023 | 7 | 23497 |
| 7 | SIG007 | NorthLine | 10 | 360 | Critical | 25000 | 60 | 01JAN2016 | 05OCT2023 | 10 | 23467 |
| 8 | SIG008 | WestLine | 3 | 90 | Medium | 6000 | 88 | 18DEC2020 | 12DEC2023 | 6 | 23528 |
| 9 | SIG009 | CentralLine | 6 | 210 | High | 13000 | 75 | 02FEB2018 | 22NOV2023 | 8 | 23497 |
| 10 | SIG010 | CentralLine | 1 | 30 | Low | 3000 | 97 | 15JUN2022 | 05JAN2024 | 4 | 23558 |
| 11 | SIG011 | MetroLine | 8 | 260 | High | 17000 | 68 | 20MAR2017 | 18SEP2023 | 9 | 23436 |
| 12 | SIG012 | MetroLine | 2 | 70 | Medium | 5000 | 90 | 10JAN2021 | 02DEC2023 | 5 | 23528 |
| 13 | SIG013 | FreightLine | 11 | 400 | Critical | 30000 | 55 | 01JAN2015 | 15OCT2023 | 11 | 23467 |
| 14 | SIG014 | FreightLine | 3 | 100 | Medium | 7000 | 86 | 05MAY2020 | 01DEC2023 | 6 | 23528 |
| 15 | SIG015 | ExpressLine | 4 | 110 | Medium | 9000 | 83 | 12APR2019 | 28NOV2023 | 7 | 23497 |
| 16 | SIG001 | SouthLine | 5 | 180 | High | 12000 | 82 | 01JAN2020 | 10DEC2023 | 6 | 23528 |
| 17 | SIG002 | EastLine | 2 | 60 | Low | 4000 | 95 | 15MAR2019 | 05NOV2023 | 7 | 23497 |
| 18 | SIG003 | NorthLine | 9 | 300 | Critical | 20000 | 65 | 10FEB2018 | 20OCT2023 | 8 | 23467 |
| 19 | SIG004 | WestLine | 1 | 20 | Low | 2000 | 98 | 05MAY2021 | 01JAN2024 | 5 | 23558 |
| 20 | SIG005 | SouthLine | 7 | 240 | High | 15000 | 70 | 12JUL2017 | 15SEP2023 | 9 | 23436 |
| 21 | SIG006 | EastLine | 4 | 120 | Medium | 8000 | 85 | 09AUG2019 | 30NOV2023 | 7 | 23497 |
| 22 | SIG007 | NorthLine | 10 | 360 | Critical | 25000 | 60 | 01JAN2016 | 05OCT2023 | 10 | 23467 |
| 23 | SIG008 | WestLine | 3 | 90 | Medium | 6000 | 88 | 18DEC2020 | 12DEC2023 | 6 | 23528 |
| 24 | SIG009 | CentralLine | 6 | 210 | High | 13000 | 75 | 02FEB2018 | 22NOV2023 | 8 | 23497 |
| 25 | SIG010 | CentralLine | 1 | 30 | Low | 3000 | 97 | 15JUN2022 | 05JAN2024 | 4 | 23558 |
| 26 | SIG011 | MetroLine | 8 | 260 | High | 17000 | 68 | 20MAR2017 | 18SEP2023 | 9 | 23436 |
| 27 | SIG012 | MetroLine | 2 | 70 | Medium | 5000 | 90 | 10JAN2021 | 02DEC2023 | 5 | 23528 |
| 28 | SIG013 | FreightLine | 11 | 400 | Critical | 30000 | 55 | 01JAN2015 | 15OCT2023 | 11 | 23467 |
| 29 | SIG014 | FreightLine | 3 | 100 | Medium | 7000 | 86 | 05MAY2020 | 01DEC2023 | 6 | 23528 |
| 30 | SIG015 | ExpressLine | 4 | 110 | Medium | 9000 | 83 | 12APR2019 | 28NOV2023 | 7 | 23497 |
Used for:
· Combining months
· Merging systems
· Creating history tables
12. TRANSPOSE
proc transpose data=route_summary out=transposed;
by Route;
run;
proc print data=transposed;
run;
OUTPUT:
| Obs | Route | _NAME_ | COL1 |
|---|---|---|---|
| 1 | CentralLine | Total_Signals | 2.0 |
| 2 | CentralLine | Avg_Failures | 3.5 |
| 3 | CentralLine | Total_Cost | 16000.0 |
| 4 | EastLine | Total_Signals | 2.0 |
| 5 | EastLine | Avg_Failures | 3.0 |
| 6 | EastLine | Total_Cost | 12000.0 |
| 7 | ExpressLine | Total_Signals | 1.0 |
| 8 | ExpressLine | Avg_Failures | 4.0 |
| 9 | ExpressLine | Total_Cost | 9000.0 |
| 10 | FreightLine | Total_Signals | 2.0 |
| 11 | FreightLine | Avg_Failures | 7.0 |
| 12 | FreightLine | Total_Cost | 37000.0 |
| 13 | MetroLine | Total_Signals | 2.0 |
| 14 | MetroLine | Avg_Failures | 5.0 |
| 15 | MetroLine | Total_Cost | 22000.0 |
| 16 | NorthLine | Total_Signals | 2.0 |
| 17 | NorthLine | Avg_Failures | 9.5 |
| 18 | NorthLine | Total_Cost | 45000.0 |
| 19 | SouthLine | Total_Signals | 2.0 |
| 20 | SouthLine | Avg_Failures | 6.0 |
| 21 | SouthLine | Total_Cost | 27000.0 |
| 22 | WestLine | Total_Signals | 2.0 |
| 23 | WestLine | Avg_Failures | 2.0 |
| 24 | WestLine | Total_Cost | 8000.0 |
Used for:
· Pivot reports
· Excel-style layouts
13. CHARACTER FUNCTIONS
data char_demo;
set railway_dates;
Route_Clean = propcase(strip(Route));
Signal_Upper = upcase(Signal_ID);
Safety_Lower = lowcase(Safety_Impact);
Full_Tag = catx("-", Signal_ID, Route);
run;
proc print data=char_demo;
var Signal_ID Route Route_Clean Signal_Upper Safety_Lower Safety_Impact Full_Tag;
run;
OUTPUT:
| Obs | Signal_ID | Route | Route_Clean | Signal_Upper | Safety_Lower | Safety_Impact | Full_Tag |
|---|---|---|---|---|---|---|---|
| 1 | SIG001 | SouthLine | Southline | SIG001 | high | High | SIG001-SouthLine |
| 2 | SIG002 | EastLine | Eastline | SIG002 | low | Low | SIG002-EastLine |
| 3 | SIG003 | NorthLine | Northline | SIG003 | critical | Critical | SIG003-NorthLine |
| 4 | SIG004 | WestLine | Westline | SIG004 | low | Low | SIG004-WestLine |
| 5 | SIG005 | SouthLine | Southline | SIG005 | high | High | SIG005-SouthLine |
| 6 | SIG006 | EastLine | Eastline | SIG006 | medium | Medium | SIG006-EastLine |
| 7 | SIG007 | NorthLine | Northline | SIG007 | critical | Critical | SIG007-NorthLine |
| 8 | SIG008 | WestLine | Westline | SIG008 | medium | Medium | SIG008-WestLine |
| 9 | SIG009 | CentralLine | Centralline | SIG009 | high | High | SIG009-CentralLine |
| 10 | SIG010 | CentralLine | Centralline | SIG010 | low | Low | SIG010-CentralLine |
| 11 | SIG011 | MetroLine | Metroline | SIG011 | high | High | SIG011-MetroLine |
| 12 | SIG012 | MetroLine | Metroline | SIG012 | medium | Medium | SIG012-MetroLine |
| 13 | SIG013 | FreightLine | Freightline | SIG013 | critical | Critical | SIG013-FreightLine |
| 14 | SIG014 | FreightLine | Freightline | SIG014 | medium | Medium | SIG014-FreightLine |
| 15 | SIG015 | ExpressLine | Expressline | SIG015 | medium | Medium | SIG015-ExpressLine |
| 16 | SIG001 | SouthLine | Southline | SIG001 | high | High | SIG001-SouthLine |
| 17 | SIG002 | EastLine | Eastline | SIG002 | low | Low | SIG002-EastLine |
| 18 | SIG003 | NorthLine | Northline | SIG003 | critical | Critical | SIG003-NorthLine |
| 19 | SIG004 | WestLine | Westline | SIG004 | low | Low | SIG004-WestLine |
| 20 | SIG005 | SouthLine | Southline | SIG005 | high | High | SIG005-SouthLine |
| 21 | SIG006 | EastLine | Eastline | SIG006 | medium | Medium | SIG006-EastLine |
| 22 | SIG007 | NorthLine | Northline | SIG007 | critical | Critical | SIG007-NorthLine |
| 23 | SIG008 | WestLine | Westline | SIG008 | medium | Medium | SIG008-WestLine |
| 24 | SIG009 | CentralLine | Centralline | SIG009 | high | High | SIG009-CentralLine |
| 25 | SIG010 | CentralLine | Centralline | SIG010 | low | Low | SIG010-CentralLine |
| 26 | SIG011 | MetroLine | Metroline | SIG011 | high | High | SIG011-MetroLine |
| 27 | SIG012 | MetroLine | Metroline | SIG012 | medium | Medium | SIG012-MetroLine |
| 28 | SIG013 | FreightLine | Freightline | SIG013 | critical | Critical | SIG013-FreightLine |
| 29 | SIG014 | FreightLine | Freightline | SIG014 | medium | Medium | SIG014-FreightLine |
| 30 | SIG015 | ExpressLine | Expressline | SIG015 | medium | Medium | SIG015-ExpressLine |
Used for:
· Cleaning messy data
· Standardizing names
· Generating IDs
14. NUMERIC FUNCTIONS
data numeric_demo;
set railway_dates;
Cost_Rounded = round(Maintenance_Cost,1000);
Downtime_Hours = Downtime_Minutes/60;
Safe_Reliability = coalesce(Reliability_Index, 0);
run;
proc print data=numeric_demo;
var Signal_ID Route Maintenance_Cost Cost_Rounded Downtime_Minutes Downtime_Hours
Reliability_Index Safe_Reliability;
run;
OUTPUT:
| Obs | Signal_ID | Route | Maintenance_Cost | Cost_Rounded | Downtime_Minutes | Downtime_Hours | Reliability_Index | Safe_Reliability |
|---|---|---|---|---|---|---|---|---|
| 1 | SIG001 | SouthLine | 12000 | 12000 | 180 | 3.00000 | 82 | 82 |
| 2 | SIG002 | EastLine | 4000 | 4000 | 60 | 1.00000 | 95 | 95 |
| 3 | SIG003 | NorthLine | 20000 | 20000 | 300 | 5.00000 | 65 | 65 |
| 4 | SIG004 | WestLine | 2000 | 2000 | 20 | 0.33333 | 98 | 98 |
| 5 | SIG005 | SouthLine | 15000 | 15000 | 240 | 4.00000 | 70 | 70 |
| 6 | SIG006 | EastLine | 8000 | 8000 | 120 | 2.00000 | 85 | 85 |
| 7 | SIG007 | NorthLine | 25000 | 25000 | 360 | 6.00000 | 60 | 60 |
| 8 | SIG008 | WestLine | 6000 | 6000 | 90 | 1.50000 | 88 | 88 |
| 9 | SIG009 | CentralLine | 13000 | 13000 | 210 | 3.50000 | 75 | 75 |
| 10 | SIG010 | CentralLine | 3000 | 3000 | 30 | 0.50000 | 97 | 97 |
| 11 | SIG011 | MetroLine | 17000 | 17000 | 260 | 4.33333 | 68 | 68 |
| 12 | SIG012 | MetroLine | 5000 | 5000 | 70 | 1.16667 | 90 | 90 |
| 13 | SIG013 | FreightLine | 30000 | 30000 | 400 | 6.66667 | 55 | 55 |
| 14 | SIG014 | FreightLine | 7000 | 7000 | 100 | 1.66667 | 86 | 86 |
| 15 | SIG015 | ExpressLine | 9000 | 9000 | 110 | 1.83333 | 83 | 83 |
| 16 | SIG001 | SouthLine | 12000 | 12000 | 180 | 3.00000 | 82 | 82 |
| 17 | SIG002 | EastLine | 4000 | 4000 | 60 | 1.00000 | 95 | 95 |
| 18 | SIG003 | NorthLine | 20000 | 20000 | 300 | 5.00000 | 65 | 65 |
| 19 | SIG004 | WestLine | 2000 | 2000 | 20 | 0.33333 | 98 | 98 |
| 20 | SIG005 | SouthLine | 15000 | 15000 | 240 | 4.00000 | 70 | 70 |
| 21 | SIG006 | EastLine | 8000 | 8000 | 120 | 2.00000 | 85 | 85 |
| 22 | SIG007 | NorthLine | 25000 | 25000 | 360 | 6.00000 | 60 | 60 |
| 23 | SIG008 | WestLine | 6000 | 6000 | 90 | 1.50000 | 88 | 88 |
| 24 | SIG009 | CentralLine | 13000 | 13000 | 210 | 3.50000 | 75 | 75 |
| 25 | SIG010 | CentralLine | 3000 | 3000 | 30 | 0.50000 | 97 | 97 |
| 26 | SIG011 | MetroLine | 17000 | 17000 | 260 | 4.33333 | 68 | 68 |
| 27 | SIG012 | MetroLine | 5000 | 5000 | 70 | 1.16667 | 90 | 90 |
| 28 | SIG013 | FreightLine | 30000 | 30000 | 400 | 6.66667 | 55 | 55 |
| 29 | SIG014 | FreightLine | 7000 | 7000 | 100 | 1.66667 | 86 | 86 |
| 30 | SIG015 | ExpressLine | 9000 | 9000 | 110 | 1.83333 | 83 | 83 |
15. PROC DATASETS DELETE
proc datasets library=work;
delete backup;
quit;
LOG:
Used to:
· Free memory
· Avoid confusion
· Clean temporary tables
WHY THIS PROJECT IS VERY STRONG
This single project demonstrates:
· Data creation
· Business thinking
· Statistical analysis
· Automation
· Fraud detection
· Visualization
CONCLUSION
Through this project, we proved that railway signal data is not just numbers — it is directly connected to human safety and operational efficiency.
Using SAS, we were able to:
· Identify high-risk signal systems
· Measure reliability across routes
· Visualize downtime and failure patterns
· Automatically classify utilization levels
· Detect suspicious maintenance behavior (fraud logic)
Instead of reacting after accidents happen, this kind of analytics allows organizations to:
Predict problems before they become disasters.
The biggest learning from this project is:
Data does not just support decisions — it can prevent real-world failures.
INTERVIEW QUESTIONS FOR YOU
1.What happens during the compilation and execution phase of a DATA step?
2.What is the difference between SET and MERGE?
3. What is HAVING clause?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent Railway Signals data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics and smart cities
· EV and energy industry professionals
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment