390.Can SAS Predict Airline Disasters Before They Happen?
Can SAS Predict Airline Disasters Before They Happen?
HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —
DATA STEP | PROC SQL | PROC PRINT | PROC SGPLOT | MACROS | PROC CORR | PROC MEANS | PROC FREQ | PROC UNIVARIATE | APPEND | PROC SORT | MERGE | PROC DATASETS DELETE | DATA FUNCTIONS
INTRODUCTION
The airline industry operates in one of the most time-sensitive and safety-critical environments in the world. Every day, thousands of aircraft take off and land across global airspace, and behind each flight is a complex system of human scheduling, regulatory compliance, and fatigue management.
Among all operational components, airline crew scheduling is one of the most crucial and sensitive functions. Pilots, co-pilots, and cabin crew must follow strict regulations regarding:
· Maximum allowable flight hours
· Minimum mandatory rest hours
· Fatigue and stress thresholds
· Aircraft type certifications
· Regulatory compliance with aviation authorities
Any failure in this system can result in:
· Flight delays and cancellations
· Crew burnout and attrition
· Regulatory penalties and audits
· Severe safety risks and reputational damage
Therefore, modern airlines rely heavily on data analytics systems to monitor and optimize crew schedules in real time.
This project simulates a real-world airline crew analytics platform built using SAS, where we design and analyze a crew scheduling dataset using:
· PROC SQL
· PROC MEANS
· PROC UNIVARIATE
· PROC FREQ
· PROC CORR
· PROC SGPLOT
· SAS Macros
· Date functions (MDY, INTCK, INTNX)
· Data integration (SET, MERGE, APPEND, TRANSPOSE)
· Data quality functions (character and numeric functions)
The system not only performs statistical analysis but also implements business logic for fatigue detection, utilization classification, and fraud/risk flagging, exactly as used in real airline operations.
BUSINESS CONTEXT
In real airline companies such as Air India, Emirates, Indigo, Qatar Airways, Lufthansa, and British Airways, crew management departments continuously answer critical questions like:
· Which crews are overworked?
· Are minimum rest hours being violated?
· Which aircraft types generate more fatigue?
· Is any crew member manipulating duty logs?
· How does flight load impact human performance?
These questions cannot be answered manually. They require:
· Automated data pipelines
· Statistical monitoring
· Visualization dashboards
· Compliance alerts
· Risk analytics
This SAS project acts as a mini airline operations control center where management can:
· Monitor fatigue levels
· Detect unsafe schedules
· Compare utilization patterns
· Identify high-risk crews
· Ensure regulatory compliance
TABLE OF CONTENTS
1. Introduction
2. Business Context
3. Dataset Design
4. Variable Definitions
5. Data Creation in SAS
6. Date Handling and Formats
7. Utilization Classification Macro
8. Fraud and Fatigue Detection Macro
9. PROC SQL – Summary Analytics
10. PROC MEANS – Descriptive Statistics
11. PROC UNIVARIATE – Distribution Analysis
12. PROC FREQ – Categorical Insights
13. PROC CORR – Relationship Analysis
14. PROC SGPLOT – Visualization
15. Data Integration (SET, MERGE, APPEND)
16. Data Reshaping (TRANSPOSE)
17. Character Functions Usage
18. Numeric Functions Usage
19. Data Cleaning and Deletion
20. Conclusion
Why this project is realistic
This project exactly mirrors how airline analytics systems are built in real companies like:
· Emirates
· Indigo
· Air India
· Qatar Airways
Crew management teams track:
· Working hours
· Rest time
· Fatigue levels
· Regulatory compliance
· Fraud or data manipulation
Dataset design logic
Each variable has operational meaning:
Variable | Meaning |
Crew_ID | Unique employee ID |
Name | Crew member name |
Aircraft_Type | Aircraft certification |
Duty_Date | Date of duty |
Flight_Hours | Total flying hours |
Rest_Hours | Total rest hours |
Delay_Incidents | Delays caused |
Fatigue_Score | Human performance metric |
Utilization_Pct | Productivity ratio |
Utilization_Status | High/Medium/Low |
Fraud_Flag | Risk indicator |
STEP 1 – CREATE MAIN DATASET
data crew_schedule;
input Crew_ID $ Name:$13. Aircraft_Type $ Duty_Date :date9.
Flight_Hours Rest_Hours Delay_Incidents Fatigue_Score;
Utilization_Pct = Flight_Hours / 12;
format Duty_Date date9. Utilization_Pct percent8.2;
datalines;
C001 Ramesh_A320 A320 01JAN2026 8 10 1 60
C002 Sita_B737 B737 02JAN2026 10 8 2 75
C003 Arjun_A320 A320 03JAN2026 6 12 0 45
C004 Meera_A350 A350 04JAN2026 11 7 3 85
C005 Kiran_B787 B787 05JAN2026 9 9 1 65
C006 Pooja_A320 A320 06JAN2026 12 6 4 90
C007 Rahul_B737 B737 07JAN2026 7 11 0 50
C008 Neha_A350 A350 08JAN2026 10 8 2 70
C009 Vinay_B787 B787 09JAN2026 5 13 0 40
C010 Kavya_A320 A320 10JAN2026 9 9 1 68
C011 Ajay_B737 B737 11JAN2026 11 7 3 82
C012 Divya_A350 A350 12JAN2026 8 10 1 60
C013 Sanjay_B787 B787 13JAN2026 12 6 4 92
C014 Anu_A320 A320 14JAN2026 6 12 0 48
C015 Rohit_B737 B737 15JAN2026 10 8 2 75
C016 Sneha_A350 A350 16JAN2026 7 11 0 55
;
run;
proc print data=crew_schedule;
run;
OUTPUT:
| Obs | Crew_ID | Name | Aircraft_Type | Duty_Date | Flight_Hours | Rest_Hours | Delay_Incidents | Fatigue_Score | Utilization_Pct |
|---|---|---|---|---|---|---|---|---|---|
| 1 | C001 | Ramesh_A320 | A320 | 01JAN2026 | 8 | 10 | 1 | 60 | 66.67% |
| 2 | C002 | Sita_B737 | B737 | 02JAN2026 | 10 | 8 | 2 | 75 | 83.33% |
| 3 | C003 | Arjun_A320 | A320 | 03JAN2026 | 6 | 12 | 0 | 45 | 50.00% |
| 4 | C004 | Meera_A350 | A350 | 04JAN2026 | 11 | 7 | 3 | 85 | 91.67% |
| 5 | C005 | Kiran_B787 | B787 | 05JAN2026 | 9 | 9 | 1 | 65 | 75.00% |
| 6 | C006 | Pooja_A320 | A320 | 06JAN2026 | 12 | 6 | 4 | 90 | 100.0% |
| 7 | C007 | Rahul_B737 | B737 | 07JAN2026 | 7 | 11 | 0 | 50 | 58.33% |
| 8 | C008 | Neha_A350 | A350 | 08JAN2026 | 10 | 8 | 2 | 70 | 83.33% |
| 9 | C009 | Vinay_B787 | B787 | 09JAN2026 | 5 | 13 | 0 | 40 | 41.67% |
| 10 | C010 | Kavya_A320 | A320 | 10JAN2026 | 9 | 9 | 1 | 68 | 75.00% |
| 11 | C011 | Ajay_B737 | B737 | 11JAN2026 | 11 | 7 | 3 | 82 | 91.67% |
| 12 | C012 | Divya_A350 | A350 | 12JAN2026 | 8 | 10 | 1 | 60 | 66.67% |
| 13 | C013 | Sanjay_B787 | B787 | 13JAN2026 | 12 | 6 | 4 | 92 | 100.0% |
| 14 | C014 | Anu_A320 | A320 | 14JAN2026 | 6 | 12 | 0 | 48 | 50.00% |
| 15 | C015 | Rohit_B737 | B737 | 15JAN2026 | 10 | 8 | 2 | 75 | 83.33% |
| 16 | C016 | Sneha_A350 | A350 | 16JAN2026 | 7 | 11 | 0 | 55 | 58.33% |
STEP 2 – UTILIZATION CLASSIFICATION MACRO
%macro utilization_class;
data crew_schedule;
set crew_schedule;
length Utilization_Status $8.;
if Flight_Hours >= 10 then Utilization_Status = "HIGH";
else if Flight_Hours >= 7 then Utilization_Status = "MEDIUM";
else Utilization_Status = "LOW";
run;
proc print data=crew_schedule;
var Crew_ID Name Aircraft_Type Flight_Hours Utilization_Status;
run;
%mend;
%utilization_class;
OUTPUT:
| Obs | Crew_ID | Name | Aircraft_Type | Flight_Hours | Utilization_Status |
|---|---|---|---|---|---|
| 1 | C001 | Ramesh_A320 | A320 | 8 | MEDIUM |
| 2 | C002 | Sita_B737 | B737 | 10 | HIGH |
| 3 | C003 | Arjun_A320 | A320 | 6 | LOW |
| 4 | C004 | Meera_A350 | A350 | 11 | HIGH |
| 5 | C005 | Kiran_B787 | B787 | 9 | MEDIUM |
| 6 | C006 | Pooja_A320 | A320 | 12 | HIGH |
| 7 | C007 | Rahul_B737 | B737 | 7 | MEDIUM |
| 8 | C008 | Neha_A350 | A350 | 10 | HIGH |
| 9 | C009 | Vinay_B787 | B787 | 5 | LOW |
| 10 | C010 | Kavya_A320 | A320 | 9 | MEDIUM |
| 11 | C011 | Ajay_B737 | B737 | 11 | HIGH |
| 12 | C012 | Divya_A350 | A350 | 8 | MEDIUM |
| 13 | C013 | Sanjay_B787 | B787 | 12 | HIGH |
| 14 | C014 | Anu_A320 | A320 | 6 | LOW |
| 15 | C015 | Rohit_B737 | B737 | 10 | HIGH |
| 16 | C016 | Sneha_A350 | A350 | 7 | MEDIUM |
Airlines classify crew into:
· Overworked (HIGH)
· Normal (MEDIUM)
· Underused (LOW)
STEP 3 – FATIGUE / FRAUD DETECTION MACRO
%macro fraud_detection;
data crew_schedule;
set crew_schedule;
if Fatigue_Score > 80 and Rest_Hours < 8 then Fraud_Flag = "YES";
else Fraud_Flag = "NO";
run;
proc print data=crew_schedule;
var Crew_ID Name Aircraft_Type Fatigue_Score Rest_Hours Fraud_Flag;
run;
%mend;
%fraud_detection;
OUTPUT:
| Obs | Crew_ID | Name | Aircraft_Type | Fatigue_Score | Rest_Hours | Fraud_Flag |
|---|---|---|---|---|---|---|
| 1 | C001 | Ramesh_A320 | A320 | 60 | 10 | NO |
| 2 | C002 | Sita_B737 | B737 | 75 | 8 | NO |
| 3 | C003 | Arjun_A320 | A320 | 45 | 12 | NO |
| 4 | C004 | Meera_A350 | A350 | 85 | 7 | YES |
| 5 | C005 | Kiran_B787 | B787 | 65 | 9 | NO |
| 6 | C006 | Pooja_A320 | A320 | 90 | 6 | YES |
| 7 | C007 | Rahul_B737 | B737 | 50 | 11 | NO |
| 8 | C008 | Neha_A350 | A350 | 70 | 8 | NO |
| 9 | C009 | Vinay_B787 | B787 | 40 | 13 | NO |
| 10 | C010 | Kavya_A320 | A320 | 68 | 9 | NO |
| 11 | C011 | Ajay_B737 | B737 | 82 | 7 | YES |
| 12 | C012 | Divya_A350 | A350 | 60 | 10 | NO |
| 13 | C013 | Sanjay_B787 | B787 | 92 | 6 | YES |
| 14 | C014 | Anu_A320 | A320 | 48 | 12 | NO |
| 15 | C015 | Rohit_B737 | B737 | 75 | 8 | NO |
| 16 | C016 | Sneha_A350 | A350 | 55 | 11 | NO |
In real systems:
Some crew manipulate logs to:
· Avoid night shifts
· Claim overtime
· Hide fatigue
Hence:
If fatigue is high + rest is low → flag
STEP 4 – PROC SQL ANALYSIS
proc sql;
create table summary_sql as
select Aircraft_Type,
avg(Flight_Hours) as Avg_Flight,
avg(Rest_Hours) as Avg_Rest,
sum(Delay_Incidents) as Total_Delays
from crew_schedule
group by Aircraft_Type;
quit;
proc print data=summary_sql;
run;
OUTPUT:
| Obs | Aircraft_Type | Avg_Flight | Avg_Rest | Total_Delays |
|---|---|---|---|---|
| 1 | A320 | 8.20000 | 9.80000 | 6 |
| 2 | A350 | 9.00000 | 9.00000 | 6 |
| 3 | B737 | 9.50000 | 8.50000 | 7 |
| 4 | B787 | 8.66667 | 9.33333 | 5 |
SQL is used for:
· Dashboards
· KPI reports
· Management decisions
STEP 5 – PROC MEANS
proc means data=crew_schedule mean min max std;
var Flight_Hours Rest_Hours Fatigue_Score;
run;
OUTPUT:
The MEANS Procedure
| Variable | Mean | Minimum | Maximum | Std Dev |
|---|---|---|---|---|
Flight_Hours Rest_Hours Fatigue_Score | 8.8125000 9.1875000 66.2500000 | 5.0000000 6.0000000 40.0000000 | 12.0000000 13.0000000 92.0000000 | 2.1975365 2.1975365 16.2090510 |
These are risk models:
· High mean fatigue → unsafe operations
· High variance → inconsistent scheduling
STEP 6 – PROC UNIVARIATE
proc univariate data=crew_schedule;
var Fatigue_Score;
histogram;
run;
OUTPUT:
The UNIVARIATE Procedure
Variable: Fatigue_Score
| Moments | |||
|---|---|---|---|
| N | 16 | Sum Weights | 16 |
| Mean | 66.25 | Sum Observations | 1060 |
| Std Deviation | 16.209051 | Variance | 262.733333 |
| Skewness | 0.04293774 | Kurtosis | -1.0626313 |
| Uncorrected SS | 74166 | Corrected SS | 3941 |
| Coeff Variation | 24.466492 | Std Error Mean | 4.05226274 |
| Basic Statistical Measures | |||
|---|---|---|---|
| Location | Variability | ||
| Mean | 66.25000 | Std Deviation | 16.20905 |
| Median | 66.50000 | Variance | 262.73333 |
| Mode | 60.00000 | Range | 52.00000 |
| Interquartile Range | 26.00000 | ||
Note: The mode displayed is the smallest of 2 modes with a count of 2.
| Tests for Location: Mu0=0 | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Student's t | t | 16.34889 | Pr > |t| | <.0001 |
| Sign | M | 8 | Pr >= |M| | <.0001 |
| Signed Rank | S | 68 | Pr >= |S| | <.0001 |
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile |
| 100% Max | 92.0 |
| 99% | 92.0 |
| 95% | 92.0 |
| 90% | 90.0 |
| 75% Q3 | 78.5 |
| 50% Median | 66.5 |
| 25% Q1 | 52.5 |
| 10% | 45.0 |
| 5% | 40.0 |
| 1% | 40.0 |
| 0% Min | 40.0 |
| Extreme Observations | |||
|---|---|---|---|
| Lowest | Highest | ||
| Value | Obs | Value | Obs |
| 40 | 9 | 75 | 15 |
| 45 | 3 | 82 | 11 |
| 48 | 14 | 85 | 4 |
| 50 | 7 | 90 | 6 |
| 55 | 16 | 92 | 13 |
The UNIVARIATE Procedure
These are risk models:
· High mean fatigue → unsafe operations
· High variance → inconsistent scheduling
STEP 7 – PROC FREQ
proc freq data=crew_schedule;
tables Utilization_Status Fraud_Flag Fatigue_Score;
run;
OUTPUT:
The FREQ Procedure
| Utilization_Status | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| HIGH | 7 | 43.75 | 7 | 43.75 |
| LOW | 3 | 18.75 | 10 | 62.50 |
| MEDIUM | 6 | 37.50 | 16 | 100.00 |
| Fraud_Flag | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| NO | 12 | 75.00 | 12 | 75.00 |
| YES | 4 | 25.00 | 16 | 100.00 |
| Fatigue_Score | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| 40 | 1 | 6.25 | 1 | 6.25 |
| 45 | 1 | 6.25 | 2 | 12.50 |
| 48 | 1 | 6.25 | 3 | 18.75 |
| 50 | 1 | 6.25 | 4 | 25.00 |
| 55 | 1 | 6.25 | 5 | 31.25 |
| 60 | 2 | 12.50 | 7 | 43.75 |
| 65 | 1 | 6.25 | 8 | 50.00 |
| 68 | 1 | 6.25 | 9 | 56.25 |
| 70 | 1 | 6.25 | 10 | 62.50 |
| 75 | 2 | 12.50 | 12 | 75.00 |
| 82 | 1 | 6.25 | 13 | 81.25 |
| 85 | 1 | 6.25 | 14 | 87.50 |
| 90 | 1 | 6.25 | 15 | 93.75 |
| 92 | 1 | 6.25 | 16 | 100.00 |
STEP 8 – PROC CORR
proc corr data=crew_schedule;
var Flight_Hours Rest_Hours Fatigue_Score;
run;
OUTPUT:
The CORR Procedure
| 3 Variables: | Flight_Hours Rest_Hours Fatigue_Score |
|---|
| Simple Statistics | ||||||
|---|---|---|---|---|---|---|
| Variable | N | Mean | Std Dev | Sum | Minimum | Maximum |
| Flight_Hours | 16 | 8.81250 | 2.19754 | 141.00000 | 5.00000 | 12.00000 |
| Rest_Hours | 16 | 9.18750 | 2.19754 | 147.00000 | 6.00000 | 13.00000 |
| Fatigue_Score | 16 | 66.25000 | 16.20905 | 1060 | 40.00000 | 92.00000 |
| Pearson Correlation Coefficients, N = 16 Prob > |r| under H0: Rho=0 | |||
|---|---|---|---|
| Flight_Hours | Rest_Hours | Fatigue_Score | |
| Flight_Hours | 1.00000 | -1.00000 <.0001 | 0.99148 <.0001 |
| Rest_Hours | -1.00000 <.0001 | 1.00000 | -0.99148 <.0001 |
| Fatigue_Score | 0.99148 <.0001 | -0.99148 <.0001 | 1.00000 |
This tells management:
Does more flying cause fatigue?
If correlation is strong → reschedule crews.
STEP 9 – PROC SGPLOT
proc sgplot data=crew_schedule;
scatter x=Flight_Hours y=Fatigue_Score;
reg x=Flight_Hours y=Fatigue_Score;
run;
OUTPUT:
Executives do not read tables.
They want graphs:
Flight hours vs fatigue visually proves burnout.
STEP 10 – DATE FUNCTIONS
data crew_dates;
set crew_schedule;
Next_Duty = intnx('day', Duty_Date, 2);
Gap_Days = intck('day', Duty_Date, Next_Duty);
run;
proc print data=crew_dates;
run;
OUTPUT:
| Obs | Crew_ID | Name | Aircraft_Type | Duty_Date | Flight_Hours | Rest_Hours | Delay_Incidents | Fatigue_Score | Utilization_Pct | Utilization_Status | Fraud_Flag | Next_Duty | Gap_Days |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | C001 | Ramesh_A320 | A320 | 01JAN2026 | 8 | 10 | 1 | 60 | 66.67% | MEDIUM | NO | 24109 | 2 |
| 2 | C002 | Sita_B737 | B737 | 02JAN2026 | 10 | 8 | 2 | 75 | 83.33% | HIGH | NO | 24110 | 2 |
| 3 | C003 | Arjun_A320 | A320 | 03JAN2026 | 6 | 12 | 0 | 45 | 50.00% | LOW | NO | 24111 | 2 |
| 4 | C004 | Meera_A350 | A350 | 04JAN2026 | 11 | 7 | 3 | 85 | 91.67% | HIGH | YES | 24112 | 2 |
| 5 | C005 | Kiran_B787 | B787 | 05JAN2026 | 9 | 9 | 1 | 65 | 75.00% | MEDIUM | NO | 24113 | 2 |
| 6 | C006 | Pooja_A320 | A320 | 06JAN2026 | 12 | 6 | 4 | 90 | 100.0% | HIGH | YES | 24114 | 2 |
| 7 | C007 | Rahul_B737 | B737 | 07JAN2026 | 7 | 11 | 0 | 50 | 58.33% | MEDIUM | NO | 24115 | 2 |
| 8 | C008 | Neha_A350 | A350 | 08JAN2026 | 10 | 8 | 2 | 70 | 83.33% | HIGH | NO | 24116 | 2 |
| 9 | C009 | Vinay_B787 | B787 | 09JAN2026 | 5 | 13 | 0 | 40 | 41.67% | LOW | NO | 24117 | 2 |
| 10 | C010 | Kavya_A320 | A320 | 10JAN2026 | 9 | 9 | 1 | 68 | 75.00% | MEDIUM | NO | 24118 | 2 |
| 11 | C011 | Ajay_B737 | B737 | 11JAN2026 | 11 | 7 | 3 | 82 | 91.67% | HIGH | YES | 24119 | 2 |
| 12 | C012 | Divya_A350 | A350 | 12JAN2026 | 8 | 10 | 1 | 60 | 66.67% | MEDIUM | NO | 24120 | 2 |
| 13 | C013 | Sanjay_B787 | B787 | 13JAN2026 | 12 | 6 | 4 | 92 | 100.0% | HIGH | YES | 24121 | 2 |
| 14 | C014 | Anu_A320 | A320 | 14JAN2026 | 6 | 12 | 0 | 48 | 50.00% | LOW | NO | 24122 | 2 |
| 15 | C015 | Rohit_B737 | B737 | 15JAN2026 | 10 | 8 | 2 | 75 | 83.33% | HIGH | NO | 24123 | 2 |
| 16 | C016 | Sneha_A350 | A350 | 16JAN2026 | 7 | 11 | 0 | 55 | 58.33% | MEDIUM | NO | 24124 | 2 |
Aviation runs on time. Without proper dates:
· You cannot calculate rest periods
· You cannot validate duty cycles
· You cannot schedule shifts
Hence:
· MDY() builds dates
· INTCK() calculates gaps
· INTNX() predicts future duties
STEP 11 – CHARACTER FUNCTIONS
data char_demo;
set crew_schedule;
Clean_Name = propcase(lowcase(strip(Name)));
Full_Label = catx("-", upcase(Crew_ID), Aircraft_Type);
run;
proc print data=char_demo;
run;
OUTPUT:
| Obs | Crew_ID | Name | Aircraft_Type | Duty_Date | Flight_Hours | Rest_Hours | Delay_Incidents | Fatigue_Score | Utilization_Pct | Utilization_Status | Fraud_Flag | Clean_Name | Full_Label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | C001 | Ramesh_A320 | A320 | 01JAN2026 | 8 | 10 | 1 | 60 | 66.67% | MEDIUM | NO | Ramesh_a320 | C001-A320 |
| 2 | C002 | Sita_B737 | B737 | 02JAN2026 | 10 | 8 | 2 | 75 | 83.33% | HIGH | NO | Sita_b737 | C002-B737 |
| 3 | C003 | Arjun_A320 | A320 | 03JAN2026 | 6 | 12 | 0 | 45 | 50.00% | LOW | NO | Arjun_a320 | C003-A320 |
| 4 | C004 | Meera_A350 | A350 | 04JAN2026 | 11 | 7 | 3 | 85 | 91.67% | HIGH | YES | Meera_a350 | C004-A350 |
| 5 | C005 | Kiran_B787 | B787 | 05JAN2026 | 9 | 9 | 1 | 65 | 75.00% | MEDIUM | NO | Kiran_b787 | C005-B787 |
| 6 | C006 | Pooja_A320 | A320 | 06JAN2026 | 12 | 6 | 4 | 90 | 100.0% | HIGH | YES | Pooja_a320 | C006-A320 |
| 7 | C007 | Rahul_B737 | B737 | 07JAN2026 | 7 | 11 | 0 | 50 | 58.33% | MEDIUM | NO | Rahul_b737 | C007-B737 |
| 8 | C008 | Neha_A350 | A350 | 08JAN2026 | 10 | 8 | 2 | 70 | 83.33% | HIGH | NO | Neha_a350 | C008-A350 |
| 9 | C009 | Vinay_B787 | B787 | 09JAN2026 | 5 | 13 | 0 | 40 | 41.67% | LOW | NO | Vinay_b787 | C009-B787 |
| 10 | C010 | Kavya_A320 | A320 | 10JAN2026 | 9 | 9 | 1 | 68 | 75.00% | MEDIUM | NO | Kavya_a320 | C010-A320 |
| 11 | C011 | Ajay_B737 | B737 | 11JAN2026 | 11 | 7 | 3 | 82 | 91.67% | HIGH | YES | Ajay_b737 | C011-B737 |
| 12 | C012 | Divya_A350 | A350 | 12JAN2026 | 8 | 10 | 1 | 60 | 66.67% | MEDIUM | NO | Divya_a350 | C012-A350 |
| 13 | C013 | Sanjay_B787 | B787 | 13JAN2026 | 12 | 6 | 4 | 92 | 100.0% | HIGH | YES | Sanjay_b787 | C013-B787 |
| 14 | C014 | Anu_A320 | A320 | 14JAN2026 | 6 | 12 | 0 | 48 | 50.00% | LOW | NO | Anu_a320 | C014-A320 |
| 15 | C015 | Rohit_B737 | B737 | 15JAN2026 | 10 | 8 | 2 | 75 | 83.33% | HIGH | NO | Rohit_b737 | C015-B737 |
| 16 | C016 | Sneha_A350 | A350 | 16JAN2026 | 7 | 11 | 0 | 55 | 58.33% | MEDIUM | NO | Sneha_a350 | C016-A350 |
Real data is dirty:
- Extra spaces
- Inconsistent cases
- Poor labeling
Functions clean data for:
- Reporting
- Merging
- Integration
STEP 12 – TRANSPOSE
proc transpose data=crew_schedule out=crew_t;
var Flight_Hours Rest_Hours Fatigue_Score;
id Crew_ID;
run;
proc print data=crew_t;
run;
OUTPUT:
| Obs | _NAME_ | C001 | C002 | C003 | C004 | C005 | C006 | C007 | C008 | C009 | C010 | C011 | C012 | C013 | C014 | C015 | C016 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Flight_Hours | 8 | 10 | 6 | 11 | 9 | 12 | 7 | 10 | 5 | 9 | 11 | 8 | 12 | 6 | 10 | 7 |
| 2 | Rest_Hours | 10 | 8 | 12 | 7 | 9 | 6 | 11 | 8 | 13 | 9 | 7 | 10 | 6 | 12 | 8 | 11 |
| 3 | Fatigue_Score | 60 | 75 | 45 | 85 | 65 | 90 | 50 | 70 | 40 | 68 | 82 | 60 | 92 | 48 | 75 | 55 |
Used for:
- Excel exports
- Regulatory forms
- Cross-sectional layouts
STEP 13 – APPEND
Creating New Data
data crew_newday;
input Crew_ID $ Name $13. Aircraft_Type $ Duty_Date :date9.
Flight_Hours Rest_Hours Delay_Incidents Fatigue_Score;
Utilization_Pct = Flight_Hours / 12;
format Duty_Date date9. Utilization_Pct percent8.2;
datalines;
C017 Aman_A320 A320 17JAN2026 9 9 1 67
C018 Riya_B737 B737 17JAN2026 11 7 3 83
C019 Mohan_A350 A350 17JAN2026 8 10 1 60
C020 Tara_B787 B787 17JAN2026 12 6 4 91
;
run;
proc print data=crew_newday;
run;
OUTPUT:
| Obs | Crew_ID | Name | Aircraft_Type | Duty_Date | Flight_Hours | Rest_Hours | Delay_Incidents | Fatigue_Score | Utilization_Pct |
|---|---|---|---|---|---|---|---|---|---|
| 1 | C017 | Aman_A320 A32 | 0 | 17JAN2026 | 9 | 9 | 1 | 67 | 75.00% |
| 2 | C018 | Riya_B737 B73 | 7 | 17JAN2026 | 11 | 7 | 3 | 83 | 91.67% |
| 3 | C019 | Mohan_A350 A3 | 50 | 17JAN2026 | 8 | 10 | 1 | 60 | 66.67% |
| 4 | C020 | Tara_B787 B78 | 7 | 17JAN2026 | 12 | 6 | 4 | 91 | 100.0% |
proc append base=crew_schedule
data=crew_newday force;
run;
proc print data=crew_schedule;
run;
OUTPUT:
| Obs | Crew_ID | Name | Aircraft_Type | Duty_Date | Flight_Hours | Rest_Hours | Delay_Incidents | Fatigue_Score | Utilization_Pct | Utilization_Status | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | C001 | Ramesh_A320 | A320 | 01JAN2026 | 8 | 10 | 1 | 60 | 66.67% | MEDIUM | NO |
| 2 | C002 | Sita_B737 | B737 | 02JAN2026 | 10 | 8 | 2 | 75 | 83.33% | HIGH | NO |
| 3 | C003 | Arjun_A320 | A320 | 03JAN2026 | 6 | 12 | 0 | 45 | 50.00% | LOW | NO |
| 4 | C004 | Meera_A350 | A350 | 04JAN2026 | 11 | 7 | 3 | 85 | 91.67% | HIGH | YES |
| 5 | C005 | Kiran_B787 | B787 | 05JAN2026 | 9 | 9 | 1 | 65 | 75.00% | MEDIUM | NO |
| 6 | C006 | Pooja_A320 | A320 | 06JAN2026 | 12 | 6 | 4 | 90 | 100.0% | HIGH | YES |
| 7 | C007 | Rahul_B737 | B737 | 07JAN2026 | 7 | 11 | 0 | 50 | 58.33% | MEDIUM | NO |
| 8 | C008 | Neha_A350 | A350 | 08JAN2026 | 10 | 8 | 2 | 70 | 83.33% | HIGH | NO |
| 9 | C009 | Vinay_B787 | B787 | 09JAN2026 | 5 | 13 | 0 | 40 | 41.67% | LOW | NO |
| 10 | C010 | Kavya_A320 | A320 | 10JAN2026 | 9 | 9 | 1 | 68 | 75.00% | MEDIUM | NO |
| 11 | C011 | Ajay_B737 | B737 | 11JAN2026 | 11 | 7 | 3 | 82 | 91.67% | HIGH | YES |
| 12 | C012 | Divya_A350 | A350 | 12JAN2026 | 8 | 10 | 1 | 60 | 66.67% | MEDIUM | NO |
| 13 | C013 | Sanjay_B787 | B787 | 13JAN2026 | 12 | 6 | 4 | 92 | 100.0% | HIGH | YES |
| 14 | C014 | Anu_A320 | A320 | 14JAN2026 | 6 | 12 | 0 | 48 | 50.00% | LOW | NO |
| 15 | C015 | Rohit_B737 | B737 | 15JAN2026 | 10 | 8 | 2 | 75 | 83.33% | HIGH | NO |
| 16 | C016 | Sneha_A350 | A350 | 16JAN2026 | 7 | 11 | 0 | 55 | 58.33% | MEDIUM | NO |
| 17 | C017 | Aman_A320 A32 | 0 | 17JAN2026 | 9 | 9 | 1 | 67 | 75.00% | ||
| 18 | C018 | Riya_B737 B73 | 7 | 17JAN2026 | 11 | 7 | 3 | 83 | 91.67% | ||
| 19 | C019 | Mohan_A350 A3 | 50 | 17JAN2026 | 8 | 10 | 1 | 60 | 66.67% | ||
| 20 | C020 | Tara_B787 B78 | 7 | 17JAN2026 | 12 | 6 | 4 | 91 | 100.0% |
Simulates daily data inflow.
Airlines get new records every day.
STEP 14 – SORT & MERGE
proc sort data=crew_schedule;by Aircraft_Type;run;
proc print data=crew_schedule;
run;
OUTPUT:
| Obs | Crew_ID | Name | Aircraft_Type | Duty_Date | Flight_Hours | Rest_Hours | Delay_Incidents | Fatigue_Score | Utilization_Pct | Utilization_Status | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | C017 | Aman_A320 A32 | 0 | 17JAN2026 | 9 | 9 | 1 | 67 | 75.00% | ||
| 2 | C019 | Mohan_A350 A3 | 50 | 17JAN2026 | 8 | 10 | 1 | 60 | 66.67% | ||
| 3 | C018 | Riya_B737 B73 | 7 | 17JAN2026 | 11 | 7 | 3 | 83 | 91.67% | ||
| 4 | C020 | Tara_B787 B78 | 7 | 17JAN2026 | 12 | 6 | 4 | 91 | 100.0% | ||
| 5 | C001 | Ramesh_A320 | A320 | 01JAN2026 | 8 | 10 | 1 | 60 | 66.67% | MEDIUM | NO |
| 6 | C003 | Arjun_A320 | A320 | 03JAN2026 | 6 | 12 | 0 | 45 | 50.00% | LOW | NO |
| 7 | C006 | Pooja_A320 | A320 | 06JAN2026 | 12 | 6 | 4 | 90 | 100.0% | HIGH | YES |
| 8 | C010 | Kavya_A320 | A320 | 10JAN2026 | 9 | 9 | 1 | 68 | 75.00% | MEDIUM | NO |
| 9 | C014 | Anu_A320 | A320 | 14JAN2026 | 6 | 12 | 0 | 48 | 50.00% | LOW | NO |
| 10 | C004 | Meera_A350 | A350 | 04JAN2026 | 11 | 7 | 3 | 85 | 91.67% | HIGH | YES |
| 11 | C008 | Neha_A350 | A350 | 08JAN2026 | 10 | 8 | 2 | 70 | 83.33% | HIGH | NO |
| 12 | C012 | Divya_A350 | A350 | 12JAN2026 | 8 | 10 | 1 | 60 | 66.67% | MEDIUM | NO |
| 13 | C016 | Sneha_A350 | A350 | 16JAN2026 | 7 | 11 | 0 | 55 | 58.33% | MEDIUM | NO |
| 14 | C002 | Sita_B737 | B737 | 02JAN2026 | 10 | 8 | 2 | 75 | 83.33% | HIGH | NO |
| 15 | C007 | Rahul_B737 | B737 | 07JAN2026 | 7 | 11 | 0 | 50 | 58.33% | MEDIUM | NO |
| 16 | C011 | Ajay_B737 | B737 | 11JAN2026 | 11 | 7 | 3 | 82 | 91.67% | HIGH | YES |
| 17 | C015 | Rohit_B737 | B737 | 15JAN2026 | 10 | 8 | 2 | 75 | 83.33% | HIGH | NO |
| 18 | C005 | Kiran_B787 | B787 | 05JAN2026 | 9 | 9 | 1 | 65 | 75.00% | MEDIUM | NO |
| 19 | C009 | Vinay_B787 | B787 | 09JAN2026 | 5 | 13 | 0 | 40 | 41.67% | LOW | NO |
| 20 | C013 | Sanjay_B787 | B787 | 13JAN2026 | 12 | 6 | 4 | 92 | 100.0% | HIGH | YES |
proc sort data=summary_sql;by Aircraft_Type;run;
proc print data=summary_sql;
run;
OUTPUT:
| Obs | Aircraft_Type | Avg_Flight | Avg_Rest | Total_Delays |
|---|---|---|---|---|
| 1 | A320 | 8.20000 | 9.80000 | 6 |
| 2 | A350 | 9.00000 | 9.00000 | 6 |
| 3 | B737 | 9.50000 | 8.50000 | 7 |
| 4 | B787 | 8.66667 | 9.33333 | 5 |
data merged_data;
merge crew_schedule
summary_sql;
by Aircraft_Type;
run;
proc print data=merged_data;
run;
OUTPUT:
| Obs | Crew_ID | Name | Aircraft_Type | Duty_Date | Flight_Hours | Rest_Hours | Delay_Incidents | Fatigue_Score | Utilization_Pct | Utilization_Status | Fraud_Flag | Avg_Flight | Avg_Rest | Total_Delays |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | C017 | Aman_A320 A32 | 0 | 17JAN2026 | 9 | 9 | 1 | 67 | 75.00% | . | . | . | ||
| 2 | C019 | Mohan_A350 A3 | 50 | 17JAN2026 | 8 | 10 | 1 | 60 | 66.67% | . | . | . | ||
| 3 | C018 | Riya_B737 B73 | 7 | 17JAN2026 | 11 | 7 | 3 | 83 | 91.67% | . | . | . | ||
| 4 | C020 | Tara_B787 B78 | 7 | 17JAN2026 | 12 | 6 | 4 | 91 | 100.0% | . | . | . | ||
| 5 | C001 | Ramesh_A320 | A320 | 01JAN2026 | 8 | 10 | 1 | 60 | 66.67% | MEDIUM | NO | 8.20000 | 9.80000 | 6 |
| 6 | C003 | Arjun_A320 | A320 | 03JAN2026 | 6 | 12 | 0 | 45 | 50.00% | LOW | NO | 8.20000 | 9.80000 | 6 |
| 7 | C006 | Pooja_A320 | A320 | 06JAN2026 | 12 | 6 | 4 | 90 | 100.0% | HIGH | YES | 8.20000 | 9.80000 | 6 |
| 8 | C010 | Kavya_A320 | A320 | 10JAN2026 | 9 | 9 | 1 | 68 | 75.00% | MEDIUM | NO | 8.20000 | 9.80000 | 6 |
| 9 | C014 | Anu_A320 | A320 | 14JAN2026 | 6 | 12 | 0 | 48 | 50.00% | LOW | NO | 8.20000 | 9.80000 | 6 |
| 10 | C004 | Meera_A350 | A350 | 04JAN2026 | 11 | 7 | 3 | 85 | 91.67% | HIGH | YES | 9.00000 | 9.00000 | 6 |
| 11 | C008 | Neha_A350 | A350 | 08JAN2026 | 10 | 8 | 2 | 70 | 83.33% | HIGH | NO | 9.00000 | 9.00000 | 6 |
| 12 | C012 | Divya_A350 | A350 | 12JAN2026 | 8 | 10 | 1 | 60 | 66.67% | MEDIUM | NO | 9.00000 | 9.00000 | 6 |
| 13 | C016 | Sneha_A350 | A350 | 16JAN2026 | 7 | 11 | 0 | 55 | 58.33% | MEDIUM | NO | 9.00000 | 9.00000 | 6 |
| 14 | C002 | Sita_B737 | B737 | 02JAN2026 | 10 | 8 | 2 | 75 | 83.33% | HIGH | NO | 9.50000 | 8.50000 | 7 |
| 15 | C007 | Rahul_B737 | B737 | 07JAN2026 | 7 | 11 | 0 | 50 | 58.33% | MEDIUM | NO | 9.50000 | 8.50000 | 7 |
| 16 | C011 | Ajay_B737 | B737 | 11JAN2026 | 11 | 7 | 3 | 82 | 91.67% | HIGH | YES | 9.50000 | 8.50000 | 7 |
| 17 | C015 | Rohit_B737 | B737 | 15JAN2026 | 10 | 8 | 2 | 75 | 83.33% | HIGH | NO | 9.50000 | 8.50000 | 7 |
| 18 | C005 | Kiran_B787 | B787 | 05JAN2026 | 9 | 9 | 1 | 65 | 75.00% | MEDIUM | NO | 8.66667 | 9.33333 | 5 |
| 19 | C009 | Vinay_B787 | B787 | 09JAN2026 | 5 | 13 | 0 | 40 | 41.67% | LOW | NO | 8.66667 | 9.33333 | 5 |
| 20 | C013 | Sanjay_B787 | B787 | 13JAN2026 | 12 | 6 | 4 | 92 | 100.0% | HIGH | YES | 8.66667 | 9.33333 | 5 |
Combines operational data with:
- Summaries
- Benchmarks
- Targets
STEP 15 – DELETE USING PROC DATASETS
proc datasets library=work;
delete char_demo crew_t;
quit;
LOG:
Professional hygiene:
- Free memory
- Avoid confusion
- Speed processing
THIS PROJECT COVERS
This single project prepares you for:
Topic | Covered |
Base SAS | YES |
SQL | YES |
Macros | YES |
Dates | YES |
Character functions | YES |
Numeric functions | YES |
Reporting | YES |
Visualization | YES |
Data integration | YES |
Business logic | YES |
FINAL BUSINESS IMPACT
If this system runs in real life:
Airline gets:
· Safer operations
· Lower accident risk
· Better crew morale
· Lower regulatory penalties
· Higher efficiency
This is exactly what aviation analytics teams do daily.
FUTURE ENHANCEMENTS
This system can be expanded with:
· Weather impact analysis
· Time zone adjustments
· Multi-leg flight modeling
· Predictive machine learning
· Real-time streaming data
· Integration with HR systems
CONCLUSION
The Airline Crew Schedules Analytics System demonstrates how SAS can be used as a complete operational intelligence platform in a highly regulated, safety-critical industry.
This project is not a simple academic exercise. It reflects how real airlines:
· Collect operational data
· Apply business rules
· Detect risks
· Visualize patterns
· Support decision making
· Ensure regulatory compliance
By using:
· PROC SQL for summarization
· PROC MEANS and UNIVARIATE for statistics
· PROC CORR for relationship analysis
· PROC SGPLOT for visualization
· SAS Macros for automation
· Date functions for scheduling logic
· Character and numeric functions for data quality
This system becomes a fully functional airline analytics framework.
INTERVIEW QUESTIONS FOR YOU
· What is the difference between CAT, CATS, and CATX?
· What is the difference between TRIM and STRIP?
· What is SCAN function?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent Airline Crew data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics and smart cities
· EV and energy industry professionals
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Follow Us On :
To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:
1.Can we analyze planets using SAS like NASA scientists?
2.Can SAS Compare How 3 Different Families Live in Their Homes?
3.Can SAS Help Us Choose the Most Efficient Cooler?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment