427.Can SAS’s Precision and R’s Flexibility Together Eliminate Hidden Errors in Orbital Debris Data Analytics?
Advanced Data Validation and Transformation Techniques Across SAS and R
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Statements Used in Both SAS and R (Equivalent Mapping Format)
DATA CREATION — DATA STEP | LENGTH | INPUT | DATALINES → data.frame() | tibble()
DATA READING / MANIPULATION — SET → <-
(assignment) | mutate()
DATA JOINING — MERGE → merge() |
left_join()
DATA SORTING — PROC SORT → arrange()
| order()
DATA DISPLAY — PROC PRINT → print()
DATA STRUCTURE VIEW — PROC CONTENTS → str()
| summary()
FREQUENCY ANALYSIS — PROC FREQ → table() |
count()
DATA RESHAPING — PROC TRANSPOSE → pivot_longer()
| pivot_wider()
DATA APPENDING — PROC APPEND → rbind()
| bind_rows()
DATA DELETION — PROC DATASETS DELETE → rm()
AUTOMATION — MACRO | %MACRO | %MEND → function()
NUMERIC FUNCTIONS — ROUND | LOG | ABS → round()
| log() | abs()
CHARACTER
FUNCTIONS — STRIP | TRIM | CAT | CATX |
UPCASE | LOWCASE | PROPCASE → trimws() | paste() | paste0() | toupper() | tolower() |
tools::toTitleCase()
Introduction
Orbital
debris—commonly known as space junk—is one of the most critical challenges in
modern space operations. With thousands of satellites and millions of debris
fragments orbiting Earth, tracking and managing this data is essential for
collision avoidance, mission safety, and regulatory compliance.
In
real-world analytics environments (especially aerospace and defense domains),
datasets are rarely clean. They contain:
- Missing values
- Incorrect formats
- Logical inconsistencies
- Fraudulent or manipulated
entries
This
project demonstrates how Advanced SAS Programming can:
- Create a complex orbital
debris dataset
- Introduce intentional data
errors
- Detect and correct those
errors
- Apply transformations using
SAS procedures, functions, macros
- Replicate the same logic in R
programming
Table of Contents
- Business Context
- Raw Dataset Creation (With
Errors)
- Error Identification
- Data Cleaning Using SAS
- Date Handling (MDY, INTCK,
INTNX)
- Character Functions
- Numeric Functions
- Dataset Operations (SET,
MERGE, APPEND)
- PROC TRANSPOSE
- Fraud Detection Macros
- Utilization Classification
- PROC DATASETS DELETE
- Final Clean Dataset
- R Code Equivalent
- SAS VS R
- 10 Key Points
- Summary
- Conclusion
Business Context
Organizations
like space agencies and satellite operators rely on accurate orbital data to:
- Prevent satellite collisions
- Estimate risk probabilities
- Optimize tracking resource
allocation
- Detect anomalies or
fraudulent data entries
A small
error (e.g., wrong velocity) can lead to catastrophic mission failure.
1. Raw Dataset Creation (With Intentional Errors)
SAS Code
data orbital_debris_raw;
input Object_ID $ Orbit_Altitude Velocity_km_s Collision_Risk $
Object_Size Tracking_Agency $ Fees Utilization $ Launch_Date $10.;
datalines;
OBJ001 400 7.8 HIGH 2 NASA 5000 HIGH 01-01-2023
OBJ002 . 8.1 MEDIUM 3 ISRO 4500 MEDIUM 15-02-2023
OBJ003 550 -7.5 LOW 1 ESA 3000 LOW 20-03-2023
OBJ004 700 8.5 HIGH . NASA 6000 HIGH 05-04-2023
OBJ005 800 abc MEDIUM 5 JAXA 5500 MEDIUM 10-05-2023
OBJ006 650 7.9 LOW 2 ISRO . LOW 15-06-2023
;
run;
LOG:
Explanation
This dataset contains intentional errors:
·
Missing values (.)
·
Invalid numeric value (abc)
·
Negative velocity
·
Missing object size
·
Missing fees
Why This Step is Important
·
Simulates real-world messy data
· Helps in testing validation logic
R Code
orbital_debris_raw <- data.frame(
Object_ID = c("OBJ001","OBJ002","OBJ003","OBJ004","OBJ005","OBJ006"),
Orbit_Altitude = c(400, NA, 550, 700, 800, 650),
Velocity_km_s = c("7.8","8.1","-7.5","8.5","abc","7.9"),
Collision_Risk = c("HIGH"."MEDIUM","LOW","HIGH","MEDIUM","LOW"),
Object_Size = c(2,3,1,NA,5,2),
Trackint_Agency = c("NASA","ISRO","ESA","NASA","JAXA","ISRO"),
Fees = c(5000,4500,3000,6000,5500,NA),
Utilization = c("HIGH","MEDIUM","LOW","HIGH","MEDIUM","LOW"),
Launch_Date = c("01-01-2023","15-02-2023","20-03-2023",
"05-042023","10-05-2023","15-06-2023"),
stringAsFactors = FALSE
)
CONSOLE:
> orbital_debris_raw <- data.frame( + Object_ID = c("OBJ001","OBJ002","OBJ003","OBJ004","OBJ005","OBJ006"), + Orbit_Altitude = c(400, NA, 550, 700, 800, 650), + Velocity_km_s = c("7.8","8.1","-7.5","8.5","abc","7.9"), + Collision_Risk = c("HIGH","MEDIUM","LOW","HIGH","MEDIUM","LOW"), + Object_Size = c(2,3,1,NA,5,2), + Tracking_Agency = c("NASA","ISRO","ESA","NASA","JAXA","ISRO"), + Fees = c(5000,4500,3000,6000,5500,NA), + Utilization = c("HIGH","MEDIUM","LOW","HIGH","MEDIUM","LOW"), + Launch_Date = c("01-01-2023","15-02-2023","20-03-2023", + "05-04-2023","10-05-2023","15-06-2023"), + stringsAsFactors = FALSE + )Explanation
· SAS uses
INPUT+ DATALINES· R uses
data.frame()· Velocity intentionally stored as character (to simulate error like
abc)
2. Identifying Errors
SAS Code
proc print data=orbital_debris_raw;
run;
OUTPUT:
| Obs | Object_ID | Orbit_Altitude | Velocity_km_s | Collision_Risk | Object_Size | Tracking_Agency | Fees | Utilization | Launch_Date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | OBJ001 | 400 | 7.8 | HIGH | 2 | NASA | 5000 | HIGH | 01-01-2023 |
| 2 | OBJ002 | . | 8.1 | MEDIUM | 3 | ISRO | 4500 | MEDIUM | 15-02-2023 |
| 3 | OBJ003 | 550 | -7.5 | LOW | 1 | ESA | 3000 | LOW | 20-03-2023 |
| 4 | OBJ004 | 700 | 8.5 | HIGH | . | NASA | 6000 | HIGH | 05-04-2023 |
| 5 | OBJ005 | 800 | . | MEDIUM | 5 | JAXA | 5500 | MEDIUM | 10-05-2023 |
| 6 | OBJ006 | 650 | 7.9 | LOW | 2 | ISRO | . | LOW | 15-06-2023 |
Errors Identified
· Missing
Orbit_Altitude
·
Negative Velocity
·
Character value in numeric field
·
Missing Object_Size
·
Missing Fees
R Code
print(orbital_debris_raw)
R OUTPUT:
|
Object_ID |
Orbit_Altitude |
Velocity_km_s |
Collision_Risk |
Object_Size |
Trackint_Agency |
Fees |
Utilization |
Launch_Date |
stringAsFactors |
|
OBJ001 |
400 |
7.8 |
HIGH |
2 |
NASA |
5000 |
HIGH |
01-01-2023 |
FALSE |
|
OBJ002 |
8.1 |
MEDIUM |
3 |
ISRO |
4500 |
MEDIUM |
15-02-2023 |
FALSE |
|
|
OBJ003 |
550 |
-7.5 |
LOW |
1 |
ESA |
3000 |
LOW |
20-03-2023 |
FALSE |
|
OBJ004 |
700 |
8.5 |
HIGH |
NASA |
6000 |
HIGH |
05-04-2023 |
FALSE |
|
|
OBJ005 |
800 |
abc |
MEDIUM |
5 |
JAXA |
5500 |
MEDIUM |
10-05-2023 |
FALSE |
|
OBJ006 |
650 |
7.9 |
LOW |
2 |
ISRO |
LOW |
15-06-2023 |
FALSE |
3. Data Cleaning and Correction
SAS Code
data orbital_debris_clean;
set orbital_debris_raw;
if Velocity_km_s < 0 then Velocity_km_s = abs(Velocity_km_s);
if Orbit_Altitude = . then Orbit_Altitude = 500;
if Object_Size = . then Object_Size = 2;
if Fees = . then Fees = 4000;
Velocity_km_s = input(Velocity_km_s, best12.);
run;
proc print data=orbital_debris_clean;
run;
OUTPUT:
| Obs | Object_ID | Orbit_Altitude | Velocity_km_s | Collision_Risk | Object_Size | Tracking_Agency | Fees | Utilization | Launch_Date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | OBJ001 | 400 | 7.8 | HIGH | 2 | NASA | 5000 | HIGH | 01-01-2023 |
| 2 | OBJ002 | 500 | 8.1 | MEDIUM | 3 | ISRO | 4500 | MEDIUM | 15-02-2023 |
| 3 | OBJ003 | 550 | 7.5 | LOW | 1 | ESA | 3000 | LOW | 20-03-2023 |
| 4 | OBJ004 | 700 | 8.5 | HIGH | 2 | NASA | 6000 | HIGH | 05-04-2023 |
| 5 | OBJ005 | 800 | . | MEDIUM | 5 | JAXA | 5500 | MEDIUM | 10-05-2023 |
| 6 | OBJ006 | 650 | 7.9 | LOW | 2 | ISRO | 4000 | LOW | 15-06-2023 |
Explanation
· ABS( ) corrects negative
values
·
Default values assigned to missing
fields
·
INPUT( ) converts character
to numeric
Why Used
· Ensures
consistency
·
Prevents downstream errors
R Code
library(dplyr)
orbital_debris_clean <- orbital_debris_raw %>%
mutate(
Velocity_km_s = as.numeric(Velocity_km_s),
Velocity_km_s = ifelse(Velocity_km_s < 0 , abs(Velocity_km_s),Velocity_km_s),
Orbit_Altitude = ifelse(is.na(Orbit_Altitude),500,Orbit_Altitude),
Object_Size = ifelse(is.na(Object_Size),2,Object_Size),
Fees = ifelse(is.na(Fees),4000,Fees)
)
print(orbital_debris_clean)
Explanation
·
input( )
→ as.numeric( )
·
if
→ ifelse( )
·
Handles invalid values (abc → NA)
4. Date Handling (MDY, INTCK, INTNX)
SAS Code
data orbital_dates;
set orbital_debris_clean;
Launch_Date_SAS = input(Launch_Date, ddmmyy10.);
format Launch_Date_SAS date9.;
Today = today();
Age_Days = intck('day', Launch_Date_SAS, Today);
Next_Check = intnx('month', Launch_Date_SAS, 6);
format Next_Check date9.;
run;
proc print data=orbital_dates;
run;
OUTPUT:
| Obs | Object_ID | Orbit_Altitude | Velocity_km_s | Collision_Risk | Object_Size | Tracking_Agency | Fees | Utilization | Launch_Date | Launch_Date_SAS | Today | Age_Days | Next_Check |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | OBJ001 | 400 | 7.8 | HIGH | 2 | NASA | 5000 | HIGH | 01-01-2023 | 01JAN2023 | 24185 | 1174 | 01JUL2023 |
| 2 | OBJ002 | 500 | 8.1 | MEDIUM | 3 | ISRO | 4500 | MEDIUM | 15-02-2023 | 15FEB2023 | 24185 | 1129 | 01AUG2023 |
| 3 | OBJ003 | 550 | 7.5 | LOW | 1 | ESA | 3000 | LOW | 20-03-2023 | 20MAR2023 | 24185 | 1096 | 01SEP2023 |
| 4 | OBJ004 | 700 | 8.5 | HIGH | 2 | NASA | 6000 | HIGH | 05-04-2023 | 05APR2023 | 24185 | 1080 | 01OCT2023 |
| 5 | OBJ005 | 800 | . | MEDIUM | 5 | JAXA | 5500 | MEDIUM | 10-05-2023 | 10MAY2023 | 24185 | 1045 | 01NOV2023 |
| 6 | OBJ006 | 650 | 7.9 | LOW | 2 | ISRO | 4000 | LOW | 15-06-2023 | 15JUN2023 | 24185 | 1009 | 01DEC2023 |
Explanation
· INPUT ( ) converts character
date
·
INTCK ( )
calculates difference
·
INTNX ( ) calculates future
date
Why Used
· Time-based
analysis is critical in orbital tracking
R Code
orbital_dates <- orbital_debris_clean %>%
mutate(
Launch_Date_SAS = dmy(Launch_Date),
Today = Sys.Date(),
Age_Days = as.numeric(Today - Launch_Date_SAS),
Next_Check = Launch_Date_SAS %m+% months(6)
)
print(orbital_dates)
5. Character Functions
SAS Code
data char_functions;
set orbital_dates;
Agency_Clean = propcase(strip(Tracking_Agency));
Risk_Upper = upcase(Collision_Risk);
Risk_Lower = lowcase(Collision_Risk);
Combined = catx('-', Object_ID, Agency_Clean);
run;
proc print data=char_functions;
run;
OUTPUT:
| Obs | Object_ID | Orbit_Altitude | Velocity_km_s | Collision_Risk | Object_Size | Tracking_Agency | Fees | Utilization | Launch_Date | Launch_Date_SAS | Today | Age_Days | Next_Check | Agency_Clean | Risk_Upper | Risk_Lower | Combined |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | OBJ001 | 400 | 7.8 | HIGH | 2 | NASA | 5000 | HIGH | 01-01-2023 | 01JAN2023 | 24185 | 1174 | 01JUL2023 | Nasa | HIGH | high | OBJ001-Nasa |
| 2 | OBJ002 | 500 | 8.1 | MEDIUM | 3 | ISRO | 4500 | MEDIUM | 15-02-2023 | 15FEB2023 | 24185 | 1129 | 01AUG2023 | Isro | MEDIUM | medium | OBJ002-Isro |
| 3 | OBJ003 | 550 | 7.5 | LOW | 1 | ESA | 3000 | LOW | 20-03-2023 | 20MAR2023 | 24185 | 1096 | 01SEP2023 | Esa | LOW | low | OBJ003-Esa |
| 4 | OBJ004 | 700 | 8.5 | HIGH | 2 | NASA | 6000 | HIGH | 05-04-2023 | 05APR2023 | 24185 | 1080 | 01OCT2023 | Nasa | HIGH | high | OBJ004-Nasa |
| 5 | OBJ005 | 800 | . | MEDIUM | 5 | JAXA | 5500 | MEDIUM | 10-05-2023 | 10MAY2023 | 24185 | 1045 | 01NOV2023 | Jaxa | MEDIUM | medium | OBJ005-Jaxa |
| 6 | OBJ006 | 650 | 7.9 | LOW | 2 | ISRO | 4000 | LOW | 15-06-2023 | 15JUN2023 | 24185 | 1009 | 01DEC2023 | Isro | LOW | low | OBJ006-Isro |
Functions Used
· STRIP ( ) removes spaces
·
PROPCASE ( ) formats text
·
CATX ( ) concatenates
6. Numeric Functions
SAS Code
data numeric_functions;
set char_functions;
Risk_Score = round(Velocity_km_s * Object_Size, 0.1);
Log_Fees = log(Fees);
run;
proc print data=numeric_functions;
run;
OUTPUT:
| Obs | Object_ID | Orbit_Altitude | Velocity_km_s | Collision_Risk | Object_Size | Tracking_Agency | Fees | Utilization | Launch_Date | Launch_Date_SAS | Today | Age_Days | Next_Check | Agency_Clean | Risk_Upper | Risk_Lower | Combined | Risk_Score | Log_Fees |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | OBJ001 | 400 | 7.8 | HIGH | 2 | NASA | 5000 | HIGH | 01-01-2023 | 01JAN2023 | 24185 | 1174 | 01JUL2023 | Nasa | HIGH | high | OBJ001-Nasa | 15.6 | 8.51719 |
| 2 | OBJ002 | 500 | 8.1 | MEDIUM | 3 | ISRO | 4500 | MEDIUM | 15-02-2023 | 15FEB2023 | 24185 | 1129 | 01AUG2023 | Isro | MEDIUM | medium | OBJ002-Isro | 24.3 | 8.41183 |
| 3 | OBJ003 | 550 | 7.5 | LOW | 1 | ESA | 3000 | LOW | 20-03-2023 | 20MAR2023 | 24185 | 1096 | 01SEP2023 | Esa | LOW | low | OBJ003-Esa | 7.5 | 8.00637 |
| 4 | OBJ004 | 700 | 8.5 | HIGH | 2 | NASA | 6000 | HIGH | 05-04-2023 | 05APR2023 | 24185 | 1080 | 01OCT2023 | Nasa | HIGH | high | OBJ004-Nasa | 17.0 | 8.69951 |
| 5 | OBJ005 | 800 | . | MEDIUM | 5 | JAXA | 5500 | MEDIUM | 10-05-2023 | 10MAY2023 | 24185 | 1045 | 01NOV2023 | Jaxa | MEDIUM | medium | OBJ005-Jaxa | . | 8.61250 |
| 6 | OBJ006 | 650 | 7.9 | LOW | 2 | ISRO | 4000 | LOW | 15-06-2023 | 15JUN2023 | 24185 | 1009 | 01DEC2023 | Isro | LOW | low | OBJ006-Isro | 15.8 | 8.29405 |
Explanation
· Calculates
derived metrics
·
Useful for risk modeling
7. SET Operation
SAS Code
data combined_data;
set numeric_functions;
run;
proc print data=combined_data;
run;
OUTPUT:
| Obs | Object_ID | Orbit_Altitude | Velocity_km_s | Collision_Risk | Object_Size | Tracking_Agency | Fees | Utilization | Launch_Date | Launch_Date_SAS | Today | Age_Days | Next_Check | Agency_Clean | Risk_Upper | Risk_Lower | Combined | Risk_Score | Log_Fees |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | OBJ001 | 400 | 7.8 | HIGH | 2 | NASA | 5000 | HIGH | 01-01-2023 | 01JAN2023 | 24185 | 1174 | 01JUL2023 | Nasa | HIGH | high | OBJ001-Nasa | 15.6 | 8.51719 |
| 2 | OBJ002 | 500 | 8.1 | MEDIUM | 3 | ISRO | 4500 | MEDIUM | 15-02-2023 | 15FEB2023 | 24185 | 1129 | 01AUG2023 | Isro | MEDIUM | medium | OBJ002-Isro | 24.3 | 8.41183 |
| 3 | OBJ003 | 550 | 7.5 | LOW | 1 | ESA | 3000 | LOW | 20-03-2023 | 20MAR2023 | 24185 | 1096 | 01SEP2023 | Esa | LOW | low | OBJ003-Esa | 7.5 | 8.00637 |
| 4 | OBJ004 | 700 | 8.5 | HIGH | 2 | NASA | 6000 | HIGH | 05-04-2023 | 05APR2023 | 24185 | 1080 | 01OCT2023 | Nasa | HIGH | high | OBJ004-Nasa | 17.0 | 8.69951 |
| 5 | OBJ005 | 800 | . | MEDIUM | 5 | JAXA | 5500 | MEDIUM | 10-05-2023 | 10MAY2023 | 24185 | 1045 | 01NOV2023 | Jaxa | MEDIUM | medium | OBJ005-Jaxa | . | 8.61250 |
| 6 | OBJ006 | 650 | 7.9 | LOW | 2 | ISRO | 4000 | LOW | 15-06-2023 | 15JUN2023 | 24185 | 1009 | 01DEC2023 | Isro | LOW | low | OBJ006-Isro | 15.8 | 8.29405 |
Why Used
· Combines
datasets sequentially
8. SORT & MERGE Operation
SAS Code
data agency_info;
input Tracking_Agency $ Country $;
datalines;
NASA USA
ISRO INDIA
ESA EUROPE
JAXA JAPAN
;
run;
proc print data=agency_info;
run;
OUTPUT:
| Obs | Tracking_Agency | Country |
|---|---|---|
| 1 | NASA | USA |
| 2 | ISRO | INDIA |
| 3 | ESA | EUROPE |
| 4 | JAXA | JAPAN |
proc sort data=combined_data;by Tracking_Agency;run;
proc print data=combined_data;
run;
OUTPUT:
| Obs | Object_ID | Orbit_Altitude | Velocity_km_s | Collision_Risk | Object_Size | Tracking_Agency | Fees | Utilization | Launch_Date | Launch_Date_SAS | Today | Age_Days | Next_Check | Agency_Clean | Risk_Upper | Risk_Lower | Combined | Risk_Score | Log_Fees |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | OBJ003 | 550 | 7.5 | LOW | 1 | ESA | 3000 | LOW | 20-03-2023 | 20MAR2023 | 24185 | 1096 | 01SEP2023 | Esa | LOW | low | OBJ003-Esa | 7.5 | 8.00637 |
| 2 | OBJ002 | 500 | 8.1 | MEDIUM | 3 | ISRO | 4500 | MEDIUM | 15-02-2023 | 15FEB2023 | 24185 | 1129 | 01AUG2023 | Isro | MEDIUM | medium | OBJ002-Isro | 24.3 | 8.41183 |
| 3 | OBJ006 | 650 | 7.9 | LOW | 2 | ISRO | 4000 | LOW | 15-06-2023 | 15JUN2023 | 24185 | 1009 | 01DEC2023 | Isro | LOW | low | OBJ006-Isro | 15.8 | 8.29405 |
| 4 | OBJ005 | 800 | . | MEDIUM | 5 | JAXA | 5500 | MEDIUM | 10-05-2023 | 10MAY2023 | 24185 | 1045 | 01NOV2023 | Jaxa | MEDIUM | medium | OBJ005-Jaxa | . | 8.61250 |
| 5 | OBJ001 | 400 | 7.8 | HIGH | 2 | NASA | 5000 | HIGH | 01-01-2023 | 01JAN2023 | 24185 | 1174 | 01JUL2023 | Nasa | HIGH | high | OBJ001-Nasa | 15.6 | 8.51719 |
| 6 | OBJ004 | 700 | 8.5 | HIGH | 2 | NASA | 6000 | HIGH | 05-04-2023 | 05APR2023 | 24185 | 1080 | 01OCT2023 | Nasa | HIGH | high | OBJ004-Nasa | 17.0 | 8.69951 |
proc sort data=agency_info;by Tracking_Agency;run;
proc print data=agency_info;
run;
OUTPUT:
| Obs | Tracking_Agency | Country |
|---|---|---|
| 1 | ESA | EUROPE |
| 2 | ISRO | INDIA |
| 3 | JAXA | JAPAN |
| 4 | NASA | USA |
data merged_data;
merge combined_data
agency_info;
by Tracking_Agency;
run;
proc print data=merged_data;
run;
OUTPUT:
| Obs | Object_ID | Orbit_Altitude | Velocity_km_s | Collision_Risk | Object_Size | Tracking_Agency | Fees | Utilization | Launch_Date | Launch_Date_SAS | Today | Age_Days | Next_Check | Agency_Clean | Risk_Upper | Risk_Lower | Combined | Risk_Score | Log_Fees | Country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | OBJ003 | 550 | 7.5 | LOW | 1 | ESA | 3000 | LOW | 20-03-2023 | 20MAR2023 | 24185 | 1096 | 01SEP2023 | Esa | LOW | low | OBJ003-Esa | 7.5 | 8.00637 | EUROPE |
| 2 | OBJ002 | 500 | 8.1 | MEDIUM | 3 | ISRO | 4500 | MEDIUM | 15-02-2023 | 15FEB2023 | 24185 | 1129 | 01AUG2023 | Isro | MEDIUM | medium | OBJ002-Isro | 24.3 | 8.41183 | INDIA |
| 3 | OBJ006 | 650 | 7.9 | LOW | 2 | ISRO | 4000 | LOW | 15-06-2023 | 15JUN2023 | 24185 | 1009 | 01DEC2023 | Isro | LOW | low | OBJ006-Isro | 15.8 | 8.29405 | INDIA |
| 4 | OBJ005 | 800 | . | MEDIUM | 5 | JAXA | 5500 | MEDIUM | 10-05-2023 | 10MAY2023 | 24185 | 1045 | 01NOV2023 | Jaxa | MEDIUM | medium | OBJ005-Jaxa | . | 8.61250 | JAPAN |
| 5 | OBJ001 | 400 | 7.8 | HIGH | 2 | NASA | 5000 | HIGH | 01-01-2023 | 01JAN2023 | 24185 | 1174 | 01JUL2023 | Nasa | HIGH | high | OBJ001-Nasa | 15.6 | 8.51719 | USA |
| 6 | OBJ004 | 700 | 8.5 | HIGH | 2 | NASA | 6000 | HIGH | 05-04-2023 | 05APR2023 | 24185 | 1080 | 01OCT2023 | Nasa | HIGH | high | OBJ004-Nasa | 17.0 | 8.69951 | USA |
Explanation
· Joins
datasets
·
Adds metadata
9. APPEND Operation
SAS Code
proc append base=merged_data
data=combined_data;
run;
proc print data=merged_data;
run;
OUTPUT:
| Obs | Object_ID | Orbit_Altitude | Velocity_km_s | Collision_Risk | Object_Size | Tracking_Agency | Fees | Utilization | Launch_Date | Launch_Date_SAS | Today | Age_Days | Next_Check | Agency_Clean | Risk_Upper | Risk_Lower | Combined | Risk_Score | Log_Fees | Country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | OBJ003 | 550 | 7.5 | LOW | 1 | ESA | 3000 | LOW | 20-03-2023 | 20MAR2023 | 24185 | 1096 | 01SEP2023 | Esa | LOW | low | OBJ003-Esa | 7.5 | 8.00637 | EUROPE |
| 2 | OBJ002 | 500 | 8.1 | MEDIUM | 3 | ISRO | 4500 | MEDIUM | 15-02-2023 | 15FEB2023 | 24185 | 1129 | 01AUG2023 | Isro | MEDIUM | medium | OBJ002-Isro | 24.3 | 8.41183 | INDIA |
| 3 | OBJ006 | 650 | 7.9 | LOW | 2 | ISRO | 4000 | LOW | 15-06-2023 | 15JUN2023 | 24185 | 1009 | 01DEC2023 | Isro | LOW | low | OBJ006-Isro | 15.8 | 8.29405 | INDIA |
| 4 | OBJ005 | 800 | . | MEDIUM | 5 | JAXA | 5500 | MEDIUM | 10-05-2023 | 10MAY2023 | 24185 | 1045 | 01NOV2023 | Jaxa | MEDIUM | medium | OBJ005-Jaxa | . | 8.61250 | JAPAN |
| 5 | OBJ001 | 400 | 7.8 | HIGH | 2 | NASA | 5000 | HIGH | 01-01-2023 | 01JAN2023 | 24185 | 1174 | 01JUL2023 | Nasa | HIGH | high | OBJ001-Nasa | 15.6 | 8.51719 | USA |
| 6 | OBJ004 | 700 | 8.5 | HIGH | 2 | NASA | 6000 | HIGH | 05-04-2023 | 05APR2023 | 24185 | 1080 | 01OCT2023 | Nasa | HIGH | high | OBJ004-Nasa | 17.0 | 8.69951 | USA |
| 7 | OBJ003 | 550 | 7.5 | LOW | 1 | ESA | 3000 | LOW | 20-03-2023 | 20MAR2023 | 24185 | 1096 | 01SEP2023 | Esa | LOW | low | OBJ003-Esa | 7.5 | 8.00637 | |
| 8 | OBJ002 | 500 | 8.1 | MEDIUM | 3 | ISRO | 4500 | MEDIUM | 15-02-2023 | 15FEB2023 | 24185 | 1129 | 01AUG2023 | Isro | MEDIUM | medium | OBJ002-Isro | 24.3 | 8.41183 | |
| 9 | OBJ006 | 650 | 7.9 | LOW | 2 | ISRO | 4000 | LOW | 15-06-2023 | 15JUN2023 | 24185 | 1009 | 01DEC2023 | Isro | LOW | low | OBJ006-Isro | 15.8 | 8.29405 | |
| 10 | OBJ005 | 800 | . | MEDIUM | 5 | JAXA | 5500 | MEDIUM | 10-05-2023 | 10MAY2023 | 24185 | 1045 | 01NOV2023 | Jaxa | MEDIUM | medium | OBJ005-Jaxa | . | 8.61250 | |
| 11 | OBJ001 | 400 | 7.8 | HIGH | 2 | NASA | 5000 | HIGH | 01-01-2023 | 01JAN2023 | 24185 | 1174 | 01JUL2023 | Nasa | HIGH | high | OBJ001-Nasa | 15.6 | 8.51719 | |
| 12 | OBJ004 | 700 | 8.5 | HIGH | 2 | NASA | 6000 | HIGH | 05-04-2023 | 05APR2023 | 24185 | 1080 | 01OCT2023 | Nasa | HIGH | high | OBJ004-Nasa | 17.0 | 8.69951 |
Why Used
·
Adds records efficiently
10. PROC TRANSPOSE
SAS Code
proc transpose data=merged_data out=transposed_data;
by Object_ID NotSorted;
var Velocity_km_s Object_Size Fees;
run;
proc print data=transposed_data;
run;
OUTPUT:
| Obs | Object_ID | _NAME_ | COL1 |
|---|---|---|---|
| 1 | OBJ003 | Velocity_km_s | 7.5 |
| 2 | OBJ003 | Object_Size | 1.0 |
| 3 | OBJ003 | Fees | 3000.0 |
| 4 | OBJ002 | Velocity_km_s | 8.1 |
| 5 | OBJ002 | Object_Size | 3.0 |
| 6 | OBJ002 | Fees | 4500.0 |
| 7 | OBJ006 | Velocity_km_s | 7.9 |
| 8 | OBJ006 | Object_Size | 2.0 |
| 9 | OBJ006 | Fees | 4000.0 |
| 10 | OBJ005 | Velocity_km_s | . |
| 11 | OBJ005 | Object_Size | 5.0 |
| 12 | OBJ005 | Fees | 5500.0 |
| 13 | OBJ001 | Velocity_km_s | 7.8 |
| 14 | OBJ001 | Object_Size | 2.0 |
| 15 | OBJ001 | Fees | 5000.0 |
| 16 | OBJ004 | Velocity_km_s | 8.5 |
| 17 | OBJ004 | Object_Size | 2.0 |
| 18 | OBJ004 | Fees | 6000.0 |
| 19 | OBJ003 | Velocity_km_s | 7.5 |
| 20 | OBJ003 | Object_Size | 1.0 |
| 21 | OBJ003 | Fees | 3000.0 |
| 22 | OBJ002 | Velocity_km_s | 8.1 |
| 23 | OBJ002 | Object_Size | 3.0 |
| 24 | OBJ002 | Fees | 4500.0 |
| 25 | OBJ006 | Velocity_km_s | 7.9 |
| 26 | OBJ006 | Object_Size | 2.0 |
| 27 | OBJ006 | Fees | 4000.0 |
| 28 | OBJ005 | Velocity_km_s | . |
| 29 | OBJ005 | Object_Size | 5.0 |
| 30 | OBJ005 | Fees | 5500.0 |
| 31 | OBJ001 | Velocity_km_s | 7.8 |
| 32 | OBJ001 | Object_Size | 2.0 |
| 33 | OBJ001 | Fees | 5000.0 |
| 34 | OBJ004 | Velocity_km_s | 8.5 |
| 35 | OBJ004 | Object_Size | 2.0 |
| 36 | OBJ004 | Fees | 6000.0 |
Explanation
· Converts
rows → columns
11. Fraud Detection Macro
SAS Code
%macro fraud_check(data=);
data fraud_flag;
set &data;
if Fees > 7000 or Velocity_km_s > 7 then Fraud_Flag = 1;
else Fraud_Flag = 0;
run;
proc print data=fraud_flag;
run;
%mend;
%fraud_check(data=merged_data);
OUTPUT:
| Obs | Object_ID | Orbit_Altitude | Velocity_km_s | Collision_Risk | Object_Size | Tracking_Agency | Fees | Utilization | Launch_Date | Launch_Date_SAS | Today | Age_Days | Next_Check | Agency_Clean | Risk_Upper | Risk_Lower | Combined | Risk_Score | Log_Fees | Country | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | OBJ003 | 550 | 7.5 | LOW | 1 | ESA | 3000 | LOW | 20-03-2023 | 20MAR2023 | 24185 | 1096 | 01SEP2023 | Esa | LOW | low | OBJ003-Esa | 7.5 | 8.00637 | EUROPE | 1 |
| 2 | OBJ002 | 500 | 8.1 | MEDIUM | 3 | ISRO | 4500 | MEDIUM | 15-02-2023 | 15FEB2023 | 24185 | 1129 | 01AUG2023 | Isro | MEDIUM | medium | OBJ002-Isro | 24.3 | 8.41183 | INDIA | 1 |
| 3 | OBJ006 | 650 | 7.9 | LOW | 2 | ISRO | 4000 | LOW | 15-06-2023 | 15JUN2023 | 24185 | 1009 | 01DEC2023 | Isro | LOW | low | OBJ006-Isro | 15.8 | 8.29405 | INDIA | 1 |
| 4 | OBJ005 | 800 | . | MEDIUM | 5 | JAXA | 5500 | MEDIUM | 10-05-2023 | 10MAY2023 | 24185 | 1045 | 01NOV2023 | Jaxa | MEDIUM | medium | OBJ005-Jaxa | . | 8.61250 | JAPAN | 0 |
| 5 | OBJ001 | 400 | 7.8 | HIGH | 2 | NASA | 5000 | HIGH | 01-01-2023 | 01JAN2023 | 24185 | 1174 | 01JUL2023 | Nasa | HIGH | high | OBJ001-Nasa | 15.6 | 8.51719 | USA | 1 |
| 6 | OBJ004 | 700 | 8.5 | HIGH | 2 | NASA | 6000 | HIGH | 05-04-2023 | 05APR2023 | 24185 | 1080 | 01OCT2023 | Nasa | HIGH | high | OBJ004-Nasa | 17.0 | 8.69951 | USA | 1 |
| 7 | OBJ003 | 550 | 7.5 | LOW | 1 | ESA | 3000 | LOW | 20-03-2023 | 20MAR2023 | 24185 | 1096 | 01SEP2023 | Esa | LOW | low | OBJ003-Esa | 7.5 | 8.00637 | 1 | |
| 8 | OBJ002 | 500 | 8.1 | MEDIUM | 3 | ISRO | 4500 | MEDIUM | 15-02-2023 | 15FEB2023 | 24185 | 1129 | 01AUG2023 | Isro | MEDIUM | medium | OBJ002-Isro | 24.3 | 8.41183 | 1 | |
| 9 | OBJ006 | 650 | 7.9 | LOW | 2 | ISRO | 4000 | LOW | 15-06-2023 | 15JUN2023 | 24185 | 1009 | 01DEC2023 | Isro | LOW | low | OBJ006-Isro | 15.8 | 8.29405 | 1 | |
| 10 | OBJ005 | 800 | . | MEDIUM | 5 | JAXA | 5500 | MEDIUM | 10-05-2023 | 10MAY2023 | 24185 | 1045 | 01NOV2023 | Jaxa | MEDIUM | medium | OBJ005-Jaxa | . | 8.61250 | 0 | |
| 11 | OBJ001 | 400 | 7.8 | HIGH | 2 | NASA | 5000 | HIGH | 01-01-2023 | 01JAN2023 | 24185 | 1174 | 01JUL2023 | Nasa | HIGH | high | OBJ001-Nasa | 15.6 | 8.51719 | 1 | |
| 12 | OBJ004 | 700 | 8.5 | HIGH | 2 | NASA | 6000 | HIGH | 05-04-2023 | 05APR2023 | 24185 | 1080 | 01OCT2023 | Nasa | HIGH | high | OBJ004-Nasa | 17.0 | 8.69951 | 1 |
Explanation
·
Flags abnormal values
·
Automates validation
12. Utilization Classification
SAS Code
data utilization_class;
set fraud_flag;
if Utilization = 'HIGH' then Util_Class = 3;
else if Utilization = 'MEDIUM' then Util_Class = 2;
else Util_Class = 1;
run;
proc print data=utilization_class;
run;
OUTPUT:
| Obs | Object_ID | Orbit_Altitude | Velocity_km_s | Collision_Risk | Object_Size | Tracking_Agency | Fees | Utilization | Launch_Date | Launch_Date_SAS | Today | Age_Days | Next_Check | Agency_Clean | Risk_Upper | Risk_Lower | Combined | Risk_Score | Log_Fees | Country | Fraud_Flag | Util_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | OBJ003 | 550 | 7.5 | LOW | 1 | ESA | 3000 | LOW | 20-03-2023 | 20MAR2023 | 24185 | 1096 | 01SEP2023 | Esa | LOW | low | OBJ003-Esa | 7.5 | 8.00637 | EUROPE | 1 | 1 |
| 2 | OBJ002 | 500 | 8.1 | MEDIUM | 3 | ISRO | 4500 | MEDIUM | 15-02-2023 | 15FEB2023 | 24185 | 1129 | 01AUG2023 | Isro | MEDIUM | medium | OBJ002-Isro | 24.3 | 8.41183 | INDIA | 1 | 2 |
| 3 | OBJ006 | 650 | 7.9 | LOW | 2 | ISRO | 4000 | LOW | 15-06-2023 | 15JUN2023 | 24185 | 1009 | 01DEC2023 | Isro | LOW | low | OBJ006-Isro | 15.8 | 8.29405 | INDIA | 1 | 1 |
| 4 | OBJ005 | 800 | . | MEDIUM | 5 | JAXA | 5500 | MEDIUM | 10-05-2023 | 10MAY2023 | 24185 | 1045 | 01NOV2023 | Jaxa | MEDIUM | medium | OBJ005-Jaxa | . | 8.61250 | JAPAN | 0 | 2 |
| 5 | OBJ001 | 400 | 7.8 | HIGH | 2 | NASA | 5000 | HIGH | 01-01-2023 | 01JAN2023 | 24185 | 1174 | 01JUL2023 | Nasa | HIGH | high | OBJ001-Nasa | 15.6 | 8.51719 | USA | 1 | 3 |
| 6 | OBJ004 | 700 | 8.5 | HIGH | 2 | NASA | 6000 | HIGH | 05-04-2023 | 05APR2023 | 24185 | 1080 | 01OCT2023 | Nasa | HIGH | high | OBJ004-Nasa | 17.0 | 8.69951 | USA | 1 | 3 |
| 7 | OBJ003 | 550 | 7.5 | LOW | 1 | ESA | 3000 | LOW | 20-03-2023 | 20MAR2023 | 24185 | 1096 | 01SEP2023 | Esa | LOW | low | OBJ003-Esa | 7.5 | 8.00637 | 1 | 1 | |
| 8 | OBJ002 | 500 | 8.1 | MEDIUM | 3 | ISRO | 4500 | MEDIUM | 15-02-2023 | 15FEB2023 | 24185 | 1129 | 01AUG2023 | Isro | MEDIUM | medium | OBJ002-Isro | 24.3 | 8.41183 | 1 | 2 | |
| 9 | OBJ006 | 650 | 7.9 | LOW | 2 | ISRO | 4000 | LOW | 15-06-2023 | 15JUN2023 | 24185 | 1009 | 01DEC2023 | Isro | LOW | low | OBJ006-Isro | 15.8 | 8.29405 | 1 | 1 | |
| 10 | OBJ005 | 800 | . | MEDIUM | 5 | JAXA | 5500 | MEDIUM | 10-05-2023 | 10MAY2023 | 24185 | 1045 | 01NOV2023 | Jaxa | MEDIUM | medium | OBJ005-Jaxa | . | 8.61250 | 0 | 2 | |
| 11 | OBJ001 | 400 | 7.8 | HIGH | 2 | NASA | 5000 | HIGH | 01-01-2023 | 01JAN2023 | 24185 | 1174 | 01JUL2023 | Nasa | HIGH | high | OBJ001-Nasa | 15.6 | 8.51719 | 1 | 3 | |
| 12 | OBJ004 | 700 | 8.5 | HIGH | 2 | NASA | 6000 | HIGH | 05-04-2023 | 05APR2023 | 24185 | 1080 | 01OCT2023 | Nasa | HIGH | high | OBJ004-Nasa | 17.0 | 8.69951 | 1 | 3 |
13. PROC DATASETS DELETE
SAS Code
proc datasets library=work;
delete orbital_debris_raw;
quit;
LOG:
Why Used
· Cleans
temporary datasets
14. Final Corrected Dataset
SAS Code
data final_dataset;
set utilization_class;
run;
proc print data=final_dataset;
run;
OUTPUT:
| Obs | Object_ID | Orbit_Altitude | Velocity_km_s | Collision_Risk | Object_Size | Tracking_Agency | Fees | Utilization | Launch_Date | Launch_Date_SAS | Today | Age_Days | Next_Check | Agency_Clean | Risk_Upper | Risk_Lower | Combined | Risk_Score | Log_Fees | Country | Fraud_Flag | Util_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | OBJ003 | 550 | 7.5 | LOW | 1 | ESA | 3000 | LOW | 20-03-2023 | 20MAR2023 | 24185 | 1096 | 01SEP2023 | Esa | LOW | low | OBJ003-Esa | 7.5 | 8.00637 | EUROPE | 1 | 1 |
| 2 | OBJ002 | 500 | 8.1 | MEDIUM | 3 | ISRO | 4500 | MEDIUM | 15-02-2023 | 15FEB2023 | 24185 | 1129 | 01AUG2023 | Isro | MEDIUM | medium | OBJ002-Isro | 24.3 | 8.41183 | INDIA | 1 | 2 |
| 3 | OBJ006 | 650 | 7.9 | LOW | 2 | ISRO | 4000 | LOW | 15-06-2023 | 15JUN2023 | 24185 | 1009 | 01DEC2023 | Isro | LOW | low | OBJ006-Isro | 15.8 | 8.29405 | INDIA | 1 | 1 |
| 4 | OBJ005 | 800 | . | MEDIUM | 5 | JAXA | 5500 | MEDIUM | 10-05-2023 | 10MAY2023 | 24185 | 1045 | 01NOV2023 | Jaxa | MEDIUM | medium | OBJ005-Jaxa | . | 8.61250 | JAPAN | 0 | 2 |
| 5 | OBJ001 | 400 | 7.8 | HIGH | 2 | NASA | 5000 | HIGH | 01-01-2023 | 01JAN2023 | 24185 | 1174 | 01JUL2023 | Nasa | HIGH | high | OBJ001-Nasa | 15.6 | 8.51719 | USA | 1 | 3 |
| 6 | OBJ004 | 700 | 8.5 | HIGH | 2 | NASA | 6000 | HIGH | 05-04-2023 | 05APR2023 | 24185 | 1080 | 01OCT2023 | Nasa | HIGH | high | OBJ004-Nasa | 17.0 | 8.69951 | USA | 1 | 3 |
| 7 | OBJ003 | 550 | 7.5 | LOW | 1 | ESA | 3000 | LOW | 20-03-2023 | 20MAR2023 | 24185 | 1096 | 01SEP2023 | Esa | LOW | low | OBJ003-Esa | 7.5 | 8.00637 | 1 | 1 | |
| 8 | OBJ002 | 500 | 8.1 | MEDIUM | 3 | ISRO | 4500 | MEDIUM | 15-02-2023 | 15FEB2023 | 24185 | 1129 | 01AUG2023 | Isro | MEDIUM | medium | OBJ002-Isro | 24.3 | 8.41183 | 1 | 2 | |
| 9 | OBJ006 | 650 | 7.9 | LOW | 2 | ISRO | 4000 | LOW | 15-06-2023 | 15JUN2023 | 24185 | 1009 | 01DEC2023 | Isro | LOW | low | OBJ006-Isro | 15.8 | 8.29405 | 1 | 1 | |
| 10 | OBJ005 | 800 | . | MEDIUM | 5 | JAXA | 5500 | MEDIUM | 10-05-2023 | 10MAY2023 | 24185 | 1045 | 01NOV2023 | Jaxa | MEDIUM | medium | OBJ005-Jaxa | . | 8.61250 | 0 | 2 | |
| 11 | OBJ001 | 400 | 7.8 | HIGH | 2 | NASA | 5000 | HIGH | 01-01-2023 | 01JAN2023 | 24185 | 1174 | 01JUL2023 | Nasa | HIGH | high | OBJ001-Nasa | 15.6 | 8.51719 | 1 | 3 | |
| 12 | OBJ004 | 700 | 8.5 | HIGH | 2 | NASA | 6000 | HIGH | 05-04-2023 | 05APR2023 | 24185 | 1080 | 01OCT2023 | Nasa | HIGH | high | OBJ004-Nasa | 17.0 | 8.69951 | 1 | 3 |
15. R Code
library(dplyr)
orbital_data <- data.frame(
Object_ID = c("OBJ001","OBJ002","OBJ003","OBJ004","OBJ005","OBJ006"),
Orbit_Altitude = c(400, NA, 550, 700, 800, 650),
Velocity_km_s = c(7.8, 8.1, -7.5, 8.5, NA, 7.9),
Fees = c(5000,4500,3000,6000,5500,NA)
)
orbital_data <- orbital_data %>%
mutate(
Velocity_km_s = abs(Velocity_km_s),
Orbit_Altitude = ifelse(is.na(Orbit_Altitude), 500, Orbit_Altitude),
Fees = ifelse(is.na(Fees), 4000, Fees),
Risk_Score = round(Velocity_km_s * 2,1)
)
print(orbital_data)
Object_ID Orbit_Altitude Velocity_km_s Fees Risk_Score
OBJ001 400 7.8 5000 15.6
OBJ002 500
8.1
4500 16.2
OBJ003 550 7.5 3000 15
OBJ004 700 8.5 6000 17
OBJ005 800
5500
OBJ006 650
7.9 4000 15.8
SAS vs R — Direct Comparison
1. Data Input Handling
·
SAS uses INPUT
+ DATALINES
(structured, fixed schema)
·
R uses data.frame( ) (flexible, dynamic typing)
2. Error Behavior
·
SAS logs errors (_ERROR_=1) but continues
execution
·
R silently converts invalid values to NA with warnings
3. Type Conversion
·
SAS: INPUT( )
function explicitly converts character → numeric
·
R: as.numeric( )
performs conversion but introduces NA automatically
4. Missing Value Handling
·
SAS uses . for numeric missing
·
R uses NA
(more explicit and consistent across types)
5. Conditional
Logic
·
SAS uses IF-THEN statements
·
R uses ifelse( )
or case_when( )
6. String Processing
·
SAS: STRIP,
PROPCASE,
UPCASE
·
R: trimws(),
tools::toTitleCase(),
toupper()
7. Date Handling
·
SAS: INPUT, INTCK, INTNX
(very powerful for intervals)
·
R: as.Date( ),
arithmetic operations (simpler but less domain-specific)
8. Dataset
Operations
·
SAS: SET, MERGE, PROC APPEND
·
R: dplyr,
merge( ),
rbind( )
(more flexible but requires packages)
9. Macro
vs Functional Programming
·
SAS Macros automate workflows (%macro)
·
R uses functions and pipelines (dplyr, %>%)
10. Debugging
& Auditability
·
SAS provides detailed logs
(industry-grade validation)
·
R provides warnings but less structured
audit trails
10 Key Points
1.
We created an orbital debris dataset with real-world
variables like altitude, velocity, and risk.
2.
Intentional errors (like “abc” in velocity) were
introduced to simulate real data issues.
3.
SAS detected errors using log messages and _ERROR_
variable.
4.
R handled the same error by converting invalid values
into NA.
5.
Data cleaning was performed by replacing missing and
incorrect values.
6.
Date variables were transformed and analyzed using SAS
and R functions.
7.
Character and numeric transformations were applied for
standardization.
8.
Dataset operations like merge, append, and transpose
were executed.
9.
Fraud detection logic was implemented using SAS macros
and R conditions.
10. Final
dataset was cleaned, validated, and ready for analysis.
Summary
In this project, we worked with orbital debris
data that had multiple errors, just like real-world datasets. SAS handled the
errors in a very structured way by showing detailed logs and continuing
execution without stopping the process. This makes SAS very reliable in
industries like clinical trials and aerospace where data validation is
critical.
On the other hand, R provided a more flexible
and modern approach. It handled errors quietly by converting invalid values
into NA, which makes it easier
for quick data analysis but slightly less controlled compared to SAS.
Overall, SAS is strong in data governance, validation, and audit
trails, while R is powerful in flexibility, speed, and modern data
manipulation. Both tools complement each other, and
understanding both gives a strong advantage in real-world data projects.
Conclusion
This project clearly demonstrates that Advanced SAS Programming is extremely
powerful in handling real-world data challenges. Through
structured steps, we:
· Created a
dataset with intentional errors
·
Detected inconsistencies
·
Applied correction logic
·
Used advanced functions and macros
·
Ensured data integrity
·
Replicated logic in R
Key Takeaways
· SAS excels
in data cleaning and validation
·
Macros automate complex logic
·
Functions ensure precision
·
Procedures enable structured analysis
SAS INTERVIEW QUESTIONS
1. How do FIRST. and LAST. work?
Answer:
FIRST. and LAST. are temporary variables automatically created by SAS when using a BY statement after sorting data.
- FIRST.variable = 1 for the first observation of each BY group
- LAST.variable = 1 for the last observation of each BY group
Example: Used for group totals or identifying first/last records in a group.
2. Difference between CALL SYMPUT and CALL SYMPUTX?
Answer:
Feature | CALL SYMPUT | CALL SYMPUTX |
Spaces | Keeps leading/trailing spaces | Removes spaces automatically |
Type conversion | May cause numeric-to-character notes | Automatically converts values |
Usage | Older method | Recommended method |
Short: CALL SYMPUTX is an improved version of CALL SYMPUT.
3. Difference between INPUT and PUT functions?
Answer:
Function | Purpose |
INPUT | Converts character → numeric |
PUT | Converts numeric → character |
Example:
- INPUT(char_var,8.) → character to numeric
- PUT(num_var,8.) → numeric to character.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent DIGITAL ADVERTISING data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics
· Clinical SAS Programmer
· Research Data Analyst
· Regulatory Data Validator
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment