427.Can SAS’s Precision and R’s Flexibility Together Eliminate Hidden Errors in Orbital Debris Data Analytics?

Advanced Data Validation and Transformation Techniques Across SAS and R

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Statements Used in Both SAS and R (Equivalent Mapping Format)

DATA CREATION — DATA STEP | LENGTH | INPUT | DATALINESdata.frame() | tibble()

DATA READING / MANIPULATION — SET<- (assignment) | mutate()

DATA JOINING — MERGEmerge() | left_join()

DATA SORTING — PROC SORTarrange() | order()

DATA DISPLAY — PROC PRINTprint()

DATA STRUCTURE VIEW — PROC CONTENTSstr() | summary()

FREQUENCY ANALYSIS — PROC FREQtable() | count()

DATA RESHAPING — PROC TRANSPOSEpivot_longer() | pivot_wider()

DATA APPENDING — PROC APPENDrbind() | bind_rows()

DATA DELETION — PROC DATASETS DELETErm()

AUTOMATION — MACRO | %MACRO | %MENDfunction()

NUMERIC FUNCTIONS — ROUND | LOG | ABSround() | log() | abs()

CHARACTER FUNCTIONS — STRIP | TRIM | CAT | CATX | UPCASE | LOWCASE | PROPCASEtrimws() | paste() | paste0() | toupper() | tolower() | tools::toTitleCase()

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Introduction

Orbital debris—commonly known as space junk—is one of the most critical challenges in modern space operations. With thousands of satellites and millions of debris fragments orbiting Earth, tracking and managing this data is essential for collision avoidance, mission safety, and regulatory compliance.

In real-world analytics environments (especially aerospace and defense domains), datasets are rarely clean. They contain:

  • Missing values
  • Incorrect formats
  • Logical inconsistencies
  • Fraudulent or manipulated entries

This project demonstrates how Advanced SAS Programming can:

  • Create a complex orbital debris dataset
  • Introduce intentional data errors
  • Detect and correct those errors
  • Apply transformations using SAS procedures, functions, macros
  • Replicate the same logic in R programming

Table of Contents

  1. Business Context
  2. Raw Dataset Creation (With Errors)
  3. Error Identification
  4. Data Cleaning Using SAS
  5. Date Handling (MDY, INTCK, INTNX)
  6. Character Functions
  7. Numeric Functions
  8. Dataset Operations (SET, MERGE, APPEND)
  9. PROC TRANSPOSE
  10. Fraud Detection Macros
  11. Utilization Classification
  12. PROC DATASETS DELETE
  13. Final Clean Dataset
  14. R Code Equivalent
  15. SAS VS R
  16. 10 Key Points
  17. Summary
  18. Conclusion

Business Context

Organizations like space agencies and satellite operators rely on accurate orbital data to:

  • Prevent satellite collisions
  • Estimate risk probabilities
  • Optimize tracking resource allocation
  • Detect anomalies or fraudulent data entries

A small error (e.g., wrong velocity) can lead to catastrophic mission failure.

1. Raw Dataset Creation (With Intentional Errors)

SAS Code

data orbital_debris_raw;

    input Object_ID $ Orbit_Altitude Velocity_km_s Collision_Risk $ 

          Object_Size Tracking_Agency $ Fees Utilization $ Launch_Date $10.;

    datalines;

OBJ001 400 7.8 HIGH 2 NASA 5000 HIGH 01-01-2023

OBJ002 . 8.1 MEDIUM 3 ISRO 4500 MEDIUM 15-02-2023

OBJ003 550 -7.5 LOW 1 ESA 3000 LOW 20-03-2023

OBJ004 700 8.5 HIGH . NASA 6000 HIGH 05-04-2023

OBJ005 800 abc MEDIUM 5 JAXA 5500 MEDIUM 10-05-2023

OBJ006 650 7.9 LOW 2 ISRO . LOW 15-06-2023

;

run;

LOG:

77 OBJ005 800 abc MEDIUM 5 JAXA 5500 MEDIUM 10-05-2023
Object_ID=OBJ005 Orbit_Altitude=800 Velocity_km_s=. Collision_Risk=MEDIUM Object_Size=5 Tracking_Agency=JAXA Fees=5500
Utilization=MEDIUM Launch_Date=10-05-2023 _ERROR_=1 _N_=5
NOTE: The data set WORK.ORBITAL_DEBRIS_RAW has 6 observations and 9 variables.

Explanation

This dataset contains intentional errors:

·  Missing values (.)

·  Invalid numeric value (abc)

·  Negative velocity

·  Missing object size

·  Missing fees

Why This Step is Important

·  Simulates real-world messy data

·  Helps in testing validation logic

R Code

orbital_debris_raw <- data.frame(

  Object_ID = c("OBJ001","OBJ002","OBJ003","OBJ004","OBJ005","OBJ006"),

  Orbit_Altitude = c(400, NA, 550, 700, 800, 650),

  Velocity_km_s = c("7.8","8.1","-7.5","8.5","abc","7.9"),

  Collision_Risk = c("HIGH"."MEDIUM","LOW","HIGH","MEDIUM","LOW"),

  Object_Size = c(2,3,1,NA,5,2),

  Trackint_Agency = c("NASA","ISRO","ESA","NASA","JAXA","ISRO"),

  Fees = c(5000,4500,3000,6000,5500,NA),

  Utilization = c("HIGH","MEDIUM","LOW","HIGH","MEDIUM","LOW"),

  Launch_Date = c("01-01-2023","15-02-2023","20-03-2023",

                  "05-042023","10-05-2023","15-06-2023"),

  stringAsFactors = FALSE 

)

CONSOLE:

> orbital_debris_raw <- data.frame(
+   Object_ID = c("OBJ001","OBJ002","OBJ003","OBJ004","OBJ005","OBJ006"),
+   Orbit_Altitude = c(400, NA, 550, 700, 800, 650),
+   Velocity_km_s = c("7.8","8.1","-7.5","8.5","abc","7.9"),
+   Collision_Risk = c("HIGH","MEDIUM","LOW","HIGH","MEDIUM","LOW"),
+   Object_Size = c(2,3,1,NA,5,2),
+   Tracking_Agency = c("NASA","ISRO","ESA","NASA","JAXA","ISRO"),
+   Fees = c(5000,4500,3000,6000,5500,NA),
+   Utilization = c("HIGH","MEDIUM","LOW","HIGH","MEDIUM","LOW"),
+   Launch_Date = c("01-01-2023","15-02-2023","20-03-2023",
+                   "05-04-2023","10-05-2023","15-06-2023"),
+   stringsAsFactors = FALSE
+ )

Explanation

·   SAS uses INPUT+ DATALINES

·   R uses data.frame( )

·  Velocity intentionally stored as character (to simulate error like abc)

2. Identifying Errors

SAS Code

proc print data=orbital_debris_raw;

run;

OUTPUT:

ObsObject_IDOrbit_AltitudeVelocity_km_sCollision_RiskObject_SizeTracking_AgencyFeesUtilizationLaunch_Date
1OBJ0014007.8HIGH2NASA5000HIGH01-01-2023
2OBJ002.8.1MEDIUM3ISRO4500MEDIUM15-02-2023
3OBJ003550-7.5LOW1ESA3000LOW20-03-2023
4OBJ0047008.5HIGH.NASA6000HIGH05-04-2023
5OBJ005800.MEDIUM5JAXA5500MEDIUM10-05-2023
6OBJ0066507.9LOW2ISRO.LOW15-06-2023

Errors Identified

·  Missing Orbit_Altitude

·  Negative Velocity

·  Character value in numeric field

·  Missing Object_Size

·  Missing Fees

R Code

print(orbital_debris_raw)

R OUTPUT:

Object_ID

Orbit_Altitude

Velocity_km_s

Collision_Risk

Object_Size

Trackint_Agency

Fees

Utilization

Launch_Date

stringAsFactors

OBJ001

400

7.8

HIGH

2

NASA

5000

HIGH

01-01-2023

FALSE

OBJ002

8.1

MEDIUM

3

ISRO

4500

MEDIUM

15-02-2023

FALSE

OBJ003

550

-7.5

LOW

1

ESA

3000

LOW

20-03-2023

FALSE

OBJ004

700

8.5

HIGH

NASA

6000

HIGH

05-04-2023

FALSE

OBJ005

800

abc

MEDIUM

5

JAXA

5500

MEDIUM

10-05-2023

FALSE

OBJ006

650

7.9

LOW

2

ISRO

LOW

15-06-2023

FALSE

3. Data Cleaning and Correction

SAS Code

data orbital_debris_clean;

    set orbital_debris_raw;

    if Velocity_km_s < 0 then Velocity_km_s = abs(Velocity_km_s);

    if Orbit_Altitude = . then Orbit_Altitude = 500;

    if Object_Size = . then Object_Size = 2;

    if Fees = . then Fees = 4000;

    Velocity_km_s = input(Velocity_km_s, best12.);

run;

proc print data=orbital_debris_clean;

run;

OUTPUT:

ObsObject_IDOrbit_AltitudeVelocity_km_sCollision_RiskObject_SizeTracking_AgencyFeesUtilizationLaunch_Date
1OBJ0014007.8HIGH2NASA5000HIGH01-01-2023
2OBJ0025008.1MEDIUM3ISRO4500MEDIUM15-02-2023
3OBJ0035507.5LOW1ESA3000LOW20-03-2023
4OBJ0047008.5HIGH2NASA6000HIGH05-04-2023
5OBJ005800.MEDIUM5JAXA5500MEDIUM10-05-2023
6OBJ0066507.9LOW2ISRO4000LOW15-06-2023

Explanation

·  ABS( ) corrects negative values

·  Default values assigned to missing fields

·  INPUT( ) converts character to numeric

Why Used

·  Ensures consistency

·  Prevents downstream errors

R Code

library(dplyr)


orbital_debris_clean <- orbital_debris_raw %>%

  mutate(

    Velocity_km_s = as.numeric(Velocity_km_s),

    Velocity_km_s = ifelse(Velocity_km_s < 0 , abs(Velocity_km_s),Velocity_km_s),

    Orbit_Altitude = ifelse(is.na(Orbit_Altitude),500,Orbit_Altitude),

    Object_Size = ifelse(is.na(Object_Size),2,Object_Size),

    Fees = ifelse(is.na(Fees),4000,Fees)

  )

print(orbital_debris_clean)

Explanation

·  input( )as.numeric( )

·  ififelse( )

·  Handles invalid values (abc → NA)

4. Date Handling (MDY, INTCK, INTNX)

SAS Code

data orbital_dates;

    set orbital_debris_clean;

    Launch_Date_SAS = input(Launch_Date, ddmmyy10.);

    format Launch_Date_SAS date9.;

    Today = today();

    Age_Days = intck('day', Launch_Date_SAS, Today);

    Next_Check = intnx('month', Launch_Date_SAS, 6);

    format Next_Check date9.;

run;

proc print data=orbital_dates;

run;

OUTPUT:

ObsObject_IDOrbit_AltitudeVelocity_km_sCollision_RiskObject_SizeTracking_AgencyFeesUtilizationLaunch_DateLaunch_Date_SASTodayAge_DaysNext_Check
1OBJ0014007.8HIGH2NASA5000HIGH01-01-202301JAN202324185117401JUL2023
2OBJ0025008.1MEDIUM3ISRO4500MEDIUM15-02-202315FEB202324185112901AUG2023
3OBJ0035507.5LOW1ESA3000LOW20-03-202320MAR202324185109601SEP2023
4OBJ0047008.5HIGH2NASA6000HIGH05-04-202305APR202324185108001OCT2023
5OBJ005800.MEDIUM5JAXA5500MEDIUM10-05-202310MAY202324185104501NOV2023
6OBJ0066507.9LOW2ISRO4000LOW15-06-202315JUN202324185100901DEC2023

Explanation

·  INPUT ( ) converts character date

·  INTCK ( ) calculates difference

·  INTNX ( ) calculates future date

Why Used

·  Time-based analysis is critical in orbital tracking

R Code

orbital_dates <- orbital_debris_clean %>%

  mutate(

    Launch_Date_SAS = dmy(Launch_Date),

    Today = Sys.Date(),

    Age_Days = as.numeric(Today - Launch_Date_SAS),

    Next_Check = Launch_Date_SAS %m+% months(6) 

  )


print(orbital_dates)

5. Character Functions

SAS Code

data char_functions;

    set orbital_dates;

    Agency_Clean = propcase(strip(Tracking_Agency));

    Risk_Upper = upcase(Collision_Risk);

    Risk_Lower = lowcase(Collision_Risk);

    Combined = catx('-', Object_ID, Agency_Clean);

run;

proc print data=char_functions;

run;

OUTPUT:

ObsObject_IDOrbit_AltitudeVelocity_km_sCollision_RiskObject_SizeTracking_AgencyFeesUtilizationLaunch_DateLaunch_Date_SASTodayAge_DaysNext_CheckAgency_CleanRisk_UpperRisk_LowerCombined
1OBJ0014007.8HIGH2NASA5000HIGH01-01-202301JAN202324185117401JUL2023NasaHIGHhighOBJ001-Nasa
2OBJ0025008.1MEDIUM3ISRO4500MEDIUM15-02-202315FEB202324185112901AUG2023IsroMEDIUMmediumOBJ002-Isro
3OBJ0035507.5LOW1ESA3000LOW20-03-202320MAR202324185109601SEP2023EsaLOWlowOBJ003-Esa
4OBJ0047008.5HIGH2NASA6000HIGH05-04-202305APR202324185108001OCT2023NasaHIGHhighOBJ004-Nasa
5OBJ005800.MEDIUM5JAXA5500MEDIUM10-05-202310MAY202324185104501NOV2023JaxaMEDIUMmediumOBJ005-Jaxa
6OBJ0066507.9LOW2ISRO4000LOW15-06-202315JUN202324185100901DEC2023IsroLOWlowOBJ006-Isro

Functions Used

·  STRIP ( ) removes spaces

·  PROPCASE ( ) formats text

·  CATX ( ) concatenates

6. Numeric Functions

SAS Code

data numeric_functions;

    set char_functions;

    Risk_Score = round(Velocity_km_s * Object_Size, 0.1);

    Log_Fees = log(Fees);

run;

proc print data=numeric_functions;

run;

OUTPUT:

ObsObject_IDOrbit_AltitudeVelocity_km_sCollision_RiskObject_SizeTracking_AgencyFeesUtilizationLaunch_DateLaunch_Date_SASTodayAge_DaysNext_CheckAgency_CleanRisk_UpperRisk_LowerCombinedRisk_ScoreLog_Fees
1OBJ0014007.8HIGH2NASA5000HIGH01-01-202301JAN202324185117401JUL2023NasaHIGHhighOBJ001-Nasa15.68.51719
2OBJ0025008.1MEDIUM3ISRO4500MEDIUM15-02-202315FEB202324185112901AUG2023IsroMEDIUMmediumOBJ002-Isro24.38.41183
3OBJ0035507.5LOW1ESA3000LOW20-03-202320MAR202324185109601SEP2023EsaLOWlowOBJ003-Esa7.58.00637
4OBJ0047008.5HIGH2NASA6000HIGH05-04-202305APR202324185108001OCT2023NasaHIGHhighOBJ004-Nasa17.08.69951
5OBJ005800.MEDIUM5JAXA5500MEDIUM10-05-202310MAY202324185104501NOV2023JaxaMEDIUMmediumOBJ005-Jaxa.8.61250
6OBJ0066507.9LOW2ISRO4000LOW15-06-202315JUN202324185100901DEC2023IsroLOWlowOBJ006-Isro15.88.29405

Explanation

·  Calculates derived metrics

·  Useful for risk modeling

7. SET Operation

SAS Code

data combined_data;

    set numeric_functions;

run;

proc print data=combined_data;

run;

OUTPUT:

ObsObject_IDOrbit_AltitudeVelocity_km_sCollision_RiskObject_SizeTracking_AgencyFeesUtilizationLaunch_DateLaunch_Date_SASTodayAge_DaysNext_CheckAgency_CleanRisk_UpperRisk_LowerCombinedRisk_ScoreLog_Fees
1OBJ0014007.8HIGH2NASA5000HIGH01-01-202301JAN202324185117401JUL2023NasaHIGHhighOBJ001-Nasa15.68.51719
2OBJ0025008.1MEDIUM3ISRO4500MEDIUM15-02-202315FEB202324185112901AUG2023IsroMEDIUMmediumOBJ002-Isro24.38.41183
3OBJ0035507.5LOW1ESA3000LOW20-03-202320MAR202324185109601SEP2023EsaLOWlowOBJ003-Esa7.58.00637
4OBJ0047008.5HIGH2NASA6000HIGH05-04-202305APR202324185108001OCT2023NasaHIGHhighOBJ004-Nasa17.08.69951
5OBJ005800.MEDIUM5JAXA5500MEDIUM10-05-202310MAY202324185104501NOV2023JaxaMEDIUMmediumOBJ005-Jaxa.8.61250
6OBJ0066507.9LOW2ISRO4000LOW15-06-202315JUN202324185100901DEC2023IsroLOWlowOBJ006-Isro15.88.29405

Why Used

·  Combines datasets sequentially

8. SORT & MERGE Operation

SAS Code

data agency_info;

    input Tracking_Agency $ Country $;

    datalines;

NASA USA

ISRO INDIA

ESA EUROPE

JAXA JAPAN

;

run;

proc print data=agency_info;

run;

OUTPUT:

ObsTracking_AgencyCountry
1NASAUSA
2ISROINDIA
3ESAEUROPE
4JAXAJAPAN

proc sort data=combined_data;by Tracking_Agency;run;

proc print data=combined_data;

run;

OUTPUT:

ObsObject_IDOrbit_AltitudeVelocity_km_sCollision_RiskObject_SizeTracking_AgencyFeesUtilizationLaunch_DateLaunch_Date_SASTodayAge_DaysNext_CheckAgency_CleanRisk_UpperRisk_LowerCombinedRisk_ScoreLog_Fees
1OBJ0035507.5LOW1ESA3000LOW20-03-202320MAR202324185109601SEP2023EsaLOWlowOBJ003-Esa7.58.00637
2OBJ0025008.1MEDIUM3ISRO4500MEDIUM15-02-202315FEB202324185112901AUG2023IsroMEDIUMmediumOBJ002-Isro24.38.41183
3OBJ0066507.9LOW2ISRO4000LOW15-06-202315JUN202324185100901DEC2023IsroLOWlowOBJ006-Isro15.88.29405
4OBJ005800.MEDIUM5JAXA5500MEDIUM10-05-202310MAY202324185104501NOV2023JaxaMEDIUMmediumOBJ005-Jaxa.8.61250
5OBJ0014007.8HIGH2NASA5000HIGH01-01-202301JAN202324185117401JUL2023NasaHIGHhighOBJ001-Nasa15.68.51719
6OBJ0047008.5HIGH2NASA6000HIGH05-04-202305APR202324185108001OCT2023NasaHIGHhighOBJ004-Nasa17.08.69951

proc sort data=agency_info;by Tracking_Agency;run;

proc print data=agency_info;

run;

OUTPUT:

ObsTracking_AgencyCountry
1ESAEUROPE
2ISROINDIA
3JAXAJAPAN
4NASAUSA

data merged_data;

    merge combined_data 

          agency_info;

    by Tracking_Agency;

run;

proc print data=merged_data;

run;

OUTPUT:

ObsObject_IDOrbit_AltitudeVelocity_km_sCollision_RiskObject_SizeTracking_AgencyFeesUtilizationLaunch_DateLaunch_Date_SASTodayAge_DaysNext_CheckAgency_CleanRisk_UpperRisk_LowerCombinedRisk_ScoreLog_FeesCountry
1OBJ0035507.5LOW1ESA3000LOW20-03-202320MAR202324185109601SEP2023EsaLOWlowOBJ003-Esa7.58.00637EUROPE
2OBJ0025008.1MEDIUM3ISRO4500MEDIUM15-02-202315FEB202324185112901AUG2023IsroMEDIUMmediumOBJ002-Isro24.38.41183INDIA
3OBJ0066507.9LOW2ISRO4000LOW15-06-202315JUN202324185100901DEC2023IsroLOWlowOBJ006-Isro15.88.29405INDIA
4OBJ005800.MEDIUM5JAXA5500MEDIUM10-05-202310MAY202324185104501NOV2023JaxaMEDIUMmediumOBJ005-Jaxa.8.61250JAPAN
5OBJ0014007.8HIGH2NASA5000HIGH01-01-202301JAN202324185117401JUL2023NasaHIGHhighOBJ001-Nasa15.68.51719USA
6OBJ0047008.5HIGH2NASA6000HIGH05-04-202305APR202324185108001OCT2023NasaHIGHhighOBJ004-Nasa17.08.69951USA

Explanation

·  Joins datasets

·  Adds metadata

9. APPEND Operation

SAS Code

proc append base=merged_data 

            data=combined_data;

run;

proc print data=merged_data;

run;

OUTPUT:

ObsObject_IDOrbit_AltitudeVelocity_km_sCollision_RiskObject_SizeTracking_AgencyFeesUtilizationLaunch_DateLaunch_Date_SASTodayAge_DaysNext_CheckAgency_CleanRisk_UpperRisk_LowerCombinedRisk_ScoreLog_FeesCountry
1OBJ0035507.5LOW1ESA3000LOW20-03-202320MAR202324185109601SEP2023EsaLOWlowOBJ003-Esa7.58.00637EUROPE
2OBJ0025008.1MEDIUM3ISRO4500MEDIUM15-02-202315FEB202324185112901AUG2023IsroMEDIUMmediumOBJ002-Isro24.38.41183INDIA
3OBJ0066507.9LOW2ISRO4000LOW15-06-202315JUN202324185100901DEC2023IsroLOWlowOBJ006-Isro15.88.29405INDIA
4OBJ005800.MEDIUM5JAXA5500MEDIUM10-05-202310MAY202324185104501NOV2023JaxaMEDIUMmediumOBJ005-Jaxa.8.61250JAPAN
5OBJ0014007.8HIGH2NASA5000HIGH01-01-202301JAN202324185117401JUL2023NasaHIGHhighOBJ001-Nasa15.68.51719USA
6OBJ0047008.5HIGH2NASA6000HIGH05-04-202305APR202324185108001OCT2023NasaHIGHhighOBJ004-Nasa17.08.69951USA
7OBJ0035507.5LOW1ESA3000LOW20-03-202320MAR202324185109601SEP2023EsaLOWlowOBJ003-Esa7.58.00637 
8OBJ0025008.1MEDIUM3ISRO4500MEDIUM15-02-202315FEB202324185112901AUG2023IsroMEDIUMmediumOBJ002-Isro24.38.41183 
9OBJ0066507.9LOW2ISRO4000LOW15-06-202315JUN202324185100901DEC2023IsroLOWlowOBJ006-Isro15.88.29405 
10OBJ005800.MEDIUM5JAXA5500MEDIUM10-05-202310MAY202324185104501NOV2023JaxaMEDIUMmediumOBJ005-Jaxa.8.61250 
11OBJ0014007.8HIGH2NASA5000HIGH01-01-202301JAN202324185117401JUL2023NasaHIGHhighOBJ001-Nasa15.68.51719 
12OBJ0047008.5HIGH2NASA6000HIGH05-04-202305APR202324185108001OCT2023NasaHIGHhighOBJ004-Nasa17.08.69951 

Why Used

·  Adds records efficiently

10. PROC TRANSPOSE

SAS Code

proc transpose data=merged_data out=transposed_data;

    by Object_ID NotSorted;

    var Velocity_km_s Object_Size Fees;

run;

proc print data=transposed_data;

run;

OUTPUT:

ObsObject_ID_NAME_COL1
1OBJ003Velocity_km_s7.5
2OBJ003Object_Size1.0
3OBJ003Fees3000.0
4OBJ002Velocity_km_s8.1
5OBJ002Object_Size3.0
6OBJ002Fees4500.0
7OBJ006Velocity_km_s7.9
8OBJ006Object_Size2.0
9OBJ006Fees4000.0
10OBJ005Velocity_km_s.
11OBJ005Object_Size5.0
12OBJ005Fees5500.0
13OBJ001Velocity_km_s7.8
14OBJ001Object_Size2.0
15OBJ001Fees5000.0
16OBJ004Velocity_km_s8.5
17OBJ004Object_Size2.0
18OBJ004Fees6000.0
19OBJ003Velocity_km_s7.5
20OBJ003Object_Size1.0
21OBJ003Fees3000.0
22OBJ002Velocity_km_s8.1
23OBJ002Object_Size3.0
24OBJ002Fees4500.0
25OBJ006Velocity_km_s7.9
26OBJ006Object_Size2.0
27OBJ006Fees4000.0
28OBJ005Velocity_km_s.
29OBJ005Object_Size5.0
30OBJ005Fees5500.0
31OBJ001Velocity_km_s7.8
32OBJ001Object_Size2.0
33OBJ001Fees5000.0
34OBJ004Velocity_km_s8.5
35OBJ004Object_Size2.0
36OBJ004Fees6000.0

Explanation

·  Converts rows → columns

11. Fraud Detection Macro

SAS Code

%macro fraud_check(data=);

data fraud_flag;

    set &data;

    if Fees > 7000 or Velocity_km_s > 7 then Fraud_Flag = 1;

    else Fraud_Flag = 0;

run;

proc print data=fraud_flag;

run;

%mend;


%fraud_check(data=merged_data);

OUTPUT:

ObsObject_IDOrbit_AltitudeVelocity_km_sCollision_RiskObject_SizeTracking_AgencyFeesUtilizationLaunch_DateLaunch_Date_SASTodayAge_DaysNext_CheckAgency_CleanRisk_UpperRisk_LowerCombinedRisk_ScoreLog_FeesCountryFraud_Flag
1OBJ0035507.5LOW1ESA3000LOW20-03-202320MAR202324185109601SEP2023EsaLOWlowOBJ003-Esa7.58.00637EUROPE1
2OBJ0025008.1MEDIUM3ISRO4500MEDIUM15-02-202315FEB202324185112901AUG2023IsroMEDIUMmediumOBJ002-Isro24.38.41183INDIA1
3OBJ0066507.9LOW2ISRO4000LOW15-06-202315JUN202324185100901DEC2023IsroLOWlowOBJ006-Isro15.88.29405INDIA1
4OBJ005800.MEDIUM5JAXA5500MEDIUM10-05-202310MAY202324185104501NOV2023JaxaMEDIUMmediumOBJ005-Jaxa.8.61250JAPAN0
5OBJ0014007.8HIGH2NASA5000HIGH01-01-202301JAN202324185117401JUL2023NasaHIGHhighOBJ001-Nasa15.68.51719USA1
6OBJ0047008.5HIGH2NASA6000HIGH05-04-202305APR202324185108001OCT2023NasaHIGHhighOBJ004-Nasa17.08.69951USA1
7OBJ0035507.5LOW1ESA3000LOW20-03-202320MAR202324185109601SEP2023EsaLOWlowOBJ003-Esa7.58.00637 1
8OBJ0025008.1MEDIUM3ISRO4500MEDIUM15-02-202315FEB202324185112901AUG2023IsroMEDIUMmediumOBJ002-Isro24.38.41183 1
9OBJ0066507.9LOW2ISRO4000LOW15-06-202315JUN202324185100901DEC2023IsroLOWlowOBJ006-Isro15.88.29405 1
10OBJ005800.MEDIUM5JAXA5500MEDIUM10-05-202310MAY202324185104501NOV2023JaxaMEDIUMmediumOBJ005-Jaxa.8.61250 0
11OBJ0014007.8HIGH2NASA5000HIGH01-01-202301JAN202324185117401JUL2023NasaHIGHhighOBJ001-Nasa15.68.51719 1
12OBJ0047008.5HIGH2NASA6000HIGH05-04-202305APR202324185108001OCT2023NasaHIGHhighOBJ004-Nasa17.08.69951 1

Explanation

·  Flags abnormal values

·  Automates validation

12. Utilization Classification

SAS Code

data utilization_class;

    set fraud_flag;

    if Utilization = 'HIGH' then Util_Class = 3;

    else if Utilization = 'MEDIUM' then Util_Class = 2;

    else Util_Class = 1;

run;

proc print data=utilization_class;

run;

OUTPUT:

ObsObject_IDOrbit_AltitudeVelocity_km_sCollision_RiskObject_SizeTracking_AgencyFeesUtilizationLaunch_DateLaunch_Date_SASTodayAge_DaysNext_CheckAgency_CleanRisk_UpperRisk_LowerCombinedRisk_ScoreLog_FeesCountryFraud_FlagUtil_Class
1OBJ0035507.5LOW1ESA3000LOW20-03-202320MAR202324185109601SEP2023EsaLOWlowOBJ003-Esa7.58.00637EUROPE11
2OBJ0025008.1MEDIUM3ISRO4500MEDIUM15-02-202315FEB202324185112901AUG2023IsroMEDIUMmediumOBJ002-Isro24.38.41183INDIA12
3OBJ0066507.9LOW2ISRO4000LOW15-06-202315JUN202324185100901DEC2023IsroLOWlowOBJ006-Isro15.88.29405INDIA11
4OBJ005800.MEDIUM5JAXA5500MEDIUM10-05-202310MAY202324185104501NOV2023JaxaMEDIUMmediumOBJ005-Jaxa.8.61250JAPAN02
5OBJ0014007.8HIGH2NASA5000HIGH01-01-202301JAN202324185117401JUL2023NasaHIGHhighOBJ001-Nasa15.68.51719USA13
6OBJ0047008.5HIGH2NASA6000HIGH05-04-202305APR202324185108001OCT2023NasaHIGHhighOBJ004-Nasa17.08.69951USA13
7OBJ0035507.5LOW1ESA3000LOW20-03-202320MAR202324185109601SEP2023EsaLOWlowOBJ003-Esa7.58.00637 11
8OBJ0025008.1MEDIUM3ISRO4500MEDIUM15-02-202315FEB202324185112901AUG2023IsroMEDIUMmediumOBJ002-Isro24.38.41183 12
9OBJ0066507.9LOW2ISRO4000LOW15-06-202315JUN202324185100901DEC2023IsroLOWlowOBJ006-Isro15.88.29405 11
10OBJ005800.MEDIUM5JAXA5500MEDIUM10-05-202310MAY202324185104501NOV2023JaxaMEDIUMmediumOBJ005-Jaxa.8.61250 02
11OBJ0014007.8HIGH2NASA5000HIGH01-01-202301JAN202324185117401JUL2023NasaHIGHhighOBJ001-Nasa15.68.51719 13
12OBJ0047008.5HIGH2NASA6000HIGH05-04-202305APR202324185108001OCT2023NasaHIGHhighOBJ004-Nasa17.08.69951 13

13. PROC DATASETS DELETE

SAS Code

proc datasets library=work;

    delete orbital_debris_raw;

quit;

LOG:

NOTE: Deleting WORK.ORBITAL_DEBRIS_RAW (memtype=DATA).

Why Used

·  Cleans temporary datasets

14. Final Corrected Dataset

SAS Code

data final_dataset;

    set utilization_class;

run;

proc print data=final_dataset;

run;

OUTPUT:

ObsObject_IDOrbit_AltitudeVelocity_km_sCollision_RiskObject_SizeTracking_AgencyFeesUtilizationLaunch_DateLaunch_Date_SASTodayAge_DaysNext_CheckAgency_CleanRisk_UpperRisk_LowerCombinedRisk_ScoreLog_FeesCountryFraud_FlagUtil_Class
1OBJ0035507.5LOW1ESA3000LOW20-03-202320MAR202324185109601SEP2023EsaLOWlowOBJ003-Esa7.58.00637EUROPE11
2OBJ0025008.1MEDIUM3ISRO4500MEDIUM15-02-202315FEB202324185112901AUG2023IsroMEDIUMmediumOBJ002-Isro24.38.41183INDIA12
3OBJ0066507.9LOW2ISRO4000LOW15-06-202315JUN202324185100901DEC2023IsroLOWlowOBJ006-Isro15.88.29405INDIA11
4OBJ005800.MEDIUM5JAXA5500MEDIUM10-05-202310MAY202324185104501NOV2023JaxaMEDIUMmediumOBJ005-Jaxa.8.61250JAPAN02
5OBJ0014007.8HIGH2NASA5000HIGH01-01-202301JAN202324185117401JUL2023NasaHIGHhighOBJ001-Nasa15.68.51719USA13
6OBJ0047008.5HIGH2NASA6000HIGH05-04-202305APR202324185108001OCT2023NasaHIGHhighOBJ004-Nasa17.08.69951USA13
7OBJ0035507.5LOW1ESA3000LOW20-03-202320MAR202324185109601SEP2023EsaLOWlowOBJ003-Esa7.58.00637 11
8OBJ0025008.1MEDIUM3ISRO4500MEDIUM15-02-202315FEB202324185112901AUG2023IsroMEDIUMmediumOBJ002-Isro24.38.41183 12
9OBJ0066507.9LOW2ISRO4000LOW15-06-202315JUN202324185100901DEC2023IsroLOWlowOBJ006-Isro15.88.29405 11
10OBJ005800.MEDIUM5JAXA5500MEDIUM10-05-202310MAY202324185104501NOV2023JaxaMEDIUMmediumOBJ005-Jaxa.8.61250 02
11OBJ0014007.8HIGH2NASA5000HIGH01-01-202301JAN202324185117401JUL2023NasaHIGHhighOBJ001-Nasa15.68.51719 13
12OBJ0047008.5HIGH2NASA6000HIGH05-04-202305APR202324185108001OCT2023NasaHIGHhighOBJ004-Nasa17.08.69951 13

15. R Code

library(dplyr)


orbital_data <- data.frame(

  Object_ID = c("OBJ001","OBJ002","OBJ003","OBJ004","OBJ005","OBJ006"),

  Orbit_Altitude = c(400, NA, 550, 700, 800, 650),

  Velocity_km_s = c(7.8, 8.1, -7.5, 8.5, NA, 7.9),

  Fees = c(5000,4500,3000,6000,5500,NA)

)


orbital_data <- orbital_data %>%

  mutate(

    Velocity_km_s = abs(Velocity_km_s),

    Orbit_Altitude = ifelse(is.na(Orbit_Altitude), 500, Orbit_Altitude),

    Fees = ifelse(is.na(Fees), 4000, Fees),

    Risk_Score = round(Velocity_km_s * 2,1)

  )


print(orbital_data)

R OUTPUT:

Object_ID           Orbit_Altitude   Velocity_km_s   Fees      Risk_Score

OBJ001               400                           7.8                   5000      15.6

OBJ002               500                           8.1                  4500       16.2

OBJ003               550                           7.5                   3000      15

OBJ004               700                           8.5                   6000      17

OBJ005               800                                                    5500   

OBJ006              650                            7.9                   4000     15.8


SAS vs R — Direct Comparison

1. Data Input Handling

·  SAS uses INPUT + DATALINES (structured, fixed schema)

·  R uses data.frame( ) (flexible, dynamic typing)

2. Error Behavior

·  SAS logs errors (_ERROR_=1) but continues execution

·  R silently converts invalid values to NA with warnings

3. Type Conversion

·  SAS: INPUT( ) function explicitly converts character → numeric

·  R: as.numeric( ) performs conversion but introduces NA automatically

4. Missing Value Handling

·  SAS uses . for numeric missing

·  R uses NA (more explicit and consistent across types)

5. Conditional Logic

·  SAS uses IF-THEN statements

·  R uses ifelse( ) or case_when( )

6. String Processing

·  SAS: STRIP, PROPCASE, UPCASE

·  R: trimws(), tools::toTitleCase(), toupper()

7.  Date Handling

·  SAS: INPUT, INTCK, INTNX (very powerful for intervals)

·  R: as.Date( ), arithmetic operations (simpler but less domain-specific)

8. Dataset Operations

·  SAS: SET, MERGE, PROC APPEND

·  R: dplyr, merge( ), rbind( ) (more flexible but requires packages)

9. Macro vs Functional Programming

·  SAS Macros automate workflows (%macro)

·  R uses functions and pipelines (dplyr, %>%)

10. Debugging & Auditability

·  SAS provides detailed logs (industry-grade validation)

·  R provides warnings but less structured audit trails

10 Key Points

1.     We created an orbital debris dataset with real-world variables like altitude, velocity, and risk.

2.     Intentional errors (like “abc” in velocity) were introduced to simulate real data issues.

3.     SAS detected errors using log messages and _ERROR_ variable.

4.     R handled the same error by converting invalid values into NA.

5.     Data cleaning was performed by replacing missing and incorrect values.

6.     Date variables were transformed and analyzed using SAS and R functions.

7.     Character and numeric transformations were applied for standardization.

8.     Dataset operations like merge, append, and transpose were executed.

9.     Fraud detection logic was implemented using SAS macros and R conditions.

10.  Final dataset was cleaned, validated, and ready for analysis.

Summary

In this project, we worked with orbital debris data that had multiple errors, just like real-world datasets. SAS handled the errors in a very structured way by showing detailed logs and continuing execution without stopping the process. This makes SAS very reliable in industries like clinical trials and aerospace where data validation is critical.

On the other hand, R provided a more flexible and modern approach. It handled errors quietly by converting invalid values into NA, which makes it easier for quick data analysis but slightly less controlled compared to SAS.

Overall, SAS is strong in data governance, validation, and audit trails, while R is powerful in flexibility, speed, and modern data manipulation. Both tools complement each other, and understanding both gives a strong advantage in real-world data projects.

Conclusion

This project clearly demonstrates that Advanced SAS Programming is extremely powerful in handling real-world data challenges. Through structured steps, we:

·  Created a dataset with intentional errors

·  Detected inconsistencies

·  Applied correction logic

·  Used advanced functions and macros

·  Ensured data integrity

·  Replicated logic in R

Key Takeaways

·  SAS excels in data cleaning and validation

·  Macros automate complex logic

·  Functions ensure precision

·  Procedures enable structured analysis


SAS INTERVIEW QUESTIONS

1. How do FIRST. and LAST. work?

Answer:

FIRST. and LAST. are temporary variables automatically created by SAS when using a BY statement after sorting data.

  • FIRST.variable = 1 for the first observation of each BY group
  • LAST.variable = 1 for the last observation of each BY group

Example: Used for group totals or identifying first/last records in a group.

2. Difference between CALL SYMPUT and CALL SYMPUTX?

Answer:

Feature

CALL SYMPUT

CALL SYMPUTX

Spaces

Keeps leading/trailing spaces

Removes spaces automatically

Type conversion

May cause numeric-to-character notes

Automatically converts values

Usage

Older method

Recommended method

Short: CALL SYMPUTX is an improved version of CALL SYMPUT.

3. Difference between INPUT and PUT functions?

Answer:

Function

Purpose

INPUT

Converts     character → numeric

PUT

Converts     numeric → character

Example:

  • INPUT(char_var,8.) → character to numeric
  • PUT(num_var,8.) → numeric to character.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent DIGITAL ADVERTISING data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

·  Clinical SAS Programmer

·  Research Data Analyst

·  Regulatory Data Validator

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.Which Country Truly Dominates the Olympics? – A Complete SAS Medal Efficiency Analytics Project

2.Which Airports Are Really the Busiest? – An End-to-End SAS Airport Traffic Analytics Project

3.Can Data Predict Election Outcomes? – A Complete SAS Voting Analytics Project

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy


Comments

Popular posts from this blog

412.Can We Build And Clean A University Course Analytics & Fraud Detection System In Sas Using Only Macros While Intentionally Creating And Fixing Errors?

420.Can We Detect Errors, Prevent Fraud, And Optimize Biometric Access System Security Using Advanced SAS Programming?

418.Can We Design, Debug, Detect Fraud, And Optimize A Smart Parking System Using Advanced SAS Programming Techniques?