374.How Can SAS Be Used to Analyze Planetary Habitability?

How Can SAS Be Used to Analyze Planetary Habitability?



HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS ---DATA STEP | PROC SQL | MACROS | SGPLOT | DATE FUNCTIONS (MDY-INTCK-INTNX) |  APPEND | TRANSPOSE | PROC UNIVARIATE

1. BUSINESS / SCIENTIFIC CONTEXT

Why a “Planet Dataset”?

Although SAS is heavily used in clinical trials, banking, and insurance, interviewers often test conceptual strength using neutral domains like astronomy, geography, or products.

A Planet Analytics dataset is excellent because:

  • It is non-clinical
  • It contains continuous + categorical variables
  • It naturally supports:
    • classification
    • aggregation
    • statistical analysis
    • visualization
    • macro automation

This project simulates how SAS would be used to:

  • analyze planetary characteristics
  • classify habitability
  • study distance vs temperature
  • prepare summary statistics
  • automate business logic using macros

2. DATASET DESIGN

Variables Included

Variable

Description

Planet_Name

Name of the planet

Type

Terrestrial / Gas Giant / Ice Giant / Dwarf

Distance

Distance from Sun (Million km)

Gravity

Surface gravity (m/s²)

Temperature

Average temperature (°C)

Habitability_Index

Numeric score (0–100)

Discovery_Date

SAS date

Observation_Date

Derived using INTNX



3. COMPLETE SAS CODE

STEP 1: CREATE BASE PLANET DATASET (Using DATA STEP + MDY)

data planets_base;

    length Planet_Name $15 Type $12;

    format Discovery_Date Observation_Date date9.;

    input Planet_Name $ Type $ Distance Gravity Temperature Habitability_Index 

          Discovery_Date :date9.;

    Observation_Date = intnx('year', Discovery_Date, 5, 'same');

    datalines;

Mercury Terrestrial 57.9 3.7 167 5 01JAN1631

Venus Terrestrial 108.2 8.9 464 2 01JAN1610

Earth Terrestrial 149.6 9.8 15 95 01JAN0001

Mars Terrestrial 227.9 3.7 -65 60 01JAN1659

Jupiter GasGiant 778.5 24.8 -110 1 01JAN1610

Saturn GasGiant 1434 10.4 -140 1 01JAN1610

Uranus IceGiant 2871 8.7 -195 0 01JAN1781

Neptune IceGiant 4495 11.2 -200 0 01JAN1846

Pluto Dwarf 5906 0.6 -225 0 01JAN1930

Kepler22b Exoplanet 600000 11.0 22 85 05DEC2011

ProximaB Exoplanet 40100 11.0 -39 70 24AUG2016

TRAPPIST1e Exoplanet 39000 9.1 -1 88 22FEB2017

TRAPPIST1f Exoplanet 39000 8.3 -27 75 22FEB2017

TRAPPIST1g Exoplanet 39000 8.9 -75 65 22FEB2017

Kepler452b Exoplanet 1400000 19.6 10 90 23JUL2015

Gliese667Cc Exoplanet 230000 14.3 -25 80 21NOV2011

;

run;

proc print data=planets_base;

run;

OUTPUT:

ObsPlanet_NameTypeDiscovery_DateObservation_DateDistanceGravityTemperatureHabitability_Index
1MercuryTerrestrial01JAN163101JAN163657.93.71675
2VenusTerrestrial01JAN161001JAN1615108.28.94642
3EarthTerrestrial01JAN200101JAN2006149.69.81595
4MarsTerrestrial01JAN165901JAN1664227.93.7-6560
5JupiterGasGiant01JAN161001JAN1615778.524.8-1101
6SaturnGasGiant01JAN161001JAN16151434.010.4-1401
7UranusIceGiant01JAN178101JAN17862871.08.7-1950
8NeptuneIceGiant01JAN184601JAN18514495.011.2-2000
9PlutoDwarf01JAN193001JAN19355906.00.6-2250
10Kepler22bExoplanet05DEC201105DEC2016600000.011.02285
11ProximaBExoplanet24AUG201624AUG202140100.011.0-3970
12TRAPPIST1eExoplanet22FEB201722FEB202239000.09.1-188
13TRAPPIST1fExoplanet22FEB201722FEB202239000.08.3-2775
14TRAPPIST1gExoplanet22FEB201722FEB202239000.08.9-7565
15Kepler452bExoplanet23JUL201523JUL20201400000.019.61090
16Gliese667CcExoplanet21NOV201121NOV2016230000.014.3-2580

Why DATA step here?

·       Full control over:

o   variable lengths

o   formats

o   date logic

·       Faster than SQL for row-level derivations

Each observation is read, processed, and written sequentially.

Intnx Used to:

·       shift dates forward/backward

·       maintain alignment (same, beginning, end)


STEP 2: CREATE ADDITIONAL DATASET & APPEND

data planets_new;

    length Planet_Name $15 Type $12;

    format Discovery_Date date9.;

    Planet_Name = "LHS1140b";

    Type = "Exoplanet";

    Distance = 41000;

    Gravity = 15.0;

    Temperature = -10;

    Habitability_Index = 82;

    Discovery_Date = mdy(4,19,2017);

run;

proc print data=planets_new;

run;

OUTPUT:

ObsPlanet_NameTypeDiscovery_DateDistanceGravityTemperatureHabitability_Index
1LHS1140bExoplanet19APR20174100015-1082


MDY() Used to:

·  Converts month/day/year → SAS date

·  Stored as number of days since 01JAN1960


proc append base=planets_base 

            data=planets_new force;

run;

proc print data=planets_base;

run;

OUTPUT:

ObsPlanet_NameTypeDiscovery_DateObservation_DateDistanceGravityTemperatureHabitability_Index
1MercuryTerrestrial01JAN163101JAN163657.93.71675
2VenusTerrestrial01JAN161001JAN1615108.28.94642
3EarthTerrestrial01JAN200101JAN2006149.69.81595
4MarsTerrestrial01JAN165901JAN1664227.93.7-6560
5JupiterGasGiant01JAN161001JAN1615778.524.8-1101
6SaturnGasGiant01JAN161001JAN16151434.010.4-1401
7UranusIceGiant01JAN178101JAN17862871.08.7-1950
8NeptuneIceGiant01JAN184601JAN18514495.011.2-2000
9PlutoDwarf01JAN193001JAN19355906.00.6-2250
10Kepler22bExoplanet05DEC201105DEC2016600000.011.02285
11ProximaBExoplanet24AUG201624AUG202140100.011.0-3970
12TRAPPIST1eExoplanet22FEB201722FEB202239000.09.1-188
13TRAPPIST1fExoplanet22FEB201722FEB202239000.08.3-2775
14TRAPPIST1gExoplanet22FEB201722FEB202239000.08.9-7565
15Kepler452bExoplanet23JUL201523JUL20201400000.019.61090
16Gliese667CcExoplanet21NOV201121NOV2016230000.014.3-2580
17LHS1140bExoplanet19APR2017.41000.015.0-1082

APPEND

·       Faster than SET

·       Does NOT read base dataset

·       Common in production pipelines


STEP 3: HABITABLE ZONE CLASSIFICATION MACRO

%macro habitable_zone(input=, output=);

data &output;

    set &input;

    length Habitable_Zone $17;

    if Habitability_Index >= 80 then Habitable_Zone = "Highly Habitable";

    else if Habitability_Index >= 50 then Habitable_Zone = "Potential";

    else Habitable_Zone = "Non-Habitable";

run;

proc print data=&output;

run;

%mend habitable_zone;


%habitable_zone(input=planets_base, output=planets_classified);

OUTPUT:

ObsPlanet_NameTypeDiscovery_DateObservation_DateDistanceGravityTemperatureHabitability_IndexHabitable_Zone
1MercuryTerrestrial01JAN163101JAN163657.93.71675Non-Habitable
2VenusTerrestrial01JAN161001JAN1615108.28.94642Non-Habitable
3EarthTerrestrial01JAN200101JAN2006149.69.81595Highly Habitable
4MarsTerrestrial01JAN165901JAN1664227.93.7-6560Potential
5JupiterGasGiant01JAN161001JAN1615778.524.8-1101Non-Habitable
6SaturnGasGiant01JAN161001JAN16151434.010.4-1401Non-Habitable
7UranusIceGiant01JAN178101JAN17862871.08.7-1950Non-Habitable
8NeptuneIceGiant01JAN184601JAN18514495.011.2-2000Non-Habitable
9PlutoDwarf01JAN193001JAN19355906.00.6-2250Non-Habitable
10Kepler22bExoplanet05DEC201105DEC2016600000.011.02285Highly Habitable
11ProximaBExoplanet24AUG201624AUG202140100.011.0-3970Potential
12TRAPPIST1eExoplanet22FEB201722FEB202239000.09.1-188Highly Habitable
13TRAPPIST1fExoplanet22FEB201722FEB202239000.08.3-2775Potential
14TRAPPIST1gExoplanet22FEB201722FEB202239000.08.9-7565Potential
15Kepler452bExoplanet23JUL201523JUL20201400000.019.61090Highly Habitable
16Gliese667CcExoplanet21NOV201121NOV2016230000.014.3-2580Highly Habitable
17LHS1140bExoplanet19APR2017.41000.015.0-1082Highly Habitable

Macro:

  • avoids repetition
  • enforces consistency
  • allows parameter-driven logic

STEP 4: PROC SQL – ANALYTICS

proc sql;

    create table planet_summary as

    select 

        Type,

        count(*) as Planet_Count,

        avg(Gravity) as Avg_Gravity format=6.2,

        avg(Temperature) as Avg_Temp format=6.1

    from planets_classified

    group by Type;

quit;

proc print data=planet_summary;

run;

OUTPUT:

ObsTypePlanet_CountAvg_GravityAvg_Temp
1Dwarf10.60-225.0
2Exoplanet812.15-18.1
3GasGiant217.60-125.0
4IceGiant29.95-197.5
5Terrestrial46.53145.3

PROC SQL VS DATA STEP

PROC SQL

DATA STEP

Set-based

Row-based

Aggregation

Derivation

Joins

Conditional logic

Easier grouping

More control

·       In interviews, saying “I choose SQL for summarization and DATA step for derivation” is a strong answer.


STEP 5: DATE ANALYSIS USING INTCK

data planets_dates;

    set planets_classified;

    Years_Since_Discovery = intck('year', Discovery_Date, today());

run;

proc print data=planets_dates;

run;

OUTPUT:

ObsPlanet_NameTypeDiscovery_DateObservation_DateDistanceGravityTemperatureHabitability_IndexHabitable_ZoneYears_Since_Discovery
1MercuryTerrestrial01JAN163101JAN163657.93.71675Non-Habitable395
2VenusTerrestrial01JAN161001JAN1615108.28.94642Non-Habitable416
3EarthTerrestrial01JAN200101JAN2006149.69.81595Highly Habitable25
4MarsTerrestrial01JAN165901JAN1664227.93.7-6560Potential367
5JupiterGasGiant01JAN161001JAN1615778.524.8-1101Non-Habitable416
6SaturnGasGiant01JAN161001JAN16151434.010.4-1401Non-Habitable416
7UranusIceGiant01JAN178101JAN17862871.08.7-1950Non-Habitable245
8NeptuneIceGiant01JAN184601JAN18514495.011.2-2000Non-Habitable180
9PlutoDwarf01JAN193001JAN19355906.00.6-2250Non-Habitable96
10Kepler22bExoplanet05DEC201105DEC2016600000.011.02285Highly Habitable15
11ProximaBExoplanet24AUG201624AUG202140100.011.0-3970Potential10
12TRAPPIST1eExoplanet22FEB201722FEB202239000.09.1-188Highly Habitable9
13TRAPPIST1fExoplanet22FEB201722FEB202239000.08.3-2775Potential9
14TRAPPIST1gExoplanet22FEB201722FEB202239000.08.9-7565Potential9
15Kepler452bExoplanet23JUL201523JUL20201400000.019.61090Highly Habitable11
16Gliese667CcExoplanet21NOV201121NOV2016230000.014.3-2580Highly Habitable15
17LHS1140bExoplanet19APR2017.41000.015.0-1082Highly Habitable9

Intck Counts:

  • number of year boundaries crossed
  • commonly used in:
    • age calculation
    • study duration
    • exposure time

STEP 7: PROC UNIVARIATE

proc univariate data=planets_dates;

    var Habitability_Index;

    histogram / normal;

run;

OUTPUT:

The UNIVARIATE Procedure

Variable: Habitability_Index

Moments
N17Sum Weights17
Mean47Sum Observations799
Std Deviation40.346933Variance1627.875
Skewness-0.2550592Kurtosis-2.0020696
Uncorrected SS63599Corrected SS26046
Coeff Variation85.8445382Std Error Mean9.78556861
Basic Statistical Measures
LocationVariability
Mean47.00000Std Deviation40.34693
Median65.00000Variance1628
Mode0.00000Range95.00000
  Interquartile Range81.00000
Tests for Location: Mu0=0
TestStatisticp Value
Student's tt4.802991Pr > |t|0.0002
SignM7Pr >= |M|0.0001
Signed RankS52.5Pr >= |S|0.0001
Quantiles (Definition 5)
LevelQuantile
100% Max95
99%95
95%95
90%90
75% Q382
50% Median65
25% Q11
10%0
5%0
1%0
0% Min0
Extreme Observations
LowestHighest
ValueObsValueObs
098217
088510
078812
169015
15953

The UNIVARIATE Procedure

Histogram for Habitability_Index


The UNIVARIATE Procedure

Fitted Normal Distribution for Habitability_Index

Parameters for Normal Distribution
ParameterSymbolEstimate
MeanMu47
Std DevSigma40.34693
Goodness-of-Fit Tests for Normal Distribution
TestStatisticp Value
Kolmogorov-SmirnovD0.26282028Pr > D<0.010
Cramer-von MisesW-Sq0.26981561Pr > W-Sq<0.005
Anderson-DarlingA-Sq1.64457364Pr > A-Sq<0.005
Quantiles for Normal Distribution
PercentQuantile
ObservedEstimated
1.00.0000-46.86100
5.00.0000-19.36480
10.00.0000-4.70668
25.01.000019.78641
50.065.000047.00000
75.082.000074.21359
90.090.000098.70668
95.095.0000113.36480
99.095.0000140.86100
PROC UNIVARIATE

·       Distribution analysis

·       Normality testing

·       Outlier detection

STEP 8: PROC SGPLOT – VISUALIZATION

proc sgplot data=planets_dates;

    scatter x=Distance y=Temperature / group=Habitable_Zone;

    xaxis label="Distance from Sun (Million km)";

    yaxis label="Average Temperature (°C)";

    title "Planet Distance vs Temperature by Habitability";

run;

OUTPUT:

The SGPlot Procedure

Scatter plot helps answer:

  • Are distant planets colder?
  • Do habitable planets cluster?

Visualization is increasingly important in:

  • clinical reviews
  • management reporting
  • regulatory storytelling

STEP 9: PROC TRANSPOSE

proc transpose data=planet_summary out=planet_summary_t;

    by Type NotSorted;

    var Planet_Count Avg_Gravity Avg_Temp;

run;

proc print data=planet_summary_t;

run;

OUTPUT:

ObsType_NAME_COL1
1DwarfPlanet_Count1.000
2DwarfAvg_Gravity0.600
3DwarfAvg_Temp-225.000
4ExoplanetPlanet_Count8.000
5ExoplanetAvg_Gravity12.150
6ExoplanetAvg_Temp-18.125
7GasGiantPlanet_Count2.000
8GasGiantAvg_Gravity17.600
9GasGiantAvg_Temp-125.000
10IceGiantPlanet_Count2.000
11IceGiantAvg_Gravity9.950
12IceGiantAvg_Temp-197.500
13TerrestrialPlanet_Count4.000
14TerrestrialAvg_Gravity6.525
15TerrestrialAvg_Temp145.250

Transpose:

  • Converts rows → columns
  • Required for:
    • reporting tables
    • shells
    • regulatory outputs

4. INTERVIEW QUESTIONS THIS PROJECT PREPARES YOU FOR

1.     Difference between INTNX and INTCK?

2.     When to use PROC SQL vs DATA step?

3.     Why use macros?

4.     Difference between SET and APPEND?

5.     How SAS handles dates internally?

6.     How to classify data dynamically?

7.     How to perform EDA in SAS?

8.     How to visualize relationships?

9.     How to structure reusable code?


Conclusion:

This project demonstrated an end-to-end SAS workflow using a planetary dataset, covering data creation, date handling, macros, SQL-based summarization, statistical analysis, and visualization. By integrating multiple SAS procedures and functions in a single, coherent use case, it highlights how complex business or scientific logic can be automated, analyzed, and clearly presented—skills directly transferable to real-world SAS programming and interview scenarios.


About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent real Planetory data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

SAS Programmer Interviews

SAS Programmer Job Seekers

SAS Analysts


Follow Us On : 


 


--->Follow our blog for more SAS-based analytics projects and industry data models.


To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:








Comments

Popular posts from this blog

184.HOW TO CREATE MOCK SHELLS AND CLINICAL DATA LISTINGS IN SAS USING DATA NULL | PROC PRINT | PROC REPORT | PROC SORT | PROC COMPARE | ODS PDF | ODS RTF | ODS EXCEL | A COMPLETE STEP-BY-STEP GUIDE FOR CLINICAL SAS PROGRAMMERS

62.ADDING NEW DATA INTO AVAILABLE DATA USING MERGE

341.FAMOUS SCIENTISTS DATA ANALYSIS USING PROC SQL | PROC MEANS | PROC RANK | PROC FORMAT | PROC SGPLOT | MACROS | DATE FUNCTIONS FOR IMPACT EVALUATION | RANKING INSIGHTS | TIME-BASED SCIENTIFIC CONTRIBUTION STUDY