374.How Can SAS Be Used to Analyze Planetary Habitability?
How Can SAS Be Used to Analyze Planetary Habitability?
1. BUSINESS / SCIENTIFIC
CONTEXT
Why a “Planet Dataset”?
Although
SAS is heavily used in clinical trials, banking, and insurance,
interviewers often test conceptual strength using neutral domains
like astronomy, geography, or products.
A Planet
Analytics dataset is excellent because:
- It is non-clinical
- It contains continuous +
categorical variables
- It naturally supports:
- classification
- aggregation
- statistical analysis
- visualization
- macro automation
This project
simulates how SAS would be used to:
- analyze planetary
characteristics
- classify habitability
- study distance vs
temperature
- prepare summary statistics
- automate business logic
using macros
2. DATASET DESIGN
Variables Included
|
Variable |
Description |
|
Planet_Name |
Name of the planet |
|
Type |
Terrestrial / Gas Giant / Ice Giant / Dwarf |
|
Distance |
Distance from Sun (Million km) |
|
Gravity |
Surface gravity (m/s²) |
|
Temperature |
Average temperature (°C) |
|
Habitability_Index |
Numeric score (0–100) |
|
Discovery_Date |
SAS date |
|
Observation_Date |
Derived using INTNX |
3. COMPLETE SAS CODE
STEP 1: CREATE BASE PLANET DATASET (Using DATA STEP + MDY)
data planets_base;
length Planet_Name $15 Type $12;
format Discovery_Date Observation_Date date9.;
input Planet_Name $ Type $ Distance Gravity Temperature Habitability_Index
Discovery_Date :date9.;
Observation_Date = intnx('year', Discovery_Date, 5, 'same');
datalines;
Mercury Terrestrial 57.9 3.7 167 5 01JAN1631
Venus Terrestrial 108.2 8.9 464 2 01JAN1610
Earth Terrestrial 149.6 9.8 15 95 01JAN0001
Mars Terrestrial 227.9 3.7 -65 60 01JAN1659
Jupiter GasGiant 778.5 24.8 -110 1 01JAN1610
Saturn GasGiant 1434 10.4 -140 1 01JAN1610
Uranus IceGiant 2871 8.7 -195 0 01JAN1781
Neptune IceGiant 4495 11.2 -200 0 01JAN1846
Pluto Dwarf 5906 0.6 -225 0 01JAN1930
Kepler22b Exoplanet 600000 11.0 22 85 05DEC2011
ProximaB Exoplanet 40100 11.0 -39 70 24AUG2016
TRAPPIST1e Exoplanet 39000 9.1 -1 88 22FEB2017
TRAPPIST1f Exoplanet 39000 8.3 -27 75 22FEB2017
TRAPPIST1g Exoplanet 39000 8.9 -75 65 22FEB2017
Kepler452b Exoplanet 1400000 19.6 10 90 23JUL2015
Gliese667Cc Exoplanet 230000 14.3 -25 80 21NOV2011
;
run;
proc print data=planets_base;
run;
OUTPUT:
| Obs | Planet_Name | Type | Discovery_Date | Observation_Date | Distance | Gravity | Temperature | Habitability_Index |
|---|---|---|---|---|---|---|---|---|
| 1 | Mercury | Terrestrial | 01JAN1631 | 01JAN1636 | 57.9 | 3.7 | 167 | 5 |
| 2 | Venus | Terrestrial | 01JAN1610 | 01JAN1615 | 108.2 | 8.9 | 464 | 2 |
| 3 | Earth | Terrestrial | 01JAN2001 | 01JAN2006 | 149.6 | 9.8 | 15 | 95 |
| 4 | Mars | Terrestrial | 01JAN1659 | 01JAN1664 | 227.9 | 3.7 | -65 | 60 |
| 5 | Jupiter | GasGiant | 01JAN1610 | 01JAN1615 | 778.5 | 24.8 | -110 | 1 |
| 6 | Saturn | GasGiant | 01JAN1610 | 01JAN1615 | 1434.0 | 10.4 | -140 | 1 |
| 7 | Uranus | IceGiant | 01JAN1781 | 01JAN1786 | 2871.0 | 8.7 | -195 | 0 |
| 8 | Neptune | IceGiant | 01JAN1846 | 01JAN1851 | 4495.0 | 11.2 | -200 | 0 |
| 9 | Pluto | Dwarf | 01JAN1930 | 01JAN1935 | 5906.0 | 0.6 | -225 | 0 |
| 10 | Kepler22b | Exoplanet | 05DEC2011 | 05DEC2016 | 600000.0 | 11.0 | 22 | 85 |
| 11 | ProximaB | Exoplanet | 24AUG2016 | 24AUG2021 | 40100.0 | 11.0 | -39 | 70 |
| 12 | TRAPPIST1e | Exoplanet | 22FEB2017 | 22FEB2022 | 39000.0 | 9.1 | -1 | 88 |
| 13 | TRAPPIST1f | Exoplanet | 22FEB2017 | 22FEB2022 | 39000.0 | 8.3 | -27 | 75 |
| 14 | TRAPPIST1g | Exoplanet | 22FEB2017 | 22FEB2022 | 39000.0 | 8.9 | -75 | 65 |
| 15 | Kepler452b | Exoplanet | 23JUL2015 | 23JUL2020 | 1400000.0 | 19.6 | 10 | 90 |
| 16 | Gliese667Cc | Exoplanet | 21NOV2011 | 21NOV2016 | 230000.0 | 14.3 | -25 | 80 |
Why DATA step here?
·
Full control over:
o variable
lengths
o formats
o date
logic
·
Faster than SQL for row-level derivations
Each
observation is read, processed, and written sequentially.
Intnx Used to:
·
shift dates forward/backward
·
maintain alignment (same, beginning,
end)
STEP 2: CREATE ADDITIONAL DATASET & APPEND
data planets_new;
length Planet_Name $15 Type $12;
format Discovery_Date date9.;
Planet_Name = "LHS1140b";
Type = "Exoplanet";
Distance = 41000;
Gravity = 15.0;
Temperature = -10;
Habitability_Index = 82;
Discovery_Date = mdy(4,19,2017);
run;
proc print data=planets_new;
run;
OUTPUT:
| Obs | Planet_Name | Type | Discovery_Date | Distance | Gravity | Temperature | Habitability_Index |
|---|---|---|---|---|---|---|---|
| 1 | LHS1140b | Exoplanet | 19APR2017 | 41000 | 15 | -10 | 82 |
MDY() Used to:
· Converts
month/day/year → SAS date
· Stored as number of days since 01JAN1960
proc append base=planets_base
data=planets_new force;
run;
proc print data=planets_base;
run;
OUTPUT:
| Obs | Planet_Name | Type | Discovery_Date | Observation_Date | Distance | Gravity | Temperature | Habitability_Index |
|---|---|---|---|---|---|---|---|---|
| 1 | Mercury | Terrestrial | 01JAN1631 | 01JAN1636 | 57.9 | 3.7 | 167 | 5 |
| 2 | Venus | Terrestrial | 01JAN1610 | 01JAN1615 | 108.2 | 8.9 | 464 | 2 |
| 3 | Earth | Terrestrial | 01JAN2001 | 01JAN2006 | 149.6 | 9.8 | 15 | 95 |
| 4 | Mars | Terrestrial | 01JAN1659 | 01JAN1664 | 227.9 | 3.7 | -65 | 60 |
| 5 | Jupiter | GasGiant | 01JAN1610 | 01JAN1615 | 778.5 | 24.8 | -110 | 1 |
| 6 | Saturn | GasGiant | 01JAN1610 | 01JAN1615 | 1434.0 | 10.4 | -140 | 1 |
| 7 | Uranus | IceGiant | 01JAN1781 | 01JAN1786 | 2871.0 | 8.7 | -195 | 0 |
| 8 | Neptune | IceGiant | 01JAN1846 | 01JAN1851 | 4495.0 | 11.2 | -200 | 0 |
| 9 | Pluto | Dwarf | 01JAN1930 | 01JAN1935 | 5906.0 | 0.6 | -225 | 0 |
| 10 | Kepler22b | Exoplanet | 05DEC2011 | 05DEC2016 | 600000.0 | 11.0 | 22 | 85 |
| 11 | ProximaB | Exoplanet | 24AUG2016 | 24AUG2021 | 40100.0 | 11.0 | -39 | 70 |
| 12 | TRAPPIST1e | Exoplanet | 22FEB2017 | 22FEB2022 | 39000.0 | 9.1 | -1 | 88 |
| 13 | TRAPPIST1f | Exoplanet | 22FEB2017 | 22FEB2022 | 39000.0 | 8.3 | -27 | 75 |
| 14 | TRAPPIST1g | Exoplanet | 22FEB2017 | 22FEB2022 | 39000.0 | 8.9 | -75 | 65 |
| 15 | Kepler452b | Exoplanet | 23JUL2015 | 23JUL2020 | 1400000.0 | 19.6 | 10 | 90 |
| 16 | Gliese667Cc | Exoplanet | 21NOV2011 | 21NOV2016 | 230000.0 | 14.3 | -25 | 80 |
| 17 | LHS1140b | Exoplanet | 19APR2017 | . | 41000.0 | 15.0 | -10 | 82 |
APPEND
·
Faster than SET
·
Does NOT read base dataset
·
Common in production pipelines
STEP 3: HABITABLE ZONE CLASSIFICATION MACRO
%macro habitable_zone(input=, output=);
data &output;
set &input;
length Habitable_Zone $17;
if Habitability_Index >= 80 then Habitable_Zone = "Highly Habitable";
else if Habitability_Index >= 50 then Habitable_Zone = "Potential";
else Habitable_Zone = "Non-Habitable";
run;
proc print data=&output;
run;
%mend habitable_zone;
%habitable_zone(input=planets_base, output=planets_classified);
OUTPUT:
| Obs | Planet_Name | Type | Discovery_Date | Observation_Date | Distance | Gravity | Temperature | Habitability_Index | Habitable_Zone |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Mercury | Terrestrial | 01JAN1631 | 01JAN1636 | 57.9 | 3.7 | 167 | 5 | Non-Habitable |
| 2 | Venus | Terrestrial | 01JAN1610 | 01JAN1615 | 108.2 | 8.9 | 464 | 2 | Non-Habitable |
| 3 | Earth | Terrestrial | 01JAN2001 | 01JAN2006 | 149.6 | 9.8 | 15 | 95 | Highly Habitable |
| 4 | Mars | Terrestrial | 01JAN1659 | 01JAN1664 | 227.9 | 3.7 | -65 | 60 | Potential |
| 5 | Jupiter | GasGiant | 01JAN1610 | 01JAN1615 | 778.5 | 24.8 | -110 | 1 | Non-Habitable |
| 6 | Saturn | GasGiant | 01JAN1610 | 01JAN1615 | 1434.0 | 10.4 | -140 | 1 | Non-Habitable |
| 7 | Uranus | IceGiant | 01JAN1781 | 01JAN1786 | 2871.0 | 8.7 | -195 | 0 | Non-Habitable |
| 8 | Neptune | IceGiant | 01JAN1846 | 01JAN1851 | 4495.0 | 11.2 | -200 | 0 | Non-Habitable |
| 9 | Pluto | Dwarf | 01JAN1930 | 01JAN1935 | 5906.0 | 0.6 | -225 | 0 | Non-Habitable |
| 10 | Kepler22b | Exoplanet | 05DEC2011 | 05DEC2016 | 600000.0 | 11.0 | 22 | 85 | Highly Habitable |
| 11 | ProximaB | Exoplanet | 24AUG2016 | 24AUG2021 | 40100.0 | 11.0 | -39 | 70 | Potential |
| 12 | TRAPPIST1e | Exoplanet | 22FEB2017 | 22FEB2022 | 39000.0 | 9.1 | -1 | 88 | Highly Habitable |
| 13 | TRAPPIST1f | Exoplanet | 22FEB2017 | 22FEB2022 | 39000.0 | 8.3 | -27 | 75 | Potential |
| 14 | TRAPPIST1g | Exoplanet | 22FEB2017 | 22FEB2022 | 39000.0 | 8.9 | -75 | 65 | Potential |
| 15 | Kepler452b | Exoplanet | 23JUL2015 | 23JUL2020 | 1400000.0 | 19.6 | 10 | 90 | Highly Habitable |
| 16 | Gliese667Cc | Exoplanet | 21NOV2011 | 21NOV2016 | 230000.0 | 14.3 | -25 | 80 | Highly Habitable |
| 17 | LHS1140b | Exoplanet | 19APR2017 | . | 41000.0 | 15.0 | -10 | 82 | Highly Habitable |
Macro:
- avoids repetition
- enforces consistency
- allows parameter-driven logic
STEP 4: PROC SQL – ANALYTICS
proc sql;
create table planet_summary as
select
Type,
count(*) as Planet_Count,
avg(Gravity) as Avg_Gravity format=6.2,
avg(Temperature) as Avg_Temp format=6.1
from planets_classified
group by Type;
quit;
proc print data=planet_summary;
run;
OUTPUT:
| Obs | Type | Planet_Count | Avg_Gravity | Avg_Temp |
|---|---|---|---|---|
| 1 | Dwarf | 1 | 0.60 | -225.0 |
| 2 | Exoplanet | 8 | 12.15 | -18.1 |
| 3 | GasGiant | 2 | 17.60 | -125.0 |
| 4 | IceGiant | 2 | 9.95 | -197.5 |
| 5 | Terrestrial | 4 | 6.53 | 145.3 |
PROC SQL VS DATA STEP
|
PROC
SQL |
DATA
STEP |
|
Set-based |
Row-based |
|
Aggregation |
Derivation |
|
Joins |
Conditional
logic |
|
Easier
grouping |
More
control |
·
In
interviews, saying “I choose SQL for summarization and DATA step for
derivation” is a strong answer.
STEP 5: DATE ANALYSIS USING INTCK
data planets_dates;
set planets_classified;
Years_Since_Discovery = intck('year', Discovery_Date, today());
run;
proc print data=planets_dates;
run;
OUTPUT:
| Obs | Planet_Name | Type | Discovery_Date | Observation_Date | Distance | Gravity | Temperature | Habitability_Index | Habitable_Zone | Years_Since_Discovery |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mercury | Terrestrial | 01JAN1631 | 01JAN1636 | 57.9 | 3.7 | 167 | 5 | Non-Habitable | 395 |
| 2 | Venus | Terrestrial | 01JAN1610 | 01JAN1615 | 108.2 | 8.9 | 464 | 2 | Non-Habitable | 416 |
| 3 | Earth | Terrestrial | 01JAN2001 | 01JAN2006 | 149.6 | 9.8 | 15 | 95 | Highly Habitable | 25 |
| 4 | Mars | Terrestrial | 01JAN1659 | 01JAN1664 | 227.9 | 3.7 | -65 | 60 | Potential | 367 |
| 5 | Jupiter | GasGiant | 01JAN1610 | 01JAN1615 | 778.5 | 24.8 | -110 | 1 | Non-Habitable | 416 |
| 6 | Saturn | GasGiant | 01JAN1610 | 01JAN1615 | 1434.0 | 10.4 | -140 | 1 | Non-Habitable | 416 |
| 7 | Uranus | IceGiant | 01JAN1781 | 01JAN1786 | 2871.0 | 8.7 | -195 | 0 | Non-Habitable | 245 |
| 8 | Neptune | IceGiant | 01JAN1846 | 01JAN1851 | 4495.0 | 11.2 | -200 | 0 | Non-Habitable | 180 |
| 9 | Pluto | Dwarf | 01JAN1930 | 01JAN1935 | 5906.0 | 0.6 | -225 | 0 | Non-Habitable | 96 |
| 10 | Kepler22b | Exoplanet | 05DEC2011 | 05DEC2016 | 600000.0 | 11.0 | 22 | 85 | Highly Habitable | 15 |
| 11 | ProximaB | Exoplanet | 24AUG2016 | 24AUG2021 | 40100.0 | 11.0 | -39 | 70 | Potential | 10 |
| 12 | TRAPPIST1e | Exoplanet | 22FEB2017 | 22FEB2022 | 39000.0 | 9.1 | -1 | 88 | Highly Habitable | 9 |
| 13 | TRAPPIST1f | Exoplanet | 22FEB2017 | 22FEB2022 | 39000.0 | 8.3 | -27 | 75 | Potential | 9 |
| 14 | TRAPPIST1g | Exoplanet | 22FEB2017 | 22FEB2022 | 39000.0 | 8.9 | -75 | 65 | Potential | 9 |
| 15 | Kepler452b | Exoplanet | 23JUL2015 | 23JUL2020 | 1400000.0 | 19.6 | 10 | 90 | Highly Habitable | 11 |
| 16 | Gliese667Cc | Exoplanet | 21NOV2011 | 21NOV2016 | 230000.0 | 14.3 | -25 | 80 | Highly Habitable | 15 |
| 17 | LHS1140b | Exoplanet | 19APR2017 | . | 41000.0 | 15.0 | -10 | 82 | Highly Habitable | 9 |
Intck Counts:
- number of year boundaries
crossed
- commonly used in:
- age calculation
- study duration
- exposure time
STEP 7: PROC UNIVARIATE
proc univariate data=planets_dates;
var Habitability_Index;
histogram / normal;
run;
OUTPUT:
The UNIVARIATE Procedure
Variable: Habitability_Index
| Moments | |||
|---|---|---|---|
| N | 17 | Sum Weights | 17 |
| Mean | 47 | Sum Observations | 799 |
| Std Deviation | 40.346933 | Variance | 1627.875 |
| Skewness | -0.2550592 | Kurtosis | -2.0020696 |
| Uncorrected SS | 63599 | Corrected SS | 26046 |
| Coeff Variation | 85.8445382 | Std Error Mean | 9.78556861 |
| Basic Statistical Measures | |||
|---|---|---|---|
| Location | Variability | ||
| Mean | 47.00000 | Std Deviation | 40.34693 |
| Median | 65.00000 | Variance | 1628 |
| Mode | 0.00000 | Range | 95.00000 |
| Interquartile Range | 81.00000 | ||
| Tests for Location: Mu0=0 | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Student's t | t | 4.802991 | Pr > |t| | 0.0002 |
| Sign | M | 7 | Pr >= |M| | 0.0001 |
| Signed Rank | S | 52.5 | Pr >= |S| | 0.0001 |
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile |
| 100% Max | 95 |
| 99% | 95 |
| 95% | 95 |
| 90% | 90 |
| 75% Q3 | 82 |
| 50% Median | 65 |
| 25% Q1 | 1 |
| 10% | 0 |
| 5% | 0 |
| 1% | 0 |
| 0% Min | 0 |
| Extreme Observations | |||
|---|---|---|---|
| Lowest | Highest | ||
| Value | Obs | Value | Obs |
| 0 | 9 | 82 | 17 |
| 0 | 8 | 85 | 10 |
| 0 | 7 | 88 | 12 |
| 1 | 6 | 90 | 15 |
| 1 | 5 | 95 | 3 |
The UNIVARIATE Procedure
The UNIVARIATE Procedure
Fitted Normal Distribution for Habitability_Index
| Parameters for Normal Distribution | ||
|---|---|---|
| Parameter | Symbol | Estimate |
| Mean | Mu | 47 |
| Std Dev | Sigma | 40.34693 |
| Goodness-of-Fit Tests for Normal Distribution | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Kolmogorov-Smirnov | D | 0.26282028 | Pr > D | <0.010 |
| Cramer-von Mises | W-Sq | 0.26981561 | Pr > W-Sq | <0.005 |
| Anderson-Darling | A-Sq | 1.64457364 | Pr > A-Sq | <0.005 |
| Quantiles for Normal Distribution | ||
|---|---|---|
| Percent | Quantile | |
| Observed | Estimated | |
| 1.0 | 0.0000 | -46.86100 |
| 5.0 | 0.0000 | -19.36480 |
| 10.0 | 0.0000 | -4.70668 |
| 25.0 | 1.0000 | 19.78641 |
| 50.0 | 65.0000 | 47.00000 |
| 75.0 | 82.0000 | 74.21359 |
| 90.0 | 90.0000 | 98.70668 |
| 95.0 | 95.0000 | 113.36480 |
| 99.0 | 95.0000 | 140.86100 |
·
Distribution analysis
·
Normality testing
·
Outlier detection
STEP 8: PROC SGPLOT – VISUALIZATION
proc sgplot data=planets_dates;
scatter x=Distance y=Temperature / group=Habitable_Zone;
xaxis label="Distance from Sun (Million km)";
yaxis label="Average Temperature (°C)";
title "Planet Distance vs Temperature by Habitability";
run;
OUTPUT:
Scatter
plot helps answer:
- Are distant planets colder?
- Do habitable planets
cluster?
Visualization
is increasingly important in:
- clinical reviews
- management reporting
- regulatory storytelling
STEP 9: PROC TRANSPOSE
proc transpose data=planet_summary out=planet_summary_t;
by Type NotSorted;
var Planet_Count Avg_Gravity Avg_Temp;
run;
proc print data=planet_summary_t;
run;
OUTPUT:
| Obs | Type | _NAME_ | COL1 |
|---|---|---|---|
| 1 | Dwarf | Planet_Count | 1.000 |
| 2 | Dwarf | Avg_Gravity | 0.600 |
| 3 | Dwarf | Avg_Temp | -225.000 |
| 4 | Exoplanet | Planet_Count | 8.000 |
| 5 | Exoplanet | Avg_Gravity | 12.150 |
| 6 | Exoplanet | Avg_Temp | -18.125 |
| 7 | GasGiant | Planet_Count | 2.000 |
| 8 | GasGiant | Avg_Gravity | 17.600 |
| 9 | GasGiant | Avg_Temp | -125.000 |
| 10 | IceGiant | Planet_Count | 2.000 |
| 11 | IceGiant | Avg_Gravity | 9.950 |
| 12 | IceGiant | Avg_Temp | -197.500 |
| 13 | Terrestrial | Planet_Count | 4.000 |
| 14 | Terrestrial | Avg_Gravity | 6.525 |
| 15 | Terrestrial | Avg_Temp | 145.250 |
Transpose:
- Converts rows → columns
- Required for:
- reporting tables
- shells
- regulatory outputs
4. INTERVIEW QUESTIONS THIS PROJECT PREPARES YOU FOR
1.
Difference between INTNX and INTCK?
2.
When to use PROC SQL vs DATA step?
3.
Why use macros?
4.
Difference between SET and APPEND?
5.
How SAS handles dates internally?
6.
How to classify data dynamically?
7.
How to perform EDA in SAS?
8.
How to visualize relationships?
9. How
to structure reusable code?
Conclusion:
This project demonstrated an end-to-end SAS workflow using a planetary dataset, covering data creation, date handling, macros, SQL-based summarization, statistical analysis, and visualization. By integrating multiple SAS procedures and functions in a single, coherent use case, it highlights how complex business or scientific logic can be automated, analyzed, and clearly presented—skills directly transferable to real-world SAS programming and interview scenarios.
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent real Planetory data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
SAS Programmer Interviews
SAS Programmer Job Seekers
SAS Analysts
Comments
Post a Comment