REALWORLD PLANETS DATASET USING PROC PRINT | PROC MEANS | PROC FREQ | PROC SQL | PROC SORT | PROC REPORT | MACROS WITH QC CHECKS AND DENSITY DERIVATION
/*CREATING A DATASET OF REALWORLD PLANETS */
1) Create dataset
options nocenter;
data work.real_planets;
length PlanetID $3 Name $20 Type $12 Mass_Earth 8 Radius_km 8 OrbitalPeriod_days 8
Moons 8 Has_Rings $1;
label PlanetID = 'Planet ID'
Name = 'Planet Name'
Type = 'Planet Type (Terrestrial/Gas/Ice/Dwarf)'
Mass_Earth = 'Mass (in Earth masses)'
Radius_km = 'Radius (km)'
OrbitalPeriod_days = 'Orbital Period (days)'
Moons = 'Number of known moons'
Has_Rings = 'Has Rings (Y/N)';
input PlanetID $ Name :$20. Type :$12. Mass_Earth Radius_km OrbitalPeriod_days
Moons Has_Rings :$1.;
datalines;
P01 Mercury Terrestrial 0.055 2439 88 0 N
P02 Venus Terrestrial 0.815 6052 225 0 N
P03 Earth Terrestrial 1 6371 365.25 1 N
P04 Mars Terrestrial 0.107 3390 687 2 N
P05 Jupiter Gas 317.8 69911 4332 79 Y
P06 Saturn Gas 95.2 58232 10759 83 Y
P07 Uranus Ice 14.5 25362 30685 27 Y
P08 Neptune Ice 17.1 24622 60190 14 Y
P09 Pluto Dwarf 0.0022 1188 90560 5 N
P10 Eris Dwarf 0.0028 1163 203830 1 N
P11 Haumea Dwarf 0.0007 816 103000 2 N
P12 Ceres Dwarf 0.00015 473 1680 0 N
;
run;
proc print;run;
Output:
| Obs | PlanetID | Name | Type | Mass_Earth | Radius_km | OrbitalPeriod_days | Moons | Has_Rings |
|---|---|---|---|---|---|---|---|---|
| 1 | P01 | Mercury | Terrestrial | 0.055 | 2439 | 88.00 | 0 | N |
| 2 | P02 | Venus | Terrestrial | 0.815 | 6052 | 225.00 | 0 | N |
| 3 | P03 | Earth | Terrestrial | 1.000 | 6371 | 365.25 | 1 | N |
| 4 | P04 | Mars | Terrestrial | 0.107 | 3390 | 687.00 | 2 | N |
| 5 | P05 | Jupiter | Gas | 317.800 | 69911 | 4332.00 | 79 | Y |
| 6 | P06 | Saturn | Gas | 95.200 | 58232 | 10759.00 | 83 | Y |
| 7 | P07 | Uranus | Ice | 14.500 | 25362 | 30685.00 | 27 | Y |
| 8 | P08 | Neptune | Ice | 17.100 | 24622 | 60190.00 | 14 | Y |
| 9 | P09 | Pluto | Dwarf | 0.002 | 1188 | 90560.00 | 5 | N |
| 10 | P10 | Eris | Dwarf | 0.003 | 1163 | 203830.00 | 1 | N |
| 11 | P11 | Haumea | Dwarf | 0.001 | 816 | 103000.00 | 2 | N |
| 12 | P12 | Ceres | Dwarf | 0.000 | 473 | 1680.00 | 0 | N |
2) Add a derived variable using another macro: density (mass/volume)
Macro: %derive_density(in=, out=)
Purpose: Demonstrates macro parameters and numeric derivation in DATA step.
Note: Mass given in Earth masses;
convert Earth mass (5.972e24 kg) and
radius in km to compute density g/cm3 roughly.
This is illustrative and not exact to high precision.
%macro derive_density(in=work.real_planets, out=work.real_planets2);
data &out.;
set &in.;
/* Convert mass in Earth masses to kg and radius in km to meters for volume */
Mass_kg = Mass_Earth * 5.972e24; /* mass in kg */
Radius_m = Radius_km * 1000; /* radius in meters */
Volume_m3 = (4/3) * constant('pi') * (Radius_m**3);
/* Density in kg/m^3, convert to g/cm^3 dividing by 1000 then by 1e6 -> kg/m3 to g/cm3 = /1000*/
Density_gcm3 = (Mass_kg / Volume_m3) / 1000;
format Density_gcm3 6.3 Mass_kg comma15.0 Radius_m comma12.0 Volume_m3 comma15.0;
drop Mass_kg Radius_m Volume_m3;
run;
proc print;run;
%mend derive_density;
%derive_density(in=work.real_planets, out=work.real_planets2);
Output:
| Obs | PlanetID | Name | Type | Mass_Earth | Radius_km | OrbitalPeriod_days | Moons | Has_Rings | Density_gcm3 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | P01 | Mercury | Terrestrial | 0.055 | 2439 | 88.00 | 0 | N | 5.405 |
| 2 | P02 | Venus | Terrestrial | 0.815 | 6052 | 225.00 | 0 | N | 5.242 |
| 3 | P03 | Earth | Terrestrial | 1.000 | 6371 | 365.25 | 1 | N | 5.513 |
| 4 | P04 | Mars | Terrestrial | 0.107 | 3390 | 687.00 | 2 | N | 3.916 |
| 5 | P05 | Jupiter | Gas | 317.800 | 69911 | 4332.00 | 79 | Y | 1.326 |
| 6 | P06 | Saturn | Gas | 95.200 | 58232 | 10759.00 | 83 | Y | 0.687 |
| 7 | P07 | Uranus | Ice | 14.500 | 25362 | 30685.00 | 27 | Y | 1.267 |
| 8 | P08 | Neptune | Ice | 17.100 | 24622 | 60190.00 | 14 | Y | 1.633 |
| 9 | P09 | Pluto | Dwarf | 0.002 | 1188 | 90560.00 | 5 | N | 1.871 |
| 10 | P10 | Eris | Dwarf | 0.003 | 1163 | 203830.00 | 1 | N | 2.538 |
| 11 | P11 | Haumea | Dwarf | 0.001 | 816 | 103000.00 | 2 | N | 1.837 |
| 12 | P12 | Ceres | Dwarf | 0.000 | 473 | 1680.00 | 0 | N | 2.021 |
3) PROC PRINT
Purpose: Display the dataset rows for quick inspection.
proc print data=work.real_planets2 label noobs;
title 'PROC PRINT | Purpose: Display dataset rows for quick inspection';
var PlanetID Name Type Mass_Earth Radius_km OrbitalPeriod_days Moons Has_Rings Density_gcm3;
run;
Output:
| Planet ID | Planet Name | Planet Type (Terrestrial/Gas/Ice/Dwarf) | Mass (in Earth masses) | Radius (km) | Orbital Period (days) | Number of known moons | Has Rings (Y/N) | Density_gcm3 |
|---|---|---|---|---|---|---|---|---|
| P01 | Mercury | Terrestrial | 0.055 | 2439 | 88.00 | 0 | N | 5.405 |
| P02 | Venus | Terrestrial | 0.815 | 6052 | 225.00 | 0 | N | 5.242 |
| P03 | Earth | Terrestrial | 1.000 | 6371 | 365.25 | 1 | N | 5.513 |
| P04 | Mars | Terrestrial | 0.107 | 3390 | 687.00 | 2 | N | 3.916 |
| P05 | Jupiter | Gas | 317.800 | 69911 | 4332.00 | 79 | Y | 1.326 |
| P06 | Saturn | Gas | 95.200 | 58232 | 10759.00 | 83 | Y | 0.687 |
| P07 | Uranus | Ice | 14.500 | 25362 | 30685.00 | 27 | Y | 1.267 |
| P08 | Neptune | Ice | 17.100 | 24622 | 60190.00 | 14 | Y | 1.633 |
| P09 | Pluto | Dwarf | 0.002 | 1188 | 90560.00 | 5 | N | 1.871 |
| P10 | Eris | Dwarf | 0.003 | 1163 | 203830.00 | 1 | N | 2.538 |
| P11 | Haumea | Dwarf | 0.001 | 816 | 103000.00 | 2 | N | 1.837 |
| P12 | Ceres | Dwarf | 0.000 | 473 | 1680.00 | 0 | N | 2.021 |
4) PROC MEANS
Purpose: Provide numeric summaries (mean, min, max, std) for continuous variables.
proc means data=work.real_planets2 mean median min max std n nmiss;
title 'PROC MEANS | Purpose: Provide numeric summaries for continuous variables';
var Mass_Earth Radius_km OrbitalPeriod_days Moons Density_gcm3;
run;
Output:
The MEANS Procedure
| Variable | Label | Mean | Median | Minimum | Maximum | Std Dev | N | N Miss |
|---|---|---|---|---|---|---|---|---|
Mass_Earth Radius_km OrbitalPeriod_days Moons Density_gcm3 | Mass (in Earth masses) Radius (km) Orbital Period (days) Number of known moons | 37.2152375 16668.25 42200.10 17.8333333 2.7712886 | 0.4610000 4721.00 7545.50 2.0000000 1.9457874 | 0.000150000 473.0000000 88.0000000 0 0.6873580 | 317.8000000 69911.00 203830.00 83.0000000 5.5132587 | 92.4143495 23917.15 63021.65 30.5609178 1.7621593 | 12 12 12 12 12 | 0 0 0 0 0 |
5) PROC FREQ
Purpose: Frequency distribution for categorical variables like Type and Has_Rings.
proc freq data=work.real_planets2;
title 'PROC FREQ | Purpose: Frequency distribution for categorical variables';
tables Type Has_Rings / norow nocol nopercent;
run;
Output:
The FREQ Procedure
| Planet Type (Terrestrial/Gas/Ice/Dwarf) | ||
|---|---|---|
| Type | Frequency | Cumulative Frequency |
| Dwarf | 4 | 4 |
| Gas | 2 | 6 |
| Ice | 2 | 8 |
| Terrestrial | 4 | 12 |
| Has Rings (Y/N) | ||
|---|---|---|
| Has_Rings | Frequency | Cumulative Frequency |
| N | 8 | 8 |
| Y | 4 | 12 |
6) PROC SQL
Purpose: Use SQL to create aggregated summary tables and perform joins.
proc sql;
title 'PROC SQL | Purpose: Create aggregated tables using SQL syntax';
create table work.by_type as
select Type,
count(*) as Count_Planets,
mean(Mass_Earth) as Avg_Mass_Earth format=8.3,
mean(Density_gcm3) as Avg_Density format=6.3,
sum(Moons) as Total_Moons
from work.real_planets2
group by Type
order by Count_Planets desc;
quit;
proc print data=work.by_type noobs;
title 'Summary by Type (created by PROC SQL)';
run;
Output:
| Type | Count_Planets | Avg_Mass_Earth | Avg_Density | Total_Moons |
|---|---|---|---|---|
| Dwarf | 4 | 0.001 | 2.067 | 8 |
| Terrestrial | 4 | 0.494 | 5.019 | 3 |
| Ice | 2 | 15.800 | 1.450 | 41 |
| Gas | 2 | 206.500 | 1.007 | 162 |
7) PROC SORT
Purpose: Sort the dataset for reporting or analysis.
proc sort data=work.real_planets2 out=work.real_planets_sorted;
by descending Mass_Earth;
run;
proc print;run;
Output:
| Obs | PlanetID | Name | Type | Mass_Earth | Radius_km | OrbitalPeriod_days | Moons | Has_Rings | Density_gcm3 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | P05 | Jupiter | Gas | 317.800 | 69911 | 4332.00 | 79 | Y | 1.326 |
| 2 | P06 | Saturn | Gas | 95.200 | 58232 | 10759.00 | 83 | Y | 0.687 |
| 3 | P08 | Neptune | Ice | 17.100 | 24622 | 60190.00 | 14 | Y | 1.633 |
| 4 | P07 | Uranus | Ice | 14.500 | 25362 | 30685.00 | 27 | Y | 1.267 |
| 5 | P03 | Earth | Terrestrial | 1.000 | 6371 | 365.25 | 1 | N | 5.513 |
| 6 | P02 | Venus | Terrestrial | 0.815 | 6052 | 225.00 | 0 | N | 5.242 |
| 7 | P04 | Mars | Terrestrial | 0.107 | 3390 | 687.00 | 2 | N | 3.916 |
| 8 | P01 | Mercury | Terrestrial | 0.055 | 2439 | 88.00 | 0 | N | 5.405 |
| 9 | P10 | Eris | Dwarf | 0.003 | 1163 | 203830.00 | 1 | N | 2.538 |
| 10 | P09 | Pluto | Dwarf | 0.002 | 1188 | 90560.00 | 5 | N | 1.871 |
| 11 | P11 | Haumea | Dwarf | 0.001 | 816 | 103000.00 | 2 | N | 1.837 |
| 12 | P12 | Ceres | Dwarf | 0.000 | 473 | 1680.00 | 0 | N | 2.021 |
8) PROC REPORT
Purpose: Create a formatted report combining text and numeric columns.
proc report data=work.real_planets_sorted nowd;
title 'PROC REPORT | Purpose: Create a formatted report combining text and numeric columns';
columns PlanetID Name Type Mass_Earth Radius_km Density_gcm3 Moons Has_Rings;
define PlanetID / display 'ID' width=3;
define Name / display 'Planet' width=15;
define Type / display 'Type' width=10;
define Mass_Earth / analysis mean format=8.3 'Mass (Earths)';
define Radius_km / display format=comma8.0 'Radius (km)';
define Density_gcm3 / display format=6.3 'Density (g/cm3)';
define Moons / analysis sum 'Moons';
define Has_Rings / display 'Rings';
run;
Output:
| ID | Planet | Type | Mass (Earths) | Radius (km) | Density (g cm3) | Moons | Rings |
|---|---|---|---|---|---|---|---|
| P05 | Jupiter | Gas | 317.800 | 69,911 | 1.326 | 79 | Y |
| P06 | Saturn | Gas | 95.200 | 58,232 | 0.687 | 83 | Y |
| P08 | Neptune | Ice | 17.100 | 24,622 | 1.633 | 14 | Y |
| P07 | Uranus | Ice | 14.500 | 25,362 | 1.267 | 27 | Y |
| P03 | Earth | Terrestrial | 1.000 | 6,371 | 5.513 | 1 | N |
| P02 | Venus | Terrestrial | 0.815 | 6,052 | 5.242 | 0 | N |
| P04 | Mars | Terrestrial | 0.107 | 3,390 | 3.916 | 2 | N |
| P01 | Mercury | Terrestrial | 0.055 | 2,439 | 5.405 | 0 | N |
| P10 | Eris | Dwarf | 0.003 | 1,163 | 2.538 | 1 | N |
| P09 | Pluto | Dwarf | 0.002 | 1,188 | 1.871 | 5 | N |
| P11 | Haumea | Dwarf | 0.001 | 816 | 1.837 | 2 | N |
| P12 | Ceres | Dwarf | 0.000 | 473 | 2.021 | 0 | N |
9) Example macro to create a radar of which planets have rings
Macro: %planets_with_rings
Purpose: Generate a small table of ringed planets
%macro planets_with_rings(in=work.real_planets2, out=work.ringed);
proc sql;
create table &out. as
select PlanetID, Name, Type, Mass_Earth, Moons
from &in.
where upcase(Has_Rings) = 'Y'
order by Mass_Earth desc;
quit;
%mend planets_with_rings;
%planets_with_rings(in=work.real_planets2, out=work.ringed);
proc print data=work.ringed noobs;
title 'Planets with Rings (macro-generated)';
run;
Output:
| PlanetID | Name | Type | Mass_Earth | Moons |
|---|---|---|---|---|
| P05 | Jupiter | Gas | 317.8 | 79 |
| P06 | Saturn | Gas | 95.2 | 83 |
| P08 | Neptune | Ice | 17.1 | 14 |
| P07 | Uranus | Ice | 14.5 | 27 |
10) Validation / QC checks using PROC SQL and data step
Purpose: Identify missing values, outliers, and logical inconsistencies.
proc sql;
create table work.qc_checks as
select PlanetID, Name,
case when Mass_Earth <= 0 then 'Mass<=0' else '' end as Mass_Issue,
case when Radius_km <= 0 then 'Radius<=0' else '' end as Radius_Issue,
case when Missing(Type) then 'TypeMissing' else '' end as Type_Issue
from work.real_planets2;
quit;
proc print data=work.qc_checks noobs;
title 'QC Checks - Purpose: Identify missing or invalid numeric/categorical values';
run;
Output:
| PlanetID | Name | Mass_Issue | Radius_Issue | Type_Issue |
|---|---|---|---|---|
| P01 | Mercury | |||
| P02 | Venus | |||
| P03 | Earth | |||
| P04 | Mars | |||
| P05 | Jupiter | |||
| P06 | Saturn | |||
| P07 | Uranus | |||
| P08 | Neptune | |||
| P09 | Pluto | |||
| P10 | Eris | |||
| P11 | Haumea | |||
| P12 | Ceres |
No comments:
Post a Comment