INTERNATIONAL SPACE MISSIONS ANALYSIS USING PROC SQL | PROC MEANS | PROC UNIVARIATE | PROC SGPLOT | MACRO-BASED SUCCESS RATE COMPUTATION
options nocenter validvarname=any nodate nonumber;
1) Create Space Missions Dataset
data work.space_missions;
length Mission_ID $8 Country $20 Agency $30 Mission_Type $20 Outcome $12;
infile datalines dlm=',' dsd truncover;
input Mission_ID :$8.
Country :$20.
Agency :$30.
Launch_Year
Mission_Type :$20.
Outcome :$12.;
datalines;
M001,USA,NASA,1998,Orbiter,Success
M002,Russia,Roscosmos,2000,Orbiter,Partial
M003,India,ISRO,2008,Orbiter,Success
M004,China,CNSA,2013,Lander,Success
M005,ESA,ESA,2014,Orbiter,Failure
M006,Japan,JAXA,2010,Probe,Success
M007,USA,SpaceX,2018,Orbiter,Success
M008,India,ISRO,2019,Orbiter,Failure
M009,Russia,Roscosmos,2016,Probe,Partial
M010,China,CNSA,2020,Orbiter,Success
M011,France,CNES,2002,Probe,Success
M012,SouthKorea,KARI,2021,Orbiter,Partial
;
run;
proc print data=work.space_missions;
run;
OUTPUT:
| Obs | Mission_ID | Country | Agency | Mission_Type | Outcome | Launch_Year |
|---|---|---|---|---|---|---|
| 1 | M001 | USA | NASA | Orbiter | Success | 1998 |
| 2 | M002 | Russia | Roscosmos | Orbiter | Partial | 2000 |
| 3 | M003 | India | ISRO | Orbiter | Success | 2008 |
| 4 | M004 | China | CNSA | Lander | Success | 2013 |
| 5 | M005 | ESA | ESA | Orbiter | Failure | 2014 |
| 6 | M006 | Japan | JAXA | Probe | Success | 2010 |
| 7 | M007 | USA | SpaceX | Orbiter | Success | 2018 |
| 8 | M008 | India | ISRO | Orbiter | Failure | 2019 |
| 9 | M009 | Russia | Roscosmos | Probe | Partial | 2016 |
| 10 | M010 | China | CNSA | Orbiter | Success | 2020 |
| 11 | M011 | France | CNES | Probe | Success | 2002 |
| 12 | M012 | SouthKorea | KARI | Orbiter | Partial | 2021 |
2) Create a formatted outcome flag numeric: 1=Success, 0=NotSuccess
data work.space_missions_flag;
set work.space_missions;
if upcase(strip(Outcome)) = 'SUCCESS' then Success_Flag = 1;
else Success_Flag = 0;
run;
proc print data=work.space_missions_flag noobs label;
title "International Space Missions - dataset (with Success_Flag)";
var Mission_ID Country Agency Launch_Year Mission_Type Outcome Success_Flag;
run;
OUTPUT:
| Mission_ID | Country | Agency | Launch_Year | Mission_Type | Outcome | Success_Flag |
|---|---|---|---|---|---|---|
| M001 | USA | NASA | 1998 | Orbiter | Success | 1 |
| M002 | Russia | Roscosmos | 2000 | Orbiter | Partial | 0 |
| M003 | India | ISRO | 2008 | Orbiter | Success | 1 |
| M004 | China | CNSA | 2013 | Lander | Success | 1 |
| M005 | ESA | ESA | 2014 | Orbiter | Failure | 0 |
| M006 | Japan | JAXA | 2010 | Probe | Success | 1 |
| M007 | USA | SpaceX | 2018 | Orbiter | Success | 1 |
| M008 | India | ISRO | 2019 | Orbiter | Failure | 0 |
| M009 | Russia | Roscosmos | 2016 | Probe | Partial | 0 |
| M010 | China | CNSA | 2020 | Orbiter | Success | 1 |
| M011 | France | CNES | 2002 | Probe | Success | 1 |
| M012 | SouthKorea | KARI | 2021 | Orbiter | Partial | 0 |
3) List missions launched after 2010
proc sql;
title "Missions Launched After 2010";
select Mission_ID, Agency, Country, Launch_Year, Mission_Type, Outcome
from work.space_missions
where Launch_Year > 2010
order by Launch_Year;
quit;
OUTPUT:
| Mission_ID | Agency | Country | Launch_Year | Mission_Type | Outcome |
|---|---|---|---|---|---|
| M004 | CNSA | China | 2013 | Lander | Success |
| M005 | ESA | ESA | 2014 | Orbiter | Failure |
| M009 | Roscosmos | Russia | 2016 | Probe | Partial |
| M007 | SpaceX | USA | 2018 | Orbiter | Success |
| M008 | ISRO | India | 2019 | Orbiter | Failure |
| M010 | CNSA | China | 2020 | Orbiter | Success |
| M012 | KARI | SouthKorea | 2021 | Orbiter | Partial |
4) Count of missions by Outcome
proc sql;
title "Count of Missions by Outcome";
select Outcome, count(*) as Count
from work.space_missions
group by Outcome;
quit;
OUTPUT:
| Outcome | Count |
|---|---|
| Failure | 2 |
| Partial | 3 |
| Success | 7 |
5) Numeric summary by Agency
proc means data=work.space_missions n mean median min max std vardef=DF;
class Agency;
var Launch_Year;
title "Numeric Summary (Launch_Year) by Agency - PROC MEANS";
run;
OUTPUT:
The MEANS Procedure
| Analysis Variable : Launch_Year | |||||||
|---|---|---|---|---|---|---|---|
| Agency | N Obs | N | Mean | Median | Minimum | Maximum | Std Dev |
| CNES | 1 | 1 | 2002.00 | 2002.00 | 2002.00 | 2002.00 | . |
| CNSA | 2 | 2 | 2016.50 | 2016.50 | 2013.00 | 2020.00 | 4.9497475 |
| ESA | 1 | 1 | 2014.00 | 2014.00 | 2014.00 | 2014.00 | . |
| ISRO | 2 | 2 | 2013.50 | 2013.50 | 2008.00 | 2019.00 | 7.7781746 |
| JAXA | 1 | 1 | 2010.00 | 2010.00 | 2010.00 | 2010.00 | . |
| KARI | 1 | 1 | 2021.00 | 2021.00 | 2021.00 | 2021.00 | . |
| NASA | 1 | 1 | 1998.00 | 1998.00 | 1998.00 | 1998.00 | . |
| Roscosmos | 2 | 2 | 2008.00 | 2008.00 | 2000.00 | 2016.00 | 11.3137085 |
| SpaceX | 1 | 1 | 2018.00 | 2018.00 | 2018.00 | 2018.00 | . |
6) Distribution diagnostics (skewness, kurtosis, extreme values)
proc univariate data=work.space_missions cibasic;
var Launch_Year;
title "Launch Year Distribution - PROC UNIVARIATE";
histogram Launch_Year / normal;
inset mean median std / position=ne;
run;
OUTPUT:
The UNIVARIATE Procedure
Variable: Launch_Year
| Moments | |||
|---|---|---|---|
| N | 12 | Sum Weights | 12 |
| Mean | 2011.58333 | Sum Observations | 24139 |
| Std Deviation | 8.027887 | Variance | 64.4469697 |
| Skewness | -0.5870186 | Kurtosis | -1.0485987 |
| Uncorrected SS | 48558319 | Corrected SS | 708.916667 |
| Coeff Variation | 0.39908299 | Std Error Mean | 2.31745136 |
| Basic Statistical Measures | |||
|---|---|---|---|
| Location | Variability | ||
| Mean | 2011.583 | Std Deviation | 8.02789 |
| Median | 2013.500 | Variance | 64.44697 |
| Mode | . | Range | 23.00000 |
| Interquartile Range | 13.50000 | ||
| Basic Confidence Limits Assuming Normality | |||
|---|---|---|---|
| Parameter | Estimate | 95% Confidence Limits | |
| Mean | 2012 | 2006 | 2017 |
| Std Deviation | 8.02789 | 5.68692 | 13.63037 |
| Variance | 64.44697 | 32.34102 | 185.78706 |
| Tests for Location: Mu0=0 | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Student's t | t | 868.0153 | Pr > |t| | <.0001 |
| Sign | M | 6 | Pr >= |M| | 0.0005 |
| Signed Rank | S | 39 | Pr >= |S| | 0.0005 |
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile |
| 100% Max | 2021.0 |
| 99% | 2021.0 |
| 95% | 2021.0 |
| 90% | 2020.0 |
| 75% Q3 | 2018.5 |
| 50% Median | 2013.5 |
| 25% Q1 | 2005.0 |
| 10% | 2000.0 |
| 5% | 1998.0 |
| 1% | 1998.0 |
| 0% Min | 1998.0 |
| Extreme Observations | |||
|---|---|---|---|
| Lowest | Highest | ||
| Value | Obs | Value | Obs |
| 1998 | 1 | 2016 | 9 |
| 2000 | 2 | 2018 | 7 |
| 2002 | 11 | 2019 | 8 |
| 2008 | 3 | 2020 | 10 |
| 2010 | 6 | 2021 | 12 |
The UNIVARIATE Procedure
The UNIVARIATE Procedure
Fitted Normal Distribution for Launch_Year
| Parameters for Normal Distribution | ||
|---|---|---|
| Parameter | Symbol | Estimate |
| Mean | Mu | 2011.583 |
| Std Dev | Sigma | 8.027887 |
| Goodness-of-Fit Tests for Normal Distribution | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Kolmogorov-Smirnov | D | 0.15337026 | Pr > D | >0.150 |
| Cramer-von Mises | W-Sq | 0.06082002 | Pr > W-Sq | >0.250 |
| Anderson-Darling | A-Sq | 0.40999932 | Pr > A-Sq | >0.250 |
| Quantiles for Normal Distribution | ||
|---|---|---|
| Percent | Quantile | |
| Observed | Estimated | |
| 1.0 | 1998.00 | 1992.91 |
| 5.0 | 1998.00 | 1998.38 |
| 10.0 | 2000.00 | 2001.30 |
| 25.0 | 2005.00 | 2006.17 |
| 50.0 | 2013.50 | 2011.58 |
| 75.0 | 2018.50 | 2017.00 |
| 90.0 | 2020.00 | 2021.87 |
| 95.0 | 2021.00 | 2024.79 |
| 99.0 | 2021.00 | 2030.26 |
proc sgplot data=work.space_missions;
title "Count of Missions by Agency and Outcome (stacked)";
vbar Agency / group=Outcome datalabel;
xaxis display=(nolabel);
yaxis label="Number of Missions";
OUTPUT:
8) Bar chart: success rate by Mission_Type
proc sql;
create table work.type_success as
select Mission_Type,
sum(case when upcase(Outcome) = 'SUCCESS' then 1 else 0 end) as Successes,
count(*) as Total_Missions,
calculated Successes / calculated Total_Missions as Success_Rate format=percent8.2
from work.space_missions
group by Mission_Type;
quit;
proc print data=work.type_success;
run;
OUTPUT:
| Obs | Mission_Type | Successes | Total_Missions | Success_Rate |
|---|---|---|---|---|
| 1 | Lander | 1 | 1 | 100.0% |
| 2 | Orbiter | 4 | 8 | 50.00% |
| 3 | Probe | 2 | 3 | 66.67% |
proc sgplot data=work.type_success;
title "Success Rate by Mission Type";
vbar Mission_Type / response=Success_Rate stat=sum datalabel;
yaxis label='Success Rate' values=(0 to 1 by 0.25) grid;
run;
OUTPUT:
9) Macro to compute success rates by arbitrary grouping variable
%success_rate macro:
Parameters:
- data: dataset name (lib.dataset) to analyze
- group: variable to group by (unquoted)
- outcomevar: the Outcome variable name (default Outcome)
- successvalue: the value of outcome considered a success (default SUCCESS)
- outds: optional name for the output summary table (default work.success_summary)
%macro success_rate(data=work.space_missions, group=Agency, outcomevar=Outcome, successvalue=SUCCESS, outds=work.success_summary);
%local ds lib mem;
/* parse libname.member if provided */
%if %index(&data,.) %then %do;
%let lib = %scan(&data,1,.);
%let mem = %scan(&data,2,.);
%end;
%else %do;
%let lib = work;
%let mem = &data;
%end;
/* Use PROC SQL to compute successes, totals, and percent */
proc sql;
create table &outds as
select &group as Group_Value length=50,
sum(case when upcase(strip(&outcomevar)) = upcase("&successvalue") then 1 else 0 end) as Successes,
count(*) as Total,
calculated Successes / calculated Total as Success_Rate format=percent8.2
from &data
group by &group
order by Success_Rate desc;
quit;
title "Success Rate Summary by &group";
proc print data=&outds noobs label;
var Group_Value Successes Total Success_Rate;
label Group_Value="&group"
Successes="Successful Missions"
Total="Total Missions"
Success_Rate="Success Rate";
run;
%mend success_rate;
%success_rate(data=work.space_missions, group=Agency);
OUTPUT:
| Agency | Successful Missions | Total Missions | Success Rate |
|---|---|---|---|
| SpaceX | 1 | 1 | 100.0% |
| JAXA | 1 | 1 | 100.0% |
| CNES | 1 | 1 | 100.0% |
| NASA | 1 | 1 | 100.0% |
| CNSA | 2 | 2 | 100.0% |
| ISRO | 1 | 2 | 50.00% |
| ESA | 0 | 1 | 0.00% |
| KARI | 0 | 1 | 0.00% |
| Roscosmos | 0 | 2 | 0.00% |
%success_rate(data=work.space_missions, group=Country);
OUTPUT:
| Country | Successful Missions | Total Missions | Success Rate |
|---|---|---|---|
| France | 1 | 1 | 100.0% |
| Japan | 1 | 1 | 100.0% |
| China | 2 | 2 | 100.0% |
| USA | 2 | 2 | 100.0% |
| India | 1 | 2 | 50.00% |
| SouthKorea | 0 | 1 | 0.00% |
| Russia | 0 | 2 | 0.00% |
| ESA | 0 | 1 | 0.00% |
%success_rate(data=work.space_missions, group=Mission_Type, successvalue=Success);
OUTPUT:
| Mission_Type | Successful Missions | Total Missions | Success Rate |
|---|---|---|---|
| Lander | 1 | 1 | 100.0% |
| Probe | 2 | 3 | 66.67% |
| Orbiter | 4 | 8 | 50.00% |
10) Use the macro output to create a plot of success rates by Agency
proc sgplot data=work.success_summary;
title "Success Rate by Agency (from macro output)";
vbar Group_Value / response=Success_Rate stat=sum datalabel;
xaxis label='Agency' fitpolicy=rotate;
yaxis label='Success Rate' values=(0 to 1 by 0.2);
run;
No comments:
Post a Comment